Open sourcing our Google Cloud Emulators

We just open sourced our implementations of a Google Bigtable emulator and a Google Cloud Storage (GCS) emulator: fullstorydev/emulators. Jump down below for more specific technical information, but first it’s worth asking: why did we invest in building emulators? Why should you buy into this?

Why Emulators?

I’ll define an emulator as a piece of software that runs on your local development and Continuous Integration (CI) stack and emulates the behavior of a cloud service. Here are just a few reasons to use one, which I’ll break down in more detail:

Run a self-contained local development environment
Avoid needlessly abstracting the storage layer
Test the storage layer
Write fewer mocks
It’s easier than you think

Reason 1: Run a self-contained local development environment

At Fullstory, we’re extremely passionate about engineer productivity. We have very low tolerance for anything that causes frustration in the development process. From the earliest days, we have always worked to ensure that our entire product can run directly on engineers’ laptops. We consider this a vital part of engineering productivity. Most of our work, even tricky integration work, can be tested out locally before our code is merged or deployed even to staging. And we don’t need a network connection or managed cloud services configured per engineer.

Emulators fill a vital gap in our local stack. We write live code against online storage systems like Datastore, Bigtable, or GCS. That code needs to do something useful in local dev. Ideally, it would behave just like the real online storage systems, and even persist our local data to disk and maintain state across builds, restarts, and reboots.

Reason 2: Avoid needlessly abstracting the storage layer

Sometimes it makes sense to abstract the storage layer in your live code. Sometimes it doesn’t. The abstraction might just add complexity and lines of code to what would otherwise be very simple. Either way, shouldn’t that decision hinge mostly on the live code? Abstracting a storage layer makes sense in many circumstances, but it doesn’t feel good to be forced into the decision by your test code.

Emulators let you run the most straightforward live code in an isolated environment. Service authors simply write code against the real storage service APIs, making emulators extremely high leverage.

Reason 3: Test the storage layer

Let’s say, for the sake of cleanliness, you went ahead and abstracted out your storage layer. Now you have a different problem. You can test your app code using mocks, but how do you test the storage layer itself?

By definition, your storage layer implementation needs to talk to a real online service, but you can’t easily connect to these services from a local development environment or CI system. The logic between your app and storage might be perfect, but if your storage layer has a bug, you’re unlikely to discover it prior to deployment.

Emulators let you test your storage layer against a service that’s as close to a real system as you can get. While an emulator can never be perfect, at least you can fix bugs and add new emulator features in a single place. With an open source emulator, we can do this for everyone.

Reason 4: Write fewer mocks

Remember how I just mentioned “mock out the storage layer” when unit testing app code? Mocks are a wonderful testing tool, but they come with several downsides:

Mocks can be wrong and give you a false sense of security.
Mock implementations are frequently copied around rather than reused.
Even in the best case, mocks create conceptual overhead and add lines of code.

Reason 5: It’s easier than you think

Many online storage systems have relatively simple interfaces and behavior. The facade of what services like Bigtable or Cloud Storage do isn’t all that complicated. The real value that these services provide usually boils down to:

Scale
Availability
Durability / Replication
Ease of use / management

If all you need from a data service is to run with reasonable uptime and durability on the local machine and store no more than a few MB or GB of data, the hard parts of the problem kind of fall away.

Google Bigtable Emulator

Our Google Bigtable Emulator is a fork of Google’s own bigtable/bttest, an excellent in-memory Bigtable emulator that works very well for unit tests. So why did we fork it? In a word: persistence.

Fullstory is an extremely data-driven product. It’s difficult to work on the product if you have no data to work with. The ability to enable on-disk persistence is an absolute must for our storage emulators.

So what did we do? We took the existing bttest implementation and abstracted out a storage layer interface. (A tad ironic if you've been reading with me up to now! 😆 ) Then we brought in the Go implementation of LevelDB (syndtr/goleveldb) to serve as a backing store. This library is nice for several reasons:

It’s a lexically ordered key-value store, like Bigtable
It supports very efficient seeks and scans, necessary to support the Bigtable API
It offers both in-memory and on-disk implementations out of the box
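To make the "ordered keys plus seek and scan" point concrete, here is a minimal sketch using only the standard library. The orderedStore type and its methods are hypothetical stand-ins for illustration; they are not goleveldb's API and not the emulator's actual storage interface.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedStore is a hypothetical in-memory stand-in for a lexically ordered
// key-value store. It only illustrates the access pattern.
type orderedStore struct {
	keys   []string // always kept in sorted order
	values map[string]string
}

func newOrderedStore() *orderedStore {
	return &orderedStore{values: map[string]string{}}
}

// Put inserts or overwrites a key while keeping keys sorted.
func (s *orderedStore) Put(key, value string) {
	if _, exists := s.values[key]; !exists {
		i := sort.SearchStrings(s.keys, key)
		s.keys = append(s.keys, "")
		copy(s.keys[i+1:], s.keys[i:])
		s.keys[i] = key
	}
	s.values[key] = value
}

// Scan seeks to the first key >= start, then walks forward in key order
// until it reaches end (exclusive). An empty end scans to the last key.
func (s *orderedStore) Scan(start, end string) []string {
	var out []string
	for i := sort.SearchStrings(s.keys, start); i < len(s.keys); i++ {
		if end != "" && s.keys[i] >= end {
			break
		}
		out = append(out, s.keys[i])
	}
	return out
}

func main() {
	s := newOrderedStore()
	for _, k := range []string{"user#2", "event#9", "user#1", "user#3", "video#1"} {
		s.Put(k, "cell data")
	}
	// A row-range read is one seek followed by a sequential scan.
	fmt.Println(s.Scan("user#", "user$")) // prints [user#1 user#2 user#3]
}
```

Because row keys sort lexically, a range or prefix read never has to touch keys outside the requested range, which is the property that makes an ordered store a natural fit here.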

There was just one significant complication: the existing bttest code was written in terms of bespoke Go structs to represent tables, rows, columns, and cells. These structs could not be easily serialized, and even contained mutexes. We ended up replacing these structs by using the real Bigtable public proto objects, which natively serialize. We also had to rework the concurrency model entirely, since the old model relied on internal mutexes within objects stored in memory.

I would have liked to upstream our changes to Google rather than fork, but unfortunately the changes were significant enough that the Google engineers were understandably hesitant to integrate them into the mainline. Still, I was able to get several bug fixes committed upstream along the way.

Read up on how to get started and more technical details here.

Google Cloud Storage (GCS) Emulator

Our GCS emulator is probably one of the oldest emulators at Fullstory. It was written from scratch many years ago, has been improved by multiple engineers, and has been rewritten a couple of times in its life. With persistence enabled, the emulator stores files directly to disk within its data directory. Helpfully, the emulator stores files on-disk at predictable paths that match the virtual GCS file system.

So, for example, a file in bucket foo named some/path/tofile would exist on the local disk at <data_dir>/foo/some/path/tofile. This makes it super easy to inspect your GCS data locally using normal filesystem tools.

We haven’t implemented every advanced GCS feature, but we have a lot of them:

Metadata
Copy
Compose
Most conditionals

Read up on how to get started and more technical details here.

Go vs. Other Languages

We’re a Go shop, and the emulators are written in Go. It’s going to be easier to get up and running in Go, and the emulators can be run in-process (awesome for unit tests). However, you can still run the emulators as external services. (For example, we run Redis and Memcached out of process during our own Go unit tests, and you can easily do the same with our emulators.)

The important thing is to intercept the outbound requests: HTTP (for GCS) or gRPC (for Bigtable). These need to be re-routed to the loopback address and port that the emulator is running on. For Go clients, we’ve already paved the way for you.

When writing Go unit tests, I suggest you use TestMain to set up and tear down the emulators you need on a package by package basis. Go tests run a package at a time. Setting up and tearing down emulators for every test method is probably wasted effort. You only need to ensure that the unit tests within a single package don’t interfere with each other’s expectations about the state of the underlying data. Having each test operate on its own data set is an easy way to accomplish this.

Testing our own Emulators

How do we know that our emulators behave correctly? Or, to put it another way: how can we verify that our emulator test cases have the right expectations?

We put in a bit of extra effort to formulate all our test code in such a way that most of our test cases are runnable against the real service. Both the Bigtable and GCS emulators have a file named remote_test.go that allows us to run most of our test suite against real production instances of Bigtable and GCS, respectively. When a test passes against both the emulator and the real service, we can be pretty confident that we wrote the test case correctly.
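One way to structure tests so the same cases can run against more than one backend is to write each case against a small interface. The sketch below is an assumption about the general pattern, not a copy of remote_test.go; objectStore, memStore, and checkRoundTrip are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// objectStore is a hypothetical minimal interface that both an emulator
// client and a real-service client could satisfy.
type objectStore interface {
	Put(name string, data []byte) error
	Get(name string) ([]byte, error)
}

var errNotFound = errors.New("object not found")

// memStore is an in-memory backend standing in for the emulator.
type memStore map[string][]byte

func (m memStore) Put(name string, data []byte) error {
	m[name] = append([]byte(nil), data...) // store a copy, not the caller's slice
	return nil
}

func (m memStore) Get(name string) ([]byte, error) {
	data, ok := m[name]
	if !ok {
		return nil, errNotFound
	}
	return data, nil
}

// checkRoundTrip is one backend-agnostic test case. It only knows about
// the interface, so it can be pointed at any implementation.
func checkRoundTrip(s objectStore) error {
	if err := s.Put("some/path/tofile", []byte("hello")); err != nil {
		return err
	}
	got, err := s.Get("some/path/tofile")
	if err != nil {
		return err
	}
	if string(got) != "hello" {
		return fmt.Errorf("got %q, want %q", got, "hello")
	}
	return nil
}

func main() {
	// A remote run would pass a client for the real service here instead.
	if err := checkRoundTrip(memStore{}); err != nil {
		panic(err)
	}
	fmt.Println("round trip ok")
}
```

If a case fails against the real service, the expectation in the test is wrong; if it fails only against the emulator, the emulator has a bug.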

Conclusion

I hope you find these tools as useful as we have. If you find any bugs or discover you need additional features, hit me up at fullstorydev/emulators with an issue or PR.

If you’re passionate about engineering productivity, or just want to work somewhere that is, we’re always looking for talented engineers. ❤️