At FullStory, our services produce hundreds of thousands of log lines a second. At the front of this pipeline is Filebeat, a tool we use to process and ship this data to our log storage and query service. The catch? Filebeat did not support our destination out of the box, so we were left with only one choice: build it ourselves. Today, we’re going to walk through how you, too, can write your own output plugin for filebeat using the Go programming language.
What is Filebeat?
Filebeat is an open source tool provided by the team at elastic.co that describes itself as a “lightweight shipper for logs”. Like other tools in the space, it essentially takes incoming data from a set of inputs and “ships” it to a single output. It supports a variety of these inputs and outputs, but generally it is a piece of the ELK (Elasticsearch, Logstash, Kibana) stack, tailing log files on your VMs and shipping them either straight into Elasticsearch or to a Logstash server for further processing.
Anatomy of an Output Plugin
The main thing you need to know when writing a custom plugin is that filebeat is really just a collection of Go packages built on libbeat, which itself is just the underlying set of shared libraries making up the beats open source project. Because of this, and how well organized the filebeat code actually is, it’s pretty easy to reverse-engineer how we might go about writing our own plugin.
For instance, we know from the documentation that filebeat supports an Elasticsearch output, and a quick grep of the code base reveals how that output is defined. Essentially, all of the bundled outputs are just plugins themselves. Using the Elasticsearch output plugin as an example, we can infer the initial skeleton for our own custom output:
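As a rough sketch of what such a skeleton might look like: the types below are local stand-ins for libbeat's outputs and common packages so the example compiles on its own, and RegisterType here is a toy version of libbeat's outputs.RegisterType. The real factory takes a few more arguments, and its exact signature depends on the Beats version you build against.

```go
package main

import "fmt"

// Local stand-ins so the sketch compiles without libbeat.
// In a real plugin these come from libbeat's outputs and common packages.
type Observer interface{}                       // stand-in for outputs.Observer
type Config map[string]interface{}              // stand-in for *common.Config
type NetworkClient interface{ String() string } // trimmed outputs.NetworkClient

// Group is a stand-in for the value the real factory returns.
type Group struct {
	Clients   []NetworkClient
	BatchSize int
	Retry     int
}

// Factory builds a group of clients from an observer and a configuration.
type Factory func(observer Observer, cfg Config) (Group, error)

var registry = map[string]Factory{}

// RegisterType is a toy equivalent of outputs.RegisterType.
func RegisterType(name string, f Factory) {
	if _, dup := registry[name]; dup {
		panic(fmt.Sprintf("output type %q is already registered", name))
	}
	registry[name] = f
}

// init runs when the package is loaded, which is how the output
// becomes available under the name "http".
func init() {
	RegisterType("http", makeHTTP)
}

// makeHTTP is the factory. Building the clients is left for later.
func makeHTTP(observer Observer, cfg Config) (Group, error) {
	var clients []NetworkClient
	return Group{Clients: clients}, nil
}
```

Registering inside init means the output is available as soon as the package is linked into the binary, with no extra wiring at startup.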
The init code is pretty straightforward. We register our custom output under the name http and associate that with a factory method for creating that output. For the factory method, we take in some arguments and return the result of a function wrapping a slice of outputs.NetworkClient objects. Of those arguments, the two we’re going to focus on are the outputs.Observer and the common.Config. The outputs.Observer tracks statistics about how our clients are faring, which we will dive more into later. The common.Config argument represents our output’s configuration, which we can unpack into a custom struct. As for what we might want to make configurable, there’s a bit to unpack there.
Filebeat supports publishing log events to a number of clients concurrently, each receiving events in batches of a configurable size. For each log event, it will also retry publishing a certain number of times in the face of failure. Given that we are building an HTTP output, we also need a way to tell the client where it should post log event data to.
Each of these is made configurable by our struct below using tags that filebeat understands:
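A minimal sketch of such a struct follows. The field names, tag keys, and default values are illustrative assumptions rather than filebeat's canonical option names, and unpack is a toy stand-in for the Unpack method on the real configuration object, included only so the example runs without libbeat.

```go
package main

import (
	"fmt"
	"reflect"
)

// httpConfig holds the four settings described above.
// Field names and tag keys are assumptions made for this sketch.
type httpConfig struct {
	URL       string `config:"url"`
	Workers   int    `config:"workers"`
	BatchSize int    `config:"batch_size"`
	Retries   int    `config:"retries"`
}

// defaultConfig supplies values for anything the user leaves out.
// The numbers are placeholders, not recommendations.
func defaultConfig() httpConfig {
	return httpConfig{Workers: 1, BatchSize: 2048, Retries: 3}
}

// unpack copies values from raw into the struct fields whose `config`
// tag matches the key. It is a toy version of what the real
// configuration object's Unpack does.
func unpack(raw map[string]interface{}, out *httpConfig) error {
	v := reflect.ValueOf(out).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		key := t.Field(i).Tag.Get("config")
		val, ok := raw[key]
		if !ok {
			continue // option not set, keep the default
		}
		rv := reflect.ValueOf(val)
		if !rv.IsValid() || !rv.Type().AssignableTo(t.Field(i).Type) {
			return fmt.Errorf("option %q: want %s, got %T", key, t.Field(i).Type, val)
		}
		v.Field(i).Set(rv)
	}
	return nil
}
```

Starting from defaultConfig() and unpacking on top of it means users only have to specify the options they want to change.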
We can make use of this configuration like so:
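Here is one way that could look, reduced to the part that turns a configuration into clients. The httpConfig and httpClient types are pared-down assumptions, and makeClients is a hypothetical helper name. In the real factory, the resulting slice is handed to libbeat's wrapping function along with the batch size and retry count.

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkClient is a trimmed stand-in for outputs.NetworkClient.
type NetworkClient interface{ String() string }

// httpConfig mirrors the settings discussed above (names are assumptions).
type httpConfig struct {
	URL       string
	Workers   int
	BatchSize int
	Retries   int
}

// httpClient is one worker. Each worker handles its own batches.
type httpClient struct {
	url    string
	worker int
}

func (c *httpClient) String() string {
	return fmt.Sprintf("http(%s) worker %d", c.url, c.worker)
}

// makeClients builds one client per configured worker.
func makeClients(config httpConfig) ([]NetworkClient, error) {
	if config.URL == "" {
		return nil, errors.New("http output: url must be set")
	}
	if config.Workers < 1 {
		config.Workers = 1 // always run at least one worker
	}
	clients := make([]NetworkClient, 0, config.Workers)
	for i := 0; i < config.Workers; i++ {
		clients = append(clients, &httpClient{url: config.URL, worker: i})
	}
	return clients, nil
}
```

Validating the URL up front keeps a misconfigured output from starting and then failing on every batch.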
As you can see above, we are building a slice of outputs.NetworkClient clients to return from our new output method based on the number of config.Workers. Doing this for N workers allows us to handle multiple batches of events in parallel, and filebeat will handle coordination and dispatching of log events to each of these workers internally. This means for each worker, we build an httpClient and add it to the slice of clients.
Building the HTTP Client
Anybody who writes Go will immediately recognize that outputs.NetworkClient must be an interface, meaning our client must implement the following methods:
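Written out as a local interface, the method set looks roughly like this. Batch is a trimmed stand-in for libbeat's publisher.Batch, and noopClient is a hypothetical type that exists only to show the compile-time check that a client satisfies the interface.

```go
package main

// Batch is a trimmed stand-in for libbeat's publisher.Batch.
type Batch interface {
	ACK()
	Drop()
	Retry()
}

// NetworkClient lists the four methods our client has to provide.
type NetworkClient interface {
	String() string
	Connect() error
	Close() error
	Publish(batch Batch) error
}

// noopClient does nothing useful. It only demonstrates the shape.
type noopClient struct{ connected bool }

func (c *noopClient) String() string { return "noop" }
func (c *noopClient) Connect() error { c.connected = true; return nil }
func (c *noopClient) Close() error   { c.connected = false; return nil }

// Publish acknowledges every batch without sending it anywhere.
func (c *noopClient) Publish(batch Batch) error {
	batch.ACK()
	return nil
}

// Compile-time assertion: the build fails if a method is missing.
var _ NetworkClient = (*noopClient)(nil)
```

The blank-identifier assignment on the last line costs nothing at runtime and catches a missing or misspelled method at build time.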
The first three methods are pretty easy to grasp. String is required to identify your client by name and will be output in log messages and included in metrics (if you’re running the stats server). Connect will be called before filebeat is about to publish its first batch of events to the client, while Close will be called when filebeat is shutting down. We’re going to use these methods to manage our HTTP Client and can implement them like so:
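A minimal sketch of those three methods, assuming the client simply owns a standard library http.Client. The struct fields and the ten-second fallback timeout are choices made for this example.

```go
package main

import (
	"net/http"
	"time"
)

// httpClient owns the underlying HTTP connection pool for one worker.
type httpClient struct {
	url     string
	timeout time.Duration
	client  *http.Client
}

// String names the client in logs and metrics.
func (c *httpClient) String() string { return "http(" + c.url + ")" }

// Connect prepares the HTTP client before the first batch is published.
func (c *httpClient) Connect() error {
	if c.timeout <= 0 {
		c.timeout = 10 * time.Second // fallback chosen for this sketch
	}
	c.client = &http.Client{Timeout: c.timeout}
	return nil
}

// Close releases idle connections on shutdown. It is safe to call twice.
func (c *httpClient) Close() error {
	if c.client != nil {
		c.client.CloseIdleConnections()
		c.client = nil
	}
	return nil
}
```

Since HTTP has no long-lived connection to establish, Connect here only prepares the client. An output speaking a stateful protocol would do its dialing at this point instead.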
Now that we’ve got our connection handling set up correctly, that just leaves the Publish method. This is where we will process batches of log events, handle errors, and initiate retries. This is also the place we’ll want to track metrics about what our plugin is doing, and we’ll use the outputs.Observer to do just that:
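A sketch of a Publish method along these lines is below. Event, Batch, and Observer are recording stand-ins for libbeat's publisher.Event, publisher.Batch, and outputs.Observer, trimmed to the methods used here. The swappable post field and the checkStatus and postJSON helpers are assumptions added so the flow can be exercised without a live endpoint.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Event is a stand-in for publisher.Event.
type Event struct{ Fields map[string]interface{} }

// Batch is a recording stand-in for publisher.Batch.
type Batch struct {
	events  []Event
	Outcome string // set to "acked", "retry" or "dropped"
}

func (b *Batch) Events() []Event { return b.events }
func (b *Batch) ACK()            { b.Outcome = "acked" }
func (b *Batch) Retry()          { b.Outcome = "retry" }
func (b *Batch) Drop()           { b.Outcome = "dropped" }

// Observer is a counting stand-in for outputs.Observer.
type Observer struct{ acked, failed, dropped int }

func (o *Observer) Acked(n int)   { o.acked += n }
func (o *Observer) Failed(n int)  { o.failed += n }
func (o *Observer) Dropped(n int) { o.dropped += n }

type httpClient struct {
	url      string
	observer *Observer
	post     func(url string, body []byte) error // swappable transport
}

var sender = &http.Client{Timeout: 10 * time.Second}

// checkStatus treats anything outside 2xx as a failed delivery.
func checkStatus(code int) error {
	if code < 200 || code > 299 {
		return fmt.Errorf("unexpected status %d", code)
	}
	return nil
}

// postJSON is the transport a real client would plug into the post field.
func postJSON(url string, body []byte) error {
	resp, err := sender.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return checkStatus(resp.StatusCode)
}

// Publish sends one batch and always returns nil, so a failed POST
// leads to a retry rather than an error handed back to the caller.
func (c *httpClient) Publish(batch *Batch) error {
	events := batch.Events()
	entries := make([]map[string]interface{}, 0, len(events))
	for _, e := range events {
		entries = append(entries, e.Fields)
	}
	body, err := json.Marshal(entries)
	if err != nil {
		// Data that cannot be serialized will not improve on retry.
		c.observer.Dropped(len(events))
		batch.Drop()
		return nil
	}
	if err := c.post(c.url, body); err != nil {
		c.observer.Failed(len(events))
		batch.Retry()
		return nil
	}
	c.observer.Acked(len(events))
	batch.ACK()
	return nil
}
```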
Iterating through the events, we build our log entry structs (more on that in a minute). Once we’ve got all of our entries, we serialize them into JSON and attempt to POST them to our configured endpoint.
If we succeed, we observe the number of events we’ve successfully processed and use batch.ACK() to notify filebeat that everything went fine. If something goes wrong when POSTing the data, we tell filebeat it should retry the events that make up this batch. If we fail to serialize the data into JSON at the beginning (however unlikely), we assume something must be seriously wrong with the data, so we tell filebeat to drop the entire batch.
It’s important to note that, if we return an error from our Publish method, filebeat will tear down the entire group of clients and reconnect. That's not exactly what we want if we get a 500 response from our endpoint. Since we have other ways of handling errors (retries, dropping bad data), we return nil instead and wait for the next batch of events.
This is basically everything we need for our plugin to publish log data to a configurable URL and manage the lifecycle of the filebeat batches it receives. The only thing left to do is to figure out how to extract data out of a filebeat event and use it to build our log entries.
Accessing Event Data
In filebeat, events represent a packet of information parsed from some input. In most cases, this is going to be a log file where each event is the data parsed out of a single log line, but there are a lot of different inputs built in that are not necessarily log related. You can also configure the pipeline to process the event data using a number of different processors that do anything from adding or deleting specific fields from the event, to enriching it with metadata about the Kubernetes container that is producing the log data.
No matter how the events come in, we need to know how to access the underlying data so we can build our log entry structs. Let’s say we are, in fact, using the add_kubernetes_metadata processor. If we were to represent the event as a JSON object, the event data would take on something like the following (some fields omitted for brevity):
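To give a feel for that shape, here is a made-up event expressed as a nested Go map, with a helper that renders it as JSON. Every value is hypothetical, and the pod name and labels sub-keys are assumptions about which metadata the processor attaches. Only the top-level key names @timestamp, message, and kubernetes come from the discussion here.

```go
package main

import "encoding/json"

// sampleEvent returns an illustrative event. All values are invented.
func sampleEvent() map[string]interface{} {
	return map[string]interface{}{
		"@timestamp": "2019-06-01T12:00:00.000Z",
		"message":    "GET /healthz 200 3ms",
		"kubernetes": map[string]interface{}{
			"pod":    map[string]interface{}{"name": "web-5d9c7b6f4-abcde"},
			"labels": map[string]interface{}{"app": "web"},
		},
	}
}

// asJSON renders an event the way it would look as a JSON object.
func asJSON(event map[string]interface{}) (string, error) {
	out, err := json.MarshalIndent(event, "", "  ")
	return string(out), err
}
```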
We can see that the add_kubernetes_metadata processor added the kubernetes key with some useful metadata. There are a few more fields we omitted, like node name, container name and image, namespace, etc., but you can see we get enough info about the service producing the logs that we would easily be able to diagnose issues down to a single, troublesome pod.
There are also some standard log input fields like @timestamp and message. Note that @timestamp represents the time filebeat ingested the log line (not necessarily the time it was written), and message is the raw log line itself. To get at the event data above, we could use filebeat’s event API to access it and build our log entries:
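The sketch below shows the idea with a local getValue function standing in for the event API's GetValue method, operating on a plain nested map rather than a real filebeat event. The logEntry fields and the paths kubernetes.pod.name and kubernetes.labels.app are assumptions about what a destination might want to store.

```go
package main

import (
	"fmt"
	"strings"
)

// getValue walks a nested map using a dot-notated path.
// It is a local stand-in for the event API's GetValue.
func getValue(event map[string]interface{}, path string) (interface{}, error) {
	var cur interface{} = event
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("%s: cannot descend into %q", path, key)
		}
		next, found := obj[key]
		if !found {
			return nil, fmt.Errorf("%s: key %q not found", path, key)
		}
		cur = next
	}
	return cur, nil
}

// logEntry is a hypothetical shape for what we send to the endpoint.
type logEntry struct {
	Timestamp string `json:"timestamp"`
	Message   string `json:"message"`
	Pod       string `json:"pod"`
	App       string `json:"app"`
}

// buildEntry pulls the fields we care about out of an event.
// Missing or non-string values become empty strings.
func buildEntry(event map[string]interface{}) logEntry {
	str := func(path string) string {
		v, err := getValue(event, path)
		if err != nil {
			return ""
		}
		s, _ := v.(string)
		return s
	}
	return logEntry{
		Timestamp: str("@timestamp"),
		Message:   str("message"),
		Pod:       str("kubernetes.pod.name"),
		App:       str("kubernetes.labels.app"),
	}
}
```

Treating a missing field as an empty string keeps one oddly shaped event from failing a whole batch. Whether that is the right tradeoff depends on how strict your destination is.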
Given we know the structure of our data, we can use the GetValue method and pass it the dot-notated path to retrieve the value. Being able to build log entries in this way gives us control over storing our log data in a way that suits our needs.
For us, nothing has come close to filebeat in terms of being able to handle our log volume, so having the ability to write a custom output plugin was a huge win. In this walkthrough, we’ve hopefully shown how easy it is to use the framework to build outputs for any destination you want to ship your log data to. While it’s not a well-documented feature, it’s very powerful, and it also supports extension with custom inputs and processors. If you’re considering filebeat but find it doesn’t support something you want in your pipeline, well, now you know it actually already does!
We wanted to provide a crash course in writing custom plugins for filebeat, so you might feel like we glossed over some of the details in this post. For one, parsing log messages is never as simple as the example above, and having filebeat parse more complicated lines deserves an article to itself. There are also a few things about building outputs that we didn’t even mention, like interpolation in configuration (or why you would even want that). We actually have a lot to say about filebeat, and really logging in general, that we think people might be interested in. Be on the lookout for more posts to come!