
Docker Log Aggregation and Visualization Options with the ELK Stack

elk

As a Developer and DevOps Engineer, it wasn't that long ago that I spent a lot of time requesting logs from Operations teams for applications running in Production. Many organizations I've worked with have created elaborate systems for requesting, granting, and revoking access to application logs. Requesting and obtaining access to logs typically took hours or days, or simply never got approved. Since most enterprise applications are composed of individual components running on multiple application and web servers, it was necessary to request multiple logs. What was often a simple problem to diagnose and fix became an unnecessarily time-consuming ordeal.

Hopefully, you are no longer in this situation. Given the average complexity of today's modern, distributed, containerized application platforms, accessing individual logs is simply unrealistic and ineffective. The solution is log aggregation and visualization.

Log Aggregation and Visualization

In the context of this post, log aggregation and visualization is defined as the collection, centralized storage, and ability to simultaneously display application logs from multiple, dissimilar sources. Take a typical modern web application. The frontend UI might be built with Angular, React, or Node. The UI is likely backed by multiple RESTful services, possibly built in Java Spring Boot or Python Flask, and a database or databases, such as MongoDB or MySQL. To support the application, there are auxiliary components, such as API gateways, load-balancers, and messaging brokers. These components are likely deployed as multiple instances, for performance and availability. All instances generate application logs in varying formats.

When troubleshooting an application, such as the one described above, you must often trace a user’s transaction from UI through firewalls and gateways, to the web server, back through the API gateway, to multiple backend services via load-balancers, through message queues, to databases, possibly to external third-party APIs, and back to the client. This is why log aggregation and visualization is essential.

Logging Options

Log aggregation and visualization solutions typically come in three varieties: cloud-hosted by a SaaS provider, a service provided by your Cloud provider, and self-hosted, either on-premises or in the cloud. Cloud-hosted SaaS solutions include Loggly, Splunk, Logentries, and Sumo Logic. Some of these solutions, such as Splunk, are also available as a self-hosted service. Cloud-provider solutions include AWS CloudWatch and Azure Application Insights. Most hosted solutions have recurring pricing models based on the volume of logs or the number of server nodes being monitored.

Self-hosted solutions include Graylog 2, Nagios Log Server, Splunk Free, and Elastic's Elastic Stack. The ELK Stack (Elasticsearch, Logstash, and Kibana), as it was previously known, has been re-branded as the Elastic Stack, which now includes Beats. Beats is Elastic's lightweight shipper that sends data from edge machines to Logstash and Elasticsearch.

Often, you will see other components mentioned in the self-hosted space, such as Fluentd, syslog, and Kafka. These are examples of log aggregators or datastores for logs. They lack the combined abilities to collect, store, and display multiple logs. These components are generally part of a larger log aggregation and visualization solution.

This post will explore self-hosted log aggregation and visualization of a Dockerized application on AWS, using the ELK Stack. The post details three common variations of log collection and routing to ELK, using various Docker logging drivers, along with Logspout, Fluentd, and GELF (Graylog Extended Log Format).

Docker Swarm Cluster

The post’s example application is deployed to a Docker Swarm, built on AWS, using Docker CE for AWS. Docker has automated the creation of a Swarm on AWS using Docker Cloud, right from your desktop. Creating a Swarm is as easy as inputting a few options and clicking build. Docker uses an AWS CloudFormation script to provision all the necessary AWS resources for the Docker Swarm.

swam_mode

For this post’s logging example, I built a minimally configured Docker Swarm cluster, consisting of a single Manager Node and three Worker Nodes. The four Swarm nodes, all EC2 instances, are behind an AWS ELB, inside a new AWS VPC.

Logging Diagram AWS Diagram 3D

As seen with the docker node ls command, the Docker Swarm will look similar to the following.

Sample Application Components

Multiple containerized copies of a simple Java Spring Boot RESTful Hello-World service, available on GitHub, along with the associated logging aggregators, are deployed to Worker Node 1 and Worker Node 2. We will explore each of these application components later in the post. The containerized components consist of the following:

  1. Fluentd (garystafford/custom-fluentd)
  2. Logspout (garystafford/custom-logspout)
  3. NGINX (garystafford/custom-nginx)
  4. Hello-World Service using Docker’s default JSON file logging driver
  5. Hello-World Service using Docker’s GELF logging driver
  6. Hello-World Service using Docker’s Fluentd logging driver

NGINX is used as a simple frontend API gateway, which routes HTTP requests to each of the three logging variations of the Hello-World service (garystafford/hello-world).

A single container, running the entire ELK Stack (garystafford/custom-elk), is deployed to Worker Node 3. This is to isolate the ELK Stack from the application. Typically, in a real environment, ELK would be running on separate infrastructure for performance and security, not alongside your application. Running docker service ls, the deployed services appear as follows.

Portainer

A single instance of Portainer (Docker Hub: portainer/portainer) is deployed on the single Manager Node. Portainer, amongst other things, provides a detailed view of Docker Swarm, showing each Swarm Node and the service containers deployed to them.

portainer

In my opinion, Portainer provides a much better user experience than Docker Enterprise Edition’s most recent Universal Control Plane (UCP). In the past, I have also used Visualizer (dockersamples/visualizer), one of the first open source solutions in this space. However, since the Visualizer project moved to Docker, it seems like the development of new features has completely stalled out. A good list of container tools can be found on StackShare.

Deployment

All the Docker service containers are deployed to the AWS-based Docker Swarm using a single Docker Compose file. The order of service startup is critical. ELK should fully start up first, followed by Fluentd and Logspout, then the three sets of Hello-World instances, and finally NGINX.

To deploy and start all the Docker services correctly, there are two scripts in the GitHub repository. First, execute the following command, sh ./stack_deploy.sh. This will deploy the Docker service stack and create an overlay network, containing all the services as configured in the docker-compose.yml file. Then, to ensure the services start in the correct sequence, execute sh ./service_update.sh. This will restart each service in the correct order, with pauses between services to allow time for startup; a bit of a hack, but effective.

Collection and Routing Examples

Below is a diagram showing all the components comprising this post's examples, including the protocols and ports on which they communicate. Next, we will look at three variations of self-hosted log collection and routing options for ELK.

Logging Diagram

Example 1: Fluentd

The first example of log aggregation and visualization uses Fluentd, a Cloud Native Computing Foundation (CNCF) hosted project. Fluentd is described as ‘an open source data collector for unified logging layer.’ A container running Fluentd with a custom configuration runs globally on each Worker Node where the applications are deployed, in this case, the hello-fluentd Docker service. Here is the custom Fluentd configuration file (fluent.conf):
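In simplified form, the configuration looks similar to the following sketch. It assumes a forward input on TCP port 24224, the fluent-plugin-concat filter used for the multiline example later in this section, and the fluent-plugin-elasticsearch output plugin; the elk host name, match patterns, and regular expression are illustrative assumptions, not the exact contents of the custom image's configuration.

# Listen for log entries forwarded by the Docker Fluentd logging driver
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# Recombine multiline Java stack traces into a single event
# (requires fluent-plugin-concat; the regular expression is illustrative)
<filter **>
  @type concat
  key log
  multiline_start_regexp /^\d{4}-\d{2}-\d{2}/
</filter>

# Send events directly to Elasticsearch, bypassing Logstash
# (requires fluent-plugin-elasticsearch; the host name is an assumption)
<match **>
  @type elasticsearch
  host elk
  port 9200
  logstash_format true
</match>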

The Hello-World service is configured through the Docker Compose file to use the Fluentd Docker logging driver. The log entries from the Hello-World containers on the Worker Nodes are diverted from being output to JSON files, using the default JSON file logging driver, to the Fluentd container instance on the same host as the Hello-World container. The Fluentd container is listening for TCP traffic on port 24224.

Fluentd then sends the individual log entries to Elasticsearch directly, bypassing Logstash. Fluentd log entries are sent via HTTP to port 9200, Elasticsearch’s JSON interface.

Logging Diagram Fluentd

Using Fluentd as a transport method, log entries appear as JSON documents in ELK, as shown below. This Elasticsearch JSON document is an example of a single-line log entry. Note that the primary container identifier field, when using Fluentd, is container_id. This field will vary depending on the Docker driver and log collector, as seen in the next two logging examples.

fluentd-log.png

The next example shows a Fluentd multiline log entry. Using the Fluentd Concat filter plugin (fluent-plugin-concat), the individual lines of a stack trace from a Java runtime exception, thrown by the hello-fluentd Docker service, have been recombined into a single Elasticsearch JSON document.

fluentd-multiline

In the above log entries, note the DEPLOY_ENV and SERVICE_NAME fields. These values were injected into the Docker Compose file, as environment variables, during deployment of the Hello-World service. The Fluentd Docker logging driver applies these as env options, as shown in the example Docker Compose snippet below.
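A representative snippet follows; the Fluentd address, the environment-variable values, and the latest image tag are assumptions, not the exact contents of the repository's compose file.

version: "3"
services:
  hello-fluentd:
    image: garystafford/hello-world:latest
    environment:
      - DEPLOY_ENV=${DEPLOY_ENV}          # injected at deployment time
      - SERVICE_NAME=hello-fluentd
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224  # Fluentd container on the same host (assumption)
        env: "DEPLOY_ENV,SERVICE_NAME"    # surfaces these variables in each log entry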

Example 2: Logspout

The second example of log aggregation and visualization uses GliderLabs’ Logspout. Logspout is described by GliderLabs as ‘a log router for Docker containers that runs inside Docker. It attaches to all containers on a host, then routes their logs wherever you want. It also has an extensible module system.’ In the post’s example, a container running Logspout with a custom configuration runs globally on each Worker Node where the applications are deployed, identical to Fluentd.

The hello-logspout Docker service is configured through the Docker Compose file to use the default JSON file logging driver. According to Docker, 'by default, Docker captures the standard output (and standard error) of all your containers and writes them in files using the JSON format. The JSON format annotates each line with its origin (stdout or stderr) and its timestamp. Each log file contains information about only one container.'

Normally, it is not necessary to explicitly set the default Docker logging driver to JSON files. However, in this case, Docker CE for AWS automatically configured each Swarm Node's Docker daemon to use the Amazon CloudWatch Logs (awslogs) logging driver as its default. The default logging driver may be seen by running the docker info command while attached to the Docker daemon.
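To override that default for this example, the JSON file driver can be set explicitly in the service definition, roughly as follows; the log-rotation options are assumptions.

version: "3"
services:
  hello-logspout:
    image: garystafford/hello-world:latest
    logging:
      driver: json-file
      options:
        max-size: "10m"    # optional rotation settings; values are assumptions
        max-file: "3"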

The hello-logspout Docker service containers on the Worker Nodes send log entries to individual JSON files. The Logspout container on each host then retrieves and routes those JSON log entries to Logstash, within the ELK container running on Worker Node 3, over UDP to port 5000. Logstash, which is explicitly listening for JSON via UDP on port 5000, then outputs those log entries to Elasticsearch, via HTTP to port 9200, Elasticsearch's JSON interface.
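Within the ELK container, the matching Logstash pipeline might look similar to the following minimal sketch; the actual filters and index settings used in the custom-elk image are not shown and are assumptions.

input {
  udp {
    port  => 5000
    codec => json    # Logspout sends JSON-formatted log entries
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]    # Elasticsearch runs inside the same container
  }
}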

Logging Diagram Logspout

Using Logspout as a transport method, log entries appear as JSON documents in ELK, as shown below. Note the field differences between the Fluentd log entry above and this entry. There are a number of significant variations, making it difficult to use both methods across the same distributed application. For example, the main body of the log entry is contained in the message field with Logspout, but in the log field with Fluentd. The name of the Docker container, which serves as the primary means of identifying the container instance, is the docker.name field with Logspout, but container_name with Fluentd.

Another helpful field, provided by Logspout, is the docker.image field. This is beneficial when associating code issues with a particular code release. In this example, the Hello-World service uses the latest Docker image tag, which is not considered best practice. However, in a real production environment, the Docker tag often represents the incremental build number from the CI/CD system, which is tied to a specific build of the code.

logspout-log

The other challenge I have had with Logspout is passing the env and tag options, such as DEPLOY_ENV and SERVICE_NAME, as seen previously with the Fluentd example. Note they are blank in the above sample. It is possible, but not as straightforward as with Fluentd, and requires interacting directly with the Docker daemon on each Worker Node.

Example 3: Graylog Extended Format (GELF)

The third and final example of log aggregation and visualization uses the Docker Graylog Extended Format (GELF) logging driver. According to the GELF website, ‘the Graylog Extended Log Format (GELF) is a log format that avoids the shortcomings of classic plain syslog.’ These syslog shortcomings include a maximum length of 1024 bytes, no data types, multiple dialects making parsing difficult, and no compression.

The GELF format, designed to work with the Graylog Open Source Log Management Server, works equally well with the ELK Stack. With the GELF logging driver, there is no intermediary logging collector and router, as there is with Fluentd and Logspout. The hello-gelf Docker service is configured through its Docker Compose file to use the GELF logging driver. The two hello-gelf Docker service containers on the Worker Nodes send log entries directly to Logstash, running within the ELK container on Worker Node 3, via UDP to port 12201.

Logstash, which is explicitly listening for UDP traffic on port 12201, then outputs those log entries to Elasticsearch, via HTTP to port 9200, Elasticsearch’s JSON interface.
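On the receiving side, a minimal sketch of a matching Logstash input, using the gelf input plugin, would look like the following; the actual pipeline in the custom-elk image may differ. The output to Elasticsearch is the same as shown in the Logspout example.

input {
  gelf {
    port => 12201    # listens for GELF messages over UDP by default
  }
}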

Logging Diagram GELF

Using the Docker Graylog Extended Format (GELF) logging driver as a transport method, log entries appear as JSON documents in ELK, as shown below. They are the most verbose of the three formats.

gelf-log

Again, note the field differences between the Fluentd and Logspout log entries above, and this GELF entry. Both the field names of the main body of the log entry and the name of the Docker container are different from both previous examples.

Another bonus with GELF is that each entry contains the command field, which stores the command used to start the container's process. This can be helpful when troubleshooting application startup issues. Often, the exact container startup command was injected into the Docker Compose file at deploy time by the CI server and contains variables, as is the case with the Hello-World service. Reviewing the log entry in Kibana for the command is much easier and safer than logging into the container and inspecting the running process for the startup command.

Unlike Logspout, and similar to Fluentd, note the DEPLOY_ENV and SERVICE_NAME fields are present in the GELF entry. These were injected into the Docker Compose file as environment variables during deployment of the Hello-World service. The GELF Docker logging driver applies these as env options. With GELF, the entry also receives the optional tag, which was passed in the Docker Compose file's service definition as tag: docker.{{.Name}}.
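Putting those options together, a representative hello-gelf service definition might look similar to the following sketch; the gelf-address host is a placeholder for a DNS-resolvable name or IP address of the node running the ELK container, and the image tag and variable values are assumptions.

version: "3"
services:
  hello-gelf:
    image: garystafford/hello-world:latest
    environment:
      - DEPLOY_ENV=${DEPLOY_ENV}
      - SERVICE_NAME=hello-gelf
    logging:
      driver: gelf
      options:
        gelf-address: udp://elk.example.com:12201   # placeholder hostname or IP
        tag: "docker.{{.Name}}"
        env: "DEPLOY_ENV,SERVICE_NAME"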

Unlike Fluentd, GELF and Logspout do not easily handle multiline logs. Below is an example of a multiline Java runtime exception thrown by the hello-gelf Docker service. The stack trace is not recombined into a single JSON document in Elasticsearch, like in the Fluentd example. The stack trace exists as multiple JSON documents, making troubleshooting much more difficult. Logspout entries will look similar to GELF.

gelf-multiline

Pros and Cons

In my opinion, and based on my level of experience with each of these self-hosted logging collection and routing options, the following are some of their pros and cons.

Fluentd

  • Pros
    • Part of the CNCF, Fluentd is becoming the de facto logging standard for cloud-native applications
    • Easily extensible via a large number of plugins
    • Easily containerized
    • Ability to easily handle multiline log entries (i.e., Java stack traces)
    • Ability to use the Fluentd container’s service name as the Fluentd address, not an IP address or DNS resolvable hostname
  • Cons
    • Using Docker’s Fluentd logging driver, if the Fluentd container is not available on the container’s host, the container attempting to log to Fluentd will fail to start (major con!)

Logspout

  • Pros
    • Doesn’t require a change to the default Docker JSON file logging driver, logs are still viewable via docker logs command (big plus!)
    • Easy to add and remove functionality via Golang modules
    • Easily containerized
  • Cons
    • Inability to easily handle multiline log entries (i.e., Java stack traces)
    • Logspout containers must be restarted if ELK is restarted, in order to restore logging
    • To reach Logstash, Logspout must use a DNS resolvable hostname or IP address, not the name of the ELK container on the same overlay network (big con!)

GELF

  • Pros
    • Application containers using the Docker GELF logging driver will not fail if the downstream Logstash container is unavailable
    • The Docker GELF logging driver allows compression of logs for shipment to Logstash
  • Cons
    • Inability to easily handle multiline log entries (i.e., Java stack traces)

Conclusion

Of course, there are other self-hosted logging collection and routing options, including Elastic’s Beats, journald, and various syslog servers. Each has its pros and cons, depending on your project’s needs. After building and maintaining several self-hosted, mission-critical log aggregation and visualization solutions, it is easy to see the appeal of an off-the-shelf, cloud-hosted SaaS solution such as Splunk, or a Cloud provider solution such as Application Insights.

All opinions in this post are my own and not necessarily the views of my current employer or their clients.



Streaming Docker Logs to Elastic Stack (ELK) using Fluentd

Kibana

Introduction

Fluentd and Docker’s native logging driver for Fluentd make it easy to stream Docker logs from multiple running containers to the Elastic Stack. In this post, we will use Fluentd to stream Docker logs from multiple instances of a Dockerized Spring Boot RESTful service and MongoDB to the Elastic Stack (ELK).

log_message_flow_notype

In a recent post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, we built a Consul cluster using Docker swarm mode, to host distributed configurations for a Spring Boot service. We will use the resulting swarm cluster from the previous post as a foundation for this post.

Fluentd

According to the Fluentd website, Fluentd is described as an open source data collector, which unifies data collection and consumption for a better use and understanding of data. Fluentd combines all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations. Fluentd structures data as JSON as much as possible.

Logging Drivers

Docker includes multiple logging mechanisms to get logs from running containers and services. These mechanisms are called logging drivers. Fluentd is one of the ten current Docker logging drivers. According to Docker, ‘the fluentd logging driver sends container logs to the Fluentd collector as structured log data. Then, users can utilize any of the various output plugins of Fluentd to write these logs to various destinations.’
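As a minimal illustration, not taken from this post’s code, a single container can be pointed at a local Fluentd collector directly from the command line; the address and tag values below are assumptions.

docker run --rm \
  --log-driver fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag="docker.hello" \
  alpine echo "Hello, Fluentd!"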

Elastic Stack

The ELK Stack, now known as the Elastic Stack, is the combination of Elastic’s very popular products: Elasticsearch, Logstash, and Kibana. According to Elastic, the Elastic Stack provides real-time insights from almost any type of structured and unstructured data source.

Setup

All code for this post has been tested on both MacOS and Linux. For this post, I am provisioning and deploying to a Linux workstation, running the most recent release of Fedora and Oracle VirtualBox. If you want to use AWS or another infrastructure provider instead of VirtualBox to build your swarm, it is fairly easy to switch the Docker Machine driver and change a few configuration items in the vms_create.sh script (see Provisioning, below).
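As a rough illustration, switching infrastructure providers largely comes down to changing the docker-machine create call the script makes; the amazonec2 flags shown below are illustrative only, not the script’s actual contents.

# VirtualBox driver, as used in this post
docker-machine create --driver virtualbox worker1

# Roughly equivalent AWS EC2 driver (requires AWS credentials in the environment)
docker-machine create --driver amazonec2 --amazonec2-region us-east-1 worker1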

Required Software

If you want to follow along with this post, you will need the latest versions of git, Docker, Docker Machine, Docker Compose, and VirtualBox installed.

Source Code

All source code for this post is located in two GitHub repositories. The first repository contains scripts to provision the VMs, create an overlay network and persistent host-mounted volumes, build the Docker swarm, and deploy Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack. The second repository contains scripts to deploy two instances of the Widget Spring Boot RESTful service and a single instance of MongoDB. You can execute all scripts manually, from the command-line, or from a CI/CD pipeline, using tools such as Jenkins.

Provisioning the Swarm

To start, clone the first repository, and execute the single run_all.sh script, or execute the seven individual scripts necessary to provision the VMs, create the overlay network and host volumes, build the swarm, and deploy Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack. Follow the steps below to complete this part.

When the scripts have completed, the resulting swarm should be configured similarly to the diagram below. Consul, Registrator, Swarm Visualizer, Fluentd, and the Elastic Stack containers should be distributed across the three swarm manager nodes and the three swarm worker nodes (VirtualBox VMs).

swarm_fluentd_diagram

Deploying the Application

Next, clone the second repository, and execute the single run_all.sh script, or execute the four scripts necessary to deploy the Widget Spring Boot RESTful service and a single instance of MongoDB. Follow the steps below to complete this part.

When the scripts have completed, the Widget service and MongoDB containers should be distributed across two of the three swarm worker nodes (VirtualBox VMs).

swarm_fluentd_diagram_b

To confirm the final state of the swarm and the running container stacks, use the following Docker commands.
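For example, commands similar to the following can be used; output is not shown, and exact stack names come from the repositories’ scripts.

docker node ls       # confirm the status of the swarm manager and worker nodes
docker stack ls      # list the deployed container stacks
docker service ls    # list the running services and their replica counts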

Open the Swarm Visualizer web UI, using any of the swarm manager node IPs, on port 5001, to confirm the swarm’s health, as well as the locations of the running containers.

Visualizer

Lastly, open the Consul Web UI, using any of the swarm manager node IPs, on port 8500, to confirm the running containers’ health, as well as their placement on the swarm nodes.

Consul_1

Streaming Logs

Elastic Stack

If you read the previous post, Distributed Service Configuration with Consul, Spring Cloud, and Docker, you will notice we deployed a few additional components this time. First, the Elastic Stack (aka ELK), is deployed to the worker3 swarm worker node, within a single container. I have increased the CPU count and RAM assigned to this VM, to minimally run the Elastic Stack. If you review the docker-compose.yml file, you will note I am using Sébastien Pujadas’ sebp/elk:latest Docker base image from Docker Hub to provision the Elastic Stack. At the time of the post, this was based on the 5.3.0 version of ELK.

Docker Logging Driver

The Widget stack’s docker-compose.yml file has been modified since the last post. The compose file now incorporates a Fluentd logging configuration section for each service. The logging configuration includes the address of the Fluentd instance, on the same swarm worker node. The logging configuration also includes a tag for each log message.
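A sketch of such a logging section follows; the Fluentd address and tag value are assumptions, and the real compose file in the repository may differ.

version: "3"
services:
  widget:
    image: <widget-service-image>    # placeholder; see the repository for the actual image
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224    # Fluentd container on the same swarm node
        tag: "docker.widget"                # the tag value is an assumption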

Fluentd

In addition to the Elastic Stack, we have deployed Fluentd to the worker1 and worker2 swarm nodes. This is also where the Widget and MongoDB containers are deployed. Again, looking at the docker-compose.yml file, you will note we are using a custom Fluentd Docker image, garystafford/custom-fluentd:latest, which I created. The custom image is available on Docker Hub.

The custom Fluentd Docker image is based on Fluentd’s official onbuild Docker image, fluent/fluentd:onbuild. Fluentd provides instructions for building your own custom images, from their onbuild base images.

There were two reasons I chose to create a custom Fluentd Docker image. First, I added Uken Games’ Fluentd Elasticsearch Plugin (fluent-plugin-elasticsearch) to the Docker image. This highly configurable Fluentd output plugin allows us to push Docker logs, processed by Fluentd, to Elasticsearch. Adding additional plugins is a common reason for creating a custom Fluentd Docker image.

The second reason to create a custom Fluentd Docker image was configuration. Instead of bind-mounting host directories or volumes to the multiple Fluentd containers to provide Fluentd’s configuration, I baked the configuration file into the immutable Docker image. The bare-bones, basic Fluentd configuration file defines three processes: Input, Filter, and Output. These processes are accomplished using Fluentd plugins. Fluentd has six types of plugins: Input, Parser, Filter, Output, Formatter, and Buffer. Fluentd is primarily written in Ruby, and its plugins are Ruby gems.

Fluentd listens for input on tcp port 24224, using the forward Input Plugin. Docker logs are streamed locally on each swarm node, from the Widget and MongoDB containers to the local Fluentd container, over tcp port 24224, using Docker’s fluentd logging driver, introduced earlier.

Fluentd then filters all input using the stdout Filter Plugin. This plugin prints events to stdout, or logs if launched with daemon mode. This is the most basic method of filtering.

Lastly, Fluentd outputs the filtered input to two destinations, a local log file and Elasticsearch. First, the Docker logs are sent to a local Fluentd log file. This is only for demonstration purposes and debugging. Outputting log files is not recommended for production, nor does it meet the 12-factor application recommendations for logging. Second, Fluentd outputs the Docker logs to Elasticsearch, over tcp port 9200, using the Fluentd Elasticsearch Plugin, introduced above.
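Putting those three processes together, the baked-in configuration might look similar to the following minimal sketch; the file path and Elasticsearch host name are assumptions, and the actual fluent.conf in the custom image may differ.

# Listen for log entries forwarded by the Docker Fluentd logging driver
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# Print all events to stdout (the most basic form of filtering)
<filter **>
  @type stdout
</filter>

# Copy each event to both a local log file and Elasticsearch
<match **>
  @type copy
  <store>
    @type file
    path /fluentd/log/docker.log    # local log file, for demonstration and debugging only
  </store>
  <store>
    @type elasticsearch
    host elk                        # host name is an assumption
    port 9200
    logstash_format true
  </store>
</match>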

log_message_flow

Additional Metadata

In addition to the log message itself, in JSON format, the fluentd log driver sends the following metadata in the structured log message: container_id, container_name, and source. This is helpful in identifying and categorizing log messages from multiple sources. Below is a sample of log messages from the raw Fluentd log file, with the metadata tags highlighted in yellow. At the bottom of the output is a log message parsed with jq, for better readability.

fluentd_logs

Using Elastic Stack

Now that our two Docker stacks are up and running on our swarm, we should be streaming logs to Elasticsearch. To confirm this, open the Kibana web console, which should be available at the IP address of the worker3 swarm worker node, on port 5601.

Kibana

For the sake of this demonstration, I increased the verbosity of the Spring Boot Widget service’s log level, from INFO to DEBUG, in Consul. At this level of logging, the two Widget services and the single MongoDB instance were generating an average of 250-400 log messages every 30 seconds, according to Kibana.

If that seems like a lot, keep in mind that these are Docker logs, which are single-line log entries. We have not aggregated multi-line messages, such as Java exceptions and stack traces, into single entries. That is for another post. Also, the volume of debug-level log messages generated by the communications between the individual services and Consul is fairly verbose.

Kibana_3

Inspecting log entries in Kibana, we find the metadata tags contained in the raw Fluentd log output are now searchable fields: container_id, container_name, and source, as well as log. Also, note the _type field, with a value of ‘fluentd’. We injected this field in the output section of our Fluentd configuration, using the Fluentd Elasticsearch Plugin. The _type field allows us to differentiate these log entries from other potential data sources.

Kibana_2.png


All opinions in this post are my own and not necessarily the views of my current employer or their clients.

