Updated January 30, 2018
Fluentd is an open source data collector for building the Unified Logging Layer, which unifies the data collection and consumption for a better use and understanding of data. Once installed on a server, it runs in the background to collect, parse, transform, analyze and store various types of data.
Fluentd was conceived by Sadayuki “Sada” Furuhashi in 2011. Sada is a co-founder of Treasure Data, Inc., a primary sponsor of the Fluentd project. In November 2016, the Cloud Native Computing Foundation (CNCF) Technical Oversight Committee (TOC) voted to accept Fluentd as the fourth hosted project after Kubernetes, Prometheus and OpenTracing.
Fluentd was created to solve log/data collection and distribution needs at scale, offering a comprehensive and reliable service to be implemented in conjunction with microservices and generic cloud monitoring tools. With 700+ plugins connecting to its many data sources and data outputs, Fluentd was the 2016 Bossie Awards winner for the best open source datacenter and cloud software.
Fluentd decouples data sources from backend systems by providing a Unified Logging Layer in between.
This layer allows developers and data analysts to utilize many types of logs as they are generated. Just as importantly, it mitigates the risk of “bad data” slowing down and misinforming your organization.
A unified logging layer lets you and your organization make better use of data and iterate more quickly on your software.
5,000+ data-driven companies rely on Fluentd to differentiate their products and services through better use and understanding of their log data. According to a survey by Datadog, Fluentd is the 7th most used technology running in Docker container environments. Some Fluentd users collect data from thousands of machines in real time. Thanks to its small memory footprint (30 to 40 MB per instance), the memory savings add up at scale. Adopter community companies include Atlassian, LINE, Microsoft, Nintendo, Google Cloud Platform, Docker, Kubernetes, GREE, and many others.
Fluentd was created to solve logging problems as a whole, not only for standalone applications but also for distributed architectures where each running application and system has its own way of handling logging. It streamlines integration between all components and provides the ability to move data from one place to another securely and reliably.
The Unified Logging Layer’s key goal is to connect various sources of log data to various destination systems (NoSQL databases, HDFS, RDBMSs, etc.). The first requirement of a Unified Logging Layer is to define an interface that all log producers and consumers implement against. As such, it is important to choose an interface that has ubiquitous support.
The Unified Logging Layer must provide reliable and scalable data transport. If all log data is to go through the Unified Logging Layer, it must be able to filter, buffer and route incoming data robustly. The logging layer needs to scale horizontally and support retryable data transfer. It should also provide an easy mechanism for adding new data inputs and outputs without a large impact on performance, and it must anticipate network failures and not lose data when one occurs.
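The buffering and retry requirement can be illustrated with a minimal Python sketch. This is not Fluentd's implementation: the class name `BufferedForwarder`, its parameters, and the use of `ConnectionError` to signal a network failure are all assumptions made for illustration. The idea it shows is that a record leaves the buffer only after a successful send, so a network failure delays delivery rather than losing data.

```python
import time
from collections import deque


class BufferedForwarder:
    """Hypothetical in-memory buffer that retries failed sends with backoff."""

    def __init__(self, send, max_retries=3, base_delay=0.01, sleep=time.sleep):
        self.send = send                # callable delivering one record; may raise
        self.max_retries = max_retries  # retries after the first attempt
        self.base_delay = base_delay    # seconds; doubled after each failure
        self.sleep = sleep              # injectable so tests need not wait
        self.buffer = deque()

    def emit(self, record):
        """Accept a record from a producer and hold it until it is delivered."""
        self.buffer.append(record)

    def flush(self):
        """Deliver buffered records in order; return how many were delivered."""
        delivered = 0
        while self.buffer:
            record = self.buffer[0]     # peek: remove only after success
            for attempt in range(self.max_retries + 1):
                try:
                    self.send(record)
                    break
                except ConnectionError:
                    if attempt == self.max_retries:
                        # Give up for now; the record stays in the buffer.
                        return delivered
                    self.sleep(self.base_delay * (2 ** attempt))
            self.buffer.popleft()
            delivered += 1
        return delivered
```

A caller would pass any function that delivers a record as `send`, call `emit` for each incoming record, and call `flush` periodically. A production-grade layer would typically also persist the buffer to disk so that data survives a process restart, which this sketch does not attempt.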
The Unified Logging Layer also must be able to support new data inputs (e.g., new web services, new sensors, new middleware) and outputs (new storage servers, databases, API endpoints) with little technical difficulty. To achieve this goal, the Unified Logging Layer should have a pluggable architecture into which new data inputs and outputs can be “plugged.” Once a new data input is plugged in, no additional work should be required to send that data to all existing data outputs and vice versa.
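As a rough sketch of what "pluggable" means here, the following Python example models a logging layer in which inputs and outputs register independently. The class `LoggingLayer`, its method names, and the event shape are hypothetical and do not reflect Fluentd's plugin API. The point is that a newly registered input reaches every existing output with no extra wiring, and a newly registered output receives events from every existing input.

```python
class LoggingLayer:
    """Hypothetical registry connecting any input to every output."""

    def __init__(self):
        self.outputs = {}  # output name -> callable taking one event

    def register_output(self, name, handler):
        """Plug in a new destination (e.g. a database or storage writer)."""
        self.outputs[name] = handler

    def register_input(self, name):
        """Plug in a new source; return the function it uses to emit records."""
        def emit(record):
            event = {"source": name, "record": record}
            # Outputs are looked up at emit time, so destinations added
            # later also receive events from this input.
            for handler in self.outputs.values():
                handler(event)
        return emit
```

With this shape, adding a source is one `register_input` call and adding a destination is one `register_output` call; neither side needs to know about the other.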
In the open source world, the two most popular data collectors are Logstash and Fluentd. Logstash is best known for being part of the ELK Stack, while Fluentd is increasingly used by communities of users of software such as Docker, GCP, and Elasticsearch. While there are several differences, the similarities between Logstash and Fluentd outweigh them. Users of either Logstash or Fluentd are miles ahead of the curve when it comes to log management.
Some of the minor differences are listed below:
Fluentd is one of the data inputs or outputs for Kafka, and Kafka is one of the data inputs or outputs for Fluentd. Kafka is primarily concerned with holding log data rather than moving it. Thus, Kafka producers need to write the code to put data into Kafka, and Kafka consumers need to write the code to pull data out of Kafka.
For educational, technical and case study presentations about Fluentd, check out:
As a CNCF hosted project, Fluentd is part of a neutral community aligned with technical interests to help companies move to cloud native deployment models and help developers deliver on the promise of microservices and cloud native applications at scale. As Fluentd grows, CNCF is helping build its community and assists with its marketing and documentation efforts. For more, read “Fluentd joins CNCF.”
Fluentd is an incubation-level project under the CNCF Graduation Criteria v1.0. The CNCF Graduation Criteria, defined by the TOC, assign every CNCF project a maturity level of inception, incubating or graduated, which allows CNCF to review projects at different maturity levels to advance the development of cloud native technology and services. As an incubating project, Fluentd must have documentation that it is being used successfully in production by at least three independent end users, have a healthy number of committers, and demonstrate a substantial ongoing flow of commits and merged contributions.
No. On the contrary, Fluentd is more than a project: it is a full ecosystem, and integration with third-party components is fundamental. The Fluentd v1.0 release also included continued investment in integrations with Prometheus (monitoring) and Apache Kafka (data streaming), among many others.