GSOC 17: Create and implement a data model to standardize Kubernetes logs

September 5, 2017 | Blog

The Google Summer of Code (GSOC) program 2017 has come to an end and we followed up with CNCF’s seven interns previously featured in this blog post to check in on how their summer project progressed.

Over the summer, UIET CSJM University student Amit Kumar Jaiswal worked with Red Hat mentor Miguel Perez Colino on "Create and Implement a Data Model to Standardize Kubernetes Logs," a project aimed at building and implementing a data model for logs in a large Kubernetes cluster that can be processed, correlated, and queried, making troubleshooting easier and reducing the time needed to find root causes.

 Read on to learn the results of this project from Amit and Miguel.

Our GSoC project aims to standardize Kubernetes logs so the data contained in them can be easily used. Logging is one of the major challenges of any large distributed deployment on platforms such as Kubernetes, and configuring and maintaining a central repository for log collection becomes a necessity for day-to-day operations as well as for debugging microservices. To that end, the combination of Elasticsearch, Fluentd, and Kibana can form a dynamic logging layer on top of Kubernetes clusters.

Many Kubernetes deployments require a fully containerized toolset to aggregate logs, process them, and make them available to users and administrators. In this blog we will show the efforts being made in project ViaQ to provide a fully containerized Elasticsearch, Fluentd, and Kibana, as well as the work being done under the CNCF umbrella, sponsored by Google Summer of Code, to improve the processing of Kubernetes infrastructure logs.

We will describe how to log Kubernetes using dedicated Fluentd, Elasticsearch and Kibana nodes. To do this, we will define our own pods so that when our cluster is created, the standard output and standard error output for each container is ingested into Elasticsearch and Kibana using a Fluentd agent.

Collecting logs with Fluentd

Although some describe Fluentd as a 'syslogd that understands JSON', it does more than just collect and redirect logs. Fluentd is a project under the CNCF umbrella and is capable of ingesting, tagging, and pre-processing logs. Some notable features are:

  • Easy installation by rpm/deb/gem
  • Small footprint with 3000 lines of Ruby
  • Semi-Structured data logging
  • Easy start with small configuration
  • Fully pluggable architecture, and plugin distribution by Ruby gems

We will be using containerized Fluentd to gather all of the logs that are stored within individual nodes in our Kubernetes cluster (these logs can be found under the /var/log/containers directory in the cluster) and also to ingest the logs coming from journald.
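As a minimal sketch (not the full ViaQ configuration), a Fluentd source for these container logs could look like the following; the path, pos_file, and tag values are illustrative:

```
<source>
  @type tail                                    # follow the files as they grow
  path /var/log/containers/*.log                # per-container symlinks created by Kubernetes
  pos_file /var/log/fluentd-containers.log.pos  # remember the read position across restarts
  tag kubernetes.*
  format json                                   # Docker's json-file log driver writes JSON lines
</source>
```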

Installing and configuring Kubernetes

There are many Kubernetes distributions on the market. For this project we chose OpenShift Origin 1.5 as the distribution. We also analyzed the CoreOS distribution (which likewise uses journald to gather logs).

ViaQ is a thin wrapper on top of https://github.com/openshift/origin-aggregated-logging

For this, ViaQ mainly provides the common data model used in Elasticsearch and Kibana: https://github.com/ViaQ/elasticsearch-templates/

To get up and running quickly with the OpenShift logging stack, you can use either the OpenShift Container Platform (OCP) based on RHEL7, or OpenShift Origin (Origin) based on CentOS7. Ansible installs the logging stack using the OpenShift Ansible logging roles.

Prerequisites:

  • Run a VM with CentOS 7.3 or later
  • At least 4GB RAM for the VM
  • OpenShift installed, so the ViaQ wrapper can be deployed

Walk-through to get the ViaQ Common Model up and running:

Check machine provisioning: I used a machine that had already been provisioned.

Assign the machine an FQDN and IP address so that it can be reached from another machine; these are the public_hostname and public_ip.

root@ith-ThinkPad-W520:/home/ith# hostname

ith-ThinkPad-W520

root@ith-ThinkPad-W520:/home/ith# hostname -f

ith-ThinkPad-W520

root@ith-ThinkPad-W520:/home/ith# ip a # the externally accessible IP address

root@ith-ThinkPad-W520:/home/ith# getent ahostsv4 `hostname`

127.0.1.1    STREAM ith-ThinkPad-W520

127.0.1.1    DGRAM  

127.0.1.1    RAW    

root@ith-ThinkPad-W520:/home/ith# getent hosts 127.0.0.1

127.0.0.1    localhost

Check SSH Configuration

root@ith-ThinkPad-W520:/home/ith# ll .ssh/

root@ith-ThinkPad-W520:/home/ith# ssh-keygen -q -N "" -f /root/.ssh/id_rsa

Add the SSH public key to the user account's $HOME/.ssh/authorized_keys (so that Ansible does not prompt when connecting to the local address):

  • cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Add the SSH host key for localhost to your SSH known_hosts (so that SSH does not prompt to verify the host):

  • ssh-keyscan -H localhost >> $HOME/.ssh/known_hosts

Add the ssh hostkey for public_hostname to your SSH known_hosts

  • ssh-keyscan -H public_hostname >> $HOME/.ssh/known_hosts

Check your known hosts and authorized keys:
root@ith-ThinkPad-W520:/home/ith# more .ssh/known_hosts

root@ith-ThinkPad-W520:/home/ith# more .ssh/authorized_keys
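The key steps above can be sketched as a short script; the temporary directory stands in for /root/.ssh and is purely illustrative:

```shell
# Sketch of the SSH key setup, using a throwaway directory in place of /root/.ssh.
SSHDIR=$(mktemp -d)
ssh-keygen -q -N "" -f "$SSHDIR/id_rsa"                 # generate a passwordless key pair
cat "$SSHDIR/id_rsa.pub" >> "$SSHDIR/authorized_keys"   # authorize our own key
chmod 600 "$SSHDIR/authorized_keys"                     # sshd refuses group/world-writable files
```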

This step is only needed if not using root: enable passwordless sudo, e.g. in the sudoers config:

  • $USER ALL=(ALL) NOPASSWD:ALL

Setup ViaQ yum repository

ViaQ on Origin requires these Yum repos. You will also need to install the following packages: docker and iptables-services.

Grab the raw link of the Yum repo file above and save it under /etc/yum.repos.d/:

Check the OS release:

root@ith-ThinkPad-W520:/home/ith# more /etc/redhat-release

root@ith-ThinkPad-W520:/home/ith# curl -s https://raw.githubusercontent.com/ViaQ/Main/master/centos7-viaq.repo >> /etc/yum.repos.d/centos7-viaq.repo

Install OpenShift Ansible

Ansible is used to install ViaQ and OCP or Origin using OpenShift Ansible. The following packages are required: openshift-ansible, openshift-ansible-callback-plugins, openshift-ansible-filter-plugins, openshift-ansible-lookup-plugins, openshift-ansible-playbooks, and openshift-ansible-roles.

# yum install openshift-ansible \
 openshift-ansible-callback-plugins openshift-ansible-filter-plugins \
 openshift-ansible-lookup-plugins openshift-ansible-playbooks \
 openshift-ansible-roles -y  # Using yum

After installing the Ansible packages, we need to patch OpenShift Ansible.

Go to the openshift-ansible directory:

# cd /usr/share/ansible/openshift-ansible

Patch OpenShift Ansible

# curl https://raw.githubusercontent.com/ViaQ/Main/master/0001-Compatibility-updates-to-openshift_logging-role-for-.patch > 0001-Compatibility-updates-to-openshift_logging-role-for-.patch

# patch -p1 -b < 0001-Compatibility-updates-to-openshift_logging-role-for-.patch

Grab the Ansible inventory and environment files:

Download the files vars.yaml.template and ansible-inventory-origin-36-aio

# curl https://raw.githubusercontent.com/ViaQ/Main/master/vars.yaml.template > vars.yaml.template
# curl https://raw.githubusercontent.com/ViaQ/Main/master/ansible-inventory-origin-36-aio > ansible-inventory

# more vars.yaml.template

Check Ansible Environment

You can use ansible to check some of the values, to see which values you need to edit. For example, use

root@ith-ThinkPad-W520:/home/ith# ansible -m setup localhost -a 'filter=ansible_fqdn'

to see if ansible correctly reports your host’s FQDN, the public_hostname value from above. If so, then you do not have to edit openshift_public_hostname below. Use

root@ith-ThinkPad-W520:/home/ith# ansible -m setup localhost -a 'filter=ansible_default_ipv4'

to see if ansible correctly reports your IP address in the “address” field, which should be the same as the public_ip value from above. If so, then you do not have to edit openshift_public_ip. You can also verify which IP address is used for external use by using the following command:

$ ip -4 route get 8.8.8.8
8.8.8.8 via 10.0.0.1 dev enp0s25 src 10.10.10.10 uid 1000
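If you want that address in a variable, the src field can be parsed out of the same output; the sample line below is the one shown above:

```shell
# Extract the source address (the "src" field) from `ip -4 route get` output.
route_line='8.8.8.8 via 10.0.0.1 dev enp0s25 src 10.10.10.10 uid 1000'
public_ip=$(echo "$route_line" | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i + 1)}')
echo "$public_ip"   # 10.10.10.10
```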

Once you have your inventory file, create vars.yaml from the template:

root@ith-ThinkPad-W520:/home/ith# cp vars.yaml.template vars.yaml

Run Ansible

# cd /usr/share/ansible/openshift-ansible
# (or wherever you cloned the git repo if using git)
# ANSIBLE_LOG_PATH=/tmp/ansible.log ansible-playbook -vvv -e @/path/to/vars.yaml -i /path/to/ansible-inventory playbooks/byo/config.yml 2>&1 | tee output

where /path/to/vars.yaml is the full path and filename where you saved your vars.yaml file, and /path/to/ansible-inventory is the full path and filename where you saved your ansible-inventory file.

Check /tmp/ansible.log if there are any errors during the run. If this hangs, just kill it and run it again – Ansible is (mostly) idempotent. Same applies if there are any errors during the run – fix the machine and/or the vars.yaml and run it again.

After the installation completes, you'll be able to check the logging components: Elasticsearch, Fluentd, and Kibana.

Check Logging Components

[root@viaq openshift-ansible]# oc project logging

Already on project "logging" on server "https://viaq.logging.test:8443".

To check the pods, run the command below, which returns a complete list of pods that are currently defined:

[root@viaq openshift-ansible]# oc get pods

You should see the Elasticsearch, Curator, Kibana, Fluentd, and mux pods running.

The following command returns a complete list of currently defined services for Elasticsearch and Kibana:

[root@viaq openshift-ansible]# oc get svc

To see the routes for Elasticsearch and Kibana, run the command below:

[root@viaq openshift-ansible]# oc get routes

Deploy Logging Components

[root@viaq openshift-ansible]# oc logs -f logging-es-7vo926zw-2-phjrq|more

Check complete logs of all logging components here.

Also check available memory with the command below:

# free -h

Beyond basic pod information, the oc describe operation can be used to return detailed information about a specific pod:

[root@viaq openshift-ansible]# oc describe pod logging-curator-1-hklth

To check the Docker images used by the different pods:

[root@viaq openshift-ansible]# docker images|grep logging

Test Elasticsearch

To search Elasticsearch, first get the name of the Elasticsearch pod:

# oc project logging
# espod=`oc get pods -l component=es -o jsonpath='{.items[0].metadata.name}'`
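If jsonpath gives you trouble, the pod name can also be picked out of plain `oc get pods` output; the sample output below is hypothetical:

```shell
# Pick the first Elasticsearch pod name from (hypothetical) `oc get pods` output.
pods='logging-es-7vo926zw-2-phjrq    1/1   Running   0   5m
logging-fluentd-abcde          1/1   Running   0   5m
logging-kibana-1-fghij         2/2   Running   0   5m'
espod=$(echo "$pods" | awk '/^logging-es-/ {print $1; exit}')
echo "$espod"   # logging-es-7vo926zw-2-phjrq
```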

Running Kibana

You will first need to create an OpenShift user and assign this user rights to view the application and operations logs. The install above uses the AllowAllPasswordIdentityProvider which makes it easy to create test users like this:

# oc project logging

Now using project "logging" on server "https://viaq.logging.test:8443".

# oc login --username=admin --password=admin
# oc login --username=system:admin

Authentication required for https://viaq.logging.test:8443 (openshift)

# oadm policy add-cluster-role-to-user cluster-admin admin

cluster role "cluster-admin" added: "admin"

For complete log, check here.

Now you can use the admin username and password to access Kibana. Just point your web browser at https://kibana.logging.test where the logging.test part is whatever you specified in the openshift_master_default_subdomain parameter in the vars.yaml file.
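The URL follows directly from that parameter; for example, with the illustrative subdomain used throughout this post:

```shell
# Build the Kibana URL from openshift_master_default_subdomain (value is illustrative).
openshift_master_default_subdomain='logging.test'
kibana_url="https://kibana.${openshift_master_default_subdomain}"
echo "$kibana_url"   # https://kibana.logging.test
```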

Installing and configuring Fluentd

First of all, we will need to install Fluentd (for instructions, use these installation guides).

Next, we will install the ViaQ fluent-plugin-kubernetes_metadata_filter, which works with the symlinks Kubernetes creates to the Docker log files in /var/log/containers/*.log. (If you installed ViaQ earlier, there is no need to install this plugin again.)

Now, to view the logs from the entire cluster, we have to launch a single instance of the Fluentd agent on each node. Here is the Fluentd configuration file, which provides the path to the logging directory: Docker collects logs from the pods and sends them to journald; Fluentd then filters these logs and detects the container name.

Modifying the Fluentd config

This information assumes a VM running the OpenShift logging stack with ViaQ (EFK):

All the logs from journald get sent to Fluentd. Fluentd is throttled and may lose logs during a log storm, but under normal circumstances all logs get processed.

The OpenShift Origin master and node logs can be seen, in a root shell of the VM, with

# journalctl -u origin-master
# journalctl -u origin-node

The configuration of the Fluentd pod can be modified via its ConfigMap.

Check available ConfigMaps:

# oc get cm

# oc edit configmap logging-fluentd

To add new filters, go to the "fluent.conf" section and, inside the "<label @INGRESS>" section where the filter-pre-*.conf includes are, add a new include file:

@include configs.d/user/filter-my-filter.conf

Note: it has to be in "configs.d/user/", which is where the ConfigMap drops the files.

Then we can add a new section for the new file

filter-my-filter.conf: |
<filter systemd.origin>
 @type rewrite_tag_filter
 rewriterule1 _SYSTEMD_UNIT ^origin-master.service journal.system.kubernetes.master
 rewriterule2 _SYSTEMD_UNIT ^origin-node.service journal.system.kubernetes.node
</filter>
# do special processing on origin master logs
<filter journal.system.kubernetes.master>
 @type record_transformer
 enable_ruby
 <record>
   my_master_field ${record['MESSAGE'].match(/some regex/)[0]}
   …
 </record>
</filter>

We need to delete the Fluentd pods after each change to refresh the configuration:

# oc delete pods $FLUENTD_POD_NAME

If you have multiple pods, unlabel/relabel the nodes to affect multiple fluentds:

# oc label node -l logging-infra-fluentd=true logging-infra-fluentd=false
…. wait for fluentd pods to stop ….
# oc label node -l logging-infra-fluentd=false logging-infra-fluentd=true

To check whether Fluentd is running well:

# oc logs $FLUENTD_POD_NAME

 

To check my fluentd filter, click here.

You may watch the video here.

The repository of this GSoC project is here.