Guest post originally published on InfraCloud’s blog by Suramya Shah

Monitoring endpoints is an important aspect of system observability for diagnosing performance and availability issues. In this article, we will cover in detail how to achieve endpoint monitoring in Kubernetes using Blackbox Exporter and Prometheus.

What is a Prometheus Exporter?

Prometheus exporter is a translator that can fetch statistics from a non-prometheus system and can convert the statistics to Prometheus understandable metrics, i.e. the Prometheus exposition format. There are a number of Prometheus exporters that are used to export existing metrics from third-party systems to Prometheus metrics, some of them are:

Exporter NameFunction
Blackbox ExporterProbe endpoints over HTTP/S, DNS, TCP, and ICMP
Redis ExporterConnects to Redis instance and provides Redis metrics in Prometheus readable format
Node ExporterExposes hardware and OS metrics for *NIX kernels
Elasticsearch ExporterConnects to Elasticsearch instance and provides various Elasticsearch metrics in Prometheus readable format
Kube-state Metrics Exporteradd-on agent which provides metrics about various Kubernetes objects, such as pods, nodes, and deployments

Other exporters can be found in Exporters and integrations docs page.

What is Endpoint monitoring, why is it needed?

In the current context, endpoint monitoring refers to monitoring internal and external endpoints (HTTP/S, DNS, TCP, and ICMP) for various parameters including HTTP latencies, DNS lookup latencies, SSL certificate expiry information, TLS version.

In a Kubernetes system, not just the external endpoints that need to be monitored, internal endpoints are also required to be monitored for latency and other parameters. These metrics are an important piece of the infrastructure to ensure continuity of service and compliance with some security certifications.

WhiteBox vs BlackBox monitoring

Whitebox monitoring refers to monitoring the internals of the system including application logs, metrics from handlers. Blackbox monitoring on the other hand includes monitoring the behavior from outside that affects users like server down, page not working, or degradation of site performance.

What is Blackbox Exporter?

Blackbox Exporter is used to probe endpoints like HTTPS, HTTP, TCP, DNS, and ICMP. After you define the endpoint, Blackbox Exporter generates metrics that can be visualized using tools like Grafana. One of the most important feature of Blackbox Exporter is measuring the response time of endpoints.

The following diagram shows the flow of Blackbox Exporter monitoring an endpoint.

Blackbox exporter flow diagram

Here is a default module defined in the Blackbox Exporter config:

modules:
  http_2xx:
    http:
      fail_if_not_ssl: true
      ip_protocol_fallback: false
      method: GET
      no_follow_redirects: false
      preferred_ip_protocol: ip4
      valid_http_versions:
        - HTTP/1.1
        - HTTP/2.0
      valid_status_codes:
        - 200
        - 204
    prober: http
    timeout: 15s

The above module is http_2xx. It works on HTTP Probe offered by Blackbox Exporter. Here we have also added valid_status_codes for the probe to return success for the endpoints returning various status codes. You can accordingly configure your blackbox.yml to make the probe return success/failure based on your configurations. Other configuration parameters can be found below:

ParameterFunction
valid_status_codes:List of status codes for the probe to return success for your applications
method: GETUse of HTTP GET to access the endpoint
no_follow_redirects: falseDo not follow HTTP redirects
preferred_ip_protocol: ip4Use IPv4 protocol
valid_http_versionsBoth HTTP/1.1 and HTTP/2.0 are valid
timeout: 15sTimeout after 15s if no response received
fail_if_not_ssl: trueThe probe will fail if the endpoint is not SSL secured

You can have a look at the detailed example for more scenarios in this example.yml. With some config changes on the Prometheus side, the Blackbox Exporter then sends metrics relevant to the configs applied, we will see this in more detail in the coming sections.

Why do we need Blackbox Exporter?

There are various tools available for monitoring endpoints like Datadog, Freshping, Uptime.com, etc. In a production infrastructure with multiple services and endpoints, the regular endpoint monitoring solutions monetarily cost us hefty amounts even for small probe checks, Blackbox Exporter in this case is an open-source alternative to available solutions and is maintained by the Prometheus community.

One point to be noted is that most exporters accept static configurations and expose metrics, Blackbox Exporter works a little differently. Inside the config, you define modules, then Prometheus can query each of the modules for a set of targets. As a response to that query, Blackbox Exporter generates metrics for the queried endpoint. This means we do not have to manually change the endpoints, Prometheus and Blackbox Exporter takes care of generating the endpoints dynamically with the help of Prometheus’ kubernetes_sd_configs functionality.

In small deployments and infrastructure, a static list is easy to maintain, it is very easy to forget about updating that list once you have multiple clusters and environments, maintenance of such monitoring solutions becomes very cumbersome.

Installing and Configuring Blackbox Exporter in Kubernetes

How do you install Blackbox Exporter?

We will be using the prometheus-community/prometheus-blackbox-exporter Helm chart to install Blackbox Exporter, it can be found in prometheus-blackbox-exporter.

You can add the modules for blackbox.yml in the values.yml config section:

config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        follow_redirects: true
        preferred_ip_protocol: "ip4"
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus-blackbox prometheus-community/prometheus-blackbox-exporter -f values.yaml

How to configure Prometheus for Blackbox Exporter?

Prometheus is a prerequisite for Blackbox Exporter, we will be using Prometheus Operator to install Prometheus. You can install/know more about Prometheus Operator from InfraCloud’s prometheus-operator-helm-guide.

We will be editing the prometheus.yml by adding the configurations. If you have installed the Prometheus Operator using kube-prometheus-stack chart, then you can add the configuration under additionalScrapeConfigs[] in values.yml.

Note: Prometheus Operator also has a Probe Custom Resource, which can be used to configure Prometheus with Blackbox Exporter. At the time of writing this article, it only supports dynamic discovery for Ingress resource.

We will majorly be adding configs for the following in Prometheus for our endpoint monitoring.

In the kube-prometheus-stack’s values.yaml, add the following blocks under additionalScrapeConfigs[] section:

1. Add Prometheus config to probe external targets

We can probe certain static targets from Prometheus with the help of Blackbox Exporter using static_configs.

- job_name: 'blackbox-external-targets'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
      - https://www.google.com
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: prometheus-blackbox-prometheus-blackbox-exporter:9115

The __param_target label tells Prometheus to set the target query param to the given value, which is the target’s address, in this case, i.e. google.com.

In Kubernetes, external targets can be used in some scenarios to test the third-party service’s performance, in checking some performance tools latency issues.

But in a Kubernetes system where resources and endpoints come and go over time, the probing which can be highly useful is the dynamic probing of resources including pods, services, and ingress.

Using Kubernetes service discovery configs in Prometheus, we can achieve the dynamic probing of endpoints. Kubernetes service discovery configurations allow fetching scrape targets from Kubernetes’ API and always stays synchronized with the cluster state. You can find the list of available roles that can be configured to discover targets in the kubernetes_sd_config section of the documentation.

2. Add Prometheus config to probe services

    - job_name: "blackbox-kubernetes-services"
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      # Example relabel to probe only some services that have "example.io/should_be_probed = true" annotation
      #  - source_labels: [__meta_kubernetes_service_annotation_example_io_should_be_probed]
      #    action: keep
      #    regex: true      
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement:  prometheus-blackbox-prometheus-blackbox-exporter:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_service_name

Here we can use [__meta_kubernetes_service_annotation_example_io_should_be_probed] to only check those services that have the annotation example.io/should_be_probed = true.

3. Add Prometheus config to probe ingress

    - job_name: "blackbox-kubernetes-ingresses"
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      # Example relabel to probe only some ingresses that have "example.io/should_be_probed = true" annotation
      #  - source_labels: [__meta_kubernetes_ingress_annotation_example_io_should_be_probed]
      #    action: keep
      #    regex: true
        - source_labels:
            [
              __meta_kubernetes_ingress_scheme,
              __address__,
              __meta_kubernetes_ingress_path,
            ]
          regex: (.+);(.+);(.+)
          replacement: ${1}://${2}${3}
          target_label: __param_target
        - target_label: __address__
          replacement: prometheus-blackbox-prometheus-blackbox-exporter:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_ingress_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_ingress_name]
          target_label: ingress_name

4. Add Prometheus config to probe pods

    - job_name: "blackbox-kubernetes-pods"
      metrics_path: /probe    
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: pod        
      relabel_configs:
      # Example relabel to scrape only pods that have
      # "example.io/should_be_scraped = true" annotation.
      #  - source_labels: [__meta_kubernetes_pod_annotation_example_io_should_be_scraped]
      #    action: keep
      #    regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement:  prometheus-blackbox-prometheus-blackbox-exporter:9115
        - source_labels: [__param_target]
          replacement: ${1}/health
          target_label: instance          
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          target_label: kubernetes_pod_name     

Considering our application which exposes health of the application at /health endpoint, we have used replacement directive for source_label.

Verify the generated metrics in Prometheus

Verify targets in Prometheus

Once the changes are applied and the resources for the Blackbox Exporter are deployed, we can verify the status of targets in Prometheus. We can check whether the Blackbox Exporter is up with the registered targets by navigating to the Status tab and then selecting Targets in the Prometheus UI.

Here you can see we are using https://www.google.com as an external target for reference with its state UP. We can also check if metrics are getting populated by looking for metrics starting with probe_

Prometheus probe metrics

Here you can see the list of some of the generated probe_ metrics:

Metric nameFunction
probe_duration_secondsReturns how long the probe took to complete in seconds
probe_http_status_codeResponse HTTP status code
probe_http_versionReturns the version of HTTP of the probe response
probe_successDisplays whether or not the probe was a success
probe_dns_lookup_time_secondsReturns the time taken for probe DNS lookup in seconds
probe_ip_protocolSpecifies whether probe ip protocol is IP4 or IP6
probe_ssl_earliest_cert_expiry metricReturns earliest SSL cert expiry in unixtime
probe_tls_version_infoContains the TLS version used
probe_failed_due_to_regexIndicates if probe failed due to regex
probe_http_content_lengthLength of HTTP content response
probe_http_versionReturns the version of HTTP of the probe response

Monitoring configured endpoints using Grafana

We can now use the generated metrics with Grafana to create our custom dashboards. However, there are some already available dashboards on Grafana that can be imported to visualize data from the generated metrics, some of them are:

How did Blackbox Exporter reveal our infrastructure discrepancies?

In this section we will see how we used Blackbox Exporter to probe external targets and some other scenarios where it can be used to probe ingress,pods and services:

External target

We were facing increased latencies in our application performance, and we were not sure which endpoint is causing the problem. We monitored the response time of all endpoints which were being used in the application flow, and with the help of BlackBox Exporter we found our latency causing endpoints.

Let us assume a scenario where we have two Kubernetes services A and B. Service A calls another service B, service B then calls some external endpoints for response, processes it, and sends it back to service A. The external endpoints can be user facing endpoints, third party services, or database endpoints. For reference, we are monitoring google.com as an external endpoint here.

External target example in Grafana for Blackbox Exporter

In the above dashboard example, we can see that we can now monitor the website performance by measuring its response time using probe_http_duration_seconds metrics generated by the Blackbox Exporter, and look for the spike in the external targets that caused the latency in services A and B.

Ingress

Ingress example in Grafana for Blackbox Exporter

In this scenario we will be focusing on the problem regarding certificate expiry, if we want to monitor when our domain certificate is going to expire, we can achieve this by using probe_ssl_earliest_cert_expiry metric generated by Blackbox Exporter for our ingress resources. We can also use it to monitor if DNS resolution is working, or if there is any latency/issues from the loadbalancer side.

Pods

Pods example in Grafana for Blackbox Exporter

We can probe pods and create health dashboard for our applications, in the above example we can see we are probing the pod on /health endpoint by using probe_http_status_code metrics generated by Blackbox Exporter.

Service

Service example in Grafana for Blackbox Exporter

We can also monitor if services are properly configured and are responding to probe checks.

Benefits of Blackbox Exporter

In the above examples, we can see how we can use Blackbox Exporter for some common scenarios in a Kubernetes cluster. Overall, we can say Blackbox Exporter can be used in the following scenarios:

Conclusion

In this article, we covered the following points:

This post introduced you to Blackbox Exporter for endpoint monitoring of a Kubernetes cluster for infrastructure reliability. I hope you found this post informative and engaging. For more posts like this one, do subscribe to our weekly newsletter. I’d love to hear your thoughts on this post, so do start a conversation on LinkedIn :).

References and further reading