Apache Kafka Automation, Part 3: Monitoring with Open-Source Tools

In part 1 of this series, I covered how to automate Zookeeper and deploy it as a service. In part 2, I discussed how to deploy Apache Kafka on AWS in a fully automated and self-healing manner. Now let’s talk about operations! With a fully automated streaming platform in place, we need to make sure that our Kafka brokers and Zookeeper nodes are healthy and that we have enough of them in our cluster.

With Control Center, one of the Confluent Platform Enterprise components, you get a dashboard for monitoring and managing Kafka, Zookeeper, and other Confluent components. However, Control Center is not free! Fortunately, there are alternatives. In this post, I will demonstrate how you can create a monitoring dashboard using open-source tools and libraries.

Requirements

Here is a list of the tools and libraries we’ll use:

Prometheus: an open-source toolkit for system monitoring and alerting originally built at SoundCloud.
Node Exporter: collects hardware and operating system metrics using pluggable metric collectors. It gathers details like memory, disk, and CPU utilization.
Grafana: an open-source metric analytics & visualization suite. It is most commonly used for visualizing time series data for infrastructure and application analytics, but it is also used in other domains such as industrial sensors, home automation, weather, and process control.

Let’s dive into each tool and explore how to set them up.

Prometheus

To automate the installation, we can use this Chef cookbook. For simplicity, however, I will demonstrate how to install Prometheus using shell commands. Before we do that, we need an EC2 instance running Ubuntu 16.04. Create one and SSH into it. Now you are ready to install and configure Prometheus:

Installing Prometheus

From the Prometheus download page, get the latest stable version:

$ wget https://github.com/prometheus/prometheus/releases/download/v2.11.1/prometheus-2.11.1.linux-amd64.tar.gz

# compare the checksum
$ sha256sum prometheus-2.11.1.linux-amd64.tar.gz

# untar 
$ tar xvf prometheus-2.11.1.linux-amd64.tar.gz

# create prometheus user
$ sudo useradd --no-create-home --shell /bin/false prometheus

# create the necessary directories with proper ownership
$ sudo mkdir /etc/prometheus
$ sudo chown prometheus:prometheus /etc/prometheus
$ sudo mkdir /var/lib/prometheus
$ sudo chown prometheus:prometheus /var/lib/prometheus

# copy the binary files to the bin folder
$ sudo cp prometheus-2.11.1.linux-amd64/prometheus /usr/local/bin/
$ sudo cp prometheus-2.11.1.linux-amd64/promtool /usr/local/bin/

# change the ownership to prometheus user
$ sudo chown prometheus:prometheus /usr/local/bin/prometheus
$ sudo chown prometheus:prometheus /usr/local/bin/promtool

# set consoles and console_libraries
$ sudo cp -r prometheus-2.11.1.linux-amd64/consoles /etc/prometheus
$ sudo chown -R prometheus:prometheus /etc/prometheus/consoles
$ sudo cp -r prometheus-2.11.1.linux-amd64/console_libraries /etc/prometheus
$ sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries

# cleanup
$ rm -rf prometheus-2.11.1.linux-amd64*
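
The manual checksum comparison in the steps above can also be scripted. The following is only a sketch and not part of the original setup: verify_sha256 is a hypothetical helper name, and the expected value is assumed to be the SHA256 hash you copied from the Prometheus download page.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Hypothetical helper: succeeds only when the SHA-256 digest of FILE
# matches EXPECTED_HASH (the value published on the download page).
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<hash>  <filename>"; keep only the hash column
  actual="$(sha256sum "$file" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}
```

It would be called before untarring, for example: verify_sha256 prometheus-2.11.1.linux-amd64.tar.gz "<hash from the download page>" || echo "checksum mismatch". The step belongs before the cleanup command, while the archive is still on disk.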

Configuring Prometheus

Create a prometheus.yml file in the /etc/prometheus folder with this content:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

Change the ownership of the file to the newly created user prometheus:

$ sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml

Running Prometheus

Create a prometheus.service file in the /etc/systemd/system folder with this content:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

Reload, enable, and start the prometheus.service by running:

$ sudo systemctl daemon-reload
$ sudo systemctl start prometheus
$ sudo systemctl enable prometheus

Now, if you browse to the host with port 9090 (http://<ec2_ip_address>:9090), you should see the welcome/landing page.

Please note: Prometheus does not have built-in authentication; instead you can use nginx to add basic HTTP authentication.
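
As a rough illustration of the nginx approach, the sketch below prints a reverse-proxy site definition that asks for a username and password before forwarding requests to Prometheus. The helper name render_nginx_site, the listen port 80, and the /etc/nginx/.htpasswd path are all assumptions made for this example and do not come from the original setup.

```shell
# render_nginx_site [UPSTREAM]
# Hypothetical helper: prints an nginx server block that puts HTTP basic
# auth in front of Prometheus. The listen port and the htpasswd path are
# assumptions for illustration only.
render_nginx_site() {
  upstream="${1:-http://localhost:9090}"
  cat <<EOF
server {
    listen 80;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass ${upstream};
    }
}
EOF
}

render_nginx_site
```

One way to use it would be to pipe the output into a site file with sudo tee, create the password file with htpasswd (shipped in the apache2-utils package on Ubuntu), and then restrict direct access to port 9090 so that requests have to go through nginx.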

Node Exporter

To collect hardware and operating system metrics from EC2 instances such as Kafka brokers and Zookeeper nodes, we need to install Node Exporter on each of them. You can install the prometheus-node-exporter package from the apt repository manually by running:

$ sudo apt update
$ sudo apt install prometheus-node-exporter

Or, using Chef, add this to the Kafka and Zookeeper cookbooks:

package 'prometheus-node-exporter' do
  action :install
end

To verify it is running:

$ curl http://<host_ip_address>:9100/metrics

At this point, we can assume that Node Exporter is running on all Zookeeper nodes and Kafka brokers. Next, I am going to cover the Zookeeper and Kafka configuration for Prometheus and Node Exporter.

Metrics

There are two kinds of metrics we should collect from Kafka brokers:

Internal metrics: JMX-specific metrics exposed through the default reporter, although any pluggable reporter can be added. Examples include PartitionCount, UnderReplicatedPartitions, and OfflinePartitionsCount. Check here for the full list.
Node Exporter metrics: hardware and operating-system-specific metrics, such as CPU, network, and memory utilization.

Kafka Configuration

First, confirm the Node Exporter is installed and functional on port 9100.

Second, set up the JMX internal reporter:

Step 1: download the Java agent jar file from the Maven repository and copy it to the /opt/prometheus/ folder. Chef example:

# Prometheus jmx exporter
directory '/opt/prometheus'

prometheus_agent = 'https://repo1.maven.org/maven2/io/prometheus/jmx/' \
                    'jmx_prometheus_javaagent/0.6/' \
                    'jmx_prometheus_javaagent-0.6.jar'
remote_file '/opt/prometheus/jmx_prometheus_javaagent-0.6.jar' do
  source prometheus_agent
end

Step 2: create the configuration YAML file with the content from here and set KAFKA_OPTS in the Kafka environment file. This is the Chef source code:

cookbook_file '/opt/prometheus/kafka.yml' do
  source 'prometheus-kafka.yml'
end

cookbook_file '/etc/default/kafka' do
  source 'systemd/kafka_environment'
  mode '0644'
end
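
The post does not show the content of the systemd/kafka_environment file. A plausible sketch, assuming the file only needs to set KAFKA_OPTS and that it follows the same -javaagent syntax used for Kafka Connect later in this post, could look like the output of the helper below. The name render_kafka_env is hypothetical; the jar and config paths are the ones from the Chef resources above.

```shell
# render_kafka_env [PORT]
# Hypothetical helper: prints a candidate body for /etc/default/kafka.
# The jar and config paths come from the Chef resources above; the port
# defaults to 7071, the port used for the JMX /metrics endpoint in this post.
render_kafka_env() {
  port="${1:-7071}"
  agent_jar="/opt/prometheus/jmx_prometheus_javaagent-0.6.jar"
  agent_config="/opt/prometheus/kafka.yml"
  # Agent syntax: -javaagent:<jar>=<port>:<config file>
  printf 'KAFKA_OPTS=-javaagent:%s=%s:%s\n' "$agent_jar" "$port" "$agent_config"
}

render_kafka_env
```

If your real environment file sets other variables as well (heap options, log directories, and so on), they would sit alongside this line.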

Step 3: set the environment file in the /etc/systemd/system/kafka.service file:

...
EnvironmentFile=/etc/default/kafka
...

After restarting Kafka, you should be able to get JMX metrics from any broker by sending a curl request to the /metrics URL on the broker's IP address and port 7071.
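
A quick way to sanity-check the endpoint is to count the sample lines it returns. This is only a sketch: count_samples is a hypothetical helper, and it relies on the Prometheus text format, in which HELP and TYPE comment lines start with a # character.

```shell
# count_samples
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the number of sample lines (non-empty lines not starting with '#').
count_samples() {
  grep -c '^[^#]'
}
```

For example, curl -s http://<broker_ip_address>:7071/metrics | count_samples should print a number greater than zero when the agent is loaded. The same check works against the Node Exporter endpoint on port 9100.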

Zookeeper Configuration

This is very similar to the Kafka configuration in the previous section. The only difference is in the Zookeeper environment file: instead of KAFKA_OPTS, Zookeeper uses EXTRA_ARGS.

Note that I tried to follow the naming conventions for Kafka and Zookeeper. For instance, with Prometheus yaml file, here we have prometheus.yml.

Monitoring Other Components

For monitoring other Kafka/Confluent components like kafka-consumers, ksql, or kafka-connect we can follow the same pattern if the services are hosted on EC2 instances.

However, if you are a microservice enthusiast, you probably want to host these components on Elastic Container Service (ECS) or Kubernetes containers. If that is the case, here is how you can achieve that.

Kafka Connect Hosted on ECS

Download the jmx_prometheus_javaagent inside the container.
Create Prometheus config.yml and copy it inside the container.
Set the KAFKA_OPTS and JMX_PORT environment variables for the container:

...
ContainerDefinitions:
  PortMappings:
    - ContainerPort: !Ref Port
    - ContainerPort: 6001
  Environment:
    - Name: KAFKA_OPTS
      Value: -javaagent:/kafka/jmx_prometheus_javaagent.jar=7071:/kafka/config.yml
    - Name: JMX_PORT
      Value: 6001
...

Monitoring other Kafka or Confluent components such as KSQL, Schema Registry, Producers, Consumers, and so on, would be very similar to the above, depending on the host, either EC2 instances or containers.

Collecting Metrics

We now have all the /metrics endpoints. Next, we need to configure Prometheus to scrape these endpoints periodically and collect the data. To do that, edit /etc/prometheus/prometheus.yml on the Prometheus server so that it has this content:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: kafka
    scrape_interval: 5s
    static_configs:
      - targets:
          - broker-1-ip:7071
          - broker-2-ip:7071
          - broker-3-ip:7071
          - broker-1-ip:9100
          - broker-2-ip:9100
          - broker-3-ip:9100
  - job_name: zookeeper
    scrape_interval: 5s
    static_configs:
      - targets:
          - zookeeper-1-ip:7071
          - zookeeper-2-ip:7071
          - zookeeper-3-ip:7071
          - zookeeper-1-ip:9100
          - zookeeper-2-ip:9100
          - zookeeper-3-ip:9100
  - job_name: debezium
    scrape_interval: 5s
    static_configs:
      - targets:
          - debezium-host-ip:7071
  - job_name: kafka-consumer
    scrape_interval: 5s
    static_configs:
      - targets:
          - kafka-consumer-host-ip:7071

If you want to monitor more services, add a new job whose targets point at the host and port of that service's exporter.
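
Because every job above has the same shape, a new stanza can be generated rather than typed by hand. The helper below is a sketch under that assumption; render_scrape_job is a hypothetical name, and ksql-host-ip is a placeholder in the same style as the host names above.

```shell
# render_scrape_job JOB_NAME TARGET [TARGET...]
# Hypothetical helper: prints one scrape_configs entry, indented to match
# the prometheus.yml layout shown above (5s scrape interval, static targets).
render_scrape_job() {
  job_name="$1"
  shift
  printf '  - job_name: %s\n' "$job_name"
  printf '    scrape_interval: 5s\n'
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for target in "$@"; do
    printf '          - %s\n' "$target"
  done
}

# Example with a placeholder host, in the same style as the jobs above
render_scrape_job ksql ksql-host-ip:7071
```

After appending the output to /etc/prometheus/prometheus.yml, the file can be validated with promtool check config /etc/prometheus/prometheus.yml (promtool was installed next to the prometheus binary earlier), and the change picked up with sudo systemctl restart prometheus.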

Now that we can gather metrics, it’s time for visualization!

Grafana

In this section, I am going to walk through setting up a visualization for what we have done.

Step 1: install Grafana on your local machine. Or, arguably a better way, use this Docker image:

$ docker run -d --name=grafana -p 3000:3000 grafana/grafana

Step 2: navigate to localhost:3000 and log in with the default credentials (admin/admin).

Step 3: from the left menu bar, navigate to Configuration -> Data Sources, click on the Add data source button, and add Prometheus as a data source. Add your previously configured Prometheus url and set it as the default.
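
If you prefer to script this step instead of clicking through the UI, Grafana also exposes an HTTP API for data sources. The sketch below builds the JSON body for a POST to /api/datasources. The helper name datasource_payload and the data source name "Prometheus" are assumptions for illustration, and the field set is kept to the minimum I would expect to need.

```shell
# datasource_payload PROMETHEUS_URL
# Hypothetical helper: prints a JSON body that registers Prometheus as the
# default Grafana data source, for use with POST /api/datasources.
datasource_payload() {
  prometheus_url="$1"
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$prometheus_url"
}
```

With the Docker container from step 1 and the default credentials, the call could look like this: datasource_payload "http://<ec2_ip_address>:9090" | curl -s -u admin:admin -H "Content-Type: application/json" -X POST -d @- http://localhost:3000/api/datasources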

Step 4: from the left menu bar, navigate to Dashboard -> Manage, click on the Import button, copy the content of the kafka-overview.json file, and paste it in the paste JSON box. Then click Load.

Step 5: Showtime! Check your Kafka Overview dashboard:

Kafka details:

Zookeeper details:

Conclusion

As you can see, this dashboard looks very professional and is very functional. Here is the list of repositories that can help you create CI/CD pipelines for deploying the Kafka ecosystem on AWS: