Apache Kafka Automation, Part 3: Monitoring with Open-Source Tools
In part 1 of this series, I covered how to automate Zookeeper and deploy it as a service. In part 2, I discussed how we can deploy Apache Kafka on AWS in a fully automated and self-healing manner. Now let's talk about operations! With a fully automated streaming platform in place, we need to make sure that our Kafka brokers and Zookeeper nodes are healthy and that we have enough of them in our cluster.
With Control Center, one of the Confluent Platform Enterprise components, you get a dashboard for monitoring and managing Kafka, Zookeeper, and other Confluent components. However, Control Center is not free! Fortunately, there are alternatives. In this post, I will demonstrate how you can create a monitoring dashboard using open-source tools and libraries.
Here is a list of the tools and libraries we'll use:
- Prometheus: an open-source toolkit for system monitoring and alerting originally built at SoundCloud.
- Node Exporter: collects hardware and operating system metrics using pluggable metric collectors. It gathers details like memory, disk, and CPU utilization.
- Grafana: an open-source metric analytics & visualization suite. It is most commonly used for visualizing time series data for infrastructure and application analytics, but it is also used in other domains such as industrial sensors, home automation, weather, and process control.
Let's dive into each tool and explore how to set them up.
To automate the installation, we can use this Chef cookbook. For simplicity, however, I will demonstrate how to install Prometheus using shell commands. Before we do that, we need an EC2 instance running Ubuntu 16.04. Create one and SSH into it. Now you are ready to install and configure Prometheus.
From the Prometheus download page, get the latest stable version:
$ wget https://github.com/prometheus/prometheus/releases/download/v2.11.1/prometheus-2.11.1.linux-amd64.tar.gz
# compare the checksum with the one listed on the download page
$ sha256sum prometheus-2.11.1.linux-amd64.tar.gz
# untar
$ tar xvf prometheus-2.11.1.linux-amd64.tar.gz
# create prometheus user
$ sudo useradd --no-create-home --shell /bin/false prometheus
# create the necessary directories with proper ownership
$ sudo mkdir /etc/prometheus
$ sudo chown prometheus:prometheus /etc/prometheus
$ sudo mkdir /var/lib/prometheus
$ sudo chown prometheus:prometheus /var/lib/prometheus
# copy the binary files to the bin folder
$ sudo cp prometheus-2.11.1.linux-amd64/prometheus /usr/local/bin/
$ sudo cp prometheus-2.11.1.linux-amd64/promtool /usr/local/bin/
# change the ownership to prometheus user
$ sudo chown prometheus:prometheus /usr/local/bin/prometheus
$ sudo chown prometheus:prometheus /usr/local/bin/promtool
# set consoles and console_libraries
$ sudo cp -r prometheus-2.11.1.linux-amd64/consoles /etc/prometheus
$ sudo chown -R prometheus:prometheus /etc/prometheus/consoles
$ sudo cp -r prometheus-2.11.1.linux-amd64/console_libraries /etc/prometheus
$ sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
# cleanup
$ rm -rf prometheus-2.11.1.linux-amd64*
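The checksum step above is a visual comparison. As a minimal sketch, it could be scripted as follows. Note that verify_sha256 is a hypothetical helper name, and the expected digest has to be copied from the Prometheus download page (it is not reproduced here).

```shell
# Verify a file against an expected SHA-256 digest.
# Usage (digest is a placeholder, copy the real one from the download page):
#   verify_sha256 prometheus-2.11.1.linux-amd64.tar.gz <expected-hex-digest>
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum -c reads "<digest>  <filename>" lines and checks each file;
  # --quiet suppresses the "OK" line, so only failures are printed.
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c --quiet -
}
```

The function returns a non-zero exit status on a mismatch, so it can gate the rest of an install script.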
Create a prometheus.yml file in /etc/prometheus with this content:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
Change the ownership of the file to the newly created user prometheus:
$ sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
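Before starting the service, it may be worth validating the file with promtool, which was copied to /usr/local/bin earlier. A small sketch follows; the wrapper name check_prometheus_config is an assumption made for illustration, while `promtool check config` is the tool's own subcommand.

```shell
# Validate a Prometheus configuration file with promtool.
# Usage: check_prometheus_config [path-to-prometheus.yml]
check_prometheus_config() {
  config_file="${1:-/etc/prometheus/prometheus.yml}"
  # Fail early with a clear message if promtool is not installed.
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 127
  fi
  promtool check config "$config_file"
}
```

A non-zero exit status means the file should be fixed before the service is started.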
Create a prometheus.service file in /etc/systemd/system with this content:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
Reload systemd, then start and enable prometheus.service by running:
$ sudo systemctl daemon-reload
$ sudo systemctl start prometheus
$ sudo systemctl enable prometheus
Now, if you browse to the host with port 9090 (http://<ec2_ip_address>:9090), you should see the welcome/landing page.
Please note: Prometheus does not have built-in authentication; instead you can use nginx to add basic HTTP authentication.
To collect metrics from EC2 instances such as Kafka brokers and Zookeeper nodes, we need to install Node Exporter on each of them. To do that manually, install the prometheus-node-exporter package from the apt repository by running:
$ sudo apt update
$ sudo apt install prometheus-node-exporter
Or, with Chef:
package 'prometheus-node-exporter' do
  action :install
end
To verify that it is running:
$ curl http://<host-ip-address>:9100/metrics
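The /metrics response is long, so it can help to filter it down to a single metric. The helper below is a sketch, not part of Node Exporter; metric_values is a hypothetical name, and node_load1 in the usage comment is just one example of a metric Node Exporter exposes.

```shell
# Print the sample lines for exactly one metric from Prometheus text output.
# Usage:
#   curl -s http://<host-ip-address>:9100/metrics | metric_values node_load1
metric_values() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    {
      split($1, parts, "{")        # drop any {label="..."} suffix
      if (parts[1] == name) print  # exact match on the metric name
    }
  '
}
```

Matching on the exact name avoids also printing look-alikes such as node_load15 when node_load1 is requested.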
At this point, we can assume that Node Exporter is running on all Zookeeper nodes and Kafka brokers. Next, I am going to cover the Zookeeper and Kafka configuration for Prometheus and Node Exporter.
There are two kinds of metrics we should collect from Kafka brokers:
- Internal metrics: JMX-specific metrics, using the default reporter, though we can add in any pluggable reporter. Examples include PartitionCount, UnderReplicatedPartitions, and OfflinePartitionsCount. Check here for the full list.
- Node Exporter metrics: hardware- and operating-system-specific metrics, such as CPU, network, and memory utilization.
First, confirm the Node Exporter is installed and functional on port 9100.
Second, set up the JMX internal reporter:
Step 1: download the Java agent JAR file from the Maven repository and copy it into the /opt/prometheus/ folder. Chef example:
# Prometheus jmx exporter
directory '/opt/prometheus'

prometheus_agent = 'https://repo1.maven.org/maven2/io/prometheus/jmx/' \
                   'jmx_prometheus_javaagent/0.6/' \
                   'jmx_prometheus_javaagent-0.6.jar'

remote_file '/opt/prometheus/jmx_prometheus_javaagent-0.6.jar' do
  source prometheus_agent
end
Next, deploy the exporter configuration file (kafka.yml) and the Kafka environment file:
cookbook_file '/opt/prometheus/kafka.yml' do
  source 'prometheus-kafka.yml'
end

cookbook_file '/etc/default/kafka' do
  source 'systemd/kafka_environment'
  mode '0644'
end
Then reference the environment file from the Kafka systemd unit:
...
EnvironmentFile=/etc/default/kafka
...
Now you should be able to get JMX metrics from any broker by requesting /metrics on port 7071 of the broker's IP address, for example with curl.
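The contents of the kafka_environment file are not shown in this post. One plausible shape, assuming the agent is attached through KAFKA_OPTS in the exporter's `-javaagent:<jar>=<port>:<config>` form with the paths and port used above, is sketched below. The helper name render_kafka_env is hypothetical.

```shell
# Print a KAFKA_OPTS line that attaches the JMX exporter as a Java agent.
# The defaults mirror the paths and port used in this post; whether the real
# environment file looks exactly like this is an assumption.
# Usage: render_kafka_env [agent-jar] [port] [exporter-config]
render_kafka_env() {
  agent_jar="${1:-/opt/prometheus/jmx_prometheus_javaagent-0.6.jar}"
  port="${2:-7071}"
  config="${3:-/opt/prometheus/kafka.yml}"
  printf 'KAFKA_OPTS=-javaagent:%s=%s:%s\n' "$agent_jar" "$port" "$config"
}

# Show the line that would go into /etc/default/kafka.
render_kafka_env
```

The port given after the JAR path is the one on which the /metrics endpoint is served, which is why the brokers are scraped on 7071.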
The Zookeeper configuration is very similar to the Kafka configuration in the previous section. The only difference is in the Zookeeper environment file: instead of KAFKA_OPTS, Zookeeper uses EXTRA_ARGS.
Note that I tried to follow the naming conventions for Kafka and Zookeeper. For instance, with Prometheus yaml file, here we have prometheus.yml.
Monitoring Other Components
For monitoring other Kafka/Confluent components like kafka-consumers, ksql, or kafka-connect we can follow the same pattern if the services are hosted on EC2 instances.
However, if you are a microservice enthusiast, you probably want to host these components on Elastic Container Service (ECS) or Kubernetes containers. If that is the case, here is how you can achieve that.
Kafka Connect Hosted on ECS
...
ContainerDefinitions:
  PortMappings:
    - ContainerPort: !Ref Port
    - ContainerPort: 6001
  Environment:
    - Name: KAFKA_OPTS
      Value: -javaagent:/kafka/jmx_prometheus_javaagent.jar=7071:/kafka/config.yml
    - Name: JMX_PORT
      Value: 6001
...
Monitoring other Kafka or Confluent components such as KSQL, Schema Registry, Producers, Consumers, and so on, would be very similar to the above, depending on the host, either EC2 instances or containers.
We now have all the /metrics endpoints. Next, we need to configure Prometheus to scrape these endpoints periodically and collect the data. To do that, edit /etc/prometheus/prometheus.yml on the Prometheus server so that it has this content:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: kafka
    scrape_interval: 5s
    static_configs:
      - targets:
          - broker-1-ip:7071
          - broker-2-ip:7071
          - broker-3-ip:7071
          - broker-1-ip:9100
          - broker-2-ip:9100
          - broker-3-ip:9100

  - job_name: zookeeper
    scrape_interval: 5s
    static_configs:
      - targets:
          - zookeeper-1-ip:7071
          - zookeeper-2-ip:7071
          - zookeeper-3-ip:7071
          - zookeeper-1-ip:9100
          - zookeeper-2-ip:9100
          - zookeeper-3-ip:9100

  - job_name: debezium
    scrape_interval: 5s
    static_configs:
      - targets:
          - debezium-host-ip:7071

  - job_name: kafka-consumer
    scrape_interval: 5s
    static_configs:
      - targets:
          - kafka-consumer-host-ip:7071
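With this many targets, a quick reachability check before restarting Prometheus can save some debugging. The sketch below pulls every host:port pair out of the configuration file; list_targets is a hypothetical helper, and the simple pattern match assumes targets are written as plain host:port values, as they are above.

```shell
# Print every host:port pair found in a Prometheus configuration file.
# Usage: list_targets /etc/prometheus/prometheus.yml
list_targets() {
  # "scrape_interval: 15s" and similar keys do not match, because the
  # pattern requires digits immediately after the colon.
  grep -oE '[A-Za-z0-9._-]+:[0-9]+' "$1"
}

# Example loop (commented out because it needs network access):
# for target in $(list_targets /etc/prometheus/prometheus.yml); do
#   curl -fsS -o /dev/null "http://$target/metrics" \
#     && echo "$target ok" || echo "$target FAILED"
# done
```

Any target reported as FAILED is worth checking for a stopped exporter or a closed security group port.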
If you want to monitor more services, add a new job to scrape_configs whose targets point at that service's exporter endpoints.
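Since every job in this file has the same shape, a new entry can be generated rather than typed by hand. This is only a sketch: scrape_job is a hypothetical helper, and the schema-registry job name and host in the example call are placeholders.

```shell
# Print one scrape_configs entry in the same layout as the jobs above.
# Usage: scrape_job <job-name> <port> <host>...
scrape_job() {
  job_name="$1"
  port="$2"
  shift 2
  printf '  - job_name: %s\n' "$job_name"
  printf '    scrape_interval: 5s\n'
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for host in "$@"; do
    printf '          - %s:%s\n' "$host" "$port"
  done
}

# Example: a job for a single, hypothetical Schema Registry host.
scrape_job schema-registry 7071 schema-registry-host-ip
```

The output can be appended under scrape_configs in /etc/prometheus/prometheus.yml. Prometheus reads its configuration at startup, so the service needs a restart or reload afterwards.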
Now that we can gather metrics, it's time for visualization!
Step 1: run Grafana, for example as a Docker container:
$ docker run -d --name=grafana -p 3000:3000 grafana/grafana
Step 2: navigate to localhost:3000 and log in with the default credentials (admin/admin).
Step 3: from the left menu bar, navigate to Configuration -> Data Sources, click on the Add data source button, and add Prometheus as a data source. Add your previously configured Prometheus url and set it as the default.
Step 4: from the left menu bar, navigate to Dashboard -> Manage, click on the Import button, copy the content of the kafka-overview.json file, and paste it in the paste JSON box. Then click Load.
Step 5: Showtime! Check your Kafka Overview dashboard:
As you can see, this dashboard looks very professional and is very functional. Here is the list of repositories that can help you create CI/CD pipelines for deploying the Kafka ecosystem on AWS: