Managing Kafka, Docker and Kubernetes

Amir M. Mir edited this page Mar 30, 2022 · 7 revisions

This page describes essential and useful commands for Kafka, Docker and Kubernetes, which we have been using in the FASTEN project.

Kafka

Setting up

The Confluent Kafka distribution v5.4 is set up at /opt/kafka/bin. The following commands assume that you have added this path to your $PATH. To do that, run:

export PATH=$PATH:/opt/kafka/bin

In the default FASTEN installation, Kafka is served by a dedicated cluster running both Kafka (port 9092) and Zookeeper (port 2181). The servers are named delft, samos and goteborg. You can point the following commands to any of them.

Listing all the Kafka topics

kafka-topics --list --zookeeper samos:2181

Creating a Kafka topic

kafka-topics --create --zookeeper samos:2181 --replication-factor 2 --partitions 9 --topic fasten.test

This creates a Kafka topic named fasten.test with 9 partitions and a replication factor of 2. If you are a FASTEN developer, the topic name should start with the prefix fasten. (including the dot).

Things to consider

  1. The number of partitions dictates how many parallel consumers can read from the topic. For topics that need high parallelism downstream (e.g., call graph generation), consider at least 30 partitions.

  2. As we run a 3-node cluster, consider making the number of partitions divisible by 3 to make full use of the cluster resources.

  3. With too many partitions, Kafka spends too much time rebalancing consumer groups when a processing node fails. To avoid cascading failures, consider increasing max.poll.interval.ms in your consumer group configuration, and do not create more partitions than the downstream parallelism requires.

  4. Avoid replication for ephemeral topics, i.e., one-off topics, debug topics etc.
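As a small illustration of point 2, the sketch below rounds a desired partition count up to the next multiple of the broker count. The function name round_up_partitions is hypothetical, and the default of 3 brokers is taken from the cluster described on this page.

```shell
# Round a desired partition count up to the next multiple of the broker count.
# $1: desired number of partitions; $2: number of brokers (assumed default: 3).
round_up_partitions() {
  desired=$1
  brokers=${2:-3}
  echo $(( (desired + brokers - 1) / brokers * brokers ))
}

# Usage sketch (not run here): create a topic with the rounded count.
# kafka-topics --create --zookeeper samos:2181 --replication-factor 2 \
#   --partitions "$(round_up_partitions 20)" --topic fasten.test
```

With 3 brokers, a request for 20 partitions becomes 21, while 30 stays 30.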

Deleting a Kafka topic

kafka-topics --zookeeper samos:2181 --delete --topic topic-name

Replace topic-name with the name of the Kafka topic you want to delete.

Info of a Kafka topic

kafka-topics --describe --zookeeper samos:2181 --topic topic-name

Replace topic-name with the name of the Kafka topic whose number of partitions, leaders, etc. you want to inspect.

Altering the number of partitions of a Kafka topic

kafka-topics --alter --zookeeper samos:2181 --topic fasten.test --partitions 20

This increases the number of partitions of the topic fasten.test to 20. Change the topic name and the number of partitions as needed. Note that the added partitions start out empty; existing records are not redistributed.

To spread the existing records evenly over the partitions, move them to a new topic as follows:

  1. Write the Kafka records to a temporary file, keyed by groupId:artifactId:version:

kafkacat -b samos:9092/kafka -C -t topic-name -o beginning -e -q | jq -r '(.groupId + ":" + .artifactId + ":" + .version + "|") + (. | tostring)' > file-name.txt

  2. Delete the previous topic and create a new one with the desired number of partitions.

  3. Read the file from step 1 and produce its records to the newly created topic:

kafka-console-producer --broker-list samos:9092 --topic topic-name --property 'parse.key=true' --property 'key.separator=|' < file-name.txt
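Because step 2 deletes the original topic, it may be worth checking the file from step 1 before going further. The sketch below counts the lines that do not have a non-empty key in front of the first | separator, which is the layout the console producer is told to expect above. The function name count_unkeyed_lines is hypothetical.

```shell
# Count lines in a dump file that lack a non-empty key before the first '|'.
# $1: path to the dump file. Prints the number of offending lines (0 = all good).
count_unkeyed_lines() {
  awk -F '|' 'NF < 2 || $1 == "" { bad++ } END { print bad + 0 }' "$1"
}

# Usage sketch (not run here): only continue with step 2 if this prints 0.
# count_unkeyed_lines file-name.txt
```

A non-zero result suggests that some records were not dumped as key|value lines and should be inspected before the topic is deleted.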

Listing Kafka consumer groups

kafka-consumer-groups --list --bootstrap-server samos:9092

Checking the status of a consumer group

kafka-consumer-groups --bootstrap-server samos:9092 --describe --group my-consumer-group 

Replace the my-consumer-group with the name of your consumer group.

Resetting the offsets of a consumer group

Resetting to a particular offset value

First, we need to find the latest message that was successfully processed. We do so by inspecting the latest message in each partition of the output topic. As per FASTEN requirements, each output message contains portions of the input message.

kafkacat -C -q -b samos:9092 -t <output_topic> -o -1 -e -f '%p %o %T %s\n'|sort -r -k3|tail -n 1

We can then serially scan the topic whose offsets we want to reset for a unique characteristic of that latest output message.

kafkacat -C -q -b samos:9092 -t input_topic -f '%p %o %T %s\n' |grep ...

This gives us the partition, offset and timestamp of the message we want to reset our offsets to:

75 12626 1601626238839 {"input":{"date":1481642191, ....

The third value is a UTC-based ms-resolution timestamp. This needs to be converted to the following format: yyyy-MM-ddTHH:mm:ss.xxx
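One possible way to do the conversion is sketched below. It assumes GNU date (the same date -d "@..." form used elsewhere on this page); the function name ms_to_kafka_datetime is hypothetical.

```shell
# Convert a millisecond epoch timestamp to yyyy-MM-ddTHH:mm:ss.xxx in UTC.
# $1: timestamp in milliseconds, e.g. the third column of the kafkacat output.
ms_to_kafka_datetime() {
  ms=$1
  secs=$((ms / 1000))      # whole seconds since the epoch
  millis=$((ms % 1000))    # remaining milliseconds
  printf '%s.%03d\n' "$(date -u -d "@${secs}" +'%Y-%m-%dT%H:%M:%S')" "$millis"
}

ms_to_kafka_datetime 1601626238839   # prints 2020-10-02T08:10:38.839
```

The example value is the timestamp from the sample output above.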

Then we can run the following on our consumer group:

kafka-consumer-groups --bootstrap-server samos:9092 --group <consumer_group> --reset-offsets --all-topics --to-datetime <ts>

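Without --execute, the command only prints the offsets it would set. For extra confidence, check that the reported partition and offset match the ones identified by grepping above, then rerun the command with --execute to apply the reset.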

Resetting offsets to the beginning

kafka-consumer-groups --bootstrap-server samos:9092  --reset-offsets --to-earliest --group my-consumer-group --topic my-topic --execute

Replace the my-consumer-group and my-topic with the name of your consumer group and your topic, respectively.

Viewing the records of a Kafka topic

kafka-console-consumer --bootstrap-server samos:9092 --from-beginning --topic topic-name

Replace topic-name with the name of the topic whose records you want to view.

Getting the number of records in a Kafka topic

kafka-run-class kafka.tools.GetOffsetShell --broker-list samos:9092 --time -1 --offsets 1 --topic topic-name  | awk -F ':' '{sum += $3} END {print sum}'

Replace topic-name with the name of the topic whose records you want to count.
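The command above sums the latest offset of every partition, which equals the number of records only as long as no records have been removed (for example by retention). As a hedged refinement, the sketch below factors the summing into a helper so that the earliest offsets (--time -2) can be subtracted from the latest ones (--time -1). The function name sum_offsets is hypothetical, and the input is assumed to have the topic:partition:offset layout that the awk expression above relies on.

```shell
# Sum the third ':'-separated field (the offset) of GetOffsetShell output on stdin.
# printf "%.0f" keeps large sums in plain integer notation across awk variants.
sum_offsets() {
  awk -F ':' '{ sum += $3 } END { printf "%.0f\n", sum }'
}

# Usage sketch (not run here): latest minus earliest offsets.
# latest=$(kafka-run-class kafka.tools.GetOffsetShell --broker-list samos:9092 --time -1 --topic topic-name | sum_offsets)
# earliest=$(kafka-run-class kafka.tools.GetOffsetShell --broker-list samos:9092 --time -2 --topic topic-name | sum_offsets)
# echo $(( latest - earliest ))
```

For compacted topics the result is still only an upper bound, since offsets of compacted records are not reused.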

Getting the date of the last message in a Kafka topic

date +'%Y-%m-%d %H:%M:%S' -d "@$(expr $(kafkacat -C -q -b samos:9092 -t $topic -p 0 -o -1 -e -f '%T') / 1000)"

Set the shell variable topic to the name of the topic you want to get the info about. The command above reads the last message in partition 0; change the -p flag to inspect another partition. The command works in most cases, but it can throw an error if the topic is empty and in a couple of other cases.
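A slightly more defensive variant is sketched below. It takes one millisecond timestamp per line on stdin, picks the newest one, and prints n/a instead of failing when there is no input, as would happen for an empty topic. The function name last_message_date is hypothetical, GNU date is assumed, and the date is printed in UTC (unlike the command above, which uses the local time zone).

```shell
# Print the date of the newest timestamp read from stdin (one ms epoch per line).
# Prints "n/a" and returns 1 when stdin is empty.
last_message_date() {
  newest=$(sort -n | tail -n 1)
  if [ -z "$newest" ]; then
    echo "n/a"
    return 1
  fi
  date -u -d "@$((newest / 1000))" +'%Y-%m-%d %H:%M:%S'
}

# Usage sketch (not run here): omit -p to read the last message of every partition.
# kafkacat -C -q -b samos:9092 -t "$topic" -o -1 -e -f '%T\n' | last_message_date
```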

Overview of FASTEN-related Kafka topics

You can obtain a high-level overview of FASTEN-related Kafka topics by running the following script.

#!/usr/bin/env bash

echo topic,partitions,records,latest
for topic in $(kafka-topics --list --zookeeper samos:2181|grep fasten); do

  partitions=$(kafka-topics --describe --zookeeper samos:2181 --topic $topic| grep Replicas|wc -l)

  records=$(kafka-run-class kafka.tools.GetOffsetShell --broker-list samos:9092 --time -1 --offsets 1 --topic $topic  | awk -F ':' '{sum += $3} END {print sum}')
  
  latest=$(date +'%Y-%m-%d %H:%M:%S' -d "@$(expr $(kafkacat -C -q -b samos:9092 -t $topic -p 0 -o -1 -e -f '%T') / 1000)")

  echo $topic,$partitions,$records,$latest
done 2>/dev/null

Docker

Listing Docker images

docker images

Listing Docker containers

docker ps -a

The -a flag also lists stopped containers; omit it to list only the running ones.

Building a Docker image

docker build -t image-name -f docker-file .

Replace image-name with your desired name and docker-file with the path of your Dockerfile.

Starting a Docker container

docker run -d image-name

Replace image-name with the name of the Docker image that you want to run.

Stopping a Docker container

docker stop container-id

Replace the container-id with the container ID of the Docker container you want to stop.

Tagging a Docker image

This command is useful when you want to publish your Docker image on Dockerhub.

docker tag image-tag yourhubusername/image-name

Replace image-tag with the name (and tag) of your local Docker image, and yourhubusername with your Dockerhub username. Use docker images to find the name and tag of the local image.

Pushing a Docker image

If you want to publish your docker image on Dockerhub, use the following command:

docker push yourhubusername/image-name

Replace the yourhubusername and image-name with your username on Dockerhub and the name of your Docker image, respectively. Note that you have to tag your docker image before pushing it to the Dockerhub.
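The tag and push steps can be combined, roughly as in the sketch below. The helper names image_ref and publish_image are hypothetical, and the optional version tag defaulting to latest is an assumption for illustration; a prior docker login is assumed as well.

```shell
# Build a Dockerhub reference of the form user/name:tag (tag defaults to "latest").
image_ref() {
  echo "$1/$2:${3:-latest}"
}

# Tag a local image under the Dockerhub reference, then push it.
# $1: local image, $2: Dockerhub username, $3: image name, $4: optional tag.
publish_image() {
  local_image=$1
  ref=$(image_ref "$2" "$3" "$4")
  docker tag "$local_image" "$ref" && docker push "$ref"
}

# Usage sketch (not run here):
# publish_image image-tag yourhubusername image-name
```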

Removing a Docker image

docker rmi -f image-id

Replace image-id with the ID of the Docker image that you want to remove.

Viewing logs of a Docker container

docker logs container-id

Replace container-id with the container ID of your Docker container. You can also add the -f flag to the above command to follow the logs in real time.

Viewing the resource usage of a Docker container

docker stats container-id

Replace the container-id with the container ID of your Docker container.

Using Bash inside a Docker container

docker exec -it container-id /bin/bash

Replace the container-id with the container ID of your Docker container.

Kubernetes

Listing pods

kubectl get pods -n fasten

This shows pods of the fasten namespace.

Listing nodes

kubectl get nodes

Add -o wide to the above command if you want to get more info about the nodes. Also, use --show-labels to see the labels of the nodes.

Listing namespaces

kubectl get namespaces

Listing deployments

kubectl get deployments -n fasten

This shows all the deployments within the namespace of fasten.

Viewing info of a node

kubectl describe node node-name

Replace the node-name with the name of your desired node.

Making a node unschedulable

kubectl cordon node-name

Replace the node-name with the name of your desired node.

Creating a deployment

kubectl apply -f deploy-file

Replace deploy-file with your deployment manifest file. See here for an example of deployment manifest.
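As a minimal sketch of what such a manifest could look like, the function below writes one to a file. Everything in it except the fasten namespace is a hypothetical placeholder (the function name, example-deployment, the app: example label and the image), so adapt it before applying.

```shell
# Write a minimal, hypothetical Deployment manifest to the given path.
write_example_manifest() {
  cat > "$1" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment        # hypothetical name
  namespace: fasten
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example                # must match the pod template labels below
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: yourhubusername/image-name   # placeholder image
EOF
}

# Usage sketch (not run here):
# write_example_manifest deploy-file.yaml && kubectl apply -f deploy-file.yaml
```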

Deleting a deployment

kubectl delete deployment deploy-name

Replace the deploy-name with the name of your deployment.

Scaling a deployment

kubectl scale deployment deploy-name --replicas=10

Replace the deploy-name with the name of your deployment. Note that you can change the value of --replicas to either increase or decrease the number of pods of your deployment.

Deleting a pod

kubectl delete pod pod-name

Replace the pod-name with the name of the pod that you want to delete.

Deleting pods from a specified node

kubectl drain node-name

Replace node-name with the name of your desired node. This marks the node as unschedulable and evicts the user pods running on it.

Cleaning evicted pods

kubectl -n fasten delete pods --field-selector=status.phase!=Running

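This deletes every pod in the fasten namespace whose phase is not Running. That covers evicted pods, but also pods that are still Pending or have already Succeeded, so check the list of pods first if those need to be kept.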
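To target evicted pods only, one option is to filter the pod listing on its STATUS column, as sketched below. The function name evicted_pod_names is hypothetical, and the sketch assumes the default kubectl get pods column layout (NAME, READY, STATUS, ...), in which evicted pods show the status Evicted.

```shell
# Print the names of pods whose STATUS column (third field) is "Evicted".
# Reads `kubectl get pods --no-headers` style output from stdin.
evicted_pod_names() {
  awk '$3 == "Evicted" { print $1 }'
}

# Usage sketch (not run here; xargs -r is a GNU extension that skips empty input):
# kubectl get pods -n fasten --no-headers | evicted_pod_names \
#   | xargs -r kubectl delete pod -n fasten
```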

Viewing logs of a pod

kubectl logs pod-name

Replace the pod-name with the name of your pod. Also, add -f to the command to see the logs of a pod in real-time.

Viewing detailed description of the resources of a pod

kubectl describe pod pod-name

Replace the pod-name with the name of your pod.

Viewing resource usage of nodes

kubectl top nodes

Using Bash inside a pod

kubectl exec --stdin --tty pod-name -- /bin/bash

Replace the pod-name with the name of your pod.
