Skip to content
This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

Error: Error connecting to node kafka-1.kafka-svc.kafka-kraft.svc.cluster.local:9093 #3

Open
0ni0nrings opened this issue Oct 25, 2022 · 7 comments

Comments

@0ni0nrings
Copy link

Description

When deploying kakfa brokers using the guide as-is, the pods fails to stay running and terminate with CrashLoopBackOff

✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • minikube version: v1.27.1
  • kafka_version 2.8.2
  • scala_version 2.13

Reproduction Code [Required]

https://github.com/IBM/kraft-mode-kafka-on-kubernetes

Expected behavior

Once the the resources are deployed, the pods should continue to stay in running state and kafka cluster should be available for consuming and producing events.

namespace/kafka-kraft unchanged
persistentvolume/kafka-pv-volume unchanged
persistentvolumeclaim/kafka-pv-claim unchanged
service/kafka-svc unchanged
statefulset.apps/kafka unchanged

Actual behavior

NAME      READY   STATUS             RESTARTS      AGE
kafka-0   0/1     CrashLoopBackOff   6 (88s ago)   16m
kafka-1   0/1     CrashLoopBackOff   6 (57s ago)   15m
kafka-2   0/1     CrashLoopBackOff   6 (27s ago)   15m

Additional context

% kubectl -n kafka-kraft get service -o wide
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE   SELECTOR
kafka-svc   ClusterIP   None         <none>        9092/TCP   23m   app=kafka-app

% kubectl -n kafka-kraft get pods -o wide                                                
NAME      READY   STATUS             RESTARTS        AGE   IP           NODE       NOMINATED NODE   READINESS GATES
kafka-0   0/1     CrashLoopBackOff   7 (3m ago)      23m   172.17.0.3   minikube   <none>           <none>
kafka-1   0/1     CrashLoopBackOff   7 (2m25s ago)   23m   172.17.0.4   minikube   <none>           <none>
kafka-2   0/1     CrashLoopBackOff   7 (102s ago)    23m   172.17.0.5   minikube   <none>           <none>
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  24m                   default-scheduler  Successfully assigned kafka-kraft/kafka-0 to minikube
  Normal   Pulled     23m                   kubelet            Successfully pulled image "strike3test/kafka-kraft" in 37.006310552s
  Normal   Pulled     22m                   kubelet            Successfully pulled image "strike3test/kafka-kraft" in 2.632217268s
  Normal   Pulled     21m                   kubelet            Successfully pulled image "strike3test/kafka-kraft" in 14.303419081s
  Normal   Pulled     19m                   kubelet            Successfully pulled image "strike3test/kafka-kraft" in 2.642474672s
  Normal   Pulling    17m (x5 over 24m)     kubelet            Pulling image "strike3test/kafka-kraft"
  Normal   Created    17m (x5 over 23m)     kubelet            Created container kafka-container
  Normal   Pulled     17m                   kubelet            Successfully pulled image "strike3test/kafka-kraft" in 2.566483671s
  Normal   Started    17m (x5 over 23m)     kubelet            Started container kafka-container
  Warning  BackOff    3m43s (x50 over 21m)  kubelet            Back-off restarting failed container
% kubectl logs kafka-0 -n kafka-kraft
Log directory /mnt/kafka/0 is already formatted. Use --ignore-formatted to ignore this directory and format the others.
[2022-10-25 22:08:44,479] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2022-10-25 22:08:44,735] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2022-10-25 22:08:44,915] INFO [Log partition=@metadata-0, dir=/mnt/kafka/0] Recovering unflushed segment 0 (kafka.log.Log)
[2022-10-25 22:08:44,916] INFO [Log partition=@metadata-0, dir=/mnt/kafka/0] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2022-10-25 22:08:44,940] INFO [ProducerStateManager partition=@metadata-0] Writing producer snapshot at offset 14 (kafka.log.ProducerStateManager)
[2022-10-25 22:08:44,958] INFO [Log partition=@metadata-0, dir=/mnt/kafka/0] Loading producer state till offset 14 with message format version 2 (kafka.log.Log)
[2022-10-25 22:08:44,959] INFO [ProducerStateManager partition=@metadata-0] Loading producer state from snapshot file 'SnapshotFile(/mnt/kafka/0/@metadata-0/00000000000000000014.snapshot,14)' (kafka.log.ProducerStateManager)
[2022-10-25 22:08:45,009] INFO [raft-expiration-reaper]: Starting (kafka.raft.TimingWheelExpirationService$ExpiredOperationReaper)
[2022-10-25 22:08:45,159] INFO [RaftManager nodeId=0] Completed transition to ResignedState(localId=0, epoch=150, voters=[0, 1, 2], electionTimeoutMs=1286, unackedVoters=[1, 2], preferredSuccessors=[]) (org.apache.kafka.raft.QuorumState)
[2022-10-25 22:08:45,186] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2022-10-25 22:08:45,194] INFO [kafka-raft-io-thread]: Starting (kafka.raft.KafkaRaftManager$RaftIoThread)
[2022-10-25 22:08:45,194] INFO [kafka-raft-outbound-request-thread]: Starting (kafka.raft.RaftSendThread)
[2022-10-25 22:08:45,196] INFO Starting controller (kafka.server.ControllerServer)
[2022-10-25 22:08:45,342] WARN [RaftManager nodeId=0] Error connecting to node kafka-2.kafka-svc.kafka-kraft.svc.cluster.local:9093 (id: 2 rack: null) (org.apache.kafka.clients.NetworkClient)
java.net.UnknownHostException: kafka-2.kafka-svc.kafka-kraft.svc.cluster.local
	at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1519)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1378)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1306)
	at org.apache.kafka.clients.DefaultHostResolver.resolve(DefaultHostResolver.java:27)
	at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:111)
	at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:513)
	at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:467)
	at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:172)
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:985)
	at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311)
	at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:103)
	at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99)
	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
	at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:919)
	at kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99)
	at kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73)
	at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:94)
	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2022-10-25 22:08:45,345] WARN [RaftManager nodeId=0] Error connecting to node kafka-1.kafka-svc.kafka-kraft.svc.cluster.local:9093 (id: 1 rack: null) (org.apache.kafka.clients.NetworkClient)
java.net.UnknownHostException: kafka-1.kafka-svc.kafka-kraft.svc.cluster.local
@keithharvey
Copy link

Had to expose 9093 on the kafka-svc after noticing it was trying to connect to kafka-1.kafka-svc.kafka-kraft.svc.cluster.local:9093, which wasn't mapped. My guess is doughgle/kafka-kraft changed kafka versions so the controller port changed from kafka 2.8 -> 3.1

@Saveriu
Copy link

Saveriu commented Jan 13, 2023

Hi @0ni0nrings do you have any update on this issue?

@PRAYAG-97
Copy link

PRAYAG-97 commented Feb 10, 2023

@0ni0nrings @Saveriu
If you create image with the help of given docker file and try to deploy with single replica instance it will work fine without any error but when you increase the instance of kafka broker then it gives error in pods other then kafka-0
$ kubectl logs -f kafka-1 -n kafka-kraft
[2023-02-08 18:09:35,830] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2023-02-08 18:09:36,156] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2023-02-08 18:09:36,226] WARN No meta.properties file under dir /mnt/kafka/1/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2023-02-08 18:09:36,229] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$)
kafka.common.KafkaException: No meta.properties found in /mnt/kafka/1 (have you run kafka-storage.sh to format the directory?)
at kafka.server.BrokerMetadataCheckpoint$.$anonfun$getBrokerMetadataAndOfflineDirs$2(BrokerMetadataCheckpoint.scala:164)
at scala.collection.immutable.List.foreach(List.scala:333)
at kafka.server.BrokerMetadataCheckpoint$.getBrokerMetadataAndOfflineDirs(BrokerMetadataCheckpoint.scala:153)
at kafka.server.KafkaRaftServer$.initializeLogDirs(KafkaRaftServer.scala:151)
at kafka.server.KafkaRaftServer.(KafkaRaftServer.scala:53)
at kafka.Kafka$.buildServer(Kafka.scala:79)
at kafka.Kafka$.main(Kafka.scala:87)
at kafka.Kafka.main(Kafka.scala)

and you will get java.net.UnknownHostException: kafka-2.kafka-svc.kafka-kraft.svc.cluster.local error in kafka-0 pod because outher pods are down and kafka-0 is trying to connect with them.

i have made some changes in entrypoint.sh file and created new image and it worked for me to run multiple instance.

@zMcKracken
Copy link

Hi,
I have the same problem deploying Kafka with kraft in kube cluster using bitnami helm distro (version 23.0.5 ).
Setting 1 replica works fine, but once I set more than 1 replica it keeps logging the OP error.
Is there any news or workaround for more then 1 replica?

@PRAYAG-97
Copy link

Hi @zMcKracken
i have made some changes in entrypoint.sh file and created new image and it worked for me to run multiple instance.

@Firas-Zarai
Copy link

Firas-Zarai commented Oct 31, 2023

**PRAYAG-97 ** commented

@PRAYAG-97 can you give us the changes you made ?
thanks in advance

@PRAYAG-97
Copy link

PRAYAG-97 commented Nov 7, 2023

@Firas-Zarai
I have made changes in entrypoint.sh file
And Please let me know if there is other solution.
Docker file:
FROM openjdk:11

ENV KAFKA_VERSION=3.4.1
ENV SCALA_VERSION=2.13
ENV KAFKA_HOME=/opt/kafka
ENV PATH=${PATH}:${KAFKA_HOME}/bin

LABEL name="kafka" version=${KAFKA_VERSION}

RUN wget -O /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz https://downloads.apache.org/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz
&& tar xfz /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz -C /opt
&& rm /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz
&& ln -s /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} ${KAFKA_HOME}
&& rm -rf /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz

COPY ./entrypoint.sh /
RUN ["chmod", "+x", "/entrypoint.sh"]
ENTRYPOINT ["/entrypoint.sh"]

Entrypoint.sh

#!/bin/bash

NODE_ID=${HOSTNAME:6}
LISTENERS="PLAINTEXT://:9092,CONTROLLER://:9093"
ADVERTISED_LISTENERS="PLAINTEXT://kafka-$NODE_ID.$SERVICE.$NAMESPACE.svc.cluster.local:9092"

CONTROLLER_QUORUM_VOTERS=""
for i in $( seq 0 $REPLICAS); do
if [[ $i != $REPLICAS ]]; then
CONTROLLER_QUORUM_VOTERS="$CONTROLLER_QUORUM_VOTERS$i@kafka-$i.$SERVICE.$NAMESPACE.svc.cluster.local:9093,"
else
CONTROLLER_QUORUM_VOTERS=${CONTROLLER_QUORUM_VOTERS::-1}
fi
done

mkdir -p $SHARE_DIR/$NODE_ID

CLUSTER_ID="q9D3LywhQWyj-d6AqBASmA"

if [[ ! -f "$SHARE_DIR/cluster_id" && "$NODE_ID" = "0" ]]; then
#CLUSTER_ID=$(kafka-storage.sh random-uuid)
echo $CLUSTER_ID > $SHARE_DIR/cluster_id
else
#CLUSTER_ID=$(cat $SHARE_DIR/cluster_id)
echo $CLUSTER_ID > $SHARE_DIR/cluster_id
fi

sed -e "s+^node.id=.+node.id=$NODE_ID+"
-e "s+^controller.quorum.voters=.
+controller.quorum.voters=$CONTROLLER_QUORUM_VOTERS+"
-e "s+^listeners=.+listeners=$LISTENERS+"
-e "s+^advertised.listeners=.
+advertised.listeners=$ADVERTISED_LISTENERS+"
-e "s+^log.dirs=.*+log.dirs=$SHARE_DIR/$NODE_ID+"
/opt/kafka/config/kraft/server.properties > server.properties.updated
&& mv server.properties.updated /opt/kafka/config/kraft/server.properties

kafka-storage.sh format -t $CLUSTER_ID -c /opt/kafka/config/kraft/server.properties

exec kafka-server-start.sh /opt/kafka/config/kraft/server.properties

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants