Addon: expose /metrics endpoints for Prometheus #49

Open: wants to merge 27 commits into base: master

27 commits
41cdfd6
Exposes /metrics endpoints for Prometheus scraping
solsson Jul 28, 2017
ffb89dd
Adds pod that can be used to estimate resource limits
solsson Jul 29, 2017
51bbedb
CPU limit on metrics export won't actually save any cycles
solsson Jul 29, 2017
8ec2045
The test that caught the performance problem
solsson Jul 29, 2017
11feb28
Demonstrates OOMKilled with current resource limits
solsson Jul 29, 2017
02c4b3e
Same base image as kafka and latest exporter source
solsson Jul 29, 2017
a563220
zoo is fast now, <0.02s compared to >1s for the others
solsson Jul 29, 2017
e8a61e7
The 1s response time from kafka might be due to ...
solsson Jul 29, 2017
6d4ffc3
interesting
solsson Jul 29, 2017
72cfc77
Adapts to Java 8+, but still guessing the numbers
solsson Jul 31, 2017
2b822b1
Don't touch Xss as it has <1MB defaults according to docs
solsson Jul 31, 2017
6d3b935
Let's focus on the two numbers that seem to matter
solsson Jul 31, 2017
a4b51f4
Endpont works again, with similar scrape times as Xmx=80m without Met…
solsson Jul 31, 2017
8dbd7a1
Reverse order of containers to benefit from "Defaulting container nam…
solsson Aug 1, 2017
d7d5044
Uses prometheus/jmx_exporter parent-0.10 tag
solsson Aug 5, 2017
b6c85eb
Had 10 OOMKilled/hour with 100Mi so let's increase request,
solsson Aug 7, 2017
db52a3c
Uses JMX config from config map, so we can experiment
solsson Aug 8, 2017
37e58e9
Scrape less, and improve scrape time further ...
solsson Oct 6, 2017
d4b95d2
For performance, again thanks to @yacut #49
solsson Oct 22, 2017
0994950
Gets you JVM metrics from zoo, lots and lots of it
solsson Aug 8, 2017
e35d077
Still not getting anything zookeeper-specific
solsson Aug 8, 2017
42d1b1a
Adds directives from kafka's rules, now for pzoo too.
solsson Aug 8, 2017
253633f
Zookeeper metrics conf contributed by @yacut #61
solsson Nov 3, 2017
b0e6145
Merge pull request #61 from Yolean/metrics-jmx-zookeeper
solsson Nov 3, 2017
dc1c1da
Upgrade to jmx-exporter 0.1.0
solsson Nov 3, 2017
4c35576
Upgrade test also to jmx-exporter 0.1.0
solsson Nov 3, 2017
6a26cf3
Adapts test instructions to debian based jre image
solsson Nov 3, 2017
19 changes: 19 additions & 0 deletions 10broker-config.yml
@@ -256,3 +256,22 @@ data:
    # Change to DEBUG to enable audit log for the authorizer
    log4j.logger.kafka.authorizer.logger=WARN, authorizerAppender
    log4j.additivity.kafka.authorizer.logger=false

  jmx-kafka-prometheus.yml: |+
    lowercaseOutputName: true
    jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:5555/jmxrmi
    ssl: false
    whitelistObjectNames: ["kafka.server:*","java.lang:*"]
    rules:
    - pattern : kafka.server<type=ReplicaFetcherManager, name=MaxLag, clientId=(.+)><>Value
    - pattern : kafka.server<type=BrokerTopicMetrics, name=(BytesInPerSec|BytesOutPerSec|MessagesInPerSec), topic=(.+)><>OneMinuteRate
    - pattern : kafka.server<type=KafkaRequestHandlerPool, name=RequestHandlerAvgIdlePercent><>OneMinuteRate
    - pattern : kafka.server<type=Produce><>queue-size
    - pattern : kafka.server<type=ReplicaManager, name=(PartitionCount|UnderReplicatedPartitions)><>(Value|OneMinuteRate)
    - pattern : kafka.server<type=controller-channel-metrics, broker-id=(.+)><>(.*)
    - pattern : kafka.server<type=socket-server-metrics, networkProcessor=(.+)><>(.*)
    - pattern : kafka.server<type=Fetch><>queue-size
    - pattern : kafka.server<type=SessionExpireListener, name=(.+)><>OneMinuteRate
    - pattern : java.lang<type=OperatingSystem><>SystemCpuLoad
    - pattern : java.lang<type=Memory><HeapMemoryUsage>used
    - pattern : java.lang<type=OperatingSystem><>FreePhysicalMemorySize
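The rules above are regular expressions that the exporter tests against each MBean attribute. The Python sketch below is a hypothetical illustration, not the exporter's code: it assumes attributes are rendered in the `domain<key=value, ...><compositeKeys>attribute: value` form described in the jmx_exporter README, and that patterns may match anywhere in that string (unanchored). The sample bean strings and values are invented.

```python
import re

# A subset of the rule patterns from jmx-kafka-prometheus.yml, copied verbatim.
KAFKA_PATTERNS = [
    r"kafka.server<type=BrokerTopicMetrics, name=(BytesInPerSec|BytesOutPerSec|MessagesInPerSec), topic=(.+)><>OneMinuteRate",
    r"kafka.server<type=ReplicaManager, name=(PartitionCount|UnderReplicatedPartitions)><>(Value|OneMinuteRate)",
    r"java.lang<type=Memory><HeapMemoryUsage>used",
]


def first_matching_rule(rendered, patterns):
    """Return (rule_index, capture_groups) for the first matching pattern, else None.

    Assumption: patterns are unanchored, so re.search (not re.fullmatch) is used,
    and evaluation stops at the first rule that matches.
    """
    for index, pattern in enumerate(patterns):
        match = re.search(pattern, rendered)
        if match:
            return index, match.groups()
    return None


if __name__ == "__main__":
    # Hypothetical rendered attributes; bean names follow the rules, values are made up.
    samples = [
        "kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec, topic=test><>OneMinuteRate: 12.5",
        "kafka.server<type=ReplicaManager, name=UnderReplicatedPartitions><>Value: 0",
        "java.lang<type=Memory><HeapMemoryUsage>used: 1048576",
        "java.lang<type=Memory><NonHeapMemoryUsage>used: 1048576",
    ]
    for sample in samples:
        print(first_matching_rule(sample, KAFKA_PATTERNS), sample)
```

The last sample matches no rule. According to the jmx_exporter README, attributes not matched by any rule are not collected when `rules` is set, so non-heap memory usage would not be exported by this config.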
25 changes: 25 additions & 0 deletions 50kafka.yml
@@ -11,6 +11,8 @@ spec:
      labels:
        app: kafka
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "5556"
    spec:
      terminationGracePeriodSeconds: 30
      initContainers:
@@ -31,6 +33,8 @@ spec:
        env:
        - name: KAFKA_LOG4J_OPTS
          value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
        - name: JMX_PORT
          value: "5555"
        ports:
        - containerPort: 9092
        command:
@@ -59,6 +63,27 @@ spec:
          mountPath: /etc/kafka
        - name: data
          mountPath: /var/lib/kafka/data
      - name: metrics
        image: solsson/kafka-prometheus-jmx-exporter@sha256:40a6ab24ccac0ed5acb8c02dccfbb1f5924fd97f46c0450e0245686c24138b53
        command:
        - java
        - -Xmx64M
        - -XX:MaxMetaspaceSize=32m
        - -jar
        - jmx_prometheus_httpserver.jar
        - "5556"
        - /etc/kafka/jmx-kafka-prometheus.yml
        ports:
        - containerPort: 5556
        resources:
          requests:
            cpu: 0m
            memory: 100Mi
          limits:
            memory: 150Mi
        volumeMounts:
        - name: config
          mountPath: /etc/kafka
      volumes:
      - name: config
        configMap:
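The metrics container caps the heap with `-Xmx64M` and metaspace with `-XX:MaxMetaspaceSize=32m`, but a JVM also needs native memory outside both (thread stacks, code cache, GC bookkeeping), and all of it must fit inside the container's memory limit. The Python sketch below works through that budget for the values in this file; it only parses the size formats used here (JVM `k`/`m`/`g` suffixes and Kubernetes `Mi`), and the helper names are hypothetical.

```python
MIB = 1024 * 1024

# JVM size flags accept k/K, m/M, g/G suffixes, all binary (1024-based) units.
_JVM_UNITS = {"k": 1024, "m": MIB, "g": 1024 * MIB}


def jvm_size_bytes(value):
    """Parse a JVM size such as '64M' or '32m' into bytes."""
    suffix = value[-1].lower()
    if suffix in _JVM_UNITS:
        return int(value[:-1]) * _JVM_UNITS[suffix]
    return int(value)


def mebibytes_bytes(quantity):
    """Parse a Kubernetes quantity with the 'Mi' suffix (the only suffix handled here)."""
    if not quantity.endswith("Mi"):
        raise ValueError("this sketch only handles Mi quantities: " + quantity)
    return int(quantity[:-2]) * MIB


def non_heap_headroom(xmx, max_metaspace, container_memory):
    """Bytes left for everything other than heap and metaspace."""
    return mebibytes_bytes(container_memory) - jvm_size_bytes(xmx) - jvm_size_bytes(max_metaspace)


if __name__ == "__main__":
    # Values from the metrics container above.
    for label, quantity in (("request", "100Mi"), ("limit", "150Mi")):
        left = non_heap_headroom("64M", "32m", quantity)
        print(f"{label} {quantity}: {left // MIB} MiB beyond heap and metaspace")
```

Heap plus metaspace caps add up to 96 MiB, which leaves 4 MiB under the 100Mi request and 54 MiB under the 150Mi limit for the remaining native memory.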
44 changes: 44 additions & 0 deletions test/jmx-selftest.yml
@@ -0,0 +1,44 @@
# Sets up a pod that monitors itself, to test resource usage etc.
# testpod=$(kubectl -n test-kafka get pods -l test-target=jmx-exporter -o=jsonpath={.items[*].metadata.name})
# kubectl exec -n test-kafka $testpod -- apt-get update
# kubectl exec -n test-kafka $testpod -- apt-get install -y --no-install-recommends curl
# kubectl exec -n test-kafka $testpod -- curl http://localhost:5556/metrics
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jmx-selftest
  namespace: test-kafka
spec:
  replicas: 1
  template:
    metadata:
      labels:
        test-target: jmx-exporter
        test-type: readiness
      # Uncomment to test with Prometheus
      #annotations:
      #  prometheus.io/scrape: "true"
      #  prometheus.io/port: "5556"
    spec:
      containers:
      - name: monitor
        image: solsson/kafka-prometheus-jmx-exporter@sha256:40a6ab24ccac0ed5acb8c02dccfbb1f5924fd97f46c0450e0245686c24138b53
        command:
        - java
        - -Dcom.sun.management.jmxremote.ssl=false
        - -Dcom.sun.management.jmxremote.authenticate=false
        - -Dcom.sun.management.jmxremote.port=5555
        - -jar
        - jmx_prometheus_httpserver.jar
        - "5556"
        - example_configs/httpserver_sample_config.yml
        ports:
        - name: jmx
          containerPort: 5555
        - name: slashmetrics
          containerPort: 5556
        # Test run, again and again
        readinessProbe:
          httpGet:
            path: /metrics
            port: 5556
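The selftest's readinessProbe repeatedly issues an HTTP GET against `/metrics` on port 5556. The Python sketch below approximates what such a probe checks, assuming the documented Kubernetes rule that an `httpGet` probe succeeds for status codes from 200 up to, but not including, 400. It runs against a hypothetical in-process stand-in for the exporter rather than a real pod; `http_get_probe`, `FakeExporter` and `demo` are illustrative names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore proxy environment variables so the demo only talks to 127.0.0.1.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_get_probe(url, timeout=1.0):
    """Approximate a Kubernetes httpGet probe: success iff 200 <= status < 400."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError as error:
        # Raised for 4xx/5xx responses; judge by the status code.
        return 200 <= error.code < 400
    except OSError:
        # URLError subclasses OSError: refused connections, timeouts, DNS errors.
        return False


class FakeExporter(BaseHTTPRequestHandler):
    """Stand-in for the exporter: a one-line body on /metrics, 404 elsewhere."""

    def do_GET(self):
        if self.path == "/metrics":
            body = b"jmx_scrape_duration_seconds 0.05\n"  # invented sample value
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep demo output quiet


def demo():
    """Probe the fake exporter on an ephemeral port; return (metrics_ok, other_path_ok)."""
    server = HTTPServer(("127.0.0.1", 0), FakeExporter)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = f"http://127.0.0.1:{server.server_port}"
        return http_get_probe(base + "/metrics"), http_get_probe(base + "/nope")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```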
107 changes: 107 additions & 0 deletions test/metrics.yml
@@ -0,0 +1,107 @@
# kubectl apply -f test/metrics.yml && kubectl scale --replicas=0 deploy/metrics && kubectl scale --replicas=1 deploy/metrics
# kubectl exec metrics-... -- tail -f /tmp/loglast | egrep 'time_total|^jmx_scrape_duration_seconds|^java_lang_memory_heapmemoryusage_used|^java_lang_memory_nonheapmemoryusage_used'
---
kind: ConfigMap
metadata:
  name: metrics
  namespace: test-kafka
apiVersion: v1
data:

  curl-format.txt: |-
    \n
    # ------ curl stats ------\n
    time_namelookup %{time_namelookup}\n
    time_connect %{time_connect}\n
    time_appconnect %{time_appconnect}\n
    time_pretransfer %{time_pretransfer}\n
    time_redirect %{time_redirect}\n
    time_starttransfer %{time_starttransfer}\n
    \n
    time_total{url="%{url_effective}"} %{time_total}\n
    \n
    http_code{url="%{url_effective}"} %{http_code}\n
    size_download{url="%{url_effective}"} %{size_download}\n
    content_type %{content_type}\n
    # ----- curl complete -----\n
    \n

  setup.sh: |-
    touch /tmp/testlog
    tail -f /tmp/testlog

  continue.sh: |-
    exit 0

  run.sh: |-
    exec >> /tmp/testlog
    exec 2>&1

    date -u -Iseconds | tee /tmp/loglast

    curl -w "@/test/curl-format.txt" -s --max-time $MAX_RESPONSE_TIME \
      http://kafka-0.broker.kafka.svc.cluster.local:5556/metrics \
      | tee -a /tmp/loglast \
      | grep http_code \
      | grep 200

    curl -w "@/test/curl-format.txt" -s --max-time $MAX_RESPONSE_TIME \
      http://zoo-0.zoo.kafka.svc.cluster.local:5556/metrics \
      | tee -a /tmp/loglast \
      | grep http_code \
      | grep 200

    curl -w "@/test/curl-format.txt" -s --max-time $MAX_RESPONSE_TIME \
      http://pzoo-0.pzoo.kafka.svc.cluster.local:5556/metrics \
      | tee -a /tmp/loglast \
      | grep http_code \
      | grep 200

    exit 0

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: metrics
  namespace: test-kafka
spec:
  replicas: 1
  template:
    metadata:
      labels:
        test-target: kafka
        test-type: readiness
    spec:
      containers:
      - name: testcase
        image: solsson/curl@sha256:8c0c5d669b3dd67932da934024252af59fb9d0fa0e5118b5a737b35c5e1487bf
        env:
        - name: MAX_RESPONSE_TIME
          value: "3"
        # Test set up
        command:
        - /bin/bash
        - -e
        - /test/setup.sh
        # Test run, again and again
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -e
            - /test/run.sh
        # Test quit on nonzero exit
        livenessProbe:
          exec:
            command:
            - /bin/bash
            - -e
            - /test/continue.sh
        volumeMounts:
        - name: config
          mountPath: /test
      volumes:
      - name: config
        configMap:
          name: metrics
26 changes: 26 additions & 0 deletions zookeeper/10zookeeper-config.yml
@@ -35,3 +35,29 @@ data:
    # Suppress connection log messages, three lines per livenessProbe execution
    log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN
    log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN

  jmx-zookeeper-prometheus.yaml: |+
    lowercaseOutputName: true
    jmxUrl: service:jmx:rmi:///jndi/rmi://localhost:5555/jmxrmi
    ssl: false
    whitelistObjectNames: ["org.apache.ZooKeeperService:*","java.lang:*"]
    rules:
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d)><>(\\w+)"
      name: "zookeeper_$2"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d), name1=replica.(\\d)><>(\\w+)"
      name: "zookeeper_$3"
      labels:
        replicaId: "$2"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d), name1=replica.(\\d), name2=(\\w+)><>(\\w+)"
      name: "zookeeper_$4"
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d), name1=replica.(\\d), name2=(\\w+), name3=(\\w+)><>(\\w+)"
      name: "zookeeper_$4_$5"
      labels:
        replicaId: "$2"
        memberType: "$3"
    - pattern : java.lang<type=OperatingSystem><>SystemCpuLoad
    - pattern : java.lang<type=Memory><HeapMemoryUsage>used
    - pattern : java.lang<type=OperatingSystem><>FreePhysicalMemorySize
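Unlike the Kafka rules, the ZooKeeper rules build `name` and `labels` from capture groups (`$2`, `$3`, ...). The Python sketch below applies the third rule to hypothetical rendered attributes to show how those references expand. `apply_rule` and the sample beans are illustrative assumptions; the `$N` handling only approximates the exporter's Java regex replacement, and the exporter's own name sanitization is not modelled. Note that YAML decodes the double-quoted `\\d` to the regex `\d`.

```python
import re

# Third ZooKeeper rule from jmx-zookeeper-prometheus.yaml, after YAML unescaping.
PATTERN = r"org.apache.ZooKeeperService<name0=ReplicatedServer_id(\d), name1=replica.(\d), name2=(\w+)><>(\w+)"
NAME_TEMPLATE = "zookeeper_$4"
LABEL_TEMPLATES = {"replicaId": "$2", "memberType": "$3"}


def expand(template, match):
    """Replace $1..$9 in a template with the corresponding capture groups."""
    return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), template)


def apply_rule(rendered, lowercase_name=True):
    """Return (metric_name, labels) if the rule matches, else None.

    lowercase_name mirrors lowercaseOutputName: true, which affects the
    metric name only, not label values.
    """
    match = re.search(PATTERN, rendered)
    if not match:
        return None
    name = expand(NAME_TEMPLATE, match)
    if lowercase_name:
        name = name.lower()
    labels = {key: expand(value, match) for key, value in LABEL_TEMPLATES.items()}
    return name, labels


if __name__ == "__main__":
    # Hypothetical rendered attributes with invented values.
    samples = [
        "org.apache.ZooKeeperService<name0=ReplicatedServer_id1, name1=replica.1, name2=Follower><>PacketsReceived: 42",
        "org.apache.ZooKeeperService<name0=ReplicatedServer_id1><>QuorumSize: 3",
        "org.apache.ZooKeeperService<name0=ReplicatedServer_id12, name1=replica.12, name2=Follower><>PacketsReceived: 1",
    ]
    for sample in samples:
        print(apply_rule(sample))
```

The third sample prints `None`: each `(\d)` group matches exactly one digit, so a server id or replica number of 10 or more falls through these rules.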
25 changes: 25 additions & 0 deletions zookeeper/50pzoo.yml
@@ -12,6 +12,8 @@ spec:
        app: zookeeper
        storage: persistent
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "5556"
    spec:
      terminationGracePeriodSeconds: 10
      initContainers:
@@ -29,6 +31,8 @@ spec:
        env:
        - name: KAFKA_LOG4J_OPTS
          value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
        - name: JMX_PORT
          value: "5555"
        command:
        - ./bin/zookeeper-server-start.sh
        - /etc/kafka/zookeeper.properties
@@ -54,6 +58,27 @@ spec:
          mountPath: /etc/kafka
        - name: data
          mountPath: /var/lib/zookeeper/data
      - name: metrics
        image: solsson/kafka-prometheus-jmx-exporter@sha256:40a6ab24ccac0ed5acb8c02dccfbb1f5924fd97f46c0450e0245686c24138b53
        command:
        - java
        - -Xmx64M
        - -XX:MaxMetaspaceSize=32m
        - -jar
        - jmx_prometheus_httpserver.jar
        - "5556"
        - /etc/kafka/jmx-zookeeper-prometheus.yaml
        ports:
        - containerPort: 5556
        resources:
          requests:
            cpu: 0m
            memory: 100Mi
          limits:
            memory: 150Mi
        volumeMounts:
        - name: config
          mountPath: /etc/kafka
      volumes:
      - name: config
        configMap:
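The `prometheus.io/scrape` and `prometheus.io/port` annotations added to the kafka, pzoo and zoo pod templates are a convention rather than a built-in Prometheus feature: they only take effect if the Prometheus server's Kubernetes pod scrape job relabels on them, as the widely copied example configuration does. Assuming such a job, the Python sketch below shows which URL it would scrape for a pod. `scrape_url` and the pod IP are hypothetical, and the handling of `prometheus.io/path` follows that same assumed convention.

```python
def scrape_url(pod_ip, annotations):
    """Return the URL an annotation-driven pod scrape job would use, or None.

    Assumed convention (implemented by relabeling, not by Prometheus itself):
    - keep only pods annotated prometheus.io/scrape: "true"
    - prometheus.io/port sets the target port
    - prometheus.io/path overrides the default /metrics path
    Simplification: pods without a port annotation are skipped here.
    """
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    port = annotations.get("prometheus.io/port")
    if port is None:
        return None
    path = annotations.get("prometheus.io/path", "/metrics")
    return f"http://{pod_ip}:{port}{path}"


if __name__ == "__main__":
    # Annotations as set on the pod templates above; the IP is made up.
    print(scrape_url("10.0.0.7", {"prometheus.io/scrape": "true", "prometheus.io/port": "5556"}))
```

The values are quoted in the YAML because Kubernetes annotation values must be strings.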
25 changes: 25 additions & 0 deletions zookeeper/51zoo.yml
@@ -12,6 +12,8 @@ spec:
        app: zookeeper
        storage: ephemeral
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "5556"
    spec:
      terminationGracePeriodSeconds: 10
      initContainers:
@@ -32,6 +34,8 @@ spec:
        env:
        - name: KAFKA_LOG4J_OPTS
          value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
        - name: JMX_PORT
          value: "5555"
        command:
        - ./bin/zookeeper-server-start.sh
        - /etc/kafka/zookeeper.properties
@@ -57,6 +61,27 @@ spec:
          mountPath: /etc/kafka
        - name: data
          mountPath: /var/lib/zookeeper/data
      - name: metrics
        image: solsson/kafka-prometheus-jmx-exporter@sha256:40a6ab24ccac0ed5acb8c02dccfbb1f5924fd97f46c0450e0245686c24138b53
        command:
        - java
        - -Xmx64M
        - -XX:MaxMetaspaceSize=32m
        - -jar
        - jmx_prometheus_httpserver.jar
        - "5556"
        - /etc/kafka/jmx-zookeeper-prometheus.yaml
        ports:
        - containerPort: 5556
        resources:
          requests:
            cpu: 0m
            memory: 100Mi
          limits:
            memory: 150Mi
        volumeMounts:
        - name: config
          mountPath: /etc/kafka
      volumes:
      - name: config
        configMap: