This demo presents three ways of collecting metrics from distributed systems based on microservices:
- Prometheus
- OpenTelemetry metrics
- OpenTelemetry metrics generated from spans (using the `spanmetrics` processor)

It also includes an OpenTelemetry tracing demonstration (see the Tools section).
There are two variants of this demo:
- Basic – requests are sent directly to the microservices.
- Extended – HAProxy acts as a load balancer in front of the microservices, which allows them to be scaled horizontally.
A simple microservice has been created for demonstration purposes. It has two functions: (1) calling other microservices and (2) simulating internal processing (using a dummy loop). The list of microservices that a given microservice should call is set in its `SERVICES_TO_CALL` environment variable (see `docker-compose.yaml`).
The demo microservice provides the following HTTP API endpoints:
- `/api/action` – action endpoint that calls the services listed in the `SERVICES_TO_CALL` variable and runs a dummy loop that simulates internal processing. In particular, when `SERVICES_TO_CALL` is empty, the given microservice only executes its internal processing. This endpoint returns `OK` after receiving a response from all called microservices, or `ERROR` if at least one microservice does not return success or at least one request times out.
- `/api/health` – health check endpoint that always returns `OK`.
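With the stack running, the two endpoints can be exercised from the command line. The sketch below assumes the basic variant with app1 published on host port `8081` (see the setup commands further down); the printed values depend on whether the services are actually up:

```shell
# Hypothetical smoke check; assumes app1 is published on port 8081.
BASE="http://localhost:8081"
health=$(curl -s "$BASE/api/health" || echo "unreachable")
action=$(curl -s "$BASE/api/action" || echo "unreachable")
echo "health: $health"   # OK when the service is up
echo "action: $action"   # OK, or ERROR if a downstream call fails or times out
```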
Let's assume we have prepared the following configuration:
- 3 microservices: app1 (available publicly at port `8081`), app2 and app3.
- The `SERVICES_TO_CALL` variable for app1 is set to `app2:8080,app3:8080`.
- `SERVICES_TO_CALL` for app2 and app3 is empty.
After calling the action endpoint of app1 (http://localhost:8081/api/action), app1 will internally call both app2 and app3. As a result, app2 and app3 will execute their 'internal processing' (dummy loop) and respond to app1; app1 will then execute its own 'internal processing' and finally return the response.
It is also possible to change the complexity of the dummy loop in each microservice individually by supplying a complexity parameter for each of them in the following way: http://localhost:8081/api/action?config=app1:100000,app2:100000,app3:100000
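The query string above can be assembled in a small shell sketch; the `config` value is a comma-separated list of `service:iterations` pairs, and the complexity values here are arbitrary examples:

```shell
# Per-service complexity of the dummy loop, as service:iterations pairs
CONFIG="app1:100000,app2:100000,app3:100000"
URL="http://localhost:8081/api/action?config=${CONFIG}"
echo "$URL"
# With the stack running:  curl -s "$URL"
```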
Please note that other scenarios can easily be prepared by creating more microservices and adjusting their `SERVICES_TO_CALL` variables.
Each microservice is configured via the following environment variables:
- `APP_NAME` – name of the app (e.g. `app1`)
- `SERVICES_TO_CALL` – list of services (address and port) that should be called by the given service; be careful to avoid loops! (e.g. `app2:8080,app3:8080`)
- `OTEL_AGENT` – address and port of the OpenTelemetry agent (e.g. `otel-agent:4317`)
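Put together, a service entry in `docker-compose.yaml` might look like the sketch below; the build context, port mapping, and internal port are assumptions for illustration, not copied from the repository:

```yaml
app1:
  build: ../app          # assumed build context for the demo microservice
  ports:
    - "8081:8080"        # publish app1 on host port 8081
  environment:
    - APP_NAME=app1
    - SERVICES_TO_CALL=app2:8080,app3:8080
    - OTEL_AGENT=otel-agent:4317
```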
This demo consists of the following frontend tools:
- Jaeger (distributed tracing) – http://localhost:16686
- Prometheus (monitoring) – http://localhost:9090
- Grafana (data visualization) – http://localhost:3000
- Locust (load testing tool) – http://localhost:8089
- HAProxy stats (status of the internal load balancer) – http://localhost:1936
Additional resources:
- OpenTelemetry metrics in Prometheus format – http://localhost:8889/metrics
Basic variant:

Start:

```
cd deployment
docker-compose -f docker-compose_basic.yaml up -d
```

Rebuild the images:

```
docker-compose -f docker-compose_basic.yaml up -d --build
```

Extended variant (with the HAProxy load balancer):

Start:

```
cd deployment
docker-compose -f docker-compose.yaml up -d
```

Rebuild the images:

```
docker-compose -f docker-compose.yaml up -d --build
```
The following metrics are available in Prometheus for each microservice:

- Native Prometheus metrics:

  `<APP_NAME>_operation_latency_bucket{status="OK|ERROR", type="internal-only|total"}`

  Labels:
  - `type=total` – whole request processing time (internal processing + external service calls)
  - `type=internal-only` – internal processing time (only the dummy loop)

- OpenTelemetry metrics (metrics collected with OpenTelemetry and exported to Prometheus):

  `otel_<APP_NAME>_operation_latency_bucket{status="OK|ERROR", type="internal-only|total"}`

  Labels:
  - `type=total` – whole request processing time (internal processing + external service calls)
  - `type=internal-only` – internal processing time (only the dummy loop)
- OpenTelemetry metrics generated from spans (using the `spanmetrics` processor, exported to Prometheus). See Jaeger for more information about the labels:

  `otel_latency_bucket{service_name="<APP_NAME>", operation="internal-processing|/api/action|GET", status_code="STATUS_CODE_OK|STATUS_CODE_ERROR"}`

  `<APP_NAME>` – name of the application (the `APP_NAME` variable), e.g. `app1`
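As an example of using these histograms, a Prometheus query such as the following sketch would estimate app1's 95th-percentile total request latency over the last 5 minutes (the metric and label names are taken from the list above):

```
histogram_quantile(0.95,
  sum by (le) (rate(app1_operation_latency_bucket{type="total"}[5m])))
```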
- DNS discovery may cause delays in Prometheus metric updates and in the HAProxy node list.
- The metric buckets should be adjusted to your needs (see `bucketsConfig` in `app/monitoring/common.go` and `latency_histogram_buckets` in `otel-collector-config.yaml`).
- In the current configuration, OpenTelemetry traces all requests.