[Web service] prometheus metrics exporter
Hosted by the laitos web server, the endpoint serves metrics collected from the following sources in the prometheus exporter format:
- All web service handlers: time to first byte, processing duration, size of response.
- Program resource usage: CPU time consumed, number of context switches, time spent on run queue and wait queue.
- All web proxy requests: time to first byte, connection duration, size of response.
Under the JSON key HTTPHandlers, add a string property called PrometheusMetricsEndpoint whose value is the URL location of the service.
Keep the location a secret to yourself and make it difficult to guess. Here is an example:
{
    ...
    "HTTPHandlers": {
        ...
        "PrometheusMetricsEndpoint": "/my-precious-metrics",
        ...
    },
    ...
}
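One way to pick a location that is difficult to guess is to generate it randomly. The Python sketch below is only an illustration: the helper name random_metrics_endpoint and the /metrics- prefix are assumptions made for this example and are not part of laitos. Only the HTTPHandlers and PrometheusMetricsEndpoint keys come from the configuration shown above.

```python
import json
import secrets


def random_metrics_endpoint(prefix: str = "/metrics-", entropy_bytes: int = 16) -> str:
    """Return a URL path ending in an unguessable, URL-safe random suffix.

    The prefix is a hypothetical choice for this sketch; any path works.
    """
    return prefix + secrets.token_urlsafe(entropy_bytes)


if __name__ == "__main__":
    # Print a fragment to merge by hand into the existing laitos configuration.
    fragment = {"HTTPHandlers": {"PrometheusMetricsEndpoint": random_metrics_endpoint()}}
    print(json.dumps(fragment, indent=4))
```

The generated path then needs to be used in both the laitos configuration and the metrics_path setting of prometheus.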
Modify the laitos program launch command by adding the parameter -prominteg to it. The parameter works as the master switch that turns on all points of integration with prometheus:
sudo ./laitos -prominteg -config <CONFIG FILE> -daemons ...,httpd,...
The service is hosted by the web server, so remember to include the web server (httpd) in the list of daemons to run.
Install prometheus and edit its configuration file (often located at /etc/prometheus/prometheus.yml) to tell prometheus to periodically download the exporter data from this endpoint:
...
scrape_configs:
  - job_name: 'laitos'
    scrape_interval: 20s
    scrape_timeout: 5s
    scheme: https # or http
    metrics_path: '/my-precious-metrics'
    static_configs:
      - targets: ['laitos-server.example.com:443', 'another-laitos-server.example.com:80']
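For each target, prometheus requests a URL made up of the scheme, the target address, and the metrics path. The following Python sketch spells out that composition so the endpoint can be checked by hand before waiting for the first scrape. The function name scrape_urls is an assumption made for this example.

```python
from typing import List


def scrape_urls(scheme: str, metrics_path: str, targets: List[str]) -> List[str]:
    """Compose the URL that is requested for each target of a scrape job."""
    return [f"{scheme}://{target}{metrics_path}" for target in targets]


if __name__ == "__main__":
    # Values copied from the example scrape configuration above.
    for url in scrape_urls(
        "https",
        "/my-precious-metrics",
        ["laitos-server.example.com:443"],
    ):
        print(url)
```

Opening a printed URL with a browser or another HTTP client should return the metrics text if the endpoint is configured correctly.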
Visit the prometheus web UI (or a Grafana dashboard if they are integrated), and try out the following queries for plotting program resource usage:
- Percentage of involuntary context switches, 3-minute running average:
(sum(rate(laitos_proc_num_involuntary_switches[3m])) by (instance) / (sum(rate(laitos_proc_num_involuntary_switches[3m])) by (instance) + sum(rate(laitos_proc_num_voluntary_switches[3m])) by (instance))) * 100
- Seconds of CPU time spent by the laitos server (including children) in user and kernel mode, 3-minute running average:
sum(rate(laitos_proc_num_kernel_mode_sec_incl_children[3m]) + rate(laitos_proc_num_user_mode_sec_incl_children[3m])) by (instance)
- Percentage of time spent as runnable according to the OS scheduler (higher is better), 3-minute running average:
(sum(rate(laitos_proc_num_run_sec[3m])) by (instance) / (sum(rate(laitos_proc_num_run_sec[3m])) by (instance) + sum(rate(laitos_proc_num_wait_sec[3m])) by (instance))) * 100
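The percentage queries above all follow the same pattern: take the per-second rate of two counters and divide one by their sum. The Python sketch below works through that arithmetic for the context switch query using made-up counter samples. It is a simplification: unlike the rate function of prometheus, it uses only two samples and ignores counter resets. The function names are assumptions made for this example.

```python
def per_second_rate(first: float, last: float, elapsed_sec: float) -> float:
    """Average per-second increase of a counter between two samples."""
    return (last - first) / elapsed_sec


def involuntary_switch_percentage(invol_rate: float, vol_rate: float) -> float:
    """Share of involuntary switches among all context switches, in percent."""
    total = invol_rate + vol_rate
    if total == 0:
        # The query itself would yield NaN here; 0 is a choice made for this sketch.
        return 0.0
    return invol_rate / total * 100


if __name__ == "__main__":
    # Hypothetical counter readings taken 180 seconds (3 minutes) apart.
    invol = per_second_rate(1000, 1180, 180)  # 1 involuntary switch per second
    vol = per_second_rate(5000, 5540, 180)    # 3 voluntary switches per second
    print(involuntary_switch_percentage(invol, vol))
```

With these sample values, one in four context switches is involuntary, so the result is 25 percent.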
And try out these for plotting web server stats:
- Time-to-first-byte across all handlers at 95% quantile, 3-minute running average:
histogram_quantile(0.95, sum(rate(laitos_httpd_response_time_to_first_byte_seconds_bucket[3m])) by (le, instance))
- Processing duration (including IO) across all handlers at 95% quantile, 3-minute running average:
histogram_quantile(0.95, sum(rate(laitos_httpd_handler_duration_seconds_bucket[3m])) by (le, instance))
- Size of HTTP response across all handlers at 95% quantile, 3-minute running average:
histogram_quantile(0.95, sum(rate(laitos_httpd_response_size_bytes_bucket[3m])) by (le, instance))
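The quantile queries above rely on histogram_quantile, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the requested rank. The Python sketch below illustrates the idea with made-up bucket values. It is a simplified approximation, not the prometheus implementation: it assumes buckets sorted by upper bound, positive bounds, and a final +Inf bucket. The function name bucket_quantile is an assumption made for this example.

```python
import math
from typing import List, Tuple


def bucket_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank and count > lower_count:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the largest finite bound.
                return lower_bound
            # Linear interpolation between the bounds of the matching bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Hypothetical time-to-first-byte buckets: (upper bound in seconds, count).
    sample = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
    print(bucket_quantile(0.95, sample))
```

In this sample the 95% rank falls in the bucket between 0.5 and 1.0 seconds, so the estimate is about 0.9 seconds. The precision of such an estimate depends on how finely the bucket boundaries are spaced.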
And try out these for plotting web proxy stats:
- Number of proxy requests per second, 1-minute running average:
sum(rate(laitos_httpproxy_response_size_bytes_count[1m])) by (instance)
- Bytes transferred to proxy clients per second, 1-minute running average:
sum(rate(laitos_httpproxy_response_size_bytes_sum[1m])) by (instance)
- Top 10 proxy destinations by data transfer (total MB over 3 hours):
topk(10, sum by (host) (rate(laitos_httpproxy_response_size_bytes_sum[180m]))) * 180 * 60 / 1048576
- Top 10 proxy destinations by number of connections (total over 3 hours):
topk(10, sum by (host) (rate(laitos_httpproxy_response_size_bytes_count[180m]))) * 180 * 60
- Top 10 proxy destinations by connection duration (total seconds over 3 hours):
topk(10, sum by (host) (rate(laitos_httpproxy_handler_duration_seconds_sum[180m]))) * 180 * 60
- Size of proxy response across all destinations at 90% quantile, 3-minute running average:
histogram_quantile(0.90, sum(rate(laitos_httpproxy_response_size_bytes_bucket[3m])) by (le, instance))
- Time-to-first-byte across all proxy destinations at 50% quantile, 3-minute running average:
histogram_quantile(0.50, sum(rate(laitos_httpproxy_response_time_to_first_byte_seconds_bucket[3m])) by (le, instance))
- Processing duration (including IO) across all proxy destinations at 50% quantile, 3-minute running average:
histogram_quantile(0.50, sum(rate(laitos_httpproxy_handler_duration_seconds_bucket[3m])) by (le, instance))
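The top 10 queries above turn a per-second rate into a total by multiplying it by the length of the window in seconds (180 * 60), and the data transfer query additionally divides by 1048576 to convert bytes to megabytes. The Python sketch below works through that arithmetic with a made-up rate. The function names are assumptions made for this example.

```python
BYTES_PER_MB = 1048576  # Same divisor as in the data transfer query.


def total_over_window(rate_per_sec: float, window_minutes: float) -> float:
    """Convert an average per-second rate into a total over the window."""
    return rate_per_sec * window_minutes * 60


def bytes_to_mb(num_bytes: float) -> float:
    """Convert a byte count to megabytes."""
    return num_bytes / BYTES_PER_MB


if __name__ == "__main__":
    # Hypothetical destination averaging 2048 bytes per second over 3 hours.
    total_bytes = total_over_window(2048, 180)
    print(total_bytes, bytes_to_mb(total_bytes))
```

An average of 2048 bytes per second over 180 minutes adds up to 22118400 bytes, which is roughly 21 MB.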