feat(proxy-wasm) metrics
This commit adds support for metrics in the form of counters, gauges and
histograms.

This commit adds support for storing metrics in the WasmX shared-memory
key-value store facility.

The workflow users are expected to follow derives from the Proxy-Wasm metrics
ABI itself: users define metrics before using them; when a metric is defined, a
numeric ID is returned, which can be used later to read or update the metric.
If a metric is defined while the system is out of metrics memory, the
metric definition fails, as eviction support hasn't been implemented.

The implemented design, described at [1], allows users to perform most metric
updates without synchronizing Nginx workers, i.e. without the aid of locks.

Users can refer to [2] for a description of how metrics are represented
in memory and how to estimate the size of the shared-memory used for
metrics storage.

Two configuration directives, `slab_size` and `max_metric_name_length`,
are added to configure the size of the shared-memory zone dedicated to
metrics and the maximum length of a metric name, respectively.

[1] docs/adr/005-metrics.md
[2] docs/METRICS.md
casimiro committed May 22, 2024
1 parent 7543fe8 commit 5c8ebf9
Showing 40 changed files with 3,439 additions and 78 deletions.
9 changes: 7 additions & 2 deletions config
@@ -129,6 +129,7 @@ NGX_WASMX_INCS="\
$ngx_addon_dir/src/common \
$ngx_addon_dir/src/common/proxy_wasm \
$ngx_addon_dir/src/common/shm \
$ngx_addon_dir/src/common/metrics \
$ngx_addon_dir/src/common/lua"

NGX_WASMX_DEPS="\
@@ -141,7 +142,9 @@ NGX_WASMX_DEPS="\
$ngx_addon_dir/src/common/proxy_wasm/ngx_proxy_wasm_properties.h \
$ngx_addon_dir/src/common/shm/ngx_wasm_shm.h \
$ngx_addon_dir/src/common/shm/ngx_wasm_shm_kv.h \
$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.h"
$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.h \
$ngx_addon_dir/src/common/metrics/ngx_wa_histogram.h \
$ngx_addon_dir/src/common/metrics/ngx_wa_metrics.h"

NGX_WASMX_SRCS="\
$ngx_addon_dir/src/ngx_wasmx.c \
@@ -155,7 +158,9 @@ NGX_WASMX_SRCS="\
$ngx_addon_dir/src/common/proxy_wasm/ngx_proxy_wasm_util.c \
$ngx_addon_dir/src/common/shm/ngx_wasm_shm.c \
$ngx_addon_dir/src/common/shm/ngx_wasm_shm_kv.c \
$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.c"
$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.c \
$ngx_addon_dir/src/common/metrics/ngx_wa_histogram.c \
$ngx_addon_dir/src/common/metrics/ngx_wa_metrics.c"

# wasm

53 changes: 52 additions & 1 deletion docs/DIRECTIVES.md
@@ -6,6 +6,7 @@ By alphabetical order:
- [cache_config](#cache-config)
- [compiler](#compiler)
- [flag](#flag)
- [max_metric_name_length](#max_metric_name_length)
- [module](#module)
- [proxy_wasm](#proxy_wasm)
- [proxy_wasm_isolation](#proxy_wasm_isolation)
@@ -16,6 +17,7 @@ By alphabetical order:
- [resolver_timeout](#resolver_timeout)
- [shm_kv](#shm_kv)
- [shm_queue](#shm_queue)
- [slab_size](#slab_size)
- [socket_buffer_size](#socket_buffer_size)
- [socket_buffer_reuse](#socket_buffer_reuse)
- [socket_connect_timeout](#socket_connect_timeout)
@@ -57,6 +59,9 @@ By context:
- [tls_trusted_certificate](#tls_trusted_certificate)
- [tls_verify_cert](#tls_verify_cert)
- [tls_verify_host](#tls_verify_host)
- `metrics{}`
- [max_metric_name_length](#max_metric_name_length)
- [slab_size](#slab_size)
- `wasmtime{}`
- [cache_config](#cache-config)
- [flag](#flag)
@@ -205,6 +210,24 @@ wasm {

[Back to TOC](#directives)

max_metric_name_length
----------------------

**usage** | `max_metric_name_length <length>;`
------------:|:----------------------------------------------------------------
**contexts** | `metrics{}`
**default** | `256`
**example** | `max_metric_name_length 512;`

Set the maximum allowed length of a metric name.

> Notes
See [Metrics] for a complete description of how metrics are represented in
memory.

[Back to TOC](#directives)

module
------

@@ -525,6 +548,33 @@ policy, and writes will fail when the allocated memory slab is full.

[Back to TOC](#directives)

slab_size
---------

**usage** | `slab_size <size>;`
------------:|:----------------------------------------------------------------
**contexts** | `metrics{}`
**default** | `5m`
**example** | `slab_size 12m;`

Set the `size` of the shared memory slab dedicated to metrics storage. The value
must be at least 3 * pagesize, e.g. `15k` on Linux.

> Notes
The space in memory occupied by a metric depends on its name length, type and
the number of worker processes running. As an example, if all metric names are
64 chars long and 4 workers are running, `5m` can accommodate 20k counters, 20k
gauges, or up to 16k histograms.

See the [max_metric_name_length](#max_metric_name_length) directive to configure
the max name length in chars for metrics.

See [Metrics] for a complete description of how metrics are represented in
memory.

[Back to TOC](#directives)

socket_buffer_reuse
-------------------

@@ -939,7 +989,8 @@ the `http{}` contexts.

[Contexts]: USER.md#contexts
[Execution Chain]: USER.md#execution-chain
[SLRU eviction algorithm]: SLRU.md
[Metrics]: METRICS.md
[OpenResty]: https://openresty.org/en/
[resolver]: https://nginx.org/en/docs/http/ngx_http_core_module.html#resolver
[resolver_timeout]: https://nginx.org/en/docs/http/ngx_http_core_module.html#resolver_timeout
[SLRU eviction algorithm]: SLRU.md
97 changes: 97 additions & 0 deletions docs/METRICS.md
@@ -0,0 +1,97 @@
# Metrics

## Introduction

In the context of ngx_wasm_module, in accordance with Proxy-Wasm, a metric is
either a counter, a gauge or a histogram.

A counter is an unsigned 64-bit int that can only be incremented.
A gauge is an unsigned 64-bit int that can take arbitrary values.

## Histograms

A histogram represents the frequency distribution of a variable across ranges
and can be defined as a set of (range, counter) pairs. For example, the
distribution of HTTP request response times can be represented as a histogram
with ranges `[0, 1]`, `(1, 2]`, `(2, 4]` and `(4, Inf]`. The 1st range's counter
would be the number of requests with a response time less than or equal to 1ms;
the 2nd range's counter, requests with a response time between 1ms and 2ms; the
3rd range's counter, requests with a response time between 2ms and 4ms; and the
last range's counter, requests with a response time greater than 4ms.

### Binning

The above example demonstrates a histogram with ranges, or bins, whose upper
bound grows in powers of 2, i.e. 2^0, 2^1 and 2^2. This is usually called
logarithmic binning and is indeed how histogram bins are represented in
ngx_wasm_module. This binning strategy implies that when a value `v` is
recorded, it is matched with the smallest power of two that's bigger than `v`;
this value is the upper bound of the bin associated with `v`. If the histogram
contains, or can contain, such a bin, its counter is incremented; if not, the
bin with the next smallest upper bound bigger than `v` has its counter
incremented.
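
As an illustration of the logarithmic binning described above, the following
is a minimal Python sketch, not the module's C implementation. The function
name `bin_upper_bound` is hypothetical, and the sketch assumes inclusive upper
bounds, as in the example ranges `[0, 1]`, `(1, 2]` and `(2, 4]`.

```python
def bin_upper_bound(v: int) -> int:
    """Return the power-of-two upper bound of the bin that would hold `v`.

    Assumption: upper bounds are inclusive, matching the example ranges
    [0, 1], (1, 2], (2, 4], so v=2 maps to 2 and v=3 maps to 4.
    """
    if v < 0:
        raise ValueError("histogram values are assumed to be non-negative")
    if v <= 1:
        return 1
    # (v - 1).bit_length() is the exponent of the smallest power of two >= v.
    return 1 << (v - 1).bit_length()


for value in (0, 1, 2, 3, 4, 5, 1000):
    print(value, "->", bin_upper_bound(value))
```

Whether a value that is itself a power of two lands in its own bin or in the
next one depends on how the bounds are compared; the sketch follows the
inclusive ranges of the example.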

### Update and expansion

Histograms are created with 5 bins, 1 initialized and 4 uninitialized. If a
value `v` is recorded and its bin isn't part of the initialized bins, one of the
uninitialized bins is initialized with the upper bound associated with `v` and
its counter is incremented. If the histogram is out of uninitialized bins, it
can be expanded, up to 18 bins, to accommodate the additional bin for `v`. The
bin initialized upon histogram creation has upper bound 2^32 and its counter is
incremented if it's the only bin whose upper bound is bigger than the recorded
value.
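
The update and expansion rules above can be sketched in Python as follows.
This is a simplified model under stated assumptions, not the module's code:
the `Histogram` class and its fields are hypothetical, the histogram is
assumed to grow one bin at a time, upper bounds are assumed inclusive, and
values above 2^32 are assumed to fall into the 2^32 bin.

```python
INITIAL_BINS = 5     # bins allocated at creation (1 initialized, 4 not)
MAX_BINS = 18        # a histogram is never expanded beyond this
CATCH_ALL = 2 ** 32  # upper bound of the bin initialized at creation


def bin_upper_bound(v: int) -> int:
    """Smallest power of two >= v (inclusive upper bounds, an assumption)."""
    return 1 if v <= 1 else 1 << (v - 1).bit_length()


class Histogram:
    def __init__(self) -> None:
        self.capacity = INITIAL_BINS
        self.bins = {CATCH_ALL: 0}  # upper bound -> counter

    def record(self, v: int) -> int:
        """Record `v` and return the upper bound of the bin incremented."""
        bound = min(bin_upper_bound(v), CATCH_ALL)
        if bound not in self.bins:
            if len(self.bins) < MAX_BINS:
                if len(self.bins) == self.capacity:
                    self.capacity += 1  # assumed growth step: one bin
                self.bins[bound] = 0    # initialize a new bin
            else:
                # Histogram is full: fall back to the existing bin with the
                # next smallest upper bound that still covers `v`.
                bound = min(b for b in self.bins if b >= bound)
        self.bins[bound] += 1
        return bound


h = Histogram()
for v in (1, 2, 3, 4, 5):
    h.record(v)
print(sorted(h.bins.items()))
```

Because the 2^32 bin always exists, the fallback lookup in a full histogram
always finds a bin to increment.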

## Memory consumption

The space in memory occupied by a metric contains its name, value and the
underlying structure representing them in the key-value store. While the
key-value structure has a fixed size of 96 bytes, the sizes of name and value
vary.

The size in memory of the value of a counter or gauge is 8 bytes plus 16 bytes
per worker process. The value size grows according to the number of workers
because the metric value is segmented across them. Each worker has its own segment
of the value to write updates to. When a metric is retrieved, the segments are
consolidated and returned as a single metric. This storage strategy allows
metric updates to be performed without the aid of locks at the cost of 16 bytes
per worker.

Histograms' values also have a baseline size of 8 bytes plus 16 bytes per
worker. However, histograms need extra space per worker for bin storage, which
costs 4 bytes plus 8 bytes per bin. So a 5-bin histogram takes 8 bytes plus
(16 + 4 + 5*8) = 60 bytes per worker.

As such, in a 4-worker setup, a counter or gauge whose name is 64 chars long
takes 232 bytes (96 + 64 + 8 + 4*16), a 5-bin histogram with the same name takes
408 bytes and an 18-bin histogram with the same name takes 824 bytes.
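
The sizing rules in this section can be turned into a small estimator. The
Python sketch below simply adds up the sizes stated above; the function name
`metric_size` and the constant names are hypothetical and do not come from the
module's source.

```python
KV_NODE_SIZE = 96  # fixed size of the key-value structure, in bytes
VALUE_BASE = 8     # baseline size of a metric value
PER_WORKER = 16    # value segment owned by each worker
BINS_HEADER = 4    # histogram bin storage overhead, per worker
PER_BIN = 8        # size of one histogram bin, per worker


def metric_size(name_len: int, workers: int, bins: int = 0) -> int:
    """Estimate the bytes taken by one metric; bins=0 means counter or gauge."""
    per_worker = PER_WORKER
    if bins:
        per_worker += BINS_HEADER + bins * PER_BIN
    return KV_NODE_SIZE + name_len + VALUE_BASE + workers * per_worker


print(metric_size(64, 4, bins=5))   # 5-bin histogram: 408
print(metric_size(64, 4, bins=18))  # 18-bin histogram: 824
print(metric_size(64, 4))           # counter or gauge
```

The estimate is the size requested from shared memory, before any rounding
applied by the allocator.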

### Shared memory allocation

Nginx employs an allocation model for shared memory that enforces allocation
size to be a power of 2 and greater than 8; nonconforming values are rounded up,
see [Nginx shared memory].

This means that an allocation of 168 bytes, for instance, ends up taking 256
bytes from the shared memory. This should be taken into account when estimating
the space required for a group of metrics.
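
A rough way to account for this rounding when estimating space is sketched
below in Python. The helper name `slab_alloc_size` is hypothetical; the sketch
assumes a minimum allocation of 8 bytes and ignores the special handling Nginx
applies to allocations close to or larger than a page.

```python
def slab_alloc_size(requested: int, min_size: int = 8) -> int:
    """Round a requested size up to the next power of two (at least min_size)."""
    if requested <= 0:
        raise ValueError("requested size must be positive")
    size = max(requested, min_size)
    # (size - 1).bit_length() is the exponent of the smallest power of two >= size.
    return 1 << (size - 1).bit_length()


for requested in (408, 824):
    print(requested, "->", slab_alloc_size(requested))
```

Under these assumptions, the 5-bin and 18-bin histograms from the previous
section would take 512 and 1024 bytes of shared memory respectively.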

### Prefixing

The name of a metric is always prefixed with `pw.{filter_name}.` to avoid naming
conflicts between Proxy-Wasm filters. This means that a metric named `a_counter`
by the filter `a_filter` ends up named as `pw.a_filter.a_counter`.
The maximum length of a metric name, configured via `max_metric_name_length`,
is enforced on the prefixed name and might need to be increased in some cases.
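
The effect of prefixing on the name length limit can be illustrated with a
short Python sketch. The function `prefixed_metric_name` is hypothetical, and
the sketch assumes the limit is inclusive and counted in characters; the
module's exact comparison may differ.

```python
def prefixed_metric_name(filter_name: str, metric_name: str,
                         max_len: int = 256) -> str:
    """Build the prefixed name and check it against the configured maximum."""
    full_name = f"pw.{filter_name}.{metric_name}"
    if len(full_name) > max_len:
        raise ValueError(
            f"prefixed name is {len(full_name)} chars long, limit is {max_len}"
        )
    return full_name


print(prefixed_metric_name("a_filter", "a_counter"))
```

The prefix adds `len(filter_name) + 4` characters, so a metric name that fits
the limit on its own may still be rejected once prefixed.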

## Nginx Reconfiguration

If Nginx is reconfigured with a different number of workers or a different size
for the metrics shared memory zone, existing metrics need to be reallocated into
a brand new shared memory zone. This is due to the metric values being segmented
across workers.

As such, it's important to ensure the new size of the metrics' shared memory zone
is large enough to accommodate existing metrics and that the value of
`max_metric_name_length` isn't less than the length of any existing metric name.

[Nginx shared memory]: https://nginx.org/en/docs/dev/development_guide.html#shared_memory
8 changes: 4 additions & 4 deletions docs/PROXY_WASM.md
@@ -536,10 +536,10 @@ SDK ABI `0.2.1`) and their present status in ngx_wasm_module:
`proxy_enqueue_shared_queue` | :heavy_check_mark: | No automatic eviction mechanism if the queue is full.
`proxy_resolve_shared_queue` | :x: |
*Stats/metrics* | |
`proxy_define_metric` | :x: |
`proxy_get_metric` | :x: |
`proxy_record_metric` | :x: |
`proxy_increment_metric` | :x: |
`proxy_define_metric` | :heavy_check_mark: |
`proxy_get_metric` | :heavy_check_mark: |
`proxy_record_metric` | :heavy_check_mark: |
`proxy_increment_metric` | :heavy_check_mark: |
*Custom extension points* | |
`proxy_call_foreign_function` | :x: |
