feat(proxy-wasm) implement initial metrics facilities
This commit adds support for storing metrics in the WasmX shared memory
key-value store facility.

The workflow users are expected to perform follows the Proxy-Wasm
metrics ABI itself: users define metrics before using them. When a
metric is defined, a numeric ID is returned which can later be used for
reading or updating its respective metric. If the system is out of
metrics memory when defining a new metric, the metric definition fails
as eviction support has not been implemented.

The implemented design, described in [1], allows users to perform most
metric updates without synchronizing Nginx workers, i.e. without the
cost of shared memory locks.

Users can refer to [2] for a description of how metrics are represented
in memory and how to estimate the size of the shared memory used for
metrics storage.

Two configuration directives, `slab_size` and `max_metric_name_length`,
are added to configure the size of the shared memory zone dedicated to
metrics and the maximum length of a metric name, respectively.

[1] docs/adr/005-metrics.md
[2] docs/METRICS.md

Signed-off-by: Thibault Charbonnier <[email protected]>
casimiro authored and thibaultcha committed Jun 19, 2024
1 parent 7cc7af8 commit e62940e
Showing 46 changed files with 3,937 additions and 91 deletions.
9 changes: 7 additions & 2 deletions config
@@ -129,6 +129,7 @@ NGX_WASMX_INCS="\
 $ngx_addon_dir/src/common \
 $ngx_addon_dir/src/common/proxy_wasm \
 $ngx_addon_dir/src/common/shm \
+$ngx_addon_dir/src/common/metrics \
 $ngx_addon_dir/src/common/lua"

 NGX_WASMX_DEPS="\
@@ -141,7 +142,9 @@ NGX_WASMX_DEPS="\
 $ngx_addon_dir/src/common/proxy_wasm/ngx_proxy_wasm_properties.h \
 $ngx_addon_dir/src/common/shm/ngx_wasm_shm.h \
 $ngx_addon_dir/src/common/shm/ngx_wasm_shm_kv.h \
-$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.h"
+$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.h \
+$ngx_addon_dir/src/common/metrics/ngx_wa_metrics.h \
+$ngx_addon_dir/src/common/metrics/ngx_wa_histogram.h"

 NGX_WASMX_SRCS="\
 $ngx_addon_dir/src/ngx_wasmx.c \
@@ -155,7 +158,9 @@ NGX_WASMX_SRCS="\
 $ngx_addon_dir/src/common/proxy_wasm/ngx_proxy_wasm_util.c \
 $ngx_addon_dir/src/common/shm/ngx_wasm_shm.c \
 $ngx_addon_dir/src/common/shm/ngx_wasm_shm_kv.c \
-$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.c"
+$ngx_addon_dir/src/common/shm/ngx_wasm_shm_queue.c \
+$ngx_addon_dir/src/common/metrics/ngx_wa_metrics.c \
+$ngx_addon_dir/src/common/metrics/ngx_wa_histogram.c"

 # wasm

57 changes: 56 additions & 1 deletion docs/DIRECTIVES.md
@@ -6,6 +6,7 @@ By alphabetical order:
- [cache_config](#cache-config)
- [compiler](#compiler)
- [flag](#flag)
- [max_metric_name_length](#max_metric_name_length)
- [module](#module)
- [proxy_wasm](#proxy_wasm)
- [proxy_wasm_isolation](#proxy_wasm_isolation)
@@ -16,6 +17,7 @@ By alphabetical order:
- [resolver_timeout](#resolver_timeout)
- [shm_kv](#shm_kv)
- [shm_queue](#shm_queue)
- [slab_size](#slab_size)
- [socket_buffer_size](#socket_buffer_size)
- [socket_buffer_reuse](#socket_buffer_reuse)
- [socket_connect_timeout](#socket_connect_timeout)
@@ -57,6 +59,9 @@ By context:
- [tls_trusted_certificate](#tls_trusted_certificate)
- [tls_verify_cert](#tls_verify_cert)
- [tls_verify_host](#tls_verify_host)
- `metrics{}`
- [max_metric_name_length](#max_metric_name_length)
- [slab_size](#slab_size)
- `wasmtime{}`
- [cache_config](#cache-config)
- [flag](#flag)
@@ -205,6 +210,29 @@ wasm {

[Back to TOC](#directives)

max_metric_name_length
----------------------

**usage** | `max_metric_name_length <length>;`
------------:|:----------------------------------------------------------------
**contexts** | `metrics{}`
**default** | `256`
**example** | `max_metric_name_length 512;`

Set the maximum allowed length of a metric name.

The configured value cannot be lower than `6` due to internal metrics storage in
memory.

> Notes
Configuring this value allows for predictable memory usage when configuring the
metrics [slab_size](#slab_size).

See [Metrics] for a complete description of how metrics are stored in memory.

[Back to TOC](#directives)

module
------

@@ -525,6 +553,32 @@ policy, and writes will fail when the allocated memory slab is full.

[Back to TOC](#directives)

slab_size
---------

**usage** | `slab_size <size>;`
------------:|:----------------------------------------------------------------
**contexts** | `metrics{}`
**default** | `5m`
**example** | `slab_size 12m;`

Set the `size` of the shared memory slab dedicated to metrics storage. The value
must be at least 3 * pagesize, e.g. `12k` with the 4 KiB pages typical on Linux.

> Notes
The space in memory occupied by a metric depends on its name length, type and
the number of worker processes running. As an example, if all metric names are
64 chars long and 4 workers are running, `5m` can accommodate 20k counters, 20k
gauges, or up to 16k histograms.

See the [max_metric_name_length](#max_metric_name_length) directive to configure
the maximum allowed length of metrics names.

See [Metrics] for a complete description of how metrics are stored in memory.

[Back to TOC](#directives)

socket_buffer_reuse
-------------------

@@ -939,7 +993,8 @@ the `http{}` contexts.

 [Contexts]: USER.md#contexts
 [Execution Chain]: USER.md#execution-chain
-[SLRU eviction algorithm]: SLRU.md
+[Metrics]: METRICS.md
 [OpenResty]: https://openresty.org/en/
 [resolver]: https://nginx.org/en/docs/http/ngx_http_core_module.html#resolver
 [resolver_timeout]: https://nginx.org/en/docs/http/ngx_http_core_module.html#resolver_timeout
+[SLRU eviction algorithm]: SLRU.md
139 changes: 139 additions & 0 deletions docs/METRICS.md
@@ -0,0 +1,139 @@
# Metrics

This document elaborates on the types of metrics available in ngx_wasm_module,
how they are stored in memory, and how to estimate the amount of [slab_size]
memory necessary for your use-case.

## Table of Contents

- [Types of Metrics](#types-of-metrics)
- [Name Prefixing](#name-prefixing)
- [Histogram Binning Strategy](#histogram-binning-strategy)
- [Histogram Update and Expansion](#histogram-update-and-expansion)
- [Memory Consumption](#memory-consumption)
- [Shared Memory Allocation](#shared-memory-allocation)
- [Nginx Reconfiguration](#nginx-reconfiguration)

## Types of Metrics

In accordance with Proxy-Wasm specifications, a "metric" is either a counter, a
gauge, or a histogram.

- A counter is an unsigned 64-bit int that can only be incremented.
- A gauge is an unsigned 64-bit int that can take arbitrary values.
- A histogram represents range frequencies of a variable and can be defined as a
set of pairs of ranges and counters.
For example, the distribution of HTTP request response times can be
represented as a histogram with ranges `[0, 1]`, `(1, 2]`, `(2, 4]`, and `(4,
Inf)`. The 1st range counter is the number of requests with a response time
less than or equal to 1ms; the 2nd range counter counts requests with a
response time greater than 1ms and up to 2ms; the 3rd range counter counts
requests with a response time greater than 2ms and up to 4ms; and the last
range counter counts requests with a response time greater than 4ms.
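
As a rough illustration of the histogram example above, the sketch below counts sample response times into those four ranges. The function name `count_by_range` and the sample values are hypothetical and are not part of ngx_wasm_module.

```python
import bisect

# Inclusive upper bounds of the first three ranges from the example;
# anything greater than 4 falls into the last, open-ended range.
UPPER_BOUNDS = [1, 2, 4]


def count_by_range(response_times_ms):
    """Return one counter per range of the example, in order."""
    counters = [0] * (len(UPPER_BOUNDS) + 1)
    for t in response_times_ms:
        # bisect_left returns the index of the first bound that is >= t.
        counters[bisect.bisect_left(UPPER_BOUNDS, t)] += 1
    return counters


print(count_by_range([0.5, 1, 1.5, 2, 3, 4, 10]))  # [2, 2, 2, 1]
```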

[Back to TOC](#table-of-contents)

## Name Prefixing

To avoid naming conflicts between Proxy-Wasm filters, the name of a metric is
always prefixed with: `pw.{filter_name}.{metric_name}`. This means that a metric
named `a_counter` inserted by `a_filter` will have its name stored as:
`pw.a_filter.a_counter`.

Thus, the maximum length of a metric name configured via
[max_metric_name_length] is enforced on the prefixed name and may need to be
increased in some cases.
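
To make the effect of prefixing concrete, here is a minimal sketch that builds the stored name and checks it against a length limit. The helper names are hypothetical, and treating the limit as inclusive is an assumption; only the `pw.{filter_name}.{metric_name}` format and the default of `256` come from the documentation.

```python
# Documented default of max_metric_name_length.
DEFAULT_MAX_METRIC_NAME_LENGTH = 256


def prefixed_name(filter_name, metric_name):
    """Build the stored name: pw.{filter_name}.{metric_name}."""
    return f"pw.{filter_name}.{metric_name}"


def fits(filter_name, metric_name, max_length=DEFAULT_MAX_METRIC_NAME_LENGTH):
    """Check the limit against the prefixed name, not the bare one.

    Assumption: a name exactly max_length chars long is accepted.
    """
    return len(prefixed_name(filter_name, metric_name)) <= max_length
```

With these helpers, a 250-char metric name defined by `a_filter` is stored as a 262-char name and no longer fits the default limit, even though the bare name does.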

[Back to TOC](#table-of-contents)

## Histogram Binning Strategy

The above example demonstrates a histogram with ranges (or bins) whose
upper-bounds grow in powers of 2, i.e. `2^0`, `2^1`, and `2^2`. This is usually
called "logarithmic binning" and is how histogram bins are represented in
ngx_wasm_module.

This binning strategy implies that when a value `v` is recorded, it is matched
with the smallest power of two that is greater than or equal to `v`. This value
is the *upper-bound* of the bin associated with `v`. If the histogram contains
or can contain such a bin, that bin's counter is incremented. If not, the
counter of the existing bin with the smallest upper-bound greater than `v` is
incremented instead.
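
The mapping from a value to its bin upper-bound can be sketched as follows. This is an illustration only, not the module's code: `bin_upper_bound` is a hypothetical name, values are assumed to be non-negative integers, and upper-bounds are treated as inclusive, as the `(1, 2]` range notation suggests.

```python
def bin_upper_bound(v):
    """Return the smallest power of two that is >= v (1 for v <= 1).

    Assumption: upper-bounds are inclusive, so v = 2 maps to 2, not 4.
    """
    if v <= 1:
        return 1
    # For v >= 2, (v - 1).bit_length() is the exponent of the next
    # power of two at or above v.
    return 1 << (v - 1).bit_length()
```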

[Back to TOC](#table-of-contents)

## Histogram Update and Expansion

Histograms are created with 5 bins: 1 initialized and 4 uninitialized.

The bin initialized upon histogram creation has upper-bound `2^32` and its
counter is incremented if it is the only bin whose upper-bound is bigger than
the recorded value.

If a value `v` is recorded and its bin is not part of the initialized bins, a
new bin with the upper-bound associated with `v` is initialized, and its counter
is incremented.

If the histogram is out of uninitialized bins, it can be expanded up to 18
bins so as to accommodate the additional bins for other ranges of `v`.
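
The update and expansion rules can be modeled with a toy histogram. This is a simplified sketch under stated assumptions, not the actual implementation: `ToyHistogram` is a hypothetical name, upper-bounds are assumed inclusive, and the model tracks only how many bins are initialized. It does not represent the initial 5-bin allocation or how storage grows between 5 and 18 bins.

```python
MAX_BINS = 18        # documented expansion limit
CATCH_ALL = 2 ** 32  # upper-bound of the bin initialized at creation


class ToyHistogram:
    def __init__(self):
        # Maps upper-bound -> counter; only the catch-all bin exists at first.
        self.bins = {CATCH_ALL: 0}

    def record(self, v):
        # Assumed inclusive power-of-two upper-bound, capped at the catch-all.
        ub = 1 if v <= 1 else 1 << (v - 1).bit_length()
        ub = min(ub, CATCH_ALL)
        if ub not in self.bins:
            if len(self.bins) < MAX_BINS:
                self.bins[ub] = 0  # initialize a new bin for this range
            else:
                # No room left: use the next larger bin that already exists.
                ub = min(b for b in self.bins if b >= ub)
        self.bins[ub] += 1
```

Recording `1`, `3`, `3`, and `100` initializes bins with upper-bounds `1`, `4`, and `128`. Once 18 bins exist, values for new ranges are counted in the closest larger bin, which is the `2^32` bin in the worst case.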

[Back to TOC](#table-of-contents)

## Memory Consumption

The space occupied by a metric in memory contains:

1. Its name.
2. Its value.
3. And the underlying structure representing the metric in the shared key-value
store memory (see [slab_size]).

While the key-value structure has a fixed size of **96 bytes**, the sizes of
name and value vary.

In memory, the value of a counter or gauge occupies 8 bytes + 16 bytes per
worker process. The value size grows according to the number of workers because
metric values are segmented across them: Each worker has its own segment of the
value to write updates to. When a metric is retrieved, the segments are
consolidated and returned as a single metric value. This storage strategy allows
metric updates to be performed without the aid of shared memory read/write locks
at the cost of 16 bytes per worker.
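
The idea of per-worker segments can be illustrated with a toy counter. This is a conceptual sketch only: `ShardedCounter` is a hypothetical name, and consolidating a counter by summing its segments is an assumption made for this illustration.

```python
class ShardedCounter:
    """Toy counter with one slot per worker (hypothetical model)."""

    def __init__(self, n_workers):
        self.slots = [0] * n_workers

    def increment(self, worker_id, offset=1):
        # Each worker writes only to its own slot, so writers never
        # touch the same memory and need no lock.
        self.slots[worker_id] += offset

    def get(self):
        # Reading consolidates all slots into a single value
        # (assumed here to be a plain sum).
        return sum(self.slots)
```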

Histogram values also have a baseline size of 8 bytes + 16 bytes per worker
process. However, histograms also need extra space per worker for bins storage.
Bins storage costs 4 bytes + 8 bytes per bin. Thus, a 5-bin histogram value
takes 8 bytes, plus (16 + 4 + 5 * 8) = 60 bytes per worker.

As such, in a 4-worker setup, a counter or gauge whose name is 64 chars long
occupies 232 bytes (96 + 64 + 8 + 4 * 16), and a 5-bin histogram with the same
name length occupies 408 bytes (96 + 64 + 8 + 4 * 60). An 18-bin histogram with
the same name length occupies 824 bytes.
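
The arithmetic in this section can be reproduced with a small estimator. This sketch simply adds up the component sizes listed above; the function and constant names are hypothetical, and `name_len` is assumed to be the length of the prefixed name as stored.

```python
KV_NODE_SIZE = 96     # fixed key-value structure size, in bytes
BASE_VALUE_SIZE = 8   # per-metric baseline
PER_WORKER_SIZE = 16  # per-worker value segment
BINS_HEADER_SIZE = 4  # per-worker bins storage overhead (histograms only)
BIN_SIZE = 8          # per bin, per worker (histograms only)


def metric_size(name_len, workers, bins=0):
    """Estimate the bytes used by one metric; bins=0 means counter or gauge."""
    per_worker = PER_WORKER_SIZE
    if bins:
        per_worker += BINS_HEADER_SIZE + bins * BIN_SIZE
    return KV_NODE_SIZE + name_len + BASE_VALUE_SIZE + workers * per_worker


print(metric_size(64, 4))           # 232
print(metric_size(64, 4, bins=5))   # 408
print(metric_size(64, 4, bins=18))  # 824
```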

[Back to TOC](#table-of-contents)

## Shared Memory Allocation

Nginx employs a shared memory allocation model that enforces allocation size to
be a power of 2 greater than 8; nonconforming values are rounded up, see [Nginx
shared memory].

For instance, this means that an allocation of 232 bytes ends up occupying 256
bytes of shared memory. This should be taken into consideration when estimating
the total space required for a group of metrics.
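
A minimal sketch of this rounding, useful when estimating totals, is shown below. It is a simplification: the function names are hypothetical, the 8-byte floor is an assumption, and the sketch ignores allocator bookkeeping overhead and any special handling of large, page-sized allocations.

```python
def slab_alloc_size(size, min_size=8):
    """Round size up to the next power of two (assumed 8-byte floor)."""
    if size <= min_size:
        return min_size
    return 1 << (size - 1).bit_length()


def estimate_total(sizes):
    """Sum the rounded-up sizes of a group of allocations."""
    return sum(slab_alloc_size(s) for s in sizes)


print(slab_alloc_size(168))             # 256
print(slab_alloc_size(408))             # 512
print(estimate_total([168, 408, 824]))  # 1792
```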

[Back to TOC](#table-of-contents)

## Nginx Reconfiguration

If Nginx is reconfigured with a different number of workers or a different
[slab_size] value, existing metrics need to be reallocated into a new
shared memory zone at reconfiguration time. This is due to the metric values
being segmented across workers.

As such, it is important to make sure that the new [slab_size] value is large
enough to accommodate existing metrics, and that the value of
[max_metric_name_length] is not less than any existing metric name.

[Back to TOC](#table-of-contents)

[Nginx shared memory]: https://nginx.org/en/docs/dev/development_guide.html#shared_memory
[slab_size]: DIRECTIVES.md#slab_size
[max_metric_name_length]: DIRECTIVES.md#max_metric_name_length
8 changes: 4 additions & 4 deletions docs/PROXY_WASM.md
@@ -536,10 +536,10 @@ SDK ABI `0.2.1`) and their present status in ngx_wasm_module:
 `proxy_enqueue_shared_queue` | :heavy_check_mark: | No automatic eviction mechanism if the queue is full.
 `proxy_resolve_shared_queue` | :x: |
 *Stats/metrics* | |
-`proxy_define_metric` | :x: |
-`proxy_get_metric` | :x: |
-`proxy_record_metric` | :x: |
-`proxy_increment_metric` | :x: |
+`proxy_define_metric` | :heavy_check_mark: |
+`proxy_get_metric` | :heavy_check_mark: |
+`proxy_record_metric` | :heavy_check_mark: |
+`proxy_increment_metric` | :heavy_check_mark: |
 *Custom extension points* | |
 `proxy_call_foreign_function` | :x: |
