Releases: apollographql/router
v1.48.1
🐛 Fixes
Improve error message produced when a subgraph response doesn't include an expected content-type header value (Issue #5359)
To improve a common debuggability challenge when a subgraph response doesn't contain an expected content-type header value, the error message produced will include additional details about the error.
Some examples of the improved error message:
- HTTP fetch failed from 'test': subgraph response contains invalid 'content-type' header value "application/json,application/json"; expected content-type: application/json or content-type: application/graphql-response+json
- HTTP fetch failed from 'test': subgraph response does not contain 'content-type' header; expected content-type: application/json or content-type: application/graphql-response+json
By @IvanGoncharov in #5223
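As a rough illustration of the check behind these messages, the Python sketch below validates a subgraph response's content-type and reproduces the two messages above. It is not the router's implementation (the router is written in Rust). The function name `check_content_type`, the plain-dict header representation with lowercase keys, and the tolerance for parameters such as `charset` are assumptions made for the example.

```python
# Media types the error messages above list as acceptable.
EXPECTED = ("application/json", "application/graphql-response+json")


def check_content_type(service, headers):
    """Return None if the content-type is acceptable, else an error message.

    `headers` is assumed to be a dict with lowercase header names.
    """
    expected = " or ".join(f"content-type: {media}" for media in EXPECTED)
    value = headers.get("content-type")
    if value is None:
        return (
            f"HTTP fetch failed from '{service}': subgraph response does not "
            f"contain 'content-type' header; expected {expected}"
        )
    # Assumption: parameters after ';' (for example charset) are ignored.
    media_type = value.split(";", 1)[0].strip().lower()
    if media_type not in EXPECTED:
        return (
            f"HTTP fetch failed from '{service}': subgraph response contains "
            f"invalid 'content-type' header value \"{value}\"; expected {expected}"
        )
    return None
```

A duplicated header value such as "application/json,application/json" fails the membership test, which is why it is reported as invalid rather than missing.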
Update apollo-compiler for two small improvements (PR #5347)
Updated our underlying apollo-rs dependency on our apollo-compiler crate to bring in two nice improvements:
- Fix validation performance bug
  Adds a cache in fragment spread validation, fixing a situation where validating a query with many fragment spreads against a schema with many interfaces could take multiple seconds to validate.
- Remove ariadne byte/char mapping
  Generating JSON or CLI reports for apollo-compiler diagnostics used a translation layer between byte offsets and character offsets, which cost some computation and memory proportional to the size of the source text. The latest version of ariadne allows us to remove this translation.
By @goto-bus-stop in #5347
📃 Configuration
Rename the telemetry selector which obtains the GraphOS operation id (PR #5337)
Renames the misnamed trace_id selector introduced in v1.48.0 to reflect what it actually represents: an Apollo GraphOS operation ID, not a trace ID. Apologies for the confusion! Unfortunately, we aren't able to produce an Apollo GraphOS trace ID at this time.
If you want to access this operation ID selector, here is an example of how to apply it to your tracing spans:
telemetry:
  instrumentation:
    spans:
      router:
        "studio.operation.id":
          studio_operation_id: true
This can be useful for more easily locating the operation in GraphOS' Insights feature and finding applicable traces in Studio.
v1.48.1-rc.0
1.48.1-rc.0
v1.48.0
Important
If you have enabled Distributed query plan caching, this release has a Federation version bump, which will result in ordinary/expected changes to the hashing algorithm used for the distributed cache keys. On account of this, you should anticipate additional cache regeneration cost when updating to this version while the new hashing algorithm comes into service.
🚀 Features
Demand control preview (PR #5317)
⚠️ This is a preview for an Enterprise feature of the Apollo Router. It requires an organization with a GraphOS Enterprise plan. If your organization doesn't currently have an Enterprise plan, you can test out this functionality with a free Enterprise trial. As a preview feature, it's subject to our Preview launch stage expectations, and its configuration and performance may change in future releases.
Demand control allows you to control the cost of operations in the router, potentially rejecting requests that are too expensive and could bring down the Router or subgraphs.
# Demand control enabled, but in measure mode.
preview_demand_control:
  enabled: true
  # `measure` or `enforce` mode. Measure mode will analyze cost of operations but not reject them.
  mode: measure
  strategy:
    # Static estimated strategy has a fixed cost for elements and when set to enforce will reject
    # requests that are estimated as too high before any execution takes place.
    static_estimated:
      # The assumed returned list size for operations. This should be set to the maximum number of items in graphql list
      list_size: 10
      # The maximum cost of a single operation.
      max: 1000
Telemetry is emitted for demand control, including the estimated cost of operations and whether they were rejected or not.
Full details will be included in the documentation for demand control which will be finalized before the next release.
By @BrynCooke in #5317
Ability to include Apollo Studio trace ID on tracing spans (Issue #3803), (Issue #5172)
Add support for a new trace ID selector kind, the apollo trace ID, which represents the trace ID on Apollo GraphOS Studio.
An example configuration using trace_id: apollo:
telemetry:
  instrumentation:
    spans:
      router:
        "studio.trace.id":
          trace_id: apollo
Add ability for router to deal with query plans with contextual rewrites (PR #5097)
Adds the ability for the router to execute query plans with context rewrites. A context is generated by the @fromContext directive, and each context maps values in the collected data JSON onto a variable that's used as an argument to a field resolver. To learn more, see Saving and referencing data with contexts.
🐛 Fixes
Fix custom attributes for spans and histogram when used with response_event (PR #5221)
This release fixes multiple issues related to spans and selectors:
- Custom attributes based on response_event in spans are properly added.
- Histograms using response_event selectors are properly updated.
- Static selectors that set a static value are now able to take a Value.
- Static selectors that set a static value are now set at every stage.
- The on_graphql_error selector is available on the supergraph stage.
- The status of a span can be overridden with the otel.status_code attribute.
As an example of using these fixes, the configuration below uses spans with static selectors to mark spans as errors when GraphQL errors occur:
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          otel.status_code:
            static: error
            condition:
              eq:
                - true
                - on_graphql_error: true
      supergraph:
        attributes:
          otel.status_code:
            static: error
            condition:
              eq:
                - true
                - on_graphql_error: true
Fix instrument incrementing on aborted request when condition is not fulfilled (PR #5215)
Previously when a telemetry instrument was dropped it would be incremented even if the associated condition was not fulfilled. For instance:
telemetry:
  instrumentation:
    instruments:
      router:
        http.server.active_requests: false
        http.server.request.duration: false
        "custom_counter":
          description: "count of requests"
          type: counter
          unit: "unit"
          value: unit
          # This instrument should not be triggered as the condition is never true
          condition:
            eq:
              - response_header: "never-received"
              - static: "true"
In the case where a request was started, but the client aborted the request before the response was sent, the response_header would never be set to "never-received", and the instrument would not be triggered. However, the instrument would still be incremented.
Conditions are now checked for aborted requests, and the instrument is only incremented if the condition is fulfilled.
By @BrynCooke in #5215
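The corrected behavior can be pictured with a small Python sketch: a counter that consults its condition when a request ends, including the case where no response was ever produced. This is only an illustration of the described semantics, not the router's code. The class `GuardedCounter`, its method name, and the use of `None` to signal an aborted request are all hypothetical.

```python
class GuardedCounter:
    """Counter that increments only when its condition is fulfilled."""

    def __init__(self, condition):
        # `condition` is a callable taking a dict of response headers.
        self.condition = condition
        self.value = 0

    def on_request_finished(self, response_headers):
        # Assumption: `None` means the client aborted before any response
        # was sent, so there are no response headers to inspect.
        headers = response_headers or {}
        if self.condition(headers):
            self.value += 1


def never_received_is_true(headers):
    # Mirrors the example condition: response header "never-received" == "true".
    return headers.get("never-received") == "true"


counter = GuardedCounter(never_received_is_true)
counter.on_request_finished(None)  # aborted request: condition not fulfilled
counter.on_request_finished({"never-received": "true"})  # condition fulfilled
```

After both calls, only the second request has been counted.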
🛠 Maintenance
Send query planner and lifecycle metrics to Apollo (PR #5267, PR #5270)
To enable the performance measurement of the router's new query planner implementation, the router transmits to Apollo the following new metrics:
- apollo.router.query_planning.* provides metrics on the query planner that help improve the query planning implementation.
- apollo.router.lifecycle.api_schema provides feedback on the experimental Rust-based API schema generation.
- apollo.router.lifecycle.license provides metrics on license expiration that help improve the reliability of the license check mechanism.
These metrics don't leak any sensitive data.
By @BrynCooke in #5267, @goto-bus-stop
📚 Documentation
Add Rhai API constants reference
The Rhai API documentation now includes a list of the constants that are available in the Rhai runtime.
🧪 Experimental
GraphQL instruments (PR #5215, PR #5257)
This PR adds experimental GraphQL instruments to telemetry.
The new instruments are configured as follows:
telemetry:
  instrumentation:
    instruments:
      graphql:
        # The number of times a field was executed (counter)
        field.execution: true
        # The length of list fields (histogram)
        list.length: true
        # Custom counter of field execution where field name = name
        "custom_counter":
          description: "count of name field"
          type: counter
          unit: "unit"
          value: field_unit
          attributes:
            graphql.type.name: true
            graphql.field.type: true
            graphql.field.name: true
          condition:
            eq:
              - field_name: string
              - "name"
        # Custom histogram of list lengths for topProducts
        "custom_histogram":
          description: "histogram of review length"
          type: histogram
          unit: "unit"
          attributes:
            graphql.type.name: true
            graphql.field.type: true
            graphql.field.name: true
          value:
            field_custom:
              list_length: value
          condition:
            eq:
              - field_name: string
              - "topProducts"
Using the new instruments has a significant performance cost for the router. Their performance will be improved in a future release.
Large numbers of metrics may also be generated by using the instruments, so make sure to not incur excessively large APM costs.
⚠ Use these instruments only in development. Don't use them in production.
By @BrynCooke in #5215 and #5257
v1.48.0-rc.0
1.48.0-rc.0
v1.47.0
🚀 Features
Support telemetry selectors with errors (Issue #5027)
The router now supports telemetry selectors that take into account the occurrence of errors. This capability enables you to create metrics, events, or span attributes that contain error messages.
For example, you can create a counter for the number of timed-out requests for subgraphs:
telemetry:
  instrumentation:
    instruments:
      subgraph:
        requests.timeout:
          value: unit
          type: counter
          unit: request
          description: "subgraph requests containing subgraph timeout"
          attributes:
            subgraph.name: true
          condition:
            eq:
              - "request timed out"
              - error: reason
The router can also now compute new attributes upon receiving a new event in a supergraph response. With this capability, you can fetch data directly from the supergraph response body:
telemetry:
  instrumentation:
    instruments:
      acme.request.on_graphql_error:
        value: event_unit
        type: counter
        unit: error
        description: my description
        condition:
          eq:
            - MY_ERROR_CODE
            - response_errors: "$.[0].extensions.code"
        attributes:
          response_errors:
            response_errors: "$.*"
Add support for status_code response to Rhai (Issue #5042)
The router now supports response.status_code on the Response interface in Rhai.
Examples using the response status code:
- Converting a response status code to a string:
  if response.status_code.to_string() == "200" {
      print(`ok`);
  }
- Converting a response status code to a number:
  if parse_int(response.status_code.to_string()) == 200 {
      print(`ok`);
  }
Add gt and lt operators for telemetry conditions (PR #5048)
The router supports greater than (gt) and less than (lt) operators for telemetry conditions. Similar to the eq operator, the configuration for both gt and lt takes two arguments as a list. The gt operator checks that the first argument is greater than the second, and the lt operator checks that the first argument is less than the second. Other conditions such as gte, lte, and range can be made from combinations of gt, lt, eq, and all.
By @tninesling in #5048
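To make the composition concrete, here is a small Python sketch of how gte, lte, and an exclusive range could be built from the primitive operators. It models conditions as nested dicts that mirror the YAML shape and evaluates them over plain values; in the router, the operands would be selectors or static values. The evaluator and the helper names `gte`, `lte`, and `in_range` are hypothetical, and the `any` and `not` combinators are borrowed from the v1.46.0 condition example elsewhere in these notes.

```python
def evaluate(cond):
    """Evaluate a single-key condition dict such as {"gt": [5, 3]}."""
    (op, args), = cond.items()
    if op == "eq":
        return args[0] == args[1]
    if op == "gt":
        return args[0] > args[1]
    if op == "lt":
        return args[0] < args[1]
    if op == "all":
        return all(evaluate(c) for c in args)
    if op == "any":
        return any(evaluate(c) for c in args)
    if op == "not":
        return not evaluate(args)
    raise ValueError(f"unknown operator: {op}")


def gte(a, b):
    # a >= b expressed as "a > b or a == b".
    return {"any": [{"gt": [a, b]}, {"eq": [a, b]}]}


def lte(a, b):
    # a <= b expressed as "a < b or a == b".
    return {"any": [{"lt": [a, b]}, {"eq": [a, b]}]}


def in_range(x, low, high):
    # Exclusive range low < x < high, built from `all`, `gt`, and `lt`.
    return {"all": [{"gt": [x, low]}, {"lt": [x, high]}]}
```

For example, `evaluate(in_range(404, 399, 500))` checks whether a status code falls in the 4xx class.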
Expose busy timer APIs (PR #4989)
The router supports public APIs that native plugins can use to control when the router's busy timer is run.
The router's busy timer measures the time spent working on a request outside of waiting for external calls, like coprocessors and subgraph calls. It includes the time spent waiting for other concurrent requests to be handled (the wait time in the executor) to show the actual router overhead when handling requests.
The public methods are Context::enter_active_request and Context::busy_time. The result is reported in the apollo_router_processing_time metric.
For details on using the APIs, see the documentation for enter_active_request.
🐛 Fixes
Reduce JSON schema size and Router memory footprint (PR #5061)
As we add more features to the Router, the size of the JSON schema for the router configuration file continues to grow. In particular, adding conditionals to telemetry in v1.46.0 significantly increased the size of the schema. This has a noticeable impact on initial memory footprint, although it does not impact the serving of requests.
The JSON schema for the router configuration file has been optimized from approximately 100k lines down to just over 7k.
This reduces the startup time of the Router and a smaller schema is more friendly for code editors.
By @BrynCooke in #5061
Prevent query plan cache collision when planning options change (Issue #5093)
The router's hashing algorithm has been updated to prevent cache collisions when the router's configuration changes.
Important
If you have enabled Distributed query plan caching, this release changes the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.
The router supports multiple options that affect the generated query plans, including:
defer_support
generate_query_fragments
experimental_reuse_query_fragments
experimental_type_conditioned_fetching
experimental_query_planner_mode
If distributed query plan caching is enabled, changing any of these options results in different query plans being generated and cached.
This could be problematic in the following scenarios:
- The router configuration changes and a query plan is loaded from cache which is incompatible with the new configuration.
- Routers with different configurations share the same cache, which causes them to cache and load incompatible query plans.
To prevent these from happening, the router now creates a hash for the entire query planner configuration and includes it in the cache key.
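The idea of folding the planner configuration into the cache key can be sketched in Python as follows. This is an illustration only: the key layout, the use of SHA-256, the JSON canonicalization, and the function names are assumptions, not the router's actual hashing scheme.

```python
import hashlib
import json


def planner_config_hash(config):
    """Hash the whole query planner configuration deterministically."""
    # Sorting keys makes the hash independent of option ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_key(query, operation_name, config):
    """Build a cache key that changes whenever any planner option changes."""
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    # Hypothetical key layout: prefix, config hash, query hash, operation name.
    return f"plan:{planner_config_hash(config)}:{query_hash}:{operation_name or '-'}"
```

With a key built this way, two routers that differ in any option (for example `generate_query_fragments`) read and write disjoint cache entries, so neither can load a plan produced under the other's configuration.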
5xx internal server error responses returned as GraphQL structured errors (PR #5159)
Previously, the router returned internal server errors (5xx class) as plaintext to clients. Now, the router returns these 5xx errors as structured GraphQL errors (for example, {"errors": [...]}).
Internal server errors are returned upon unexpected or unrecoverable disruptions to the GraphQL request lifecycle execution. When these occur, the underlying error messages are logged at an ERROR level to the router's logs.
By @BrynCooke in #5159
Custom telemetry events not created when logging is disabled (PR #5165)
The router has been fixed to not create custom telemetry events when the log level is set to off.
An example configuration with level set to off for a custom event:
telemetry:
  instrumentation:
    events:
      router:
        # Standard events
        request: info
        response: info
        error: info
        # Custom events
        my.disabled_request_event:
          message: "my event message"
          level: off # Disabled because we set the level to off
          on: request
          attributes:
            http.request.body.size: true
Ensure that batch entry contexts are correctly preserved (PR #5162)
Previously, the router didn't use contexts correctly when processing batches. A representative context was chosen (the first item in a batch of items) and used to provide context functionality for all the generated responses.
The router now correctly preserves request contexts and uses them during response creation.
Validate enum values in input variables (Issue #4633)
The router now validates enum values provided in JSON variables. Invalid enum values result in GRAPHQL_VALIDATION_FAILED errors.
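A minimal Python sketch of this kind of check is shown below. It is not the router's validation code: the function name, its parameters, and the error message wording are hypothetical, and only the GRAPHQL_VALIDATION_FAILED code comes from the note above.

```python
def validate_enum_variable(var_name, value, enum_name, allowed_values):
    """Return None if `value` is a valid enum value, else a GraphQL error dict."""
    # Enum values arrive as JSON strings; anything else is rejected outright.
    if isinstance(value, str) and value in allowed_values:
        return None
    return {
        # Hypothetical message wording, for illustration only.
        "message": f"variable ${var_name} has invalid value {value!r} for enum {enum_name}",
        "extensions": {"code": "GRAPHQL_VALIDATION_FAILED"},
    }
```

For instance, passing "UNKNOWN" for a variable of a hypothetical enum `Status` with values ACTIVE and INACTIVE would yield an error dict rather than None.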
Strip dashes from trace_id in CustomTraceIdPropagator (Issue #4892)
The router now strips dashes from trace IDs to ensure conformance with OpenTelemetry.
In OpenTelemetry, trace IDs are 128-bit values represented as hex strings without dashes, and they're based on W3C's trace ID format.
This has been applied within the router to trace_id in CustomTraceIdPropagator.
Note, if raw trace IDs from headers are represented by uuid4 and contain dashes, the dashes should be stripped so that the raw trace ID value can be parsed into a valid trace_id.
By @kindermax in #5071
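The normalization described here can be sketched in a few lines of Python. This is an illustration of the idea rather than the router's Rust code; the function name `normalize_trace_id` is hypothetical, and rejecting the all-zero ID follows the W3C Trace Context rule that such a trace ID is invalid.

```python
import string
import uuid


def normalize_trace_id(raw):
    """Strip dashes and return a 32-character lowercase hex trace ID."""
    hex_id = raw.replace("-", "")
    if len(hex_id) != 32 or any(c not in string.hexdigits for c in hex_id):
        raise ValueError(f"not a 128-bit hex trace ID: {raw!r}")
    if int(hex_id, 16) == 0:
        raise ValueError("an all-zero trace ID is invalid")
    return hex_id.lower()


# A uuid4 rendered with dashes becomes a valid 32-hex-character trace ID.
raw_header_value = str(uuid.uuid4())
trace_id = normalize_trace_id(raw_header_value)
```

A dashed UUID is 36 characters long; removing its four dashes leaves exactly the 32 hex characters that a 128-bit trace ID requires.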
v1.47.0-rc.0
1.47.0-rc.0
v1.46.0
🚀 Features
Entity cache preview: support queries with private scope (PR #4855)
This feature is part of the work on subgraph entity caching, currently in preview.
The router now supports caching responses marked with private scope. This caching currently works only on subgraph responses without any schema-level information.
For details about the caching behavior, see PR #4855.
Add support of custom events defined by YAML for telemetry (Issue #4320)
Users can now configure telemetry events via YAML to log that something has happened (e.g. a request had errors of a particular type) without reaching for Rhai or a custom plugin.
Events may be triggered on conditions and can include information in the request/response pipeline as attributes.
Here is an example of configuration:
telemetry:
  instrumentation:
    events:
      router:
        # Standard events
        request: info
        response: info
        error: info
        # Custom events
        my.event:
          message: "my event message"
          level: info
          on: request
          attributes:
            http.response.body.size: false
          # Only log when the x-log-request header is `log`
          condition:
            eq:
              - "log"
              - request_header: "x-log-request"
      supergraph:
        # Custom event configuration for supergraph service ...
      subgraph:
        # Custom event configuration for subgraph service .
Ability to ignore auth prefixes in the JWT plugin
The router now supports a configuration to ignore header prefixes with the JWT plugin. Given that many application headers use the format of Authorization: <scheme> <token>, this option enables the router to process requests for specific schemes within the Authorization header while ignoring others.
For example, you can configure the router to process requests with Authorization: Bearer <token> defined while ignoring others such as Authorization: Basic <token>:
authentication:
  router:
    jwt:
      header_name: authorization
      header_value_prefix: "Bearer"
      ignore_mismatched_prefix: true
If the header prefix is an empty string, this option is ignored.
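The described behavior can be sketched in Python as follows. This is only an approximation for illustration, not the JWT plugin's code: the function name `extract_token`, the exact (case-sensitive) prefix comparison, and the choice to raise `ValueError` on a mismatch when the option is off are assumptions.

```python
def extract_token(header_value, prefix="Bearer", ignore_mismatched_prefix=False):
    """Return the token, or None when the header is ignored.

    Raises ValueError for a mismatched scheme when the option is disabled.
    """
    if prefix == "":
        # With an empty prefix the option has no effect: the whole value is the token.
        return header_value.strip()
    scheme, _, rest = header_value.partition(" ")
    if scheme == prefix:
        return rest.strip()
    if ignore_mismatched_prefix:
        # Another scheme (for example Basic): skip JWT processing for this request.
        return None
    raise ValueError(f"header value does not start with prefix {prefix!r}")
```

With the option enabled, a request carrying `Authorization: Basic <token>` is skipped by this step instead of being rejected, while `Authorization: Bearer <token>` is processed as usual.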
Support conditions on custom attributes for spans and a new selector for GraphQL errors (Issue #4336)
The router now supports conditionally adding attributes on a span and the new on_graphql_error selector that is set to true if the response body contains GraphQL errors.
An example configuration using condition in attributes and on_graphql_error:
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          otel.status_description:
            static: "there was an error"
            condition:
              any:
                - not:
                    eq:
                      - response_status: code
                      - 200
                - eq:
                    - on_graphql_error
                    - true
🐛 Fixes
Federation v2.7.5 (PR #5064)
This brings in a query planner fix released in v2.7.5 of Apollo Federation. Notably, from its changelog:
- Fix issue with missing fragment definitions due to generateQueryFragments. (#2993)
  An incorrect implementation detail in generateQueryFragments caused certain queries to be missing fragment definitions, causing the operation to be invalid and fail early in the request life-cycle (before execution). Specifically, subsequent fragment "candidates" with the same type condition and the same length of selections as a previous fragment weren't correctly added to the list of fragments. An example of an affected query is:
  query {
    t {
      ... on A {
        x
        y
      }
    }
    t2 {
      ... on A {
        y
        z
      }
    }
  }
  In this case, the second selection set would be converted to an inline fragment spread to subgraph fetches, but the fragment definition would be missing.
By @garypen in #5064
Use supergraph schema to extract authorization info (PR #5047)
The router now uses the supergraph schema to extract authorization info, as authorization information may not be available on the query planner's subgraph schemas. This reverts the authorization changes made in PR #4975.
By @tninesling in #5047
Filter fetches added to batch during batch creation (PR #5034)
Previously, the router didn't filter query hashes when creating batches. This could result in failed queries because the additional hashes could incorrectly make a query appear to be committed when it wasn't actually registered in a batch.
This release fixes this issue by filtering query hashes during batch creation.
Use subgraph.name attribute instead of apollo.subgraph.name (PR #5012)
In the router v1.45.0, subgraph name mapping didn't work correctly in the Datadog exporter.
The Datadog exporter does some explicit mapping of attributes and was using a value apollo.subgraph.name that the latest versions of the router don't use. The correct choice is subgraph.name.
This release updates the mapping to reflect the change and fixes subgraph name mapping for Datadog.
📚 Documentation
Document traffic shaping default configuration (PR #4953)
The documentation for configuring traffic shaping has been updated to clarify that it's enabled by default with preset values. This setting has been the default since PR #3330, which landed in v1.23.0.
🧪 Experimental
Experimental type conditioned fetching (PR #4748)
This release introduces an experimental configuration to enable type-conditioned fetching.
Previously, when querying a field that was in a path of two or more unions, the query planner wasn't able to handle different selections and would aggressively collapse selections in fetches. This resulted in incorrect plans.
Enabling the experimental_type_conditioned_fetching option can fix this issue by configuring the query planner to fetch with type conditions.
experimental_type_conditioned_fetching: true # false by default
By @o0Ignition0o in #4748
v1.46.0-rc.3
1.46.0-rc.3
v1.46.0-rc.1
1.46.0-rc.1
v1.46.0-rc.0
1.46.0-rc.0