[Opentelemetry plugin] Specify api.ft.com metrics for underneath APIs #1191
Comments
Hiya, status update on where I've got to so far. This is hard! I'm encountering two issues:
HTTP vs Undici
I set up a test app locally which makes requests via both Undici and node-fetch – these are the two most common ways we make client requests from our apps. I can get the Undici request to log but I don't seem to be able to hook into the HTTP (node-fetch) request at all. How is this working?? @sjparkinson this is making me question everything - are we definitely getting metrics from node-fetch? 😂
Here's what I've been trying (I've also tried every other option under
getNodeAutoInstrumentations({
'@opentelemetry/instrumentation-http': {
requestHook: (span, request) => {
console.log('HTTP REQUEST HOOK', request.url);
}
},
'@opentelemetry/instrumentation-undici': {
requestHook: (span, request) => {
console.log('UNDICI REQUEST HOOK', request.path);
}
}
// ...
})
When I make the following requests:
const nFetch = require('node-fetch');
const bulbasaur = await fetch('https://pokeapi.co/api/v2/pokemon/bulbasaur'); // native fetch
const squirtle = await nFetch('https://pokeapi.co/api/v2/pokemon/squirtle'); // node-fetch
I get only this log:
I might be missing something, maybe there's a different instrumentation library that handles HTTP client requests? Seems unlikely though.
Custom attributes on existing metrics
Turns out that the instrumentations for Node.js are quite strict. Ignoring the HTTP issue I decided to try and add custom metrics to native fetches. Attributes would get added to the
If I try adding a custom attribute that isn't explicitly supported or even if I add the one experimental attribute (
getNodeAutoInstrumentations({
// ...
'@opentelemetry/instrumentation-undici': {
startSpanHook(request) {
return {
['url.template']: '/hello-world'
};
}
}
// ...
})
The metric that gets exported is like this:
{
// ...
attributes: [
{
'http.response.status_code': 200,
'http.request.method': 'GET',
'server.address': 'pokeapi.co',
'server.port': 443,
'url.scheme': 'https'
}
]
}
However, if I reuse one of the attributes that's already exported then it works:
getNodeAutoInstrumentations({
// ...
'@opentelemetry/instrumentation-undici': {
startSpanHook(request) {
return {
['server.address']: 'why.wont.you.work'
};
}
}
// ...
})
This exports the metric:
{
// ...
attributes: [
{
'http.response.status_code': 200,
'http.request.method': 'GET',
'server.address': 'why.wont.you.work',
'server.port': 443,
'url.scheme': 'https'
}
]
}
Very frustrating. If this is a hard limitation at the moment then I'm not sure what we can do to get paths into our metrics.
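The two exports above are consistent with only a fixed set of attribute keys being copied from the span onto the metric. The following is a minimal model of that observed behaviour, not the library's actual code: the key list is taken from the exported metric shown in this comment, and `pickMetricAttributes` is a hypothetical name.

```javascript
// Model of the observed behaviour only; not taken from the instrumentation's source.
// Keys copied from the exported metric shown in the comment above.
const OBSERVED_METRIC_KEYS = [
	'http.response.status_code',
	'http.request.method',
	'server.address',
	'server.port',
	'url.scheme'
];

// Hypothetical helper: keep only the allowlisted keys from the span attributes.
function pickMetricAttributes(spanAttributes) {
	const metricAttributes = {};
	for (const key of OBSERVED_METRIC_KEYS) {
		if (key in spanAttributes) {
			metricAttributes[key] = spanAttributes[key];
		}
	}
	return metricAttributes;
}

const spanAttributes = {
	'http.response.status_code': 200,
	'http.request.method': 'GET',
	'server.address': 'why.wont.you.work', // overridden via startSpanHook
	'server.port': 443,
	'url.scheme': 'https',
	'url.template': '/hello-world' // custom attribute, not in the list
};

console.log(pickMetricAttributes(spanAttributes));
```

Under this model an overridden `server.address` survives because its key is in the list, while `url.template` is dropped before export.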
I wonder if that is true for attributes in the semantic conventions like
No idea! Though I have wondered if some systems are a bit light on client request metrics, worth a browse of https://grafana.ft.com/d/HzGduSwIz/node-js?orgId=1&var-workspace=I7o8Aa1Sz&var-job=next-myft-api&var-environment=production&var-cloud_provider=heroku&var-cloud_region=All&var-http_method=All&var-http_route=All&var-http_errors=All&viewPanel=13 for a system with well known dependencies. But my understanding for node-fetch is that https://github.com/open-telemetry/opentelemetry-js/blob/main/experimental/packages/opentelemetry-instrumentation-http/src/http.ts is covering it, as it's a wrapper.
It's a bit light on detail but this issue suggests that load order might be relevant.
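As a general illustration of why load order can matter for patch-based instrumentation (not a confirmed diagnosis of this issue), the sketch below uses a stand-in object rather than the real `http` module. A consumer that grabs a function reference before the patch is applied keeps calling the unpatched function. All names here are hypothetical.

```javascript
// Stand-in for a module's exports object; this is not the real http module.
const fakeHttp = {
	request(url) {
		return `requested ${url}`;
	}
};

// A consumer loaded *before* instrumentation keeps its own reference.
const earlyRequest = fakeHttp.request;

// Instrumentation then wraps the export in place.
const seen = [];
const original = fakeHttp.request;
fakeHttp.request = function patchedRequest(url) {
	seen.push(url); // what a request hook would observe
	return original.call(this, url);
};

// A consumer loaded *after* instrumentation looks the function up at call time.
const lateRequest = (url) => fakeHttp.request(url);

earlyRequest('https://example.com/early'); // bypasses the patch
lateRequest('https://example.com/late'); // goes through the patch

console.log(seen);
```

Only the late request is recorded in `seen`, which is the kind of symptom you would expect if a client library captured its HTTP functions before instrumentation was registered.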
Some thoughts on this (assuming it is possible). There's a helpful list of API endpoints under Tyk at https://apigateway.in.ft.com/check-apis, 179 of them today. I think there are two options for adding the detail we're looking for:
I don't really have a strong list of pros and cons, but my gut feeling is 2 would be most intuitive, and perhaps a little easier to maintain (let's try not to reinvent the next-metrics list, right?).
This creates a test app for the work in #1191. It illustrates the issues I'm seeing and makes it easier for others to try things out.
You can test this with the following. In one terminal window, run:
```
node ./packages/opentelemetry/test/end-to-end/scripts/run.js
```
This starts a test app on port 4001 which has OpenTelemetry metrics enabled. I've overridden the OpenTelemetry package internals to log metrics to `stdout` instead of sending to a collector. Once you've done this, run the following:
```
curl http://localhost:4001
```
You should see some metrics get logged. Note two things about them:
1. There is never a metric logged for the node-fetch request
2. No custom attributes come through on the logged metrics
@sjparkinson I've opened a draft PR here which includes a bunch of manual edits and a test app to illustrate the issue: #1202.
Reading through how the SDK wraps and how node-fetch is written, I wonder if it's related to node-fetch using
(I'm sort of seeing this as maybe a good opportunity for figuring out how to write an auto-instrumentation package.)
Or, reading https://github.com/open-telemetry/opentelemetry-js/blob/main/experimental/packages/opentelemetry-instrumentation/README.md#limitations, is it related to node-fetch using ESM imports?
Possibly for the latest node-fetch, but I'm explicitly using a
I am also finding the same with a vanilla express app. Traces are captured for the POST request by the
Not sure why though. Something to fix somewhere!
I have asked in the #otel-js CNCF channel (join link if you're interested). Can't say I always get support, but worth asking.
I have however been able to use node-fetch@v2 fine with the http example at https://github.com/open-telemetry/opentelemetry-js/tree/main/examples/http. Bit of a fiddle but I installed node-fetch in the client and was able to get the expected trace.
diff --git a/examples/http/client.js b/examples/http/client.js
index 168d43392..0cf4c675f 100644
--- a/examples/http/client.js
+++ b/examples/http/client.js
@@ -2,7 +2,7 @@
const api = require('@opentelemetry/api');
const tracer = require('./tracer')('example-http-client');
-const http = require('http');
+const fetch = require('node-fetch');
/** A function which makes requests and handles response. */
function makeRequest() {
@@ -10,18 +10,11 @@ function makeRequest() {
// the span, which is created to track work that happens outside of the
// request lifecycle entirely.
tracer.startActiveSpan('makeRequest', (span) => {
- http.get({
- host: 'localhost',
- port: 8080,
- path: '/helloworld',
- }, (response) => {
- const body = [];
- response.on('data', (chunk) => body.push(chunk));
- response.on('end', () => {
- console.log(body.toString());
+ fetch('http://localhost:8080/helloworld')
+ .then((response) => {
+ response.text();
span.end();
});
- });
});
// The process must live for at least the interval past any traces that
diff --git a/examples/http/package.json b/examples/http/package.json
index 64a05ed43..3d8638469 100644
--- a/examples/http/package.json
+++ b/examples/http/package.json
@@ -5,10 +5,10 @@
"description": "Example of HTTP integration with OpenTelemetry",
"main": "index.js",
"scripts": {
- "zipkin:server": "cross-env EXPORTER=zipkin node ./server.js",
- "zipkin:client": "cross-env EXPORTER=zipkin node ./client.js",
- "jaeger:server": "cross-env EXPORTER=jaeger node ./server.js",
- "jaeger:client": "cross-env EXPORTER=jaeger node ./client.js",
+ "zipkin:server": "EXPORTER=zipkin node ./server.js",
+ "zipkin:client": "EXPORTER=zipkin node ./client.js",
+ "jaeger:server": "EXPORTER=jaeger node ./server.js",
+ "jaeger:client": "EXPORTER=jaeger node ./client.js",
"align-api-deps": "node ../../scripts/align-api-deps.js"
},
"repository": {
@@ -37,7 +37,8 @@
"@opentelemetry/resources": "1.26.0",
"@opentelemetry/sdk-trace-base": "1.26.0",
"@opentelemetry/sdk-trace-node": "1.26.0",
- "@opentelemetry/semantic-conventions": "1.27.0"
+ "@opentelemetry/semantic-conventions": "1.27.0",
+ "node-fetch": "^2.7.0"
},
"homepage": "https://github.com/open-telemetry/opentelemetry-js/tree/main/examples/http",
"devDependencies": {
So it does seem like an issue with auto-instrumentation as the issue Jon linked to suggests?
"HTTP vs Undici" from my original comment is resolved in v2.0.11 of the package but we still need to work out how we want to augment the metrics with our paths. "Custom attributes on existing metrics" is still blocking and we'll either need to find a workaround or try to get the OpenTelemetry maintainers to support custom attributes.
We'd like a hook added to the OpenTelemetry plugin that adds specificity to the api.ft.com net_peer_name attribute, so we can select the metrics specific to an API that lives inside the API Gateway.
What problem does this feature solve?
Due to the ongoing Graphite migration, we need to replace some of the system's Graphite-based alerts with OpenTelemetry.
Some of these alerts need to trigger when a specific service is not working as it should. For example, see this alert in next-myft-api:
Graphite health check
The metrics inside next.heroku.myft-api.*.fetch.insights-api-topic-recommendations correspond to the requests made to the service at this URL: https://api.ft.com/snr/v1/insights/topic-recommendations/, which lives inside the API Gateway. But when creating the OpenTelemetry Grafana panel for this alert, we find that net_peer_name doesn't allow us to select topic-recommendations, only api.ft.com, because this label is domain based.
Link to Grafana panel
This means that if next-myft-api uses many services that are inside the API Gateway (which it does), all the metrics are bundled together and we can't tell where the 5xx or 4xx responses are coming from.
Ideal solution
As suggested by Sam Parkinson here, we could add a hook in the OpenTelemetry plugin to add the first part of the path to the net_peer_name attribute. In this case it would be api.ft.com/snr, which works for us because next-myft-api doesn't use any other API from snr. But maybe we could consider adding the last part of the path, e.g. api.ft.com/topic-recommendations.
Anyway, this is a problem shared by many repositories (and probably teams), and it'd be amazing to have this solved in the OpenTelemetry plugin.
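To make the two naming ideas concrete, here is a minimal sketch of how the label value could be derived from a request URL. Both function names are hypothetical, and this only shows the string derivation, not how it would be wired into the plugin.

```javascript
// Hypothetical helper: build a label such as "api.ft.com/snr" from a request URL.
function peerNameWithFirstSegment(requestUrl) {
	const { hostname, pathname } = new URL(requestUrl);
	const [firstSegment] = pathname.split('/').filter(Boolean);
	return firstSegment ? `${hostname}/${firstSegment}` : hostname;
}

// Hypothetical variant: use the last path segment instead.
function peerNameWithLastSegment(requestUrl) {
	const { hostname, pathname } = new URL(requestUrl);
	const lastSegment = pathname.split('/').filter(Boolean).pop();
	return lastSegment ? `${hostname}/${lastSegment}` : hostname;
}

const exampleUrl = 'https://api.ft.com/snr/v1/insights/topic-recommendations/';
console.log(peerNameWithFirstSegment(exampleUrl)); // api.ft.com/snr
console.log(peerNameWithLastSegment(exampleUrl)); // api.ft.com/topic-recommendations
```

URLs with no path fall back to the bare hostname, so existing domain-based selection would keep working for services outside the API Gateway.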
Alternatives
An alternative is to code this OTel hook in every system that needs this granularity.