Adds support for event loop utilization to the core metrics service #153717

Merged: 7 commits, Mar 29, 2023

Member

@lukeelmers lukeelmers commented Mar 24, 2023

This updates the metrics service with a collector for event loop utilization, based on some prior art by @Bamieh ❤️

To learn more about ELU, I recommend this article from nodesource.

Things included:

  • Introduces ELU metrics collector
  • Updates ops metrics logger to include utilization
  • Updates Kibana status page with a tile for utilization
    • This wasn't strictly necessary, happy to drop it if folks think it isn't useful
  • /api/status, /api/stats are automatically updated as a result of this change
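
For readers unfamiliar with ELU, here is a minimal sketch of what a collector along these lines could look like. It is not the code from this PR: the class name `EluCollectorSketch` and its `collect`/`reset` methods are hypothetical, and the only real API used is Node's `performance.eventLoopUtilization()` from `perf_hooks`.

```typescript
import { performance } from 'perf_hooks';

// Shape of the object returned by performance.eventLoopUtilization().
interface EluSnapshot {
  idle: number; // milliseconds the event loop spent idle
  active: number; // milliseconds the event loop spent active
  utilization: number; // active / (idle + active)
}

// Hypothetical collector, for illustration only.
class EluCollectorSketch {
  // Baseline snapshot that later measurements are compared against.
  private last: EluSnapshot = performance.eventLoopUtilization();

  // Returns the utilization observed since the baseline was taken.
  collect(): EluSnapshot {
    // Passing a previous snapshot makes Node return the delta since that snapshot.
    const { idle, active, utilization } = performance.eventLoopUtilization(this.last);
    return { idle, active, utilization };
  }

  // Starts a new measurement window.
  reset(): void {
    this.last = performance.eventLoopUtilization();
  }
}

const collector = new EluCollectorSketch();
console.log(collector.collect());
```

Measuring against a stored baseline keeps each reading scoped to the current collection window rather than to the lifetime of the process.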

Things not included:

  • I didn't add a utilization threshold tracker as @Bamieh had put in his original POC.
    • The main reason is that it's easy enough to add later, and I think we need to see this metric in production for a while before we have a good idea of what a helpful threshold would be for logging purposes. I'm still happy to go back and add this if someone feels strongly about it.
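
To make the omitted piece concrete, the following is a rough sketch of what a threshold tracker might look like if it were added later. It is purely illustrative and is not @Bamieh's POC: the class name, the `record` method, and the "consecutive samples" rule are all assumptions, and the threshold is a required parameter because no suitable value has been established.

```typescript
// Hypothetical tracker, for illustration only.
class EluThresholdTrackerSketch {
  private readonly threshold: number;
  private readonly minConsecutive: number;
  private consecutive = 0;

  constructor(threshold: number, minConsecutive: number) {
    this.threshold = threshold;
    this.minConsecutive = minConsecutive;
  }

  // Records one utilization sample. Returns true only on the sample that
  // completes a run of `minConsecutive` readings above the threshold, so a
  // sustained spike is reported once rather than on every sample.
  record(utilization: number): boolean {
    if (utilization > this.threshold) {
      this.consecutive += 1;
    } else {
      this.consecutive = 0;
    }
    return this.consecutive === this.minConsecutive;
  }
}

// The values 0.5 and 2 are arbitrary and chosen only to exercise the sketch.
const tracker = new EluThresholdTrackerSketch(0.5, 2);
console.log(tracker.record(0.6), tracker.record(0.7));
```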

Note to reviewers: I restructured one of the test files a bit, so reviewing will be easier if you view the diff with Hide whitespace enabled.

[Image: Screenshot 2023-03-24 at 4 10 55 PM]

Member Author

@lukeelmers lukeelmers left a comment

self review

Comment on lines +63 to +64
defaultMessage: 'Heap used out of {heapTotal}',
values: { heapTotal: numeral(metrics.process.memory.heap.size_limit).format('0.00 b') },
Member Author

To avoid restructuring the whole status page, I combined the two separate heap tiles into one that includes both heapUsedInBytes and sizeLimit.

},
}),
value: metrics.process.event_loop_utilization.utilization,
type: 'float',
Member Author

We're autoformatting floats to two decimal places... it's nice to see the utilization values with a bit more precision, but I don't know if it's worth a one-off implementation just for this metric. I tried updating all floats to five decimal places, but it blew up the layout, so I decided to leave it alone since this is a nice-to-have item anyway 🤷

Contributor

Yeah, it's fine imho. That page has been a pain every time we want to change/add/remove anything...
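
A small illustration of the precision trade-off discussed in this thread. The status page formats values with numeral; this sketch substitutes the built-in `Number.prototype.toFixed` so it runs without dependencies, and the helper name `formatUtilization` and the sample value are assumptions.

```typescript
// Hypothetical helper: formats a utilization ratio to a fixed number of decimals.
function formatUtilization(utilization: number, decimals: number): string {
  return utilization.toFixed(decimals);
}

// A lightly loaded event loop often reports a small ratio like this one.
const sampleUtilization = 0.03125;

// Two decimals (the status page default for floats) hides most of the detail.
console.log(formatUtilization(sampleUtilization, 2)); // "0.03"

// Five decimals (what the ops metrics logger uses) preserves it.
console.log(formatUtilization(sampleUtilization, 5)); // "0.03125"
```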

@@ -55,6 +55,11 @@ export function getEcsOpsMetricsLog(metrics: OpsMetrics) {
).format('0.000')} }`
: '';

const eventLoopUtilizationVal = process?.event_loop_utilization;
const eventLoopUtilizationMsg = eventLoopUtilizationVal
? ` utilization: ${numeral(process?.event_loop_utilization.utilization).format('0.00000')}`
Member Author

Including active/idle here didn't seem useful enough to justify the increased message length. But it is easy to add if anyone feels otherwise.

Contributor

I'm not sure how they could be useful either. The utilization ratio seems to be the only valuable info.
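
The logging decision in this thread can be sketched as a standalone function. This is a simplified stand-in for the diff above, not the actual `getEcsOpsMetricsLog` code: the `ProcessMetricsSketch` type and `eluLogSegment` name are hypothetical, and `toFixed(5)` replaces the numeral `'0.00000'` format to avoid a dependency.

```typescript
// Hypothetical, trimmed-down shape of the process metrics used by the logger.
interface ProcessMetricsSketch {
  event_loop_utilization?: {
    active: number;
    idle: number;
    utilization: number;
  };
}

// Builds the utilization part of the ops metrics log message.
// Only the ratio is logged; active and idle are left out to keep the message short.
function eluLogSegment(proc?: ProcessMetricsSketch): string {
  const elu = proc?.event_loop_utilization;
  return elu ? ` utilization: ${elu.utilization.toFixed(5)}` : '';
}

console.log(eluLogSegment({ event_loop_utilization: { active: 1, idle: 3, utilization: 0.25 } }));
```

Returning an empty string when the metric is missing keeps the rest of the log message intact for collectors that do not report ELU.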

@lukeelmers lukeelmers force-pushed the feat/elu branch 3 times, most recently from efe9de7 to 934f88b March 27, 2023 20:41
@lukeelmers lukeelmers added the v8.8.0, release_note:skip, and backport:skip labels Mar 27, 2023
@lukeelmers lukeelmers marked this pull request as ready for review March 27, 2023 21:50
@lukeelmers lukeelmers requested a review from a team as a code owner March 27, 2023 21:50
@lukeelmers lukeelmers added the Team:Core and Epic:KBNA-8605 labels Mar 27, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@lukeelmers lukeelmers self-assigned this Mar 27, 2023
@lukeelmers
Member Author

@elasticmachine merge upstream

Contributor

@pgayvallet pgayvallet left a comment

Looking good to me (once the tests are fixed!)

Member

@Bamieh Bamieh left a comment

LGTM! It's good that you skipped the ELU class from my POC, because it is very specific to the collector (it exists to avoid double logging and double incrementing the counters). Using the Node.js eventLoopUtilization method directly, as you did, looks correct to me 👍

@lukeelmers
Member Author

@elasticmachine merge upstream

@lukeelmers
Member Author

@elasticmachine merge upstream

@lukeelmers lukeelmers enabled auto-merge (squash) March 29, 2023 20:17
@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id   | before  | after   | diff
core | 352.2KB | 352.4KB | +285.0B

Unknown metric groups

API count

id                       | before | after | diff
@kbn/core-metrics-server | 54     | 55    | +1

ESLint disabled in files

id                                           | before | after | diff
@kbn/core-metrics-collectors-server-internal | 2      | 1     | -1

ESLint disabled line counts

id               | before | after | diff
securitySolution | 432    | 435   | +3

Total ESLint disabled count

id                                           | before | after | diff
@kbn/core-metrics-collectors-server-internal | 3      | 2     | -1
securitySolution                             | 512    | 515   | +3
total                                        |        |       | +2

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @lukeelmers

@lukeelmers lukeelmers merged commit 9517d06 into elastic:main Mar 29, 2023
@lukeelmers lukeelmers deleted the feat/elu branch March 29, 2023 21:35
Labels
  • backport:skip (This commit does not require backporting)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)
  • v8.8.0
7 participants