forked from apache/solr
SAI-5048: Some enhancements for node level metrics #220
Merged: patsonluk merged 27 commits into fs/branch_9_3 from noble/patson/aggregate_prometheus_metrics on Oct 2, 2024
…nd CoresMetricsApiCaller
2. CoresMetricsApiCaller will only fetch metrics that are missing from AggregateMetricsApiCaller
3. Added the capability to get properties other than the counter (e.g. median duration)
   - Added support for key and expr on top of prefix/property for Metrics API querying
4. Some fixes to labels and unit test cases
5. Added metricType to CoreMetric to retain the correct Prometheus metric type
* Combine both core and solr.node metrics into AggregateMetricsApiCaller to speed things up
* Fixed metrics API property query param
* Avoid NPE
* Javadoc and code cleanup; re-arranged CoreMetric enum ordering to fit the test case
…s_metrics' into noble/patson/aggregate_prometheus_metrics
* Try making 2 calls and see if it's faster
* Fixed unit test cases
nginthfs approved these changes on Sep 24, 2024
Hey, sorry for the delay, but LGTM! Tested it locally myself and performance / metrics look good!
This was referenced on Oct 10, 2024.
magibney pushed a commit that referenced this pull request on Oct 11, 2024.
Description
This is built on top of #216, proposed by Noble. For the diff, please refer to this link.
Some enhancements:
1. `AggregateMetricsApiCaller` and `CoreMetricsApiCaller`. `CoreMetricsApiCaller` makes the API call only for whatever is not covered by `AggregateMetricsApiCaller`.
2. Added support for `key` and `expr` on top of `prefix`/`property` for Metrics API querying. It turns out `expr` is very slow on playpen, so we are going to support extra properties by using multiple `&property=...` parameters instead.
3. Metrics from `solr.node` will have the suffix `[node aggregated]` appended to the `# HELP` line, for example `# HELP top_level_requests_get_duration_p99 top-level gets p99 duration[node aggregated]`. This makes it easier to confirm that the metrics are obtained from the expected source.

Test
Deployed onto playpen:
Remarks
Set `aggregateNodeLevelMetricsEnabled` consistently; otherwise, we might get partial results:
- `<indexConfig aggregateNodeLevelMetricsEnabled="true">`, and
- `<updateHandler class="solr.DirectUpdateHandler2" aggregateNodeLevelMetricsEnabled="true">` as well.

Other than the above, we have changed the old gauges `UPDATE.updateHandler.autoCommits` and `UPDATE.updateHandler.softAutoCommits` to meters to enable node-aggregated metrics. However, for the remaining ones a gauge makes more sense, and it's okay to leave them as is, as we do not currently use them in our Grafana dashboards anyway.

5. There was an attempt to combine the two callers into one and make only one API call. However, it actually takes longer, as we are asking for both core and `solr.node` metrics with several properties (+p50, p95 and p99): QTime increased from 101ms to 130ms during our playpen tests.