Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add resource usage trackers and resource usage collector service #9890

Merged
merged 23 commits into from
Oct 16, 2023
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
13c5c90
Add node performance trackers and performance collector service
bharath-techie Sep 7, 2023
f861207
addressing comments
bharath-techie Sep 8, 2023
5fb7d51
Addressing comments
bharath-techie Sep 11, 2023
f867acf
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Sep 11, 2023
cdc11f0
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Sep 15, 2023
03b58cb
renaming stats class and injecting settings to trackers
bharath-techie Sep 24, 2023
39f4e0c
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Sep 27, 2023
320a71f
addressing review comments
bharath-techie Sep 27, 2023
4174432
addressing comments
bharath-techie Sep 28, 2023
12a191f
addressing comments
bharath-techie Sep 29, 2023
7ef52fe
addressing comments
bharath-techie Sep 29, 2023
6a01876
addressing comments, removing perf collector dependency in node perfo…
bharath-techie Oct 5, 2023
e7b0c49
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Oct 5, 2023
768eea4
Addressing comments
bharath-techie Oct 6, 2023
4e82164
addressing comment
bharath-techie Oct 9, 2023
ef42c3d
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Oct 9, 2023
6d71414
changing elapsed_time to timestamp
bharath-techie Oct 9, 2023
1ab5ec4
addressing comments
bharath-techie Oct 9, 2023
bc3a2cc
Refactoring test back to *Tests format as *IT is not recognized
bharath-techie Oct 12, 2023
6c02a91
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Oct 12, 2023
757f9d2
Renaming files and packages to ResourceUsage
bharath-techie Oct 13, 2023
690c168
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Oct 13, 2023
bf7f65b
Merge branch 'main' of github.com:opensearch-project/OpenSearch into …
bharath-techie Oct 15, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Add events correlation engine plugin ([#6854](https://github.com/opensearch-project/OpenSearch/issues/6854))
- Introduce new dynamic cluster setting to control slice computation for concurrent segment search ([#9107](https://github.com/opensearch-project/OpenSearch/pull/9107))
- Implement on behalf of token passing for extensions ([#8679](https://github.com/opensearch-project/OpenSearch/pull/8679))
- [Admission control] Add Performance collector service and node performance stats tracker ([#9890](https://github.com/opensearch-project/OpenSearch/pull/9890))
- Implement Visitor Design pattern in QueryBuilder to enable the capability to traverse through the complex QueryBuilder tree. ([#10110](https://github.com/opensearch-project/OpenSearch/pull/10110))
bharath-techie marked this conversation as resolved.
Show resolved Hide resolved
- Provide service accounts tokens to extensions ([#9618](https://github.com/opensearch-project/OpenSearch/pull/9618))
- Configurable merge policy for index with an option to choose from LogByteSize and Tiered merge policy ([#9992](https://github.com/opensearch-project/OpenSearch/pull/9992))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@
import org.opensearch.monitor.os.OsStats;
import org.opensearch.monitor.process.ProcessStats;
import org.opensearch.node.AdaptiveSelectionStats;
import org.opensearch.node.NodesPerformanceStats;
import org.opensearch.script.ScriptCacheStats;
import org.opensearch.script.ScriptStats;
import org.opensearch.search.backpressure.stats.SearchBackpressureStats;
Expand Down Expand Up @@ -142,6 +143,9 @@ public class NodeStats extends BaseNodeResponse implements ToXContentFragment {
@Nullable
private SearchPipelineStats searchPipelineStats;

@Nullable
private NodesPerformanceStats nodesPerformanceStats;

public NodeStats(StreamInput in) throws IOException {
super(in);
timestamp = in.readVLong();
Expand Down Expand Up @@ -198,6 +202,11 @@ public NodeStats(StreamInput in) throws IOException {
} else {
searchPipelineStats = null;
}
if (in.getVersion().onOrAfter(Version.V_3_0_0)) { // make it 2.12 when we backport
nodesPerformanceStats = in.readOptionalWriteable(NodesPerformanceStats::new);
} else {
nodesPerformanceStats = null;
}
}

public NodeStats(
Expand All @@ -216,6 +225,7 @@ public NodeStats(
@Nullable DiscoveryStats discoveryStats,
@Nullable IngestStats ingestStats,
@Nullable AdaptiveSelectionStats adaptiveSelectionStats,
@Nullable NodesPerformanceStats nodesPerformanceStats,
@Nullable ScriptCacheStats scriptCacheStats,
@Nullable IndexingPressureStats indexingPressureStats,
@Nullable ShardIndexingPressureStats shardIndexingPressureStats,
Expand All @@ -241,6 +251,7 @@ public NodeStats(
this.discoveryStats = discoveryStats;
this.ingestStats = ingestStats;
this.adaptiveSelectionStats = adaptiveSelectionStats;
bharath-techie marked this conversation as resolved.
Show resolved Hide resolved
this.nodesPerformanceStats = nodesPerformanceStats;
this.scriptCacheStats = scriptCacheStats;
this.indexingPressureStats = indexingPressureStats;
this.shardIndexingPressureStats = shardIndexingPressureStats;
Expand Down Expand Up @@ -344,6 +355,11 @@ public AdaptiveSelectionStats getAdaptiveSelectionStats() {
return adaptiveSelectionStats;
}

@Nullable
public NodesPerformanceStats getNodesPerformanceStats() {
return nodesPerformanceStats;
}

@Nullable
public ScriptCacheStats getScriptCacheStats() {
return scriptCacheStats;
Expand Down Expand Up @@ -430,6 +446,9 @@ public void writeTo(StreamOutput out) throws IOException {
if (out.getVersion().onOrAfter(Version.V_2_9_0)) {
out.writeOptionalWriteable(searchPipelineStats);
}
if (out.getVersion().onOrAfter(Version.V_3_0_0)) { // make it 2.12 when we backport
out.writeOptionalWriteable(nodesPerformanceStats);
}
}

@Override
Expand Down Expand Up @@ -520,7 +539,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
if (getSearchPipelineStats() != null) {
getSearchPipelineStats().toXContent(builder, params);
}

if (getNodesPerformanceStats() != null) {
getNodesPerformanceStats().toXContent(builder, params);
}
return builder;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,8 @@ public enum Metric {
WEIGHTED_ROUTING_STATS("weighted_routing"),
FILE_CACHE_STATS("file_cache"),
TASK_CANCELLATION("task_cancellation"),
SEARCH_PIPELINE("search_pipeline");
SEARCH_PIPELINE("search_pipeline"),
PERFORMANCE_STATS("performance_stats");

private String metricName;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,8 @@ protected NodeStats nodeOperation(NodeStatsRequest nodeStatsRequest) {
NodesStatsRequest.Metric.WEIGHTED_ROUTING_STATS.containedIn(metrics),
NodesStatsRequest.Metric.FILE_CACHE_STATS.containedIn(metrics),
NodesStatsRequest.Metric.TASK_CANCELLATION.containedIn(metrics),
NodesStatsRequest.Metric.SEARCH_PIPELINE.containedIn(metrics)
NodesStatsRequest.Metric.SEARCH_PIPELINE.containedIn(metrics),
NodesStatsRequest.Metric.PERFORMANCE_STATS.containedIn(metrics)
);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,7 @@ protected ClusterStatsNodeResponse nodeOperation(ClusterStatsNodeRequest nodeReq
false,
false,
false,
false,
false
);
List<ShardStats> shardsStats = new ArrayList<>();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,7 @@
import org.opensearch.persistent.PersistentTasksClusterService;
import org.opensearch.persistent.decider.EnableAssignmentDecider;
import org.opensearch.plugins.PluginsService;
import org.opensearch.ratelimiting.tracker.PerformanceTrackerSettings;
import org.opensearch.repositories.fs.FsRepository;
import org.opensearch.rest.BaseRestHandler;
import org.opensearch.script.ScriptService;
Expand Down Expand Up @@ -654,6 +655,10 @@ public void apply(Settings value, Settings current, Settings previous) {
SegmentReplicationPressureService.MAX_REPLICATION_LIMIT_STALE_REPLICA_SETTING,
SegmentReplicationPressureService.MAX_ALLOWED_STALE_SHARDS,

// Settings related to admission control
PerformanceTrackerSettings.GLOBAL_CPU_USAGE_AC_WINDOW_DURATION_SETTING,
PerformanceTrackerSettings.GLOBAL_JVM_USAGE_AC_WINDOW_DURATION_SETTING,

// Settings related to Searchable Snapshots
Node.NODE_SEARCH_CACHE_SIZE_SETTING,
FileCache.DATA_TO_FILE_CACHE_SIZE_RATIO_SETTING,
Expand Down
25 changes: 23 additions & 2 deletions server/src/main/java/org/opensearch/node/Node.java
Original file line number Diff line number Diff line change
Expand Up @@ -195,6 +195,7 @@
import org.opensearch.plugins.SearchPlugin;
import org.opensearch.plugins.SystemIndexPlugin;
import org.opensearch.plugins.TelemetryPlugin;
import org.opensearch.ratelimiting.tracker.NodePerformanceTracker;
import org.opensearch.repositories.RepositoriesModule;
import org.opensearch.repositories.RepositoriesService;
import org.opensearch.rest.RestController;
Expand Down Expand Up @@ -805,7 +806,6 @@ protected Node(
searchRequestStats,
remoteStoreStatsTrackerFactory
);

final AliasValidator aliasValidator = new AliasValidator();

final ShardLimitValidator shardLimitValidator = new ShardLimitValidator(settings, clusterService, systemIndices);
Expand Down Expand Up @@ -1070,6 +1070,16 @@ protected Node(
transportService.getTaskManager(),
taskCancellationMonitoringSettings
);
final NodePerformanceTracker nodePerformanceTracker = new NodePerformanceTracker(
threadPool,
settings,
clusterService.getClusterSettings()
);
final PerformanceCollectorService performanceCollectorService = new PerformanceCollectorService(
nodePerformanceTracker,
clusterService,
threadPool
);
this.nodeService = new NodeService(
settings,
threadPool,
Expand All @@ -1091,7 +1101,8 @@ protected Node(
searchBackpressureService,
searchPipelineService,
fileCache,
taskCancellationMonitoringService
taskCancellationMonitoringService,
performanceCollectorService
);

final SearchService searchService = newSearchService(
Expand Down Expand Up @@ -1212,6 +1223,8 @@ protected Node(
b.bind(RerouteService.class).toInstance(rerouteService);
b.bind(ShardLimitValidator.class).toInstance(shardLimitValidator);
b.bind(FsHealthService.class).toInstance(fsHealthService);
b.bind(NodePerformanceTracker.class).toInstance(nodePerformanceTracker);
bharath-techie marked this conversation as resolved.
Show resolved Hide resolved
b.bind(PerformanceCollectorService.class).toInstance(performanceCollectorService);
b.bind(SystemIndices.class).toInstance(systemIndices);
b.bind(IdentityService.class).toInstance(identityService);
b.bind(Tracer.class).toInstance(tracer);
Expand Down Expand Up @@ -1328,6 +1341,8 @@ public Node start() throws NodeValidationException {
injector.getInstance(RepositoriesService.class).start();
injector.getInstance(SearchService.class).start();
injector.getInstance(FsHealthService.class).start();
injector.getInstance(NodePerformanceTracker.class).start();
injector.getInstance(PerformanceCollectorService.class).start();
nodeService.getMonitorService().start();
nodeService.getSearchBackpressureService().start();
nodeService.getTaskCancellationMonitoringService().start();
Expand Down Expand Up @@ -1490,6 +1505,8 @@ private Node stop() {
injector.getInstance(ClusterService.class).stop();
injector.getInstance(NodeConnectionsService.class).stop();
injector.getInstance(FsHealthService.class).stop();
injector.getInstance(NodePerformanceTracker.class).stop();
injector.getInstance(PerformanceCollectorService.class).stop();
nodeService.getMonitorService().stop();
nodeService.getSearchBackpressureService().stop();
injector.getInstance(GatewayService.class).stop();
Expand Down Expand Up @@ -1553,6 +1570,10 @@ public synchronized void close() throws IOException {
toClose.add(nodeService.getSearchBackpressureService());
toClose.add(() -> stopWatch.stop().start("fsHealth"));
toClose.add(injector.getInstance(FsHealthService.class));
toClose.add(() -> stopWatch.stop().start("node_performance_tracker"));
toClose.add(injector.getInstance(NodePerformanceTracker.class));
toClose.add(() -> stopWatch.stop().start("performance_collector"));
toClose.add(injector.getInstance(PerformanceCollectorService.class));
toClose.add(() -> stopWatch.stop().start("gateway"));
toClose.add(injector.getInstance(GatewayService.class));
toClose.add(() -> stopWatch.stop().start("search"));
Expand Down
81 changes: 81 additions & 0 deletions server/src/main/java/org/opensearch/node/NodePerformanceStats.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.node;

import org.opensearch.core.common.io.stream.StreamInput;
import org.opensearch.core.common.io.stream.StreamOutput;
import org.opensearch.core.common.io.stream.Writeable;

import java.io.IOException;
import java.util.Locale;

/**
* This represents the performance stats of a node along with the timestamp at which the stats object was created
* in the respective node
*/
public class NodePerformanceStats implements Writeable {
final String nodeId;
long timestamp;
double cpuUtilizationPercent;
double memoryUtilizationPercent;

public NodePerformanceStats(String nodeId, long timestamp, double memoryUtilizationPercent, double cpuUtilizationPercent) {
this.nodeId = nodeId;
this.timestamp = timestamp;
this.cpuUtilizationPercent = cpuUtilizationPercent;
this.memoryUtilizationPercent = memoryUtilizationPercent;
}

public NodePerformanceStats(StreamInput in) throws IOException {
this.nodeId = in.readString();
this.timestamp = in.readLong();
this.cpuUtilizationPercent = in.readDouble();
this.memoryUtilizationPercent = in.readDouble();
}

@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeString(this.nodeId);
out.writeLong(this.timestamp);
out.writeDouble(this.cpuUtilizationPercent);
out.writeDouble(this.memoryUtilizationPercent);
}

@Override
public String toString() {
StringBuilder sb = new StringBuilder("NodePerformanceStatistics[");
sb.append(nodeId).append("](");
sb.append("Timestamp: ").append(timestamp);
sb.append(", CPU utilization percent: ").append(String.format(Locale.ROOT, "%.1f", cpuUtilizationPercent));
sb.append(", Memory utilization percent: ").append(String.format(Locale.ROOT, "%.1f", memoryUtilizationPercent));
sb.append(")");
return sb.toString();
}

NodePerformanceStats(NodePerformanceStats nodePerformanceStats) {
this(
nodePerformanceStats.nodeId,
nodePerformanceStats.timestamp,
nodePerformanceStats.memoryUtilizationPercent,
nodePerformanceStats.cpuUtilizationPercent
);
}

public double getMemoryUtilizationPercent() {
return memoryUtilizationPercent;
}

public double getCpuUtilizationPercent() {
return cpuUtilizationPercent;
}

public long getTimestamp() {
return timestamp;
}
}
9 changes: 7 additions & 2 deletions server/src/main/java/org/opensearch/node/NodeService.java
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ public class NodeService implements Closeable {
private final ScriptService scriptService;
private final HttpServerTransport httpServerTransport;
private final ResponseCollectorService responseCollectorService;
private final PerformanceCollectorService performanceCollectorService;
bharath-techie marked this conversation as resolved.
Show resolved Hide resolved
private final SearchTransportService searchTransportService;
private final IndexingPressureService indexingPressureService;
private final AggregationUsageService aggregationUsageService;
Expand Down Expand Up @@ -114,7 +115,8 @@ public class NodeService implements Closeable {
SearchBackpressureService searchBackpressureService,
SearchPipelineService searchPipelineService,
FileCache fileCache,
TaskCancellationMonitoringService taskCancellationMonitoringService
TaskCancellationMonitoringService taskCancellationMonitoringService,
PerformanceCollectorService performanceCollectorService
) {
this.settings = settings;
this.threadPool = threadPool;
Expand All @@ -137,6 +139,7 @@ public class NodeService implements Closeable {
this.clusterService = clusterService;
this.fileCache = fileCache;
this.taskCancellationMonitoringService = taskCancellationMonitoringService;
this.performanceCollectorService = performanceCollectorService;
clusterService.addStateApplier(ingestService);
clusterService.addStateApplier(searchPipelineService);
}
Expand Down Expand Up @@ -217,7 +220,8 @@ public NodeStats stats(
boolean weightedRoutingStats,
boolean fileCacheStats,
boolean taskCancellation,
boolean searchPipelineStats
boolean searchPipelineStats,
boolean nodesPerfStats
) {
// for indices stats we want to include previous allocated shards stats as well (it will
// only be applied to the sensible ones to use, like refresh/merge/flush/indexing stats)
Expand All @@ -237,6 +241,7 @@ public NodeStats stats(
discoveryStats ? discovery.stats() : null,
ingest ? ingestService.stats() : null,
adaptiveSelection ? responseCollectorService.getAdaptiveStats(searchTransportService.getPendingSearchRequests()) : null,
nodesPerfStats ? performanceCollectorService.stats() : null,
scriptCache ? scriptService.cacheStats() : null,
indexingPressure ? this.indexingPressureService.nodeStats() : null,
shardIndexingPressure ? this.indexingPressureService.shardStats(indices) : null,
Expand Down
Loading
Loading