add task resource tracking service to cluster service #14681

Closed
server/src/main/java/org/opensearch/cluster/service/ClusterService.java
@@ -54,6 +54,7 @@
import org.opensearch.common.settings.Settings;
import org.opensearch.index.IndexingPressureService;
import org.opensearch.node.Node;
import org.opensearch.tasks.TaskResourceTrackingService;
import org.opensearch.telemetry.metrics.noop.NoopMetricsRegistry;
import org.opensearch.threadpool.ThreadPool;

@@ -92,6 +93,7 @@
private RerouteService rerouteService;

private IndexingPressureService indexingPressureService;
private TaskResourceTrackingService taskResourceTrackingService;

public ClusterService(Settings settings, ClusterSettings clusterSettings, ThreadPool threadPool) {
this(settings, clusterSettings, threadPool, new ClusterManagerMetrics(NoopMetricsRegistry.INSTANCE));
@@ -265,6 +267,24 @@
return indexingPressureService;
}

/**
* Getter for {@link TaskResourceTrackingService}. This method exposes task-level resource usage for other components to use.
*
* @return TaskResourceTrackingService
*/
public TaskResourceTrackingService getTaskResourceTrackingService() {
return taskResourceTrackingService;

// Codecov (codecov/patch) warning: added line #L276 in server/src/main/java/org/opensearch/cluster/service/ClusterService.java was not covered by tests
}

/**
* Setter for {@link TaskResourceTrackingService}
*
* @param taskResourceTrackingService taskResourceTrackingService
*/
public void setTaskResourceTrackingService(TaskResourceTrackingService taskResourceTrackingService) {
this.taskResourceTrackingService = taskResourceTrackingService;
}

public ClusterApplierService getClusterApplierService() {
return clusterApplierService;
}
1 change: 1 addition & 0 deletions server/src/main/java/org/opensearch/node/Node.java
@@ -1109,6 +1109,7 @@ protected Node(
clusterService.getClusterSettings(),
threadPool
);
clusterService.setTaskResourceTrackingService(taskResourceTrackingService);
Collaborator

Should we just initialize TaskResourceTrackingService in ClusterService then, since we're creating a dependency here anyway?

Also, should SetOnce be used?
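
To illustrate the SetOnce question, here is a minimal sketch of a write-once field, assuming the reviewer means a holder that rejects a second assignment. `WriteOnce` and `ClusterServiceSketch` are hypothetical stand-ins written for this example; they are not Lucene's `SetOnce` or the real `ClusterService`, and the dependency is typed as `Object` only to keep the sketch self-contained.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical write-once holder illustrating the idea behind a SetOnce-style field.
final class WriteOnce<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();

    // Stores the value on the first call; later calls fail instead of silently overwriting.
    void set(T value) {
        if (value == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        if (!ref.compareAndSet(null, value)) {
            throw new IllegalStateException("value has already been set");
        }
    }

    // Returns the stored value, or null if set() has not been called yet.
    T get() {
        return ref.get();
    }
}

// Hypothetical stand-in for a service that receives a dependency after construction.
final class ClusterServiceSketch {
    private final WriteOnce<Object> taskResourceTrackingService = new WriteOnce<>();

    void setTaskResourceTrackingService(Object service) {
        taskResourceTrackingService.set(service);
    }

    Object getTaskResourceTrackingService() {
        return taskResourceTrackingService.get();
    }
}
```

With a plain mutable field, as in the diff, a second call to the setter would replace the service without any signal; the write-once variant turns that into an explicit failure.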


final SearchBackpressureSettings searchBackpressureSettings = new SearchBackpressureSettings(
settings,
@@ -16,6 +16,7 @@
import org.opensearch.ExceptionsHelper;
import org.opensearch.action.search.SearchShardTask;
import org.opensearch.common.SuppressForbidden;
import org.opensearch.common.annotation.PublicApi;
import org.opensearch.common.inject.Inject;
import org.opensearch.common.settings.ClusterSettings;
import org.opensearch.common.settings.Setting;
@@ -51,6 +52,7 @@
/**
* Service that helps track resource usage of tasks running on a node.
*/
@PublicApi(since = "2.16.0")
Member

I understand why you are marking this public, in that it is now exposed publicly from ClusterService. Is this the right level of visibility for this service? Can we expose only the required functionality from ClusterService instead of the whole of TaskResourceTrackingService? And/or maybe this should be a separate interface for plugins that is not exposed through ClusterService?
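
One way to picture the "separate interface" option is the sketch below. It is only an illustration under assumptions: `TaskUsageView`, `TrackingServiceSketch`, and `PluginSketch` are hypothetical names invented here, and nothing in the PR defines them. The point is that a plugin can be handed a narrow read-only view while the full service stays internal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical narrow, read-only view that could be handed to plugins.
interface TaskUsageView {
    long cpuTimeNanos(long taskId);
}

// Hypothetical stand-in for the full internal service; it has mutating methods
// that plugins should not be able to call.
final class TrackingServiceSketch implements TaskUsageView {
    private final Map<Long, Long> cpuByTask = new HashMap<>();

    // Internal: accumulate CPU time for a task.
    void record(long taskId, long cpuNanos) {
        cpuByTask.merge(taskId, cpuNanos, Long::sum);
    }

    // Internal: forget a task once it is done.
    void stopTracking(long taskId) {
        cpuByTask.remove(taskId);
    }

    @Override
    public long cpuTimeNanos(long taskId) {
        return cpuByTask.getOrDefault(taskId, 0L);
    }
}

// Hypothetical plugin-side consumer that only sees the narrow view.
final class PluginSketch {
    private final TaskUsageView usage;

    PluginSketch(TaskUsageView usage) {
        this.usage = usage;
    }

    String describe(long taskId) {
        return "task " + taskId + " cpu=" + usage.cpuTimeNanos(taskId) + "ns";
    }
}
```

Only the view type would need a public-API commitment in this arrangement; the concrete service and its dependencies would not.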

Member Author

@ansjcy ansjcy Jul 15, 2024

We can add a wrapper around refreshResourceStats in ClusterService, but one could argue about whether it belongs there. With the right level of encapsulation, operations related to task-level resource usage should belong only to TaskResourceTrackingService. But I agree that making the whole service public is also risky. I'm open to suggestions. cc @reta

Collaborator

@reta reta Jul 16, 2024

@ansjcy the TaskResourceTrackingService is internal to OpenSearch (and has no relation to the ClusterService either), so it should not be exposed to plugins. Regarding the issue itself:

> Currently we are not refreshing task level resource usages on coordinator node for searchTasks, which means all coordinator node resource usage will be 0.

That seems to be a problem the core implementation has to fix: why are the task-level resources not refreshed (on the coordinator node)?

Member Author

@ansjcy ansjcy Jul 16, 2024

Hi @reta! Thanks for the input. Currently the TaskResourceTrackingService only refreshes task usages when a task ends. But in our case we want to get the resource usage in a SearchOperationsListener, which will be triggered before a task finishes.

Let me think about this more. Instead of exposing the TaskResourceTrackingService, would it make sense to you if we did a usage refresh in core before the listeners are called (in this function: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/search/AbstractSearchAsyncAction.java#L469)?
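
The ordering proposed here (refresh the usage in core, then call the listeners) can be sketched as follows. This is a hypothetical illustration, not the code at the linked line: `SearchActionSketch`, `RequestEndListenerSketch`, and the `LongSupplier` used to read usage are all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical listener that is notified when a request ends.
interface RequestEndListenerSketch {
    void onRequestEnd(long cpuNanosSnapshot);
}

// Hypothetical stand-in for the core search action that owns the listeners.
final class SearchActionSketch {
    private final LongSupplier usageReader; // assumed source of the task's current usage
    private final List<RequestEndListenerSketch> listeners = new ArrayList<>();

    SearchActionSketch(LongSupplier usageReader) {
        this.usageReader = usageReader;
    }

    void addListener(RequestEndListenerSketch listener) {
        listeners.add(listener);
    }

    // Core refreshes the usage first, so every listener sees the same fresh snapshot
    // and no listener needs direct access to the tracking service.
    void finishRequest() {
        long snapshot = usageReader.getAsLong();
        for (RequestEndListenerSketch listener : listeners) {
            listener.onRequestEnd(snapshot);
        }
    }
}
```

In this shape the listeners receive data only; the component that knows how to measure usage stays inside core.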

Collaborator

@reta reta Jul 16, 2024

Thanks @ansjcy

> thanks for the input. Currently the TaskResourceTrackingService only refreshes task usages when a task ends.

That does not sound right: the task is still ongoing, so its usage won't be correct.

> Let me think about this more. Instead of exposing the TaskResourceTrackingService, does it make sense to you If we can do a usage refresh in core before the listeners are called (in this function:

The logical point (at least to me) for capturing task usage seems to be the moment the task ends. It looks to me like you are trying to hook in somewhere in between (while the task is still executing), and that does not look like the correct way.

Contributor

No, we add per-request listener instance in the TransportSearchAction::executeRequest. The query insights plugin has nothing to do with it at this moment of time, but we capture the search task resource usage upon request completion, so the tracking data becomes available to everyone (including the query insights plugin). Does it make sense?

@reta We are currently working on a PR based on the above discussion.

For the last point you made, what other justification and work is required to make the API public? We are trying to get all the query insights changes into 2.16, and this is the only PR that is currently dangling. We want to make sure we reach a path forward. Please let us know your suggestions. In the meantime, I will finalize the above draft PR if there are no concerns with this approach.

Collaborator

@reta reta Jul 18, 2024

Thanks @deshsidd

> For the last point you made, what other justification and work is required to make the API public?

I did not design the original APIs, you may ask the contributor if he has any concerns. On the second point, if you need to make it public, apply the @PublicApi annotation accordingly.

Contributor

Thanks @reta!
cc @buddharajusahil @sgup432 @dzane17 @jainankitk Please let us know your thoughts since you all have contributed to the file.

@reta Looks like you had initially reduced the visibility of the API.

For now I am going to continue with the approach that Reta and Chenyang discussed above and work on the following PR. I will also make the SearchRequestOperationsListener @PublicApi as part of these changes.

Collaborator

> @reta Looks like you had initially reduced the visibility of the API.

@deshsidd yes, you will understand why once you try to apply @PublicApi to it :-) : it pulls in a pile of dependencies with it... (I very much doubt it was designed to be public in the first place).

Contributor

Understood.
@reta and others - Please take a look at #14832 and let me know your thoughts.

@SuppressForbidden(reason = "ThreadMXBean#getThreadAllocatedBytes")
public class TaskResourceTrackingService implements RunnableTaskExecutionListener {

@@ -357,6 +359,7 @@ public TaskResourceInfo getTaskResourceUsageFromThreadContext() {
/**
* Listener that gets invoked when a task execution completes.
*/
@PublicApi(since = "2.16.0")
public interface TaskCompletionListener {
void onTaskCompleted(Task task);
}
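
As a rough illustration of how a completion callback of this shape might be consumed, the sketch below implements a listener that records finished task IDs. It deliberately uses hypothetical stand-ins (`TaskSketch`, `TaskCompletionListenerSketch`, `CompletedTaskRecorder`) rather than the real `org.opensearch.tasks.Task` and `TaskCompletionListener`, so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.opensearch.tasks.Task, reduced to an ID.
final class TaskSketch {
    private final long id;

    TaskSketch(long id) {
        this.id = id;
    }

    long getId() {
        return id;
    }
}

// Mirrors the single-method shape of the interface in the diff, using the stand-in task type.
interface TaskCompletionListenerSketch {
    void onTaskCompleted(TaskSketch task);
}

// Hypothetical listener that remembers which tasks have completed, in order.
final class CompletedTaskRecorder implements TaskCompletionListenerSketch {
    private final List<Long> completedIds = new ArrayList<>();

    @Override
    public void onTaskCompleted(TaskSketch task) {
        completedIds.add(task.getId());
    }

    // Returns a copy so callers cannot modify the recorder's internal list.
    List<Long> completedIds() {
        return new ArrayList<>(completedIds);
    }
}
```

Because the callback fires only when a task ends, anything it captures reflects the task's final state rather than a mid-execution snapshot.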