Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Labels/Triage for OpenSearch Core Repo #11617

Closed
wants to merge 5 commits into from

Conversation

macohen
Copy link
Contributor

@macohen macohen commented Dec 14, 2023

Description

If maintainers agree, this should provide documentation on the triage process followed in core along with definition of some of the labels. Could use additional definition for some of the blank ones, but this is a start.

Related Issues

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…labels in core for easier understanding of what's there

Signed-off-by: Mark Cohen <[email protected]>
@github-actions github-actions bot added the RFC Issues requesting major changes label Dec 14, 2023
@msfroh msfroh added skip-changelog and removed feedback needed Issue or PR needs feedback RFC Issues requesting major changes labels Dec 14, 2023
@github-actions github-actions bot added feedback needed Issue or PR needs feedback RFC Issues requesting major changes labels Dec 14, 2023
Copy link
Contributor

github-actions bot commented Dec 14, 2023

Compatibility status:

Checks if related components are compatible with change 8a3481e

Incompatible components

Incompatible components: [https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git]

Copy link
Contributor

❕ Gradle check result for 9491101: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.index.reindex.UpdateByQueryBasicTests.testMultipleSources
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Dec 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e265355) 71.41% compared to head (8ce29c5) 71.34%.
Report is 114 commits behind head on main.

❗ Current head 8ce29c5 differs from pull request most recent head 8a3481e. Consider uploading reports for the commit 8a3481e to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11617      +/-   ##
============================================
- Coverage     71.41%   71.34%   -0.08%     
+ Complexity    59397    59367      -30     
============================================
  Files          4923     4923              
  Lines        279212   279178      -34     
  Branches      40595    40581      -14     
============================================
- Hits         199408   199181     -227     
- Misses        63223    63437     +214     
+ Partials      16581    16560      -21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting to write this up, however; I'm not sure that I like this living it its own file, seems like it could be part of the contributing.md How do you see this being used?

1. Read through the issue to understand the request.
1. If the issue does not meet the template requirements, ask the requester for missing details and keep the untriaged label until all the details have been disclosed .
2. Move to 3 if you have all the details on the associated issue.
3. If the issue does not impact the OpenSearch core repository, then transfer the issue to the appropriate team using the Transfer issue button on the lower right hand corner. If you don’t have the access to transfer, then you can add comment by mentioning [@opensearch-project/admin](https://github.com/orgs/opensearch-project/teams/admin) to do the transfer for you. If not, move to step 4.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the @opensearch-project/triage team able to handle these requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think someone in the triage team can handle #3 unless they have maintainer status in both repos. Otherwise, I'd say they can. Are you suggesting that they do handle them? There are 120 people in that team today and I don't think all of them have eyes on this repo. Maybe maintainers or appointed label owners (could be one or more than one) who are contributors (not necessarily maintainers) to core could do this work.

# Triage in OpenSearch Core
## What Do We Do With These Labels Anyway?

Across all repositories in opensearch-project, we have a process called “triage.” The goal of this process is to make sure repository owners and maintainers check new issues to understand them. First - we learn if the issue is security related (e.g. CVE), second - we understand if we have enough information to act on the issue, third - we re-label or respond on the issue. During triage we do not work on these issues, but instead just try to get the issue to a next step. Because the OpenSearch core repository (https://github.com/opensearch-project/OpenSearch) has so many different functions with different sets of contributors working on these functions, repo owners decided to treat these functions like separate repositories when triaging to get through the backlog faster and have more focus. There are a few maintainers that do triage in public (see https://meetup.com/OpenSearch to join) and there will be more coming.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triage is described in the project wide guidance [1], what do you think about referencing that guidance and call out how this project is deviates?

@@ -0,0 +1,49 @@
# Triage in OpenSearch Core
## What Do We Do With These Labels Anyway?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tone is a little casual to my taste, consider changing this title

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Whatever, man. ;)

Suggested change
## What Do We Do With These Labels Anyway?
## Why are there Specialized Labels in Core?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do have a style guide for the whole project in this regard: https://github.com/opensearch-project/documentation-website/blob/main/STYLE_GUIDE.md#tone

@@ -0,0 +1,49 @@
# Triage in OpenSearch Core
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd highly recommend looking at the security team's triage guidance [1], while the format is based off communicating how the triage meeting operates the FAQ format might be useful to highlight edge cases.

1. Every issue is created with an untriaged label by default.
2. No less than once per week, untriaged issues should be reviewed:
1. Read through the issue to understand the request.
1. If the issue does not meet the template requirements, ask the requester for missing details and keep the untriaged label until all the details have been disclosed .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its important that 'untriaged' label don't linger, with this repro's high volume of incoming issues, I might recommend stronger guidance to close with a message such as "Thanks for filing - after reviewing this issue there are details missing that are needed to accept this issue and have closed this issue. Please update the issue to the missing all details and re-open so we know to look at it again."

In the security repo we handle this by having a separate 'triaged' flag that is only added during the triage meeting. Therefore anyone can quickly look at untriaged issues, remove the untriaged, and only during the open meeting are they accepted. YMMV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a fine line between building the community and continuing to have the problems we have, but I'm open to it. Would definitely like to hear from other maintainers on this one, but I can make the update.

Suggested change
1. If the issue does not meet the template requirements, ask the requester for missing details and keep the untriaged label until all the details have been disclosed .
1. If the issue does not meet the template requirements, close the issue with a comment like: "Thanks for contributing - after reviewing this issue there are details missing that are needed to accept this issue and have closed this issue. Please update the issue to include X, Y, and Z re-open so we know to look at it again. We look forward to working with you on this."

5. At any point a maintainer or issue owner can add or remove a label which will help clarify the work that needs to be done. This RFC will also become a markdown file in the OpenSearch repo and we will add a link to the issue templates in the repo so issue creators can reference this process to help it go smoother.


TABLE 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of items manually included on this table, do we need this table? What would you think about pointing folks to the labels in the project or the template so we don't have to keep this up to date?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure these labels stand alone in definition. They may need renaming or a reference. I agree that automation and simplicity would be better. If there was a way to have a description as metadata for a label then I think that could work better than a separate read me. Or if the labels were more self-describing.

3. Make sure to enrich the issue with more details by adding your thoughts, add more context to the issue, or ask the author for more clarification if needed ...etc.
4. Remove the untriaged label
5. Add the component level labels based on TABLE 1
5. At any point a maintainer or issue owner can add or remove a label which will help clarify the work that needs to be done. This RFC will also become a markdown file in the OpenSearch repo and we will add a link to the issue templates in the repo so issue creators can reference this process to help it go smoother.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC will also become a markdown file in the OpenSearch repo and we will add a link to the issue templates in the repo so issue creators can reference this process to help it go smoother.

Should this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES! Thanks. Will do.


Across all repositories in opensearch-project, we have a process called “triage.” The goal of this process is to make sure repository owners and maintainers check new issues to understand them. First - we learn if the issue is security related (e.g. CVE), second - we understand if we have enough information to act on the issue, third - we re-label or respond on the issue. During triage we do not work on these issues, but instead just try to get the issue to a next step. Because the OpenSearch core repository (https://github.com/opensearch-project/OpenSearch) has so many different functions with different sets of contributors working on these functions, repo owners decided to treat these functions like separate repositories when triaging to get through the backlog faster and have more focus. There are a few maintainers that do triage in public (see https://meetup.com/OpenSearch to join) and there will be more coming.

Here is an example of the workflow for a new issue in the core repo.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a body of maintainers/contributors that agreeing to work in this way, not a generic example.

Note; I'm not part of any such team for core, so I triage on my own whims and timescales, otherwise I'd volunteer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to whatever the maintainers/other contributors want to do here. I haven't seen any other objections to date so this example remains. Can you suggest something more concrete? Not sure I understand what you mean.

Thanks for taking on this review!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any luck finding a first follower(s) on this topic? I am concerned about documenting a process in a document that no one follows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll post this to the Slack #maintainers channel and ask for folks to review this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I side with @peternied here - we should leave the process(es) and workflow(s) up to repository maintainers to decide. The pace of changes varies greatly between repositories.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The triaging that's happening now in core isn't enough, IMO. Do maintainers agree?

That is very fair, we don't have triage meetings for core specifically but we should I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @reta. Are we back to the idea of a core public triage meeting, then? It may make sense to develop the process from there and maintainers can document the process used and decide on useful labels with the community there. @peternied, @reta, I can help set something like that up with you (I'm not a core maintainer so I would be open to facilitating and recommending, but the maintainers should determine the process).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macohen You're trying to document the process that is currently being done, right? Much of this happens internally within Amazon teams, but some of it happens in open community meetings, but to date the actually process isn't documented anywhere in the open. Is that right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @reta. Are we back to the idea of a core public triage meeting, then? I

I would love to see that happening (it is going well for security / ml-commons / search relevance) but I do not see (personally, as of today) how we could make it work: there are many components (or subsystems) in there, search relevance is an example of cutting a small piece of the core and looking into it, but there are many others pieces.

Having one meeting to go over the OpenSearch core is an option (we need at least few leads to be present there) - it may be efficient use of maintaners' time but not users times. Having N meetings (one per component or subsystem) is another option that has own pros and cons.

Depending on the way we would like to approach the problem, the triaging rules may differ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macohen You're trying to document the process that is currently being done, right? Much of this happens internally within Amazon teams, but some of it happens in open community meetings, but to date the actually process isn't documented anywhere in the open. Is that right?

Right. I don't know what the process is across all repos and I think the process has recently become more confusing and unclear so I'm attempting a declarative document that says what should be. If it's the wrong declaration, then let's change the document, but let's document the right thing. And let's keep this specific to OpenSearch repo since that's where the complexity is. I would also say that we need to make this process public and I am even more certain that the maintainers should own it.

Copy link
Contributor

✅ Gradle check result for 830f92d: SUCCESS

@peternied
Copy link
Member

@macohen I've merged the doc describing the public triage meeting that I'll be driving. I think most of the content in this PR is covered by TRIAGING.md.

I don't think we will be merging this PR, but before closing this one out I'd recommend reviewing the other doc to see if there is anything you've got here that you'd like to include in that doc, Happy to review PRs / issues in this space - thanks for spurring this effort forward!

Copy link
Contributor

❕ Gradle check result for 8ce29c5: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Contributor

❌ Gradle check result for 8a3481e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Feb 20, 2024
@peternied
Copy link
Member

@macohen I'm going to close out this PR, let us know if you'd like to revisit it

@peternied peternied closed this Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feedback needed Issue or PR needs feedback RFC Issues requesting major changes skip-changelog stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] New Labels for OpenSearch Core Repo
5 participants