Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove filter out serverless cluster and add support to extract index name #8872

Merged
merged 3 commits into from
Nov 21, 2024

Conversation

ananzh
Copy link
Member

@ananzh ananzh commented Nov 15, 2024

Description

Allow extract index name for both serverless and non-serverless clusters Allow different key formats:

  • datasource-id::TIMESERIES:::0
  • datasource-id:::0
  • (non-serverless case)

Issues Resolved

NA

Screenshot

2024-11-14_17-49-18.mp4

Testing the changes

NA

Changelog

  • fix: Remove filter out serverless cluster and add support to extract index name

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Update CHANGELOG.md
  • Commits are signed per the DCO using --signoff

… name

Allow extract index name for both serverless and non-serverless clusters
Allow different key formats:
- datasource-id::TIMESERIES::<index-name>:0
- datasource-id::<index-name>:0
- <index-name> (non-serverless case)

Signed-off-by: Anan Zhuang <[email protected]>
Copy link

codecov bot commented Nov 15, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 60.87%. Comparing base (413697d) to head (cad3eca).
Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
...ery/query_string/dataset_service/lib/index_type.ts 77.77% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8872      +/-   ##
==========================================
- Coverage   60.93%   60.87%   -0.07%     
==========================================
  Files        3800     3802       +2     
  Lines       90878    91059     +181     
  Branches    14323    14376      +53     
==========================================
+ Hits        55380    55428      +48     
- Misses      31968    32092     +124     
- Partials     3530     3539       +9     
Flag Coverage Δ
Linux_1 29.01% <11.11%> (-0.01%) ⬇️
Linux_2 56.39% <ø> (ø)
Linux_3 37.89% <77.77%> (-0.04%) ⬇️
Linux_4 28.99% <20.00%> (-0.01%) ⬇️
Windows_1 29.02% <11.11%> (-0.01%) ⬇️
Windows_2 56.34% <ø> (ø)
Windows_3 37.89% <77.77%> (-0.04%) ⬇️
Windows_4 28.99% <20.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@kavilla kavilla added backport 2.x discover for discover reinvent labels Nov 15, 2024
aggregations: {
indices: {
buckets: [
// Serverless format with TIMESERIES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i think serverless is a product. probably shouldn't mention here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove this comment

// Handle the case of serverless cluster where key format is either:
// - datasource-id::TIMESERIES::<index-name>:0
// - datasource-id::<index-name>:0
// Note: Index names cannot contain ':' or '::' in OpenSearch, so these delimiters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have verified this correct?

Copy link
Member

@kavilla kavilla Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get sign off or buy in from the team formatting indices this way? it might even be worth export a const like DELIMINATER and using it. i think we have one in some plugins cant remember if it is global.

ideally in the future collections functions like workspaces accurate results since the datasource id i believe is getting long. and then our dataset takes the id over the parent and appends itself to it. our url is getting quiet long. Discover doesn't really need to know this information if the data source already knows what information.

also might need some insight on the collection concept. but it will hurt if the format changes.

the over engineered option would be to make an advanced settings that is like collection index format or something

// Note: Index names cannot contain ':' or '::' in OpenSearch, so these delimiters
// are guaranteed to be part of the serverless format, not the index name
const parts = key.split('::');
const lastPart = parts[parts.length - 1] || '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: buckets is being mapped so the key will exist and parts will always have a length

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const lastPart = parts[parts.length - 1] || '';
const lastPart = parts[parts.length - 1];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const lastPart = parts[parts.length - 1] || '';
const lastPart = parts[parts.length - 1] || key;

prolly not '' if you. decide to go with havinga fallback

// Note: Index names cannot contain ':' or '::' in OpenSearch, so these delimiters
// are guaranteed to be part of the serverless format, not the index name
const parts = key.split('::');
const lastPart = parts[parts.length - 1] || '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just verifying there's no way we would run into an exception here right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I verified both : and ::. Error is 400 invalid index name. Was trying to add screenshot but failed.

@@ -154,33 +152,6 @@ describe('s3TypeConfig', () => {
expect(result.children?.[0].title).toBe('DataSource 1');
expect(result.hasNext).toBe(true);
});

it('should filter out data sources with versions lower than 1.0.0', async () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps its worth to keep this test to but flip it so it shows that we do NOT filter out data sources


return rawResponse.aggregations.indices.buckets.map((bucket: { key: string }) => {
const key = bucket.key;
// Handle the case of serverless cluster where key format is either:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same as above i think serverless is a product. so prolly should be taken out

kavilla
kavilla previously approved these changes Nov 15, 2024
Copy link
Member

@kavilla kavilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome! i think we should get buy in and committment on the format not changing if this is how it will be otherwise we should consider adding a new data type collections

Signed-off-by: Anan Zhuang <[email protected]>
kavilla
kavilla previously approved these changes Nov 15, 2024
@ananzh
Copy link
Member Author

ananzh commented Nov 15, 2024

loll new commit will wipe out changelog then update changelog will wipe out approval
@kavilla could you re-approve? 🫨

@ananzh ananzh merged commit 1cb2511 into opensearch-project:main Nov 21, 2024
69 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 21, 2024
… name (#8872)

* Remove filter out serverless cluster and add support to extract index name

Allow extract index name for both serverless and non-serverless clusters
Allow different key formats:
- datasource-id::TIMESERIES::<index-name>:0
- datasource-id::<index-name>:0
- <index-name> (non-serverless case)

Signed-off-by: Anan Zhuang <[email protected]>

* fix PR comment

Signed-off-by: Anan Zhuang <[email protected]>

* Changeset file for PR #8872 created/updated

---------

Signed-off-by: Anan Zhuang <[email protected]>
Co-authored-by: opensearch-changeset-bot[bot] <154024398+opensearch-changeset-bot[bot]@users.noreply.github.com>
(cherry picked from commit 1cb2511)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants