
Fix recover index bug when Flint data index is deleted accidentally #241

Conversation

@dai-chen (Collaborator) commented Feb 1, 2024

Description

This PR addresses two related issues:

  1. Quick fix: the recover index API now cleans up the metadata log entry if the Flint data index is gone. This prevents the index from being stuck in the refreshing state and avoids infinite retries via the recover index API.
  2. Added a check to prevent FlintJob from hanging when the recover statement launches no streaming job.
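The two fixes above can be sketched as follows. This is a minimal Python sketch of the control flow only (the actual implementation is Scala in FlintSpark/FlintJob); `data_index_exists`, `delete_log_entry`, and `start_refresh_job` are hypothetical helpers, not names from the PR:

```python
def recover_index(index_name, data_index_exists, delete_log_entry, start_refresh_job):
    """Sketch of the fixed recover-index flow (hypothetical helper callables).

    Fix 1: if the Flint data index was deleted, remove the stale metadata
    log entry instead of restarting refresh, so the index does not stay
    stuck in the 'refreshing' state and recover attempts do not loop forever.
    Fix 2: report whether a streaming job was actually launched, so the
    caller (FlintJob) only awaits termination when one is running.
    """
    if not data_index_exists(index_name):
        delete_log_entry(index_name)   # clean up orphaned metadata log entry
        return False                   # no streaming job launched
    start_refresh_job(index_name)      # normal path: restart streaming refresh
    return True                        # caller may await termination
```

A caller would then only block on `awaitTermination`-style logic when the function returns `True`, which is what prevents the FlintJob hang described in item 2.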

Documentation

Updated user manual: https://github.com/dai-chen/opensearch-spark/blob/fix-recover-index-for-index-data-deleted/docs/index.md#index-job-management

TODO

  1. Integration test with FlintJob [checking with @kaituo whether an IT can be added]
  2. [BUG] Gracefully terminate index refresh job when Flint index deleted accidentally #244

Testing

Manually tested to confirm the fix. First, replicate the problematic scenario as outlined below:

CREATE SKIPPING INDEX ON stream.lineitem_tiny
(l_shipdate VALUE_SET)
WITH ( auto_refresh = true );

# Delete Flint data index
DELETE flint_myglue_stream_lineitem_tiny_skipping_index

# Check index state metadata log
GET .query_execution_request_myglue/_search

# Response hit (excerpt):
      {
        "_index": ".query_execution_request_myglue",
        "_id": "ZmxpbnRfbXlnbHVlX3N0cmVhbV9saW5laXRlbV90aW55X3NraXBwaW5nX2luZGV4",
        "_score": 1,
        "_source": {
          "version": "1.0",
          "latestId": "ZmxpbnRfbXlnbHVlX3N0cmVhbV9saW5laXRlbV90aW55X3NraXBwaW5nX2luZGV4",
          "type": "flintindexstate",
          "state": "refreshing",
          "applicationId": "unknown",
          "jobId": "unknown",
          "dataSourceName": "myglue",
          "jobStartTime": 1706916312007,
          "lastUpdateTime": 1706916452474,
          "error": ""
        }
      }
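An orphaned entry like the one above is recognizable from its `_source`: the state is still `refreshing` even though the data index behind it is gone. A hedged Python sketch of that check (the function name `is_orphaned_log_entry` is mine, not from the PR; whether the data index exists is passed in as a flag):

```python
def is_orphaned_log_entry(source: dict, data_index_exists: bool) -> bool:
    """Return True if a metadata log entry refers to a deleted Flint data index.

    An entry is orphaned when it is a Flint index-state document still in
    the 'refreshing' state while the data index no longer exists.
    """
    return (
        source.get("type") == "flintindexstate"
        and source.get("state") == "refreshing"
        and not data_index_exists
    )

# The _source shown above, reduced to the relevant fields:
entry = {"type": "flintindexstate", "state": "refreshing", "dataSourceName": "myglue"}
```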

Now verify the enhanced recover index API:

spark-sql> RECOVER INDEX JOB flint_myglue_stream_lineitem_tiny_skipping_index;
24/02/02 23:48:31 WARN FlintSpark: Cleaning up metadata log as index data has been deleted

# The metadata log is gone
GET .query_execution_request_myglue/_search
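This final verification can be automated by asserting that the search returns no hits. A small Python sketch over a parsed `_search` response (the helper name is mine; it handles the `hits.total` object form returned by recent OpenSearch versions as well as a bare integer):

```python
def log_entry_deleted(search_response: dict) -> bool:
    """True if the metadata log index contains no matching documents."""
    total = search_response.get("hits", {}).get("total", 0)
    if isinstance(total, dict):
        # OpenSearch returns total as {"value": N, "relation": "eq"}
        total = total.get("value", 0)
    return total == 0
```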

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen added bug Something isn't working 0.2 backport 0.1 labels Feb 1, 2024
@dai-chen dai-chen self-assigned this Feb 1, 2024
@dai-chen dai-chen marked this pull request as ready for review February 2, 2024 23:50
Signed-off-by: Chen Dai <[email protected]>
@dai-chen dai-chen changed the title Fix recover index bug when index data is deleted Fix recover index bug when Flint data index is deleted accidentally Feb 3, 2024
@vmmusings (Member) commented:

@dai-chen RECOVER INDEX JOB flint_myglue_stream_lineitem_tiny_skipping_index;
What is the output of this command? Will it be the same in all cases?

@dai-chen (Collaborator, Author) commented Feb 7, 2024

@dai-chen RECOVER INDEX JOB flint_myglue_stream_lineitem_tiny_skipping_index; What is the output of this command? Will it be the same in all the cases.

We follow Spark DDL conventions and return an empty result on success for all Flint DDL statements.

@dai-chen dai-chen merged commit f4744ab into opensearch-project:main Feb 7, 2024
4 checks passed
@opensearch-trigger-bot commented:
The backport to 0.1 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/opensearch-spark/backport-0.1 0.1
# Navigate to the new working tree
pushd ../.worktrees/opensearch-spark/backport-0.1
# Create a new branch
git switch --create backport/backport-241-to-0.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f4744abf38caeb4f22758f112a32c8881842efad
# Push it to GitHub
git push --set-upstream origin backport/backport-241-to-0.1
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/opensearch-spark/backport-0.1

Then, create a pull request where the base branch is 0.1 and the compare/head branch is backport/backport-241-to-0.1.

dai-chen added a commit to dai-chen/opensearch-spark that referenced this pull request Feb 7, 2024
…pensearch-project#241)

* Clean up metadata log in recover index API

Signed-off-by: Chen Dai <[email protected]>

* Await termination only if there is streaming job running

Signed-off-by: Chen Dai <[email protected]>

* Update user manual

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>
@dai-chen dai-chen deleted the fix-recover-index-for-index-data-deleted branch February 7, 2024 23:48
penghuo pushed a commit that referenced this pull request Feb 14, 2024
… accidentally (#247)

* Fix recover index bug when Flint data index is deleted accidentally (#241)

* Clean up metadata log in recover index API

Signed-off-by: Chen Dai <[email protected]>

* Await termination only if there is streaming job running

Signed-off-by: Chen Dai <[email protected]>

* Update user manual

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>

* Cherry pick vacuum index changes

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>