Add vacuum index API and SQL support #189

Merged

@dai-chen dai-chen commented Dec 11, 2023

Description

This is the first PR for #104. The related PRs are planned as follows:

  • [Current PR] Add vacuumIndex API and SQL support: adds the vacuumIndex() API and the VACUUM SQL statement for skipping indexes, covering indexes, and materialized views (MV). Vacuum deletes both the Flint index data and the metadata log entry.
  • [Pending] Change the deleteIndex API and the DROP index statement to perform a logical delete
  • [Pending] Change the query rewriter to ignore logically deleted indexes
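
To make the intended split between logical and physical deletion concrete, here is a minimal in-memory sketch in Python. It is not the Flint implementation: `ToyFlintStore`, its fields, and `vacuum_index` are hypothetical names, and the rule that vacuum only accepts an index whose log entry is in the `deleted` state is an assumption inferred from the Testing section, which mocks that state before running VACUUM.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFlintStore:
    """Hypothetical in-memory stand-in for index data plus metadata log."""

    index_data: dict = field(default_factory=dict)    # index name -> documents
    metadata_log: dict = field(default_factory=dict)  # index name -> log entry

    def vacuum_index(self, index_name: str) -> bool:
        """Physically remove a logically deleted index.

        Returns False when no metadata log entry exists for the index.
        Assumption: only entries in state "deleted" may be vacuumed.
        """
        entry = self.metadata_log.get(index_name)
        if entry is None:
            return False
        if entry["state"] != "deleted":
            raise ValueError(
                f"cannot vacuum {index_name!r} in state {entry['state']!r}"
            )
        # Drop the index data first, then the metadata log entry.
        self.index_data.pop(index_name, None)
        del self.metadata_log[index_name]
        return True


store = ToyFlintStore()
name = "flint_myglue_ds_tables_http_logs_skipping_index"
store.index_data[name] = [{"file_path": "s3://example/part-0"}]  # made-up row
store.metadata_log[name] = {"state": "deleted"}

vacuumed = store.vacuum_index(name)
print(vacuumed, name in store.index_data, name in store.metadata_log)
```

In this sketch a second `vacuum_index` call on the same name returns `False` rather than raising, which is one possible design choice and not something this PR specifies.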

Documentation

Testing

# Mock a logical delete, since the DROP INDEX change comes in a later PR
PUT .query_execution_request_myglue/_doc/ZmxpbnRfbXlnbHVlX2RzX3RhYmxlc19odHRwX2xvZ3Nfc2tpcHBpbmdfaW5kZXg=
{
  "version": "1.0",
  "latestId": "ZmxpbnRfbXlnbHVlX2RzX3RhYmxlc19odHRwX2xvZ3Nfc2tpcHBpbmdfaW5kZXg=",
  "type": "flintindexstate",
  "state": "deleted",
  "applicationId": "unknown",
  "jobId": "unknown",
  "dataSourceName": "myglue",
  "jobStartTime": 0,
  "lastUpdateTime": 1702329935279,
  "error": ""
}

# Run vacuum statement
spark-sql> VACUUM SKIPPING INDEX ON ds_tables.http_logs;

# Both the index data and the metadata log entry are deleted
GET flint_myglue_ds_tables_http_logs_skipping_index/_mapping
{
  "error": {
    ...
    "type": "index_not_found_exception",
    "reason": "no such index [flint_myglue_ds_tables_http_logs_skipping_index]",
    "index": "flint_myglue_ds_tables_http_logs_skipping_index",
    "resource.id": "flint_myglue_ds_tables_http_logs_skipping_index",
    "resource.type": "index_or_alias",
    "index_uuid": "_na_"
  },
  "status": 404
}
GET .query_execution_request_myglue/_doc/ZmxpbnRfbXlnbHVlX2RzX3RhYmxlc19odHRwX2xvZ3Nfc2tpcHBpbmdfaW5kZXg=
{
  "_index": ".query_execution_request_myglue",
  "_id": "ZmxpbnRfbXlnbHVlX2RzX3RhYmxlc19odHRwX2xvZ3Nfc2tpcHBpbmdfaW5kZXg=",
  "found": false
}
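
The document ID used in the requests above is the Base64 encoding of the Flint index name, which is easy to check by hand. The Python sketch below reproduces it. The helper names are hypothetical, and both the naming pattern (`flint_<datasource>_<table with dots replaced by underscores>_skipping_index`) and the idea that Flint always derives the ID this way are assumptions inferred from this one transcript, not a documented contract.

```python
import base64


def flint_skipping_index_name(data_source: str, table: str) -> str:
    """Hypothetical helper: build the index name seen in the transcript."""
    return f"flint_{data_source}_{table.replace('.', '_')}_skipping_index"


def metadata_log_doc_id(flint_index_name: str) -> str:
    """Hypothetical helper: Base64-encode the index name as the doc ID."""
    return base64.b64encode(flint_index_name.encode("utf-8")).decode("ascii")


index_name = flint_skipping_index_name("myglue", "ds_tables.http_logs")
doc_id = metadata_log_doc_id(index_name)

print(index_name)  # flint_myglue_ds_tables_http_logs_skipping_index
print(doc_id)      # ZmxpbnRfbXlnbHVlX2RzX3RhYmxlc19odHRwX2xvZ3Nfc2tpcHBpbmdfaW5kZXg=
```

Decoding the ID from the transcript with `base64.b64decode` gives back the same index name, which is a quick way to identify which index a metadata log entry belongs to.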

TODO

  1. Add SQL integration tests (IT) later, since they depend on the logical delete change in the next PR
  2. [TBD] Figure out how to clean up checkpoint data

Issues Resolved

#104

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen added the enhancement New feature or request label Dec 11, 2023
@dai-chen dai-chen self-assigned this Dec 11, 2023
@dai-chen dai-chen added the 0.2 label Jan 5, 2024
@dai-chen dai-chen merged commit 804b3aa into opensearch-project:main Jan 5, 2024
4 of 5 checks passed
@dai-chen dai-chen deleted the add-vacuum-index-statement branch January 5, 2024 00:45