[FEATURE] Flint metadata log improvement #121

Open
dai-chen opened this issue Nov 1, 2023 · 0 comments
Labels
enhancement New feature or request

dai-chen (Collaborator) commented Nov 1, 2023

Is your feature request related to a problem?

PR #110 introduced the initial implementation of the Flint metadata log. In particular, the latestId field in the Flint index metadata points to the latest metadata log entry, which is stored as a single OpenSearch document per Flint index in a separate OpenSearch index. See the PR above for more details.

The problems with this simple implementation:

  1. Only the latest metadata log entry is available; no history is kept
  2. An auto-refresh index always stays in the REFRESHING state
  3. Metrics are mixed into the metadata log, e.g. for an auto-refresh index, lastUpdateTime represents the heartbeat timestamp
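To make these three problems concrete, here is a minimal Python sketch of the in-place model described above. It is a hypothetical simulation, not the Flint code: a dict stands in for the separate OpenSearch index, and keying the log document by index name, the class name `InPlaceMetadataLog`, and the field names are all assumptions for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LogEntry:
    # Hypothetical entry shape; field names are illustrative only.
    doc_id: str
    state: str
    last_update_time: float


class InPlaceMetadataLog:
    """Hypothetical sketch of the current model: a single log document per
    Flint index, overwritten on every update. A dict stands in for the
    separate OpenSearch index that holds the log documents."""

    def __init__(self) -> None:
        self._docs: Dict[str, LogEntry] = {}

    def update(self, index_name: str, state: str, now: Optional[float] = None) -> LogEntry:
        ts = time.time() if now is None else now
        entry = LogEntry(doc_id=index_name, state=state, last_update_time=ts)
        # Problem 1: the previous entry is overwritten, so no history survives.
        self._docs[index_name] = entry
        return entry

    def heartbeat(self, index_name: str, now: Optional[float] = None) -> None:
        # Problem 3: the heartbeat reuses last_update_time, so the field no
        # longer says when the entry itself was created or updated.
        self._docs[index_name].last_update_time = time.time() if now is None else now

    def latest(self, index_name: str) -> LogEntry:
        return self._docs[index_name]
```

For example, calling `update("flint_idx", "ACTIVE")`, then `update("flint_idx", "REFRESHING")` when auto refresh starts, then `heartbeat("flint_idx")` leaves exactly one document: it is still REFRESHING (problem 2, since nothing moves it back to ACTIVE), the ACTIVE entry is gone, and its timestamp is the heartbeat time.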

What solution would you like?

  1. [Low Priority] Append a new metadata log entry instead of updating the existing one in place (this may require a dedicated OpenSearch index or S3 folder)
  2. Figure out how to transition to REFRESHING at the beginning of each micro batch and back to ACTIVE once it completes (exactly as a manual refresh does)
  3. Move metrics such as the heartbeat elsewhere, so that lastUpdateTime only represents the creation/update time of the metadata log entry
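The three proposed changes can be sketched together in Python. This is a hypothetical, in-memory illustration, not a proposed Flint API: lists stand in for the dedicated OpenSearch index or S3 folder, and `AppendOnlyMetadataLog`, `run_micro_batch`, and the field names are invented for the example. Entries are immutable and get increasing ids (solution 1), each micro batch goes through REFRESHING and back to ACTIVE (solution 2), and heartbeats are stored outside the log (solution 3).

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical entry shape; entries are never modified after being appended.
    id: int
    index_name: str
    state: str
    last_update_time: float  # creation time of this entry only (solution 3)
    add_files: Tuple[str, ...] = ()


class AppendOnlyMetadataLog:
    """Hypothetical sketch of solutions 1-3. In-memory lists stand in for a
    dedicated OpenSearch index or S3 folder."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[LogEntry]] = {}
        self._next_id = 0
        # Solution 3: heartbeats are tracked separately from log entries.
        self.heartbeats: Dict[str, float] = {}

    def append(self, index_name: str, state: str,
               add_files: Iterable[str] = (), now: Optional[float] = None) -> LogEntry:
        # Solution 1: every change becomes a new entry with a larger id.
        self._next_id += 1
        ts = time.time() if now is None else now
        entry = LogEntry(self._next_id, index_name, state, ts, tuple(add_files))
        self._entries.setdefault(index_name, []).append(entry)
        return entry

    def latest(self, index_name: str) -> LogEntry:
        return self._entries[index_name][-1]

    def history(self, index_name: str) -> List[LogEntry]:
        return list(self._entries.get(index_name, []))

    def heartbeat(self, index_name: str, now: Optional[float] = None) -> None:
        # Does not touch any log entry, so last_update_time stays meaningful.
        self.heartbeats[index_name] = time.time() if now is None else now

    def run_micro_batch(self, index_name: str, new_files: Iterable[str],
                        start: float, end: float) -> None:
        # Solution 2: the same ACTIVE -> REFRESHING -> ACTIVE cycle as a manual
        # refresh, repeated for every micro batch of an auto-refresh index.
        self.append(index_name, "REFRESHING", now=start)
        # (the actual refresh work for new_files would run here)
        self.append(index_name, "ACTIVE", add_files=new_files, now=end)
```

After one initial ACTIVE entry and one micro batch, `history()` returns ACTIVE, REFRESHING, ACTIVE with ids 1, 2, 3, while a later heartbeat only updates `heartbeats` and leaves every entry's `last_update_time` untouched. A real implementation would also need to handle a failed micro batch, which this sketch ignores.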

Example of successive metadata log entries for a skipping index under the proposed design:

{
    "id": 122,
    "kind": "skipping",
    "indexedColumns": [...],
    ...,
    "source": {
      "name": "ds_tables.http_logs",
      "addFiles": ["s3://A", "s3://B" ...]
    },
    "state": "Active"
},
{
    "id": 123,
    "kind": "skipping",
    "indexedColumns": [...],
    ...,
    "source": {
      "name": "ds_tables.http_logs"
    },
    "state": "Refreshing"
},
{
    "id": 124,
    "kind": "skipping",
    "indexedColumns": [...],
    ...,
    "source": {
      "name": "ds_tables.http_logs",
      "addFiles": ["s3://C"]
    },
    "state": "Active"
}
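With an append-only log like this, the current state and the full set of indexed source files could be recovered by replaying entries in id order. The Python sketch below assumes that `addFiles` lists the files added by the batch that produced the entry; that reading, and the `replay` helper, are hypothetical. It uses only the file names spelled out in the example and drops the elided fields.

```python
from typing import List, Optional, Set, Tuple

# The three example entries, keeping only the fields that are spelled out.
ENTRIES: List[dict] = [
    {"id": 122, "kind": "skipping", "state": "Active",
     "source": {"name": "ds_tables.http_logs", "addFiles": ["s3://A", "s3://B"]}},
    {"id": 123, "kind": "skipping", "state": "Refreshing",
     "source": {"name": "ds_tables.http_logs"}},
    {"id": 124, "kind": "skipping", "state": "Active",
     "source": {"name": "ds_tables.http_logs", "addFiles": ["s3://C"]}},
]


def replay(entries: List[dict]) -> Tuple[Optional[str], Set[str]]:
    """Hypothetical helper: fold entries in id order into (latest state, all added files)."""
    state: Optional[str] = None
    files: Set[str] = set()
    for entry in sorted(entries, key=lambda e: e["id"]):
        state = entry["state"]
        # Entries without addFiles (e.g. a Refreshing marker) add nothing.
        files.update(entry["source"].get("addFiles", []))
    return state, files
```

Replaying all three entries yields state "Active" and files s3://A, s3://B, and s3://C, while replaying only up to id 123 yields "Refreshing", which is the kind of point-in-time view the current single-document log cannot provide.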