[FEATURE] Object Storage (S3) Data Ingestion through Streaming Query
Is your feature request related to a problem?
One of the key technical challenges in #719 is how to maintain consistency between the base table (S3 data) and derived tables (OpenSearch indexes / materialized views).
What solution would you like?
One solution to this problem is to refresh new data from S3 into OpenSearch incrementally. We propose to enhance our query engine by unifying batch processing and stream processing in a single architecture, as existing solutions such as Apache Flink and Spark do. In particular, the enhancement includes changes to query planning, the query execution engine, and the query plan itself (see the sketch below).
PoC branch: https://github.com/opensearch-project/sql/tree/poc/maximus-m1. A detailed user manual and design doc will be published later, as planned below.
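To make the proposal concrete, here is a minimal sketch (not taken from the PoC branch; all class, interface, and method names are hypothetical) of incremental refresh framed as a checkpointed micro-batch loop: each run picks up only the S3 objects that arrived since the last successful run and indexes them into the derived OpenSearch table.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical sketch of incremental refresh as a checkpointed micro-batch loop. */
public class IncrementalRefresher {

  /** Lists S3 objects added after the given checkpoint (hypothetical interface). */
  interface S3Source {
    List<String> newObjectsSince(Instant checkpoint);
  }

  /** Indexes the given S3 objects into the derived OpenSearch index (hypothetical interface). */
  interface OpenSearchSink {
    void bulkIndex(List<String> objectKeys);
  }

  private final S3Source source;
  private final OpenSearchSink sink;
  private Instant checkpoint; // last successfully processed point in time

  IncrementalRefresher(S3Source source, OpenSearchSink sink, Instant start) {
    this.source = source;
    this.sink = sink;
    this.checkpoint = start;
  }

  /** One micro-batch: process only the data that arrived since the last run. */
  void refreshOnce() {
    Instant batchStart = Instant.now();
    List<String> newObjects = source.newObjectsSince(checkpoint);
    if (!newObjects.isEmpty()) {
      sink.bulkIndex(newObjects); // can reuse the existing batch execution path
    }
    // Advance the checkpoint only after the batch succeeds, so a failed
    // batch is simply retried on the next run (at-least-once semantics).
    checkpoint = batchStart;
  }
}
```

Treating each micro-batch as an ordinary batch query over a bounded slice of new data is also how Spark Structured Streaming unifies the two models, which is why the same query plan and execution engine can serve both modes.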
What alternatives have you considered?
The alternative is to rebuild the derived table (a full refresh) on user demand or on a regular basis. This can be done with the current batch processing architecture; however, it introduces significant overhead for large S3 datasets (see the contrast sketch below).
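For contrast, and reusing the hypothetical interfaces from the sketch above, a full refresh amounts to relisting and reprocessing the entire dataset on every run:

```java
// Hypothetical contrast: a full refresh reprocesses the whole dataset,
// so its cost grows with total S3 size rather than with the new-data delta.
void fullRefresh(S3Source source, OpenSearchSink sink) {
  sink.bulkIndex(source.newObjectsSince(Instant.EPOCH)); // relist everything from t = 0
}
```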
Do you have any additional context?
Phase 1
Goal:
Tasks:
Phase 2
Goal:
Tasks:
Phase 3
Goal:
Tasks: