diff --git a/Cargo.toml b/Cargo.toml
index fdac025..8ae210b 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -14,7 +14,6 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
 [package]
 name = "catalog2"
 version = "0.1.0"
@@ -23,13 +22,9 @@ license = "Apache-2.0"
 repository = "https://github.com/cmu-db/15721-s24-catalog2"
 rust-version = "1.75.0"
-
-
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 [dependencies]
 rocket = { version = "0.5.0", features = ["json", "http2"] }
-# iceberg = { src = "./libs/iceberg" }
-# dotenv = "0.15.0"
 pickledb = "^0.5.0"
 derive_builder = "0.20.0"
 serde_json = "1.0.79"
diff --git a/doc/design_doc.md b/doc/design_doc.md
index b5a123d..38dddf6 100644
--- a/doc/design_doc.md
+++ b/doc/design_doc.md
@@ -10,12 +10,18 @@ The goal of this project is to design and implement a **Catalog Service** for an
 ## Architectural Design
 We follow the logic model described below. The input of our service comes from execution engine and I/O service. And we will provide metadata to planner and scheduler. We will use [pickleDB](https://docs.rs/pickledb/latest/pickledb/) as the key-value store to store (namespace, tables) and (table_name, metadata) as two (key, value) pairs as local db files. We will use [Rocket](https://rocket.rs) as the web framework handling incoming API traffic.
+We follow the logic model described below. The input of our service comes from execution engine and I/O service. And we will provide metadata to planner and scheduler. We will use [pickleDB](https://docs.rs/pickledb/latest/pickledb/) as the key-value store to store (namespace, tables) and (table_name, metadata) as two (key, value) pairs as local db files.
+We will use [Rocket](https://rocket.rs) as the web framework handling incoming API traffic.
 ![system architecture](./assets/system-architecture.png)
 ### Data Model
 We adhere to the Iceberg data model, arranging tables based on namespaces, with each table uniquely identified by its name. For every namespace in the database, there are associated list of tables. For every table in the catalog, there are associated metadata, including statistics, version, table-uuid, location, last-column-id, schema, and partition-spec. The parameters for request and response can be referenced from [REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml). We directly import Iceberg-Rust as a starting point.
+We adhere to the Iceberg data model, arranging tables based on namespaces, with each table uniquely identified by its name.
+For every namespace in the database, there are associated list of tables.
+For every table in the catalog, there are associated metadata, including statistics, version, table-uuid, location, last-column-id, schema, and partition-spec.
+The parameters for request and response can be referenced from [REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml). We directly import Iceberg-Rust as a starting point.
 ### Use Cases
 #### Namespace
@@ -35,6 +41,7 @@ get metadeta by {namespace}/{table}
 * Centralized metadata management achieved by separating data and metadata, reducing complexity and facilitating consistent metadata handling.
 * Code modularity and clear interfaces facilitate easier updates and improvements.
 * We adopt the existing kvstore ([pickleDB](https://docs.rs/pickledb/latest/pickledb/)) and server ([Rocket](https://github.com/rwf2/Rocket)) to mitigate the engineering complexity.
+ * We adopt the existing kvstore ([pickleDB](https://docs.rs/pickledb/latest/pickledb/)) and server ([Rocket](https://github.com/rwf2/Rocket)) to mitigate the engineering complexity.
 * Testing:
 * Comprehensive testing plans cover correctness through unit tests and performance through long-running regression tests. Unit tests focus on individual components of the catalog service, while regression tests evaluate system-wide performance and stability.
 * Other Implementations: