From 1a0675342630cb94d428c21a52657a8a5f547a59 Mon Sep 17 00:00:00 2001
From: Jerry Shao
Date: Wed, 3 Apr 2024 11:15:57 +0800
Subject: [PATCH] Polish the code

---
 docs/hadoop-catalog.md                          | 15 +++++++++++----
 docs/manage-fileset-metadata-using-gravitino.md |  6 +++---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/docs/hadoop-catalog.md b/docs/hadoop-catalog.md
index aa158de1ea0..ff27783ad76 100644
--- a/docs/hadoop-catalog.md
+++ b/docs/hadoop-catalog.md
@@ -11,11 +11,14 @@ This software is licensed under the Apache License version 2."
 
 Hadoop catalog is a fileset catalog that using Hadoop Compatible File System (HCFS) to manage
 the storage location of the fileset. Currently, it supports local filesystem and HDFS. For
-object stores like S3, ADLS, and GCS, we haven't yet tested.
+object storage like S3, GCS, and Azure Blob Storage, you can put the Hadoop object store jar,
+such as `hadoop-aws`, into the `$GRAVITINO_HOME/catalogs/hadoop/libs` directory to enable support.
+Gravitino itself hasn't yet tested the object storage support, so if you run into any issues,
+please create an [issue](https://github.com/datastrato/gravitino/issues).
 
 Note that the Hadoop catalog is built against Hadoop 3, it should be compatible with both
 Hadoop 2.x and 3.x, since we don't leverage any new features in Hadoop 3. If there's any compatibility
-issue, please let us know.
+issue, please create an [issue](https://github.com/datastrato/gravitino/issues).
 
 ## Catalog
 
@@ -33,7 +36,7 @@ Refer to [Catalog operations](./manage-fileset-metadata-using-gravitino.md#catal
 
 ### Schema capabilities
 
-The Hadoop catalog supports creating, updating, and deleting schema.
+The Hadoop catalog supports creating, updating, deleting, and listing schemas.
 
 ### Schema properties
 
@@ -49,8 +52,12 @@ Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-
 
 ### Fileset capabilities
 
-- The Hadoop catalog supports creating, updating, and deleting filesets.
+- The Hadoop catalog supports creating, updating, deleting, and listing filesets.
 
 ### Fileset properties
 
 No.
+
+### Fileset operations
+
+Refer to [Fileset operations](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.
diff --git a/docs/manage-fileset-metadata-using-gravitino.md b/docs/manage-fileset-metadata-using-gravitino.md
index 12d1430fe7c..1b4a506890b 100644
--- a/docs/manage-fileset-metadata-using-gravitino.md
+++ b/docs/manage-fileset-metadata-using-gravitino.md
@@ -14,8 +14,8 @@ out in Gravitino, which is a collection of files and directories. Users can leve
 fileset to manage non-tabular data like training datasets, raw data. Typically, a fileset
 is mapping to a directory on a file system like HDFS, S3, ADLS, GCS, etc.
 
-With fileset managed by Gravitino, the non-tabular data can be managed as assets in
-Gravitino with an unified way.
+With filesets managed by Gravitino, non-tabular data can be managed together with
+tabular data and other assets in Gravitino in a unified way.
 
 After fileset is created, users can easily access, manage the files/directories through
 Fileset's identifier, without needing to know the physical path of the managed datasets. Also, with
@@ -24,7 +24,7 @@ control mechanism without needing to set access controls to different storages.
 
 To use fileset, we assume that:
 
- - Gravitino has just started, and the host and port is [http://localhost:8090](http://localhost:8090).
+ - The Gravitino server has been launched, and the host and port are [http://localhost:8090](http://localhost:8090).
 - Metalake has been created.
 
 ## Catalog operations
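
The object storage paragraph added above can be exercised roughly as below. This is a sketch only: the `catalogs/hadoop/libs` path comes from the patch, but the jar version and the scratch `GRAVITINO_HOME` are assumptions (in a real deployment you would copy the actual `hadoop-aws` bundle matching your Hadoop version into your installation directory and restart the server).

```shell
# Sketch: enable S3 support for the Hadoop catalog by dropping the
# hadoop-aws bundle into the catalog's libs directory.
GRAVITINO_HOME="$(mktemp -d)"                   # stand-in for a real installation
mkdir -p "$GRAVITINO_HOME/catalogs/hadoop/libs"
touch "$GRAVITINO_HOME/hadoop-aws-3.3.6.jar"    # stand-in for the downloaded jar;
                                                # the version here is an assumption
cp "$GRAVITINO_HOME/hadoop-aws-3.3.6.jar" "$GRAVITINO_HOME/catalogs/hadoop/libs/"
ls "$GRAVITINO_HOME/catalogs/hadoop/libs"
```

Per the patch's own caveat, this object storage path is untested by Gravitino, so verify against your storage before relying on it.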