Polish the code
jerryshao committed Apr 3, 2024
1 parent ab00ec0 commit d0dd632
Showing 2 changed files with 14 additions and 7 deletions.
15 changes: 11 additions & 4 deletions docs/hadoop-catalog.md
@@ -11,11 +11,14 @@ This software is licensed under the Apache License version 2."

Hadoop catalog is a fileset catalog that uses Hadoop Compatible File System (HCFS) to manage
the storage location of the fileset. Currently, it supports the local filesystem and HDFS. For
-object stores like S3, ADLS, and GCS, we haven't yet tested.
+object storage like S3, GCS, and Azure Blob Storage, you can put the corresponding Hadoop object
+store jar (such as hadoop-aws) into the `$GRAVITINO_HOME/catalogs/hadoop/libs` directory to
+enable the support. Gravitino itself hasn't yet tested the object storage support, so if you hit
+any issue, please create an [issue](https://github.com/datastrato/gravitino/issues).

Note that the Hadoop catalog is built against Hadoop 3; it should be compatible with both Hadoop
2.x and 3.x, since we don't leverage any new features in Hadoop 3. If there's any compatibility
-issue, please let us know.
+issue, please create an [issue](https://github.com/datastrato/gravitino/issues).
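
To make the HCFS point above concrete, here is a minimal sketch using only Hadoop's own `FileSystem` API (it assumes `hadoop-client` is on the classpath, and the path below is just an illustrative placeholder) of how a storage location is resolved purely from its URI scheme. This is why dropping a jar such as hadoop-aws into `$GRAVITINO_HOME/catalogs/hadoop/libs` is enough to make `s3a://` locations resolvable:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HcfsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // The URI scheme selects the FileSystem implementation: file:// is the local
    // filesystem, hdfs:// is HDFS, s3a:// is S3 (and needs hadoop-aws on the classpath).
    URI storageLocation = URI.create("file:///tmp/gravitino/demo_fileset");

    FileSystem fs = FileSystem.get(storageLocation, conf);
    fs.mkdirs(new Path(storageLocation));   // create the directory that backs a fileset
    System.out.println(fs.getFileStatus(new Path(storageLocation)));
  }
}
```

The same scheme-based resolution is what the Hadoop catalog relies on when it works with a fileset's storage location, which is why only the extra jar is needed for object storage.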

## Catalog

@@ -33,7 +36,7 @@ Refer to [Catalog operations](./manage-fileset-metadata-using-gravitino.md#catal

### Schema capabilities

-The Hadoop catalog supports creating, updating, and deleting schema.
+The Hadoop catalog supports creating, updating, deleting, and listing schemas.

### Schema properties

@@ -49,8 +52,12 @@ Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-

### Fileset capabilities

-- The Hadoop catalog supports creating, updating, and deleting filesets.
+- The Hadoop catalog supports creating, updating, deleting, and listing filesets.

+### Fileset properties

+None.

### Fileset operations

Refer to [Fileset operations](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.
6 changes: 3 additions & 3 deletions docs/manage-fileset-metadata-using-gravitino.md
@@ -14,8 +14,8 @@ out in Gravitino, which is a collection of files and directories. Users can leve
fileset to manage non-tabular data like training datasets and raw data.

Typically, a fileset maps to a directory on a file system like HDFS, S3, ADLS, GCS, etc.
-With fileset managed by Gravitino, the non-tabular data can be managed as assets in
-Gravitino with an unified way.
+With filesets managed by Gravitino, the non-tabular data can be managed as assets together with
+tabular data and others in Gravitino in a unified way.

After a fileset is created, users can easily access and manage the files/directories through the
fileset's identifier, without needing to know the physical path of the managed datasets. Also, with
@@ -24,7 +24,7 @@ control mechanism without needing to set access controls to different storages.

To use filesets, we assume that:

-- Gravitino has just started, and the host and port is [http://localhost:8090](http://localhost:8090).
+- The Gravitino server is launched, and the host and port are [http://localhost:8090](http://localhost:8090).
- A metalake has been created.
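
Given those assumptions, the sketch below shows what a first call against that server could look like: creating a fileset catalog that uses the Hadoop catalog provider via the REST API, using Java's built-in `HttpClient`. The metalake and catalog names, the `location` property, and the exact request-body fields are assumptions for illustration only; the authoritative request and response formats are in the catalog, schema, and fileset operation sections referenced in this document.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateHadoopFilesetCatalog {
  public static void main(String[] args) throws Exception {
    // Assumed: the server runs at localhost:8090 and a metalake named "metalake" exists.
    String url = "http://localhost:8090/api/metalakes/metalake/catalogs";

    // The body shape follows the catalog-operations docs this page links to; treat the
    // field values (names, location) as placeholders rather than required settings.
    String body = """
        {
          "name": "fileset_catalog",
          "type": "FILESET",
          "provider": "hadoop",
          "comment": "fileset catalog backed by HCFS",
          "properties": { "location": "file:///tmp/gravitino/fileset_catalog" }
        }""";

    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
        .header("Content-Type", "application/json")
        .header("Accept", "application/vnd.gravitino.v1+json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

Schema and fileset creation follow the same pattern against nested paths under the catalog, as described in the operation sections below.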

## Catalog operations
