[#2695] feat(doc): Add docs for fileset catalog #2781

Merged
merged 6 commits into from
Apr 3, 2024
Changes from 4 commits
63 changes: 63 additions & 0 deletions docs/hadoop-catalog.md
@@ -0,0 +1,63 @@
---
title: "Hadoop catalog"
slug: /hadoop-catalog
date: 2024-4-2
keyword: hadoop catalog
license: "Copyright 2024 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2."
---

## Introduction

The Hadoop catalog is a fileset catalog that uses the Hadoop Compatible File System (HCFS) to manage
the storage location of filesets. Currently, it supports the local filesystem and HDFS. For
object stores such as S3, GCS, and Azure Blob Storage, you can enable support by placing the
corresponding Hadoop object store jar, such as hadoop-aws, in the
`$GRAVITINO_HOME/catalogs/hadoop/libs` directory. Gravitino has not yet tested object storage
support, so if you run into any problems, please create an [issue](https://github.com/datastrato/gravitino/issues).
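
As a rough illustration of the jar placement step, the sketch below copies a jar into the `catalogs/hadoop/libs` directory under a Gravitino home. The helper name `install_object_store_jar` is hypothetical, and the demo runs against a temporary directory tree standing in for `$GRAVITINO_HOME` with an empty placeholder file instead of a real jar. Which jars a given object store actually needs is not covered here.

```python
import shutil
import tempfile
from pathlib import Path


def install_object_store_jar(jar_path, gravitino_home):
    """Copy a Hadoop object store jar into the Hadoop catalog's libs directory.

    Hypothetical helper: Gravitino does not ship this function.
    """
    libs_dir = Path(gravitino_home) / "catalogs" / "hadoop" / "libs"
    if not libs_dir.is_dir():
        raise FileNotFoundError(f"Hadoop catalog libs directory not found: {libs_dir}")
    source = Path(jar_path)
    if source.suffix != ".jar":
        raise ValueError(f"expected a .jar file, got: {source.name}")
    # copy2 returns the path of the newly created file inside libs_dir.
    return Path(shutil.copy2(source, libs_dir))


# Demo against a throwaway tree that stands in for $GRAVITINO_HOME.
with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp) / "gravitino"
    (fake_home / "catalogs" / "hadoop" / "libs").mkdir(parents=True)
    fake_jar = Path(tmp) / "hadoop-aws-example.jar"
    fake_jar.write_bytes(b"")  # placeholder, not a real jar
    installed = install_object_store_jar(fake_jar, fake_home)
    print(installed.relative_to(fake_home))
```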

Note that Gravitino builds the Hadoop catalog against Hadoop 3 dependencies. In theory, it should be
compatible with both Hadoop 2.x and 3.x, since Gravitino doesn't use any features that are new in
Hadoop 3. If you hit a compatibility issue, please create an [issue](https://github.com/datastrato/gravitino/issues).

## Catalog

### Catalog properties

| Property name | Description                                         | Default value | Required | Since version |
|---------------|-----------------------------------------------------|---------------|----------|---------------|
| `location`    | The storage location managed by the Hadoop catalog. | (none)        | No       | 0.5.0         |
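
To make the optional `location` property concrete, here is a minimal sketch that builds, but does not send, an HTTP request creating a Hadoop fileset catalog. Everything except the `location` property is an assumption for illustration: the server address, the metalake name, the endpoint path, the `Accept` header, and the `type` and `provider` values. Check them against the catalog operations guide before relying on them.

```python
import json
import urllib.request
from typing import Optional

# Assumed values, not stated in this document.
GRAVITINO_URL = "http://localhost:8090"
METALAKE = "example_metalake"


def build_create_catalog_request(name: str, location: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a request that creates a Hadoop fileset catalog."""
    properties = {}
    if location is not None:
        # `location` is optional: the storage location managed by the catalog.
        properties["location"] = location
    payload = {
        "name": name,
        "type": "FILESET",      # assumed catalog type for fileset catalogs
        "provider": "hadoop",   # assumed provider name for the Hadoop catalog
        "comment": "example fileset catalog",
        "properties": properties,
    }
    return urllib.request.Request(
        url=f"{GRAVITINO_URL}/api/metalakes/{METALAKE}/catalogs",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.gravitino.v1+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_catalog_request("example_catalog", "hdfs://namenode:9000/user/gravitino")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server. Omitting `location` produces an empty `properties` object, matching the "Required: No" entry in the table.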

### Catalog operations

Refer to [Catalog operations](./manage-fileset-metadata-using-gravitino.md#catalog-operations) for more details.

## Schema

### Schema capabilities

The Hadoop catalog supports creating, updating, deleting, and listing schemas.

### Schema properties

| Property name | Description                                        | Default value | Required | Since version |
|---------------|----------------------------------------------------|---------------|----------|---------------|
| `location`    | The storage location managed by the Hadoop schema. | (none)        | No       | 0.5.0         |
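
The schema-level `location` can be sketched the same way. The snippet below only assembles a JSON body. The body shape, the names, and the endpoint path in the comment are assumptions for illustration, not taken from this page.

```python
import json
from typing import Optional


def schema_request_body(name: str, location: Optional[str] = None) -> str:
    """Return a JSON body for creating a schema; `location` is optional."""
    body = {"name": name, "comment": "example schema", "properties": {}}
    if location is not None:
        body["properties"]["location"] = location
    return json.dumps(body, indent=2)


# Assumed endpoint (hypothetical names):
# POST /api/metalakes/example_metalake/catalogs/example_catalog/schemas
print(schema_request_body("example_schema", "file:/tmp/root/example_schema"))
```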

### Schema operations

Refer to [Schema operations](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.

## Fileset

### Fileset capabilities

The Hadoop catalog supports creating, updating, deleting, and listing filesets.

### Fileset properties

None.

### Fileset operations

Refer to [Fileset operations](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.
8 changes: 8 additions & 0 deletions docs/index.md
@@ -53,6 +53,8 @@ REST API and the Java SDK. You can use either to manage metadata. See
metalakes.
* [Manage relational metadata using Gravitino](./manage-relational-metadata-using-gravitino.md)
to learn how to manage relational metadata.
* [Manage fileset metadata using Gravitino](./manage-fileset-metadata-using-gravitino.md) to learn
how to manage fileset metadata.

Also, you can find the complete REST API definition in
[Gravitino Open API](./api/rest/gravitino-rest-api), and the
@@ -72,6 +74,10 @@ Gravitino currently supports the following catalogs:
Gravitino also provides an Iceberg REST catalog service for the Iceberg table format. See the
[Iceberg REST catalog service](./iceberg-rest-service.md) for details.

**Fileset catalogs:**

* [**Hadoop catalog**](./hadoop-catalog.md)

## Gravitino playground

To experience Gravitino with other components easily, Gravitino provides a playground to run. It
@@ -99,6 +105,8 @@ Gravitino supports different catalogs to manage the metadata in different source
* [Hive catalog](./apache-hive-catalog.md): a complete guide to using Gravitino to manage Apache Hive data.
* [MySQL catalog](./jdbc-mysql-catalog.md): a complete guide to using Gravitino to manage MySQL data.
* [PostgreSQL catalog](./jdbc-postgresql-catalog.md): a complete guide to using Gravitino to manage PostgreSQL data.
* [Hadoop catalog](./hadoop-catalog.md): a complete guide to using Gravitino to manage filesets
using the Hadoop Compatible File System (HCFS).

### Trino connector
