From 9f76dc3149ba840f80993fa90dac31898b1994b1 Mon Sep 17 00:00:00 2001
From: Raymond Cheng
Date: Thu, 27 Jun 2024 16:44:39 -0700
Subject: [PATCH] docs: reorganize ordering of connect data (#1726)

---
 .../docs/contribute/connect-data/airbyte.md     |  3 ++-
 apps/docs/docs/contribute/connect-data/api.md   |  4 ++++
 .../docs/contribute/connect-data/cloudquery.md  |  3 ++-
 .../docs/contribute/connect-data/dagster.md     |  4 ++++
 .../docs/contribute/connect-data/database.md    |  4 ++++
 apps/docs/docs/contribute/connect-data/index.md | 17 ++++++++++-------
 apps/docs/docs/integrate/fork-pipeline.md       |  1 +
 apps/docs/docs/integrate/index.md               |  6 +++---
 8 files changed, 30 insertions(+), 12 deletions(-)
 create mode 100644 apps/docs/docs/contribute/connect-data/api.md
 create mode 100644 apps/docs/docs/contribute/connect-data/dagster.md
 create mode 100644 apps/docs/docs/contribute/connect-data/database.md

diff --git a/apps/docs/docs/contribute/connect-data/airbyte.md b/apps/docs/docs/contribute/connect-data/airbyte.md
index e1ca9707..c20d7fff 100644
--- a/apps/docs/docs/contribute/connect-data/airbyte.md
+++ b/apps/docs/docs/contribute/connect-data/airbyte.md
@@ -1,6 +1,7 @@
 ---
 title: 🏗️ Connect via Airbyte
-sidebar_position: 2
+sidebar_position: 7
+sidebar_class_name: hidden
 ---
 
 ## Replicating external databases
diff --git a/apps/docs/docs/contribute/connect-data/api.md b/apps/docs/docs/contribute/connect-data/api.md
new file mode 100644
index 00000000..6787621b
--- /dev/null
+++ b/apps/docs/docs/contribute/connect-data/api.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Crawl an API
+sidebar_position: 3
+---
diff --git a/apps/docs/docs/contribute/connect-data/cloudquery.md b/apps/docs/docs/contribute/connect-data/cloudquery.md
index 2485e6a1..f61702e2 100644
--- a/apps/docs/docs/contribute/connect-data/cloudquery.md
+++ b/apps/docs/docs/contribute/connect-data/cloudquery.md
@@ -1,6 +1,7 @@
 ---
 title: Connect via CloudQuery
-sidebar_position: 3
+sidebar_position: 8
+sidebar_class_name: hidden
 ---
 
 [CloudQuery](https://cloudquery.io) can be used to integrate external data sources
diff --git a/apps/docs/docs/contribute/connect-data/dagster.md b/apps/docs/docs/contribute/connect-data/dagster.md
new file mode 100644
index 00000000..5a80bdd8
--- /dev/null
+++ b/apps/docs/docs/contribute/connect-data/dagster.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Custom Dagster Assets
+sidebar_position: 5
+---
diff --git a/apps/docs/docs/contribute/connect-data/database.md b/apps/docs/docs/contribute/connect-data/database.md
new file mode 100644
index 00000000..79fdda3c
--- /dev/null
+++ b/apps/docs/docs/contribute/connect-data/database.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Replicate a Database
+sidebar_position: 2
+---
diff --git a/apps/docs/docs/contribute/connect-data/index.md b/apps/docs/docs/contribute/connect-data/index.md
index 457e1fcd..9c952add 100644
--- a/apps/docs/docs/contribute/connect-data/index.md
+++ b/apps/docs/docs/contribute/connect-data/index.md
@@ -10,12 +10,14 @@ We're always looking for new data sources to integrate with OSO and deepen our c
 There are currently the following patterns for integrating new data sources into OSO,
 in order of preference:
 
-1. [BigQuery public datasets](./bigquery.md): If you can maintain a BigQuery public dataset, this is the preferred and easiest route.
-2. [Airbyte plugins](./airbyte.md): Airbyte plugins are the preferred method for crawling APIs.
-3. [Database replication via Airbyte](./airbyte.md): Airbyte maintains off-the-shelf plugins for database replication (e.g. from Postgres).
-4. [CloudQuery plugins](./cloudquery.md): CloudQuery offers another, more flexible avenue for writing data import plugins.
-5. [Files into Google Cloud Storage (GCS)](./gcs.md): You can drop Parquet/CSV files in our GCS bucket for loading into BigQuery.
-6. Static files: If the data is high quality and can only be imported via static files, please reach out to us on [Discord](https://www.opensource.observer/discord) to coordinate hand-off. This path is predominantly used for [grant funding data](./funding-data.md).
+1. [**BigQuery public datasets**](./bigquery.md): If you can maintain a BigQuery public dataset, this is the preferred and easiest route.
+2. [**Database replication**](./database.md): Replicate your database into an OSO dataset (e.g. from Postgres).
+3. [**API crawling**](./api.md): Crawl an API by writing a plugin.
+4. [**Files into Google Cloud Storage (GCS)**](./gcs.md): You can drop Parquet/CSV files in our GCS bucket for loading into BigQuery.
+5. [**Custom Dagster assets**](./dagster.md): Write a custom Dagster asset for other unique data sources.
+6. **Static files**: If the data is high quality and can only be imported via static files, please reach out to us on [Discord](https://www.opensource.observer/discord) to coordinate hand-off. This path is predominantly used for [grant funding data](./funding-data.md).
+7. (deprecated) [Airbyte plugins](./airbyte.md): Airbyte plugins are the preferred method for crawling APIs.
+8. (deprecated) [CloudQuery plugins](./cloudquery.md): CloudQuery offers another, more flexible avenue for writing data import plugins.
 
 We generally prefer to work with data partners that can help us regularly index live data that can feed our daily data pipeline.
 
@@ -23,5 +25,6 @@ All data sources should be defined as
 [software-defined assets](https://docs.dagster.io/concepts/assets/software-defined-assets) in our Dagster configuration.
 
 ETL is the messiest, most high-touch part of the OSO data pipeline.
-Please reach out to us for help on [Discord](https://www.opensource.observer/discord).
+Please reach out to us for help on
+[Discord](https://www.opensource.observer/discord).
 We will happily work with you to get it working.
diff --git a/apps/docs/docs/integrate/fork-pipeline.md b/apps/docs/docs/integrate/fork-pipeline.md
index 1f155eda..e5dca999 100644
--- a/apps/docs/docs/integrate/fork-pipeline.md
+++ b/apps/docs/docs/integrate/fork-pipeline.md
@@ -1,6 +1,7 @@
 ---
 title: 🏗️ Fork the Data Pipeline
 sidebar_position: 6
+sidebar_class_name: hidden
 ---
 
 :::warning
diff --git a/apps/docs/docs/integrate/index.md b/apps/docs/docs/integrate/index.md
index 30e8f7c1..7ac3dc10 100644
--- a/apps/docs/docs/integrate/index.md
+++ b/apps/docs/docs/integrate/index.md
@@ -8,9 +8,9 @@ That means all source code, data, and infrastructure is publicly available for u
 
 - [**Get Started**](../get-started/index.mdx): to setup your Google account for data access and run your first query
 - [**Data Overview**](./overview/index.mdx): for an overview of all data available
-- [**BigQuery Studio Guide**](./query-data.mdx): to quickly query and download any data
+- [**API access**](./api.md): to integrate OSO metrics into a live production application
+- [**SQL Query Guide**](./query-data.mdx): to quickly query and download any data
 - [**Python notebooks**](./python-notebooks.md): to do more in-depth data science and processing
-- [**Fork the data pipeline**](./fork-pipeline.md): to setup your own data pipeline off any OSO model
 - [**Connect OSO to 3rd Party tools**](./3rd-party.mdx): like Hex.tech, Tableau, and Metabase
-- [**API access**](./api.md): to integrate OSO metrics into a live production application
+- [**Fork the data pipeline**](./fork-pipeline.md): to setup your own data pipeline off any OSO model
 - [**oss-directory**](./oss-directory.md): to leverage [oss-directory](https://github.com/opensource-observer/oss-directory) data separate from OSO