diff --git a/apps/docs/docs/contribute/connect-data/airbyte.md b/apps/docs/docs/contribute/connect-data/airbyte.md
index e1ca9707d..c20d7ffff 100644
--- a/apps/docs/docs/contribute/connect-data/airbyte.md
+++ b/apps/docs/docs/contribute/connect-data/airbyte.md
@@ -1,6 +1,7 @@
 ---
 title: 🏗️ Connect via Airbyte
-sidebar_position: 2
+sidebar_position: 7
+sidebar_class_name: hidden
 ---
 
 ## Replicating external databases
diff --git a/apps/docs/docs/contribute/connect-data/api.md b/apps/docs/docs/contribute/connect-data/api.md
new file mode 100644
index 000000000..6787621b5
--- /dev/null
+++ b/apps/docs/docs/contribute/connect-data/api.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Crawl an API
+sidebar_position: 3
+---
diff --git a/apps/docs/docs/contribute/connect-data/cloudquery.md b/apps/docs/docs/contribute/connect-data/cloudquery.md
index 2485e6a14..f61702e28 100644
--- a/apps/docs/docs/contribute/connect-data/cloudquery.md
+++ b/apps/docs/docs/contribute/connect-data/cloudquery.md
@@ -1,6 +1,7 @@
 ---
 title: Connect via CloudQuery
-sidebar_position: 3
+sidebar_position: 8
+sidebar_class_name: hidden
 ---
 
 [CloudQuery](https://cloudquery.io) can be used to integrate external data sources
diff --git a/apps/docs/docs/contribute/connect-data/dagster.md b/apps/docs/docs/contribute/connect-data/dagster.md
new file mode 100644
index 000000000..5a80bdd88
--- /dev/null
+++ b/apps/docs/docs/contribute/connect-data/dagster.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Custom Dagster Assets
+sidebar_position: 5
+---
diff --git a/apps/docs/docs/contribute/connect-data/database.md b/apps/docs/docs/contribute/connect-data/database.md
new file mode 100644
index 000000000..79fdda3ce
--- /dev/null
+++ b/apps/docs/docs/contribute/connect-data/database.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Replicate a Database
+sidebar_position: 2
+---
diff --git a/apps/docs/docs/contribute/connect-data/index.md b/apps/docs/docs/contribute/connect-data/index.md
index 457e1fcd7..9c952addf 100644
--- a/apps/docs/docs/contribute/connect-data/index.md
+++ b/apps/docs/docs/contribute/connect-data/index.md
@@ -10,12 +10,14 @@ We're always looking for new data sources to integrate with OSO and deepen our c
 There are currently the following patterns for integrating new data sources into OSO,
 in order of preference:
 
-1. [BigQuery public datasets](./bigquery.md): If you can maintain a BigQuery public dataset, this is the preferred and easiest route.
-2. [Airbyte plugins](./airbyte.md): Airbyte plugins are the preferred method for crawling APIs.
-3. [Database replication via Airbyte](./airbyte.md): Airbyte maintains off-the-shelf plugins for database replication (e.g. from Postgres).
-4. [CloudQuery plugins](./cloudquery.md): CloudQuery offers another, more flexible avenue for writing data import plugins.
-5. [Files into Google Cloud Storage (GCS)](./gcs.md): You can drop Parquet/CSV files in our GCS bucket for loading into BigQuery.
-6. Static files: If the data is high quality and can only be imported via static files, please reach out to us on [Discord](https://www.opensource.observer/discord) to coordinate hand-off. This path is predominantly used for [grant funding data](./funding-data.md).
+1. [**BigQuery public datasets**](./bigquery.md): If you can maintain a BigQuery public dataset, this is the preferred and easiest route.
+2. [**Database replication**](./database.md): Replicate your database into an OSO dataset (e.g. from Postgres).
+3. [**API crawling**](./api.md): Crawl an API by writing a plugin.
+4. [**Files into Google Cloud Storage (GCS)**](./gcs.md): You can drop Parquet/CSV files in our GCS bucket for loading into BigQuery.
+5. [**Custom Dagster assets**](./dagster.md): Write a custom Dagster asset for other unique data sources.
+6. **Static files**: If the data is high quality and can only be imported via static files, please reach out to us on [Discord](https://www.opensource.observer/discord) to coordinate hand-off. This path is predominantly used for [grant funding data](./funding-data.md).
+7. (deprecated) [Airbyte plugins](./airbyte.md): Airbyte plugins are the preferred method for crawling APIs.
+8. (deprecated) [CloudQuery plugins](./cloudquery.md): CloudQuery offers another, more flexible avenue for writing data import plugins.
 
 We generally prefer to work with data partners that can help us regularly
 index live data that can feed our daily data pipeline.
@@ -23,5 +25,6 @@ All data sources should be defined as
 [software-defined assets](https://docs.dagster.io/concepts/assets/software-defined-assets) in our Dagster configuration.
 
 ETL is the messiest, most high-touch part of the OSO data pipeline.
-Please reach out to us for help on [Discord](https://www.opensource.observer/discord).
+Please reach out to us for help on
+[Discord](https://www.opensource.observer/discord).
 We will happily work with you to get it working.
diff --git a/apps/docs/docs/integrate/fork-pipeline.md b/apps/docs/docs/integrate/fork-pipeline.md
index 1f155edab..e5dca999e 100644
--- a/apps/docs/docs/integrate/fork-pipeline.md
+++ b/apps/docs/docs/integrate/fork-pipeline.md
@@ -1,6 +1,7 @@
 ---
 title: 🏗️ Fork the Data Pipeline
 sidebar_position: 6
+sidebar_class_name: hidden
 ---
 
 :::warning
diff --git a/apps/docs/docs/integrate/index.md b/apps/docs/docs/integrate/index.md
index 30e8f7c19..7ac3dc10b 100644
--- a/apps/docs/docs/integrate/index.md
+++ b/apps/docs/docs/integrate/index.md
@@ -8,9 +8,9 @@ That means all source code, data, and infrastructure is publicly available for u
 
 - [**Get Started**](../get-started/index.mdx): to setup your Google account for data access and run your first query
 - [**Data Overview**](./overview/index.mdx): for an overview of all data available
-- [**BigQuery Studio Guide**](./query-data.mdx): to quickly query and download any data
+- [**API access**](./api.md): to integrate OSO metrics into a live production application
+- [**SQL Query Guide**](./query-data.mdx): to quickly query and download any data
 - [**Python notebooks**](./python-notebooks.md): to do more in-depth data science and processing
-- [**Fork the data pipeline**](./fork-pipeline.md): to setup your own data pipeline off any OSO model
 - [**Connect OSO to 3rd Party tools**](./3rd-party.mdx): like Hex.tech, Tableau, and Metabase
-- [**API access**](./api.md): to integrate OSO metrics into a live production application
+- [**Fork the data pipeline**](./fork-pipeline.md): to setup your own data pipeline off any OSO model
 - [**oss-directory**](./oss-directory.md): to leverage [oss-directory](https://github.com/opensource-observer/oss-directory) data separate from OSO