diff --git a/apps/docs/docs/get-started/index.mdx b/apps/docs/docs/get-started/index.mdx
index e94dd8445..5867e18dd 100644
--- a/apps/docs/docs/get-started/index.mdx
+++ b/apps/docs/docs/get-started/index.mdx
@@ -4,6 +4,7 @@ sidebar_position: 1
---
import Link from "@docusaurus/Link";
+import Button from "../../src/components/plasmic/Button";
:::info
There are two easy ways of accessing OSO datasets: through our GraphQL API
@@ -16,8 +17,8 @@ it's best to go direct to the data warehouse.
OSO's data warehouse is currently located in BigQuery on Google Cloud (GCP).
Every data model is made publicly available by a BigQuery dataset.
-See our [data exchange](https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae)
-on Google Analytics Hub for a full list of public data sets.
+See our [data overview](../integrate/overview/index.mdx)
+for a full list of public datasets.
## Sign up for Google Cloud
@@ -25,7 +26,7 @@ Navigate to [Google Cloud](https://cloud.google.com/) and log in.
If this is your first time here, you can sign up for a free cloud account
using your existing Google account.
If you already have a GCP account,
-[skip to the query](#make-your-first-query).
+[skip to the dataset](#subscribe-to-the-oso-production-dataset).
![GCP Signup](./gcp_signup.png)
@@ -44,7 +45,7 @@ Finally, you will be brought to the admin console where you can create a new pro
Feel free to name this GCP project anything you'd like.
(Or you can simply leave the default project name 'My First Project'.)
-## Make your first query
+## Subscribe to the OSO production dataset
Go to the [BigQuery Console](https://console.cloud.google.com/bigquery).
Navigate to **BigQuery** from the left-hand menu and
@@ -59,18 +60,36 @@ This will be your workspace for querying the OSO dataset.
![GCP Welcome](./gcp_welcome.png)
+Click on the following link to subscribe to the OSO production dataset:
+
+
+
+Create a linked dataset in your own GCP project.
+
+![link dataset](../integrate/overview/bigquery_subscribe.png)
+
+## Make your first query
+
Open a new tab by clicking on the `+` icon
on the top right of the console to `Create SQL Query`.
From here you will be able to write any SQL you'd like against any OSO dataset.
-For example, you can query the `oso_playground` dataset for
-a sample of collections like this:
+For example, you can query the `oso_production` dataset for
+all available collections like this:
```sql
SELECT *
-FROM `opensource-observer.oso_playground.collections_v1`
+FROM `YOUR_PROJECT_NAME.oso_production.collections_v1`
```
+**Remember to update the project name in the query.**
+
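+For a quick sanity check, you can also run an inexpensive count
+query against the same table (again, swapping in your project name):
+
+```sql
+SELECT COUNT(*) AS num_collections
+FROM `YOUR_PROJECT_NAME.oso_production.collections_v1`
+```
+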
Click **Run** to execute your query.
The results will appear in a table at the bottom of the console.
@@ -79,9 +98,10 @@ The results will appear in a table at the bottom of the console.
The console will help you complete your query as you type, and will also provide you with a preview of the results and computation time. You can save your queries, download the results, and even make simple visualizations directly from the console.
:::tip
-To explore all the OSO datasets available, see [here](https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae).
+To explore all the OSO datasets available, see the
+[Data Overview](../integrate/overview/index.mdx).
-- **oso** contains all production data. This can be quite large depending on the dataset.
+- **oso\_production** contains all production data. This can be quite large depending on the dataset.
- **oso\_playground** contains only the last 2 weeks for every dataset. We recommend using this for development and testing.
:::
@@ -89,10 +109,10 @@ To explore all the OSO datasets available, see [here](https://console.cloud.goog
Now that you're set up, there are many ways to contribute to OSO and integrate the data with your application:
-- [BigQuery Studio Guide](../integrate/query-data)
-- [Write Python notebooks](../integrate/python-notebooks)
-- [Propose an impact model](../contribute/impact-models) to run in our data pipeline
-- [Query the OSO API](../integrate/api) for metrics and impact vectors from your web app
+- [SQL Query Guide](../integrate/query-data.mdx)
+- [Write Python notebooks](../integrate/python-notebooks.md)
+- [Propose an impact model](../contribute/impact-models.md) to run in our data pipeline
+- [Query the OSO API](../integrate/api.md) for metrics and impact vectors from your web app
If you think you'll be an ongoing contributor to OSO,
please apply to join the [Kariba Data Collective](https://www.kariba.network).
diff --git a/apps/docs/docs/integrate/3rd-party.md b/apps/docs/docs/integrate/3rd-party.md
deleted file mode 100644
index 5a80cb67b..000000000
--- a/apps/docs/docs/integrate/3rd-party.md
+++ /dev/null
@@ -1,8 +0,0 @@
----
-title: 🏗️ Connect to 3rd Party Tools
-sidebar_position: 5
----
-
-:::warning
-Coming soon... This page is a work in progress.
-:::
diff --git a/apps/docs/docs/integrate/3rd-party.mdx b/apps/docs/docs/integrate/3rd-party.mdx
new file mode 100644
index 000000000..deef5ce2f
--- /dev/null
+++ b/apps/docs/docs/integrate/3rd-party.mdx
@@ -0,0 +1,88 @@
+---
+title: Connect to 3rd Party Tools
+sidebar_position: 5
+---
+
+import Button from "../../src/components/plasmic/Button";
+
+Because all OSO datasets and models are accessible as
+public datasets on BigQuery,
+you can connect to and explore the data with most
+standard data tools.
+
+## Subscribe to an OSO dataset
+
+First, you need to subscribe to an OSO dataset in your own
+Google Cloud account.
+You can see all of our available datasets in the
+[Data Overview](./overview/index.mdx).
+
+We recommend starting with the OSO production data pipeline here:
+
+
+
+## Connect your third party tool
+
+Many BI, notebook, and data analysis tools have built-in
+support for BigQuery. To connect a specific tool,
+check out its guide:
+
+- [Tableau](https://cloud.google.com/bigquery/docs/analyze-data-tableau)
+- [Metabase](https://www.metabase.com/docs/latest/databases/connections/bigquery)
+- [Hex](https://learn.hex.tech/docs/connect-to-data/data-connections/data-connections-introduction)
+- [Looker](https://cloud.google.com/bigquery/docs/visualize-looker-studio)
+- [Observable](https://observablehq.com/documentation/data/databases/overview)
+- [Databricks](https://docs.databricks.com/en/connect/external-systems/bigquery.html)
+
+## Hex example
+
+For the rest of this guide, we'll use Hex as a running example.
+
+First, you'll need to
+[create a service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating)
+in GCP and download the JSON key file.
+Click [here](https://console.cloud.google.com/iam-admin/serviceaccounts?walkthrough_id=iam--create-service-account-keys&start_index=1#step_index=1)
+to navigate to the credentials page on Google Cloud.
+
+Click "+ Create Service Account":
+
+![create service account](./gcp_service_account_create.png)
+
+Grant this new service account the "BigQuery User"
+and "BigQuery Data Viewer" roles:
+
+![permission service account](./gcp_service_account_perm.png)
+
+With the newly created service account,
+navigate to the "Keys" tab, and
+click "Create New Key".
+Create a new JSON key and download the file.
+
+![service account keys](./gcp_service_account_key.png)
+
+In your Hex project, navigate to the "Data browser" pane
+and click "BigQuery" under "Add a data connection".
+
+![Hex connect](./hex_connections.png)
+
+Fill in the connection details.
+Use the GCP project into which you previously
+subscribed the OSO dataset.
+You also need to copy and paste the entire contents of
+the JSON key file you downloaded earlier.
+
+![Hex add connection](./hex_add_connection.png)
+
+Give Hex a minute to refresh the available datasets.
+Once that is done, you should be able to browse
+the OSO datasets in the "Data browser" pane.
+Now try running a query on the OSO dataset!
+
+![Hex query](./hex_query.png)
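+
+A minimal first query might look like this (a sketch, assuming you
+subscribed the production dataset as `oso_production` in the same
+GCP project that the service account belongs to):
+
+```sql
+select *
+from `YOUR_PROJECT_NAME.oso_production.collections_v1`
+limit 10
+```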
+
+
diff --git a/apps/docs/docs/integrate/api.md b/apps/docs/docs/integrate/api.md
index af3475003..1d4795ad0 100644
--- a/apps/docs/docs/integrate/api.md
+++ b/apps/docs/docs/integrate/api.md
@@ -1,13 +1,13 @@
---
title: Use the GraphQL API
-sidebar_position: 10
+sidebar_position: 2
---
The OSO API currently only allows read-only GraphQL queries against OSO mart models
(e.g. impact metrics, project info).
This API should only be used to fetch data to integrate into a live application in production.
For data exploration, check out the guides on
-[performing queries](./query-data.md)
+[performing queries](./query-data.mdx)
and [Python notebooks](./python-notebooks.md).
## Generate an API key
@@ -34,7 +34,9 @@ All API requests are sent to the following URL:
https://opensource-observer.hasura.app/v1/graphql
```
-You can navigate to our [public GraphQL explorer](https://cloud.hasura.io/public/graphiql?endpoint=https://opensource-observer.hasura.app/v1/graphql) to explore the schema and execute test queries.
+You can navigate to our
+[public GraphQL explorer](https://cloud.hasura.io/public/graphiql?endpoint=https://opensource-observer.hasura.app/v1/graphql)
+to explore the schema and execute test queries.
## Authentication
@@ -86,7 +88,7 @@ query GetCodeMetrics {
## GraphQL Explorer
-The GraphQL schema is automatically generated from [`oso/dbt/models/marts`](https://github.com/opensource-observer/oso/tree/main/dbt/models/marts). Any dbt model defined there will automatically be exported to our GraphQL API. See the guide on [adding DBT models](../contribute/impact-models) for more information on contributing to our marts models.
+The GraphQL schema is automatically generated from [`oso/dbt/models/marts`](https://github.com/opensource-observer/oso/tree/main/dbt/models/marts). Any dbt model defined there will automatically be exported to our GraphQL API. See the guide on [adding dbt models](../contribute/impact-models.md) for more information on contributing to our marts models.
:::warning
Our data pipeline is under heavy development and all table schemas are subject to change until we introduce versioning to marts models.
diff --git a/apps/docs/docs/integrate/bigquery_cost_estimate.png b/apps/docs/docs/integrate/bigquery_cost_estimate.png
new file mode 100644
index 000000000..e03e44f32
Binary files /dev/null and b/apps/docs/docs/integrate/bigquery_cost_estimate.png differ
diff --git a/apps/docs/docs/integrate/fork-pipeline.md b/apps/docs/docs/integrate/fork-pipeline.md
index e09bdeb2e..1f155edab 100644
--- a/apps/docs/docs/integrate/fork-pipeline.md
+++ b/apps/docs/docs/integrate/fork-pipeline.md
@@ -1,6 +1,6 @@
---
title: 🏗️ Fork the Data Pipeline
-sidebar_position: 4
+sidebar_position: 6
---
:::warning
diff --git a/apps/docs/docs/integrate/gcp_service_account_create.png b/apps/docs/docs/integrate/gcp_service_account_create.png
new file mode 100644
index 000000000..55ae57bdc
Binary files /dev/null and b/apps/docs/docs/integrate/gcp_service_account_create.png differ
diff --git a/apps/docs/docs/integrate/gcp_service_account_key.png b/apps/docs/docs/integrate/gcp_service_account_key.png
new file mode 100644
index 000000000..efdf389da
Binary files /dev/null and b/apps/docs/docs/integrate/gcp_service_account_key.png differ
diff --git a/apps/docs/docs/integrate/gcp_service_account_perm.png b/apps/docs/docs/integrate/gcp_service_account_perm.png
new file mode 100644
index 000000000..60bbef723
Binary files /dev/null and b/apps/docs/docs/integrate/gcp_service_account_perm.png differ
diff --git a/apps/docs/docs/integrate/hex_add_connection.png b/apps/docs/docs/integrate/hex_add_connection.png
new file mode 100644
index 000000000..9ee2320f9
Binary files /dev/null and b/apps/docs/docs/integrate/hex_add_connection.png differ
diff --git a/apps/docs/docs/integrate/hex_connections.png b/apps/docs/docs/integrate/hex_connections.png
new file mode 100644
index 000000000..34ede6cc1
Binary files /dev/null and b/apps/docs/docs/integrate/hex_connections.png differ
diff --git a/apps/docs/docs/integrate/hex_query.png b/apps/docs/docs/integrate/hex_query.png
new file mode 100644
index 000000000..4b260aa35
Binary files /dev/null and b/apps/docs/docs/integrate/hex_query.png differ
diff --git a/apps/docs/docs/integrate/index.md b/apps/docs/docs/integrate/index.md
index 48ddfe9f3..30e8f7c19 100644
--- a/apps/docs/docs/integrate/index.md
+++ b/apps/docs/docs/integrate/index.md
@@ -6,11 +6,11 @@ sidebar_position: 0
Open Source Observer is a fully open data pipeline for measuring the impact of open source efforts.
That means all source code, data, and infrastructure is publicly available for use.
-- [Get Started](../get-started): to setup your Google account for data access and run your first query
-- [Data Overview](./overview): for an overview of all data available
-- [BigQuery Studio Guide](./query-data): to quickly query and download any data
-- [Python notebooks](./python-notebooks): to do more in-depth data science and processing
-- [Fork the data pipeline](./fork-pipeline): to setup your own data pipeline off any OSO model
-- [Connect OSO to 3rd Party tools](./3rd-party): like Hex.tech, Tableau, and Metabase
-- [API access](./api): to integrate OSO metrics into a live production application
-- [oss-directory](./oss-directory): to leverage [oss-directory](https://github.com/opensource-observer/oss-directory) data separate from OSO
+- [**Get Started**](../get-started/index.mdx): to set up your Google account for data access and run your first query
+- [**Data Overview**](./overview/index.mdx): for an overview of all data available
+- [**BigQuery Studio Guide**](./query-data.mdx): to quickly query and download any data
+- [**Python notebooks**](./python-notebooks.md): to do more in-depth data science and processing
+- [**Fork the data pipeline**](./fork-pipeline.md): to set up your own data pipeline off any OSO model
+- [**Connect OSO to 3rd Party tools**](./3rd-party.mdx): like Hex.tech, Tableau, and Metabase
+- [**API access**](./api.md): to integrate OSO metrics into a live production application
+- [**oss-directory**](./oss-directory.md): to leverage [oss-directory](https://github.com/opensource-observer/oss-directory) data separate from OSO
diff --git a/apps/docs/docs/integrate/oss-directory.md b/apps/docs/docs/integrate/oss-directory.md
index f07db5842..e3df53ff6 100644
--- a/apps/docs/docs/integrate/oss-directory.md
+++ b/apps/docs/docs/integrate/oss-directory.md
@@ -1,6 +1,6 @@
---
title: Fetch Project Info
-sidebar_position: 12
+sidebar_position: 11
---
:::info
diff --git a/apps/docs/docs/integrate/overview/bigquery_starred_datasets.png b/apps/docs/docs/integrate/overview/bigquery_starred_datasets.png
new file mode 100644
index 000000000..66bb12d97
Binary files /dev/null and b/apps/docs/docs/integrate/overview/bigquery_starred_datasets.png differ
diff --git a/apps/docs/docs/integrate/overview/bigquery_subscribe.png b/apps/docs/docs/integrate/overview/bigquery_subscribe.png
new file mode 100644
index 000000000..f6edb9b35
Binary files /dev/null and b/apps/docs/docs/integrate/overview/bigquery_subscribe.png differ
diff --git a/apps/docs/docs/integrate/overview/index.mdx b/apps/docs/docs/integrate/overview/index.mdx
index 078b8c707..03a95e5ad 100644
--- a/apps/docs/docs/integrate/overview/index.mdx
+++ b/apps/docs/docs/integrate/overview/index.mdx
@@ -13,8 +13,16 @@ import LensLogo from "./lens-protocol.png";
import GitcoinLogo from "./gitcoin.png";
import OpenrankLogo from "./openrank.png";
+First, go to the
+[Get Started](../../get-started/index.mdx)
+page to set up your BigQuery account.
-## OSO Data Pipeline
+## OSO Data Exchange on Analytics Hub
+
+To explore all the OSO datasets available on our BigQuery data exchange,
+see [here](https://console.cloud.google.com/bigquery/analytics-hub/exchanges/projects/87806073973/locations/us/dataExchanges/open_source_observer_190181416ae).
+
+## OSO Production Data Pipeline
@@ -36,7 +44,7 @@ You can find the reference documentation on every data model on
### OSO Mart Models
These are the final product from the data pipeline,
-which is served from our [API](../api).
+which is served from our [API](../api.md).
For example, you can get a list of
[oss-directory projects](https://models.opensource.observer/#!/model/model.opensource_observer.projects_v1)
@@ -47,17 +55,19 @@ select
project_name,
display_name,
description
-from `opensource-observer.oso.projects_v1` LIMIT 10
+from `YOUR_PROJECT_NAME.oso_production.projects_v1` LIMIT 10
```
or [code metrics by project](https://models.opensource.observer/#!/model/model.opensource_observer.code_metrics_by_project_v1).
```sql
select *
-from `opensource-observer.oso.code_metrics_by_project_v1`
+from `YOUR_PROJECT_NAME.oso_production.code_metrics_by_project_v1`
where project_name = 'uniswap'
```
+**Remember to update the project name in the query.**
+
+*Note: Unless the model name is versioned, expect that the model is unstable and should not be depended on
in a live production application.*
@@ -66,9 +76,9 @@ in a live production application.*
From source data, we produce a "universal event table", currently stored at
[`int_events`](https://models.opensource.observer/#!/model/model.opensource_observer.int_events).
-Each event consists of an [event_type](../../how-oso-works/event)
+Each event consists of an [event_type](../../how-oso-works/event.md)
(e.g. a git commit or contract invocation),
-[to/from artifacts](../../how-oso-works/oss-directory/artifact),
+[to/from artifacts](../../how-oso-works/oss-directory/artifact.md),
a timestamp, and an amount.
From this event table, we aggregate events in downstream models to produce our metrics.
@@ -76,7 +86,8 @@ For example, you may find it cheaper to run queries against
[`int_events_daily_to_project`](https://models.opensource.observer/#!/model/model.opensource_observer.int_events_daily_to_project).
```sql
-SELECT event_source, SUM(amount) FROM `opensource-observer.oso.int_events_daily_to_project`
+SELECT event_source, SUM(amount)
+FROM `YOUR_PROJECT_NAME.oso_production.int_events_daily_to_project`
WHERE project_id = 'XSDgPwFuQVcj57ARcKTGrm2w80KKlqJxaBWF6jZqe7w=' AND event_type = 'CONTRACT_INVOCATION_DAILY_COUNT'
GROUP BY project_id, event_source
```
@@ -180,7 +191,7 @@ ethereum-etl code is covered by the
children={"Subscribe on BigQuery"}
/>
-OSO maintains public datasets for the Superchain,
+OSO is proud to provide public datasets for the Superchain,
backed by our partners at
[Goldsky](https://goldsky.com/).
@@ -193,18 +204,13 @@ We currently have coverage for:
- [PGN](https://models.opensource.observer/#!/source_list/pgn)
- [Zora](https://models.opensource.observer/#!/source_list/zora)
+For terms of use, please see the OSO
+[terms and conditions](https://www.opensource.observer/terms).
+
### Farcaster Data
-
-
[Reference documentation](https://models.opensource.observer/#!/source_list/farcaster)
:::warning
@@ -215,14 +221,6 @@ Coming soon...
-
-
[Reference documentation](https://models.opensource.observer/#!/source_list/lens)
:::warning
@@ -233,35 +231,72 @@ Coming soon...
-
-
[Reference documentation](https://models.opensource.observer/#!/source_list/gitcoin)
-:::warning
-Coming soon...
-:::
+[Gitcoin Passport](https://passport.gitcoin.co/)
+is a web3 identity verification protocol.
+OSO and Gitcoin have collaborated to make this dataset
+of address scores available for understanding user reputation.
+
+For example, you can get vitalik.eth's passport score:
+```sql
+select
+ passport_address,
+ last_score_timestamp,
+ evidence_rawScore,
+  evidence_threshold
+from `opensource-observer.gitcoin.passport_scores`
+where passport_address = '0xd8da6bf26964af9d7eed9e03e53415d37aa96045'
+```
### OpenRank Data
-
-
[Reference documentation](https://models.opensource.observer/#!/source_list/karma3)
-:::warning
-Coming soon...
-:::
\ No newline at end of file
+[OpenRank](https://openrank.com/) is a decentralized reputation protocol based on
+[Eigentrust](https://en.wikipedia.org/wiki/EigenTrust).
+In this dataset, we scored Farcaster IDs.
+
+For example, you can get the reputation score of vitalik.eth (Farcaster ID 5650):
+
+```sql
+select
+ strategy_id,
+ i,
+ v,
+ date
+from `opensource-observer.karma3.globaltrust`
+where i = 5650
+```
+
+## Subscribe to a dataset
+
+### 1. Data exchange listings
+
+For datasets listed on the OSO public data exchange,
+click on the "Subscribe on BigQuery" button to create a new
+dataset that is linked to OSO.
+
+![subscribe](./bigquery_subscribe.png)
+
+This has a few benefits:
+
+- Data is automatically kept up to date with OSO in real time
+- You keep a reference to the data in your own GCP project
+- This gives OSO the ability to track public usage of models
+
+### 2. Direct access to datasets
+
+For datasets without a listing on the OSO public data exchange,
+we make the dataset publicly available for direct queries.
+Click on the "View on BigQuery" button to go straight
+to the dataset.
+
+You can star the dataset to keep it pinned in your BigQuery explorer.
+
+![star](./bigquery_starred_datasets.png)
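+
+Starred datasets can then be queried in place under the
+`opensource-observer` project, as in the Gitcoin example above:
+
+```sql
+select *
+from `opensource-observer.gitcoin.passport_scores`
+limit 10
+```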
+
+
diff --git a/apps/docs/docs/integrate/python-notebooks.md b/apps/docs/docs/integrate/python-notebooks.md
index 9c5e2b5a1..b374f392c 100644
--- a/apps/docs/docs/integrate/python-notebooks.md
+++ b/apps/docs/docs/integrate/python-notebooks.md
@@ -1,11 +1,12 @@
---
title: Write Python notebooks
-sidebar_position: 3
+sidebar_position: 4
---
Notebooks are a great way for data scientists to explore data, organize ad-hoc analysis, and share insights. We've included several template notebooks to help you get started working with OSO data. You can find these on [Google Colab](https://drive.google.com/drive/folders/1mzqrSToxPaWhsoGOR-UVldIsaX1gqP0F?usp=drive_link) and in the [community directory](https://github.com/opensource-observer/insights/tree/main/community/notebooks) of our insights repo. We encourage you to share your analysis and visualizations with the OSO community.
-You will need access to the OSO data warehouse to do data science. See our getting started guide [here](../get-started).
+You will need access to the OSO data warehouse to do data science.
+See our getting started guide [here](../get-started/index.mdx).
## Fetching Data
@@ -210,7 +211,7 @@ Alternatively, you can stick to static analysis and export your data from BigQue
#### Obtain a GCP Service Account Key
-This section will walk you through the process of obtaining a GCP service account key and connecting to BigQuery from a Jupyter notebook. If you don't have a GCP account, you will need to create one (see [here](../get-started) for instructions).
+This section will walk you through the process of obtaining a GCP service account key and connecting to BigQuery from a Jupyter notebook. If you don't have a GCP account, you will need to create one (see [here](../get-started/index.mdx) for instructions).
From the [GCP Console](https://console.cloud.google.com/), navigate to the BigQuery API page by clicking **API & Services** > **Enabled APIs & services** > **BigQuery API**.
@@ -495,7 +496,7 @@ An **impact metric** is essentially a SQL query made against the OSO dataset tha
There are a variety of statistical techniques for analyzing data about impact metrics and identifying trends. This section provides a basic example of how to create an impact metric and run a distribution analysis.
:::tip
-The complete specification for an impact metric is available [here](../how-oso-works/impact-metrics/).
+The complete specification for an impact metric is available [here](../how-oso-works/impact-metrics/index.mdx).
:::
### General guide for creating an impact metric
@@ -507,13 +508,13 @@ The complete specification for an impact metric is available [here](../how-oso-w
#### 2. Define the Metric and Selection Criteria
-- **Metric**: Get inspiration from some of our [impact metrics](../how-oso-works/impact-metrics) or [propose a new metric](../contribute/impact-models). Examples: "Number of Full-Time Developer Months", "Number of Dependent Onchain Apps", "Layer 2 Gas Fees", "Number of New Contributors".
+- **Metric**: Get inspiration from some of our [impact metrics](../how-oso-works/impact-metrics/index.mdx) or [propose a new metric](../contribute/impact-models.md). Examples: "Number of Full-Time Developer Months", "Number of Dependent Onchain Apps", "Layer 2 Gas Fees", "Number of New Contributors".
- **Time Period**: Specify a time interval for applying the metric. Examples: "Last 6 months", "Since the project's inception".
- **Selection Filter**: Make explicit the criteria to identify which projects are eligible (or ineligible) to be included in the analysis. Examples: "Projects with developer activity in the last 90 days", "Projects with NPM packages used by at least 5 onchain projects", "Projects with a permissive open source license (e.g., MIT, Apache 2.0) and a codebase that is at least 6 months old".
#### 3. Normalize the Data
-- **Query Logic**: Provide the code that fetches the metrics for each project in the selection set. The query may only make use of datasets that are public and in the OSO data warehouse. (Contribute new pubic datasets [here](../contribute/connect-data).)
+- **Query Logic**: Provide the code that fetches the metrics for each project in the selection set. The query may only make use of datasets that are public and in the OSO data warehouse. (Contribute new public datasets [here](../contribute/connect-data/index.md).)
- **Normalization Method**: Choose an appropriate method for normalizing the metric data (e.g., Gaussian distribution, log scale) that fits the metric characteristics. The script in the tutorial (see next section) includes an example of a normalization method you can start with.
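+
+As an illustrative sketch of log-scale normalization in SQL (the
+`star_count` column here is hypothetical; substitute a real metric
+column from the model you are querying):
+
+```sql
+select
+  project_name,
+  ln(1 + star_count) as log_star_count
+from `YOUR_PROJECT_NAME.oso_production.code_metrics_by_project_v1`
+```
+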
#### 4. Optional: Share Your Analysis
diff --git a/apps/docs/docs/integrate/query-data.md b/apps/docs/docs/integrate/query-data.md
deleted file mode 100644
index ff83763a7..000000000
--- a/apps/docs/docs/integrate/query-data.md
+++ /dev/null
@@ -1,10 +0,0 @@
----
-title: 🏗️ Query on BigQuery Studio
-sidebar_position: 2
----
-
-As part of our [open source, open data, open infrastructure](../../blog/open-source-open-data-open-infra) initiative, we are making OSO data as widely available as possible. Use this guide to download the latest data for our own data stack.
-
-:::warning
-Coming soon... This page is a work in progress.
-:::
diff --git a/apps/docs/docs/integrate/query-data.mdx b/apps/docs/docs/integrate/query-data.mdx
new file mode 100644
index 000000000..db4d6b8b7
--- /dev/null
+++ b/apps/docs/docs/integrate/query-data.mdx
@@ -0,0 +1,96 @@
+---
+title: Run SQL Queries
+sidebar_position: 3
+---
+
+import Button from "../../src/components/plasmic/Button";
+
+As part of our
+[open source, open data, open infrastructure](../../blog/open-source-open-data-open-infra)
+initiative, we are making OSO data as widely available as possible.
+Use this guide to download the latest data for your own data stack.
+
+Please refer to the
+[getting started](../get-started/index.mdx)
+guide to first get your BigQuery account set up.
+
+## Subscribe to an OSO dataset
+
+First, you need to subscribe to an OSO dataset in your own
+Google Cloud account.
+You can see all of our available datasets in the
+[Data Overview](./overview/index.mdx).
+
+We recommend starting with the OSO production data pipeline here:
+
+
+
+After subscribing, you can reference the dataset
+within your GCP project namespace, for example:
+`YOUR_PROJECT_NAME.oso_production`
+
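+For example, this minimal query verifies that your linked dataset
+is working (substitute your own project name):
+
+```sql
+select project_name
+from `YOUR_PROJECT_NAME.oso_production.projects_v1`
+limit 5
+```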
+
+## Cost Estimation
+
+BigQuery [on-demand pricing](https://cloud.google.com/bigquery/pricing)
+charges by the number of bytes scanned,
+with the first 1 TB free every month.
+
+Therefore, it is helpful to keep track of how many bytes you will be
+scanning before running queries.
+
+![cost estimate](./bigquery_cost_estimate.png)
+
+As long as the query is valid, you should see the
+bytes scanned in the top right corner
+before running your query.
+
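+Because BigQuery is a columnar store, selecting only the columns
+you need scans fewer bytes than `select *`. For example:
+
+```sql
+-- scans only two columns instead of the whole table
+select project_name, display_name
+from `YOUR_PROJECT_NAME.oso_production.projects_v1`
+```
+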
+## Exploring the data
+
+Every stage of the OSO data pipeline is open for you to query.
+You can find all of the model definitions under
+[`warehouse/dbt/models/`](https://github.com/opensource-observer/oso/tree/main/warehouse/dbt/models)
+in our
+[monorepo](https://github.com/opensource-observer/oso).
+
+We also maintain reference documentation at
+[https://models.opensource.observer/](https://models.opensource.observer/),
+where you can find the model
+[lineage graph](https://models.opensource.observer/#!/overview?g_v=1).
+These references can help you understand the schema
+of any particular model to form your queries.
+
+Generally speaking, there are three types of models:
+1. **Staging models and source data**:
+For each data source, staging models are created to clean and normalize
+the necessary subset of data.
+2. **Intermediate models**:
+Here, we join all data sources into a master event table,
+[`int_events`](https://models.opensource.observer/#!/model/model.opensource_observer.int_events).
+Then, we produce a series of aggregations such as
+[`int_events_daily_to_project`](https://models.opensource.observer/#!/model/model.opensource_observer.int_events_daily_to_project).
+3. **Mart models**:
+From the intermediate models, we create the final metrics models
+that are served from the API.
+
+## Cost optimization
+
+Generally speaking, downstream models are smaller
+than upstream models such as raw source data.
+Therefore, we recommend using the model
+furthest downstream in the lineage graph
+that can satisfy your query.
+Each stage of the pipeline typically reduces the size of the data
+by 1-2 orders of magnitude.
+
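+For example, if you only need daily event totals for a project,
+query the pre-aggregated daily model rather than scanning the
+much larger raw event table:
+
+```sql
+-- aggregate over the daily rollup instead of raw int_events
+select event_type, SUM(amount) as total_amount
+from `YOUR_PROJECT_NAME.oso_production.int_events_daily_to_project`
+where project_id = 'XSDgPwFuQVcj57ARcKTGrm2w80KKlqJxaBWF6jZqe7w='
+group by event_type
+```
+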
+If there is an intermediate model addition
+(such as a new event type or aggregation)
+that you think can help save costs for others in the future,
+please consider contributing to our
+[data models](../contribute/impact-models.md).