Add Hubble Docs #167

Merged
merged 12 commits into from
Jun 30, 2023
8 changes: 8 additions & 0 deletions docs/accessing-data/_category_.json
@@ -0,0 +1,8 @@
{
  "position": 80,
  "label": "Accessing Historical Data",
  "link": {
    "type": "generated-index"
  }
}

134 changes: 134 additions & 0 deletions docs/accessing-data/connecting.mdx
@@ -0,0 +1,134 @@
---
title: "Connecting"
sidebar_position: 10
---

BigQuery offers multiple connection methods to the Hubble dataset. This guide details three common methods:

- [BigQuery UI](#bigquery-ui) - for analysts who need to perform ad hoc analysis using SQL
- [BigQuery SDK](#bigquery-sdk) - for developers who need to integrate data into applications
- [Looker Studio](#looker-studio) - for business users who need to visualize data

## Prerequisites

To access the Hubble dataset, you will need a Google Cloud Project with billing and the BigQuery API enabled. For more information, please follow the instructions provided by [Google Cloud](https://cloud.google.com/bigquery/docs/quickstarts/query-public-dataset-console).

Google also provides a free BigQuery Sandbox that lets users explore datasets in a limited capacity.

## BigQuery UI

1. From a browser, open the [crypto-stellar.crypto_stellar](http://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1scrypto-stellar!2scrypto_stellar) dataset.
2. This will open the public dataset `crypto_stellar`, where you can browse its contents in the **Explorer** pane.
3. Click the **star** icon next to the dataset in the Explorer pane to star (favorite) it. For more information about starring resources, see the [BigQuery documentation](https://cloud.google.com/bigquery/docs/bigquery-web-ui#star_resources).

> **_Caution:_** Hubble cannot be found directly from the Explorer pane! <br /> You cannot search for the dataset. To view the dataset, you **must** use the [dataset link](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1scrypto-stellar!2scrypto_stellar).

Copy and paste the following example query into the Editor:

<CodeExample>

```sql
select
  account_id,
  balance
from `crypto-stellar.crypto_stellar.accounts_current`
order by balance desc;
```

</CodeExample>

This query returns the XLM balance of every Stellar account, ordered from largest to smallest.

## BigQuery SDK

There are multiple [BigQuery API Client Libraries](https://cloud.google.com/bigquery/docs/reference/libraries) available.

The following example uses Python to access the Hubble dataset.

Install the client library locally, and configure your environment to use your Google Cloud Project:

<CodeExample>

```bash
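pip install --upgrade google-cloud-bigquery
# Create Application Default Credentials for the client library to use
gcloud auth application-default login
# Replace PROJECT_ID with your Google Cloud project ID
gcloud config set project PROJECT_ID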
```

</CodeExample>

Run the example below with the Python interpreter to list the tables available in Hubble:

<CodeExample>

```python
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

dataset_id = "crypto-stellar.crypto_stellar"

# Make an API request to list the tables in the dataset.
tables = client.list_tables(dataset_id)

# Print the fully qualified ID of each table found in Hubble.
print(f"Tables contained in {dataset_id}:")
for table in tables:
    print(f"{table.project}.{table.dataset_id}.{table.table_id}")
```

</CodeExample>

The example below runs a query and prints the results:

<CodeExample>

```python
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT
        account_id,
        balance
    FROM `crypto-stellar.crypto_stellar.accounts_current`
    ORDER BY balance DESC
    LIMIT 10;
"""

# Make an API request; iterating over the job waits for it to finish.
query_job = client.query(query)

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print(f'account_id={row[0]}, balance={row["balance"]}')
```

</CodeExample>
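Under on-demand pricing, BigQuery bills queries by the bytes they process, so it can be worth checking a query's size before running it. The sketch below shows one way to do that with the same client library: a dry run (`QueryJobConfig(dry_run=True)`) validates the query and reports `total_bytes_processed` without executing it. The helper names `estimate_bytes` and `format_bytes` are hypothetical and only for illustration.

<CodeExample>

```python
def format_bytes(num_bytes):
    """Render a byte count with binary units, e.g. 1536 -> "1.50 KiB"."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.2f} {unit}"
        size /= 1024


def estimate_bytes(client, sql):
    """Return the bytes `sql` would process, using a dry run (nothing is executed)."""
    # Imported here so format_bytes() can be used without the client library.
    from google.cloud import bigquery

    # dry_run validates the query and estimates its size without running it;
    # turning off the cache makes the estimate reflect a fresh execution.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    query_job = client.query(sql, job_config=job_config)  # API request
    return query_job.total_bytes_processed
```

</CodeExample>

For example, `format_bytes(estimate_bytes(bigquery.Client(), query))` reports how much data the query above would scan.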

There are various ways to extract and load data using BigQuery. See the [BigQuery Client Documentation](https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client) for more information.
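As one minimal sketch of extracting data, the helper below writes query results to a local CSV file with Python's standard `csv` module. It assumes the row iterator returned by `QueryJob.result()`, which exposes the result columns through its `schema` attribute and yields rows that support `values()`. The name `write_rows_to_csv` is hypothetical; for large results, BigQuery's own export options (such as the `EXPORT DATA` statement to Cloud Storage) are likely a better fit than downloading rows one by one.

<CodeExample>

```python
import csv


def write_rows_to_csv(rows, path):
    """Write query results to `path` as CSV and return the number of data rows.

    `rows` is expected to be a google.cloud.bigquery RowIterator (for example,
    `client.query(sql).result()`): it has a `schema` listing the result fields,
    and each row returns its values in schema order via `values()`.
    """
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header row built from the result schema's field names.
        writer.writerow([field.name for field in rows.schema])
        for row in rows:
            writer.writerow(list(row.values()))
            count += 1
    return count
```

</CodeExample>

For example, `write_rows_to_csv(client.query(query).result(), "top_accounts.csv")` would save the ten accounts returned by the query above.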

## Looker Studio

[Looker Studio](https://cloud.google.com/looker-studio) is a business intelligence tool that can be used to connect to and visualize data from the Hubble dataset.

To connect Hubble as a data source:

1. Open [Looker Studio](https://lookerstudio.google.com/).
2. Click **Create** > **Data Source**.
3. Search for and select the **BigQuery** connector.
4. _(Optional)_ Change the name of the data source at the top of the page.
5. Click **Shared Projects**, then select your Google Cloud project.
6. Enter `crypto-stellar` as the shared project name.
7. Click the `crypto_stellar` dataset.
8. Select the table you want to connect.
9. Click **CONNECT** in the top right of the page.

And you're connected!

General information about Looker Studio can be found [here](https://support.google.com/looker-studio/?hl=en#topic=6267740).

General information about connecting data sources can be found [here](https://support.google.com/looker-studio/topic/6370331?hl=en&ref_topic=7441382&sjid=14945902445646860578-NA).
32 changes: 32 additions & 0 deletions docs/accessing-data/overview.mdx
@@ -0,0 +1,32 @@
---
title: "Overview"
sidebar_position: 0
---

## What is Hubble?

Hubble is an open-source, publicly available dataset that provides a complete historical record of the Stellar network. Similar to Horizon, it ingests and presents the data produced by the Stellar network in a format that is easier to consume than the performance-oriented data representations used by Stellar Core. The dataset is hosted on BigQuery, which makes it suitable for large analytic workloads, historical data retrieval, and complex data aggregation. Hubble should not be used for real-time data retrieval, and it cannot submit transactions to the network.

This guide describes when to use Hubble and how to connect. For more information regarding underlying data structures, queries and examples, please refer to the Hubble Technical Docs under APIs.

## Why Use Hubble?

Some questions are hard to answer with the Horizon API and its underlying PostgreSQL database, because its infrastructure is optimized for quick reads and writes so that it can process online transactions. Horizon accurately stores the results of these small transactions, but it sacrifices the ability to execute complex queries easily. The Stellar Network’s data footprint has also increased exponentially, which creates storage constraints and performance issues for Horizon instances that store the full historical record.

This is where Hubble comes in. It is optimized to execute complex queries and scan large amounts of data. Hubble can store orders of magnitude more data than Horizon and does not run into the same storage constraints. Queries that would require pagination or time out in Horizon can be answered with a single query in Hubble. Hubble lets users explore, analyze, and draw meaningful conclusions from the data without the burden of maintaining a database.

Users should be aware of the following limitations:

- Hubble is read-only; it cannot interact with the Stellar Network.
- The dataset is updated in intraday batches, so same-day data availability is not guaranteed.
- The SDF hosts a public instance of Hubble, but end users pay for the queries they run. Visit the [BigQuery Pricing Page](https://cloud.google.com/bigquery/pricing#analysis_pricing_models) to learn more.
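Because updates arrive in batches, it can help to check how recently a table changed before relying on its latest data. The sketch below assumes the `google-cloud-bigquery` Python client: `Client.get_table()` returns a `Table` whose `modified` attribute is the time of its last modification. That timestamp is only an approximate freshness signal, since metadata changes also update it, and the helper name `hours_since_update` is hypothetical.

<CodeExample>

```python
from datetime import datetime, timedelta, timezone


def hours_since_update(table, now=None):
    """Return the hours elapsed since `table.modified`.

    `table` is expected to be a google.cloud.bigquery Table, e.g. from
    `client.get_table("crypto-stellar.crypto_stellar.accounts_current")`;
    its `modified` attribute is a timezone-aware UTC datetime.
    """
    if table.modified is None:
        raise ValueError("table has no modification timestamp")
    now = now or datetime.now(timezone.utc)
    # Dividing two timedeltas yields a float number of hours.
    return (now - table.modified) / timedelta(hours=1)
```

</CodeExample>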

## Why We Chose BigQuery

Do you think it's worth adding one more section at the bottom, after this one, as a brief summary of next steps to guide the reader into 'how to use'?

Next, using Hubble

  1. Data model
  2. Connecting
  3. Queries
  4. Optimizing
  5. ...

Not suggesting more text, just links to the other existing pages; the sequence may provide a 'getting started' path for readers who want to proceed.


BigQuery is Google Cloud’s data warehouse that comes with some key features that fulfill Stellar’s analytic needs.

First, BigQuery allows anyone to make a dataset publicly available. This means that the SDF can contribute open source repositories to build and maintain a data warehouse and also host a public instance.

BigQuery also separates storage from compute, which makes it sustainable to host a public instance. The maintainer only has to pay the cost of storage without incurring the cost of the analytics running on the dataset.

Most importantly, BigQuery is the de facto platform for blockchain datasets. By selecting BigQuery, Stellar Network data is located with other blockchain data, which allows for cross-chain analytics.
56 changes: 56 additions & 0 deletions docs/accessing-data/viewing-metadata.mdx
@@ -0,0 +1,56 @@
---
title: "Viewing Metadata"
sidebar_position: 20
---

Hubble publishes metadata which can help users determine which tables to query, how frequently the dataset updates, and general information about the dataset.

There are two ways to access this information:

## BigQuery Explorer

When accessing Hubble from its starred link, the Explorer pane will load metadata about the `crypto-stellar.crypto_stellar` dataset.

Use the toggle next to the dataset to expand its contents. Clicking a table name loads the following tabs:

- _Schema_ - detailed information about the table schema, including column definitions and data types. Viewing the schema helps you write SQL queries.
- _Details_ - general information about the table itself, including partitioning, clustering, and table size. Viewing the details helps with query optimization.
- _Preview_ - raw sample data from the table. The data shown is equivalent to running a `SELECT *` statement.
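The same three views can also be retrieved with the Python client library. The sketch below is a rough programmatic equivalent, assuming `google-cloud-bigquery` is installed and authenticated: `Client.get_table()` returns the schema and table details, and `Client.list_rows()` reads sample rows through the table-data API rather than by running a SQL query. The helper name `describe_table` is hypothetical, and `list_rows()` works on tables but not on views.

<CodeExample>

```python
def describe_table(client, table_id, preview_rows=5):
    """Print a table's schema and details, then return a few sample rows.

    `client` is a google.cloud.bigquery.Client; `table_id` is a fully
    qualified ID such as "crypto-stellar.crypto_stellar.accounts_current".
    """
    table = client.get_table(table_id)  # API request for table metadata

    # Schema: column names, data types, and modes (NULLABLE, REQUIRED, REPEATED).
    for field in table.schema:
        print(f"{field.name}: {field.field_type} ({field.mode})")

    # Details: partitioning, clustering, and size.
    print("Time partitioning:", table.time_partitioning)
    print("Clustering fields:", table.clustering_fields)
    print(f"Rows: {table.num_rows}, bytes: {table.num_bytes}")

    # Preview: fetch a handful of rows without running a SQL query.
    rows = client.list_rows(table, max_results=preview_rows)
    return [dict(row.items()) for row in rows]
```

</CodeExample>

For example, `describe_table(bigquery.Client(), "crypto-stellar.crypto_stellar.accounts_current")` prints that table's metadata and returns up to five sample rows as dictionaries.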

## INFORMATION_SCHEMA

BigQuery supports read-only, system-defined views that provide metadata information about BigQuery objects. The views can be queried via SQL from the BigQuery UI or Client Libraries.

> **_Keep in mind:_** Queries against the `INFORMATION_SCHEMA` cannot be cached and **will** incur data processing charges for each execution.

From the BigQuery Editor, the following query will list all tables in Hubble:

<CodeExample>

```sql
#standardSQL
# List all tables in Hubble
select *
from `crypto-stellar.crypto_stellar`.INFORMATION_SCHEMA.TABLES;
```

</CodeExample>

To see details for a particular table, query its columns:

<CodeExample>

```sql
# List all columns for the accounts table
select
  table_name,
  column_name,
  is_nullable,
  data_type,
  is_partitioning_column
from `crypto-stellar.crypto_stellar`.INFORMATION_SCHEMA.COLUMNS
where table_name = "accounts";
```

</CodeExample>
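The same `INFORMATION_SCHEMA` views can be queried from the client libraries. The following is a minimal Python sketch, assuming `google-cloud-bigquery` is installed and authenticated, that passes the table name as a query parameter (`ScalarQueryParameter`) instead of formatting it into the SQL string. The helper name `list_columns` is hypothetical, and, as noted above, every run of this query is billed.

<CodeExample>

```python
COLUMNS_SQL = """
    SELECT column_name, data_type, is_nullable, is_partitioning_column
    FROM `crypto-stellar.crypto_stellar`.INFORMATION_SCHEMA.COLUMNS
    WHERE table_name = @table_name
    ORDER BY ordinal_position
"""


def list_columns(client, table_name):
    """Return (column_name, data_type) pairs for one Hubble table."""
    # Requires `pip install google-cloud-bigquery`.
    from google.cloud import bigquery

    # Bind the table name as a named parameter (@table_name) rather than
    # interpolating it into the SQL text.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("table_name", "STRING", table_name)
        ]
    )
    rows = client.query(COLUMNS_SQL, job_config=job_config).result()
    return [(row["column_name"], row["data_type"]) for row in rows]
```

</CodeExample>

For example, `list_columns(bigquery.Client(), "accounts")` returns the same columns as the SQL query above, in column order.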

More on the `INFORMATION_SCHEMA` can be found [here](https://cloud.google.com/bigquery/docs/information-schema-intro).
2 changes: 1 addition & 1 deletion docs/encyclopedia/_category_.json
@@ -1,5 +1,5 @@
{
"position": 90,
"position": 100,
"label": "Encyclopedia",
"link": {
"type": "generated-index"
2 changes: 1 addition & 1 deletion docs/glossary.mdx
@@ -1,6 +1,6 @@
---
title: "Glossary"
sidebar_position: 100
sidebar_position: 110
---

### Account
2 changes: 1 addition & 1 deletion docs/tools-and-sdks.mdx
@@ -1,6 +1,6 @@
---
title: "Tools and SDKs"
sidebar_position: 80
sidebar_position: 90
---

## Tools