Google BigQuery
- Traditional operational databases are primarily used for transaction processing
- It is difficult for managers to analyze data and create reports when the data resides in numerous databases across an organization (log files, streaming data, etc.)
- A data warehouse such as BigQuery collects data from a wide variety of sources
- It is designed for reporting and data analysis
- Ease of implementation
- Building your own data warehouse is expensive, time-consuming, and difficult to scale
- With BigQuery, just load data and pay only for what you use
- Speed
- Processes billions of rows in seconds
- Handles real-time analysis of streaming data
Example queries against the bigquery-public-data.usa_names public dataset:
SELECT name, state, year FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 100;
SELECT name, state, year FROM `bigquery-public-data.usa_names.usa_1910_2013`;
- Log in to GCP, go to the console, then select BigQuery from the menu or type BigQuery in the search bar
- Click + Compose a new query
- In the query editor, enter a valid GoogleSQL query, then click Run
- Query 1
SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013` ORDER BY number DESC LIMIT 10
- Query 2
SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013` WHERE gender = 'F' ORDER BY number DESC LIMIT 10
- Query 3: The most common names in the United States between 1910 and 2013
SELECT name, gender, SUM(number) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name, gender ORDER BY total DESC LIMIT 10;
A dry run in BigQuery provides the following information:
- estimate of charges in on-demand mode
- validation of your query
- approximate size and complexity of your query in capacity mode
Dry runs don't use query slots, and you are not charged for performing a dry run. You can use the estimate returned by a dry run to calculate query costs in the pricing calculator.
- Go to the BigQuery page
- Enter your query in the query editor. If the query is valid, a check mark automatically appears along with the amount of data that the query will process.
The following query will process 6.2 GB when run.
SELECT COUNT(*) FROM `patents-public-data.google_patents_research.publications_202101` WHERE title LIKE "%covid%";
The following query will process 323.61 GB when run.
SELECT * FROM `patents-public-data.google_patents_research.publications_202101` WHERE title LIKE "%covid%";
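A dry run can also be performed programmatically with the Python client (shown later in this document). The sketch below is illustrative and assumes credentials are already configured; setting dry_run=True in the job configuration validates the query and returns the bytes it would process, without running it or incurring charges.
from google.cloud import bigquery

client = bigquery.Client()  ## assumes default or service-account credentials are configured

## dry_run=True validates the query and estimates bytes without executing it;
## use_query_cache=False keeps cached results from lowering the estimate
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

QUERY = (
    'SELECT COUNT(*) FROM `patents-public-data.google_patents_research.publications_202101` '
    'WHERE title LIKE "%covid%"')

query_job = client.query(QUERY, job_config=job_config)  # API request (no slots used, no charge)
print("This query will process {} bytes.".format(query_job.total_bytes_processed))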
BigQuery pricing has two main components:
- Compute (analysis) pricing is the cost to process queries, including SQL queries, user-defined functions, scripts, and certain data manipulation language (DML) and data definition language (DDL) statements
- Queries (on-demand): $6.25 per TB; the first 1 TB per month is free (a rough cost-estimate sketch follows this list)
- Storage pricing is the cost to store data that you load into BigQuery
- $0.02 per GB per month
- After 90 days with no edits, price drops to $0.01 per GB per month
- No charges for reading data from storage
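As a rough worked example of the on-demand rate above, the sketch below estimates compute cost from the bytes a dry run reports. The $6.25-per-TB rate and the 1 TB monthly free tier are taken from the list above; treating 1 TB as 10**12 bytes is an assumption for illustration, so check Google's pricing documentation for the exact billing unit.
## Illustrative on-demand cost estimate using the rates listed above
PRICE_PER_TB = 6.25        # USD per TB scanned (rate from the list above)
FREE_TB_PER_MONTH = 1.0    # the first 1 TB per month is free

def estimate_query_cost(bytes_processed, free_tb_remaining=FREE_TB_PER_MONTH):
    tb_processed = bytes_processed / 1e12               # assumes 1 TB = 10**12 bytes
    billable_tb = max(tb_processed - free_tb_remaining, 0)
    return billable_tb * PRICE_PER_TB

## The 323.61 GB query above, with the monthly free tier already exhausted
print(round(estimate_query_cost(323.61e9, free_tb_remaining=0), 2))  # ~2.02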
Cached or non-cached query results
- If no destination table is specified, query results are cached in a temporary table
- The temporary table stays in the cache for one day
- If you run the same query again within 24 hours, there is no charge
- If you run the query again and specify a destination table to store the results, it won't read from the cache, and you will be charged (see the sketch below)
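This behavior can be observed from the Python client. The following is a sketch, assuming credentials are configured as in the sample script later in this document; query_job.cache_hit reports whether the results came from the 24-hour cache, and setting use_query_cache=False forces a fresh, billable read.
from google.cloud import bigquery

client = bigquery.Client()  ## assumes default or service-account credentials are configured

sql = 'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 10'

## First run: results are written to a cached temporary table
job1 = client.query(sql)
job1.result()
print("First run cache hit:", job1.cache_hit)    # usually False

## Second run within 24 hours: served from the cache, no charge
job2 = client.query(sql)
job2.result()
print("Second run cache hit:", job2.cache_hit)   # usually True

## Disabling the cache forces a fresh, billable read
no_cache = bigquery.QueryJobConfig(use_query_cache=False)
job3 = client.query(sql, job_config=no_cache)
job3.result()
print("Third run cache hit:", job3.cache_hit)    # False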
You can install the BigQuery Python package using one of the options below:
- Open a terminal and run the command:
pip install google-cloud-bigquery
- Or, from your IDE: select View -> Tool Windows -> Python Packages, type bigquery in the search field, select google-cloud-bigquery in the list, then click Install Package
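After installing, a quick sanity check is to import the package and print its version (assuming the install above succeeded):
from google.cloud import bigquery
print(bigquery.__version__)  ## prints the installed google-cloud-bigquery version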
- Copy the following Python code and save it to a file named sample_bq_query.py
from google.cloud import bigquery
from google.oauth2 import service_account

## construct credentials from the service account key file
credentials = service_account.Credentials.from_service_account_file(
    'agu55-big-data-fall2023_srvacct.json')  ## relative file path
    ## '/home/PGU6/workspace/big-data-class/gcp/bq_srv_acct.json')  ## absolute file path

## construct a BigQuery client object
client = bigquery.Client(credentials=credentials)

## Your logic implementation goes below
# Perform a query.
QUERY = (
    'SELECT name, gender FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 5')

query_job = client.query(QUERY)  # API request
rows = query_job.result()        # Waits for the query to finish

for row in rows:
    print("By index: name=" + row[0] + ", gender=" + row[1])          ## by index
    print("By column: name=" + row.name + ", gender=" + row.gender)   ## by column name
- Run Python Script
py sample_bq_query.py
or
python sample_bq_query.py
Outputs
By index: name=Ruby, gender=F
By column: name=Ruby, gender=F
By index: name=Louise, gender=F
By column: name=Louise, gender=F
By index: name=Carrie, gender=F
By column: name=Carrie, gender=F
By index: name=Lucy, gender=F
By column: name=Lucy, gender=F
By index: name=Elizabeth, gender=F
By column: name=Elizabeth, gender=F
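If pandas is available, the same results can be loaded into a DataFrame instead of iterating row by row. This is a sketch, assuming the pandas and db-dtypes packages are installed in addition to google-cloud-bigquery, and reusing the credentials setup from sample_bq_query.py:
from google.cloud import bigquery

client = bigquery.Client()  ## or pass credentials= as in sample_bq_query.py

QUERY = (
    'SELECT name, gender FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 5')

df = client.query(QUERY).result().to_dataframe()  ## requires pandas and db-dtypes
print(df)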
Further reading:
- An overview of supported statements and SQL dialects in BigQuery
- The syntax for SQL queries in GoogleSQL for BigQuery
- How to retrieve the properties of a table for a given table ID using Python (see the sketch below)
- Best practices for optimizing your query performance
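For the third item above (table properties from Python), here is a minimal sketch using the client's get_table call against the public usa_names table used throughout this section, assuming credentials are already configured:
from google.cloud import bigquery

client = bigquery.Client()  ## assumes default or service-account credentials are configured

table_id = "bigquery-public-data.usa_names.usa_1910_2013"
table = client.get_table(table_id)   # API request

print("Table:", table.full_table_id)
print("Rows:", table.num_rows)
print("Size (bytes):", table.num_bytes)
for field in table.schema:
    print(field.name, field.field_type)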
An alternative version of sample_bq_query.py that selects only the name column:
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    './.ssl/pgu6-msba2024-emory_srvacct.json')  ## relative file path
    ## '/mnt/c/Users/PGU6/workspace-GBS/student-technology-tools/docs/gcp/.ssl/bq_srv_acct.json')  ## absolute file path

client = bigquery.Client(credentials=credentials)

QUERY = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 5')

query_job = client.query(QUERY)  # API request
rows = query_job.result()        # Waits for the query to finish

for row in rows:
    print(row[0])      ## by index
    print(row.name)    ## by column name