
Allow a use_query_cache override directly in create_engine or execution_options #1112

Open
gallodustin opened this issue Aug 30, 2024 · 0 comments
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@gallodustin

Is your feature request related to a problem? Please describe.
If I want to execute a particular query that bypasses the BQ query cache, I have to instantiate a new client object, pass it a QueryJobConfig with use_query_cache=False, and then pass that client into create_engine, like below:

import sqlalchemy
from google.cloud import bigquery

job_config = bigquery.QueryJobConfig(use_query_cache=False)
...
bq_client = bigquery.Client(
    default_query_job_config=job_config,
    ...
)

create_engine_kwargs = {}
create_engine_kwargs["connect_args"] = {"client": bq_client}
...
bq_engine = sqlalchemy.create_engine(url=..., **create_engine_kwargs)
with bq_engine.connect() as conn:
    conn.execute(sqlalchemy.text("SELECT ... FROM ..."))

The main issue with this is that create_engine is slow for BigQuery compared to most other dialects, perhaps because oauth2 verifies the credentials passed into the BQ client. It also means that calling code has to keep track of the job_config, bq_client, and bq_engine objects.

For these reasons I would rather pass a parameter, perhaps a QueryJobConfig object, and let the dialect and the DBAPI handle the client object.

Describe the solution you'd like
I see that BigQueryDialect inherits from DefaultDialect, and that Cursor.execute in the BigQuery DBAPI repo does include a parameter for job_config. I wonder if it is as easy as overriding do_execute in the dialect within this repo to pass along that job_config?
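To make the idea concrete, here is a minimal, dependency-free sketch of what such a do_execute override might look like. The FakeCursor and FakeContext classes below are stand-ins I invented for illustration (the real integration would subclass sqlalchemy_bigquery.BigQueryDialect and receive SQLAlchemy's ExecutionContext and the BigQuery DBAPI cursor, whose execute() does accept a job_config keyword); only the forwarding logic in do_execute is the point.

```python
class FakeCursor:
    """Stand-in for google.cloud.bigquery.dbapi.Cursor, whose
    execute() accepts a job_config keyword argument."""

    def __init__(self):
        self.calls = []

    def execute(self, operation, parameters=None, job_config=None):
        # Record what was forwarded so the behavior is observable.
        self.calls.append((operation, parameters, job_config))


class FakeContext:
    """Stand-in for SQLAlchemy's ExecutionContext, which exposes the
    merged per-statement execution options."""

    def __init__(self, execution_options=None):
        self.execution_options = execution_options or {}


class SketchDialect:
    """Hypothetical dialect: pull job_config out of the statement's
    execution options and hand it to the DBAPI cursor."""

    def do_execute(self, cursor, statement, parameters, context=None):
        job_config = None
        if context is not None:
            job_config = context.execution_options.get("job_config")
        cursor.execute(statement, parameters, job_config=job_config)


# Simulate a statement run with execution_options(job_config=...).
cursor = FakeCursor()
ctx = FakeContext({"job_config": {"use_query_cache": False}})
SketchDialect().do_execute(cursor, "SELECT 1", None, context=ctx)
print(cursor.calls[0])
```

If no job_config is present in the execution options, the cursor just receives job_config=None and the default client behavior applies, so existing callers would be unaffected.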

The caller could then use it like this:

import sqlalchemy
from google.cloud import bigquery

bq_engine = sqlalchemy.create_engine(url=...)  # Just once! I can reuse it whether I want to use the cache or not!
job_config = bigquery.QueryJobConfig(use_query_cache=False)
with bq_engine.connect().execution_options(job_config=job_config) as conn:
    conn.execute(sqlalchemy.text("... query bypassing the cache ..."))
with bq_engine.connect() as conn:
    conn.execute(sqlalchemy.text("... query potentially hitting the cache ..."))

Describe alternatives you've considered
See the first code snippet above. It works for us, but it's cumbersome and incurs a performance hit on every create_engine call.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Aug 30, 2024
@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Sep 3, 2024