Buggy metadata reflection without a default dataset #1088

Open
JacobHayes opened this issue Jun 27, 2024 · 0 comments · May be fixed by #1089
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API.

JacobHayes commented Jun 27, 2024

The get_table_names and get_view_names methods are supposed to return the bare names (no {schema}. prefix) of the resources for a single schema/dataset (where schema=None is the "default schema"). However, the BigQueryDialect implementation returns:

  • bare names for a specific schema: if the connection has a default dataset
  • {schema}. prefixed names for one schema: if the connection doesn't have a default dataset and the schema arg is a string
  • {schema}. prefixed names for all schemas: if the connection doesn't have a default dataset and the schema arg is None
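For illustration, here is a minimal sketch of how these three cases surface through SQLAlchemy's Inspector. The project and dataset names are hypothetical, and the "buggy" results describe the behavior reported here rather than the intended API:

from sqlalchemy import create_engine, inspect

# Connection WITH a default dataset: bare names come back, as intended.
engine_with_default = create_engine("bigquery://my-project/my_dataset")
print(inspect(engine_with_default).get_table_names())
# ['user']

# Connection WITHOUT a default dataset:
engine_no_default = create_engine("bigquery://my-project")
print(inspect(engine_no_default).get_table_names(schema="my_dataset"))
# buggy: ['my_dataset.user']                            expected: ['user']
print(inspect(engine_no_default).get_table_names())
# buggy: ['my_dataset.user', 'other_dataset.org', ...]  expected (per this report): []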

The last two behaviors, which arise when the connection doesn't have a default dataset, are incorrect and trigger some edge cases:

  • permission errors when access is limited (1) because all datasets are scanned
  • duplicate schema. prefixes / incorrect NoSuchTableErrors (1, 2) because SQLAlchemy adds the prefix when appropriate
  • alembic revision --autogenerate wipes and recreates all tables
    • include_schemas=False (the default; see the env.py sketch after this list) should manage a single schema, but the BigQueryDialect code would:
      • scan the current schema and incorrectly add the schema. prefix, producing ("schema", "schema.table") tuples that don't match the models, which are represented as ("schema", "table") tuples. This causes alembic to try to drop and recreate all of our models.
      • scan all other schemas. Alembic doesn't see any models defined for them, so it tries to drop all tables in those schemas.
    • include_schemas=True manages all schemas, but:
      • alembic looks up all existing schemas and then runs reflection for schema=None plus every discovered schema. Since the BigQueryDialect would return prefixed names for all schemas, it would reflect table tuples like (None, "schema1.table"), (None, "schema2.table"), ("schema1", "schema1.table"), ("schema2", "schema2.table"). The None records look new and emit CREATE TABLEs, while the double-prefixed ones later fail with NoSuchTableError when their metadata is looked up.
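For context, include_schemas is the standard option passed to alembic's context.configure() in env.py; a minimal sketch (abbreviated boilerplate, not code from this project) looks like:

from alembic import context

def run_migrations_online(connectable, target_metadata):
    # connectable and target_metadata come from the usual env.py setup
    # (engine_from_config(...) and your models' Base.metadata).
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            include_schemas=True,  # False (the default) autogenerates against a single schema
        )
        with context.begin_transaction():
            context.run_migrations()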

Environment details

  • OS type and version: macOS Sonoma 14.5; M1
  • Python version: 3.12.3
  • pip version: 24.0
  • sqlalchemy-bigquery version: 1.11.0

Steps to reproduce

  1. Run the script below
  2. At the breakpoint, notice that Base.metadata.tables has tables from all datasets
  3. Notice that the script errors with NoSuchTableError even though the table exists

Code example

from os import getenv

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

dataset_1, dataset_2 = "tmp1", "tmp2"  # bq query "create schema tmp1; create schema tmp2;"
project = getenv("GOOGLE_CLOUD_PROJECT")

engine = create_engine(f"bigquery://{project}")


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "user"
    __table_args__ = {"schema": dataset_1}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    fullname: Mapped[str]
    nickname: Mapped[str | None]


class Org(Base):
    __tablename__ = "org"
    __table_args__ = {"schema": dataset_2}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]


Base.metadata.create_all(engine)
Base.metadata.clear()

Base.metadata.reflect(engine)
# NOTE: Base.metadata.tables *should* be empty (no schema nor a connection default), but is actually populated from all datasets
breakpoint()

# NOTE: This will confusingly raise `sqlalchemy.exc.NoSuchTableError: {schema}.{table}` - it's actually
# looking for `{schema}.{table}` _within_ `schema`.
Base.metadata.reflect(engine, schema=dataset_1)

Stack trace

$ python3 x.py
(Pdb) len(Base.metadata.tables)
15
(Pdb) c
Traceback (most recent call last):
  File "/.../x.py", line 51, in <module>
    Base.metadata.reflect(engine, schema=dataset_1)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 5843, in reflect
    Table(name, self, **reflect_opts)
  File "<string>", line 2, in __new__
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/deprecations.py", line 281, in warned
    return fn(*args, **kwargs)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 432, in __new__
    return cls._new(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 486, in _new
    with util.safe_reraise():
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 482, in _new
    table.__init__(name, metadata, *args, _no_init=False, **kw)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 862, in __init__
    self._autoload(
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 894, in _autoload
    conn_insp.reflect_table(
  File "/[...]/.venv/lib/python3.12/site-packages/sqlalchemy/engine/reflection.py", line 1538, in reflect_table
    raise exc.NoSuchTableError(table_name)
sqlalchemy.exc.NoSuchTableError: tmp1.user
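As a possible workaround (not a fix) while this is open: since reflection behaves correctly when the connection has a default dataset, pinning one in the connection URL appears to sidestep the buggy code path. A minimal sketch with hypothetical project/dataset names:

from sqlalchemy import create_engine, inspect

# Workaround sketch: give the engine a default dataset so get_table_names() /
# get_view_names() return bare names for that dataset.
engine = create_engine("bigquery://my-project/tmp1")  # default dataset = tmp1
print(inspect(engine).get_table_names())  # e.g. ['user'] (bare names, no prefix)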