Buggy metadata reflection without a default dataset #1088

Open
JacobHayes opened this issue Jun 27, 2024 · 0 comments · May be fixed by #1089
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API.

JacobHayes commented Jun 27, 2024

The get_table_names and get_view_names methods are supposed to return the bare names (no {schema}. prefix) of the resources for a single schema/dataset (where schema=None is the "default schema"). However, the BigQueryDialect implementation returns:

  • bare names for a specific schema: if the connection has a default dataset
  • {schema}. prefixed names for one schema: if the connection doesn't have a default dataset and the schema arg is a string
  • {schema}. prefixed names for all schemas: if the connection doesn't have a default dataset and the schema arg is None
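For illustration, here is a minimal sketch of how these three cases surface through SQLAlchemy's Inspector. The project and dataset names are hypothetical, and the "buggy" results describe the behavior reported here rather than the intended API:

from sqlalchemy import create_engine, inspect

# Connection WITH a default dataset: bare names come back, as intended.
engine_with_default = create_engine("bigquery://my-project/my_dataset")
print(inspect(engine_with_default).get_table_names())
# ['user']

# Connection WITHOUT a default dataset:
engine_no_default = create_engine("bigquery://my-project")
print(inspect(engine_no_default).get_table_names(schema="my_dataset"))
# buggy: ['my_dataset.user']                            expected: ['user']
print(inspect(engine_no_default).get_table_names())
# buggy: ['my_dataset.user', 'other_dataset.org', ...]  expected (per this report): []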

The last two behaviors, which arise when the connection doesn't have a default dataset, are incorrect and trigger some edge cases:

  • permission errors when access is limited (1) because all datasets are scanned
  • duplicate schema. prefixes / incorrect NoSuchTableErrors (1, 2) because SQLAlchemy adds the prefix when appropriate
  • alembic revision --autogenerate wipes and recreates all tables
    • include_schemas=False (the default; see the env.py sketch after this list) should manage a single schema, but the BigQueryDialect code would:
      • scan the current schema and incorrectly add the schema. prefix, producing ("schema", "schema.table") tuples that don't match the models, which are represented as ("schema", "table") tuples. This causes alembic to try to drop and recreate all of our models.
      • scan all other schemas. Alembic doesn't see any models defined for them, so it tries to drop all tables in those schemas.
    • include_schemas=True manages all schemas, but:
      • alembic looks up all existing schemas and then runs reflection for schema=None plus every discovered schema. Since the BigQueryDialect would return prefixed names for all schemas, it would reflect table tuples like (None, "schema1.table"), (None, "schema2.table"), ("schema1", "schema1.table"), ("schema2", "schema2.table"). The None records look new and emit CREATE TABLEs, while the double-prefixed ones later fail with NoSuchTableError when their metadata is looked up.
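For context, include_schemas is the standard option passed to alembic's context.configure() in env.py; a minimal sketch (abbreviated boilerplate, not code from this project) looks like:

from alembic import context

def run_migrations_online(connectable, target_metadata):
    # connectable and target_metadata come from the usual env.py setup
    # (engine_from_config(...) and your models' Base.metadata).
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            include_schemas=True,  # False (the default) autogenerates against a single schema
        )
        with context.begin_transaction():
            context.run_migrations()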

Environment details

  • OS type and version: macOS Sonoma 14.5; M1
  • Python version: 3.12.3
  • pip version: 24.0
  • sqlalchemy-bigquery version: 1.11.0

Steps to reproduce

  1. Run the script below
  2. At the breakpoint, notice that Base.metadata.tables has tables from all datasets
  3. Notice that the script errors with NoSuchTableError even though the table exists

Code example

from os import getenv

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

dataset_1, dataset_2 = "tmp1", "tmp2"  # bq query "create schema tmp1; create schema tmp2;"
project = getenv("GOOGLE_CLOUD_PROJECT")

engine = create_engine(f"bigquery://{project}")


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "user"
    __table_args__ = {"schema": dataset_1}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    fullname: Mapped[str]
    nickname: Mapped[str | None]


class Org(Base):
    __tablename__ = "org"
    __table_args__ = {"schema": dataset_2}

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]


Base.metadata.create_all(engine)
Base.metadata.clear()

Base.metadata.reflect(engine)
# NOTE: Base.metadata.tables *should* be empty (no schema nor a connection default), but is actually populated from all datasets
breakpoint()

# NOTE: This will confusingly raise `sqlalchemy.exc.NoSuchTableError: {schema}.{table}` - it's actually
# looking for `{schema}.{table}` _within_ `schema`.
Base.metadata.reflect(engine, schema=dataset_1)

Stack trace

$ python3 x.py
(Pdb) len(Base.metadata.tables)
15
(Pdb) c
Traceback (most recent call last):
  File "/.../x.py", line 51, in <module>
    Base.metadata.reflect(engine, schema=dataset_1)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 5843, in reflect
    Table(name, self, **reflect_opts)
  File "<string>", line 2, in __new__
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/deprecations.py", line 281, in warned
    return fn(*args, **kwargs)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 432, in __new__
    return cls._new(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 486, in _new
    with util.safe_reraise():
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 482, in _new
    table.__init__(name, metadata, *args, _no_init=False, **kw)
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 862, in __init__
    self._autoload(
  File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 894, in _autoload
    conn_insp.reflect_table(
  File "/[...]/.venv/lib/python3.12/site-packages/sqlalchemy/engine/reflection.py", line 1538, in reflect_table
    raise exc.NoSuchTableError(table_name)
sqlalchemy.exc.NoSuchTableError: tmp1.user
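As a possible workaround (not a fix) while this is open: since reflection behaves correctly when the connection has a default dataset, pinning one in the connection URL appears to sidestep the buggy code path. A minimal sketch with hypothetical project/dataset names:

from sqlalchemy import create_engine, inspect

# Workaround sketch: give the engine a default dataset so get_table_names() /
# get_view_names() return bare names for that dataset.
engine = create_engine("bigquery://my-project/tmp1")  # default dataset = tmp1
print(inspect(engine).get_table_names())  # e.g. ['user'] (bare names, no prefix)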