The get_table_names and get_view_names methods are supposed to return the bare names (no {schema}. prefix) of the resources for a single schema/dataset (where schema=None is the "default schema"). However, the BigQueryDialect implementation returns (see the sketch after the list):
1. bare names for a specific schema, if the connection has a default dataset
2. {schema}.-prefixed names for one schema, if the connection doesn't have a default dataset and the schema arg is a string
3. {schema}.-prefixed names for all schemas, if the connection doesn't have a default dataset and the schema arg is None
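For illustration, here is a minimal sketch of the contract versus the observed behavior, assuming the tmp1/tmp2 datasets from the reproduction below exist and the engine URL pins no default dataset (my-project is a placeholder):

from sqlalchemy import create_engine, inspect

engine = create_engine("bigquery://my-project")  # placeholder project, no default dataset
insp = inspect(engine)

# Expected per the Inspector contract: bare names from tmp1 only, e.g. ["user"]
# Observed with sqlalchemy-bigquery 1.11.0: prefixed names, e.g. ["tmp1.user"]
print(insp.get_table_names(schema="tmp1"))

# Expected: names from the connection's default schema only (there is none here)
# Observed: prefixed names from *all* datasets, e.g. ["tmp1.user", "tmp2.org"]
print(insp.get_table_names())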
Behaviors 2 and 3, which arise when the connection doesn't have a default dataset, are incorrect and trigger some edge cases:
- permission errors when access is limited (1), because all datasets are scanned
- duplicate schema. prefixes / incorrect NoSuchTableErrors (1, 2), because SQLAlchemy adds the prefix when appropriate
- alembic revision --autogenerate wipes and recreates all tables:
  - include_schemas=False (the default) should manage a single schema, but the BigQueryDialect code would:
    - scan the current schema and incorrectly add the schema. prefix, producing ("schema", "schema.table") tuples that don't match the models' ("schema", "table") tuples; this causes alembic to try to delete and recreate all of our models (see the sketch after this list)
    - scan all other schemas; alembic doesn't see any models defined for them, so it tries to delete all tables in those schemas
  - include_schemas=True manages all schemas, but:
    - alembic looks up all existing schemas and then reflects for schema=None plus every discovered schema. Since the BigQueryDialect would return prefixed names for all schemas, it would reflect table tuples like (None, "schema1.table"), (None, "schema2.table"), ("schema1", "schema1.table"), ("schema2", "schema2.table"). The None records seem new and emit CREATE TABLEs, while the double-prefixed ones later fail with NoSuchTableError when their metadata is looked up.
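A rough sketch of the resulting mismatch (illustrative only, not Alembic's actual code, using the tmp1/tmp2 datasets from the reproduction below): autogenerate effectively diffs (schema, table_name) identities between the declared models and what reflection reports, and the prefixed names never line up:

model_tables = {("tmp1", "user"), ("tmp2", "org")}  # identities declared by the models
reflected_tables = {("tmp1", "tmp1.user"), ("tmp2", "tmp2.org")}  # identities the buggy dialect reports

print(model_tables - reflected_tables)  # every model looks "missing" -> CREATE TABLE for each model
print(reflected_tables - model_tables)  # every reflected name matches no model -> DROP TABLE for each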
Environment details
OS type and version: macOS Sonoma 14.5; M1
Python version: 3.12.3
pip version: 24.0
sqlalchemy-bigquery version: 1.11.0
Steps to reproduce
Run the script below
At the breakpoint, notice that Base.metadata.tables has tables from all datasets
Notice that the script errors with NoSuchTableError even though the table exists
Code example
from os import getenv

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

dataset_1, dataset_2 = "tmp1", "tmp2"  # bq query "create schema tmp1; create schema tmp2;"
project = getenv("GOOGLE_CLOUD_PROJECT")
engine = create_engine(f"bigquery://{project}")

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "user"
    __table_args__ = {"schema": dataset_1}
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    fullname: Mapped[str]
    nickname: Mapped[str | None]

class Org(Base):
    __tablename__ = "org"
    __table_args__ = {"schema": dataset_2}
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]

Base.metadata.create_all(engine)
Base.metadata.clear()
Base.metadata.reflect(engine)
# NOTE: Base.metadata.tables *should* be empty (no schema nor a connection default), but is actually populated from all datasets
breakpoint()

# NOTE: This will confusingly raise `sqlalchemy.exc.NoSuchTableError: {schema}.{table}` - it's
# actually looking for `{schema}.{table}` _within_ `schema`.
Base.metadata.reflect(engine, schema=dataset_1)
Stack trace
$ python3 x.py
(Pdb) len(Base.metadata.tables)
15
(Pdb) c
Traceback (most recent call last):
File "/.../x.py", line 51, in <module>
Base.metadata.reflect(engine, schema=dataset_1)
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 5843, in reflect
Table(name, self, **reflect_opts)
File "<string>", line 2, in __new__
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/deprecations.py", line 281, in warned
return fn(*args, **kwargs) # type: ignore[no-any-return]
^^^^^^^^^^^^^^^^^^^
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 432, in __new__
return cls._new(*args, **kw)
^^^^^^^^^^^^^^^^^^^^^
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 486, in _new
with util.safe_reraise():
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
raise exc_value.with_traceback(exc_tb)
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 482, in _new
table.__init__(name, metadata, *args, _no_init=False, **kw)
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 862, in __init__
self._autoload(
File "/.../.venv/lib/python3.12/site-packages/sqlalchemy/sql/schema.py", line 894, in _autoload
conn_insp.reflect_table(
File "/[...]/.venv/lib/python3.12/site-packages/sqlalchemy/engine/reflection.py", line 1538, in reflect_table
raise exc.NoSuchTableError(table_name)
sqlalchemy.exc.NoSuchTableError: tmp1.user
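Possible workaround
For contrast, behavior 1 above (a connection with a default dataset) does return bare names, so pinning a default dataset in the engine URL can serve as a partial workaround, at the cost of tying default reflection to that one dataset. A minimal sketch, assuming the same tmp1 dataset:

from os import getenv

from sqlalchemy import create_engine, inspect

project = getenv("GOOGLE_CLOUD_PROJECT")

# URL form with a default dataset; avoids the prefixed-name code path described above.
engine = create_engine(f"bigquery://{project}/tmp1")
print(inspect(engine).get_table_names())  # bare names from tmp1, e.g. ["user"]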