Can I separate I/O from MetaData writes while Table autoloading, so that I can wrap a lock only around MetaData writes? #503
Labels: api: bigquery, type: feature request
TL;DR: while autoloading a Table, is there a way to separate the network I/O from the manipulation of the SQLAlchemy metadata, so I can wrap my lock only around the MetaData manipulation?
I autoload a bunch of Table objects in my app. Sometimes the BigQuery auth backend appears to time out, and this causes the Table loading to fail. (For whatever reason, this seems to happen mostly with auth rather than the actual connections to BigQuery -- maybe just because auth happens to have a really long timeout and it happens first?)
Also, I load these Table objects under a lock. This is necessary because the SQLAlchemy documentation says MetaData "may not be completely thread-safe" (whatever that means). I have seen this first-hand: we had a bug where Table metadata would get corrupted when autoloading without a lock -- we would commonly end up with Tables that had no columns -- so we introduced the lock.
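For context, the pattern is roughly this (a minimal sketch; the engine URL, `metadata`, and `metadata_lock` names are placeholders, not our real code):

```python
import threading

import sqlalchemy

# Placeholder engine/MetaData setup; the real app builds these from its own config.
engine = sqlalchemy.create_engine("bigquery://some-project/some_dataset")
metadata = sqlalchemy.MetaData()

# Per the SQLAlchemy docs, MetaData "may not be completely thread-safe",
# so every autoload is serialized behind this lock.
metadata_lock = threading.Lock()


def load_table(name: str) -> sqlalchemy.Table:
    with metadata_lock:
        # Table() does everything here: the auth handshake, the network
        # round-trips that fetch the schema, and the MetaData mutation --
        # all while holding the lock.
        return sqlalchemy.Table(name, metadata, autoload_with=engine)
```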
The problem is, sometimes we also get these persistent timeouts from the Google auth backend (below is a stack trace, captured with `faulthandler.dump_traceback()`, that shows this). Since the default timeout is 120 seconds (and, as an aside, I haven't seen a way to override it other than monkey-patching), this leads to a pileup of lock-acquisition problems in my app, because all of this code runs implicitly in the `Table()` instantiation.
What I would like is to separate the network requests for authentication and fetching of table information from the manipulation of SQLAlchemy MetaData, so that I can wrap the lock acquisition only around the part that updates the MetaData. Why? It's a best practice to hold a lock only around the code that actually needs to be serialized, and here it would also localize failures: only the threads that are actually having problems would die. I don't want to kill every request over the course of 2 minutes; I only want to kill the one request that happened to time out.
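For what it's worth, the closest I can get today seems to be doing the reflection I/O through the SQLAlchemy Inspector outside the lock and then building the Table by hand inside it -- though that loses everything a full autoload would pick up beyond plain columns. A rough sketch, reusing the `engine` / `metadata` / `metadata_lock` placeholders from above:

```python
import sqlalchemy
from sqlalchemy import inspect


def load_table_narrow_lock(name: str) -> sqlalchemy.Table:
    # All network I/O (auth + schema fetch) happens here, outside the lock,
    # so a hung auth request only stalls this one thread.
    inspector = inspect(engine)
    columns = inspector.get_columns(name)

    # Only the MetaData mutation is serialized.
    with metadata_lock:
        return sqlalchemy.Table(
            name,
            metadata,
            *[
                sqlalchemy.Column(col["name"], col["type"], nullable=col["nullable"])
                for col in columns
            ],
        )
```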
I don't know if SQLAlchemy really makes this possible, but maybe there's a way to preload the data into a cache, or at least force authentication, before my call to `Table()`?
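In the meantime, the best I've come up with for "force authentication" is issuing a throwaway query before taking the lock -- though I'm not sure whether the connection the later autoload pulls from the pool actually reuses that warmed-up auth, so this is just a sketch of the idea:

```python
import sqlalchemy


def warm_engine(engine: sqlalchemy.engine.Engine) -> None:
    # A throwaway query forces connection setup and the auth handshake
    # outside the lock; if it hangs, only this thread's request dies.
    with engine.connect() as conn:
        conn.execute(sqlalchemy.text("SELECT 1"))


warm_engine(engine)
with metadata_lock:
    table = sqlalchemy.Table("some_table", metadata, autoload_with=engine)
```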
Environment details
`sqlalchemy-bigquery` version: 1.4.4
Steps to reproduce
Code example
Stack trace
Here's the stack trace from the thread that seems to be stuck on creating a connection for auth (from `faulthandler.dump_traceback()`, so it doesn't have source code embedded):
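(The dump was produced with something along these lines; `all_threads=True` is what makes the other blocked threads visible:)

```python
import faulthandler
import sys

# Print the current stack of every thread to stderr; this works even for
# threads that are blocked deep inside a C-level call.
faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
```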