Skip to content
This repository has been archived by the owner on Mar 29, 2023. It is now read-only.

Use SQLAlchemy for BigQuery backend #72

Open
datapythonista opened this issue Jun 14, 2021 · 6 comments
Open

Use SQLAlchemy for BigQuery backend #72

datapythonista opened this issue Jun 14, 2021 · 6 comments
Labels
question Further information is requested

Comments

@datapythonista
Copy link
Contributor

@tswast I'd like to know your opinion on this.

Seems to me, that the Ibis SQL compiler (i.e. ibis/backends/base/sql/compiler) is basically reinventing SQLAlchemy. SQLAlchemy is great at generating an SQL string given expressions, and that's exactly what the compiler also does. And now we've got our backends fragmented in the ones that use the Ibis compiler, and the ones using SQLAlchemy.

Seems to me, that for the future would make much more sense to exclusively use SQLAlchemy for the generation of SQL, and remove from Ibis all the code in ibis/backends/base/sql/compiler). That would involve that all backends use their SQLAlchemy engine for Ibis. For BigQuery this seems to be trivial, as there is already a BigQuery SQLAlchemy plugin, and implementing a SQLAlchemy backend is quite straight-forward (and even more after the refactoring I'm finishing).

Is this something that makes sense to you? Have you considered this already? Anything am I missing?

@tswast
Copy link
Collaborator

tswast commented Jun 14, 2021

For BigQuery this makes sense, especially now that we've done some pretty extensive testing of the SQLAlchemy connector for BigQuery.

My concerns:

@tswast
Copy link
Collaborator

tswast commented Jun 14, 2021

I guess another concern is UDFs. I don't believe we have support for this in the SQLAlchemy connector (though it would be cool to have!)

@datapythonista
Copy link
Contributor Author

Thanks for all the information, that's very useful to know. We clearly need to plan this well, if it ever happens. OmniSci seems to be happy with the idea, and I think for clickhouse is very doable. Impala is trickier, and we'll have to see if it can be done, or if it makes sense.

I'll be soon writing a draft of an Ibis roadmap with future plans, so we can discuss them. But I think after 2.0 we can make the base SQLAlchemy not depend on the Ibis compiler to start with. And maybe we can later move to separate projects / extensions the Ibis compiler, and the backends that use them. Then we'll see whether it's worth to maintain the Ibis compiler, or if backends are eventually migrated to SQLAlchemy.

We can discuss further after 2.0, when I have the draft for the roadmap. For now all your comments were very useful to know if researching in this direction made sense or not. Thanks!

@tswast
Copy link
Collaborator

tswast commented Jun 15, 2021

Another thought: I'd still want some custom code to execute the query text once compiled & transform to pandas dataframe. SQLAlchemy -> pandas is possible with just the SQLAlchemy connector, but it's much less performant.

@datapythonista
Copy link
Contributor Author

Another thought: I'd still want some custom code to execute the query text once compiled & transform to pandas dataframe. SQLAlchemy -> pandas is possible with just the SQLAlchemy connector, but it's much less performant.

This make sense. I think the part that really adds value to replace is the generation of the SQL, which if we only use SQLAlchemy it will let us delete few thousands lines of code in Ibis.

Somehow unrelated, there were some discussions on fetching data from backends as Arrow (which can easily converted to pandas with a .to_pandas() call). This is unrelated to SQLAlchemy, and open to discusssion, but I think that should be an improvement.

@tswast
Copy link
Collaborator

tswast commented Aug 18, 2021

Some news: we now have a 1.0 (General Availability) release of the official SQLAlchemy connector for BigQuery (https://github.com/googleapis/python-bigquery-sqlalchemy)

After that and my failed attempt at migration in #91, I'm thinking maybe it is time to migrate.

I don't want to abandon the Ibis 1.x users though, so perhaps we create a v2 branch for now and keep the old code paths around. I think it will take time for tools like https://github.com/GoogleCloudPlatform/professional-services-data-validator to migrate to Ibis 2.0 once its available, especially since there are several custom backends in https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/third_party/ibis that have yet to be split out.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants