Skip to content
This repository has been archived by the owner on Mar 29, 2023. It is now read-only.

Incorrect generated (bigquery) sql for complex query #86

Open
jcmincke opened this issue Jun 25, 2021 · 1 comment
Open

Incorrect generated (bigquery) sql for complex query #86

jcmincke opened this issue Jun 25, 2021 · 1 comment

Comments

@jcmincke
Copy link

import pandas as pd
import ibis
import ibis_bigquery

ibis.options.interactive = False

conn = ibis_bigquery.connect(
    project_id=project,
    dataset_id='playground')

client = conn.client

pdf_store = pd.DataFrame({"pk_store": range(5),
                          "store": range(5),
                         })

pdf_sale = pd.DataFrame({'fk_store': range(5),
                          "fk_date": range(5),
                          "a": range(5),
                          "b": range(5),
                          "c": range(5)
                          })

pdf_date = pd.DataFrame({'pk_date': range(5),
                         "date": range(5),
                         })

store_table = 'playground.store'
sale_table = 'playground.sale'
date_table = 'playground.date'

job = client.load_table_from_dataframe(pdf_store, store_table)
job = client.load_table_from_dataframe(pdf_sale, sale_table)
job = client.load_table_from_dataframe(pdf_date, date_table)

store_t = conn.table("store", "playground")
sale_t = conn.table("sale", "playground")
date_t = conn.table("date", "playground")

# table loaded

sale_t = store_t.inner_join(sale_t, [sale_t.fk_store == store_t.pk_store]).materialize() \
    .drop(["fk_store"])

t = sale_t.inner_join(date_t, [date_t.pk_date == sale_t.fk_date]).materialize() \
    .drop(["fk_date", "pk_date"])

t1 = t.group_by(["a"]).aggregate(t.c.sum().name("c1"))
t1 = t1.mutate(b=7)

t2 = t.group_by(["b"]).aggregate(t.c.sum().name("c2"))
t2 = t2.mutate(a=9)

t1 = t.view()  # this can be replaced by t1 = t, or any other sub-expr, same error
t2 = t.view()
r = t1.inner_join(t2, ["a", "b"])[t1.a, t1.b]

print(ibis_bigquery.compile(r))

The generated sql is incorrect, look at the alias on line 15. The error seems to occur
when a complex query is translated to sql that contains a WITH clause.

WITH t0 AS (
  SELECT *
  FROM `project.playground.store` t6
    INNER JOIN `project.playground.sale` t7
      ON t7.`fk_store` = t6.`pk_store`
),
t1 AS (
  SELECT `pk_store`, `store`, `fk_date`, `a`, `b`, `c`
  FROM t0
),
t2 AS (
  SELECT *
  FROM t1
    INNER JOIN `project.playground.date` t7
      ON t7.`pk_date` = t0.`fk_date`
),
t3 AS (
  SELECT `pk_store`, `store`, `a`, `b`, `c`, `date`
  FROM t2
)
SELECT t4.`a`, t4.`b`
FROM t3 t4
  INNER JOIN (
    SELECT `pk_store`, `store`, `a`, `b`, `c`, `date`
    FROM t2
  ) t5
    ON (t4.`a` = t5.`a`) AND
       (t4.`b` = t5.`b`) 
@jcmincke jcmincke changed the title Incorrect generated sql for complex query Incorrect generated (bigquery) sql for complex query Jun 25, 2021
@jcmincke
Copy link
Author

There is a workaround: convert a sub-expression to sql and then turn it back to a tableExpr:

e1 = ....
e2 = ....
e3 = f(e1, e2)   # sql generation fails

e1g = conn.sql(ibis_bigquery.compile(e1))
e2 = ...
e3 = f(e1g, e2)   # sql generation (might) succeed(s)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant