Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EXCLUDE in the Presence of Schema #91

Open
alancai98 opened this issue Sep 11, 2024 · 1 comment
Open

EXCLUDE in the Presence of Schema #91

alancai98 opened this issue Sep 11, 2024 · 1 comment

Comments

@alancai98
Copy link
Member

alancai98 commented Sep 11, 2024

Context

The current EXCLUDE RFC draft describes EXCLUDE semantics during evaluation time without schema. EXCLUDE’s behavior in the presence of schema is left in the “Unresolved questions” section — https://github.com/partiql/partiql-docs/blob/exclude-rfc/RFCs/0051-exclude-operator.adoc#unresolved-questions.

The reference implementation, partiql-lang-kotlin, does not currently give an error if we EXCLUDE on a tuple attribute that does not exist during compilation and evaluation. Given a closed schema (i.e. no structs or binding tuples are open), it has been confusing in some cases when an EXCLUDE path does not actually exclude any attributes.

SELECT t.*
EXCLUDE t.this_column_dont_exists
FROM t

^ currently no compile-time error

Cases to consider

Assuming closed schema, the following are some examples to correspond to each of the EXCLUDE RFCs unspecified cases w/ schema,

1. EXCLUDE on a tuple attribute that does not exist

SELECT *
EXCLUDE t.flds.does_not_exist
FROM <<
  {
    'foo': 'bar',
    'flds': {
      'a': {
        'field_x': 1,
        'field_y': 'one'
      },
      'b': {
        'field_x': 2,
        'field_y': 'two'
      },
      'c': {
        'field_x': 3,
        'field_y': 'three'
      }
    }
  }
>> AS t

2. EXCLUDE tuple step on a collection or collection step on a tuple

E.g. w/ invalid collection wildcard step when flds just has structs

SELECT *
EXCLUDE t.flds[*].a
FROM <<
  {
    'foo': 'bar',
    'flds': {
      'a': {
        'field_x': 1,
        'field_y': 'one'
      },
      'b': {
        'field_x': 2,
        'field_y': 'two'
      },
      'c': {
        'field_x': 3,
        'field_y': 'three'
      }
    }
  }
>> AS t

3. EXCLUDE on a collection index out of bounds

SELECT * 
EXCLUDE t.a[42].field_x -- out of bounds
FROM <<
    {
        'a': [
            { 'field_x': 1, 'field_y': 'one' },
            { 'field_x': 2, 'field_y': 'two' },
            { 'field_x': 3, 'field_y': 'three' }
        ]
    }
>> AS t

4. EXCLUDE collection index on a bag

SELECT * 
EXCLUDE t.a[1].field_x
FROM <<
    {
        'a': <<       -- bag here rather than an array
            { 'field_x': 1, 'field_y': 'one' },
            { 'field_x': 2, 'field_y': 'two' },
            { 'field_x': 3, 'field_y': 'three' }
        >>
    }
>> AS t

5. EXCLUDE on a tuple attribute with duplicates

SELECT *
EXCLUDE t.foo
FROM <<
  {
    'foo': 'bar1',
    'foo': 'bar2'
  }
>> AS t

6. EXCLUDE with redundant steps

SELECT *
EXCLUDE 
    t.flds,
    t.flds.a -- t.flds.a is already excluded by `t.flds`
FROM <<
  {
    'foo': 'bar',
    'flds': {
      'a': {
        'field_x': 1,
        'field_y': 'one'
      },
      'b': {
        'field_x': 2,
        'field_y': 'two'
      },
      'c': {
        'field_x': 3,
        'field_y': 'three'
      }
    }
  }
>> AS t

Expected outcomes

  1. Decide on the behavior for above cases in presence of closed schema
  2. Decide on the behavior for above cases for no schema and partial schema
  3. Update the behavior in the implementation(s) and/or conformance tests
@alancai98
Copy link
Member Author

Based on initial discussion w/ the team on 9/11/24, we focused primarily on the first case -- EXCLUDE on a tuple attribute that does not exist assuming closed schema.


Should we keep the existing behavior or give a warning or an error?

Initially, I was inclined to make this case a compile-time error since the existing behavior of no errors would "silently" not exclude any values. However, the team had concerns over

  1. the syntactic equivalence defined in the RFC (using the PIVOT and UNPIVOT) not following this same error behavior
  • The syntactic equivalence does not currently give an error when an EXCLUDE path references an invalid attribute. It may be too strict then to give an error.
  1. the flexibility of giving an error compared to a warning
  • Consider a use case in which we want to add a new field to a table and have a query referencing that table EXCLUDE on that field. If we choose to always error, it may not be possible to effectively update the query before the new field is added to the table.
  1. similar operators with compile-time issues
  • E.g. WHERE clause that may have a conditional always return true or false
  • E.g. EXCEPT set operation where the RHS has something that will never appear in the LHS
  • E.g. CASE WHEN branch that will never be true
  • All of the above examples are similar issues in queries that can be caught at compile-time. They are all syntactically valid and can run during evaluation time. However, they feel more along the lines of a lint/warning rather than an error.

A possible middle-ground for error vs no error is to be more flexible and leave it to implementations to give a warning or error for this case. Consider a phrasing like the following:

Assuming full, closed schema, an implementation may choose to give a compile-time warning or error when it can prove that an EXCLUDE path will never exclude any attributes.


Should we tie invalid EXCLUDE paths to the existing spec behavior (4.1.1) regarding path expressions that always return missing?

As outlined in the PartiQL spec section 4.1.1,

In the presence of schema, PartiQL may return a compile-time error when the query processor can prove that the path
expression is guaranteed to always produce MISSING. The extent of error detection is implementation-specific.

We cannot tie exclude path behavior to the same behavior as path expressions since they follow different typing behavior. As mentioned in the RFC draft,

<exclude path>s are different than PartiQL path expressions

Exclude paths are not expressions evaluating to values and thus we shouldn't apply the same compile-time error behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant