Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-4918] Add a VARIANT data type #3947

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

mihaibudiu
Copy link
Contributor

This PR introduces a VARIANT SQL data type, based on the design in Snowflake: https://docs.snowflake.com/en/sql-reference/data-types-semistructured. VARIANT is a much better base to build support for JSON that the existing functions in Calcite. All the existing Calcite JSON functions assume that JSON is stored as an unparsed string. VARIANT can represent any JSON document in a "binary" form (and much more than JSON!)

The PR is divided into two commits:

  • the parser and validator support
  • the runtime support

Turns out that the validator support is almost trivial. Most of the work is in the runtime, where a new kind of value must be introduced, Variant, which carries runtime type information. For this purpose a new class hierarchy has been introduced, rooted at RuntimeTypeInformation.

Currently the runtime type information carries precision and scale, but I am not sure that these are actually necessary. I will study this a bit more and may amend this in a subsequent commit.

There are no functions supporting VARIANT at this point, but VARIANTS can still be created using casts, and accessed using variant.field, variant[index], and variant['key'].

Subsequent pull requests are expected to add more functions. In particular, functions to parse and unparse JSON into variants.

@julianhyde
Copy link
Contributor

Can you move (or at least copy) that description to the Jira case? Jira cases are our feature specifications. Minimal features thatI would like to see specified:

  • How VARIANT interacts with JDBC
  • Whether there is an 'instanceof' operator
  • Behavior in CAST (and implicit conversion)
  • How variants are converted to and from strings

Copy link
Contributor

@julianhyde julianhyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

One useful addition would be a variant.iq quidem test. With extensive comments, it could read like a tutorial, showing all the things you can do with variants. As a new language feature, specific to Calcite (albeit based on Snowflake) I think it needs better documentation than we usually provide.

import static java.util.Objects.requireNonNull;

/**
* This class represents the type of a SQL expression at runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove 'This class represents'

* a dynamically-typed value, and needs this kind of information.
* We cannot use the very similar RelDataType type since it carries extra
* baggage, like the type system, which is not available at runtime. */
public abstract class RuntimeTypeInformation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the impression that RuntimeTypeInformation is connected with either the enumerable engine, or as part of the Java API (which is used to define Java UDFs). If it is either of those, I don't think it belongs in (a subpackage of) the util package.

}

/**
* Create and return an expression that creates a runtime type that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Create/Creates/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and generally, please follow the best practice of using third-person indicative for method javadoc.

/** This class is the runtime support for values of the VARIANT SQL type. */
public class Variant {
/** Actual value. */
final @Nullable Object value;
Copy link
Contributor

@julianhyde julianhyde Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making value nullable probably complicates a lot of code. (If the experience with RexLiteral is anything to go by.) Could you have a subclass for null variants (or some other solution)?

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.calcite.util;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The util package feels like the wrong place (for the reasons I described in rtti).

How about org.apache.calcite.runtime? The classes in that package are intended for use by generated code, rather than user code. But it has a similar goal to have few dependencies.

(DateString, TimeString, TimestampString should have been there too.)

@mihaibudiu
Copy link
Contributor Author

@julianhyde I have hopefully implemented your suggestions. The structure is much better this way.
I have added TYPEOF and VARIANTNULL functions. TYPEOF is very useful for debugging.
The variant.iq file will grow as we add more functions.

@suibianwanwank
Copy link
Contributor

suibianwanwank commented Sep 6, 2024

Implicitly cast is an important part of VARIANT type, will it be supported in this PR?

edit: I just noticed you mention this in Jira.

@mihaibudiu
Copy link
Contributor Author

Yes, let's leave implicit casts for later.
I think we can add coercions fairly easily, but other implicit casts may require much more work due to the way Calcite handles them.

| Type | Description | Example type
|:-------- |:---------------------------|:---------------
| ANY | The union of all types |
| Type | Description | Example type |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our markdown style is to not have a | after the last column. Can you remove it.

(Intellij does not respect that style, so its edits need to be reverted.)

Also revert the changes in spacing, which become spurious diffs.

@julianhyde
Copy link
Contributor

@mihaibudiu Thanks for all your edits. Much improved.

+1 after you fix the cosmetic issues in reference.md.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Sep 6, 2024
@mihaibudiu
Copy link
Contributor Author

I will leave this PR open for a few more days in case people want to comment. Otherwise I have tagged it with "merge soon".

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihaibudiu I reviewed the comments and left some minor comment. I think this PR can be merged.

@@ -1722,6 +1722,68 @@ void testCastToBoolean(CastType castType, SqlOperatorFixture f) {
f.checkNull("cast(null as row(f0 varchar, f1 varchar))");
}

/** Test cases for
* <a href="https://issues.apache.org/jira/browse/CALCITE-4918">
* [CALCITE-4918] Add a VARIANT data type</a>. */
Copy link
Member

@caicancai caicancai Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a redundant space


import org.checkerframework.checker.nullness.qual.Nullable;

/** The VARIANT type has its own notion of null, which is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adding a line break would be better.

Copy link

sonarcloud bot commented Nov 15, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants