[FEATURE] Improve validation for SQL statement #65
Other validation example: ...

Another validation is required, because the expression ...
Spark structured streaming doesn't support Hive tables. Here is the test that identifies whether a table is Hive or not:

```
$ spark-shell ... --conf spark.flint.datasource.name=myglue

scala> import org.apache.spark.sql.flint.{loadTable, parseTableName, qualifyTableName}

scala> def getTableProperties(qualifiedTableName: String): java.util.Map[String, String] = {
     |   val (catalog, ident) = parseTableName(spark, qualifiedTableName)
     |   val table = loadTable(catalog, ident)
     |   table.get.properties
     | }

scala> getTableProperties("myglue.stream.lineitem_tiny")
res11: java.util.Map[String,String] = {location=s3://.../tpch-lineitem-tiny, provider=JSON, external=true, option.compression=gzip, owner=hadoop}

scala> getTableProperties("myglue.ds_tables.http_logs")
res12: java.util.Map[String,String] = {location=s3://.../http_logs_partitioned_json_bz2, provider=json, external=true, option.compression=bzip2, owner=hadoop}

scala> getTableProperties("myglue.mydatabase.noaa_ghcn_pds")
res14: java.util.Map[String,String] = {location=s3://noaa-ghcn-pds/csv, provider=hive, transient_lastDdlTime=1675459327, option.serialization.format=1, external=true, classification=csv, owner=hadoop, option.separatorChar=,}
```
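Building on that transcript, a minimal sketch of how the check could be wrapped for validation; it reuses the `parseTableName`/`loadTable` helpers shown above, and `isHiveTable` is a hypothetical name, not an existing API:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.flint.{loadTable, parseTableName}

// Hypothetical helper: treat a table as Hive when its catalog properties
// report provider=hive, as in the noaa_ghcn_pds output above.
def isHiveTable(spark: SparkSession, qualifiedTableName: String): Boolean = {
  val (catalog, ident) = parseTableName(spark, qualifiedTableName)
  loadTable(catalog, ident).exists { table =>
    Option(table.properties.get("provider")).exists(_.equalsIgnoreCase("hive"))
  }
}
```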
**Reproduce Issue**: ...
**Proposed Solution**: use `SHOW TABLE EXTENDED` to filter out Hive tables (a sketch of this check follows below). The procedure compares:

- Hive table info: ...
- Spark datasource table info: ...
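A minimal sketch of that check, assuming the `information` column returned by `SHOW TABLE EXTENDED` carries a `Provider: hive` line for Hive tables; the helper name is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: a table is Hive when the `information` column of
// SHOW TABLE EXTENDED contains a "Provider: hive" line.
def isHiveTableViaShowExtended(spark: SparkSession, db: String, table: String): Boolean = {
  spark.sql(s"SHOW TABLE EXTENDED IN $db LIKE '$table'")
    .collect()
    .headOption
    .exists { row =>
      row.getAs[String]("information")
        .split("\n")
        .exists(_.trim.equalsIgnoreCase("provider: hive"))
    }
}
```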
If `auto_refresh` is true, the user should not specify ...
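The option being restricted is elided in the comment above, so the following is an illustrative shape only; `manual_only_option` is a placeholder, not a real Flint option:

```scala
// Illustrative shape only: reject an option that conflicts with
// auto_refresh=true; `manual_only_option` stands in for the elided option.
def validateAutoRefreshOptions(options: Map[String, String]): Unit = {
  if (options.get("auto_refresh").exists(_.equalsIgnoreCase("true"))) {
    require(!options.contains("manual_only_option"),
      "manual_only_option cannot be specified when auto_refresh is true")
  }
}
```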
Another required validation is restricting the length of the index name.
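A minimal sketch of such a check; the 255-byte ceiling is OpenSearch's index-name limit, and the helper name is hypothetical:

```scala
import java.nio.charset.StandardCharsets

// Hypothetical check: OpenSearch limits index names to 255 bytes, so the
// generated Flint index name can be validated up front.
def validateIndexNameLength(indexName: String): Unit = {
  val maxBytes = 255
  val actual = indexName.getBytes(StandardCharsets.UTF_8).length
  require(actual <= maxBytes,
    s"Flint index name is $actual bytes, exceeding the $maxBytes-byte limit: $indexName")
}
```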
**Summary**: here is a summary of all the issues listed above, especially for the CREATE Flint index DDL statement.

- Out of Scope: ...
- Index Option Validations: ...
- Other Validations: ...
Tested the checkpoint location validation approach.
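A minimal sketch of what such a probe could look like, assuming validation means resolving the path through Hadoop's FileSystem API (the helper name is hypothetical):

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Hypothetical probe: resolve the checkpoint location through Hadoop's
// FileSystem API so an unreachable path fails at DDL time rather than
// when the streaming job starts.
def validateCheckpointLocation(spark: SparkSession, location: String): Unit = {
  val path = new Path(location)
  val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
  if (!fs.exists(path)) {
    require(fs.mkdirs(path), s"Cannot create checkpoint location: $location")
  }
}
```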
Finished the high-priority items in ...
**Is your feature request related to a problem?**

Improve validation for the SQL CREATE statement:

a. Validate `WITH` options and report an error if an invalid option is given.
b. Check whether a given column is unsupported by the skipping/covering index, and report the error early instead of when the DataFrame job is submitted in the background.

**What solution would you like?**

Perform the validation in `FlintSparkIndexOptions`, as sketched below.
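A minimal sketch of what early option validation could look like, assuming a simple allow-list; the option names shown are assumptions, not the definitive Flint set:

```scala
// Illustrative sketch: validate WITH options against an allow-list so a
// bad option fails the DDL statement instead of the background job. The
// option names below are assumptions, not the definitive Flint set.
case class FlintSparkIndexOptions(options: Map[String, String]) {
  private val allowedOptions =
    Set("auto_refresh", "refresh_interval", "checkpoint_location", "index_settings")

  def validate(): Unit = {
    val invalid = options.keySet -- allowedOptions
    require(invalid.isEmpty, s"Invalid index option(s): ${invalid.mkString(", ")}")
  }
}
```

With this shape, a misspelled key such as `auto_refrsh` fails `validate()` at statement time rather than surfacing later from the background job.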