Is your feature request related to a problem?
In some cases, users would benefit from a built-in table analysis API (query) that gives them a good estimate of the cost/compute required for different operations on a table.
What solution would you like?
A Flint call to `/_async_analyze/$tableName` would return a response such as the following:
Based on the statistics provided, here's the summarized information about the table:
- **Table Name**: `otel_traces`
- **Database**: `default`
- **Owner**: `hadoop`
- **Created By**: `Spark 3.3.2-amzn-0`
- **Table Type**: `EXTERNAL`
- **Data Format**: `json`
- **Location**: `s3://flint-data-dp-eu-west-1-beta/oteldemo`
### Statistics
- **Total Size**: 297,027,661,632 bytes (approximately 297 GB)
- **Total Rows**: 5,982,891 rows
### Technical Details
- **Serde Library**: `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`
- **InputFormat**: `org.apache.hadoop.mapred.SequenceFileInputFormat`
- **OutputFormat**: `org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat`
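The **Total Size** entry pairs the raw byte count with a rounded figure. Assuming decimal (SI) units, where 1 GB is 10^9 bytes, 297,027,661,632 bytes comes out to about 297 GB. A minimal sketch of that formatting, using a hypothetical helper name (`human_readable_size` is not part of Flint), could look like this:

```python
def human_readable_size(num_bytes: int) -> str:
    """Render a byte count like the response does,
    e.g. '297,027,661,632 bytes (approximately 297 GB)'."""
    if num_bytes < 1000:
        return f"{num_bytes:,} bytes"
    value = float(num_bytes)
    unit = "bytes"
    # Step up through decimal (SI) units: 1 KB = 1000 bytes, 1 GB = 10**9 bytes.
    for next_unit in ("KB", "MB", "GB", "TB", "PB"):
        if value < 1000:
            break
        value /= 1000
        unit = next_unit
    # Round to whole units to match the "approximately 297 GB" style.
    return f"{num_bytes:,} bytes (approximately {value:.0f} {unit})"
```

Binary units (GiB, 1 GiB = 2^30 bytes) would give a noticeably smaller figure (about 277 GiB), so the response should state which convention it uses.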
Implementation process:
First, table-level statistics are collected with `ANALYZE TABLE tableName COMPUTE STATISTICS`.
Then, the collected statistics are read back with `DESCRIBE EXTENDED tableName`.
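The two statements could be wired together roughly as follows. This is a minimal PySpark-style sketch, not Flint's implementation: `analyze_table`, `parse_describe_extended`, and the response field names are hypothetical, and `spark` is assumed to be a `SparkSession` (anything with `.sql(...).collect()` works). The layout it parses (a `# Detailed Table Information` marker row followed by key/value rows, and a `Statistics` value of the form `"<n> bytes, <m> rows"`) is assumed from Spark 3.x output and should be checked against the Spark version in use.

```python
import re
from typing import Any, Dict, Iterable, Sequence

# Assumed Spark 3.x format of the "Statistics" entry, e.g.
# "297027661632 bytes, 5982891 rows"; the row count may be absent.
_STATS_RE = re.compile(r"^(\d+) bytes(?:, (\d+) rows)?")

# Hypothetical mapping from DESCRIBE EXTENDED keys to response fields.
_FIELDS = {
    "Table": "table_name",
    "Database": "database",
    "Owner": "owner",
    "Created By": "created_by",
    "Type": "table_type",
    "Provider": "data_format",
    "Location": "location",
    "Serde Library": "serde_library",
    "InputFormat": "input_format",
    "OutputFormat": "output_format",
}


def parse_describe_extended(rows: Iterable[Sequence[Any]]) -> Dict[str, Any]:
    """Turn (col_name, data_type, comment) rows into a summary dict."""
    summary: Dict[str, Any] = {}
    in_details = False
    for row in rows:
        name = (row[0] or "").strip()
        value = (row[1] or "").strip()
        if name == "# Detailed Table Information":
            in_details = True
            continue
        if not in_details:
            continue  # skip column definitions and partition info
        if name in _FIELDS:
            summary[_FIELDS[name]] = value
        elif name == "Statistics":
            match = _STATS_RE.match(value)
            if match:
                summary["total_size_bytes"] = int(match.group(1))
                rows_str = match.group(2)
                summary["total_rows"] = int(rows_str) if rows_str else None
    return summary


def analyze_table(spark, table_name: str) -> Dict[str, Any]:
    """Collect statistics, then read them back.

    table_name is interpolated into SQL as-is, so it is assumed to be
    validated (or backtick-quoted) by the caller.
    """
    spark.sql(f"ANALYZE TABLE {table_name} COMPUTE STATISTICS")
    rows = spark.sql(f"DESCRIBE EXTENDED {table_name}").collect()
    return parse_describe_extended(rows)
```

Keeping the parsing separate from the Spark calls means the response mapping can be tested without a cluster, and the field mapping can be adjusted if a Spark version renames a key.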
This process can also run continuously so that the query engine always has up-to-date statistics for query analysis.
These statistics could also be shown to the user when hovering over the table in the data explorer view.
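The issue does not say how continuous collection would be triggered. One possible policy is staleness-based: re-run the analysis for any table that was never analyzed, or whose statistics are older than a chosen maximum age. The sketch below assumes that policy; `tables_due_for_refresh`, `refresh_once`, the `max_age` parameter, and the `analyze` callback are all hypothetical names, and a real deployment would call `refresh_once` periodically from a scheduler.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


def tables_due_for_refresh(
    last_analyzed: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Tables never analyzed (None) or analyzed more than max_age ago."""
    due = [t for t, ts in last_analyzed.items() if ts is None or now - ts > max_age]
    # Never-analyzed tables first, then the stalest statistics first.
    return sorted(due, key=lambda t: (last_analyzed[t] is not None, last_analyzed[t] or now))


def refresh_once(
    last_analyzed: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta,
    analyze: Callable[[str], None],
) -> List[str]:
    """Re-analyze every stale table and record when it was refreshed."""
    refreshed = []
    for table in tables_due_for_refresh(last_analyzed, now, max_age):
        analyze(table)  # e.g. run ANALYZE TABLE <table> COMPUTE STATISTICS
        last_analyzed[table] = now
        refreshed.append(table)
    return refreshed
```

Refreshing stalest-first bounds how out of date any table's statistics can get when only a few tables can be analyzed per cycle, which matters because `ANALYZE TABLE` scans the data.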