[FEATURE]Table Analysis API #113

Open
YANG-DB opened this issue Oct 27, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@YANG-DB
Member

YANG-DB commented Oct 27, 2023

Is your feature request related to a problem?

In some cases, users would benefit from a built-in table analysis API (query) that gives a good estimate of the cost and compute required for different operations.

What solution would you like?
A Flint call to `/_async_analyze/$tableName` would return a response like the following:

Based on the statistics provided, here's the summarized information about the table:

- **Table Name**: `otel_traces`
- **Database**: `default`
- **Owner**: `hadoop`
- **Created By**: `Spark 3.3.2-amzn-0`
- **Table Type**: `EXTERNAL`
- **Data Format**: `json`
- **Location**: `s3://flint-data-dp-eu-west-1-beta/oteldemo`

### Statistics

- **Total Size**: 297,027,661,632 bytes (approximately 297 GB)
- **Total Rows**: 5,982,891 rows

### Technical Details

- **Serde Library**: `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`
- **InputFormat**: `org.apache.hadoop.mapred.SequenceFileInputFormat`
- **OutputFormat**: `org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat`
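
For illustration, the summary above could also be carried as a structured payload. The sketch below is only one possible shape: the `TableAnalysis` class and all of its field names are assumptions made for this example, not a defined Flint schema. The values are copied from the sample response.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TableAnalysis:
    """Hypothetical response shape; field names are illustrative only."""

    table_name: str
    database: str
    owner: str
    created_by: str
    table_type: str
    data_format: str
    location: str
    total_size_bytes: int
    total_rows: int

    def size_gb(self) -> float:
        # Decimal gigabytes (1 GB = 10**9 bytes), the unit used in the summary.
        return self.total_size_bytes / 10**9


# Values taken from the sample response in this issue.
example = TableAnalysis(
    table_name="otel_traces",
    database="default",
    owner="hadoop",
    created_by="Spark 3.3.2-amzn-0",
    table_type="EXTERNAL",
    data_format="json",
    location="s3://flint-data-dp-eu-west-1-beta/oteldemo",
    total_size_bytes=297_027_661_632,
    total_rows=5_982_891,
)

# Serialize to JSON, as an API response body might be.
payload = json.dumps(asdict(example), indent=2)
```

Keeping the raw byte count in the payload and deriving the human-readable size on demand avoids baking a unit choice (GB vs. GiB) into the response.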

Implementation process:

  • First, metadata statistics are collected using the following statement: `ANALYZE TABLE tableName COMPUTE STATISTICS`
  • Afterwards, the collected data is fetched using the following statement: `DESCRIBE EXTENDED tableName`
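
The two steps above can be sketched roughly as follows. This is a minimal sketch, not the proposed implementation: `analyze_table`, `parse_describe_extended`, and the injected `run_sql` callable are hypothetical names. It assumes that `DESCRIBE EXTENDED` returns `(col_name, data_type, comment)` rows, with the table details listed after a `# Detailed Table Information` marker row and a `Statistics` value of the form `N bytes, M rows`. The exact layout may differ between Spark versions.

```python
import re
from typing import Callable, Dict, Iterable, Optional, Tuple

# One DESCRIBE EXTENDED row: (col_name, data_type, comment).
Row = Tuple[str, str, Optional[str]]

# Accept only plain `table` or `db.table` identifiers, since the name is
# interpolated into SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
# Assumed format of the Statistics value, e.g. "123 bytes, 45 rows".
_STATS = re.compile(r"(\d+)\s+bytes(?:,\s*(\d+)\s+rows)?")


def parse_describe_extended(rows: Iterable[Row]) -> Dict[str, object]:
    """Collect the key/value rows that follow the detailed-information marker."""
    details: Dict[str, object] = {}
    in_details = False
    for col_name, data_type, _comment in rows:
        if col_name.startswith("# Detailed Table Information"):
            in_details = True
            continue
        if in_details and col_name and not col_name.startswith("#"):
            details[col_name] = data_type
    match = _STATS.search(str(details.get("Statistics", "")))
    if match:
        details["size_bytes"] = int(match.group(1))
        if match.group(2) is not None:
            details["row_count"] = int(match.group(2))
    return details


def analyze_table(
    table_name: str, run_sql: Callable[[str], Iterable[Row]]
) -> Dict[str, object]:
    """Run the two statements from the list above and summarize the result."""
    if not _IDENT.match(table_name):
        raise ValueError(f"unsupported table name: {table_name!r}")
    # Step 1: compute and store table-level statistics (result rows are ignored).
    list(run_sql(f"ANALYZE TABLE {table_name} COMPUTE STATISTICS"))
    # Step 2: fetch the stored metadata, including the statistics.
    return parse_describe_extended(run_sql(f"DESCRIBE EXTENDED {table_name}"))
```

With an active PySpark `SparkSession` named `spark`, `run_sql` could be something like `lambda q: [tuple(r) for r in spark.sql(q).collect()]`. Injecting the callable keeps the parsing logic testable without a cluster.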

This process could also run continuously so that the query engine always has up-to-date statistics for query analysis.
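
One way the continuous collection could be driven is a periodic staleness check that picks the tables whose statistics are older than some threshold and re-runs the analysis for them. The sketch below is an assumption about how that might look. The function name `tables_due_for_refresh`, the bookkeeping map of last-analyzed timestamps, and the `max_age` threshold are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def tables_due_for_refresh(
    last_analyzed: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the tables whose statistics are missing or older than max_age."""
    due = []
    for table, analyzed_at in last_analyzed.items():
        # None means the table has never been analyzed.
        if analyzed_at is None or now - analyzed_at >= max_age:
            due.append(table)
    return sorted(due)
```

A scheduler could call this on a fixed interval and issue `ANALYZE TABLE ... COMPUTE STATISTICS` for each returned table.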
The user could be shown these statistics when hovering over the table in the data explorer view.

Screenshot 2023-10-27 at 3 37 48 PM

Do you have any additional context?

@dai-chen
Collaborator

It's not clear what the issue is. Are the existing ANALYZE and DESC statements in Spark not sufficient?
