Add optional dataset characteristics? #838

Open
ranman opened this issue Feb 9, 2021 · 1 comment

ranman commented Feb 9, 2021

It would be useful to have searchable/indexable data on the characteristics of a dataset:

  • total dataset size
  • average file size
  • file type
  • etc.

@chueatwork (Contributor)

Hi @ranman, thanks for your feedback! Some of the attributes you are looking for can be pulled right from the AWS CLI using the --summarize option of aws s3 ls (https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html), which reports the total number of objects and their total size. In addition, I would encourage you to check out open.quiltdata.com. Quilt Data compiles bucket-level statistics similar to the ones you've listed (common file types and their share of the bucket, total bucket size) and has demonstrated this on several open datasets.
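
For anyone who wants to compute these characteristics themselves, here is a minimal Python sketch of the attributes discussed in this thread (total size, average file size, file types). It assumes you already have a list of (key, size in bytes) pairs, for example from a bucket listing. The function name summarize_objects and the sample data are hypothetical and not part of any existing tool.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_objects(objects):
    """Compute simple dataset characteristics from (key, size_in_bytes) pairs."""
    objects = list(objects)  # allow any iterable, since we iterate twice
    sizes = [size for _, size in objects]
    total_size = sum(sizes)
    file_count = len(sizes)
    # Count files by lowercase extension; keys without an extension fall under "".
    file_types = Counter(PurePosixPath(key).suffix.lower() for key, _ in objects)
    return {
        "file_count": file_count,
        "total_size_bytes": total_size,
        "avg_file_size_bytes": total_size / file_count if file_count else 0.0,
        "file_types": dict(file_types),
    }


# Hypothetical listing; real pairs would come from an actual object listing.
example = [("data/a.csv", 100), ("data/b.csv", 300), ("images/c.PNG", 200)]
summary = summarize_objects(example)
print(summary)
```

Grouping by file extension is only a rough proxy for file type, but it is cheap to compute from a listing alone without reading any object contents.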
