Quilt is a versioned data portal for AWS

Quilt in action

open.quiltdata.com is a petabyte-scale open data portal that runs on Quilt
quiltdata.com includes case studies, use cases, videos, and instructions on how to run a private Quilt instance
"Versioning data and models for rapid experimentation in machine learning" shows how to use Quilt for real-world projects

Who is Quilt for?

Quilt is for data-driven teams and offers features for coders (data scientists, data engineers, developers) and business users alike.

What does Quilt do?

Quilt manages data like code so that teams in machine learning, biotech, and analytics can experiment faster, build smarter models, and recover from errors.

How does Quilt work?

Quilt consists of a Python client, a web catalog, and Lambda functions, all of which are open source, plus a suite of backend services and Docker containers orchestrated by CloudFormation.

The backend services and containers are available for private use under a paid license on quiltdata.com.

Use cases

Share data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).
Understand data better through inline documentation (Jupyter notebooks, markdown) and visualizations (Vega, Vega-Lite).
Discover related data by indexing objects in Elasticsearch.
Model data by providing a home for large data and models that don't fit in git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages").
Decide by broadening data access within the organization and by documenting decision processes through auditable versioning and inline documentation.
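
To make the idea of immutable versions concrete, here is a minimal sketch of content-addressed versioning using only the Python standard library. It is an illustration under stated assumptions, not Quilt's actual manifest format or hashing scheme: the functions build_manifest and top_hash and the sample files are hypothetical, and a real package would reference objects in S3 rather than in-memory bytes.

```python
import hashlib
import json


def build_manifest(files):
    """Map each logical key to the SHA-256 digest and size of its bytes."""
    return {
        key: {"sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
        for key, data in files.items()
    }


def top_hash(manifest):
    """Hash the canonical JSON form of the manifest to name this version."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical package contents (assumption: stand-ins for S3 objects).
v1_files = {"README.md": b"# demo\n", "data/train.csv": b"x,y\n1,2\n"}
v2_files = dict(v1_files, **{"data/train.csv": b"x,y\n1,2\n3,4\n"})

v1 = top_hash(build_manifest(v1_files))
v2 = top_hash(build_manifest(v2_files))

# Any content change yields a new version identifier.
print(v1 != v2)
# Identical content always maps to the same identifier.
print(v1 == top_hash(build_manifest(dict(v1_files))))
```

Because the identifier is derived from the contents, a version cannot be changed without also changing its name, which is what makes such versions safe to cite in an audit trail.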

Roadmap

I - Performance and core services

Address performance issues with push (e.g. re-hash)
Provide PrestoDB-powered services for filtering package repos with SQL
Investigate and implement more efficient manifest formats (e.g. Parquet) that scale to 10M keys; consider abbreviated "fast manifests" for lazy browsing
Refactor s3://bucket/.quilt for improved listing and delete performance
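
As a rough picture of what filtering package repos with SQL could look like, the sketch below loads a few manifest entries into an in-memory SQLite database and queries them. SQLite is only a stand-in for Presto here, and the table name, columns, sample rows, and the find_entries helper are all assumptions made for illustration; none of them come from Quilt.

```python
import sqlite3

# Hypothetical manifest rows: (package, logical_key, size_bytes).
rows = [
    ("team/images", "train/cat.jpg", 120_000),
    ("team/images", "train/dog.jpg", 95_000),
    ("team/tables", "sales/2020.parquet", 4_500_000),
    ("team/tables", "README.md", 800),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (package TEXT, logical_key TEXT, size_bytes INTEGER)"
)
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)


def find_entries(conn, suffix, min_size=0):
    """Return (package, logical_key) pairs whose key ends with `suffix`."""
    query = (
        "SELECT package, logical_key FROM entries "
        "WHERE logical_key LIKE ? AND size_bytes >= ? "
        "ORDER BY package, logical_key"
    )
    return conn.execute(query, ("%" + suffix, min_size)).fetchall()


jpgs = find_entries(conn, ".jpg")                    # all JPEG entries
large = find_entries(conn, "", min_size=1_000_000)   # entries of 1 MB or more
print(jpgs)
print(large)
```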

II - CI/CD for data

Ability to fork/merge packages
Data quality monitoring
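
The roadmap does not define fork/merge semantics, so the following is only one plausible reading: a three-way merge over manifests represented as plain dictionaries from logical key to content hash. The function merge_manifests, the dictionary representation, and the conflict rule are assumptions for illustration, not a description of Quilt's design.

```python
def merge_manifests(base, ours, theirs):
    """Three-way merge of {logical_key: content_hash} dictionaries.

    Returns (merged, conflicts). A key conflicts when both sides changed it
    relative to `base` and disagree on the result.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree (including both deleted)
            winner = o
        elif o == b:      # only theirs changed
            winner = t
        elif t == b:      # only ours changed
            winner = o
        else:             # both changed, in different ways
            conflicts.append(key)
            continue
        if winner is not None:  # None means the key was deleted
            merged[key] = winner
    return merged, conflicts


# Hypothetical fork: ours edits a.csv, theirs adds c.csv.
base = {"a.csv": "h1", "b.csv": "h2"}
ours = {"a.csv": "h1x", "b.csv": "h2"}
theirs = {"a.csv": "h1", "b.csv": "h2", "c.csv": "h3"}

merged, conflicts = merge_manifests(base, ours, theirs)
print(merged)
print(conflicts)
```

Comparing hashes rather than file contents keeps the merge cheap: only the manifests need to be read, and the underlying objects are never touched.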

III - Storage agnostic (support Azure, GCP buckets)

Evaluate min.io and ceph.io as shims
Evaluate feasibility of on-prem local storage as a repo

IV - Cloud agnostic

Evaluate K8s and Terraform to replace CloudFormation
Shim lambdas (consider serverless.com)
Shim Elasticsearch (consider Solr)
Shim IAM via RBAC

