Q. How do I submit a pre-existing backlog of big data, say 1 petabyte?
A. At minimum, you will need a file object manifest. Continue reading for how.
Q. Do we need to physically move pre-existing data buckets into the Gen3 instance?
A. No. Your existing data can be kept as-is, where it is. This includes data volumes on HPC. You just index it with a manifest. Continue reading for how.
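As a rough sketch, an indexing manifest is a tab-separated file with one row per file object, recording its checksum, size, access control, and storage location. The column names and values below are illustrative only; the authoritative format is described in the gen3sdk-python indexing documentation linked further down.

```tsv
guid	md5	size	authz	acl	url
255e396f-f1f8-11e9-9a07-0a80fada099c	473d83400bc1bc9dc635e334faddf33c	363455714	/programs/DEV	['Open']	s3://my-existing-bucket/study1/sample1.bam
```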
- In Gen3, there are a couple of ways to submit data.
- A typical end-user submission of a smaller dataset (a couple of files) from your desktop/workstation follows the normal data submission procedure.
- However, this is not always the case: there may be a backlog of data stored in cloud storage buckets or on data volumes of an HPC cluster.
- You can bring all of this data, sitting elsewhere, into Gen3's indexing and metadata graph model.
- This kind of data submission in Gen3 is known by a couple of names (a usage sketch follows this list):
  - manifest-based data submission
  - out-of-band data ingestion
  - DIIRM indexing (DIIRM - Data Ingestion, Integration, and Release Management)
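As a usage sketch, the gen3sdk-python helper `index_object_manifest` (documented in the diirmIndexing link below) reads such a manifest and registers each row in indexd. The commons URL, API-key file, and manifest path here are hypothetical placeholders; check the linked docs for the current parameters.

```python
import logging

from gen3.auth import Gen3Auth
from gen3.tools.indexing.index_manifest import index_object_manifest

logging.basicConfig(level=logging.INFO)

# Hypothetical commons URL and manifest path.
COMMONS = "https://my-commons.example.org/"
MANIFEST = "./bucket_manifest.tsv"

def main():
    # credentials.json is an API key downloaded from the commons' profile page
    auth = Gen3Auth(COMMONS, refresh_file="credentials.json")
    index_object_manifest(
        commons_url=COMMONS,
        manifest_file=MANIFEST,
        thread_num=8,                  # number of parallel workers
        auth=auth,
        replace_urls=False,            # keep any existing storage URLs
        manifest_file_delimiter="\t",  # use "," for a CSV manifest
    )

if __name__ == "__main__":
    main()
```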
Each sub-directory contains a demo of this manifest-based data submission, using mock data or a public dataset for exploration; topics include Consent & Data Access (ACL), interoperability (DRS & Htsget), external buckets, and so on.
For more technical details (a per-record sketch follows these links):
- https://github.com/uc-cdis/indexd#use-cases-for-indexing-data
- https://github.com/uc-cdis/cloud-automation/tree/master/doc/data_upload
- https://github.com/uc-cdis/gen3sdk-python/blob/master/docs/howto/diirmIndexing.md
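Under the hood, each manifest row becomes an indexd record. As a per-record sketch, assuming the `Gen3Index` client from gen3sdk-python (see the links above), a single existing file could be registered like this; the URL, hash, and authz path are made up for illustration:

```python
from gen3.auth import Gen3Auth
from gen3.index import Gen3Index

# Hypothetical commons URL and API-key file.
auth = Gen3Auth("https://my-commons.example.org", refresh_file="credentials.json")
index = Gen3Index(auth_provider=auth)

# Register one file object while it stays in its existing bucket.
record = index.create_record(
    hashes={"md5": "473d83400bc1bc9dc635e334faddf33c"},
    size=363455714,
    urls=["s3://my-existing-bucket/study1/sample1.bam"],
    authz=["/programs/DEV"],
    file_name="sample1.bam",
)
print(record["did"])  # the GUID assigned by indexd
```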
Bucket manifest:
Note that this essentially entails generating the manifest file for your existing big data. This can be done outside Gen3, with a batch job running in the cloud (AWS/GCP) or on HPC; a minimal sketch follows.
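Here is a minimal sketch of such a batch job, assuming boto3 and an existing S3 bucket: it walks the bucket listing and writes a manifest TSV. The bucket name and authz path are hypothetical, and note the ETag caveat in the comments; a real job would compute proper checksums for multipart-uploaded objects.

```python
import csv
import uuid

import boto3

# Hypothetical bucket and authz path; adjust to your commons' setup.
BUCKET = "my-existing-bucket"
AUTHZ = "/programs/DEV"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

with open("bucket_manifest.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["guid", "md5", "size", "authz", "url"])
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            # Caveat: an S3 ETag equals the object's MD5 only for
            # non-multipart uploads; large files need a separate checksum pass.
            md5 = obj["ETag"].strip('"')
            writer.writerow([
                str(uuid.uuid4()),  # pre-minted GUID for the new index record
                md5,
                obj["Size"],
                AUTHZ,
                f"s3://{BUCKET}/{obj['Key']}",
            ])
```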
Alternatively, you could also utilise the Gen3 EKS Kubernetes cluster. If so, the technical pointers are as follows.