This repository contains information for QC, analyses, and tutorials relating to the combined HGDP (Human Genome Diversity Project) + 1kGP (1000 Genomes Project) data.
- Metadata are available on Google Cloud: gs://gcp-public-data--gnomad/release/3.1/secondary_analyses/hgdp_1kg_v2/metadata_and_qc/gnomad_meta_updated.tsv
  - The metadata can be downloaded from Google Cloud as described here.
- All data are freely available and described in more detail here.
- The gnomAD HGDP+1kGP callset (pre-QC mt) can be found here.
  - Note that files ending with `.bgz` can be viewed using `zcat` on the command line.
- Datasets used in the tutorials are located here.
- Phased haplotypes are available as BCFs on Google Cloud: gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes_v2/
  - More details can be found here.
- Datasets found on the Downloads page of the gnomAD browser are released on Google Cloud Platform, Amazon Web Services, and Microsoft Azure. Instructions on how to download them can be found here.
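
For readers who prefer Python to the command line, the Google Cloud paths and the `.bgz` note above can be sketched as follows. This is a minimal illustration, not part of this repository: the helper names `gs_to_https`, `read_lines`, and `write_demo_bgz` are hypothetical, and the demo file with its column names is invented for the example. It assumes the bucket is publicly readable, in which case a `gs://bucket/object` path corresponds to `https://storage.googleapis.com/bucket/object`, and it relies on `.bgz` (block gzip) files being a series of gzip members that Python's standard `gzip` module can read.

```python
import gzip
import tempfile
from itertools import islice
from pathlib import Path

# Public HTTPS endpoint for Google Cloud Storage objects.
GCS_HTTPS_ROOT = "https://storage.googleapis.com"

# Metadata path quoted in the list above.
METADATA_URI = (
    "gs://gcp-public-data--gnomad/release/3.1/secondary_analyses/"
    "hgdp_1kg_v2/metadata_and_qc/gnomad_meta_updated.tsv"
)


def gs_to_https(gs_uri: str) -> str:
    """Map gs://bucket/object to the HTTPS URL of a publicly readable object."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    return f"{GCS_HTTPS_ROOT}/{gs_uri[len(prefix):]}"


def read_lines(path, limit=5):
    """Return the first `limit` text lines of a plain, .gz, or .bgz file."""
    path = Path(path)
    # .bgz files are concatenated gzip members, so gzip.open can read them.
    opener = gzip.open if path.suffix in {".gz", ".bgz"} else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, limit)]


def write_demo_bgz(path):
    """Write a tiny two-member gzip file that mimics block-gzipped layout.

    The columns and values are made up for illustration only.
    """
    with open(path, "wb") as out:
        out.write(gzip.compress(b"sample\tpopulation\nsample_1\tpop_A\n"))
        out.write(gzip.compress(b"sample_2\tpop_B\n"))


if __name__ == "__main__":
    # Print the HTTPS form of the metadata path (no network access is made).
    print(gs_to_https(METADATA_URI))

    # Read back a locally generated demo file, much as `zcat` would show it.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.tsv.bgz"
        write_demo_bgz(demo)
        for line in read_lines(demo):
            print(line)
```

The script only builds the URL and reads a local demo file; fetching the real metadata would additionally need a download step with a tool of your choice.
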
PCA plotting and projection scripts, used by the COVID-19 Host Genetics Initiative, the Global Biobank Meta-analysis Initiative, and related projects to align external cohorts to this resource, are available here: https://github.com/atgu/pca_projection/blob/master/hgdp_tgp_reference/hgdp_tgp_pca_intersection.py