diff --git a/docs/how-to/ancestry.rst b/docs/how-to/ancestry.rst
index eee628ad..7c435f7f 100644
--- a/docs/how-to/ancestry.rst
+++ b/docs/how-to/ancestry.rst
@@ -12,10 +12,11 @@ The fastest method of getting started is to download a `reference panel`_:
 
     $ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst
 
-This example reference panel is based on 1000 Genomes.
+This example reference panel is based on 1000 Genomes (`Nature 2015`_).
 
-We also provide a reference panel that includes Human Genome Diversity Project data,
-which includes more ancestry groups:
+We also provide a reference panel that combines 1000 Genomes with data from the Human Genome
+Diversity Project derived from the gnomAD release (v3.1, `Koenig, Yohannes et al. bioRxiv 2023`_),
+which includes additional samples and ancestry groups:
 
 .. code-block:: console
 
@@ -23,6 +24,8 @@ which includes more ancestry groups:
 
 .. _`resources section`: https://www.cog-genomics.org/plink/2.0/resources
 .. _`reference panel`: https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/
+.. _`Nature 2015`: https://doi.org/10.1038/nature15393
+.. _`Koenig, Yohannes et al. bioRxiv 2023`: https://doi.org/10.1101/2023.01.23.525248
 
 Bootstrap reference data
 ~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/how-to/database.rst b/docs/how-to/database.rst
index d8f34b32..4469a1b8 100644
--- a/docs/how-to/database.rst
+++ b/docs/how-to/database.rst
@@ -8,17 +8,18 @@ A reference database is required to run some parts of the workflow:
 - Automatic genetic ancestry assignment with Principal Component Analysis
 - PGS normalisation methods that account for genetic ancestry
 
-.. note:: It's simplest to download the reference database we have hosted at the
-          PGS Catalog
+.. note:: It's simplest to download a reference database we host at the
+          PGS Catalog FTP
 
 Download reference database
 ---------------------------
 
-A reference database is available to download here:
+PGS Catalog created reference database(s) are available to download here:
 
-``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_calc.tar.zst``
+``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst``
+``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst``
 
-The database is about 7GB and supports both GRCh37 and GRCh38 input target
+The databases are either 7GB or 16GB and support both GRCh37 and GRCh38 input target
 genomes. Once the reference database is included, remember you must include the ``--run_ancestry``