diff --git a/docs/getting-started.rst b/docs/getting-started.rst
index c7293752..a7867124 100644
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -97,7 +97,7 @@ parameter:
     --pgs_id PGS001229 # one score
     --pgs_id PGS001229,PGS001405 # many scores separated by , (no spaces)
 
-.. note:: You can also select scores associated with traits (``--efo_id``) and
+.. note:: You can also select scores associated with traits (``--trait_efo``) and
           publications (``--pgp_id``)
 
 If you would like to use a custom scoring file not published in the PGS Catalog,
@@ -129,11 +129,17 @@ for more information). If your custom PGS was in GRCh37 an example would look li
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To enable genetic ancestry similarity calculations and PGS normalisation,
-download our pre-built reference database:
+download one of our pre-built reference databases:
 
 .. code-block:: console
 
-    $ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_calc.tar.zst
+    $ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst
+
+This database contains a merged 1000 Genomes and Human Genome Diversity Project reference panel, and is the recommended default panel.
+
+You may prefer to use 1000 Genomes only:
+
+    $ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst
 
 See :ref:`ancestry` for more details.
 
@@ -149,7 +155,7 @@ they match the scoring file genome build.
     -profile \
     --input samplesheet.csv --target_build GRCh37 \
     --pgs_id PGS001229 \
-    --run_ancestry pgsc_calc.tar.zst
+    --run_ancestry pgsc_HGDP+1kGP_v1.tar.zst
 
 Congratulations, you've now (`hopefully`) calculated some scores! |:partying_face:|
 
diff --git a/docs/how-to/ancestry.rst b/docs/how-to/ancestry.rst
index 18f1239d..a60f3dad 100644
--- a/docs/how-to/ancestry.rst
+++ b/docs/how-to/ancestry.rst
@@ -6,17 +6,29 @@ How do I normalise calculated scores across different genetic ancestry groups?
 Download reference data
 -----------------------
 
-The fastest method of getting started is to download our reference panel:
+The fastest method of getting started is to download a `reference panel`_:
 
 .. code-block:: console
 
-    $ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_calc.tar.zst
+    $ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst
 
-The reference panel is based on 1000 Genomes. It was originally downloaded from
-the PLINK 2 `resources section`_. To minimise file size INFO annotations are
-excluded. KING pedigree corrections were enabled.
+This example reference panel is based on 1000 Genomes (`Nature 2015`_).
+
+We also provide a reference panel that combines 1000 Genomes with data from the Human Genome
+Diversity Project derived from the gnomAD release (v3.1, `Koenig, Yohannes et al. bioRxiv 2023`_),
+which includes additional samples and ancestry groups:
+
+.. code-block:: console
+
+    $ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst
 
 .. _`resources section`: https://www.cog-genomics.org/plink/2.0/resources
+.. _`reference panel`: https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/
+.. _`Nature 2015`: https://doi.org/10.1038/nature15393
+.. _`Koenig, Yohannes et al. bioRxiv 2023`: https://doi.org/10.1101/2023.01.23.525248
+
+.. note:: These reference databases are not compatible with the test profile.
+          The test profile is not biologically meaningful, and is only used to test the workflow installed.
 
 Bootstrap reference data
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -34,6 +46,6 @@ To enable genetic similarity analysis and score normalisation, just include the
 .. code-block:: console
 
     $ nextflow run pgscatalog/pgsc_calc -profile test,docker \
-        --run_ancestry path/to/reference/pgsc_calc.tar.zst
+        --run_ancestry path/to/reference/pgsc_HGDP+1kGP_v1.tar.zst
 
 The ``--run_ancestry`` parameter requires the path to the reference database.
diff --git a/docs/how-to/database.rst b/docs/how-to/database.rst
index d8f34b32..4469a1b8 100644
--- a/docs/how-to/database.rst
+++ b/docs/how-to/database.rst
@@ -8,17 +8,18 @@ A reference database is required to run some parts of the workflow:
 - Automatic genetic ancestry assignment with Principal Component Analysis
 - PGS normalisation methods that account for genetic ancestry
 
-.. note:: It's simplest to download the reference database we have hosted at the
-          PGS Catalog
+.. note:: It's simplest to download a reference database we host at the
+          PGS Catalog FTP
 
 Download reference database
 ---------------------------
 
-A reference database is available to download here:
+PGS Catalog created reference database(s) are available to download here:
 
-``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_calc.tar.zst``
+``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst``
+``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst``
 
-The database is about 7GB and supports both GRCh37 and GRCh38 input target
+The databases are either 7GB or 16GB and support both GRCh37 and GRCh38 input target
 genomes. Once the reference database is included, remember you must include the ``--run_ancestry``