fix reference link #214

Merged
merged 7 commits into from
Nov 29, 2023
14 changes: 10 additions & 4 deletions docs/getting-started.rst
@@ -97,7 +97,7 @@ parameter:
--pgs_id PGS001229 # one score
--pgs_id PGS001229,PGS001405 # many scores separated by , (no spaces)

-.. note:: You can also select scores associated with traits (``--efo_id``) and
+.. note:: You can also select scores associated with traits (``--trait_efo``) and
publications (``--pgp_id``)

If you would like to use a custom scoring file not published in the PGS Catalog,
@@ -129,11 +129,17 @@ for more information). If your custom PGS was in GRCh37 an example would look li
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To enable genetic ancestry similarity calculations and PGS normalisation,
-download our pre-built reference database:
+download one of our pre-built reference databases:

.. code-block:: console

-$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_calc.tar.zst
+$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst
+
+This database contains a merged 1000 Genomes and Human Genome Diversity Project reference panel, and is the recommended default panel.
+
+You may prefer to use 1000 Genomes only:
+
+$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst

See :ref:`ancestry` for more details.

@@ -149,7 +155,7 @@ they match the scoring file genome build.
-profile <docker/singularity/conda> \
--input samplesheet.csv --target_build GRCh37 \
--pgs_id PGS001229 \
-    --run_ancestry pgsc_calc.tar.zst
+    --run_ancestry pgsc_HGDP+1kGP_v1.tar.zst

Congratulations, you've now (`hopefully`) calculated some scores!
|:partying_face:|
24 changes: 18 additions & 6 deletions docs/how-to/ancestry.rst
@@ -6,17 +6,29 @@ How do I normalise calculated scores across different genetic ancestry groups?
Download reference data
-----------------------

-The fastest method of getting started is to download our reference panel:
+The fastest method of getting started is to download a `reference panel`_:

.. code-block:: console

-$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_calc.tar.zst
+$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst

-The reference panel is based on 1000 Genomes. It was originally downloaded from
-the PLINK 2 `resources section`_. To minimise file size INFO annotations are
-excluded. KING pedigree corrections were enabled.
+This example reference panel is based on 1000 Genomes (`Nature 2015`_).
+
+We also provide a reference panel that combines 1000 Genomes with data from the Human Genome
+Diversity Project derived from the gnomAD release (v3.1, `Koenig, Yohannes et al. bioRxiv 2023`_),
+which includes additional samples and ancestry groups:
+
+.. code-block:: console
+
+$ wget https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst

.. _`resources section`: https://www.cog-genomics.org/plink/2.0/resources
+.. _`reference panel`: https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/
+.. _`Nature 2015`: https://doi.org/10.1038/nature15393
+.. _`Koenig, Yohannes et al. bioRxiv 2023`: https://doi.org/10.1101/2023.01.23.525248
+
+.. note:: These reference databases are not compatible with the test profile.
+          The test profile is not biologically meaningful, and is only used to test the workflow installed.

Bootstrap reference data
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -34,6 +46,6 @@ To enable genetic similarity analysis and score normalisation, just include the
.. code-block:: console

$ nextflow run pgscatalog/pgsc_calc -profile test,docker \
-    --run_ancestry path/to/reference/pgsc_calc.tar.zst
+    --run_ancestry path/to/reference/pgsc_HGDP+1kGP_v1.tar.zst

The ``--run_ancestry`` parameter requires the path to the reference database.
11 changes: 6 additions & 5 deletions docs/how-to/database.rst
@@ -8,17 +8,18 @@ A reference database is required to run some parts of the workflow:
- Automatic genetic ancestry assignment with Principal Component Analysis
- PGS normalisation methods that account for genetic ancestry

-.. note:: It's simplest to download the reference database we have hosted at the
-          PGS Catalog
+.. note:: It's simplest to download a reference database we host at the
+          PGS Catalog FTP

Download reference database
---------------------------

-A reference database is available to download here:
+PGS Catalog created reference database(s) are available to download here:

-``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_calc.tar.zst``
+``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_1000G_v1.tar.zst``
+``https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/pgsc_HGDP+1kGP_v1.tar.zst``

-The database is about 7GB and supports both GRCh37 and GRCh38 input target
+The databases are either 7GB or 16GB and support both GRCh37 and GRCh38 input target
genomes.

Once the reference database is included, remember you must include the ``--run_ancestry``
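
The change above replaces the single ``pgsc_calc.tar.zst`` archive with two named reference panels. As a rough illustration of how the new file names fit together with ``--run_ancestry``, here is a minimal Python sketch. The URL prefix, archive names, and command-line parameters are taken from the documentation in this diff. The dictionary keys and the function names (``reference_url``, ``ancestry_command``) are hypothetical and are not part of pgsc_calc. The sketch only builds strings: it does not download anything or launch Nextflow, and it assumes the archive has already been saved to the working directory.

```python
import shlex

# URL prefix and archive file names as given in the documentation in this diff.
BASE_URL = "https://ftp.ebi.ac.uk/pub/databases/spot/pgs/resources/"

# The dictionary keys are hypothetical labels chosen for this sketch;
# only the file names come from the documentation.
PANELS = {
    "HGDP+1kGP": "pgsc_HGDP+1kGP_v1.tar.zst",  # merged panel, recommended default
    "1000G": "pgsc_1000G_v1.tar.zst",          # 1000 Genomes only
}


def reference_url(panel="HGDP+1kGP"):
    """Return the download URL for the chosen reference panel."""
    if panel not in PANELS:
        raise ValueError(f"unknown panel {panel!r}; expected one of {sorted(PANELS)}")
    return BASE_URL + PANELS[panel]


def ancestry_command(panel, samplesheet, target_build, pgs_id, profile="docker"):
    """Build (but do not run) a pgsc_calc command line with ancestry enabled.

    Assumes the archive for `panel` sits in the current working directory.
    """
    if panel not in PANELS:
        raise ValueError(f"unknown panel {panel!r}; expected one of {sorted(PANELS)}")
    argv = [
        "nextflow", "run", "pgscatalog/pgsc_calc",
        "-profile", profile,
        "--input", samplesheet,
        "--target_build", target_build,
        "--pgs_id", pgs_id,
        "--run_ancestry", PANELS[panel],
    ]
    # shlex.join quotes any argument that needs quoting in a POSIX shell.
    return shlex.join(argv)


if __name__ == "__main__":
    print(reference_url())
    print(ancestry_command("HGDP+1kGP", "samplesheet.csv", "GRCh37", "PGS001229"))
```

Keeping the panel-to-file mapping in one place means that a future archive rename, like the one made in this pull request, would only need to be updated once.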