This repository collects the scripts needed to run fine-mapping on genetic association studies.
The workflow is managed with Snakemake, a workflow management system based on the Python language. For additional details, refer to the Snakemake homepage.
Fine-mapping is performed with the SuSiE algorithm through its R implementation, susieR. For further details on the algorithm, please refer to the original paper (Wang et al. 2020).
The pipeline starts from the summary statistics generated by regenie and applies a clumping step based on parameters defined in the configuration file. Each clump is then enlarged to a minimum size of 1 Mb, and if two clumps overlap, they are merged into a single region.
susieR is then applied to each resulting region. If a credible set is found, it is reported in the summary file.
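The enlarge-and-merge step described above can be illustrated with a minimal Python sketch. It is not the pipeline's own code: the `Region` tuple, the function names, the symmetric padding around each clump, and the choice to treat touching regions as overlapping are all assumptions made for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end), positions in base pairs. Hypothetical layout.
Region = Tuple[str, int, int]

MIN_SIZE = 1_000_000  # minimum region size of 1 Mb, as stated in the text


def enlarge(region: Region, min_size: int = MIN_SIZE) -> Region:
    """Pad a region until it spans at least min_size base pairs.

    Assumption: padding is split evenly on both sides; padding that would
    fall below position 0 is moved to the right-hand side instead.
    """
    chrom, start, end = region
    missing = min_size - (end - start)
    if missing <= 0:
        return region
    pad_left = missing // 2
    pad_right = missing - pad_left
    new_start = max(0, start - pad_left)
    unused_left = pad_left - (start - new_start)
    return (chrom, new_start, end + pad_right + unused_left)


def merge_overlapping(regions: List[Region]) -> List[Region]:
    """Merge regions on the same chromosome that overlap or touch."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For example, two 200 kb clumps on chromosome 1 at 1.0-1.2 Mb and 1.7-1.9 Mb are each enlarged to 1 Mb (0.6-1.6 Mb and 1.3-2.3 Mb) and then merged into a single region spanning 0.6-2.3 Mb, while a 2 Mb clump on another chromosome is left unchanged.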
The pipeline requires the following input files:
- Summary statistics files: pheno.regenie.gz.
- Phenotype file (required) containing the original phenotype used for the GWAS.
- Genotype plink file sets [.bim, .fam, .bed] matching the GWAS analysis.
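Before launching the workflow, it can help to confirm that the inputs listed above are in place. The following is a small sketch of such a check, not part of the pipeline: the function names are hypothetical, and it only assumes that the plink file set shares a common prefix and that the summary statistics file is gzip-compressed text with a whitespace-separated header line.

```python
import gzip
from pathlib import Path
from typing import List

# The three members of a plink binary file set
PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def missing_plink_files(prefix: str) -> List[str]:
    """Return the file set members (prefix + extension) that are not on disk."""
    return [prefix + ext for ext in PLINK_EXTENSIONS
            if not Path(prefix + ext).is_file()]


def summary_stat_columns(path: str) -> List[str]:
    """Return the column names from the header line of a gzipped text file."""
    with gzip.open(path, "rt") as handle:
        return handle.readline().split()
```

Calling `missing_plink_files` with the genotype prefix returns an empty list when the file set is complete, and `summary_stat_columns` applied to a file such as pheno.regenie.gz shows which columns are available to the clumping step.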
The following software is required:
- snakemake=8.4.8
- snakemake-executor-plugin-slurm
- git
Installing Snakemake is optional if it was already installed by the system administrator or is already available in a conda environment.
See the Snakemake installation instructions for further information and specific parameters.
conda create -n snakemake bioconda::snakemake bioconda::snakemake-executor-plugin-slurm
- Now clone this repo into your working directory.
git clone https://github.com/EuracBiomedicalResearch/finemap_pipeline
cd finemap_pipeline
- Write a configuration file
All available parameters are defined in a configuration file written in YAML.
Take the file config/config.yaml as an example and modify it according to your needs.
- Activate the conda environment
conda activate snakemake
- Dry-run to see the number of jobs to be submitted
sbatch snakemake --configfile config/config.yaml -n
- Submit the command to slurm
NB: See the Snakemake documentation on how to create a Slurm profile for submitting jobs.
sbatch snakemake --configfile config/config.yaml --profile ~/snake_prof/slurm \
    --executor slurm \
    --latency-wait 60 \
    --nolock
The pipeline produces a summary TSV file with the lead variant for each credible set found in the analysis.
The summary contains a subset of the original summary statistics.
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. (2020). A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society, Series B 82, 1273–1300. https://doi.org/10.1111/rssb.12388