Sequence-based Kinship Estimation Pipeline

Summary

This pipeline bundles the software tools needed to estimate pairwise kinship coefficients from sequencing data. It is based on Snakemake. The following steps are performed:

Variant calling

To address missing data and genotype uncertainty, we call variants with samtools mpileup and then refine the calls with the LD-based genotype calling algorithm in BEAGLE.

Ancestry estimation

For each sequenced genome (in BAM format), we use LASER to estimate the individual's ancestry background against an external ancestry reference panel.

Kinship estimation

We use SEEKIN to estimate pairwise kinship coefficients for homogeneous or heterogeneous samples.
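As a rough illustration of how pairwise kinship coefficients are usually read, the sketch below maps a coefficient to a coarse relationship degree. The cut-offs are the commonly used KING-style thresholds, placed between the expected values 0.5, 0.25, 0.125 and 0.0625. They are an assumption here and are not taken from SEEKIN, and the function name `classify_kinship` is hypothetical. Consult the SEEKIN manual for its own output format and recommendations.

```python
def classify_kinship(phi: float) -> str:
    """Map a kinship coefficient to a coarse relationship degree.

    Thresholds are the KING-style cut-offs 2^-1.5, 2^-2.5, 2^-3.5 and 2^-4.5
    (an assumption for illustration, not a SEEKIN default).
    """
    cutoffs = [
        (2 ** -1.5, "duplicate or monozygotic twin"),  # about 0.354
        (2 ** -2.5, "1st degree"),                      # about 0.177
        (2 ** -3.5, "2nd degree"),                      # about 0.0884
        (2 ** -4.5, "3rd degree"),                      # about 0.0442
    ]
    for lower_bound, label in cutoffs:
        if phi > lower_bound:
            return label
    return "unrelated"


if __name__ == "__main__":
    # Expected coefficients for twins, first-, second-, third-degree and unrelated pairs.
    for phi in (0.5, 0.25, 0.125, 0.0625, 0.0):
        print(f"{phi:.4f} -> {classify_kinship(phi)}")
```

Estimates from real data are noisy, so values close to a threshold are best treated as ambiguous rather than assigned to a fixed degree.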

Steps to perform kinship estimation

To estimate kinship coefficients with this pipeline, first prepare a configuration file. An example is provided in example/test.conf. In this file, each line specifies one parameter, followed by its value.

BAM_LIST: list of aligned sequencing reads in BAM format. Each BAM file should contain one sample per subject and must be indexed with samtools index or an equivalent tool. See example/sample.bam.lst for an example.
VCF_SITE_FILE: candidate variant sites file in VCF format. This file defines the sites at which the samtools mpileup is generated. It can include markers from the 1000 Genomes Project. See example/EAS.panel.sites.vcf.gz for an example.
BEAGLE_REF_LST: list of external reference panel files for BEAGLE imputation (one VCF file per chromosome). See example/EAS_file_list.txt for an example.
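Because every BAM file must be indexed, it can help to check the list before starting any jobs. The following is a minimal sketch, not part of the pipeline. It assumes that each non-empty line of the list holds the BAM path in one whitespace-separated column (column 0 by default, an assumption about the list layout) and that the index sits next to the BAM as either `<name>.bam.bai` or `<name>.bai`. The function name `find_unindexed_bams` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_unindexed_bams(bam_list_path: str, path_column: int = 0) -> List[str]:
    """Return BAM paths from the list file that have no index next to them."""
    missing = []
    for line in Path(bam_list_path).read_text().splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        bam = Path(fields[path_column])
        # samtools index writes <name>.bam.bai; some tools write <name>.bai
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(c.exists() for c in candidates):
            missing.append(str(bam))
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_bams.py example/sample.bam.lst
    for bam in find_unindexed_bams(sys.argv[1]):
        print(f"no index found for {bam}")
```

If the list uses a different layout, for example a sample ID followed by the path, adjust `path_column` accordingly.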

The remaining parameters are explained by the comments in the configuration file. See the SEEKIN, LASER and BEAGLE manuals for more details. Remember to update the software paths so that they point to the installations on your own machine.
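To illustrate the "one parameter per line, followed by its value" layout, here is a small sketch that reads such text and reports which of the three parameters described above are missing. The exact syntax is an assumption: the parameter name and value are taken to be separated by whitespace, and `#` is taken to start a comment. This is not how lib/GetConf.py parses the file, and `parse_conf` and `missing_keys` are hypothetical names; example/test.conf remains the authoritative reference.

```python
REQUIRED_KEYS = ("BAM_LIST", "VCF_SITE_FILE", "BEAGLE_REF_LST")


def parse_conf(text: str) -> dict:
    """Parse 'NAME value' lines; '#' starts a comment (assumed syntax)."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)  # split the name from the rest of the line
        if len(parts) != 2:
            raise ValueError(f"expected 'NAME value', got: {raw!r}")
        params[parts[0]] = parts[1].strip()
    return params


def missing_keys(params: dict) -> list:
    """Return the required parameter names that are absent from params."""
    return [key for key in REQUIRED_KEYS if key not in params]


if __name__ == "__main__":
    import sys

    # Usage: python check_conf.py test.conf
    with open(sys.argv[1]) as handle:
        conf = parse_conf(handle.read())
    for key in missing_keys(conf):
        print(f"missing parameter: {key}")
```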

Then, you can generate the job files by running the following step

 $ python  $pipelinePath/lib/GetConf.py  -c test.conf   -o run.yaml

After this step, all the jobs that need to be run can be found in the ./jobfiles folder.

To perform variant calling, run the following step

 $ snakemake -s $pipelinePath/Snakefile  --jobs 100  varCall --rerun-incomplete --timestamp --printshellcmds --stats logs/snakemake.stats --configfile run.yaml --latency-wait 60 --cluster-config cluster.GIS.yaml --drmaa " -pe OpenMP {threads} -l mem_free={cluster.mem} -l h_rt={cluster.time} -cwd -v PATH -e logs -o logs -w n" --jobname "SEEKIN.slave.{rulename}.{jobid}.sh" >> logs/snakemake.log 2>&1

The generated genotype file will be available at ./snp/Beagle.gp.vcf.gz.

To perform ancestry estimation, run the following step

 $ snakemake -s $pipelinePath/Snakefile  --jobs 10  laser --rerun-incomplete --timestamp --printshellcmds --stats logs/snakemake.stats --configfile run.yaml --latency-wait 60 --cluster-config cluster.GIS.yaml --drmaa " -pe OpenMP {threads} -l mem_free={cluster.mem} -l h_rt={cluster.time} -cwd -v PATH -e logs -o logs -w n" --jobname "SEEKIN.slave.{rulename}.{jobid}.sh" >> logs/snakemake.log 2>&1

The generated PCA coordinate file of study samples will be available at ./laser/laser.seqPC.coord.

To perform kinship estimation, run the following step

 $ snakemake -s $pipelinePath/Snakefile  --jobs 1  seekin --rerun-incomplete --timestamp --printshellcmds --stats logs/snakemake.stats --configfile run.yaml --latency-wait 60 --cluster-config cluster.GIS.yaml --drmaa " -pe OpenMP {threads} -l mem_free={cluster.mem} -l h_rt={cluster.time} -cwd -v PATH -e logs -o logs -w n" --jobname "SEEKIN.slave.{rulename}.{jobid}.sh" >> logs/snakemake.log 2>&1

The generated output will be available at ./seekin.
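The three Snakemake commands above differ only in the target rule (varCall, laser, seekin) and the --jobs value. The sketch below assembles the argument list from one shared set of options, which makes it harder for the three invocations to drift apart. It is a convenience wrapper written for illustration, not part of the pipeline; `snakemake_argv` is a hypothetical name and `/path/to/seekin_pipeline` is a placeholder for your own `$pipelinePath`.

```python
import shlex

# Options shared by the variant calling, ancestry and kinship commands.
COMMON_OPTS = [
    "--rerun-incomplete", "--timestamp", "--printshellcmds",
    "--stats", "logs/snakemake.stats",
    "--configfile", "run.yaml",
    "--latency-wait", "60",
    "--cluster-config", "cluster.GIS.yaml",
    "--drmaa", " -pe OpenMP {threads} -l mem_free={cluster.mem} "
               "-l h_rt={cluster.time} -cwd -v PATH -e logs -o logs -w n",
    "--jobname", "SEEKIN.slave.{rulename}.{jobid}.sh",
]

# Target rule -> --jobs value, copied from the commands in this section.
JOBS = {"varCall": 100, "laser": 10, "seekin": 1}


def snakemake_argv(target: str, pipeline_path: str) -> list:
    """Build the snakemake argument list for one pipeline stage."""
    if target not in JOBS:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(JOBS)}")
    return [
        "snakemake", "-s", f"{pipeline_path}/Snakefile",
        "--jobs", str(JOBS[target]), target,
    ] + COMMON_OPTS


if __name__ == "__main__":
    # Print the command for each stage in the order it should be run.
    for stage in ("varCall", "laser", "seekin"):
        print(shlex.join(snakemake_argv(stage, "/path/to/seekin_pipeline")))
```

The returned list can be passed to `subprocess.run(argv, check=True)`, with standard output and error redirected to logs/snakemake.log to mirror the shell commands.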

Questions

For further questions, please contact Jinzhuang Dou and Chaolong Wang.
