This repository contains the pipeline used for the article "A comprehensive map of single base polymorphisms in the hypervariable LPA Kringle-IV-2 copy number variation region". The paper can be found here.
You can upload a BAM file and get annotated low-level variants in return. In the current version, indel detection and BAQ features are disabled.
We recommend using BWA-MEM to align your FASTQ files against a reference. Please use this reference for the alignment process.
bwa mem kiv2_6.fasta <file1.fastq> <file2.fastq> | gzip -3 > aln-pe.sam.gz
samtools view -S -b aln-pe.sam.gz > sample.bam
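The two alignment commands above can also be combined so that no intermediate SAM file is written. The following is a minimal sketch, not part of the published pipeline: `align_to_bam` is a hypothetical helper name, and it assumes that `bwa` and `samtools` are on the `PATH` and that the reference has already been indexed with `bwa index`.

```shell
# Hypothetical helper (not part of the LPA pipeline): align one paired-end
# sample against the reference and write an unsorted BAM file directly.
align_to_bam() {
  local ref="$1" fq1="$2" fq2="$3" out_bam="$4"
  local f

  # Fail early with a clear message if any input file is unreadable.
  for f in "$ref" "$fq1" "$fq2"; do
    if [ ! -r "$f" ]; then
      echo "missing input: $f" >&2
      return 1
    fi
  done

  # bwa mem writes SAM to stdout; samtools view -b converts it to BAM.
  # The subshell enables pipefail so a bwa failure is not masked by samtools.
  (
    set -o pipefail
    bwa mem "$ref" "$fq1" "$fq2" | samtools view -b -o "$out_bam" -
  )
}
```

A call might look like `align_to_bam kiv2_6.fasta <file1.fastq> <file2.fastq> sample.bam`.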
All sequence data have been uploaded to Dataverse and can be accessed here.
Please check out this repository if you want to have a look at the actual source code.
- Low-level Variant Detection
- Type-B Base Annotation
- Region Annotation
- Overall Statistics
1) Install Cloudgene with the following commands
mkdir cloudgene
cd cloudgene
curl -s install.cloudgene.io | bash
2) Install the LPA workflow
./cloudgene gh seppinho/lpa-workflow
3a) Start the local web service and run the workflow in the browser
./cloudgene server
Open your web browser and go to http://localhost:8082. Log in with the username admin and the password admin1978.
3b) Run on the command line
./cloudgene run seppinho-lpa-workflow --input <bam-folder> --archive <fasta file> --annotateBase <annotation file> --annotateRegion <region file>
- Download BAM file
- Download Reference
- Download Base Annotation
- Download Region Annotation
./cloudgene run seppinho-lpa-workflow --input <bam-folder-withAK-Sample> --archive kiv2_6.fasta --annotateBase typeb_annotation.csv --annotateRegion maplocus_v3.txt
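The command-line run above can be wrapped in a small pre-flight check so that a missing input is reported before the workflow starts. This is only a sketch under stated assumptions: `run_lpa_workflow` and the `CLOUDGENE` environment variable are hypothetical names introduced here, not features of Cloudgene or the LPA workflow. The flags are the ones shown in the command above.

```shell
# Hypothetical wrapper (not part of Cloudgene or the LPA workflow).
# CLOUDGENE is an assumed override for the launcher path; default: ./cloudgene
run_lpa_workflow() {
  local bam_dir="$1" reference="$2" base_annotation="$3" region_annotation="$4"
  local launcher="${CLOUDGENE:-./cloudgene}"
  local f

  if [ ! -d "$bam_dir" ]; then
    echo "not a directory: $bam_dir" >&2
    return 1
  fi

  # Require at least one BAM file in the input folder.
  if ! ls "$bam_dir"/*.bam >/dev/null 2>&1; then
    echo "no BAM files in: $bam_dir" >&2
    return 1
  fi

  # The reference and both annotation files must be readable.
  for f in "$reference" "$base_annotation" "$region_annotation"; do
    if [ ! -r "$f" ]; then
      echo "missing input: $f" >&2
      return 1
    fi
  done

  "$launcher" run seppinho-lpa-workflow \
    --input "$bam_dir" \
    --archive "$reference" \
    --annotateBase "$base_annotation" \
    --annotateRegion "$region_annotation"
}
```

With the example files downloaded into the current directory, a call might look like `run_lpa_workflow <bam-folder> kiv2_6.fasta typeb_annotation.csv maplocus_v3.txt`.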
The script to download data from 1000 Genomes can be found here.
Please contact Stefan Coassin and Sebastian Schoenherr in case of problems.