
talkowski-lab/count_variant_occurrence


Given a set of single-sample VCF files, this workflow counts how many times each variant is called across the set. It reports the counts in the following format:

chrom  start  stop  alts  count_with_other_filters    count_with_filter_pass
chr1  11021  11022  ('A',)  1  0
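The report rows above can be read back into typed records for downstream analysis. The following is a minimal sketch, not part of this repository: the `VariantCount` type and `parse_row` function are hypothetical names, and the sketch assumes that columns are separated by tabs or by runs of two or more spaces, and that the `alts` column is a Python tuple literal as in the example row.

```python
import ast
import re
from typing import NamedTuple, Tuple


class VariantCount(NamedTuple):
    """One row of the report (hypothetical helper type)."""
    chrom: str
    start: int
    stop: int
    alts: Tuple[str, ...]
    count_with_other_filters: int
    count_with_filter_pass: int


def parse_row(line: str) -> VariantCount:
    """Parse a single data row of the report.

    Assumption: fields are separated by a tab or by two or more spaces,
    so a single space inside the alts tuple does not split the field.
    """
    fields = re.split(r"\t|\s{2,}", line.strip())
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    chrom, start, stop, alts, other, passed = fields
    # The alts column looks like a Python tuple literal, e.g. ('A',).
    return VariantCount(
        chrom=chrom,
        start=int(start),
        stop=int(stop),
        alts=tuple(ast.literal_eval(alts)),
        count_with_other_filters=int(other),
        count_with_filter_pass=int(passed),
    )


row = parse_row("chr1  11021  11022  ('A',)  1  0")
print(row.chrom, row.alts, row.count_with_filter_pass)
```

`ast.literal_eval` is used instead of `eval` so that only literal values are accepted from the file.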

Build Docker

Build the Docker image using the following command after replacing [TAG] with your preferred tag.

docker build --platform linux/amd64 -t [TAG] .

Inputs

The following is an example input to the workflow.

{
    "VariantOccurrenceFrequency.sample_ids": [
        "SP0001643",
        "SP0001677",
        "SP0001710",
        "SP0001952",
        "SP0002342"
    ],
    "VariantOccurrenceFrequency.vcf_files": [
        "SP0001643.vcf.gz",
        "SP0001677.vcf.gz",
        "SP0001710.vcf.gz",
        "SP0001952.vcf.gz",
        "SP0002342.vcf.gz"
    ],
    "VariantOccurrenceFrequency.runtime_override_encode": {"docker": "pzm:latest"},
    "VariantOccurrenceFrequency.runtime_override_merge": {"docker": "pzm:latest"},
    "VariantOccurrenceFrequency.runtime_override_decode": {"docker": "pzm:latest"},
    "VariantOccurrenceFrequency.max_batch_size": 100
}
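For larger cohorts, an inputs file like the one above can be generated rather than written by hand. The sketch below is an illustration only: `build_inputs` is a hypothetical helper, and it assumes each sample ID is the VCF file name without the `.vcf.gz` suffix, as in the example. The input keys and the `pzm:latest` image name are copied from the example; replace the image with the tag you built.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_inputs(
    vcf_paths: List[str],
    docker_image: str = "pzm:latest",
    max_batch_size: int = 100,
) -> Dict[str, object]:
    """Build the workflow inputs dictionary from a list of VCF paths."""
    suffix = ".vcf.gz"
    sample_ids = []
    for path in vcf_paths:
        name = Path(path).name
        if not name.endswith(suffix):
            raise ValueError(f"expected a {suffix} file, got {path!r}")
        # Assumption: the sample ID is the file name without the suffix.
        sample_ids.append(name[: -len(suffix)])

    prefix = "VariantOccurrenceFrequency"
    runtime = {"docker": docker_image}
    return {
        f"{prefix}.sample_ids": sample_ids,
        f"{prefix}.vcf_files": list(vcf_paths),
        f"{prefix}.runtime_override_encode": dict(runtime),
        f"{prefix}.runtime_override_merge": dict(runtime),
        f"{prefix}.runtime_override_decode": dict(runtime),
        f"{prefix}.max_batch_size": max_batch_size,
    }


inputs = build_inputs(["SP0001643.vcf.gz", "SP0001677.vcf.gz"])
print(json.dumps(inputs, indent=4))
```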

The workflow splits the input VCF files into batches of at most max_batch_size files. Batching applies only to the first step of the workflow, which reads the VCF files to extract information; the rest of the pipeline is not batched. Reading each VCF takes about 30 seconds, so a batch of 100 may take about 3000 seconds.
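To help choose a value for max_batch_size, the batching and timing arithmetic described above can be sketched as follows. This is not the workflow's own batching code: `make_batches` and `estimated_batch_seconds` are hypothetical helpers, and the 30 seconds per VCF is the rough figure quoted above, so actual run times will vary.

```python
from typing import List, Sequence


def make_batches(items: Sequence[str], max_batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(items[i : i + max_batch_size])
        for i in range(0, len(items), max_batch_size)
    ]


def estimated_batch_seconds(batch_size: int, seconds_per_vcf: float = 30.0) -> float:
    """Rough read time for one batch, assuming a fixed cost per VCF."""
    return batch_size * seconds_per_vcf


vcfs = [f"sample_{i}.vcf.gz" for i in range(250)]
batches = make_batches(vcfs, max_batch_size=100)
for index, batch in enumerate(batches):
    print(index, len(batch), estimated_batch_seconds(len(batch)))
```

With 250 files and a batch size of 100, this yields two full batches and one batch of 50. A smaller max_batch_size shortens each batch but produces more batches.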
