
This little helper uses jinja2 and igvf_utils to generate seqspec files for MPRA count and association data.


kircherlab/IGVF_MPRA_seqspec_generator


IGVF MPRA seqspec generator

This little helper uses jinja2 and igvf_utils to generate seqspec files for MPRA count and association data. It is flexible because multiple templates can be defined. It loads all necessary information, such as file size, md5sum, and sequencing platform, from the IGVF data portal.

Installation

You need jinja2 and click. I install them via mamba:

mamba install jinja2 click

Please install the current development version of seqspec:

pip install git+https://github.com/pachterlab/seqspec@devel

Then install igvf_utils (see its install documentation):

pip install https://github.com/IGVF-DACC/igvf_utils/archive/master.zip

Set the IGVF_API_KEY environment variable to your API key and the IGVF_SECRET_KEY environment variable to your secret key. See the configuration documentation.
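Before running the generator, it can help to confirm that both variables are visible to Python. The following is a minimal sketch using only the standard library; the helper name `missing_igvf_credentials` is hypothetical and not part of igvf_utils.

```python
import os

# The two variable names come from the paragraph above.
REQUIRED_VARS = ("IGVF_API_KEY", "IGVF_SECRET_KEY")


def missing_igvf_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_igvf_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("IGVF credentials found.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try out without touching the real environment.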

Quickstart

python generate_seqspec.py --help

Shendure grant

Assignment data

python generate_seqspec.py --template templates/igvf_mpra_lenti_assignment.v0.3.0.yml \
--name mpra_shendure_proximal_promoter --modality dna \
--r1-id IGVFFI7003XVUG --r1-id IGVFFI5231ATBS --r1-id IGVFFI0921LXBF \
--r2-id IGVFFI8640LLIG --r2-id IGVFFI0576IRDC --r2-id IGVFFI6354NJGB \
--r3-id IGVFFI4403ENTR --r3-id IGVFFI2142OYFW --r3-id IGVFFI6807PKQA \
--r1-primer GGCCCGCTCTAGACCTGCAGGAGGACCGGATCAACT --r2-primer GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG --r3-primer CATTGCGTGAACCGACACTAGAGGGTATATAATG \
--onlist-id IGVFFI2041KXFD \
--bc-length 15 --oligo-length 200 \
--output IGVF_shendure_proximalPromoter_assignment.yaml
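Commands with many repeated `--r1-id`, `--r2-id`, and `--r3-id` options are easy to mistype. As a sketch, the argument list can be assembled in Python from lists of accessions. `build_command` is a hypothetical helper, not part of this repository, and it only uses options that appear in the command above.

```python
import shlex


def build_command(template, name, modality, read_ids, primers, output, extra=None):
    """Assemble a generate_seqspec.py argument list.

    read_ids maps "r1"/"r2"/"r3" to lists of file accessions;
    primers maps the same keys to primer sequences;
    extra maps further option names (without leading dashes) to values.
    """
    cmd = ["python", "generate_seqspec.py",
           "--template", template, "--name", name, "--modality", modality]
    for read in sorted(read_ids):
        for accession in read_ids[read]:
            cmd += [f"--{read}-id", accession]  # one option per file
    for read in sorted(primers):
        cmd += [f"--{read}-primer", primers[read]]
    for option, value in (extra or {}).items():
        cmd += [f"--{option}", str(value)]
    cmd += ["--output", output]
    return cmd


cmd = build_command(
    template="templates/igvf_mpra_lenti_assignment.v0.3.0.yml",
    name="mpra_shendure_proximal_promoter",
    modality="dna",
    read_ids={
        "r1": ["IGVFFI7003XVUG", "IGVFFI5231ATBS", "IGVFFI0921LXBF"],
        "r2": ["IGVFFI8640LLIG", "IGVFFI0576IRDC", "IGVFFI6354NJGB"],
        "r3": ["IGVFFI4403ENTR", "IGVFFI2142OYFW", "IGVFFI6807PKQA"],
    },
    primers={
        "r1": "GGCCCGCTCTAGACCTGCAGGAGGACCGGATCAACT",
        "r2": "GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG",
        "r3": "CATTGCGTGAACCGACACTAGAGGGTATATAATG",
    },
    extra={"onlist-id": "IGVFFI2041KXFD", "bc-length": 15, "oligo-length": 200},
    output="IGVF_shendure_proximalPromoter_assignment.yaml",
)
print(shlex.join(cmd))  # shell-quoted command line for inspection
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. The options come out in a different order than in the hand-written command, which should not matter for named options.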

DNA assignment seqspec of lenti virus MPRA from Shendure (UW) grant:

python generate_seqspec.py --template templates/igvf_mpra_lenti_assignment.v0.3.0.yml \
--name mpra_shendure_80K --modality dna \
--r1-id IGVFFI9931MZQI --r2-id IGVFFI9154RAYY --r3-id IGVFFI7509PYSL \
--r1-primer GGCCCGCTCTAGACCTGCAGGAGGACCGGATCAACT --r2-primer GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG --r3-primer CATTGCGTGAACCGACACTAGAGGGTATATAATG \
--bc-length 15 --oligo-length 270 \
--output test.yaml
seqspec print -f seqspec-ascii test.yaml

returns:

dna
---
                                                                                             |------------------------------------------------------------------------------------------------------------------------------------------------->(1) Oligo fwd
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |-------------->(3) BC
AATGATACGGCGACCACCGAGATCTACACXXXXXXXXXXCAGCCTGCATTTCTGCCAGGGCCCGCTCTAGACCTGCAGGAGGACCGGATCAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCAAAGTGAACACATCGCTAAGCGAAAGCTAAGGAAGCTCGACTTCCAGCTTGGCAATCCGGTACTGTCATTGCGTGAACCGACACTAGAGGGTATATAATGXXXXXXXXXXXXXXXACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGCATCTCGTATGCCGTCTTCTGCTTG
TTACTATGCCGCTGGTGGCTCTAGATGTGXXXXXXXXXXGTCGGACGTAAAGACGGTCCCGGGCGAGATCTGGACGTCCTCCTGGCCTAGTTGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGTTTCACTTGTGTAGCGATTCGCTTTCGATTCCTTCGAGCTGAAGGTCGAACCGTTAGGCCATGACAGTAACGCACTTGGCTGTGATCTCCCATATATTACXXXXXXXXXXXXXXXTGGCCAGCGGTGGTACCACTCGTTCCCGCTCCTCGTAGAGCATACGGCAGAAGACGAAC
                                                                                                                                                                                                                          <-------------------------------------------------------------------------------------------------------------------------------------------------|(2) Oligo rev
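In the diagram, the lower sequence line is the base-by-base complement of the upper line, written in the same left-to-right order, with the placeholder characters `N` and `X` carried over unchanged. A small sketch of that relationship follows; the function name `complement` is only illustrative and is not seqspec API.

```python
# Translation table for the four bases; any other character
# (such as the N and X placeholders) is left as it is.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Return the complement of seq without reversing it."""
    return seq.translate(_COMPLEMENT)


# First 29 bases of the upper strand in the diagram above.
top = "AATGATACGGCGACCACCGAGATCTACAC"
print(complement(top))
print(complement("ACGTNX"))
```

The first printed line can be compared with the start of the lower strand in the diagram.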

Count data

RNA count seqspec of lenti virus MPRA from Shendure (UW) grant:

python generate_seqspec.py --template templates/igvf_mpra_lenti_counts.v0.3.0.yml \
--name mpra_shendure_80K --modality rna \
--r1-id IGVFFI8223UESF --r1-id IGVFFI9990NOMV --r1-id IGVFFI3050NXPU \
--r2-id IGVFFI9560VIAN --r2-id IGVFFI5074MDCR --r2-id IGVFFI4713QQLG \
--r3-id IGVFFI1814DAMK --r3-id IGVFFI0172AZKE --r3-id IGVFFI2509VPWV \
--r1-primer GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG --r2-primer ACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGC \
--bc-length 15 \
--output test.yaml
seqspec print -f seqspec-ascii test.yaml

returns:

rna
---
                                                                       |-------------->(1) RNA BC count fwd
                                                                                                                         |--------------->(3) RNA BC count id
AATGATACGGCGACCACCGAGATCTACACXXXXXXXXXXGCAAAGTGAACACATCGCTAAGCGAAAGCTAAGNNNNNNNNNNNNNNNACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGCXXXXXXXXXXXXXXXXATCTCGTATGCCGTCTTCTGCTTG
TTACTATGCCGCTGGTGGCTCTAGATGTGXXXXXXXXXXCGTTTCACTTGTGTAGCGATTCGCTTTCGATTCNNNNNNNNNNNNNNNTGGCCAGCGGTGGTACCACTCGTTCCCGCTCCTCGXXXXXXXXXXXXXXXXTAGAGCATACGGCAGAAGACGAAC
                                                                        <--------------|(2) RNA BC count rev
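In the diagram, the forward read appears to start directly after the `--r1-primer` sequence and to span `--bc-length` bases. The sketch below rebuilds a simplified version of that molecule and locates the barcode window. The molecule layout is an assumption made for illustration (a 10-character `X` placeholder between adapter and primer, and `N` for the barcode), and `read_window` is a hypothetical helper rather than anything provided by seqspec.

```python
R1_PRIMER = "GCAAAGTGAACACATCGCTAAGCGAAAGCTAAG"  # --r1-primer from the command above
BC_LENGTH = 15                                   # --bc-length from the command above


def read_window(molecule, primer, length):
    """Return (start, end) of a read that begins right after primer."""
    hit = molecule.find(primer)
    if hit < 0:
        raise ValueError("primer not found in molecule")
    start = hit + len(primer)
    return start, start + length


# Simplified molecule: adapter, index placeholder, primer, barcode, downstream primer.
molecule = (
    "AATGATACGGCGACCACCGAGATCTACAC"
    + "X" * 10
    + R1_PRIMER
    + "N" * BC_LENGTH
    + "ACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGC"
)

start, end = read_window(molecule, R1_PRIMER, BC_LENGTH)
print(start, end, molecule[start:end])
```

The slice `molecule[start:end]` covers exactly the barcode placeholder, which is the region the forward read is meant to capture.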

Mohlke grant

RNA count seqspec of plasmid MPRA from Mohlke (UNC) grant:

python generate_seqspec.py --template templates/igvf_mpra_unc_counts.v0.3.0.yml \
--name mpra_unc_hepg2 --modality rna \
--r1-id IGVFFI1586GLDT --r1-id IGVFFI1618FFIN \
--r1-primer CCAAGAAGGGCGGCAAGATCGCCGTGTAATAATTCTAGA --bc-length 20 --onlist-id IGVFFI9520JZQK \
--output test.yaml
seqspec print -f seqspec-ascii test.yaml

returns:

rna
---
                                                                                                                                                                              |-------------------------------------------------->(1) RNA BC count fw
AATGATACGGCGACCACCGAGATCTACACTACAACCGCCAAGAAGCTGCGCGGTGGTGTTGTGTTCGTGGACGAGGTGCCTAAAGGACTGACCGGCAAGTTGGACGCCCGCAAGATCCGCGAGATTCTCATTAAGGCCAAGAAGGGCGGCAAGATCGCCGTGTAATAATTCTAGANNNNNNNNNNNNNNNNNNNNACTAGTACACTCCCCGTCGGCAGTTGGGAAGAGCATAGTCGTAGAGCACGCGGACTCCTATCTCGTATGCCGTCTTCTGGTTG
TTACTATGCCGCTGGTGGCTCTAGATGTGATGTTGGCGGTTCTTCGACGCGCCACCACAACACAAGCACCTGCTCCACGGATTTCCTGACTGGCCGTTCAACCTGCGGGCGTTCTAGGCGCTCTAAGAGTAATTCCGGTTCTTCCCGCCGTTCTAGCGGCACATTATTAAGATCTNNNNNNNNNNNNNNNNNNNNTGATCATGTGAGGGGCAGCCGTCAACCCTTCTCGTATCAGCATCTCGTGCGCCTGAGGATAGAGCATACGGCAGAAGACCAAC
