Pipeline Ab germline

1. merge reads with pandaseq, trim random nucleotides/primers at beginning and end of read, collapse unique reads, keep only if more than 10 members

/data/AbX/germline/GermAb/1_merge_trim_collapse.sh /data/MiSeq/MiSeqOutput/XXX/Data/Intensities/BaseCalls/

Input: _R1.fastq _R2.fastq

Script: 1_merge_trim_collapse.sh, contains primer_trim.py

Output: _panda.fasta _trimmed.fasta _uniq.fasta
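The collapse step can be pictured with a small Python sketch. It is not the code in 1_merge_trim_collapse.sh; the function names and the `min_count` parameter are made up for illustration, and it only assumes that identical sequences are counted and that a sequence is kept when it has more than 10 members.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_unique(records, min_count=10):
    """Count identical sequences and keep those with more than min_count members.

    Returns (sequence, member_count) pairs, most abundant first.
    """
    counts = Counter(seq.upper() for _, seq in records)
    return [(seq, n) for seq, n in counts.most_common() if n > min_count]
```

With the default threshold, a sequence seen 11 times is kept and one seen 10 times is dropped, matching "more than 10 members".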

2. align reads to IMGT reference

Input: _uniq.fasta Script: 2_align.sh Output: _aligned.sam _aligned.txt

The IMGT reference was modified as follows:

• IGHV3-23D was deleted (identical to 3-23) -> analysis of 3-23 is that of 3-23 AND 3-23D
• IGHV1-69D was deleted (identical to 1-69) -> analysis of 1-69 is that of 1-69 AND 1-69D
• `IGHV2-70D*04` was deleted (identical to `2-70*04`) and `IGHV2-70D*14` was renamed to `IGHV2-70*14` -> analysis of 2-70 is that of 2-70 AND 2-70D
• `IGHV2-70D*04v` was renamed to `IGHV2-70*04v`

3. filter functional Ab seqs, combine identical seqs with 0, 1 and 2 mutations from reference using sam cigar

The following deletes reads with a mutation at position 229 (or 226, depending on the primers) in IGHV4 alleles (wt: CCAAGAACCAGTT, mut: CCAAGACCCAGTT). Note that the filter itself tests positions 230 and 227:

    filter(position != 230 | !grepl("IGHV4", allele) | !grepl("A", nt)) %>%
    filter(position != 227 | !grepl("IGHV4", allele) | !grepl("A", nt))
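The chained `filter()` calls are easier to read once the OR conditions are inverted: a row is dropped only when all three conditions hold at once. The Python sketch below restates that logic. The column names `position`, `allele` and `nt` come from the R code; the function names are made up for illustration, and missing values (NA in R) are not handled.

```python
def is_artifact(row, positions=(230, 227)):
    """True for rows the R filter removes.

    A row is removed when the position is 230 or 227, the allele name
    contains "IGHV4", and the nt field contains "A".
    """
    return (
        row["position"] in positions
        and "IGHV4" in row["allele"]
        and "A" in row["nt"]
    )


def drop_artifacts(rows):
    """Keep every row that is not flagged by is_artifact."""
    return [row for row in rows if not is_artifact(row)]
```

Rows at other positions, rows from other gene families, and rows whose `nt` does not contain "A" all pass through unchanged.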

-> run the R script on the server (it takes too long otherwise)

-> for this, delete „Volumes“ from the file paths

Input: _aligned.txt

Script: 3_functional_combine_identical.R

Output: _alleles_comb.txt
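As an illustration of how a SAM CIGAR string can be used to sort reads into 0, 1 and 2 mutations from the reference, here is a minimal Python sketch. It is not the logic of 3_functional_combine_identical.R. It assumes the aligner writes extended CIGAR strings (`=` for match, `X` for mismatch); a plain `M` operation does not say whether a base matches, so the sketch rejects it. All function names are hypothetical, and "mutations" here means mismatched, inserted or deleted bases.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def count_differences(cigar):
    """Count mismatched (X), inserted (I) and deleted (D) bases."""
    total = 0
    for length, op in parse_cigar(cigar):
        if op == "M":
            raise ValueError("M does not distinguish matches from mismatches")
        if op in ("X", "I", "D"):
            total += length
    return total


def mutation_class(cigar, max_mutations=2):
    """Return the number of differences, or None if above max_mutations."""
    n = count_differences(cigar)
    return n if n <= max_mutations else None
```

For example, `mutation_class("100=1X50=")` gives 1, while a read with three mismatches gives `None` and would fall outside the 0/1/2 groups.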

4. determine alleles

Input: _alleles_comb.txt

Script: 4_determine_alleles.sh, contains freq_drop.py

Output:

• _alleles_final.txt (list of readcount and assigned alleles)
• all_results.txt (list for all patients: number of alleles per gene and patient)
• _final_results.txt (list of alleles and number of mutations to allele)

Analysis

Exclude non-neutralizing patients:

• 16198 (=SB126, score=2) -> not sequenced
• 17420 (score=2)
• 18826 (score=9) -> not sequenced
• 18928 (AK170, score=11), labelled Ak170 in first run
• 26500 (score=0)
• 26586 (score=0)
• 31822 (score=10)
• 31933 (score=11) -> not sequenced
• 34545 (score=12)
• 41895 (ART)
• 42080 (score=10) -> not sequenced
• 42335 (score=10)
• 42335 (score=12)

Exclude patients with <10000 reads -> repeat in next run:

• 17811
• 18322
• 15504
• 18357
• 15224
• 18669
• 18311
• 18418
• 19138
• 13853
• 25478
• 17241
• 31396 (run 3)

Exclude controls: recombination controls in 3rd run

Exclude read 46179_S1 (patient was sequenced twice)

Exclusions are applied with the following filters (those for the first and second run are done in combine_n_alleles_pat_characteristics.R):

• first run samples: filter(!grepl("46179_S1_|Hy|HD|AK170|41895|17420|26500|26586|34545|42335", patient_ID))
• second run samples: filter(!grepl("17811|18322|15504|18357|15224|18669|18311|18418|19138|13853|25478|17241|41895|31822", patient_ID))
• third run samples: filter(!grepl("4-59-|4-28-|mix|31396", patient_ID))
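The `grepl` patterns match anywhere inside `patient_ID`, so every alternative acts as a substring test. The Python sketch below shows the same behaviour for the first-run pattern. The function name and the sample IDs in the usage note are made up for illustration; only the pattern itself is taken from the text above.

```python
import re

# Pattern copied from the first-run filter.
FIRST_RUN_EXCLUDE = re.compile(
    r"46179_S1_|Hy|HD|AK170|41895|17420|26500|26586|34545|42335"
)


def keep_samples(patient_ids, pattern=FIRST_RUN_EXCLUDE):
    """Keep IDs in which the pattern matches nowhere (like !grepl in R)."""
    return [pid for pid in patient_ids if not pattern.search(pid)]
```

With hypothetical IDs such as `46179_S1_L001` and `46179_S2_L001`, only the first is dropped, because the pattern includes the trailing underscore after `S1`. Short alternatives such as `Hy` or `HD` drop any ID that contains those letters, so they should be checked against the real sample names.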

R scripts

Run-related parameters (read numbers etc.):

• reads_per_patient.R analyzes reads per patient for all patients

• reads_per_family_gene.R analyzes reads per gene and family, also contains same analysis only with samples >10000 reads

• missing_genes_vs_total_reads.R plots total reads per sample vs number of missing genes

Reformat data, write output tables with patient characteristics and germline information:

• combine_n_alleles_pat_characteristics.R combines “all_results.txt” (number of alleles per gene and patient) with patient ethnicity and neut status, removes samples with <10000 reads, and removes wrongly included samples (ART, didn't make it into top 105, etc.)

-> writes table “patients_n_alleles_ethn_neut_subtype.txt” (contains patient, gene, n_alleles, run, ethnicity, subtype, bnAb activity)

-> from this table, check all samples with more than 4 alleles using the “alleles_final” files and correct if necessary (file to view alleles with read counts: multiple_alleles_raw.txt; corrections are recorded in multiple_alleles_corr.txt), then save as “patients_n_alleles_corr.txt”

For the analysis, exclude the following genes for now:

• 4-28
• 4-30-2
• 4-30-4
• 4-38-2
• 4-39
• 4-4
• 4-61
• 2-70

medvir/GermAb
