SV calling was initiated for the GOBACK project but has since become a general pipeline on the HGSC cluster.
Calls are further filtered by excluding RLCR regions. ITX and CTX calls are in pgl format; all other calls are in bed format.
- Calls: DEL INS INV ITX CTX
- Genotyped with SVtyper (which does not genotype ITX, CTX, or INS)
- Simple filter: for DEL and INV, keep calls that SVtyper genotypes as 0/1 or 1/1 with QUAL > 1; for INS, ITX, and CTX, require DP > 10
- breakdancer_call.sub
- breakdancer_svtyper.sub
- breakdancer_categorize.sub
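
The simple filter above can be sketched as a single predicate. This is a minimal illustration, not the code in `breakdancer_categorize.sub`: the function name `passes_simple_filter` and its arguments are hypothetical, and it assumes the SV type, SVtyper genotype, QUAL, and DP have already been parsed from each record.

```python
# Hypothetical sketch of the BreakDancer simple filter described above.
GENOTYPED_TYPES = {"DEL", "INV"}           # genotyped by SVtyper
DEPTH_ONLY_TYPES = {"INS", "ITX", "CTX"}   # not genotyped by SVtyper


def passes_simple_filter(sv_type, genotype=None, qual=None, depth=None):
    """Return True if a call survives the simple filter.

    DEL/INV: SVtyper genotype must be 0/1 or 1/1 and QUAL > 1.
    INS/ITX/CTX: supporting depth (DP) must be > 10.
    Any other SV type is rejected.
    """
    if sv_type in GENOTYPED_TYPES:
        return genotype in {"0/1", "1/1"} and qual is not None and qual > 1
    if sv_type in DEPTH_ONLY_TYPES:
        return depth is not None and depth > 10
    return False
```

Both thresholds are strict, so a DEL with QUAL equal to 1 or an INS with DP equal to 10 is dropped.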
- Calls: DEL DUP (also CNV based on read-depth)
- Use its own genotyper
- No simple filter applied
- cnvnator_call.sub
- cnvnator_categorize.sub
- Calls: DEL DUP (tandem only, also CNV based on read-depth) INV CTX (as BND)
- Use its own genotyper
- Simple filter: based on genotype 0/1 or 1/1, and PASS for the site and for the sample
- delly_call.sub
- delly_categorize.sub
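
As an illustration of the Delly filter above, the following sketch checks one line of a single-sample VCF. It assumes the site-level filter is in the `FILTER` column and the per-sample filter is in the `FT` FORMAT field; the function name `keep_delly_record` is hypothetical and this is not the code in `delly_categorize.sub`.

```python
# Hypothetical sketch: keep a Delly record only if the site and the sample
# both PASS and the genotype is 0/1 or 1/1. Assumes a single-sample VCF.
def keep_delly_record(vcf_line):
    fields = vcf_line.rstrip("\n").split("\t")
    site_filter = fields[6]                      # FILTER column
    format_keys = fields[8].split(":")           # e.g. GT:GQ:FT
    sample = dict(zip(format_keys, fields[9].split(":")))
    return (
        site_filter == "PASS"
        and sample.get("FT") == "PASS"
        and sample.get("GT") in {"0/1", "1/1"}
    )
```

Header lines (starting with `#`) are expected to be skipped by the caller before this function is applied.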
- Calls: DEL DUP (tandem only) INV ITX (as BND) CTX (as BND)
- SVtyper genotyped
- Filter: based on SVtyper, genotype 0/1 or 1/1, and QUAL>1. Also only keep one record for ITX and CTX
- lumpy_preprocess.sub
- lumpy_call.sub
- lumpy_svtyper.sub
- lumpy_categorize.sub
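
Because ITX and CTX events are reported as BND records, each event appears once per breakend. One possible way to keep a single record per event is to treat the two breakends as an unordered pair, as sketched below. The tuple layout `(chrom1, pos1, chrom2, pos2)` and the function name `dedupe_breakends` are assumptions for illustration; the pipeline may instead rely on the mate IDs in the VCF.

```python
# Hypothetical sketch: keep the first record seen for each breakend pair.
def dedupe_breakends(records):
    seen = set()
    kept = []
    for chrom1, pos1, chrom2, pos2 in records:
        # Sort the two breakends so A->B and B->A map to the same key.
        key = tuple(sorted([(chrom1, pos1), (chrom2, pos2)]))
        if key not in seen:
            seen.add(key)
            kept.append((chrom1, pos1, chrom2, pos2))
    return kept
```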
- Calls: DEL DUP (tandem only) INS INV CTX (as BND)
- Use its own genotyper
- Filter: PASS for the site and for the sample. Also only keep one record for CTX
- manta_call.sub
- manta_categorize.sub
- Calls: DEL DUP (both dispersed and tandem, also CNV based on read-depth) INV ITX (as BND) CTX (as BND)
- Use its own genotyper
- Filter: PASS for the site. Combine dispersed and tandem duplications.
- tiddit_call.sub
- tiddit_categorize.sub
Calls are further filtered to sizes within (100bp, 1Mb), and calls overlapping regions in RLCRs_no_Repeat_Masker.txt are removed.
- breakdancer
- cnvnator
- delly
- lumpy
- manta
- tiddit
- cnvnator
- delly
- lumpy
- manta
- tiddit
- breakdancer
- delly
- lumpy
- manta
- tiddit
- breakdancer
- manta
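
The size and RLCR filter applied to these bed-format call sets can be sketched as follows. This is a simplified illustration under stated assumptions: intervals are `(chrom, start, end)` tuples in BED coordinates, the size bounds are treated as exclusive, and any overlap of at least one base with an RLCR interval removes the call. The pipeline itself may use a different overlap criterion or an external tool; `filter_calls` is a hypothetical name.

```python
# Hypothetical sketch of the size and RLCR filter for bed-format calls.
MIN_SIZE = 100          # bp, exclusive (assumption)
MAX_SIZE = 1_000_000    # bp, exclusive (assumption)


def overlaps(start, end, other_start, other_end):
    """True if two half-open intervals share at least one base."""
    return start < other_end and other_start < end


def filter_calls(calls, rlcr_regions):
    # Group RLCR intervals by chromosome for lookup.
    rlcr_by_chrom = {}
    for chrom, start, end in rlcr_regions:
        rlcr_by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in calls:
        size = end - start
        if not (MIN_SIZE < size < MAX_SIZE):
            continue  # outside the allowed size range
        if any(overlaps(start, end, s, e) for s, e in rlcr_by_chrom.get(chrom, [])):
            continue  # overlaps an RLCR region
        kept.append((chrom, start, end))
    return kept
```

The linear scan over RLCR intervals is adequate for a sketch; for genome-scale inputs an interval tree or a sorted sweep would be the usual choice.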
Remove calls overlapping with RLCRs_no_Repeat_Masker.txt
Output is in pgl format
- breakdancer
- delly
- lumpy
- manta
- tiddit
- breakdancer
- delly
- lumpy
- manta
- tiddit
- cnvnator
- delly
- tiddit
- The SV does not exist in any other samples
- TODO: confirm using genotyping
- The SV overlaps with a call in at least one other family member, but not in any sample outside the family
- TODO: confirm using genotyping
- The SV does not exist in any healthy controls. (Only run for probands; unlike Private, the SV may also be present in other probands.)
- TODO: confirm using genotyping
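
The three categories above reduce to set comparisons once the carriers of each SV are known. The sketch below assumes that overlap-based matching has already produced, for each SV, the set of sample IDs carrying it; the function name `classify_sv`, the label strings, and the argument layout are hypothetical.

```python
# Hypothetical sketch: label one SV found in `sample`, given the set of
# all samples carrying an overlapping call.
def classify_sv(carriers, sample, family, controls, is_proband=False):
    others = carriers - {sample}
    labels = set()

    # Private: no other sample carries the SV.
    if not others:
        labels.add("private")

    # Family-specific: seen in at least one other family member and
    # in no sample outside the family.
    in_family = others & family
    outside_family = others - family
    if in_family and not outside_family:
        labels.add("family_specific")

    # Absent in controls: only evaluated for probands.
    if is_proband and not (carriers & controls):
        labels.add("absent_in_controls")

    return labels
```

An SV shared only between two probands from different families gets the `absent_in_controls` label but neither of the other two.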
Starting from family-specific SVs, use genotyping to select SVs that are 1/1 in the proband, 0/1 in the parents, and not 1/1 in the sibling
- TODO: extend the genotyping candidates to family-specific SVs found in all family members
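
The genotype pattern described above can be expressed as a small predicate. This sketch assumes genotypes are available as a mapping from sample ID to a genotype string such as `0/1`; the function name `is_candidate` and the argument names are hypothetical, and a missing genotype for the proband or a parent is treated as a failure.

```python
# Hypothetical sketch: 1/1 in the proband, 0/1 in every parent,
# and not 1/1 in any sibling.
def is_candidate(genotypes, proband, parents, siblings=()):
    if genotypes.get(proband) != "1/1":
        return False
    if any(genotypes.get(parent) != "0/1" for parent in parents):
        return False
    return all(genotypes.get(sibling) != "1/1" for sibling in siblings)
```

For example, `is_candidate({"P": "1/1", "M": "0/1", "F": "0/1", "S": "0/1"}, "P", ["M", "F"], ["S"])` evaluates to `True`, while changing the sibling genotype to `1/1` makes it `False`.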
TODO later