forked from walaj/svaba
-
Notifications
You must be signed in to change notification settings - Fork 0
/
change.log
292 lines (259 loc) · 11.4 KB
/
change.log
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
#snowman 67
-- loosened the filters a bit, to be more sensitive to low AF
# snowman 68
-- fixed bug with complex events, and with not doing discordant lookup for
inter-chr events
# snowman 69
-- slighlty more permissive on complex breaks
# snowman 70
-- no dedup calc for non rule pass (improved memory and speed performance)
# snowman 71
-- changed to a global one-read-name-per-split-coverage rule. Previously
only applied to SVs > 1000
# snowman 72
-- fixed missing printed contig name in AlignedContig
-- added local check for assembly-only. non-locals are ditched
-- added dup-read check, which checks r2c alignments to get span of split
read covs. Ditches the variant as DUPREADS if split span is same < read
length
-- moved fail-safe warning (10,000 read max per region) to log file
# snowman 73
-- fixed bug that make the dupreads fix from snowman72 not work
# snowman 74
##-- more strict in dup reads. Now, required that split coverage spans
##read_len + 13 bases
-- exit_code for "help" switched to 0 (from 1)
-- bug that let PCR duplicate reads though (bad minirules
expression). Fixed
-- added check that both the first and second mate of a pair cannot
contribtue to same break. If so, reject all reads with that qname
-- fixed bug that was causing reads to not be trimmed correctly in
MiniReads
-- added refiltering of complex events
## snowman 75
-- greatly reduced the cutoff for calling reads somatic (DBCUTOFF 15 -> 7,
NODBCUTOFF 8 -> 2.5
## snowman 76
-- fixed bug that allowed contigs with a high number (3+) of
secondary alignmetns to make it to the filtered VCFs
-- require 50 bases match min per alignmetn for contisg with < 0.2
AF
-- prevents indels from being called if part of something like
50M3D8M9I
-- fixed bug that only counted repeats in one direction for indels
-- added ability to pass single chr string to -k (e.g. snowman run
-k 1)
-- added contig-wide repeat filter (for super long repeats) in BreakPoint::repeatFilter
## snowman 77
-- fixed bug that didn't filter out 50 bases match min from (76)
-- added filter for inter-chr assembly-only that need to have all
pieces 60+ for non-complex
-- added -- else if (num_align == 2 && std::min(b1.mapq, b2.mapq) <
50 && b1.gr.chr != b2.gr.chr) // interchr need good mapq for
assembly only
-- added blacklist check for assembly-only
-- set default NM to - for reduced breakend
-- added discorand read counts to ALT for DSCR
-- fixed bug that was throwing out indels at DBSNP sites (due to
bad AF frac calculation)
-- added blat option
-- added refilter option
-- fixed bug that forced all DSCRD to be intra-chromosomal
-- post-pend contig names with C to make them unique when grepping
-- fixed bug that under-estimated LOD for indels
-- added cname to discordants.txt.gz output
-- added info header tags to indel
-- allow for germline/somatic classification to be skipped for
single BAM files
-- turned off error printing for MT chr in DBSNP filter
-- fixed bug in discordants with extra "x" in read names and blank
for contigs (should be 'x')
-- fix error where ALT counts for DSCRD clusters not being
reported in bps
-- got rid of dup read names for read tracking in DSCRD / ASDIS case
-- added filter to get rid of germline events with low AF
## snowman 78
-- added BOOST to LD_PATH for FH task
-- allow local parts of complex breaks, and will check them
against discordant reads
-- fixed NaN bug on log-likelihood calc
-- removed bug where somatic SVs were being later set to germline
because they were at site of low-confidence germline indel. Made
it now for indels only, and requires PASS germline indel
-- got rid of bug where normal reads aligning fulling to one
contig alignment were counting as split when they aligned in a
homology region
-- more persmissive for ASDIS with secondary alignments. if strong
discordant cluster (with high MAPQ and 10+ reads) then keep
-- fixed toFileString bug in discordant cluster
## snowman 79
-- slightly stricter on assmebly only for short (4+ now)
-- slogihtly stricter on cliped reads for insertions (16bp or more
now required to be supported by only one side spanned by clip)
## snowman 80
-- fixed issue where high quality gemrline indels were being
rejected because of a sign error on the LR caclulation
-- fixed issue where germline indels were being called somatic if
there were multiple alleles at one location. This was from r2c
mappings being thrown out too quickly if they had any indel vs the
contig. Now they are parsed and only removed if indel exactly
mirrors the variant (e.g. 40I on a contig with a 40D is not valid)
-- removed 5+ split read filtered required for inter-chr ASDIS
-- fixed bug where small non-FR pairs not being picked up (caused
false negative short DUPS)
-- relaxed other read rules (10->5 clip, no mapq filter)
-- fixed bug where discordant clusters being reported incorrectly
in file
## snowman 81
-- fixed error in how some indels are parsed
## snowman 82
-- turned off over-aggressive DB snp filter for events iwth AF <
0.2 nearby a DBsnp site
## snowman 84
-- added multi-round jumping for complex events
-- reinstated bootstrap assembly
-- added gap-open penalty option
## snowman 85
-- decreased strictness of the contig length cutoff (115% of read length)
## snomwan 86
-- turned off debug printing
-- added gap extension penalty option
## snowman 87
-- fixed issue where discordant clusters were not being reported
if they weren't being looked up. E.g. normal clusters
that are informative were not being recorded
## snowman 88
-- fixed assert statments releative to v87 that were causing failure
## snowman 89
-- moved to JSON rules
-- forgot a "use Samtools" statement in snow.sh
## snowman 90
-- more strict about do too many jumps to mate regions
## snowman 91
-- removed hard limit on read lookups in mate window that was
giving 0 coverage issues
## snowman 92
-- loosened filters on DSCRD, strengthed on ASDIS
-- fixed bug with over-counting split and disc from same qnameg
## snowman 93
-- added NM filter for assembly only
-- fixed bug with somatic svs being removed by random weird
germline events nearby
-- fixed issue where we weren't checking orientation of BPs and
DCs when looking for overlap
-- allow ASDISC with lots of disc reads but few split reads
-- synced ASDISC filter for MAPQ so that something that passes
DSCRD would also pass ASDISC. Previously, would fail something
with an assembly mapq of < 10, even if disc cluster was
high-confidence
-- fixed bug where germline indels that fell at same point as a
PASS somatic SV were being used to mark it as germline
(inappropriate cross talk between INDEL and SV)
-- switched to MEDIAN score in DiscordantCluster, instead of MEAN
-- don't count RF reads if insert is near size of read length
-- filter out discordant pairs in clusters that deviate
substantially from the other pairs. Fixes issue where small insert
RF pairs in normal were being group with larger insert RF pairs in
tumor, causing false negative somatic calls.
-- fixed issue with some wonky discordant read clusters where L
and R clusters overlapped (set them to invalid)
-- removed one round of assembly, second round is exact
-- removed SOMATIC info field from VCF
-- dont print READNAMES if equal to 'x'
-- minor bug where local breaks between two non-secondary mappings
were being called as secondary because of a non-reset bp.secondary
flag from a previous local breakpoint
-- added a << method for break-end for easier debugging
-- added 'reuse' method for faster debugging
-- dont count secondary reads when getting coverage
-- retrieve small discordant FR pairs where insert size is >
1.96sd + mean(isize)
-- orientation filtering uses position now, rather than
first/second flag.
-- add a pad of 10 when deduplicating rearrangements in vcf.cpp
-- added splitcounter module
-- added NM and MAPQ filters to assembly and assembly_dscrd
-- read-group specific filtering rules and parameter learning, and discordant size cutoff
-- filtering out read-pairs that entirely overlap each other
-- InsertSize calculations now use "FullInsertSize" for less
ambiguity
-- fixed bug that was tossing TD type discordant read pairs
-- fixed bug that was tossing some inter-chromosomal read pairs
## snowman 95
-- added repeat masker simple-sequence filter
-- added discordant read realignment to reference
## snowman 97
-- added genotyping
-- added read-to-reference mapping to compete with read-to-contig
-- turned on kmer filtering
-- added all reads indexing for kmer correciton (learn from all,
correct only weird)
-- loosened strictness on split-read boundaries on contig, because
of the new read-to-reference decoy
-- some bug fixes to deal with contigs with 'x' in the name
(changed SmartTag delimiter)
-- region BED file uses header from reference, not header from BAM
## snowman 98
-- fixed issue where split&discordant reads were getting removed
from the discordant realignment. Instead, we only want to remove
discordant reads that realign to mate region WITHOUT clips
-- added hard limit on all_reads in SnowmanBamWalker, and only
keep non-hardclipped reads and qcpass reads
## snowman 99-101
[firehose issues]
## snowman 102
-- don't take reads that intersect 30+ bases with simple repeats
## snowman 103
-- hold normal reads for kmer correction as char* to save memory
-- subsample normal reads for kmer correction for memory / time
-- made robust to cases where there is discrepancy between aligned
genome from reads and reference (fixed issue with seg faults in
old bams)
-- fixed bug that was tossing FR interchromosomal reads during
discordant clustering step
## snowman 104
-- fixed bug that was parsing read-groups incorrectly. Now prefers
parsing from RG tag, unless not present OR contains "tumor" (a
hack to deal with simulated data)
-- don't remove discordant clusters that fall below FR min isize,
just keep them
## snowman 105
-- added one trim round with trim length of 100. Insensitive
parameter
-- added back better kmer sampling
## snowman 106
-- changed BWA to have -x intractg values
-- -I flag now also limits on large intra-chr lookup
-- added Fermi-kit option
-- label pieces of complex breaks as such
-- fixed bug where BWAWrapper was not recieving correct options
-- added aligned coverage filter, so that contigs with large
sections (10+ bp) of unaligned bases are rejected (they are
unreliable)
-- Switched LOD cutoff from using SLO to using regular LO
(i.e. turned of scaled LOD)
-- set PAD on discordant read overlap with breaks to 50 (from
400). 400 was way too spurious for short data
-- added local alginment check. Do initiaul BWA on just the local
region where the reads were assembled. If aligns without a clip,
then inelgible for SV
-- Added limit on homology to be < 25% of readlength. Anything
more an start to see mis-assemblies from areas where homology is
too great
## snowman 110
-- improved re-alignment filtering of discordant reads.
-- if discordant read has a better alignment else where with
BWA-MEM,
-- accept that alignment
-- fixed bug in this implementation from 109 that was causing
assert fail
## snowman 115
-- fixed bug introdued in 114 that didn't correctly parse contigs
-- added BX tag tracking
## snowman 116
-- turned off unfiltered VCF generation (memory nightmare for BX
tags)
-- added highly-parallel option
-- moved to light-weight seqlib (no functional changes)
## snowman 117
-- actually fixed the unfiltered memory issue....