Error rate calculation bug #100

Open
benbrulotte opened this issue Jun 23, 2017 · 6 comments

Comments

@benbrulotte

When calculating phiX error rates, we consistently see what appear to be artifacts at cycles 20, 52, 65, and 136 whenever we use NextSeq data. The spikes are generally more pronounced in read 2, but are often noticeable in read 1 as well (for example, at cycle 20 below, the read 2 error rate jumps to about 14.5%, against roughly 0.8% in the neighboring cycles). This doesn't seem to be a problem with HiSeq data. I've included example output from BMF tools and USeq for comparison; both were run on the same BAM file.
Example (BMF tools output):
#Cycle Read 1 Error Rate Read 2 Error Rate Read 1 Error Count Read 1 Obs Count Read 2 Error Count Read 2 Obs Count
1 0.010257257754 0.006137456503 931 90765 545 88799
2 0.003623427829 0.005892606129 329 90798 524 88925
3 0.002961348349 0.005289341584 269 90837 471 89047
4 0.002860600726 0.005526534090 260 90890 493 89206
5 0.002638754508 0.005000895015 240 90952 447 89384
6 0.003232474272 0.006084966226 294 90952 545 89565
7 0.002987960278 0.005968051403 272 91032 535 89644
8 0.003162784568 0.006263652980 288 91059 562 89724
9 0.003151145172 0.005967689858 287 91078 536 89817
10 0.003369849180 0.006118385191 307 91102 550 89893
11 0.003643267089 0.006545971838 332 91127 589 89979
12 0.003039280228 0.006942515968 277 91140 625 90025
13 0.003192364654 0.006550097141 291 91155 590 90075
14 0.003027477650 0.006877884274 276 91165 620 90144
15 0.003224462041 0.007327915923 294 91178 661 90203
16 0.003147240407 0.007144835226 287 91191 645 90275
17 0.003552787403 0.006631754924 324 91196 599 90323
18 0.003431415541 0.007337317397 313 91216 663 90360
19 0.003409153193 0.007698178319 311 91225 696 90411
20 0.005973060947 0.145218352681 545 91243 13135 90450
21 0.003769986958 0.007556925525 344 91247 684 90513
22 0.003802531368 0.007731816425 347 91255 700 90535
23 0.003638116946 0.007938873983 332 91256 719 90567
24 0.003408443295 0.008355869529 311 91244 757 90595
25 0.003804574260 0.008541632180 347 91206 774 90615
26 0.005882159279 0.009058710596 536 91123 821 90631
27 0.005826160560 0.009112371202 530 90969 826 90646
28 0.006049693124 0.008924336190 551 91079 809 90651
29 0.005505152823 0.008570672197 500 90824 777 90658
30 0.008405857695 0.008966878798 764 90889 813 90667
31 0.006187493131 0.008733913389 563 90990 792 90681
32 0.006174538010 0.009384338851 561 90857 851 90683
33 0.006389741221 0.008909471827 581 90927 808 90690
34 0.006172907308 0.009394433908 562 91043 852 90692
35 0.006430550059 0.009261403103 585 90972 840 90699
36 0.006870664676 0.011192713158 626 91112 1015 90684
37 0.007102787323 0.009407431099 647 91091 853 90673
38 0.007247572228 0.009572016189 659 90927 868 90681
39 0.007201770711 0.010444353763 654 90811 947 90671
40 0.007105871984 0.010112594978 645 90770 917 90679
41 0.007213894732 0.010023597909 655 90797 909 90686
42 0.007272607246 0.010125856230 661 90889 918 90659
43 0.007241830929 0.010545125637 658 90861 956 90658
44 0.007678220771 0.010213535692 696 90646 926 90664
45 0.007400789712 0.010192038297 671 90666 924 90659
46 0.007415417890 0.010885749264 672 90622 987 90669
47 0.007657931034 0.010093544544 694 90625 915 90652
48 0.007263922518 0.010568002559 660 90860 958 90651
49 0.007496945077 0.010709991948 681 90837 971 90663
50 0.010012969598 0.010435741864 911 90982 946 90650
51 0.008955387482 0.010139347066 817 91230 919 90637
52 0.010418264280 0.147721129340 951 91282 13389 90637
53 0.008390015225 0.011100077237 766 91299 1006 90630
54 0.008788078019 0.011300432595 802 91260 1024 90616
55 0.008714477241 0.010614821027 793 90998 962 90628
56 0.008493478810 0.011102037213 773 91011 1006 90614
57 0.008839827601 0.010984401047 804 90952 995 90583
58 0.008350822446 0.011889516912 760 91009 1077 90584
59 0.010258604610 0.011006723264 935 91143 997 90581
60 0.009520040455 0.011306921072 866 90966 1024 90564
61 0.008628923361 0.011860719374 786 91089 1074 90551
62 0.009570689353 0.011422385224 873 91216 1034 90524
63 0.008889619643 0.012031155057 811 91230 1089 90515
64 0.009263130102 0.011593466104 842 90898 1049 90482
65 0.012086185490 0.149828672488 1100 91013 13555 90470
66 0.009521605448 0.011772508705 864 90741 1065 90465
67 0.009673119085 0.012461024745 878 90767 1127 90442
68 0.010371517028 0.013237636026 938 90440 1197 90424
69 0.008924528093 0.012322054708 808 90537 1114 90407
70 0.011817800631 0.012558227022 1071 90626 1135 90379
71 0.009564938866 0.012219012518 866 90539 1104 90351
72 0.009926872043 0.012962573475 900 90663 1171 90337
73 0.010742004217 0.012620811053 973 90579 1140 90327
74 0.010231252831 0.012235632820 926 90507 1105 90310
75 0.010319757806 0.011951176288 934 90506 1079 90284
76 0.010361206230 0.012531022868 938 90530 1131 90256
77 0.009745317938 0.012267287234 882 90505 1107 90240
78 0.010045197865 0.012059811344 909 90491 1088 90217
79 0.009869911690 0.012508455405 893 90477 1128 90179
80 0.010781372799 0.012743021283 974 90341 1149 90167
81 0.011006150170 0.014853517033 995 90404 1339 90147
82 0.011340023874 0.014726118608 1026 90476 1327 90112
83 0.010597973234 0.013135687320 959 90489 1183 90060
84 0.011071211535 0.013563803197 1002 90505 1221 90019
85 0.010472013874 0.013702435989 948 90527 1233 89984
86 0.010530154030 0.013932284428 953 90502 1253 89935
87 0.010867041799 0.012638090046 983 90457 1136 89887
88 0.010635592752 0.013276650678 962 90451 1193 89857
89 0.011537227747 0.013573703316 1043 90403 1219 89806
90 0.010371583853 0.014124986076 937 90343 1268 89770
91 0.011972532949 0.014007443892 1081 90290 1257 89738
92 0.011490050082 0.013959971010 1037 90252 1252 89685
93 0.011315276177 0.014668644796 1021 90232 1315 89647
94 0.011720352609 0.014561806780 1057 90185 1305 89618
95 0.011609211971 0.014323832490 1047 90187 1283 89571
96 0.011902516944 0.014321782068 1073 90149 1282 89514
97 0.011473972724 0.014192957164 1034 90117 1270 89481
98 0.011812509714 0.014850267260 1064 90074 1328 89426
99 0.011855423829 0.014771048744 1067 90001 1320 89364
100 0.015396424559 0.015642495633 1384 89891 1397 89308
101 0.012741549257 0.015398063475 1144 89785 1374 89232
102 0.012466825754 0.015262809658 1118 89678 1361 89171
103 0.011242603550 0.015128898666 1007 89570 1348 89101
104 0.011684855533 0.014814565222 1045 89432 1319 89034
105 0.012264235473 0.015647657910 1095 89284 1392 88959
106 0.012765145992 0.015246132208 1138 89149 1355 88875
107 0.012538339681 0.015443363071 1116 89007 1371 88776
108 0.012894804946 0.016052613767 1146 88873 1423 88646
109 0.012590313236 0.015771160016 1117 88719 1396 88516
110 0.012712964740 0.014941241673 1126 88571 1321 88413
111 0.013595131993 0.016170309138 1202 88414 1428 88310
112 0.013396803808 0.016147138532 1182 88230 1424 88189
113 0.013398184029 0.016075161320 1179 87997 1415 88024
114 0.012763292725 0.015605436292 1121 87830 1371 87854
115 0.013684090390 0.015976007481 1199 87620 1401 87694
116 0.013297020186 0.015427866155 1162 87388 1350 87504
117 0.012878640052 0.015966097812 1122 87121 1394 87310
118 0.013704309372 0.016355435169 1190 86834 1425 87127
119 0.013963157591 0.017120371862 1209 86585 1488 86914
120 0.014360162131 0.016598994140 1240 86350 1439 86692
121 0.014187940831 0.016353626943 1221 86059 1414 86464
122 0.014806869455 0.016883116883 1270 85771 1456 86240
123 0.014263315548 0.015399646413 1219 85464 1324 85976
124 0.013284314877 0.016115104556 1131 85138 1381 85696
125 0.015095407694 0.016358696925 1280 84794 1397 85398
126 0.014723650233 0.016136235427 1243 84422 1373 85088
127 0.014539137884 0.016837758112 1222 84049 1427 84750
128 0.014783211435 0.017077821334 1237 83676 1442 84437
129 0.015368544911 0.017245890721 1280 83287 1450 84078
130 0.015140182406 0.016865399076 1255 82892 1413 83781
131 0.015161798570 0.016859789343 1251 82510 1407 83453
132 0.015439169429 0.016674492241 1267 82064 1385 83061
133 0.015246219085 0.015965939329 1244 81594 1320 82676
134 0.014337138490 0.016697047029 1163 81118 1374 82290
135 0.014681265267 0.016650175297 1184 80647 1363 81861
136 0.017511037941 0.169633212216 1404 80178 13819 81464
137 0.014675789342 0.016990381172 1169 79655 1376 80987
138 0.015425041373 0.016408918701 1221 79157 1321 80505
139 0.014164738194 0.016365589786 1114 78646 1310 80046
140 0.019243079917 0.017072741445 1503 78106 1358 79542
141 0.015764575465 0.017400541238 1223 77579 1376 79078
142 0.014646635303 0.016187417151 1127 76946 1270 78456
143 0.014582760112 0.015797240937 1113 76323 1231 77925
144 0.014575027064 0.017045307694 1104 75746 1319 77382
145 0.014016516616 0.016150232298 1054 75197 1241 76841
146 0.015958088245 0.017192614901 1191 74633 1313 76370
147 0.012588832487 0.013000581918 930 73875 983 75612
148 0.013825514358 0.013982841666 1012 73198 1048 74949
149 0.015234369619 0.014474622328 1106 72599 1076 74337
150 0.017427351375 0.016374142387 1257 72128 1210 73897
151 0.064963878142 0.062301963439 4685 72117 4601 73850

USeq output:
Cycle# phixbam_1 phixbam_2
1 0.018018018 0.007077983
2 0.003370511 0.004633688
3 0.002022059 0.004507889
4 0.002144345 0.004503659
5 0.001531487 0.004499438
6 0.003308013 0.005181671
7 0.002756508 0.004742294
8 0.00208244 0.004365723
9 0.002388389 0.003988782
10 0.003000796 0.00404934
11 0.002939015 0.004606287
12 0.002510409 0.00466534
13 0.002020573 0.004664759
14 0.00244903 0.00397812
15 0.002816384 0.005841775
16 0.002693768 0.004907747
17 0.002081293 0.005215771
18 0.002509487 0.00478053
19 0.002815005 0.005523148
20 0.003181401 0.004776082
21 0.002936139 0.004836609
22 0.00256865 0.004772826
23 0.002690638 0.005081175
24 0.002446034 0.006256582
25 0.002262305 0.00613079
26 0.003668379 0.007367509
27 0.00372929 0.007241443
28 0.00372929 0.00625
29 0.003240005 0.004764262
30 0.006418485 0.006557783
31 0.003178484 0.004885288
32 0.00385062 0.006612694
33 0.003545016 0.006178179
34 0.003667033 0.008090415
35 0.003177707 0.005556242
36 0.003910785 0.006295908
37 0.004093603 0.006356063
38 0.004642923 0.005429753
39 0.00507056 0.008080933
40 0.004764232 0.007338431
41 0.003724736 0.006227648
42 0.003846858 0.007706535
43 0.004213483 0.007828874
44 0.004945659 0.007210205
45 0.005189889 0.006838344
46 0.004762777 0.007698941
47 0.004151658 0.006467508
48 0.005129771 0.006898251
49 0.003786491 0.006650246
50 0.00641378 0.008432326
51 0.005436775 0.00683161
52 0.004154194 0.007322626
53 0.006168692 0.006705629
54 0.006290845 0.008304115
55 0.005373718 0.007627953
56 0.005129145 0.008182098
57 0.00531168 0.009165846
58 0.005493499 0.009410752
59 0.005065918 0.008425066
60 0.005432792 0.009040034
61 0.005005799 0.007133633
62 0.004517153 0.009346944
63 0.005616263 0.008856089
64 0.004944149 0.008486563
65 0.00561592 0.009531423
66 0.00604248 0.009101531
67 0.00598108 0.009285451
68 0.006470121 0.008795129
69 0.006714277 0.00867052
70 0.005371749 0.009717695
71 0.005799048 0.010392965
72 0.00555386 0.011127505
73 0.006591797 0.010204709
74 0.00665405 0.011252536
75 0.006592602 0.009038367
76 0.005738355 0.008855544
77 0.005678349 0.010884939
78 0.006167562 0.01020973
79 0.006900342 0.007997539
80 0.006839695 0.01058136
81 0.007450837 0.009782207
82 0.008183706 0.010154471
83 0.006230149 0.010894319
84 0.006294305 0.011636498
85 0.006233957 0.011761084
86 0.00702934 0.010839441
87 0.007703595 0.010593742
88 0.006298538 0.009920513
89 0.008013213 0.01177486
90 0.007219775 0.011346118
91 0.007713026 0.011408485
92 0.008572128 0.01289248
93 0.007286755 0.011784304
94 0.008882083 0.012096525
95 0.00710915 0.012347966
96 0.007539537 0.01185844
97 0.008338954 0.011984186
98 0.007788544 0.012663702
99 0.0084647 0.012917182
100 0.008650837 0.012800693
101 0.008836524 0.014844137
102 0.008654554 0.014416533
103 0.007674833 0.01460848
104 0.007738132 0.013126935
105 0.008968059 0.014123769
106 0.009338904 0.015494267
107 0.01026429 0.015376031
108 0.009283167 0.015633724
109 0.008609027 0.014959652
110 0.009163592 0.014902204
111 0.010150098 0.014595367
112 0.009968004 0.014975455
113 0.009909522 0.015540499
114 0.009664512 0.015982587
115 0.010407686 0.017109438
116 0.009916846 0.016183244
117 0.009921735 0.016753861
118 0.010177018 0.017011466
119 0.011599926 0.019456223
120 0.010863527 0.016414929
121 0.011485026 0.016796753
122 0.01131026 0.017004251
123 0.011128972 0.016448809
124 0.011504206 0.016209788
125 0.012810199 0.017284569
126 0.010465692 0.015922768
127 0.011031918 0.016437669
128 0.01246357 0.018452269
129 0.012347211 0.018085908
130 0.012537239 0.01815897
131 0.012670807 0.019559748
132 0.013179162 0.019078202
133 0.012753515 0.018590875
134 0.012949819 0.01948912
135 0.012960309 0.018939394
136 0.014212692 0.019020537
137 0.015097636 0.019867131
138 0.013858543 0.019631436
139 0.013614789 0.019212479
140 0.01488058 0.019874278
141 0.013395931 0.018882319
142 0.013171925 0.019315357
143 0.013062045 0.018137693
144 0.012322394 0.018822023
145 0.013153754 0.019947406
146 0.013237519 0.019980726
147 0.012450231 0.015824829
148 0.013177067 0.018151173
149 0.01206196 0.015821343
150 0.013987792 0.017161501
151 0.04317965 0.058938712
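The spikes reported above can be picked out mechanically. The following is a minimal sketch, not part of bmftools or USeq: it recomputes each rate as error count divided by observation count from a few rows copied out of the first table, then flags cycles whose rate exceeds a multiple of the median. The function names and the 5x threshold are assumptions chosen for illustration.

```python
from statistics import median

# Rows copied from the first table above (cycles 18-22 and 50-54).
# Columns: cycle, R1 rate, R2 rate, R1 errors, R1 obs, R2 errors, R2 obs.
EXCERPT = """\
18 0.003431415541 0.007337317397 313 91216 663 90360
19 0.003409153193 0.007698178319 311 91225 696 90411
20 0.005973060947 0.145218352681 545 91243 13135 90450
21 0.003769986958 0.007556925525 344 91247 684 90513
22 0.003802531368 0.007731816425 347 91255 700 90535
50 0.010012969598 0.010435741864 911 90982 946 90650
51 0.008955387482 0.010139347066 817 91230 919 90637
52 0.010418264280 0.147721129340 951 91282 13389 90637
53 0.008390015225 0.011100077237 766 91299 1006 90630
54 0.008788078019 0.011300432595 802 91260 1024 90616
"""


def parse_table(text):
    """Return {cycle: (r1_rate, r2_rate)}, recomputed from the count columns."""
    rates = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        cycle = int(fields[0])
        r1_err, r1_obs, r2_err, r2_obs = map(int, fields[3:7])
        rates[cycle] = (r1_err / r1_obs, r2_err / r2_obs)
    return rates


def flag_spikes(rates, read_index, factor=5.0):
    """Return cycles whose rate exceeds `factor` times the median rate.

    read_index is 0 for read 1 and 1 for read 2. The threshold is an
    arbitrary illustrative choice, not a value used by any tool.
    """
    baseline = median(r[read_index] for r in rates.values())
    return sorted(c for c, r in rates.items() if r[read_index] > factor * baseline)


rates = parse_table(EXCERPT)
print(flag_spikes(rates, read_index=1))  # read 2 -> [20, 52]
```

On this excerpt the recomputed rates agree with the rate columns in the table, so the spikes come from the error counts themselves (13135 and 13389 errors at cycles 20 and 52 of read 2) rather than from the division. The read 1 bumps at the same cycles are too small to cross a 5x threshold and would need a lower factor or a local baseline.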

@dnbaker
Contributor

dnbaker commented Jun 28, 2017

If it's machine-specific, I'm not certain it's a bug... but could I look at such a bam to investigate?

@benbrulotte
Author

What's the best way to send you a BAM file? It'll be about 25 MB.

@dnbaker
Contributor

dnbaker commented Jul 1, 2017

Is that too big to attach here? If not, you could put it somewhere on CHPC for me to get.

@dnbaker
Contributor

dnbaker commented Jul 7, 2017

Any update on this?

@benbrulotte
Author

I've sent a download link to your email address listed here.

@dnbaker
Contributor

dnbaker commented Jul 28, 2017

Hi Ben,

I've glanced over the data and tried experimenting with it, but I don't have a copy of your genome. If you'd like, I can give you a patched version that writes reads with errors to a separate file, annotated with the number of errors it found in each read. You could then inspect the errors visually and judge whether it's working correctly.

Might you be willing to try something of that sort?

More simply, running the code on a dataset under valgrind (valgrind bmftools err <...>) would show whether there are any memory problems and, if so, what they are.
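The idea of separating out reads with errors and annotating each with its error count can be sketched in a few lines. This is not the proposed bmftools patch, only a toy illustration under strong assumptions: reads are ungapped, positions are 0-based, and the reference is a plain string. All names here are hypothetical.

```python
def count_mismatches(read_seq, ref_seq):
    """Count positions where an ungapped read differs from the reference."""
    return sum(1 for base, ref in zip(read_seq, ref_seq) if base != ref)


def split_reads(reads, reference):
    """Partition reads into (clean, flagged).

    `reads` is an iterable of (name, start, sequence) tuples with a 0-based
    start on `reference`. Flagged entries carry the mismatch count as a
    fourth field so they can be inspected by eye.
    """
    clean, flagged = [], []
    for name, start, seq in reads:
        ref_slice = reference[start:start + len(seq)]
        n_err = count_mismatches(seq, ref_slice)
        if n_err:
            flagged.append((name, start, seq, n_err))
        else:
            clean.append((name, start, seq))
    return clean, flagged


# Toy reference and reads, invented for this example.
reference = "ACGTACGTAC"
reads = [("r1", 0, "ACGT"), ("r2", 2, "GTTC"), ("r3", 6, "GTAC")]
clean, flagged = split_reads(reads, reference)
for name, start, seq, n_err in flagged:
    print(name, start, seq, n_err, sep="\t")
```

If the flagged reads cluster at the spiking cycles with the same substitution each time, that would point to something systematic (in the data or in the counting) rather than random sequencing error.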
