You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm having some with the Fermi assembler. I've tried various versions of the attached code; none manage to produce any contigs. Further, calling GetSequences() after calling PerformAssembly produces mangled strings (e.g., ascii value of 4 for an 'A'), which is probably a bad sign!
UnalignedSequenceVector seqs = f.GetSequences();
cout << seqs.size() << " also is the size!\n";
for (unsigned i = 0; i < seqs.size(); ++i)
cout << seqs[i].Seq << endl;
Thank you for the detailed report. I was able to recreate the issue, and see two things to address.
The first is easy enough. fermi-lite actually deallocates the input sequences after assembly. This is why calling GetSequences afterward is giving garbled answers. I need to 1) make this clear in the documentation and 2) have GetSequences return a zero-length vector if the sequences have been deallocated from the FermiAssembler object.
The second, the failure to assemble contigs in this example, is more baffling to me. I think this is more in the scope of figuring out the right parameter settings in fermi-lite for this assembly. I'll look into this more, but this will likely take longer to address.
I'm having some with the Fermi assembler. I've tried various versions of the attached code; none manage to produce any contigs. Further, calling GetSequences() after calling PerformAssembly produces mangled strings (e.g., ascii value of 4 for an 'A'), which is probably a bad sign!
Here's the (very rough) source:
#include
#include
#include "SeqLib/BamReader.h"
#include "SeqLib/FastqReader.h"
#include "SeqLib/BWAWrapper.h"
#include "SeqLib/FermiAssembler.h"
#include "SeqLib/UnalignedSequence.h"
using namespace SeqLib;
using namespace std;
const string REFGENOME = "chrAll.standardChroms.fa";
int
main(int argc, char **argv) {
/* Mapping w/ BWA
BWAWrapper bwa;
bwa.LoadIndex(REFGENOME);
bwa.SetGapOpen(4);
bwa.SetGapExtension(1);
bwa.Set3primeClippingPenalty(1000);
bwa.Set5primeClippingPenalty(1000);
BamRecordVector results;
for (string line; getline(cin, line) ; ) {
bwa.AlignSequence(line, "m", results, false, 0.9, 0);
}
*/
if (argc < 2) {
cerr << "I need a fastq file to work!\n" << endl;
return 1;
}
FermiAssembler f;
FastqReader q(argv[1]);
UnalignedSequence s;
while (q.GetNextSequence(s) )
f.AddRead(s);
UnalignedSequenceVector seqs = f.GetSequences();
cout << seqs.size() << " also is the size!\n";
for (unsigned i = 0; i < seqs.size(); ++i)
cout << seqs[i].Seq << endl;
f.SetMinOverlap(10);
f.SetKmerMinThreshold(1);
f.SetKmerMaxThreshold(1000);
f.PerformAssembly();
// retrieve the contigs
std::vectorstd::string contigs = f.GetContigs();
cerr << contigs.size() << " contigs made! " << endl;
seqs = f.GetSequences();
const char *seq = seqs[0].Seq.c_str();
unsigned len = seqs[0].Seq.length();
cout << len << " is the length...? " << ((int)seq[0]) << " is the first char" << endl;
return 0;
}
Here's the fastq file (please forgive the copy pasting!):
@M01451:39:000000000-AGNHG:1:1101:18315:2050 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFDFFGGGGEEFGFGGGGGGGGGGGGGGGGGGGGGFF
@M01451:39:000000000-AGNHG:1:1101:15489:2067 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
CEGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M01451:39:000000000-AGNHG:1:1101:11753:2476 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
GGGGFFGGGGFGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGFGGGGGGGGGFECFGGGFGGFFGGGGGGGGGFGGGGGGGGGGGG
@M01451:39:000000000-AGNHG:1:1101:8601:2804 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
GCGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGFGGGFFGFGGGGGGGGGGGGGGFGGGGGGFG
@M01451:39:000000000-AGNHG:1:1101:19616:3182 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
GGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGFGGGGGGGGGGGEFFGGGGEGG
@M01451:39:000000000-AGNHG:1:1101:12281:3408 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFGGGGGGGGG
@M01451:39:000000000-AGNHG:1:1101:9468:3637 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
FFFGFGGGGGGEFE<EGGGGGGFEGGGGGGGFGGGFFGGDFCFFAFGGGGDEEFGGFFGGGGGGFFFCFFCFDFGCG@EFFDFGGFGGGGGGGGGG
@M01451:39:000000000-AGNHG:1:1101:15793:4032 1:N:0:25
ATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATT
C6C@,CFFEFGGFFEFEGGGCFEF<EGGCGFFGGDFFFFGFGGGGGGGGFFGFGFGCFGGGGGGBFFFGFGGGGFFGGGGGGGGGGGGFGGGGGGGGGFCFFFG
The text was updated successfully, but these errors were encountered: