Skip to content

Commit

Permalink
use .fai index to determine total query length for logging/progress
Browse files Browse the repository at this point in the history
  • Loading branch information
ekg committed Nov 3, 2020
1 parent 7b5ebf9 commit 5c6ae5d
Show file tree
Hide file tree
Showing 3 changed files with 45 additions and 6 deletions.
7 changes: 7 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,13 @@ Self-mapping of sequences:
wfmash -X query.fa query.fa >aln.paf
```

## sequence indexing

`wfmash` provides a progress log that estimates time to completion.
This depends on determining the total query sequence length.
To prevent lags when starting a mapping process, users should apply `samtools index` to index query and target FASTA sequences.
The `.fai` indexes are then used to quickly compute the sum of query lengths.

## installation

Follow [`INSTALL.txt`](INSTALL.txt) to compile and install wfmash.
Expand Down
15 changes: 15 additions & 0 deletions src/common/filesystem.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
#pragma once

#include <sys/stat.h>

namespace fs {
/**
* Check if a file exists
* @return true if and only if the file exists, false else
*/
bool file_exists(const std::string& file) {
struct stat buf;
return (stat(file.c_str(), &buf) == 0);
}

}
29 changes: 23 additions & 6 deletions src/map/include/computeMap.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
//External includes
#include "common/seqiter.hpp"
#include "common/progress.hpp"
#include "common/filesystem.hpp"

namespace skch
{
Expand Down Expand Up @@ -124,13 +125,29 @@ namespace skch
uint64_t total_seqs = 0;
uint64_t total_seq_length = 0;
for(const auto &fileName : param.querySequences) {
seqiter::for_each_seq_in_file(
fileName,
[&](const std::string& seq_name,
const std::string& seq) {
// check if there is a .fai
std::string fai_name = fileName + ".fai";
if (fs::file_exists(fai_name)) {
// if so, process the .fai to determine our sequence length
std::string line;
std::ifstream in(fai_name.c_str());
while (std::getline(in, line)) {
++total_seqs;
total_seq_length += seq.size();
});
auto p1 = line.find('\t');
auto p2 = line.find('\t', p1);
total_seq_length += std::stoi(line.substr(p1, p2));
}
} else {
// if not, warn that this is expensive
std::cerr << "[wfmash::skch::Map::mapQuery] WARNING, no .fai index found for " << fileName << ", reading file to sum sequence length (slow)" << std::endl;
seqiter::for_each_seq_in_file(
fileName,
[&](const std::string& seq_name,
const std::string& seq) {
++total_seqs;
total_seq_length += seq.size();
});
}
}

progress_meter::ProgressMeter progress(total_seq_length, "[wfmash::skch::Map::mapQuery] mapped");
Expand Down

0 comments on commit 5c6ae5d

Please sign in to comment.