Dev #22

Merged
merged 103 commits into from
Sep 25, 2024
d1e7eb3
Need rust >= v1.77.0 to build sbwt
tmaklin Sep 21, 2024
a92fd0e
Add integration test for sablast::map.
tmaklin Sep 21, 2024
240776c
Remove overlapping cases from ms_to_run, write documentation.
tmaklin Sep 22, 2024
972f1d8
Add docs to random_match_threshold
tmaklin Sep 22, 2024
4895729
Add bounds checking
tmaklin Sep 22, 2024
09f1fc0
Characters -> bases in documentation.
tmaklin Sep 22, 2024
ff2847b
Add k-bounded.
tmaklin Sep 22, 2024
24efa30
Clarify documentation
tmaklin Sep 22, 2024
529f9c2
Merge run_to_aln and translate_runs.
tmaklin Sep 22, 2024
280e4a3
Update test values.
tmaklin Sep 22, 2024
9b66150
Rewrite translate_runs and re-enable its test.
tmaklin Sep 22, 2024
9ddf0d6
Use slice arguments in translate_runs
tmaklin Sep 22, 2024
eb45d4f
More documentation.
tmaklin Sep 22, 2024
542b723
Create new module format for formatting results and move run_lengths.
tmaklin Sep 22, 2024
cee4b2a
Update integration test map_clbs.rs
tmaklin Sep 22, 2024
2c4b016
Remove use of TranslateParams, replace with access to k, threshold.
tmaklin Sep 22, 2024
797f04c
Rename map.rs -> translate.rs
tmaklin Sep 22, 2024
1264c75
Split translate.rs into translate.rs and derandomize.rs
tmaklin Sep 22, 2024
de6ce43
Docs formatting.
tmaklin Sep 23, 2024
397e739
Better variable names.
tmaklin Sep 23, 2024
0249eab
Remove passing the noisy MS vec to translate_ms_vec (not needed).
tmaklin Sep 23, 2024
dd6c462
Add asserts to translate_ms_val
tmaklin Sep 23, 2024
0f841f0
Update parameter names in documentation.
tmaklin Sep 23, 2024
e2504c6
Fix compiler warnings.
tmaklin Sep 23, 2024
81e6aee
Fix initialising derandomized ms vec when last MS is noise.
tmaklin Sep 23, 2024
1842d0a
Update test
tmaklin Sep 23, 2024
ffc15e0
Add tests for translate_ms_val.
tmaklin Sep 23, 2024
3ba1264
Add a more comprehensive test for translate_ms_vec that fails.
tmaklin Sep 23, 2024
13c4b9b
Take &[u8] in query_sbwt instead of the file name.
tmaklin Sep 23, 2024
dcb3816
Rewrite and document load_sbwt
tmaklin Sep 23, 2024
d21abfa
Document query_sbwt
tmaklin Sep 23, 2024
744ecdc
Rewrote serialize_sbwt
tmaklin Sep 23, 2024
bd6f65e
Rewrite build_index
tmaklin Sep 23, 2024
9e9343a
Rename build_sbwt to build_sbwt_from_file
tmaklin Sep 23, 2024
aa49d34
Add build_sbwt_from_vecs to build from sequences in memory.
tmaklin Sep 23, 2024
90d0c37
Always use sbwt::IndexVariant outside of index.rs.
tmaklin Sep 23, 2024
04e5ad5
Unwrap the lcs array when returning.
tmaklin Sep 23, 2024
5ac3fc4
Update calls to index::
tmaklin Sep 23, 2024
80ead9b
Document index.rs with examples.
tmaklin Sep 23, 2024
1f2eb29
Add tests.
tmaklin Sep 23, 2024
cbef3f5
Fix calling fns that take Some(BuildOpts) with None.
tmaklin Sep 23, 2024
a33a0fb
Fix failing test for translate_ms_vec.
tmaklin Sep 23, 2024
dc0ef3d
Clear todo in translate_ms_vec test.
tmaklin Sep 23, 2024
9620776
Add documentation examples to tests.
tmaklin Sep 23, 2024
fef4ebe
Add input checks to build_sbwt_from_vecs and query_sbwt.
tmaklin Sep 23, 2024
347bc50
Add examples to translate.rs documentation.
tmaklin Sep 23, 2024
1b28adb
Always run all tests.
tmaklin Sep 23, 2024
8fc1d73
Add more documentation and examples.
tmaklin Sep 23, 2024
4f32c0c
Fix race condition in documentation tests.
tmaklin Sep 23, 2024
55e95e1
Add test cases for derandomize_ms_val.
tmaklin Sep 23, 2024
6042335
Documentation and examples for derandomize.rs
tmaklin Sep 23, 2024
c458b25
Fix comparison in build_serialize_load_sbwt.
tmaklin Sep 23, 2024
f2fcf07
Patch needletail to respect compression features; disable xz,bzip2.
tmaklin Sep 23, 2024
af3e231
Remove patch.crates-io (use build.rs script).
tmaklin Sep 23, 2024
2a1f592
Revert "Remove patch.crates-io (use build.rs script)."
tmaklin Sep 23, 2024
48f4f1d
Revert "Patch needletail to respect compression features; disable xz,…
tmaklin Sep 23, 2024
71a05e5
Fix clippy suggestions.
tmaklin Sep 23, 2024
ce30f61
Add examples and tests with recombination.
tmaklin Sep 23, 2024
f8f75a4
Handle multiple files in seq_files by indexing them all at once.
tmaklin Sep 24, 2024
62a897e
Remove --input list from build and index everything to same file.
tmaklin Sep 24, 2024
1588644
Move sablast build implementation to lib.rs.
tmaklin Sep 24, 2024
249743e
Add documentation to build.
tmaklin Sep 24, 2024
59be17c
Minor fixes.
tmaklin Sep 24, 2024
98193b6
Implement multiple queries in map and parallelise over input files.
tmaklin Sep 24, 2024
0c1df97
Move implementation of sablast map to find() in lib.rs.
tmaklin Sep 24, 2024
9ee2e68
Rename sablast map to sablast find.
tmaklin Sep 24, 2024
fccc315
Take &str instead of &String.
tmaklin Sep 24, 2024
127fe64
Add documentation to find.
tmaklin Sep 24, 2024
a7d046e
Rename map to matches.
tmaklin Sep 24, 2024
af8c9fa
Document matches.
tmaklin Sep 24, 2024
171b52d
Fix typos.
tmaklin Sep 24, 2024
73cfef3
Use matches in the test instead of map.
tmaklin Sep 24, 2024
29facac
Revert "Use matches in the test instead of map."
tmaklin Sep 24, 2024
221958d
Call map via the new find syntax.
tmaklin Sep 24, 2024
dc281f9
Take &[u8] in find, matches and handle input reading elsewhere.
tmaklin Sep 25, 2024
6bcc2e3
Update test
tmaklin Sep 25, 2024
f569730
Handle fragmented queries in sablast find.
tmaklin Sep 25, 2024
872b97a
Don't add strand in find.
tmaklin Sep 25, 2024
f2a69b2
Add back reverse complements to sablast find.
tmaklin Sep 25, 2024
f0063d8
Update documentation
tmaklin Sep 25, 2024
989e4e8
Update test.
tmaklin Sep 25, 2024
24d62bd
Update examples
tmaklin Sep 25, 2024
4421615
Take seq data instead of file names in build
tmaklin Sep 25, 2024
96a8de9
Read input files in main.
tmaklin Sep 25, 2024
85661e7
Move input format detection docs to main.rs
tmaklin Sep 25, 2024
8b3d658
Remove build_sbwt_from_file and the needletail wrapper.
tmaklin Sep 25, 2024
5a8d8c7
Update
tmaklin Sep 25, 2024
f6c6991
Add todo to fix a bug
tmaklin Sep 25, 2024
f346124
Implement sablast map to map a query to a reference.
tmaklin Sep 25, 2024
112956a
Only a single file at a time to sablast map.
tmaklin Sep 25, 2024
f66759e
Move formatting to format.rs
tmaklin Sep 25, 2024
5ce8b3e
Read all data from query in sablast map, only take 1 query at a time
tmaklin Sep 25, 2024
75a0f53
Move implementation of most file reads to read_fastx_file.
tmaklin Sep 25, 2024
b16cf1a
Document map
tmaklin Sep 25, 2024
e72dbc7
Map doesn't use rayon so remove it.
tmaklin Sep 25, 2024
9d5dcfd
Fix derandomizing the first matching statistic.
tmaklin Sep 25, 2024
4234bf2
Add hidden asserts to documentation tests.
tmaklin Sep 25, 2024
2d71495
Add hidden asserts to documentation tests.
tmaklin Sep 25, 2024
0948da9
Add hidden asserts where possible.
tmaklin Sep 25, 2024
7485d65
Add hidden asserts.
tmaklin Sep 25, 2024
1147b8a
Disable compression in needletail.
tmaklin Sep 25, 2024
b24463b
Comment out the test.
tmaklin Sep 25, 2024
8ba9862
Add reminder to implement refine_translation.
tmaklin Sep 25, 2024
12 changes: 8 additions & 4 deletions .github/workflows/build_and_test.yml
@@ -21,10 +21,14 @@ jobs:
     steps:
       - uses: actions/checkout@v4
 
-      - name: Setup
+      - name: Setup toolchain
         run: rustup update ${{ matrix.toolchain }} && rustup default ${{ matrix.toolchain }}
-      - name: Build
+
+      - name: Build binary
         run: cargo build --verbose
 
-      - name: Run tests
-        run: cargo test --verbose
+      - name: Run unit and integration tests
+        run: cargo test --no-fail-fast --verbose
+
+      - name: Run documentation examples as tests
+        run: cargo test --doc --verbose
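The new CI step runs documentation examples as tests, and several commits add "hidden asserts" to those examples. In rustdoc, a line inside a doc example that starts with `# ` is compiled and executed by `cargo test --doc` but omitted from the rendered documentation. This lets an example stay short while still checking its result. The following is a minimal sketch: the crate name `example_crate` and the function `count_gc` are hypothetical, and the `use example_crate::count_gc;` path only resolves when the function lives in a library crate of that name.

```rust
/// Counts the G and C bases (either case) in a nucleotide sequence.
///
/// # Examples
///
/// ```
/// use example_crate::count_gc;
///
/// let n = count_gc(b"ACGTGC");
/// # assert_eq!(n, 4);
/// ```
pub fn count_gc(seq: &[u8]) -> usize {
    // Keep only G/C bases, accepting both upper- and lower-case input.
    seq.iter()
        .filter(|&&base| matches!(base, b'G' | b'C' | b'g' | b'c'))
        .count()
}
```

A plain `cargo test` on a package with a library target normally runs doc tests as well. The dedicated `--doc` step therefore mainly reports them as a separate CI step.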
10 changes: 7 additions & 3 deletions Cargo.toml
@@ -2,6 +2,7 @@
 name = "sablast"
 version = "0.1.0"
 edition = "2021"
+rust-version = "1.77.0"
 authors = ["Tommi Mäklin <[email protected]>"]
 description = "Spectral Burrows-Wheeler transform accelerated local alignment search"
 readme = "README.md"
@@ -13,16 +14,19 @@ license = "MIT OR Apache-2.0"
 
 [dependencies]
 ## core
-needletail = "0.5.1"
+# TODO Re-enable reading compressed sequences in needletail
+# This requires resolving the liblzma linker issue in build_artifacts.yml
+needletail = { version = "0.5.1", default-features = false }
 rayon = "1"
 sbwt = "0.3.1"
 
 ## cli
-clap = { version = "4.4.18", features = ["derive"] }
+clap = { version = "4", features = ["derive"] }
 
 ## logging
 log = "0.4.20"
 stderrlog = "0.6.0"
 
 [dev-dependencies]
 ## tests
-assert_approx_eq = "1.1.0"
+assert_approx_eq = "1"
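With `default-features = false`, needletail is built without its compression support until the linker issue in the TODO is resolved. Compressed FASTA/FASTQ input will therefore likely not be decompressed. One option, sketched here as an assumption rather than sablast's actual behaviour, is to inspect a file's leading magic bytes and report a clear error before parsing. The `Compression` enum and the `detect_compression` and `ensure_uncompressed` functions are hypothetical. The magic numbers are the standard gzip, bzip2 and xz signatures.

```rust
/// Compression formats recognised by their leading magic bytes (hypothetical helper).
#[derive(Debug, PartialEq, Eq)]
pub enum Compression {
    Uncompressed,
    Gzip,
    Bzip2,
    Xz,
}

/// Guesses the compression format from the first bytes of a file.
pub fn detect_compression(header: &[u8]) -> Compression {
    const GZIP_MAGIC: &[u8] = &[0x1f, 0x8b];
    const BZIP2_MAGIC: &[u8] = b"BZh";
    const XZ_MAGIC: &[u8] = &[0xfd, b'7', b'z', b'X', b'Z', 0x00];

    if header.starts_with(GZIP_MAGIC) {
        Compression::Gzip
    } else if header.starts_with(BZIP2_MAGIC) {
        Compression::Bzip2
    } else if header.starts_with(XZ_MAGIC) {
        Compression::Xz
    } else {
        Compression::Uncompressed
    }
}

/// Rejects input that looks compressed, assuming this build cannot decompress it.
pub fn ensure_uncompressed(header: &[u8]) -> Result<(), String> {
    match detect_compression(header) {
        Compression::Uncompressed => Ok(()),
        other => Err(format!(
            "compressed input ({:?}) is not supported in this build",
            other
        )),
    }
}
```

In practice the header would come from reading the first few bytes of each input file with `std::io::Read::read` before the file is handed to the FASTA/FASTQ parser.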
44 changes: 31 additions & 13 deletions src/cli.rs
@@ -29,43 +29,61 @@ pub enum Commands {
         #[arg(group = "input", required = true)]
         seq_files: Vec<String>,
 
-        // Input sequence list
-        #[arg(short = 'l', long = "input-list", group = "input", required = true, help_heading = "Input")]
-        input_list: Option<String>,
-
         // Outputs
         #[arg(short = 'o', long = "out-prefix", required = false, help_heading = "Output")]
         output_prefix: Option<String>,
 
         // Resources
-        // Threads
+        // // Threads
         #[arg(short = 't', long = "threads", default_value_t = 1)]
         num_threads: usize,
-        // Memory in GB
+        // // Memory in GB
         #[arg(short = 'm', long = "memory", default_value_t = 4)]
         mem_gb: usize,
-        // Temporary directory
+        // // Temporary directory
         #[arg(long = "tmp-dir", required = false)]
         temp_dir: Option<String>,
 
         // Verbosity
         #[arg(long = "verbose", default_value_t = false)]
         verbose: bool,
     },
 
-    // Map query against SBWT index
-    Map {
+    // Find indexed k-mers in a query
+    Find {
         // Input fasta or fastq query file(s)
         #[arg(group = "input", required = true)]
         seq_files: Vec<String>,
 
-        // Input sequence list
-        #[arg(short = 'l', long = "input-list", group = "input", required = true, help_heading = "Input")]
-        input_list: Option<String>,
-
         // Index name
         #[arg(short = 'i', long = "index", required = true, help_heading = "Index")]
         index_prefix: Option<String>,
+
+        // Resources
+        // // Threads
+        #[arg(short = 't', long = "threads", default_value_t = 1)]
+        num_threads: usize,
+
+        // Verbosity
+        #[arg(long = "verbose", default_value_t = false)]
+        verbose: bool,
+    },
+
+    // Map a query or queries to a reference and return the alignment
+    Map {
+        // Input fasta or fastq query file(s)
+        #[arg(group = "input", required = true)]
+        query_file: String,
+
+        // Reference fasta
+        #[arg(short = 'r', long = "reference", required = true, help_heading = "Input")]
+        ref_file: String,
+
+        // Resources
+        // // Threads
+        #[arg(short = 't', long = "threads", default_value_t = 1)]
+        num_threads: usize,
+
         // Verbosity
         #[arg(long = "verbose", default_value_t = false)]
         verbose: bool,
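`Find` now accepts several query files together with `--threads`. The commit log also notes that this work is parallelised over input files, and the crate lists `rayon` as a dependency, presumably for that purpose. The following dependency-free illustration of the same idea is not the code in this PR. The hypothetical `par_map` splits a slice of inputs across at most `num_threads` scoped threads using `std::thread::scope` and returns the results in input order.

```rust
use std::thread;

/// Applies `f` to every item using at most `num_threads` scoped threads.
/// Results are returned in the same order as `items` (hypothetical helper).
pub fn par_map<T, R, F>(items: &[T], num_threads: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Split the input into contiguous chunks, one per thread; guard against zero.
    let chunk_size = items.len().div_ceil(num_threads.max(1)).max(1);
    // Share the closure by reference so every worker can call it.
    let f = &f;
    thread::scope(|s| {
        // Spawn one worker per chunk; each returns the results for its chunk.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Under this sketch, the closure would carry the per-file work, such as reading a query and querying the index. With rayon, `items.par_iter().map(f).collect::<Vec<_>>()` gives the same ordered result, using work stealing instead of fixed chunks.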