Commit
Add probability of overlap and weighted containment for Multisearch matches (#458)

* Add probability of overlap and weighted containment to multisearch result
* Start writing prob_overlap
* Couldn't figure out how to get prob_overlap.rs to import .. putting into utils.rs for now
* Trying to get prob overlap to at least import properly
* Start writing a merge_all_minhashes function
* Write in commented code what needs to happen
* Remove mut from unused variables for now
* wrote function to merge all minhashes of a vector of signatures
* Added mege_all_minhashes to multisearch
* Add crates for stable calculation of log values
* Add dependencies for stable calculation of log values in Cargo.lock
* Add rust decimal with math feature
* Add function to get probability of overlap between specific intersection hashes of all queries and all database minhash
* Call probability of overlap between queries and database
* I'm getting too confused by rust_decimal .. let's go back to using the standard library
* Add adjusted prob_overlap to MultiSearchResult
* Getting prob_overlap to actually work
* Add failing test for test_multisearch.py
* Fix n_comparisons to be float, remove commented out pseudocode
* Remove unnecessary parens
* Added prob_overlap, prob_overlap_adjusted, containment_adjusted, containment_adjusted_log10 values to test_multisearch
* Add print statements
* Add containment_adjusted_log10
* Fix compiler errors
* Fix rounding for prob_overlap, prob_overlap_adjusted, containment_adjusted, containment_adjusted_log10
* Move probability of overlap code into separate search_significance module
* add tf_idf_score to test_multisearch.py
* Add tf_idf_score to MultiSearchResult
* Make separate "againsts" as Vec<Sig>
* Get TF-IDF running
* remove print statements and commented out code
* Remove print statements, commented out code, add todos
* Fix optional boolean types for prob_overlap and tf idf
* Add multisearch test of protein with abundance
* Remove part_001 from signature filename
* Delete old part_001 file
* Remove too big sig from test data
* Add test of probability of overlap with multisearch
* Add --prob argument
* Precompute frequencies for queries and againsts, save as HashMaps for fast lookups
* Use L2 norm for tf idf, add more print messages
* Use par_iter whenever possible
* Remove logsumexp from files
* Add failing test to make sure prob_overlap only gets computed when --prob-overlap specified
* Remove logsumexp from rust file
* Try to make prob_overlap calculation optional
* Make prob_overlap an optional column
* remove unused and commented out code
* add comment for estimate_prob_overlap
* Remove `let` keyword to stop "shadowing" the variables
* add par_bridge() after iter_mins() for parallel computation
* Remove `let` from creating precomputed HashMaps for search significance and TF-IDF
* Remove checking for non-existence of prob_overlap when it really should be there
* remove unused 'mut'
* Add float_round function
* Fix missing bracket
* Rename unused hashval variable -> _hashval
* Update protein fasta paths in test_sketch.py ... but also run black formatting
* Add comment about minhash not being defined
* remove commented out code
* Add clarification about squaring 1
* Apply `cargo fix --lib -p sourmash_plugin_branchwater`
* Remove unused import
* Just kidding, that import was used
* Fix SmallSignature import
* Fix weirdness for test_simple_ani and test_simple_prob_overlap caused by merge conflicts
* Run black and fix zip True/False in test_against_multisigfile
* whitespace
* formatting
* "syn" package appeared twice
* Trailing whitespace
* Add protein k5 signature
* Apply black formatting to everything
* Merge black-applied python test files
* Missed some merge markers
* Missed more merge markers...
* Fix black in test_multisearch.py
* Remove commented out code
* unwrap -> expect
* Modularize the probability of overlap computation into functions
* set values for prob_overlap results in the if statement
* Add longer argument name and description
* Cargo fmt
* Borrow 'selection'
* Clone selection
* Add longer argument name
* Use `new_selection` to set scaled
* Add @pytest.mark.xfail(reason="should work, bug") to `test_fastgather.py:test_against_multisigfile`
* Revert test_against_multisigfile back to main
* Remove .clone() from selection
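
Several entries in the commit message above (precomputing frequencies as HashMaps, using an L2 norm for TF-IDF, adding a `float_round` function) describe steps whose code is not rendered on this page. The following is a minimal Rust sketch of how such steps could look, not the plugin's actual implementation. The function names `document_frequencies` and `tf_idf_l2`, all signatures (including the one given here for `float_round`), the representation of a sketch as a list of unique `u64` hashes, and the smoothed IDF formula are assumptions made for illustration. The sketch is sequential, whereas the commit mentions `par_iter`.

```rust
use std::collections::HashMap;

/// Round `value` to `decimals` decimal places.
/// Hypothetical signature; the commit only names a `float_round` function.
fn float_round(value: f64, decimals: i32) -> f64 {
    let factor = 10f64.powi(decimals);
    (value * factor).round() / factor
}

/// Precompute, for each hash, how many sketches contain it (document frequency).
/// Assumes every sketch lists each hash at most once.
fn document_frequencies(sketches: &[Vec<u64>]) -> HashMap<u64, usize> {
    let mut df = HashMap::new();
    for sketch in sketches {
        for &hash in sketch {
            *df.entry(hash).or_insert(0) += 1;
        }
    }
    df
}

/// TF-IDF weights for one sketch with abundances, scaled to unit L2 norm.
/// The IDF used here, ln((1 + N) / (1 + df)) + 1, is one common smoothed
/// variant and is an assumption, not necessarily what the commit computes.
fn tf_idf_l2(
    abundances: &HashMap<u64, u64>,
    df: &HashMap<u64, usize>,
    n_sketches: usize,
) -> HashMap<u64, f64> {
    let total: u64 = abundances.values().sum();
    if total == 0 {
        // Nothing to weight; avoids a 0/0 term frequency.
        return HashMap::new();
    }
    let mut weights: HashMap<u64, f64> = abundances
        .iter()
        .map(|(&hash, &count)| {
            let tf = count as f64 / total as f64;
            let d = df.get(&hash).copied().unwrap_or(0) as f64;
            let idf = ((1.0 + n_sketches as f64) / (1.0 + d)).ln() + 1.0;
            (hash, tf * idf)
        })
        .collect();
    // Divide every weight by the Euclidean length of the weight vector.
    let norm = weights.values().map(|w| w * w).sum::<f64>().sqrt();
    if norm > 0.0 {
        for w in weights.values_mut() {
            *w /= norm;
        }
    }
    weights
}
```

Building the frequency table once and sharing it by reference keeps each per-hash lookup constant time, which is presumably the motivation behind the "save as HashMaps for fast lookups" entry.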
Showing 11 changed files with 1,054 additions and 93 deletions.