support standalone manifests containing zip files #266
note: #364 temporarily removes the text from the docs that suggests using manifest CSVs.
I dug into this some more today. The chain of code is:

In sourmash_plugin_branchwater/src/utils.rs Lines 567 to 576 in 2ea4683

then, the resulting sourmash_plugin_branchwater/src/utils.rs Lines 444 to 455 in 2ea4683

this, in turn, hits

    pub fn sig_from_record(&self, record: &Record) -> Result<SigStore> {
        let match_path = record.internal_location().as_str();
        let selection = Selection::from_record(record)?;
        let sig = self.storage.load_sig(match_path)?.select(&selection)?;
        assert_eq!(sig.signatures.len(), 1);
        Ok(sig)
    }

which executes

    fn load_sig(&self, path: &str) -> Result<SigStore> {
        let raw = self.load(path)?;
        let sig = Signature::from_reader(&mut &raw[..])?
            // TODO: select the right sig?
            .swap_remove(0);
        Ok(sig.into())
    }

which only understands JSON files stored on the file system.

It seems to me that the key problem with dealing with this generically (i.e. in sourmash, letting manifests load sketches from other .zip files) is this set of calls,

    self.storage.load_sig(match_path)?.select(&selection)?;

Here, (Then again, this puzzles me, because we do support loading multiple sketches from one .sig file, so ... more digging needed ;))

There is a separate problem in that

Anyway: in looking at a short-term fix, I think I need to override the
over in sourmash-bio/sourmash#3303, I am trying out the following code (in

    pub fn sig_from_record2(&self, record: &Record) -> Result<SigStore> {
        eprintln!("fetching: {:?}", record);
        let match_path = record.internal_location().as_str();
        Ok(match match_path {
            x if x.ends_with(".sig") || x.ends_with(".sig.gz") => {
                let selection = Selection::from_record(record)?;
                let sig = self.storage.load_sig(x)?.select(&selection)?;
                assert_eq!(sig.signatures.len(), 1);
                sig
            }
            x if x.ends_with(".zip") => {
                let zipcoll = Collection::from_zipfile(x)?;
                let ziprec = zipcoll.manifest().iter().find(|r| {
                    r.md5() == record.md5() && r.name() == record.name()
                }).unwrap();
                eprintln!("ziprec: {}", ziprec.md5());
                let sig = zipcoll.sig_from_record(ziprec)?;
                sig
                // todo!("more zip better")
            }
            _ => todo!("unknown, dying now")
        })
    }

This implementation is clearly Bad, and violates the way

We could imagine implementing a

A sourmash focused solution would be to add a new implementation of

One solution that would work entirely within the branchwater plugin: build our own
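The extension dispatch in `sig_from_record2` above panics in two places: `unwrap()` when no manifest row matches, and `todo!` on an unrecognized location. A minimal std-only sketch of the same two decisions returning errors instead is below. Everything in it (`LocationKind`, `LookupError`, `Row`, `classify_location`, `find_row`) is a hypothetical stand-in for illustration, not part of sourmash or the branchwater plugin; `Row` models only the md5 and name fields that the lookup compares.

```rust
use std::fmt;

/// Hypothetical classification of a manifest row's internal location.
#[derive(Debug, PartialEq, Eq)]
pub enum LocationKind {
    /// A JSON signature file, optionally gzipped (`.sig` or `.sig.gz`).
    SigFile,
    /// A zip collection (`.zip`).
    ZipFile,
}

/// Hypothetical error type, used here in place of `todo!` and `unwrap`.
#[derive(Debug, PartialEq, Eq)]
pub enum LookupError {
    UnknownLocation(String),
    RecordNotFound { md5: String, name: String },
}

impl fmt::Display for LookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LookupError::UnknownLocation(p) => write!(f, "unsupported location: {p}"),
            LookupError::RecordNotFound { md5, name } => {
                write!(f, "no record with md5 {md5} and name {name}")
            }
        }
    }
}

impl std::error::Error for LookupError {}

/// Decide how a location should be loaded, based only on its suffix
/// (the same suffix tests as in the code above).
pub fn classify_location(path: &str) -> Result<LocationKind, LookupError> {
    if path.ends_with(".sig") || path.ends_with(".sig.gz") {
        Ok(LocationKind::SigFile)
    } else if path.ends_with(".zip") {
        Ok(LocationKind::ZipFile)
    } else {
        Err(LookupError::UnknownLocation(path.to_string()))
    }
}

/// Stand-in for a manifest row: only the two fields the lookup compares.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub md5: String,
    pub name: String,
}

/// Find the row matching `wanted` by md5 and name, or report which one was missing.
pub fn find_row<'a>(rows: &'a [Row], wanted: &Row) -> Result<&'a Row, LookupError> {
    rows.iter()
        .find(|r| r.md5 == wanted.md5 && r.name == wanted.name)
        .ok_or_else(|| LookupError::RecordNotFound {
            md5: wanted.md5.clone(),
            name: wanted.name.clone(),
        })
}
```

With this shape, a manifest row pointing at an unsupported file, or at a zip that lacks the expected sketch, surfaces as an error the caller can report, rather than aborting the whole run.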
Also: right now it's hard to see the Rust

And, also, supports the idea of building a new type of

#430 has settled on the
build a manifest containing zip files with sig collect:
try to run manysearch on it:
and you will get:
Also note that we should load files from within manifests as if they are relative to the manifest dir per sourmash-bio/sourmash#3054 and sourmash-bio/sourmash#3008 (comment).
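As a rough illustration of the "relative to the manifest dir" rule, here is a minimal std-only sketch. The function name `resolve_from_manifest` and its signature are assumptions made for this example; they are not taken from sourmash or the plugin, and the linked sourmash issues remain the authority on the intended behavior.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: interpret a manifest row's internal location
/// relative to the directory that contains the manifest file.
///
/// Absolute locations are returned unchanged, because `Path::join`
/// replaces the base when given an absolute path.
pub fn resolve_from_manifest(manifest_path: &Path, internal_location: &str) -> PathBuf {
    // `parent()` of a bare file name such as "mf.csv" is the empty path,
    // so joining onto it leaves the location as written.
    let base = manifest_path.parent().unwrap_or_else(|| Path::new(""));
    base.join(internal_location)
}
```

For example, a manifest at `data/mf.csv` that lists `zips/a.zip` would resolve to `data/zips/a.zip` regardless of the current working directory the command is run from.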
Originally noted in #237.