How to get -A to work? #199

Open
subwaystation opened this issue Sep 11, 2023 · 3 comments

subwaystation commented Sep 11, 2023

I am trying to use -A to map only a subset of my input sequences so that I can parallelize the process. However, I have not been able to get it to work.

wfmash cerevisiae.pan.fa.gz cerevisiae.pan.fa.gz -m -X -A seq_names-00.txt 
[mashmap] MashMap v3.1.1
[mashmap] Reference = [cerevisiae.pan.fa.gz]
[mashmap] Query = [cerevisiae.pan.fa.gz]
[mashmap] Kmer size = 19
[mashmap] Sketch size = 298
[mashmap] Segment length = 5000 (read split allowed)
[mashmap] Block length min = 25000
[mashmap] Chaining gap max = 20000
[mashmap] Mappings per segment = 1
[mashmap] Percentage identity threshold = 90%
[mashmap] Skip self mappings
[mashmap] Hypergeometric filter w/ delta = 0 and confidence 0.999
[mashmap] Mapping output file = /dev/stdout
[mashmap] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[mashmap] Execution threads  = 1
[wfmash::for_each_seq_in_file] could not fetch S288C.chrX from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrIX from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrVIII from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrVII from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrVI from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrV from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrIV from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrIII from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrII from index
[wfmash::for_each_seq_in_file] could not fetch S288C.chrI from index
[mashmap::skch::Sketch::build] ERROR: No sequences indexed!

cat cerevisiae.pan.fa.gz.fai | grep "S288C.chrX"
S288C.chrX	751611	6027531	50	51
S288C.chrXI	666862	6794188	50	51
S288C.chrXII	1075542	7474402	1075542	1075543
S288C.chrXIII	930506	8549960	50	51
S288C.chrXIV	777615	9499091	50	51
S288C.chrXV	1091343	10292272	50	51
S288C.chrXVI	954457	11405456	50	51

wfmash cerevisiae.pan.fa.gz cerevisiae.pan.fa.gz -m -X -A prefixes-0.txt
[mashmap] MashMap v3.1.1
[mashmap] Reference = [cerevisiae.pan.fa.gz]
[mashmap] Query = [cerevisiae.pan.fa.gz]
[mashmap] Kmer size = 19
[mashmap] Sketch size = 298
[mashmap] Segment length = 5000 (read split allowed)
[mashmap] Block length min = 25000
[mashmap] Chaining gap max = 20000
[mashmap] Mappings per segment = 1
[mashmap] Percentage identity threshold = 90%
[mashmap] Skip self mappings
[mashmap] Hypergeometric filter w/ delta = 0 and confidence 0.999
[mashmap] Mapping output file = /dev/stdout
[mashmap] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[mashmap] Execution threads  = 1
[wfmash::for_each_seq_in_file] could not fetch DBVPG6765 from index
[wfmash::for_each_seq_in_file] could not fetch S288C from index
[mashmap::skch::Sketch::build] ERROR: No sequences indexed!

cat prefixes-0.txt
S288C
DBVPG6765

Any help would be appreciated!
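
For context, per-chunk name lists such as seq_names-00.txt could be derived from the .fai index roughly as follows. This is only a sketch: the helper name split_names and the chunk size are hypothetical, and it assumes the list file holds one sequence name per line, as prefixes-0.txt does.

```shell
# split_names FAI N PREFIX (hypothetical helper)
# Writes PREFIX-00.txt, PREFIX-01.txt, ... with at most N sequence names each.
# Column 1 of a .fai index is the sequence name.
split_names() {
  cut -f1 "$1" | awk -v n="$2" -v p="$3" '
    {
      f = sprintf("%s-%02d.txt", p, int((NR - 1) / n))
      # Close the previous chunk file once we move on to the next one.
      if (f != prev) { if (prev != "") close(prev); prev = f }
      print > f
    }'
}

# Example call (file names taken from this issue, chunk size is arbitrary):
#   split_names cerevisiae.pan.fa.gz.fai 10 seq_names
```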

subwaystation commented Sep 11, 2023

It looks like this option was never fully implemented?

for (const auto& name : keep_seq) {
    if (found_seq.find(name) == found_seq.end())
    {
        std::cerr << "[wfmash::for_each_seq_in_file] could not fetch " << name << " from index" << std::endl;
    }
}

subwaystation commented Sep 11, 2023

At least -P works, but wfmash cerevisiae.pan.fa.gz cerevisiae.pan.fa.gz -m -X -P "S288C.chrX" -t 16 also gives me results for "S288C.chrXI", because that name starts with the given prefix. For better parallelization I would rather not have that, so please fix -A.
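
The difference between prefix and exact-name selection can be illustrated on the .fai names alone. This is a minimal sketch that does not call wfmash. The helper names are hypothetical, and the sample rows are the first two columns of three entries from the .fai listing above.

```shell
# Three sample rows (name, length) copied from the .fai listing in this issue.
fai_sample() {
  printf '%s\t%s\n' \
    'S288C.chrX' 751611 \
    'S288C.chrXI' 666862 \
    'S288C.chrXII' 1075542
}

# match_prefix PREFIX: print names that start with PREFIX.
match_prefix() {
  awk -F'\t' -v p="$1" 'index($1, p) == 1 { print $1 }'
}

# match_exact NAME: print only the name identical to NAME
# (the "" concatenation forces a string comparison).
match_exact() {
  awk -F'\t' -v n="$1" '($1 "") == (n "") { print $1 }'
}

fai_sample | match_prefix 'S288C.chrX'   # selects chrX, chrXI and chrXII
fai_sample | match_exact  'S288C.chrX'   # selects chrX only
```

With prefix selection, a chunk defined by "S288C.chrX" overlaps with the chunks for chrXI and chrXII, which is why exact-name selection is preferable for splitting work.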

@subwaystation

Comparing the concatenated PAF from all per-prefix runs with the PAF from a full run, I can't spot any line that looks the same. Do I get different mapping results when applying -P?
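
A more systematic check than eyeballing the files might look like the sketch below. It compares only PAF columns 1-9 (query name, length, start, end, strand, target name, length, start, end), so that differences in line order or in the trailing columns and tags do not hide otherwise identical mappings. The helper names and the file names full.paf and merged.paf are hypothetical.

```shell
# paf_key FILE: reduce each PAF record to its coordinate columns, sorted and de-duplicated.
paf_key() {
  cut -f1-9 "$1" | LC_ALL=C sort -u
}

# count_shared A.paf B.paf: number of coordinate records present in both files.
count_shared() {
  a=$(mktemp)
  b=$(mktemp)
  paf_key "$1" > "$a"
  paf_key "$2" > "$b"
  # comm -12 prints only lines common to both sorted inputs.
  LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' '
  rm -f "$a" "$b"
}

# Example call:
#   count_shared full.paf merged.paf
```

A count of zero would support the impression that the per-prefix runs produce different mappings, rather than the same mappings in a different order.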
