0 sequences from selection found #8
This suggests that your metadata doesn't match the sequences provided. Could also be that the GISAID formatting has changed again. Could you check whether the sequence identifiers in your fasta file are of the form |
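One way to check for such a mismatch is to compare the FASTA headers directly against the identifier column of the metadata. The snippet below is only a minimal sketch on toy data, not part of the pipeline: the helper names are made up for illustration, and the column name `strain` is an assumption that has to be replaced by whatever identifier column your metadata file actually has.

```python
import csv
import io


def fasta_ids(handle):
    """Yield the header text (without '>') of every FASTA record."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].strip()


def metadata_ids(handle, column):
    """Yield the values of one column from a tab-separated metadata file."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row[column].strip()


def overlap(fasta_handle, tsv_handle, column):
    """Return (matched, fasta_only, metadata_only) identifier sets."""
    in_fasta = set(fasta_ids(fasta_handle))
    in_meta = set(metadata_ids(tsv_handle, column))
    return in_fasta & in_meta, in_fasta - in_meta, in_meta - in_fasta


# Toy data (hypothetical identifiers): one header carries an extra
# "|date" suffix, so it cannot be matched against the metadata.
fasta = io.StringIO(">sample_A|2021-01-01\nACGT\n>sample_B\nACGT\n")
tsv = io.StringIO("strain\tdate\nsample_A\t2021-01-01\nsample_B\t2021-01-02\n")

matched, fasta_only, meta_only = overlap(fasta, tsv, "strain")
print(len(matched), "identifiers matched")
print("only in fasta:", sorted(fasta_only)[:5])
print("only in metadata:", sorted(meta_only)[:5])
```

If the matched set is empty while both files contain records, the identifiers are formatted differently in the two files, and the unmatched examples usually show how. For real files, open them with `open(path)` instead of `io.StringIO`.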
Sure. There are quite a few changes which affect the pipeline. Could you please have a look at it? I am using the latest script in the pipeline folder |
Ok I see the problem, the sequence identifiers are again formatted differently. It's an easy fix, I'll try to do it tomorrow. |
Did you get a moment to fix it? |
Just made some changes, could you pull the latest commit and give it a try? |
I just downloaded the most recent GISAID data and the formatting hasn't changed. It seems the data you have shown above is actually older, it corresponds to the format I encountered ~9 months ago. So you could try the files in the |
The N-Content line also affects the select_sample script |
Yes, it uses this information. In the version from 9 months ago (see |
Could you please download hcov_africa.fasta and hcov_africa.tsv and try running the scripts on them without changing anything? That is what I am using, and I am getting errors. There is data in the manuscript folder. What I have is from GISAID, and I think it is the latest since I downloaded it last week. I had to narrow it down to specific regions since I wanted just a small amount of data to test the pipeline with my data first |
Ah so what you're using is not actually the full GISAID data, also not for Africa. These are Auspice files which are used for visualisation with Nextstrain (https://docs.nextstrain.org/projects/auspice/en/stable/) and they only have very few sequences. For building a good reference set you need the full GISAID database [GISAID -> EpiCoV -> Downloads -> Download packages -> FASTA (for the sequences) and metadata (for the metadata)] |
If it helps, I can build an Africa-specific reference set and share the sequences / sequence identifiers. |
Weirdly enough, I can't see the Download packages tab. Following GISAID -> EpiCoV -> Downloads, I do not see the Download packages option with FASTA (for the sequences) and metadata (for the metadata). After Downloads, it shows a tab for Alignment and proteins, a tab for Submission and variant stats, and finally a Genomic epidemiology tab. I have sent a screenshot to your email.
|
I would appreciate it if I could get this. |
I have tried downloading a few sequences, around 100 per lineage, but I am still getting "0 sequences from selection found". Could you please help out with this? |
Could you again post what your sequence identifiers look like in the fasta file? |
I will build it. Can you send me an email at j.a.baaijens[at]tudelft.nl? |
I have sent you an email. Please have a look at it |
|
I am facing the exact same issue. I couldn't get the desired details from GISAID, so I prepared the .tsv using details and accession IDs, plus a FASTA file for those sequences. It gave me the above-mentioned error. Now I am trying to use the FASTA and .tsv files for the region Asia and the country India from the GISAID download section. Will it help me to run the code if I rearrange the .tsv as given in the example?
Unfortunately the GISAID metadata headers have changed over time, so yes, it should be resolved by renaming columns in the metadata. You could also try VLQ-nf, a nextflow implementation of our pipeline: https://github.com/rki-mf1/vlq-nf |
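Renaming the columns can be done without touching the data rows. The snippet below is a minimal sketch, assuming a tab-separated metadata file. The `RENAME` mapping and the `rename_columns` helper are purely illustrative: check the example metadata in the repository for the header names the scripts actually expect and adjust the mapping accordingly.

```python
import csv
import io

# Hypothetical mapping from the header names in your download to the
# header names used in the example metadata. Adjust both sides as needed.
RENAME = {"Virus name": "strain", "Collection date": "date"}


def rename_columns(src, dst, mapping):
    """Copy a TSV from src to dst, renaming header fields found in mapping."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    # Columns that are not in the mapping keep their original name.
    writer.writerow([mapping.get(name, name) for name in header])
    writer.writerows(reader)


# Toy input (made-up record) to show the effect on the header line only.
src = io.StringIO(
    "Virus name\tCollection date\tLineage\n"
    "hCoV-19/X/1/2021\t2021-01-01\tB.1\n"
)
dst = io.StringIO()
rename_columns(src, dst, RENAME)
print(dst.getvalue())
```

For real files, replace the `io.StringIO` objects with `open("in.tsv", newline="")` and `open("out.tsv", "w", newline="")`, and run the preprocessing script on the renamed file.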
Why am I getting the output below when I run the preprocessing script?
1323 sequences selected
searching fasta and writing sequences to output directory...
3679 sequences from input fasta processed
0 sequences from selection found