0 sequences from selection found #8

chirrie · 2021-11-26T16:23:38Z

Why am I getting the following output when I run the preprocessing script?

1323 sequences selected
searching fasta and writing sequences to output directory...
3679 sequences from input fasta processed
0 sequences from selection found

jbaaijens · 2021-11-26T20:12:31Z

This suggests that your metadata doesn't match the sequences provided. It could also be that the GISAID formatting has changed again. Could you check whether the sequence identifiers in your fasta file are of the form <Virus name>|<Collection date>|<Submission date>, as given in the metadata?
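A quick way to check this is to build the expected identifiers from the metadata and count how many of them occur verbatim as FASTA headers. This is a minimal sketch, not the pipeline's own code: it assumes a tab-separated metadata file with columns named `Virus name`, `Collection date` and `Submission date`, and the function names and example records are hypothetical.

```python
import csv
import io


def expected_ids(tsv_lines):
    """Build '<Virus name>|<Collection date>|<Submission date>' keys from a
    tab-separated metadata file (the column names are an assumption)."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    return {
        "|".join((row["Virus name"], row["Collection date"], row["Submission date"]))
        for row in reader
    }


def fasta_headers(fasta_lines):
    """Return every FASTA header without the leading '>' or trailing whitespace."""
    return {line[1:].strip() for line in fasta_lines if line.startswith(">")}


def check_overlap(tsv_lines, fasta_lines):
    """Print how many metadata-derived IDs occur verbatim as FASTA headers."""
    expected = expected_ids(tsv_lines)
    headers = fasta_headers(fasta_lines)
    matched = expected & headers
    print(f"{len(expected)} IDs from metadata, {len(headers)} FASTA headers, "
          f"{len(matched)} matching")
    if not matched and expected and headers:
        # Show one example from each side so a format difference is easy to spot.
        print("expected e.g.:", sorted(expected)[0])
        print("found e.g.:   ", sorted(headers)[0])
    return matched


# Hypothetical records: the FASTA header uses a different field order,
# so no identifier matches.
metadata = io.StringIO(
    "Virus name\tCollection date\tSubmission date\n"
    "hCoV-19/Example/ABC-1/2021\t2021-06-08\t2021-06-20\n"
)
fasta = io.StringIO(">hCoV-19/Example/ABC-1/2021|EPI_ISL_0000001|2021-06-08\nACGT\n")
check_overlap(metadata, fasta)
```

If the match count is 0 while both sides are non-empty, the printed examples usually show which fields differ.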

chirrie · 2021-11-27T09:05:50Z

Sure. There are quite a few changes which affect the pipeline. Could you please have a look at it? I am using the latest script in the pipeline folder.

jbaaijens · 2021-11-28T19:46:15Z

Ok, I see the problem: the sequence identifiers are again formatted differently. It's an easy fix; I'll try to do it tomorrow.

chirrie · 2021-11-29T16:20:12Z

Did you get a moment to fix it?
Many thanks

jbaaijens · 2021-11-29T21:06:35Z

Just made some changes, could you pull the latest commit and give it a try?

jbaaijens · 2021-11-30T08:38:33Z

I just downloaded the most recent GISAID data and the formatting hasn't changed. It seems the data you have shown above is actually older; it corresponds to the format I encountered ~9 months ago. So you could try the files in the manuscript folder for processing your data. However, I strongly recommend downloading the latest version of the full GISAID database and working with the scripts in the pipeline folder.

chirrie · 2021-11-30T08:41:33Z

The N-Content line also affects the select_sample script.

jbaaijens · 2021-11-30T08:44:37Z

Yes, it uses this information. In the version from 9 months ago (see manuscript) we calculated the N-content ourselves, but in the meantime it has become part of the GISAID metadata.
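For data without an N-Content column, a rough way to compute it from the sequences is sketched below. This is not the code from the manuscript folder; it assumes N-content means the fraction of `N` characters in each sequence, and GISAID's own N-Content value may be defined differently (for example in how gaps or other ambiguity codes are treated). The function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_content(sequence):
    """Fraction of positions that are 'N' (case-insensitive); 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


def n_content_table(lines):
    """Map each FASTA header to the N-content of its sequence."""
    return {header: n_content(seq) for header, seq in read_fasta(lines)}
```

Multi-line sequences are joined before counting, so line wrapping in the FASTA file does not affect the result.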

chirrie · 2021-11-30T08:50:44Z

Could you please download hcov_africa.fasta and hcov_africa.tsv and try running the scripts without changing anything? That is what I am using, and I am getting errors.
I downloaded them from the region-specific Auspice source files.

There is data in the manuscript folder. What I have is from GISAID, and I think it is the latest since I downloaded it last week. I narrowed it down to specific regions because I wanted just a small amount of data to test the pipeline with my data first.

jbaaijens · 2021-11-30T09:13:19Z

Ah so what you're using is not actually the full GISAID data, also not for Africa. These are Auspice files which are used for visualisation with Nextstrain (https://docs.nextstrain.org/projects/auspice/en/stable/) and they only have very few sequences. For building a good reference set you need the full GISAID database [GISAID -> EpiCoV -> Downloads -> Download packages -> FASTA (for the sequences) and metadata (for the metadata)]

jbaaijens · 2021-11-30T09:27:47Z

If it helps, I can build an Africa-specific reference set and share the sequences / sequence identifiers.

chirrie · 2021-11-30T09:35:07Z

Weirdly enough, I can't see the Download packages tab. GISAID -> EpiCoV -> Downloads takes me to the page below, and I cannot see the option Download packages -> FASTA (for the sequences) and metadata (for the metadata). Under Downloads, it shows a tab for Alignment and proteins, Submission and variant stats, and finally Genomic epidemiology. I have sent a screenshot to your email.


chirrie · 2021-11-30T10:23:26Z

> If it helps, I can build an Africa-specific reference set and share the sequences/sequence identifiers.

I will appreciate it if I can get this.

chirrie · 2021-12-01T07:58:41Z

> I just downloaded the most recent GISAID data and the formatting hasn't changed. It seems the data you have shown above is actually older; it corresponds to the format I encountered ~9 months ago. So you could try the files in the manuscript folder for processing your data. However, I strongly recommend downloading the latest version of the full GISAID database and working with the scripts in the pipeline folder.

I have tried downloading a few sequences (around 100 per lineage), but I am still getting "0 sequences from selection found". Could you please help out with this?

jbaaijens · 2021-12-01T09:01:22Z

Could you again post what your sequence identifiers look like in the fasta file?

jbaaijens · 2021-12-01T09:07:53Z

> I will appreciate it if I can get this.

I will build it. Can you send me an email at j.a.baaijens[at]tudelft.nl?

chirrie · 2021-12-02T07:47:02Z

> I will build it. Can you send me an email at j.a.baaijens[at]tudelft.nl?

I have sent you an email. Please have a look at it.

chirrie · 2021-12-02T07:47:57Z

> Could you again post what your sequence identifiers look like in the fasta file?

hCoV-19/Reunion/HCL021109894801/2021|EPI_ISL_2676670|2021-06-08
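This header has the form `<Virus name>|<Accession ID>|<Collection date>`, not `<Virus name>|<Collection date>|<Submission date>`, so an exact lookup of the metadata identifiers finds nothing. One possible workaround, sketched here under stated assumptions and not the fix that was committed to the repository, is to rewrite the headers using the metadata's accession IDs. It assumes a tab-separated metadata file with GISAID-style columns `Virus name`, `Accession ID`, `Collection date` and `Submission date`; the function names and example records are hypothetical.

```python
import csv
import io


def accession_to_key(tsv_lines):
    """Map 'Accession ID' to '<Virus name>|<Collection date>|<Submission date>'.

    The column names are an assumption; adjust them if your file differs.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    return {
        row["Accession ID"]: "|".join(
            (row["Virus name"], row["Collection date"], row["Submission date"])
        )
        for row in reader
    }


def rekey_fasta(fasta_lines, mapping):
    """Rewrite '>name|EPI_ISL_...|date' headers into the metadata-derived form.

    Headers with a different number of fields, or whose accession is missing
    from the metadata, are passed through unchanged, as are sequence lines.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].rstrip("\r\n").split("|")
            if len(fields) == 3 and fields[1] in mapping:
                yield ">" + mapping[fields[1]] + "\n"
                continue
        yield line


# Hypothetical example records (not real GISAID entries).
metadata = io.StringIO(
    "Virus name\tAccession ID\tCollection date\tSubmission date\n"
    "hCoV-19/Example/ABC-1/2021\tEPI_ISL_0000001\t2021-06-08\t2021-06-20\n"
)
fasta = io.StringIO(">hCoV-19/Example/ABC-1/2021|EPI_ISL_0000001|2021-06-08\nACGT\n")
print("".join(rekey_fasta(fasta, accession_to_key(metadata))), end="")
```

Keying on the accession ID rather than the virus name avoids depending on the date fields in the header, which are exactly what differs between the two formats.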

Dipti-IISERpune · 2022-06-02T12:59:45Z

I am facing the exact same issue. I couldn't get the required details from GISAID, so I prepared the .tsv myself from the sequence details and accession IDs, plus a FASTA file for those sequences. This gave me the error mentioned above. Now I am trying to use the FASTA and TSV files for region Asia and country India from the GISAID download section. Would rearranging the .tsv to match the example help me run the code?

jbaaijens · 2023-09-21T12:25:06Z

Unfortunately, the GISAID metadata headers have changed over time, so yes, renaming the columns in the metadata should resolve it. You could also try VLQ-nf, a Nextflow implementation of our pipeline: https://github.com/rki-mf1/vlq-nf
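A minimal sketch of the column renaming, assuming a tab-separated metadata file where only the header row needs to change. The old and new names in `COLUMN_MAP` are hypothetical placeholders, not taken from the pipeline or from any specific GISAID release; check the headers in your own file and the names the scripts actually read before using it.

```python
import io

# HYPOTHETICAL mapping: keys are header names found in your metadata file,
# values are the names the pipeline scripts expect. These are placeholders.
COLUMN_MAP = {
    "strain": "Virus name",
    "date": "Collection date",
    "date_submitted": "Submission date",
}


def rename_columns(lines, column_map):
    """Yield TSV lines with the header row renamed; data rows pass through unchanged."""
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input: nothing to rename
    names = header.rstrip("\r\n").split("\t")
    yield "\t".join(column_map.get(name, name) for name in names) + "\n"
    yield from it


# Hypothetical example input.
example = io.StringIO(
    "strain\tdate\tdate_submitted\n"
    "hCoV-19/Example/ABC-1/2021\t2021-06-08\t2021-06-20\n"
)
print("".join(rename_columns(example, COLUMN_MAP)), end="")
```

For a real file, something like `with open("metadata.tsv") as src, open("metadata_renamed.tsv", "w") as dst: dst.writelines(rename_columns(src, COLUMN_MAP))` would write a renamed copy (both file names are hypothetical). Data rows are copied as-is, so column order and values are untouched.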
