Clarification on NGSpID output #24
Thanks. Some follow-up:
I'm looking at the output from the pipeline and referring to this text on the main page:
"The final cluster information is given in a tsv file final_clusters.tsv present in the specified output folder.
In the cluster TSV-file, the first column is the cluster ID and the second column is the read accession.
if there are n reads there will be n rows. Some reads might be singletons. The rows are ordered with respect to the size of the cluster (largest first)."
Please correct me if my understanding is off.
Say my final_clusters.tsv has 31k rows, so that's 31k reads from the FASTQ.
The first column goes from 0 to 22, so there are 23 clusters. Clusters 0 and 1 have far more rows than the others, so more reads were assigned to them. The rest of each row appears to be just sequencing run details.
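To double-check the counts above, here is a minimal sketch that tallies reads per cluster. It assumes only what the README says: a tab-separated file whose first column is the cluster ID, with one row per read. The function names `cluster_sizes` and `summarize` are made up for illustration.

```python
from collections import Counter


def cluster_sizes(tsv_path):
    """Count rows (reads) per cluster ID, taken from the first TSV column."""
    sizes = Counter()
    with open(tsv_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0].strip():  # skip blank lines
                sizes[fields[0]] += 1
    return sizes


def summarize(sizes):
    """Return (total reads, number of clusters, number of singleton clusters)."""
    singletons = sum(1 for n in sizes.values() if n == 1)
    return sum(sizes.values()), len(sizes), singletons
```

Calling `summarize(cluster_sizes("final_clusters.tsv"))` on the sample described above should give roughly 31k reads, 23 clusters, and 9 singletons if my reading of the file is right.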
Supposedly each cluster is processed in its own medaka_cl_id_# folder, but I only see 17 folders. Why not 23, one for each cluster? This sample has 9 singleton clusters, so the 6 missing folders don't simply match the singletons.
Another sample has clusters 0-36 (37 clusters) but 28 medaka folders; 12 of those clusters are singletons.
I assume the first medaka folder in the directory corresponds to the first cluster (cluster 0). Is there a way to verify this?
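One check I could run myself is sketched below. It rests on an unconfirmed assumption: that the number at the end of each medaka_cl_id_# folder name is the cluster ID from the first column of final_clusters.tsv, rather than just a running index. Under that assumption, it lists the clusters that have no folder along with their read counts. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the folder suffix is the cluster ID (not confirmed by the docs).
FOLDER_PATTERN = re.compile(r"^medaka_cl_id_(\d+)$")


def folder_cluster_ids(sample_dir):
    """Collect the numeric suffixes of all medaka_cl_id_<N> directories."""
    ids = set()
    for entry in Path(sample_dir).iterdir():
        match = FOLDER_PATTERN.match(entry.name)
        if match and entry.is_dir():
            ids.add(int(match.group(1)))
    return ids


def clusters_without_folder(tsv_path, sample_dir):
    """Map each cluster ID that has no matching folder to its read count."""
    sizes = {}
    with open(tsv_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0].strip():  # skip blank lines
                cid = int(fields[0])  # assumes integer cluster IDs
                sizes[cid] = sizes.get(cid, 0) + 1
    present = folder_cluster_ids(sample_dir)
    return {cid: n for cid, n in sorted(sizes.items()) if cid not in present}
```

If the clusters reported as missing all turn out to be very small, that would hint that folders are only created above some size cutoff, but that is a guess on my part.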
Or would it be practical to parse the read number from the consensus.fasta headers?
Does anyone have a loop handy to cat all the consensus.fasta files from a single sample, and to loop through multiple samples?
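For reference, this is roughly the kind of loop I have in mind, as a sketch only. It assumes a layout of `<root>/<sample>/medaka_cl_id_<N>/consensus.fasta`, which is my guess from the folders I see; the output file name `all_consensus.fasta` and both function names are made up. It prefixes each FASTA header with the folder name so the source cluster stays traceable.

```python
from pathlib import Path


def concat_consensus(sample_dir, out_path):
    """Concatenate every medaka_cl_id_*/consensus.fasta under sample_dir.

    Returns the number of consensus files that were concatenated.
    """
    sample_dir = Path(sample_dir)
    # Lexicographic order, so medaka_cl_id_10 sorts before medaka_cl_id_2.
    fastas = sorted(sample_dir.glob("medaka_cl_id_*/consensus.fasta"))
    with open(out_path, "w") as out:
        for fasta in fastas:
            folder = fasta.parent.name
            for line in fasta.read_text().splitlines():
                if line.startswith(">"):
                    # Prefix the header with the folder name.
                    line = f">{folder}|{line[1:]}"
                if line:
                    out.write(line + "\n")
    return len(fastas)


def concat_all_samples(root_dir, out_name="all_consensus.fasta"):
    """Run concat_consensus for every sample subdirectory of root_dir."""
    results = {}
    for sample in sorted(Path(root_dir).iterdir()):
        if sample.is_dir():
            results[sample.name] = concat_consensus(sample, sample / out_name)
    return results
```

Calling `concat_all_samples("path/to/output_root")` would write one combined file per sample folder and return how many consensus files went into each.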
I guess I need more explanation of how these clusters are designated. But it's a very cool tool for sure!