UnboundLocalError: local variable 'polishing_pattern' referenced before assignment #20
Comments
Hi @gabyrech, it looks like you get several large clusters from the clustering step. It is strange to me that after the clustering you get the outcome:
This suggests you have an older version. While there have not been any major updates lately, it would probably be good to use the latest version (v0.1.3) and remove the output directory for a fresh rerun.
Let me know how it goes, as my answer might not tackle the cause of the problem.
Hi @ksahlin! First let me explain a little bit more, so maybe I can even ask you for advice :-).
About the data: These are targeted ONT sequences with A LOT of repetitive sequences (most of them simple repeats). What I want to do is obtain as many consensus sequences as possible, while avoiding clustering reads that don't actually come from the same genomic region (which is very hard, because they share the repetitive sequence with other genomic regions). This is why I thought that using high
About your suggestions:
Any suggestion is very welcome!
Okay, I see! What's the rough error rate of your reads? We do have IsoCon for a similar purpose if the reads have a relatively low error rate (say <5%). IsoCon assumes that reads are not too different at the ends (i.e. roughly full-length over the targeted region). We have also developed isONcorrect, which could correct the reads before clustering (e.g. with IsoCon). isONcorrect is sort of allele (SNP/indel) aware and is in general robust, but its main purpose is not to preserve SNPs (especially low-abundance mutations) but to reduce errors in reads. We had one analysis with targeted gene families (but PacBio IsoSeq data) where we ran isONclust first and then ran IsoCon on each cluster individually. It worked for our data because isONclust first separated reads into different gene families, then IsoCon did a more fine-tuned separation of several alleles/transcripts from each gene.
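For the error-rate question above, one rough way to get a number is to average the per-base error probabilities encoded in the FASTQ quality strings. The sketch below is only an illustration: the function names are hypothetical, it assumes Phred+33 encoding and plain 4-line FASTQ records, and quality-derived estimates are only approximate compared with aligning reads to a reference.

```python
def mean_error_rate(quality_string, phred_offset=33):
    """Average per-base error probability for one read (assumes Phred+33)."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Phred score Q encodes an error probability of 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string]
    return sum(probs) / len(probs)


def fastq_error_rates(lines):
    """Yield (read_id, mean_error_rate) from an iterable of FASTQ lines.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, _seq, _plus, qual = record
            yield header[1:].split()[0], mean_error_rate(qual)
            record = []
```

For a real file, something like `fastq_error_rates(open("fastq_pass.fastq.fq"))` would give one estimate per read, which can then be compared against the roughly 5% figure mentioned for IsoCon.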
Oh! That sounds even better! I think your approach (isONclust + IsoCon) using corrected reads might also work in our case, since our data is something like sequences from different gene families, but with the added complexity that they are full of simple tandem repeats. I will give it a try and let you know how it goes.
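A minimal sketch of the "cluster first, then refine each cluster" step discussed above: it groups read IDs by cluster so that each group can be written out and passed to a second tool separately. It assumes the cluster assignments are available as tab-separated lines of `cluster_id<TAB>read_id`; that layout, the function name, and the `min_reads` cutoff are assumptions for illustration and should be checked against the actual clustering output. No IsoCon command is shown, since its options are not given in this thread.

```python
from collections import defaultdict


def group_reads_by_cluster(tsv_lines, min_reads=1):
    """Map cluster_id -> list of read IDs, keeping clusters with >= min_reads.

    Assumes each line looks like 'cluster_id<TAB>read_id' (hypothetical layout).
    """
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cluster_id, read_id = line.split("\t", 1)
        clusters[cluster_id].append(read_id)
    # Drop small clusters, which are unlikely to give a reliable consensus.
    return {cid: reads for cid, reads in clusters.items() if len(reads) >= min_reads}
```

Each returned group could then be written to its own FASTQ file and processed independently, so that the second step only has to separate closely related sequences within one cluster.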
Hi there!
I am trying to use NGSpeciesID with some custom parameters, but I am not sure whether I am doing something wrong. Here is my command:
NGSpeciesID --consensus --t 24 --fastq fastq_pass.fastq.fq --outfolder out --k 15 --w 50 --min_shared 40 --mapped_threshold 0.99 --aligned_threshold 0.80
This is what I get:
I was suspecting that maybe I am too strict with the alignment thresholds, so I tried changing them a little bit (not too much) but I keep getting the same error.
Really appreciate any clue on what is going on...
Thanks!
Gabriel
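For readers unfamiliar with this class of error: in Python, an `UnboundLocalError` like the one in the title is raised when a function reads a local variable that was only assigned on a code path that did not run. The sketch below is a generic illustration with hypothetical function names and logic; it is not NGSpeciesID's actual code and does not claim to show the cause in this issue.

```python
def pick_pattern(cluster_sizes, min_size):
    """Hypothetical example: the variable is bound only if a cluster qualifies."""
    for size in cluster_sizes:
        if size >= min_size:
            polishing_pattern = f"cluster_{size}"
    # If no cluster passed the filter, the name was never assigned,
    # so this line raises UnboundLocalError.
    return polishing_pattern


def pick_pattern_safe(cluster_sizes, min_size):
    """Same logic, but with a default so the 'nothing passed' case is explicit."""
    polishing_pattern = None
    for size in cluster_sizes:
        if size >= min_size:
            polishing_pattern = f"cluster_{size}"
    return polishing_pattern
```

In a pattern like this, strict thresholds can leave nothing for a later step to work on, which would surface as this error rather than as a clear "no clusters passed the filter" message.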