primer pairs left after secondary amplicon QC #14

Open
Longhx1112 opened this issue Nov 1, 2021 · 3 comments

Comments

@Longhx1112

core genes: 2017
single copy core genes: 1836
Number of conserved sequences: 1812
species specific conserved sequences: 536
potential primer pair(s): 4578
primer pairs with good target binding: 4260
primer pairs left after non-target QC: 615
primer pairs left after secondary amplicon QC: 0
primer pairs left after mfold: 0
primer pairs left after primer QC: 0

Hi,
I have a question about using SpeciesPrimer.
"primer pairs left after secondary amplicon QC" is zero. Which parameters can I modify to change this?
I have already tried "ignore_qc" and "skip_tree", but neither helped.
Looking forward to your reply, thank you very much!

@biologger
Owner

Hi,
The ignore_qc and skip_tree options only affect the input quality control, not the primer quality control.
The secondary amplicon check takes 10 input assemblies and uses MFEprimer to check that each primer pair produces only one amplicon per assembly.
You can check the results in the /primerdesign/your_target/Pangenome/results/primer/primerQC/MFEprimer_assembly.csv file.
For a less stringent selection, pass a lower value to the --mfethreshold option. The default is 90; you could also try 85 or 80, but I would not recommend going below 70.
If MFEprimer_assembly.csv is empty, there is probably a problem with the database.
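
A quick way to tell a missing file from an empty one is a small script like the sketch below. It is not part of SpeciesPrimer: the function names are made up for illustration, and it makes no assumption about the columns in the file or whether it has a header line; it only counts non-blank rows.

```python
import csv
from pathlib import Path


def count_rows(csv_path):
    """Return the number of non-blank rows in a CSV file, or None if it does not exist."""
    path = Path(csv_path)
    if not path.is_file():
        return None
    with path.open(newline="") as handle:
        # A row counts only if at least one of its fields contains text.
        return sum(
            1 for row in csv.reader(handle) if any(field.strip() for field in row)
        )


def describe_mfeprimer_results(csv_path):
    """Classify the results file as 'missing', 'empty' or 'has rows' (hypothetical helper)."""
    rows = count_rows(csv_path)
    if rows is None:
        return "missing"
    if rows == 0:
        return "empty"
    return "has rows"
```

Call `describe_mfeprimer_results()` with the full path to your own MFEprimer_assembly.csv. A result of "empty" or "missing" points towards the database problem mentioned above rather than towards the threshold.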

@lanying

lanying commented Nov 7, 2021

Why does the secondary amplicon check use only 10 input assemblies?

@biologger
Owner

It is a matter of speed and computing power.
With, for example, 500 input assemblies, the MFEprimer database would get too large and the QC would take forever.
The pipeline selects the 10 assemblies by completeness: Complete Genomes > Chromosome > Scaffolds > Contigs
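
The selection order above can be sketched roughly as follows. This is an illustration only, not the code from speciesprimer.py: the function name, the (name, level) tuple layout and the exact level strings are assumptions.

```python
# Assumed ranking, following the order given above; a lower number means more complete.
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def select_reference_assemblies(assemblies, limit=10):
    """Pick up to `limit` assemblies, most complete first.

    `assemblies` is a list of (name, level) tuples (hypothetical layout).
    Unknown levels are ranked last; ties keep their input order because
    sorted() is stable.
    """
    ranked = sorted(
        assemblies, key=lambda item: LEVEL_RANK.get(item[1], len(LEVEL_RANK))
    )
    return ranked[:limit]
```

For example, with two contig-level assemblies and one complete genome and `limit=2`, the complete genome is picked first, followed by the first contig-level assembly in the input.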

The number of assemblies can be changed by editing the speciesprimer.py script, in the PrimerQualityControl class:

class PrimerQualityControl:
    def __init__(self, configuration):
        self.referencegenomes = 10  # <-- change this number

To check more than 20 input assemblies, I would recommend splitting the assemblies across several DBs to speed up database indexing. It will still take a lot of time, however.
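
Splitting the assemblies could look something like the sketch below. The pipeline does not do this itself; the function name and the batch size of 10 are assumptions, and building one database per batch is left out.

```python
def split_into_batches(assemblies, batch_size=10):
    """Split a list of assemblies into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        assemblies[start:start + batch_size]
        for start in range(0, len(assemblies), batch_size)
    ]
```

With 25 assemblies and the default batch size, this gives three batches of 10, 10 and 5, each of which would then get its own DB.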

Maybe an additional option to define the number of assemblies can be implemented in a future version.
