STRait Razor #6

Open
bioinformatic-list opened this issue Nov 29, 2023 · 20 comments

Comments

@bioinformatic-list

I have another question about the Excel workbook for version v3: the link appears to be invalid. Is this Excel workbook still available? Thanks.

@ExpectationsManaged
Collaborator

Apologies for the dated hyperlink. The link is updated and functional (tested). Thanks

@bioinformatic-list
Author

Thank you for your reply. Another question: can the Excel workbook be used on Linux systems, for example with LibreOffice?

@ExpectationsManaged
Collaborator

The control buttons in the Excel workbook use ActiveX and, thus, will not work on Linux systems. We've since released an R Shiny application (https://github.com/ExpectationsManaged/STRaitRazorOnline/tree/master) which covers all of the functionality of the Excel workbook and is compatible with Linux. You might consider this alternative. Thanks and best of luck.

@bioinformatic-list
Author

Thank you for your reply. I have downloaded STRait Razor Online and will use it. Thank you very much.

@bioinformatic-list
Author

Hello, I keep getting errors when running "batch" in STRait Razor Online. How can I solve this problem? Looking forward to your reply.
[Screenshot: 2024-05-11 10-23-55]

@ExpectationsManaged
Collaborator

Which version of the app are you using?
E.g., version ID: 0.2.7

What amplification/library kit was used to generate the libraries prior to sequencing?

Thanks!

@bioinformatic-list
Author

Thank you for your reply. The libraries were prepared before sequencing with a custom locus amplification kit. I am using STRait Razor Online version 0.2.7, and the input files are in fastq.gz format.

@bioinformatic-list
Author

Hello, can STRait Razor detect the mitochondrial hypervariable regions, and if so, should the type in the configuration file be written as M? Looking forward to your reply, thank you.

@Ahhgust
Owner

Ahhgust commented Jun 20, 2024

Hmm. Good question! I've done some digging. Jonathan is our config maintainer, but he's on vacation until early July, so it'll be a few weeks before you get an answer (apologies). My recollection is that we have something for mito, but I don't know if it's "stable" (i.e., ready for public consumption).
We also wrote a tool that is alignment-based (https://github.com/Ahhgust/Pycision). It's command-line only and will likely need a Unix machine, but you can use it in a way analogous to STRait Razor (because it is alignment-based, you'd need genomic coordinates). And for some more shameless self-promotion, we also have a tool to remove (obvious, and only some) Numts (https://github.com/Ahhgust/RtN); again, it'll need a Unix-like environment.
I hope this helps, and let me know if you want me to dig around more for the mito config file (again, I feel like we have it, but I am unsure how well it will perform).

@bioinformatic-list
Author

Thank you for your reply. Yes, we may need to analyze the mitochondrial hypervariable regions, so we would like to ask about the status of the mitochondrial configuration file.
I also have another question: do we need to avoid SNP sites in the flanking sequences of the STRait Razor configuration file?

@bioinformatic-list
Author

Hello, I recently encountered an issue while running STRait Razor. When I used one pair of flanking sequences, I found only a few dozen reads. However, when I replaced them with another pair, I found hundreds or even thousands of reads at the same locus. Why does this happen? Does the software's algorithm place requirements on the flanking sequences? Looking forward to your reply. Thanks.

@ExpectationsManaged
Collaborator

ExpectationsManaged commented Aug 2, 2024

My apologies for not responding sooner. An early-access version of the configuration file for processing the poly-C stretch of HVS-I and HVS-II of the mitochondria is available under ~/db/configs. This file is labeled mitoCstretcher.config. It is not active in the UI as I have not been able to stress test the configuration file to verify its robustness. So, any feedback you have on issues related to this file would be greatly appreciated.

As far as the differences observed with different anchor sequences, this is expected performance, as more distal anchors generally have fewer reads covering these positions. Reduced performance may also be expected if the anchor sequence is not sufficiently unique (e.g., the duplicate primer of DYS389I/II will cause issues if not annotated properly, as the sequence is present twice in the amplicon). In addition to lack of uniqueness, excessive noise (e.g., single-base errors, homopolymer errors) may make an anchor perform poorly. It is recommended that users assess alignment of loci during development of configuration files for suitability. This can be done manually using read visualization software (e.g., IGV) to ensure that not only depth but also read length allows a single read to cover both anchors and the insert between them.

I hope this helps some with your questions. Thanks and sorry again for the delay.
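
To make the points above about read coverage and anchor uniqueness concrete, here is a minimal Python sketch of anchor-based extraction. It is not STRait Razor's actual implementation: it uses exact string matching only, and the function names (`extract_inserts`, `count_occurrences`), the anchors, and the toy reads are all assumptions made up for this illustration.

```python
def count_occurrences(seq, sub):
    """Count (possibly overlapping) occurrences of `sub` in `seq`."""
    count, pos = 0, seq.find(sub)
    while pos != -1:
        count += 1
        pos = seq.find(sub, pos + 1)
    return count


def extract_inserts(read, anchor5, anchor3):
    """Return the substring between each 5' anchor hit and the next 3' anchor.

    A read contributes nothing unless both anchors and the whole insert
    lie within that single read.
    """
    inserts = []
    start = read.find(anchor5)
    while start != -1:
        begin = start + len(anchor5)
        end = read.find(anchor3, begin)
        if end != -1:
            inserts.append(read[begin:end])
        start = read.find(anchor5, start + 1)
    return inserts


# Hypothetical anchors and reads, for illustration only.
ANCHOR_5 = "TTGCA"
ANCHOR_3 = "CCTGA"

full_read = "ACGTTGCA" + "GATA" * 3 + "CCTGAAGT"
short_read = full_read[:22]  # read ends before the 3' anchor is complete
dup_read = ("TTGCA" + "GATA" + "CCTGA") + ("TTGCA" + "GATA" * 2 + "CCTGA")

print(extract_inserts(full_read, ANCHOR_5, ANCHOR_3))   # one insert recovered
print(extract_inserts(short_read, ANCHOR_5, ANCHOR_3))  # nothing recovered
print(count_occurrences(dup_read, ANCHOR_5))            # anchor is not unique
print(extract_inserts(dup_read, ANCHOR_5, ANCHOR_3))    # two competing inserts
```

The truncated read shows why anchors placed further from the repeat tend to recover fewer reads, and the duplicated anchor shows why a sequence that occurs twice in an amplicon yields ambiguous results unless it is handled explicitly.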

@bioinformatic-list
Author

Thank you very much for your reply. Your answer has helped me a lot. Thank you again.

@bioinformatic-list
Author

Hello, I recently ran into an issue with STRait Razor regarding the DYF399S1 locus. It is a multi-copy locus, but its motifs at positions 22950282-22950381 and 24584039-24584125 on the Y chromosome are not the same, and the offsets are 8 and 7, respectively. However, because the flanks at these two positions are identical, the results obtained using the flanks are somewhat problematic. How should this locus be handled?

@ExpectationsManaged
Collaborator

So, the issue of how to assign the flanking offset is, potentially, cosmetic and not operational. The real question is how this is being amplified. Do you use one pair of primers to amplify both regions (e.g., as with DYS385)? Or are the loci called separately? The purpose of the offset is to assign a length-based (LB) allele, i.e., a CE allele. So, that becomes the precedent. If you have up to two alleles on the CE, that observation is based on one of the two offset results. I would type a known sample (i.e., one with CE data) and determine which of the offsets gives you concordant results. Alternatively, if the CE kit uses different primers for each copy, I would use separate anchors and offsets and treat them as the CE does. I hope this helps.

@bioinformatic-list
Author

For the DYF399S1 locus, a single pair of primers is used for amplification, but the problem is that the offsets of the two copies differ. For example, if 7 is used as the offset, the typing result for DYF399S1a is incorrect. Similarly, if 8 is used as the offset, the typing result for DYF399S1b is incorrect and is off by one base. The motif of DYF399S1a is [GAAA]a AAGAAAAG [GAAA]b, and the motif of DYF399S1b is [GAAA]a AAGAAAA [GAAA]b. The interruptions AAGAAAAG and AAGAAAA are not counted as repeats; that is, DYF399S1a has one more G than DYF399S1b.

@bioinformatic-list
Author

So, in this situation, if only the flanking sequences are used for searching and localization, are the typing results unreliable? Could the software also use the genomic position of the locus for searching and localization? This is just a small idea of mine. Looking forward to your reply.

@ExpectationsManaged
Collaborator

> For the DYF399S1 locus, a single pair of primers is used for amplification, but the problem is that the offsets of the two copies differ. For example, if 7 is used as the offset, the typing result for DYF399S1a is incorrect. Similarly, if 8 is used as the offset, the typing result for DYF399S1b is incorrect and is off by one base. The motif of DYF399S1a is [GAAA]a AAGAAAAG [GAAA]b, and the motif of DYF399S1b is [GAAA]a AAGAAAA [GAAA]b. The interruptions AAGAAAAG and AAGAAAA are not counted as repeats; that is, DYF399S1a has one more G than DYF399S1b.

If a single primer pair is used, the results you have are correct. The offset ONLY matters in comparison to CE data. If you're not comparing to CE, the offset does not matter. You simply report the string and, optionally, the bracketed motif. If you want to compare a length-based allele, you need to determine the "name" of the CE allele. You need a known DNA sample and a genotype to assess which of the two offsets is concordant. But only one of the two offsets is consistent with the CE.
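
As a rough illustration of the offset logic described above, the Python sketch below assumes the length-based allele is computed as (string length - offset) / period, with any leftover bases reported after a decimal point, and that the extracted string contains only the repeat blocks plus the uncounted interruption. The function names, the repeat counts (10 and 12), and the "known" CE allele of 22 are hypothetical; only the motifs and the offsets 8 and 7 come from this thread.

```python
def length_based_allele(sequence, period, offset):
    """Name an allele from string length (assumed formula, see lead-in)."""
    counted = len(sequence) - offset
    whole, partial = divmod(counted, period)
    return str(whole) if partial == 0 else f"{whole}.{partial}"


def concordant_offsets(sequence, period, ce_allele, candidates):
    """Return the candidate offsets that reproduce a known CE allele."""
    return [
        offset
        for offset in candidates
        if length_based_allele(sequence, period, offset) == ce_allele
    ]


# Hypothetical haplotypes built from the two motifs quoted in the thread.
copy_a = "GAAA" * 10 + "AAGAAAAG" + "GAAA" * 12  # interruption of 8 bases
copy_b = "GAAA" * 10 + "AAGAAAA" + "GAAA" * 12   # interruption of 7 bases

for name, seq in (("a", copy_a), ("b", copy_b)):
    for offset in (7, 8):
        print(name, offset, length_based_allele(seq, 4, offset))

# With a (hypothetical) known CE genotype of 22 for a sample carrying copy_a,
# only one candidate offset is concordant.
print(concordant_offsets(copy_a, 4, "22", [7, 8]))
```

The underlying string is the same whichever offset is chosen; only the allele name changes, which is why the offset matters only when comparing against CE data.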

@ExpectationsManaged
Collaborator

> So, in this situation, if only the flanking sequences are used for searching and localization, are the typing results unreliable? Could the software also use the genomic position of the locus for searching and localization? This is just a small idea of mine. Looking forward to your reply.

If one set of primers is used for both loci, there is likely no "distinguishing feature" to allow for mapping. Instead you'd need more distal primers that were specific for each locus.

@bioinformatic-list
Author

Okay, I roughly understand what you mean. Thank you for your reply.
