Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-write HIVSeqinR in python #13

Closed
CBeelen opened this issue Nov 30, 2022 · 17 comments
Closed

Re-write HIVSeqinR in python #13

CBeelen opened this issue Nov 30, 2022 · 17 comments
Assignees
Labels
wontfix This will not be worked on

Comments

@CBeelen
Copy link
Contributor

CBeelen commented Nov 30, 2022

Now that we are automatically launching the proviral pipeline as part of MiCall, it would make sense to re-write HIVSeqinR in python, or at least re-structure the existing R code. This would make the code a lot more maintainable for us, and would allow us to diagnose bugs more easily. Since the launch of the pipeline, we have encountered 2 errors that led to the pipeline failing, see #12.

Alternatively, re-visit replacing HIVSeqinR by another tool, as outlined in #10.

@Donaim Donaim self-assigned this Jun 5, 2023
@shengquan-Tang
Copy link

Sorry to disturb you. Has the python version been completed?Thank you.

@donkirkby
Copy link
Member

We're switching to HIVIntact instead, @shengquan-Tang. It's written in Python, and we've contributed a couple of pull requests.

@shengquan-Tang
Copy link

We're switching to HIVIntact instead, @shengquan-Tang. It's written in Python, and we've contributed a couple of pull requests.

Get it. Thank you for your reply. By the way, HIVIntact can distinguish intact and defective sequcences, but can't distinguish, such as, "LargeDeletion, nonHIV". Is it right? Thanks again!

@donkirkby
Copy link
Member

One of our pull requests adds the large deletion error, @shengquan-Tang. If you need that feature, you could use our fork of the project.

@shengquan-Tang
Copy link

I truly appreciate your assistance.
Best regards.

@Donaim
Copy link
Member

Donaim commented Feb 26, 2024

Hi @shengquan-Tang, also here is the repository of our HIVIntact fork: https://github.com/cfe-lab/HIVIntact

We have added almost all previous error codes (such as LongDeletion, NonHIV) into our version of HIVIntact, so that it is more compatible with HIVSeqinR. It has some new ones, too.

But there are still many differences between the two programs.
If you are going to use it, take a look at our documentation page https://cfe-lab.github.io/HIVIntact/

@shengquan-Tang
Copy link

Hi, @Donaim. Your response is really significant and thrilling to me. I'll learn how to use it. I appreciate your assistance.

@donkirkby
Copy link
Member

Closing this issue, because we're switching to HIVIntact.

@donkirkby donkirkby added the wontfix This will not be worked on label Feb 27, 2024
@Donaim
Copy link
Member

Donaim commented Feb 27, 2024

Superseeded by #10

@shengquan-Tang
Copy link

Hi, @donkirkby. Today, I intended to utilize HIVIntact, however, I encountered difficulties in accessing it. It remains unclear whether your have altered the pathway or discontinued its availability due to other reasons. Thanks for your help!
image

@Donaim
Copy link
Member

Donaim commented Apr 19, 2024

Hi @shengquan-Tang. The URL you linked to, is to the older, original version of HIVIntact. Please use the following repository: https://github.com/cfe-lab/CFEIntact . We will base our proviral pipeline on it.

So the command should be git clone https://github.com/cfe-lab/CFEIntact.

Pathways have changed, sorry for inconveniences caused by that.

@shengquan-Tang
Copy link

Thank you @Donaim. Yes, I have also tried https://github.com/cfe-lab/CFEIntact, it can be successfully installed. But it will show some errors whether it's my data or your test data. I don't know what went wrong.
image

image

image

@Donaim
Copy link
Member

Donaim commented Apr 22, 2024

Hi @shengquan-Tang . Sorry for this error, it's due to a difference in python version. I have fixed it today.

Please clone the repository again, to get the newest version of CFEIntact (v1.18.2), and then install it again.

Thank you for letting us know about this issue!

@shengquan-Tang
Copy link

Thanks @Donaim, I will try again!

@shengquan-Tang
Copy link

shengquan-Tang commented Apr 27, 2024

Hi, @Donaim. I ran the command "cfeintact check --subtype B test.fasta" which includes the HXB2 (K03455.1) reference sequence. However, I noticed that CFEIntact was unable to recognize it as an intact sequence. The HXB2 sequence was categorized under "error.csv" and had issues like "FrameshiftInOrf" and "DeletionInOrf" (as shown in the first picture). Interestingly, despite having 9719 matches (as seen in the second picture), this outcome seems quite unreasonable.
Moreover, when using CFEIntact with my sequencing data containing over ten thousand sequences, it failed to identify even a single intact sequence. In contrast, HIVSeqinR successfully identified over one thousand intact sequences with the same data.
Importantly, if CFEIntact cannot recognize intact HIV sequences accurately, I am concerned about whether other functions, such as ORFs analysis and check mentioned on https://cfe-lab.github.io/CFEIntact/workflow.html, can be relied upon for correct identification of HIV sequences.
I apologize for any inconvenience caused by these questions; however, your response is incredibly valuable to me and will assist us in identifying HIV sequencing data with greater ease and efficiency.

image

image

@Donaim
Copy link
Member

Donaim commented Apr 29, 2024

Hi @shengquan-Tang !

Thank you for your feedback.
I have already made improvements to CFEIntact based on this conversation, and will continue to do so. We hope you can benefit from CFEIntact as much as we do.

Addressing the issues you've brought up:

HXB2 intactness

The orignal HXB2 sequence is indeed defective.
One of the defects was reported by CFEIntact, as shown on your first screenshot.
That defect, and all the other ones, are documented here: https://www.hiv.lanl.gov/content/sequence/HIV/REVIEWS/HXB2.html
All of them are either a SNP or indel mutations, which is why the number of matches (9719) is so high.
Nevertheless, these defects are severe, and are likely to prevent the virus from replicating, so CFEIntact makes the right call here.

When CFEIntact performs it analysis, it relies on a modified version of HXB2 that does not contain any known defects.
We intentionally edited the original sequence when we have discovered that they sometimes influence subsequent analysis.

No intact sequences detected

CFEIntact's default configuration is rather strict.
All the checks that CFEIntact can perform, are performed by default, and if any of those check result in an error, it marks the input sequence defective.
One particular check that can cause many false positives is the sequence divergence check.
It's the only one that we disable for our proviral pipeline, and is also what we recommend doing in the documentation.
Please see if disabling that option produces less surprising results.

Reliability

Let me present some points that made us trust CFEIntact.

We've done multiple rounds of validations as we transitioned from HIVSeqinR.
One of the main validation steps was manual comparison, involving a human reviewer, of over 1000 subtype B sequences.
In conclusion, CFEIntact compared overwhelmingly positively compared to HIVSeqinR.
This means that CFEIntact was almost always giving the answer of our human reviewer, except for the cases where we agreed that the final decision is not obtainable with the available algorithms. We have not found a single case where HIVSeqinR has made an obviously better judgment than CFEIntact.

CFEIntact is reliable and stable. Its calls are backed by evidence-based studies. It follows software best practices, including static analysis, continuous integration, high test coverage, and is written in a safe and well-established programming language Python.

Still, if you find obviously wrong calls (like a discrepancy between the error code and the description in the documentation), do inform us and we'd be happy to investigate further.

More issues

As I have already mentioned, we benefit from user feedback.
And we believe it can likewise benefit other CFEIntact users who may face similar issues in the future.
To that end, could you please submit any future queries or issues directly to our CFEIntact's issues page? This way, other users can also easily find them.

@shengquan-Tang
Copy link

Thanks @Donaim .
The answer you provided is extremely valuable, and I am reviewing the information you mentioned.
I will submit queries or issues directly to CFEIntact's issues page if I have further puzzled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

4 participants