Query on handling of DUP event #5

Open
Solyris83 opened this issue Mar 25, 2022 · 3 comments

Comments

@Solyris83

Hi, how should I represent DUP events in my VCF files, which come from a population-based study? For example, for the region chr1:600342-600816, sample1 has an annotated DUP with CN=1 and sample2 has CN=3, while the "normal" reference is annotated as CN=2 or hom_ref.

Does the length of the mutation, or in this case the number of times the DUP event occurs, affect the prediction?

@kleinertp
Collaborator

Hi Solyris,

That is a very good question.
CADD-SV uses annotations from the human reference. Most of the features (such as gene-based metrics and maximum values of scores) therefore stay the same across different copy numbers of a duplication, so I suspect the scores will not vary much. However, some features are affected, such as the amount of conserved sequence (for instance, it doubles with one additional copy), and that might have a very strong effect.
However, the loss of a copy (CN=1 in sample1) might be best queried as a deletion of the sequence (chr1:600342-600816).
In other words: length (and the choice of SV type) might affect the prediction. As SVs and their effects can be very complex, it really depends on the sequence, so I cannot recommend the appropriate choice for this specific example.

@Solyris83
Author

Hi @kleinertp, thanks for the response. I am not sure I understood you correctly, so just to confirm:

  1. When CADD-SV predicts an SV's effect, the actual copy number of a DUP event with CN=3 or higher makes no difference, and the correct way to represent these events in the CADD-SV input is
    chr1 600342 600816 DUP

  2. I see, so events with CN=1 (het deletion) or CN=0 (hom deletion) can be represented as below in the CADD-SV input
    chr1 600342 600816 DEL

What we are concerned about is how to represent CNVs to CADD-SV correctly. In our case, our CNV caller calls DUP and DEL and annotates them with a copy number value, e.g. CN in [0, infinity) in theory, though I guess there is some upper limit in real biological systems.

Please comment on the best way to represent these kinds of events as input to CADD-SV to maximise the predictive gains.

@kleinertp
Collaborator

Hi Solyris,

Yes, points 1 and 2 are correct.
CADD-SV relies on annotation from the human reference genome. When the duplication is not present in the reference, CADD-SV cannot make use of it. There is no option to tell CADD-SV the exact copy number of a region. I would proceed using these entries:

chr1 600342 600816 DEL
chr1 600342 600816 DUP

My comment:
CADD-SV prioritizes SVs, so you will have to inspect your results manually afterwards. Take the copy numbers into account once you have received the CADD-SV scores and annotations.
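
For readers who want to automate the approach agreed on above, here is a minimal sketch, not part of CADD-SV itself. It collapses per-sample copy-number calls into one DEL or DUP entry per region and keeps the copy numbers in a side table for the manual inspection step. The function names (`sv_type_for`, `to_bed_entries`), the `(sample, chrom, start, end, cn)` tuple layout, and the tab separator are all assumptions made for illustration. It also assumes a diploid baseline of CN=2 as in the example in this thread, which would not hold for, say, sex chromosomes in males.

```python
from collections import defaultdict

REFERENCE_CN = 2  # assumption: diploid baseline, as in the thread (CN=2 is hom_ref)


def sv_type_for(cn, reference_cn=REFERENCE_CN):
    """Map a copy number to 'DEL', 'DUP', or None (no change from the reference)."""
    if cn < reference_cn:
        return "DEL"
    if cn > reference_cn:
        return "DUP"
    return None


def to_bed_entries(calls):
    """Collapse per-sample calls into unique (chrom, start, end, type) entries.

    `calls` is an iterable of (sample, chrom, start, end, cn) tuples
    (a hypothetical layout; adapt it to your CNV caller's output).
    Returns a dict mapping each entry to {sample: cn}, so the copy numbers
    stay available once the scores come back.
    """
    by_entry = defaultdict(dict)
    for sample, chrom, start, end, cn in calls:
        sv_type = sv_type_for(cn)
        if sv_type is None:
            continue  # reference copy number: nothing to score
        by_entry[(chrom, start, end, sv_type)][sample] = cn
    return dict(by_entry)


calls = [
    ("sample1", "chr1", 600342, 600816, 1),
    ("sample2", "chr1", 600342, 600816, 3),
    ("sample3", "chr1", 600342, 600816, 2),
]
entries = to_bed_entries(calls)

# One line per unique entry; the exact copy number is deliberately left out,
# since there is no way to pass it to CADD-SV.
bed_lines = ["\t".join(map(str, key)) for key in sorted(entries)]
print("\n".join(bed_lines))
```

With the example calls, this prints the same two entries suggested above (one DEL and one DUP for chr1 600342 600816), while `entries` records which sample carried which copy number.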
