Query on handling of DUP event #5

Solyris83 · 2022-03-25T09:01:44Z

Hi, how should I represent DUP events in my VCF files, which come from a population-based study? For example, sample1 has an annotated DUP with CN=1 and sample2 has CN=3 for the same stretch of DNA, chr1:600342-600816, while the "normal" reference is annotated as CN=2 or hom_ref.

Does the size of the event, in this case the number of copies the DUP produces, affect the prediction?

kleinertp · 2022-03-25T10:19:42Z

Hi Solyris,

That is a very good question.
CADD-SV uses annotations of the human reference genome, so most features (such as gene-based metrics and maximum values of scores) stay the same across different copy numbers of a duplication. I therefore suspect that the scores will not vary much. However, some features are affected, such as the amount of conserved sequence (which doubles, for instance, with one additional copy), and that might have a very strong effect.
However, the loss of a copy (CN=1 in sample1) might be best queried as a deletion of the sequence (chr1:600342-600816).
In other words: length (and choice of SV type) might affect the prediction. Because SVs and their effects can be very complex, it really depends on the sequence, so I cannot recommend an appropriate choice for this specific example.

Solyris83 · 2022-03-28T02:00:57Z

Hi @kleinertp, thanks for the response. I am not sure I understood you correctly, so just to confirm:

1. In CADD-SV's prediction of an SV's effect, the exact value of CN makes no difference for a DUP event with CN=3 or higher, and the correct way to represent such an event in the input to CADD-SV is
chr1 600342 600816 DUP
2. Events with CN=1 (het deletion) or CN=0 (hom deletion) can be represented as below in the input to CADD-SV
chr1 600342 600816 DEL

What we are concerned about is representing CNVs to CADD-SV in the right way. Our CNV caller calls DUP and DEL and annotates them with a copy number value, e.g. CN in [0, infinity) in theory, although I guess there is some upper limit in a real biological system.

Please comment on the best way to represent these kinds of events as input to CADD-SV to maximise the predictive gains.

kleinertp · 2022-03-28T08:16:01Z

Hi Solyris,

Yes, points 1 and 2 are correct.
CADD-SV relies on annotation from the human reference genome. When the duplication is not present in the reference, CADD-SV cannot make use of it. There is no option to tell CADD-SV the exact copy number of a region. I would proceed using these entries:

chr1 600342 600816 DEL
chr1 600342 600816 DUP

My comment:
CADD-SV prioritizes SVs, so you will still have to inspect your results manually afterwards. Take the copy numbers into account once you have received the CADD-SV scores and annotations.
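
For readers who want to script the conversion discussed in this thread, here is a minimal sketch in Python. It is not part of CADD-SV: the `CnvCall` type and the `sv_type_for` and `to_cadd_sv_rows` helpers are hypothetical names, the diploid baseline of CN=2 is taken from the question above, and the tab-separated four-column layout is an assumption based on the example entries shown here. Calls below the baseline become DEL, calls above it become DUP, and entries that become identical once the copy number is dropped are merged.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class CnvCall(NamedTuple):
    """One CNV call from a caller (hypothetical container, not a CADD-SV type)."""
    chrom: str
    start: int
    end: int
    copy_number: int


def sv_type_for(copy_number: int, baseline: int = 2) -> Optional[str]:
    """Map a copy number to an SV type, assuming a diploid baseline of 2."""
    if copy_number < baseline:
        return "DEL"   # CN=0 or CN=1: query as a deletion
    if copy_number > baseline:
        return "DUP"   # CN=3 or higher: query as a duplication
    return None        # CN equals the baseline: nothing to score


def to_cadd_sv_rows(calls: Iterable[CnvCall]) -> Iterator[Tuple[str, int, int, str]]:
    """Yield unique (chrom, start, end, type) rows, in first-seen order."""
    seen = set()
    for call in calls:
        sv_type = sv_type_for(call.copy_number)
        if sv_type is None:
            continue
        row = (call.chrom, call.start, call.end, sv_type)
        if row not in seen:  # CN=3 and CN=4 collapse to the same DUP row
            seen.add(row)
            yield row


calls = [
    CnvCall("chr1", 600342, 600816, 1),  # sample1
    CnvCall("chr1", 600342, 600816, 3),  # sample2
    CnvCall("chr1", 600342, 600816, 2),  # reference-like sample, skipped
    CnvCall("chr1", 600342, 600816, 4),  # merges with the CN=3 DUP row
]
rows = list(to_cadd_sv_rows(calls))
for row in rows:
    print("\t".join(str(field) for field in row))
```

Because the copy number is dropped in this step, keep the original calls alongside the output so the CN values can be joined back to the CADD-SV scores during manual inspection.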
