Advanced usage

Keiran Raine edited this page Dec 17, 2020 · 3 revisions

UTR and non-coding targetons

UTR regions and non-coding sequence are treated as if they were intronic sequence. Targetons that do not overlap any GTF/GFF2 feature are not associated with a gene or transcript, and their target regions are considered intronic for the purposes of mutation generation. Mutator functions on such targetons work as if the region were intronic sequence, unless they are CDS-specific functions (such as snvre), in which case they are rejected and an error is logged, for example:

INFO:root:No transcript information found for region chr3:52401466-52401710.
INFO:root:No transcript information found for region chr3:52407284-52407528.
INFO:root:No transcript information found for region chr3:52402203-52402496.
CRITICAL:root:Invalid mutator 'snvre' for targeton!
CRITICAL:root:Failed to generate oligonucleotides!

When the transcript and gene ID cannot be determined, the oligonucleotide name contains the placeholder string NO_TRANSCRIPT.

Using dinucleotide deletions to remove splice donor/acceptor sites

The 2del0 and 2del1 functions operate from upstream to downstream in the target sequence.

Typically, a targeton will contain intronic sequence in r1 and r3 on either side of the r2 exon sequence. To remove the terminal two nucleotides of the r1 region (those immediately flanking the r2 exon region), the number of nucleotides in r1 needs to be considered. If r1 contains an even number of nucleotides, 2del0 will move sequentially through r1 deleting tandem nucleotides, including the final two nucleotides immediately adjacent to r2. If r1 contains an odd number of nucleotides, the systematic removal of tandem nucleotides must be offset by 1 nucleotide using 2del1 (the first nucleotide of r1 will not be removed by 2del1) in order to remove the final two nucleotides of r1.

In the typical case noted, removal of the first two nucleotides of r3 will remove the splice site; because the tandem deletion functions operate from upstream to downstream, 2del0 should be used. If r3 contains an odd number of nucleotides, the final downstream nucleotide of the r3 extensible region will not be removed by 2del0.
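The parity rule above can be illustrated with a minimal Python sketch. This is not VaLiAnT code: the function names `paired_deletion_windows` and `mutator_for_r1_terminal_deletion` are hypothetical, and the sketch only models the behaviour described in this section (2-nucleotide deletions stepping upstream to downstream, starting at offset 0 for 2del0 and offset 1 for 2del1).

```python
def paired_deletion_windows(region_length, offset):
    """Return 0-based, half-open (start, end) windows of tandem 2-nt deletions.

    Models the behaviour described in the text only: windows step from
    upstream to downstream, starting at `offset` (0 for 2del0, 1 for 2del1).
    A trailing single nucleotide is never deleted.
    """
    return [(start, start + 2) for start in range(offset, region_length - 1, 2)]


def mutator_for_r1_terminal_deletion(r1_length):
    """Pick the mutator whose last window covers the final two nt of r1."""
    return "2del0" if r1_length % 2 == 0 else "2del1"


# Hypothetical r1 of 7 nt (odd): 2del1 reaches the last two nucleotides.
r1_length = 7
name = mutator_for_r1_terminal_deletion(r1_length)
windows = paired_deletion_windows(r1_length, offset=1)
print(name, windows[-1])

# Hypothetical r3 of 7 nt (odd): 2del0 removes the first two nucleotides,
# and the final downstream nucleotide falls outside every window.
print(paired_deletion_windows(7, offset=0))
```

For an r1 region, the window of interest is the last one (adjacent to r2); for an r3 region, it is the first one, which is why 2del0 is appropriate for r3 regardless of its length.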

Tiling targetons over long exonic regions

When performing SGE over a long exon (where a single targeton of <300 bp cannot cover all of the exonic sequence), targetons can be tiled over the exon.

In a typical example, the terminal targetons cover the flanking intron sequence together with the start or end of the exon, and internal targetons cover only exon sequence. Targetons can be partially overlapped, in frame over CDS regions, to provide continuity of saturation coverage (see figure below). In such a case, targeton_1 contains an r2 region and an r1 extensible region, targeton_2 contains only an r2 region and no extensible regions, and targeton_3 contains an r2 region and an r3 extensible region. This is defined in the action vector of the targeton file as:

targeton_1: (r1_mutators), (r2_mutators), ()
targeton_2: (), (r2_mutators), ()
targeton_3: (), (r2_mutators), (r3_mutators)
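As a rough illustration of the pattern above, the following Python sketch generates action vectors for an arbitrary number of tiled targetons. The function `tiled_action_vectors` is hypothetical and not part of VaLiAnT, and `r1_mutators`, `r2_mutators` and `r3_mutators` are the same placeholders used above rather than real mutator names. It assumes only the first targeton has an r1 extensible region and only the last has an r3 extensible region.

```python
def tiled_action_vectors(n_targetons,
                         r1="r1_mutators",
                         r2="r2_mutators",
                         r3="r3_mutators"):
    """Build one action-vector string per tiled targeton.

    Assumption: the first targeton carries the r1 mutators, the last
    carries the r3 mutators, and internal targetons mutate r2 only.
    """
    if n_targetons < 1:
        raise ValueError("at least one targeton is required")
    vectors = []
    for i in range(n_targetons):
        is_first = i == 0
        is_last = i == n_targetons - 1
        parts = [r1 if is_first else "", r2, r3 if is_last else ""]
        vectors.append(", ".join(f"({p})" for p in parts))
    return vectors


for index, vector in enumerate(tiled_action_vectors(3), start=1):
    print(f"targeton_{index}: {vector}")
```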

tiling example

_Tiling of targetons for SGE of a long exon (>300 nt). The exon and flanking sequence are shown in black. Targetons are shown in grey, region 1 and region 3 in blue, and region 2 ranges in red; sgRNAs used to perform SGE with each targeton are shown in yellow. In-frame overlapping sequence within the CDS is highlighted._