Release Bookend v1.2.0: merge update · Gregor-Mendel-Institute/bookend

This release adds a new utility, bookend merge, which integrates one or more assemblies into a reference annotation while following the reference's gene and transcript naming conventions. Reference transcripts with a matching assembly will have their 5' and 3' ends updated, and they will be given evidence attributes that describe how many times they were assembled and in which samples.

Merge behavior:

Process transcripts in descending order of total genomic length
Merge assemblies first, applying no filters
- Combine the attributes of merged transcripts according to --attr_merge: sum or mean of expression values (TPM, cov, S.reads, E.reads)
- All assemblies classified as 'full_match' to another assembly will be combined into a single transcript model
Integrate merged assemblies with reference (decreasing length):
- Determine the class of each merged transcript vs. reference
- Name the transcript and add it to the reference list
  - 'full_match' transcripts retain the original transcript_id
  - 'exon_match' transcripts with 5' and/or 3' variation are named after the matching transcript_id with an extra suffix '_<count>'
  - Novel isoforms are given the gene_id with a suffix '.i<count>'
  - Novel antisense transcripts receive the '-AS' suffix
  - Intronic transcripts receive the '-IT' suffix
  - Intergenic transcripts are named 'BOOKEND_<count>'
Apply filters:
- Isoforms must have been found at least --rep_filter times
- The sum/max TPM must be at least --tpm_filter
- Multiply these filters by --high_conf for suspected artifacts (fragments and fusions)
- The spliced transcript length must be at least --min_len nucleotides
- The percentage of capped 5' signal must be at least --cap_percent
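
The naming and filtering rules above can be sketched in Python as follows. This is an illustration only, not Bookend's implementation: the function names, the `MergedTranscript` fields, every class label other than 'full_match', 'exon_match' and 'fragment', and the choice to attach '-AS' and '-IT' to a gene_id are assumptions made for the example. Thresholds are passed explicitly because their default values are not stated in these notes.

```python
from dataclasses import dataclass


@dataclass
class MergedTranscript:
    # All field names are hypothetical, chosen for this sketch.
    match_class: str    # 'full_match', 'exon_match', 'fragment', or an assumed label
    reps: int           # number of times the isoform was assembled
    tpm: float          # merged TPM value
    spliced_len: int    # spliced transcript length in nucleotides
    cap_percent: float  # percentage of capped 5' signal


def name_transcript(match_class, ref_transcript_id=None, ref_gene_id=None, count=1):
    """Return a transcript name following the conventions listed above."""
    if match_class == "full_match":
        return ref_transcript_id                  # keep the reference ID
    if match_class == "exon_match":
        return f"{ref_transcript_id}_{count}"     # 5'/3' end variant
    if match_class == "isoform":                  # assumed label for novel isoforms
        return f"{ref_gene_id}.i{count}"
    if match_class == "antisense":                # assumed label
        return f"{ref_gene_id}-AS"
    if match_class == "intronic":                 # assumed label
        return f"{ref_gene_id}-IT"
    return f"BOOKEND_{count}"                     # intergenic


def passes_filters(t, *, rep_filter, tpm_filter, high_conf, min_len, cap_percent):
    """Apply the listed filters; suspected artifacts face scaled thresholds."""
    # 'fusion' is an assumed label for the fusions mentioned in the notes.
    scale = high_conf if t.match_class in ("fragment", "fusion") else 1.0
    return (
        t.reps >= rep_filter * scale
        and t.tpm >= tpm_filter * scale
        and t.spliced_len >= min_len
        and t.cap_percent >= cap_percent
    )
```

With these assumptions, a transcript assembled twice passes a replicate threshold of 2 as an ordinary isoform, but fails it as a 'fragment' when the multiplier is 2, because the effective threshold becomes 4.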

Bugfixes

Changes to bookend elr --sj_shift in v1.1 allowed malformed exons with zero or negative length.
Added --max_intron to utilities elr, assemble, and condense
assemble, condense and elr utilities now check for and discard malformed entries with negative exon lengths
bookend elr: terminal exons with noncanonical gaps are discarded
bookend elr: it is now possible to use all three sources of splice junction evidence together (--splice, --reference, --genome)
Summary log of bookend label no longer counts --discard_untrimmed reads in Total Output
bookend elr: refactored softclipping decision tree to better identify untrimmed 5' and 3' oligos
bookend label: now retrieves UMIs from adapters in either forward or reverse orientation
bookend label: extended the maximum phred score from 40 to 60
bookend label: the UMI sequence can be composed of IUPAC ambiguity characters other than N
bookend label: oligomer extensions (e.g. TTTT+) cannot exceed --max_end
bookend label: mismatches are no longer tolerated in the last 5nt of an oligomer
bookend label: best trim is now determined by closest sequence match, not by maximum trim length
bookend classify: now treats single-exon transcripts less than half the length of their matching transcript as a 'fragment'
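
The malformed-entry check described in the bugfixes above can be illustrated with a minimal sketch. It is not Bookend's code: the function name `is_well_formed`, the representation of exons as sorted (start, end) tuples with an exclusive end, and the decision to reject an entry whose intron exceeds `max_intron` are all assumptions made for the example.

```python
def is_well_formed(exons, max_intron=None):
    """Check a transcript given as sorted (start, end) exon tuples.

    Coordinates are assumed to be half-open, so length is end - start.
    Returns False if any exon has zero or negative length, or if any
    intron is longer than max_intron (when a limit is given).
    """
    previous_end = None
    for start, end in exons:
        if end - start <= 0:
            return False  # zero- or negative-length exon
        if previous_end is not None and max_intron is not None:
            if start - previous_end > max_intron:
                return False  # gap between exons exceeds the allowed intron
        previous_end = end
    return True


# Hypothetical usage: keep only the well-formed entries.
entries = [
    [(100, 200), (300, 400)],
    [(100, 200), (350, 300)],  # second exon has negative length
]
kept = [e for e in entries if is_well_formed(e, max_intron=100000)]
```

Here only the first entry is kept, since the second contains an exon whose end precedes its start.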
