Releases: vastgroup/vast-tools
v2.5.1
New VASTDB library release to improve PSI quantifications.
NEW
- A new release of VASTDB libraries for all species (vastdb.sp.23.06.20).
  - These new libraries only involve changes in two files (Sp_COMBI-M-50-gDNA.eff and Sp_COMBI-M-50-gDNA-SS.eff), which are used by `combine` to quantify PSIs for alternative exons through the "a posteriori" and "annotated" modules, as well as for ALTA and ALTD events.
  - In these new files, a subset of pre-built exon-exon junctions was identified as likely to result in spurious mappings and thus excluded from the PSI quantifications by setting their mappability to 0. Spurious exon-exon junctions were identified using large sets of RNA-seq data for each species based on their mapped read distribution along the junction (in such spurious junctions, mapped reads accumulate only in the first or last positions of the exon-exon junction).
  - These new libraries are thus expected to reduce "skipping noise" in PSI estimates, particularly for Hs2 and Mm2 (with nearly no effect for Hsa and Mmu).
  - There is no need to rerun `align` to generate updated PSI estimates. Re-running only `combine` on an existing `align` output is enough.
  - Different `vast-tools` versions and VASTDB libraries are backward and forward compatible.
  - To report version information in your Methods section, simply state the `vast-tools` and VASTDB library versions you used for `combine` (e.g. "we have used vast-tools v2.5.1 with the VASTDB library vastdb.hs2.23.06.20"). This information is stored in the VTS_LOG_commands.txt file.
- `combine` has a new option (`--add_version`) to add the `vast-tools` version to the output INCLUSION file name (e.g. INCLUSION_LEVELS_FULL-hg38-4-v251.tab). This option is off by default.
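
The screen for spurious exon-exon junctions described above (reads piling up only in the first or last positions of the junction) can be illustrated with a minimal sketch. This is not the procedure used to build the libraries: the function names, the per-position count representation and the thresholds (`n_edge`, `min_reads`, `min_edge_fraction`) are assumptions chosen for illustration only.

```python
from typing import Sequence


def edge_read_fraction(position_counts: Sequence[int], n_edge: int = 1) -> float:
    """Fraction of junction reads mapped to the first or last n_edge positions."""
    total = sum(position_counts)
    if total == 0:
        return 0.0
    if 2 * n_edge >= len(position_counts):
        # Every position counts as an edge position.
        return 1.0
    edge = sum(position_counts[:n_edge]) + sum(position_counts[-n_edge:])
    return edge / total


def looks_spurious(
    position_counts: Sequence[int],
    n_edge: int = 1,
    min_reads: int = 10,              # hypothetical minimum evidence
    min_edge_fraction: float = 0.95,  # hypothetical cutoff
) -> bool:
    """Flag a junction whose reads accumulate almost only at its edges."""
    total = sum(position_counts)
    if total < min_reads:
        return False
    return edge_read_fraction(position_counts, n_edge) >= min_edge_fraction


if __name__ == "__main__":
    print(looks_spurious([20, 0, 0, 0, 0, 0]))  # reads only at the first position
    print(looks_spurious([5, 4, 6, 5, 4, 6]))   # reads spread along the junction
```

In the released libraries, junctions identified as spurious are not removed; their mappability is set to 0 so that `combine` ignores them.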
Updates and fixes
- These new libraries also include updated exon VastIDs for Hs2 and Mm2 (~500 and ~300, respectively) to more accurately match the lifted ones from Hsa and Mmu.
- The default value for the `--extra_eej` option in `combine` has been changed from 10 to 5.
v2.5.0
Correction for stack reads
NEW
`combine` now performs a new correction for reads that disproportionately map to individual positions of exon-exon junctions. This change may affect PSIs of exons quantified through the COMBI ("a posteriori") and ANN ("annotation") sub-modules, as well as ALT3 and ALT5 events, and it is expected to reduce false positive calls.
v2.4.2
Improvements in speed and performance.
Improvements
- A batch of improvements has been implemented in `combine` to increase computing speed. First, `combine` can take up to 6 cores to parallelize the generation of inclusion tables. Second, the quantification of pseudo-PSIs for ALTA and ALTD events (added in version v2.2.1, and used for the `--min_ALT_use` option) has been simplified to reduce computing time. It now uses a more local quantification, as in the ANN module. PSIs of individual alternative splice sites do not change. Finally, a few complex events were removed.
- `align` includes a sanity check when paired reads are provided, and it will die if the numbers of R1 and R2 reads differ.
- It is now possible to provide group names in `compare` (otherwise, they will still be automatically generated from the first replicate name).
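
The paired-read sanity check in `align` can be reproduced before launching a run. The sketch below is only an illustration of the idea, not the code used by `align`: the function names are hypothetical, and it assumes standard four-line FASTQ records with no blank lines inside a record.

```python
import gzip
from typing import Iterable


def open_fastq(path: str):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count reads, assuming exactly four non-empty lines per record."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_paired_counts(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> int:
    """Return the shared read count, or raise if R1 and R2 differ."""
    n_r1 = count_fastq_records(r1_lines)
    n_r2 = count_fastq_records(r2_lines)
    if n_r1 != n_r2:
        raise ValueError(f"R1 has {n_r1} reads but R2 has {n_r2}")
    return n_r1
```

With real files, the two iterables would come from `open_fastq("sample_1.fq.gz")` and `open_fastq("sample_2.fq.gz")` (hypothetical file names).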
Updates and fixes
- Silent change in `align` to use a more efficient hash system to store counts by position. This has seemingly given issues to a user with hg38.
v2.4.1
VASTDB version control and minor updates
NEW
- `align` and `combine` now print the VASTDB version that is used in the LOG file. The VASTDB library for each species now includes a file called `VASTDB_VERSION`, which contains the version of the library (e.g. "vastdb.hs2.20.12.19"). If you had already downloaded a version of VASTDB, you can simply add a file called `VASTDB_VERSION` to the VASTDB/Sp/ folder and write the version name in it.
- The help messages of `align` and `combine` now print the species available for vast-tools in the local VASTDB installation.
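
For libraries downloaded before this release, the `VASTDB_VERSION` file can be added by hand or with a few lines of code. The helper names below are hypothetical, and the exact layout that `align` and `combine` expect inside the file (for example, whether a trailing newline matters) is an assumption; the sketch simply writes the version string on a single line.

```python
import tempfile
from pathlib import Path
from typing import Optional

VERSION_FILE = "VASTDB_VERSION"


def write_vastdb_version(species_dir, version: str) -> Path:
    """Write the library version into <species_dir>/VASTDB_VERSION."""
    path = Path(species_dir) / VERSION_FILE
    path.write_text(version.strip() + "\n")
    return path


def read_vastdb_version(species_dir) -> Optional[str]:
    """Return the stored version, or None if the file is missing or empty."""
    path = Path(species_dir) / VERSION_FILE
    if not path.is_file():
        return None
    return path.read_text().strip() or None


if __name__ == "__main__":
    # Demonstration in a throwaway directory standing in for VASTDB/Hs2/.
    with tempfile.TemporaryDirectory() as vastdb:
        hs2 = Path(vastdb) / "Hs2"
        hs2.mkdir()
        write_vastdb_version(hs2, "vastdb.hs2.20.12.19")
        print(read_vastdb_version(hs2))
```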
Updates and fixes
- For ALTA events, the alternative splice acceptors separated by "+" in the full coordinate (FullCO, 5th column in the INCLUSION table) are now sorted differently to make them consistent with the sorting of ALTD events. In both cases, the coordinates are now sorted from internal to external.
v2.4.0
Updates on species and assemblies.
NEW
- All modules that require the species key to be provided (i.e. `align`, `combine`, `merge` with the `--expr` option, and `compare` with the `--GO` option) now take the assembly (e.g. hg19, mm10, danRer10) as preferred input. For instance, instead of using `-sp Hsa`, it is possible and recommended to provide `-sp hg19`. However, the 3-letter species key can still be provided as in previous versions. `vast-tools` will still use the 3-letter species key internally, but the output tables will show the assembly instead of the species key. For instance, an inclusion table named `INCLUSION_LEVELS_FULL-Hsa5.tab` will now be named `INCLUSION_LEVELS_FULL-hg19-5.tab`.
- VASTDB libraries for new species and assemblies have been released (more information about them in README):
  - Homo sapiens (hg38, Hs2).
  - Mus musculus (mm10, Mm2).
  - Bos taurus (bosTau6, Bta).
  - Gallus gallus (galGal4, Gg4).
  - Xenopus tropicalis (xenTro3, Xt1).
  - Arabidopsis thaliana (araTha10, Ath).
- In particular, newer versions for human and mouse are now available, with many more events. These have been built already in hg38 (internal species key "Hs2") and mm10 (internal species key "Mm2"), respectively. As with updates for other species, the EventIDs are lifted and maintained across versions.
- The option `-a` in `combine` has been deprecated. This option lifted the hg19 and mm9 coordinates in the INCLUSION tables to hg38 and mm10. To avoid issues with the new assemblies, this option is now called `-lift_coord` and it applies only to hg19 and mm9 (Hsa and Mmu). If used, the name of the INCLUSION table will be, e.g. `INCLUSION_LEVELS_FULL-hg19-3-lifted_hg38.tab`.
- The installer script `install.R` has been updated to better accommodate the larger number of available species and assemblies (currently 17).
- PSI quantifications and the content of the files do NOT change in this version.
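
The new table naming can be summarized with a short sketch that rebuilds the example file names given in these notes. It is a reconstruction from those examples, not code from `vast-tools`: the function name is hypothetical, the key-to-assembly table only lists pairs mentioned in these notes, the number in the file name is assumed to be the sample count, and the `-vNNN` suffix follows the `--add_version` example from v2.5.1. How the lifted and version suffixes combine is not stated, so the sketch refuses to build both at once.

```python
from typing import Optional

# Species key -> assembly, restricted to pairs named in these release notes.
ASSEMBLY_BY_KEY = {
    "Hsa": "hg19", "Mmu": "mm9", "Hs2": "hg38", "Mm2": "mm10",
    "Bta": "bosTau6", "Gg4": "galGal4", "Xt1": "xenTro3", "Ath": "araTha10",
}


def inclusion_table_name(
    species_key: str,
    n_samples: int,
    version: Optional[str] = None,    # e.g. "2.5.1", mimics --add_version
    lifted_to: Optional[str] = None,  # e.g. "hg38", mimics -lift_coord
) -> str:
    """Build an INCLUSION table file name in the assembly-based style."""
    if version and lifted_to:
        raise ValueError("suffix order for version plus lift is not documented")
    name = f"INCLUSION_LEVELS_FULL-{ASSEMBLY_BY_KEY[species_key]}-{n_samples}"
    if lifted_to:
        name += f"-lifted_{lifted_to}"
    if version:
        name += "-v" + version.lstrip("v").replace(".", "")
    return name + ".tab"


if __name__ == "__main__":
    print(inclusion_table_name("Hsa", 5))
    print(inclusion_table_name("Hsa", 3, lifted_to="hg38"))
    print(inclusion_table_name("Hs2", 4, version="2.5.1"))
```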
Updates and fixes
- Correction in `compare_expr` of a bug due to which a few genes were missed in the GO backgrounds.
- Change in `tidy`: when using groups, the `--min_SD` is calculated for the two groups together, not each group individually.
v2.3.0
Changes in `combine` (related to ANN exons) and `compare` (related to ALTA and ALTD events). This release includes updated VASTDB files for all species.
NEW
`compare` implements a different logic to define differentially used alternative donors (ALTD) and acceptors (ALTA). In previous versions, to make it more comparable to AltEx and IR events, only the donor/acceptor sites that resulted in inclusion of alternative sequence (i.e. length >= 0 or excluding the most internal [1/X] sites) were considered, and they could go up or down in the comparison. While this is useful for events involving only two alternative sites, it is not optimal for ALTA/ALTD involving more sites. Therefore, `compare` now evaluates by default each alternative site independently and only reports those with increased usage. The previous mode can still be invoked by using the option `--legacy_ALT`.
Updates and fixes
`combine` includes some improvements and fixes with respect to ANN exons:
- A bug was corrected that reported incorrect C1 (reference upstream) and C2 (reference downstream) exon coordinates for ANN exons in the FullCo column, especially for genes on the positive strand. Given that `vast-tools` uses multiple neighboring exon-exon junctions for quantification of PSIs, this has little to no effect on the quantification of the vast majority of ANN exons (most of which have PSI ~ 100). Other events are not affected.
- The way to define the closest local upstream donor and downstream acceptor for PSI quantification is slightly modified to make exons with and without associated ALTA/ALTD events more comparable. In addition, the default number of additional donors/acceptors to consider for PSI quantification (`--extra_eej`) was set to 10, instead of 5.
- Some exons were deprecated in the current VASTDB files due to various new filters applied (stricter coordinate overlaps, overlap with ALTA and ALTD events and alternative polyadenylation/start sites).
COMPATIBILITY NOTES: While it is recommended that the new VASTDB libraries are installed (vastdb.*.20.12.19.tar.gz), the old libraries give nearly identical results when used with `vast-tools` v2.3.0. Similarly, old versions of `combine` also give very similar results if run with the new VASTDB libraries. That is: different versions of `combine` and VASTDB libraries are backward compatible. Older libraries are also accessible on the GitHub page (README). Only the following files have changed in each species (`Sp`) VASTDB folder: Sp.Event-Gene.IDs.txt, New_ID-Sp.txt.gz, Sp.ANNOT.Template.txt and Sp.FULL.Template.txt.gz, as well as lftOvr_dict_from_hg19_to_hg38.pdat (Hsa) and lftOvr_dict_from_mm9_to_mm10.pdat (Mmu).
- The species key for chicken galGal3 in the new VASTDB library is `Gg3` (formerly `Gga`).
v2.2.2
This is a minor update incorporating a new function in `vast-tools` secondary modules.
NEW
- `compare` and `tidy` have a new option, `--noB3`, by which exons are filtered out if they contain 0 reads supporting the upstream or downstream inclusion and at least 15 supporting the other. These cases are usually due to alternative first or last exons that in some cases behave as true cassette exons.
- To allow the `--noB3` option, the fourth score for alternative exons (excluding those quantified by the microexon pipeline), quantifying the imbalance between inclusion read sets, now includes the B3 score (exons now considered B3 were previously considered B2). In addition, the third and fourth scores of other event types have been modified to provide more meaningful information (specifically, the number of exon-exon junction reads). These format changes are silent for all modules, as this information is not used by any module. See README for further information about the scores.
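
The B3 criterion used by `--noB3` can be written as a one-line rule. The sketch below only restates the definition given above (0 reads on one inclusion junction and at least 15 on the other); the function name and the way read counts are passed in are assumptions, not the parsing done by `compare` or `tidy`.

```python
def is_b3(upstream_inc_reads: int, downstream_inc_reads: int, min_other: int = 15) -> bool:
    """True if one inclusion junction has 0 reads and the other has >= min_other."""
    fewer, more = sorted((upstream_inc_reads, downstream_inc_reads))
    return fewer == 0 and more >= min_other


if __name__ == "__main__":
    # Hypothetical (upstream, downstream) inclusion read counts per exon.
    exons = {"exonA": (0, 40), "exonB": (12, 18), "exonC": (0, 3)}
    kept = {name for name, (up, down) in exons.items() if not is_b3(up, down)}
    print(sorted(kept))
```

An exon with 0 and 3 inclusion reads is not B3 under this rule, since the supported side does not reach 15 reads.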
Updates and fixes
- IR files run with `IR_version 1` with release v1 or previous were not compatible with the current `combine` module.
v2.2.1
NEW
- The version of `vast-tools` used by each module is now printed when each process starts. In addition, a log file (`VTS_LOG_commands.txt`) is created in the output folder and registers version, date and main options used for each `vast-tools` run.
- `compare` has a new option to remove Alt3 (ALTA) and Alt5 (ALTD) events with differential splice site usage if their overall impact in the transcript pool is predicted to be minor. Therefore, `compare` now requires that the alternative splice sites belong to an exon with a minimum inclusion level (~PSI) across ALL compared samples. This minimum PSI is set to 25 by default, and can be modified using the `--min_ALT_use` option. The equivalent value to previous versions is `--min_ALT_use 0`.
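
The `--min_ALT_use` requirement amounts to a minimum over all compared samples. The following sketch illustrates that rule under simple assumptions: the function name is hypothetical, the host exon inclusion levels are passed as plain numbers on a 0-100 scale, and missing values are not handled.

```python
from typing import Sequence


def passes_min_alt_use(host_exon_psis: Sequence[float], min_alt_use: float = 25.0) -> bool:
    """True if the host exon reaches min_alt_use in ALL compared samples."""
    return all(psi >= min_alt_use for psi in host_exon_psis)


if __name__ == "__main__":
    print(passes_min_alt_use([30.0, 80.0, 25.0]))           # every sample >= 25
    print(passes_min_alt_use([30.0, 10.0]))                 # one sample below 25
    print(passes_min_alt_use([30.0, 10.0], min_alt_use=0))  # behaviour of earlier versions
```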
v2.2.0
NEW
- Five new species have been added:
  - European amphioxus, *Branchiostoma lanceolatum* (assembly Bl71nemr; species key: `Bla`): http://vastdb.crg.eu/libs/vastdb.bla.01.12.18.tar.gz
  - Centipede, *Strigamia maritima* (assembly Smar1; species key: `Sma`): http://vastdb.crg.eu/libs/vastdb.sma.01.12.18.tar.gz
  - Fruit fly, *Drosophila melanogaster* (assembly BDGP6; species key: `Dme`): http://vastdb.crg.eu/libs/vastdb.dme.01.12.18.tar.gz
  - Worm, *Caenorhabditis elegans* (assembly WBcel235; species key: `Cel`): http://vastdb.crg.eu/libs/vastdb.cel.01.12.18.tar.gz
  - Sea anemone, *Nematostella vectensis* (assembly GCA_000209225; species key: `Nve`): http://vastdb.crg.eu/libs/vastdb.nve.01.12.18.tar.gz

  Further information about the assemblies and species can be found in the source publication:
  Torres-Méndez, A., Bonnal, S., Marquez, Y., Roth, J., Iglesias, M., Permanyer, J., Almudí, I., O’Hanlon, D., Guitart, T., Soller, M., Gingras, A.-C., Gebauer, F., Rentzsch, F., Blencowe, B.J.B., Valcárcel, J., Irimia, M. (2019). A novel protein domain in an ancestral splicing factor drove the evolution of neural microexons. Nature Ecol Evol, 3:691-701.
- Added official Docker container image at: https://cloud.docker.com/u/vastgroup/repository/docker/vastgroup/vast-tools.
- Included continuous integration tests with Travis CI: https://travis-ci.org/vastgroup/vast-tools.
- `tidy` allows filtering events by coverage using a groups config file. Using this option, the minimum number or fraction of samples with coverage (`--min_N` or `--min_Fr`) applies to EACH group of samples.
- `combine` can create tables of normalized cRPKMs using `normalizeBetweenArrays` from the `limma` R package.
- `compare_expr` provides a list of ranked genes sorted by fold change to be used directly in GSEA (with rnk.txt extension).
- `compare` provides the list of all AS events with coverage, when the option `--GO` is provided.
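
The per-group coverage filter in `tidy` can be pictured as applying the same threshold to every group rather than to the pooled samples. The sketch below is an illustration only: the function name and the data layout (one list of booleans per group) are hypothetical, and allowing both thresholds at once is an assumption.

```python
from typing import Dict, Optional, Sequence


def passes_group_coverage(
    coverage_by_group: Dict[str, Sequence[bool]],
    min_n: Optional[int] = None,     # mimics --min_N
    min_fr: Optional[float] = None,  # mimics --min_Fr
) -> bool:
    """True if EACH group has enough samples with coverage for the event."""
    for flags in coverage_by_group.values():
        n_covered = sum(bool(f) for f in flags)
        if min_n is not None and n_covered < min_n:
            return False
        if min_fr is not None and (not flags or n_covered / len(flags) < min_fr):
            return False
    return True


if __name__ == "__main__":
    event = {"control": [True, True, False], "knockdown": [True, True, True]}
    print(passes_group_coverage(event, min_n=2))
    print(passes_group_coverage(event, min_n=3))
```

In the example, pooling the six samples would give five with coverage, yet the event fails `min_n=3` because the control group alone has only two.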
Updates and fixes
- The VASTDB libraries for zebrafish (`Dre`) and sea urchin (`Spu`) have been updated (http://vastdb.crg.eu/libs/vastdb.dre.01.12.18.tar.gz and http://vastdb.crg.eu/libs/vastdb.spu.01.12.18.tar.gz). A few new microexons are now included in the MIC module.
- Quality control to make sure the expression files have the same number of genes when combining.
- Improvements in README and help messages.
- Various other minor fixes and improvements.
v2.1.3
NEW
- `compare` has a new option, `--use_int_reads`, to increase the stringency when calling differentially regulated intron retention events. It requires that the average number of corrected intron body read counts of the group with the higher PIR is at least 0.4 times the average of the exon-intron junction reads (used to calculate PIR). This fraction can be modified using `--fr_int_reads`. Intron body reads come from mapping reads to 200bp in the middle of the intron, or the whole intron when shorter. (This mapping was already implemented from the first release of `vast-tools`, but only used when doing the binomial test for the balance score [5th score in IR]. Therefore, there is no need to re-run `align` to have access to this feature, only `combine`; see next).
- `combine` has a few changes to include the corrected number of reads in the quality score for IR (3rd score). Also, the 4th score now has the corrected number of reads for EI, IE and EE, not the raw counts.
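
The `--use_int_reads` requirement can be expressed as a comparison of two group averages. The sketch below restates the rule under explicit assumptions: the names are hypothetical, each sample is reduced to one PIR value, one corrected intron body read count and one exon-intron junction read value (how `compare` summarizes the EI and IE junctions per sample is not stated here), and the group with the higher PIR is chosen by mean PIR.

```python
from statistics import mean
from typing import NamedTuple, Sequence


class IntronSample(NamedTuple):
    pir: float             # percent intron retention
    body_reads: float      # corrected intron body read count
    junction_reads: float  # exon-intron junction reads used for PIR


def passes_int_reads(
    group_a: Sequence[IntronSample],
    group_b: Sequence[IntronSample],
    fr_int_reads: float = 0.4,  # mimics --fr_int_reads
) -> bool:
    """Check body-read support in the group with the higher average PIR."""
    higher = max((group_a, group_b), key=lambda g: mean(s.pir for s in g))
    avg_body = mean(s.body_reads for s in higher)
    avg_junction = mean(s.junction_reads for s in higher)
    return avg_body >= fr_int_reads * avg_junction
```

For a group averaging 20 junction reads, the default fraction of 0.4 requires an average of at least 8 corrected intron body reads.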