Examples
The GeneRax repository contains several examples. Each one runs a different part of the pipeline and corresponds to a specific use case. This page describes them.
To run the examples, get GeneRax from GitHub, compile it, and run the example scripts from the root directory of the repository.
Run with:
./examples/gene_tree_correction/run_plants.sh
The script calls GeneRax as follows (paths are shortened):
mpiexec -np 2 generax --families families_plants.txt --species-tree speciesTree.newick --rec-model UndatedDL --per-family-rates --prefix output --max-spr-radius 3
The family file must contain at least the alignment and the substitution model for each family, because they are required to compute the joint likelihood score. In this case, we already have gene trees inferred with raxml-ng, so we can add them to the family file. If the user does not provide them, the starting gene trees are inferred from the alignments. In this example, the gene-species mapping files are also required because GeneRax cannot infer the mappings from the gene names.
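As a rough sketch of what such a family file might look like, the snippet below writes a two-family file with a shell heredoc. It assumes the `[FAMILIES]` header and `key = value` layout used by the family files shipped with the repository; the file name, family names, paths, and the `GTR+G` model are hypothetical placeholders, not the contents of families_plants.txt.

```shell
set -eu

# Hypothetical output file name, for illustration only.
families_file="families_sketch.txt"

# Assumed layout: a [FAMILIES] header, then one "- <name>" line per family
# followed by "key = value" lines. All paths below are placeholders.
cat > "$families_file" <<'EOF'
[FAMILIES]
- family_1
starting_gene_tree = path/to/family_1.newick
alignment = path/to/family_1.fasta
mapping = path/to/family_1.link
subst_model = GTR+G
- family_2
alignment = path/to/family_2.fasta
mapping = path/to/family_2.link
subst_model = GTR+G
EOF

# Sanity check: count the family entries that were written.
n_families=$(grep -c '^- ' "$families_file")
echo "wrote $n_families families to $families_file"
```

Here family_2 deliberately omits the starting gene tree, which corresponds to the case where the starting tree is not given by the user.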
mpiexec -np 2 parallelizes the run over two cores. UndatedDL is used because we do not expect any horizontal gene transfers in this plant dataset. --per-family-rates indicates that each gene family will have its own DTL rates. --max-spr-radius 3 reduces the search space (the default radius is 5) to make the example faster (not recommended for real analyses!).
To visualize the inferred gene tree reconciled with the input species tree (for family Phy003AED5_CUCME), copy and paste the content of the output file examples/gene_tree_correction/output/reconciliations/Phy003AED5_CUCME_reconciliated.xml into this online viewer. Note that this viewer is not developed by our lab.
Run with:
./examples/species_tree_inference/run_speciesrax.sh
The script calls GeneRax as follows (paths are shortened):
generax --families families_speciesrax.txt --strategy SKIP --si-strategy HYBRID --species-tree MiniNJ --rec-model UndatedDTL --per-family-rates --prune-species-tree --si-estimate-bl --si-quartet-support --prefix output
This time, we provide the input gene trees and the mapping files in the family file, but not the alignments and substitution models. Alternatively, one could provide the alignments and substitution models instead of the gene trees, and the gene trees would then be inferred from the alignments.
--strategy SKIP skips the gene tree optimization (note that if you enable it, the gene tree optimization runs AFTER the species tree inference, using the inferred species tree). --si-strategy HYBRID defines the search strategy for the species tree inference (HYBRID is the combination of local search and transfer-guided search presented in our preprint). --species-tree MiniNJ asks GeneRax to generate a starting species tree with our distance matrix method MiniNJ (alternatively, you can specify RANDOM to start from a random species tree, or the path to your own starting species tree). --prune-species-tree better accounts for missing data (check this page). --si-estimate-bl --si-quartet-support enables branch length estimation and quartet support computation (more information in our preprint). We recommend using the UndatedDTL model even in the absence of transfers.
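To make the three choices for the --species-tree option concrete, the sketch below prints (without executing) one command line per starting tree. The flags are copied from the command shown above, with paths still shortened; the function name `speciesrax_cmd`, the file `my_start_tree.newick`, and the per-run `output_...` prefixes are hypothetical and only serve to keep the runs apart.

```shell
# Print the command for a given starting species tree instead of running it.
# $1 is MiniNJ, RANDOM, or the path to a user-provided Newick species tree.
speciesrax_cmd() {
  start_tree="$1"
  # The per-run prefix is an assumption made here so that runs do not
  # overwrite each other's output directory.
  echo generax \
    --families families_speciesrax.txt \
    --strategy SKIP \
    --si-strategy HYBRID \
    --species-tree "$start_tree" \
    --rec-model UndatedDTL \
    --per-family-rates \
    --prune-species-tree \
    --si-estimate-bl --si-quartet-support \
    --prefix "output_$(basename "$start_tree")"
}

# my_start_tree.newick is a placeholder for your own starting species tree.
for start in MiniNJ RANDOM my_start_tree.newick; do
  speciesrax_cmd "$start"
done
```

Removing the `echo` would run the commands instead of printing them, provided the family file path is replaced with the real, unshortened one.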
The output species tree can be found in examples/species_tree_inference/output/species_trees/inferred_species_tree.newick.
Run with:
./examples/compare_two_species_trees/compare_joint_likelihood.sh
This example is very similar to the gene tree correction example, but runs GeneRax twice, with two different species trees. Each run outputs the joint likelihood score of its species tree. The first species tree has the better joint likelihood (-28789.3) and is thus more plausible than the second one (-28801.3).
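The comparison itself is just a matter of checking which log-likelihood is higher (closer to zero). A minimal sketch, using the two scores reported above: the helper name `compare_ll` is hypothetical, and the values are typed in by hand rather than parsed from the GeneRax output.

```shell
# Joint log-likelihoods reported above for the two species trees.
ll_tree1=-28789.3
ll_tree2=-28801.3

# Shell arithmetic is integer-only, so the floating-point math is done in awk.
compare_ll() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    diff = a - b
    if (diff > 0)      printf "tree 1 is better by %.1f log-likelihood units\n", diff
    else if (diff < 0) printf "tree 2 is better by %.1f log-likelihood units\n", -diff
    else               print  "both trees have the same log-likelihood"
  }'
}

compare_ll "$ll_tree1" "$ll_tree2"
# prints: tree 1 is better by 12.0 log-likelihood units
```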
Run with:
./examples/compare_two_species_trees/compare_speciesrax.sh
This example is very similar to the species tree inference example. However, we now use --si-strategy EVAL to disable the search and to compute the reconciliation likelihood of each of the candidate species trees. The first species tree has the better reconciliation likelihood (-138.366) and is thus more plausible than the second one (-141.462).
To compare two (or more) species trees, we recommend using the alignments when they are available. The first example compares the joint likelihoods of the species trees, which is more informative than the reconciliation likelihood used in the second example. In particular, the second method is more vulnerable to gene tree reconstruction error. However, the second method is much faster because it does not optimize the gene trees.