Examples

BenoitMorel edited this page Apr 7, 2021 · 5 revisions

The GeneRax repository contains several examples. Each example runs a different part of the pipeline and corresponds to a specific use case.

To run the examples, get GeneRax from GitHub, compile it, and launch the example scripts from the root directory of the repository.
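
Because the scripts use paths relative to the repository root, a quick check before launching them can save a confusing error. The sketch below tests whether a directory contains two of the example scripts named on this page. The function name is_generax_root is hypothetical and not part of GeneRax.

```shell
# Return 0 if DIR (default: current directory) looks like the GeneRax
# repository root, i.e. it contains the example scripts named on this page.
# is_generax_root is a hypothetical helper, not something GeneRax ships.
is_generax_root() {
  dir=${1:-.}
  for script in \
    examples/gene_tree_correction/run_plants.sh \
    examples/species_tree_inference/run_speciesrax.sh
  do
    if [ ! -f "$dir/$script" ]; then
      echo "missing: $script" >&2
      return 1
    fi
  done
  return 0
}
```

For instance, `is_generax_root && ./examples/gene_tree_correction/run_plants.sh` only starts the first example when the current directory passes the check.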

Gene tree correction and reconciliation

Run with: ./examples/gene_tree_correction/run_plants.sh

The script will call GeneRax as follows (paths are shortened): mpiexec -np 2 generax --families families_plants.txt --species-tree speciesTree.newick --rec-model UndatedDL --per-family-rates --prefix output --max-spr-radius 3

The family file must contain at least the alignments and substitution models for each family because they are required to compute the joint likelihood score. In this case, we already have gene trees inferred with raxml-ng, so we can add them to the family file (if they are not given by the user, the starting gene trees will be inferred from the alignments). In this example the gene-species mapping files are also required because GeneRax cannot infer the mappings from the gene names.
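
As an illustration, a family file with these entries could be generated with a small helper like the one below. This is only a sketch under several assumptions: the key names (alignment, subst_model, starting_gene_tree, mapping) and the [FAMILIES] header reflect our reading of the family file format and should be checked against the family file documentation; the directory layout, the file names and the function name write_families are hypothetical.

```shell
# Print a family file for every sub-directory of DATA_DIR.
# Hypothetical layout (not imposed by GeneRax):
#   DATA_DIR/<family>/alignment.fasta    required
#   DATA_DIR/<family>/gene_tree.newick   optional starting gene tree
#   DATA_DIR/<family>/mapping.link       optional gene-species mapping
# Key names are assumptions to verify against the family file documentation.
write_families() {
  data_dir=$1
  subst_model=${2:-GTR+G}
  echo "[FAMILIES]"
  for fam_dir in "$data_dir"/*/; do
    [ -d "$fam_dir" ] || continue
    fam_dir=${fam_dir%/}
    fam=${fam_dir##*/}
    echo "- $fam"
    echo "alignment = $fam_dir/alignment.fasta"
    echo "subst_model = $subst_model"
    # Optional entries are only written when the file exists.
    if [ -f "$fam_dir/gene_tree.newick" ]; then
      echo "starting_gene_tree = $fam_dir/gene_tree.newick"
    fi
    if [ -f "$fam_dir/mapping.link" ]; then
      echo "mapping = $fam_dir/mapping.link"
    fi
  done
}
```

Calling `write_families my_data > families.txt` would then list one block per family, with the starting gene tree and mapping lines present only for the families that have those files.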

mpiexec -np 2 parallelizes with two cores. UndatedDL is used because we do not expect any horizontal gene transfers in this plant dataset. --per-family-rates indicates that each gene family will have its own DTL rates. --max-spr-radius 3 reduces the search space (the default radius is 5) to make the example faster (not recommended for real analyses!).
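
To experiment with the number of cores or the SPR radius, the same call can be wrapped in a small function. The sketch below reuses the flags and the shortened paths shown above; the function name run_plants and the DRY_RUN variable are hypothetical conveniences, not part of GeneRax.

```shell
# Wrap the call shown above so that the core count and the SPR radius
# become parameters. Paths are shortened exactly as on this page.
# run_plants and DRY_RUN are hypothetical, not GeneRax features.
run_plants() {
  np=${1:-2}        # number of cores passed to mpiexec
  radius=${2:-3}    # 3 keeps the example fast; the default radius is 5
  set -- mpiexec -np "$np" generax \
    --families families_plants.txt \
    --species-tree speciesTree.newick \
    --rec-model UndatedDL \
    --per-family-rates \
    --prefix output \
    --max-spr-radius "$radius"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"       # only print the command line
  else
    "$@"            # actually run it
  fi
}
```

With `DRY_RUN=1` set, `run_plants 4 5` only prints the command for four cores and a radius of 5, which is convenient for checking the arguments before a long run.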

To visualize the inferred gene tree reconciled with the input species tree (for family Phy003AED5_CUCME), copy and paste the content of the output file examples/gene_tree_correction/output/reconciliations/Phy003AED5_CUCME_reconciliated.xml into this online viewer. Note that this viewer is not developed by our lab.

Species tree inference with SpeciesRax

Run with: ./examples/species_tree_inference/run_speciesrax.sh

The script will call GeneRax as follows (paths are shortened): generax --families families_speciesrax.txt --strategy SKIP --si-strategy HYBRID --species-tree MiniNJ --rec-model UndatedDTL --per-family-rates --prune-species-tree --si-estimate-bl --si-quartet-support --prefix output

This time, we provide the input gene trees and the mapping files in the family file, but not the alignments and substitution models. Alternatively, one could provide the alignments and substitution models instead of the gene trees, in which case the gene trees are inferred from the alignments.

--strategy SKIP skips the gene tree optimization (if you enable it, the gene tree optimization runs AFTER the species tree inference, using the inferred species tree). --si-strategy HYBRID defines the search strategy for the species tree inference (HYBRID combines the local search and the transfer-guided search presented in our preprint). --species-tree MiniNJ asks GeneRax to generate a starting species tree with our distance matrix method MiniNJ (alternatively, you can specify RANDOM to start from a random species tree, or give the path to your own starting species tree). --prune-species-tree better accounts for missing data (check this page). --si-estimate-bl and --si-quartet-support enable branch length estimation and quartet support computation, respectively (more information in our preprint). We recommend using the UndatedDTL model even in the absence of transfers.
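
The three ways of choosing the starting species tree (MiniNJ, RANDOM, or a path to your own tree) can be sketched as a wrapper around the call shown above. The function name run_speciesrax and the DRY_RUN variable are hypothetical; the flags and shortened paths are the ones from this page.

```shell
# Wrap the SpeciesRax call shown above and make the starting species tree
# a parameter: MiniNJ (default), RANDOM, or the path to a tree file.
# run_speciesrax and DRY_RUN are hypothetical, not GeneRax features.
run_speciesrax() {
  start=${1:-MiniNJ}
  case $start in
    MiniNJ|RANDOM) ;;                       # built-in starting trees
    *)                                      # anything else must be a file
      if [ ! -f "$start" ]; then
        echo "no such starting tree: $start" >&2
        return 1
      fi
      ;;
  esac
  set -- generax \
    --families families_speciesrax.txt \
    --strategy SKIP \
    --si-strategy HYBRID \
    --species-tree "$start" \
    --rec-model UndatedDTL \
    --per-family-rates \
    --prune-species-tree \
    --si-estimate-bl \
    --si-quartet-support \
    --prefix output
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"       # only print the command line
  else
    "$@"            # actually run it
  fi
}
```

For example, `run_speciesrax RANDOM` starts the search from a random species tree, while `run_speciesrax my_tree.newick` refuses to run if the file does not exist.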

The output species tree can be found in examples/species_tree_inference/output/species_trees/inferred_species_tree.newick.
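
As a quick sanity check on the result, one can count the leaves of the output tree and compare the number with the expected number of species. The sketch below assumes the file holds a single Newick tree in which commas only separate subtrees (no commas inside quoted labels or comments); count_leaves is a hypothetical helper.

```shell
# Count the leaves of a Newick tree by counting commas.
# Assumes one tree per file and no commas inside labels or comments.
# For a tree with at least one internal node, leaves = commas + 1.
count_leaves() {
  echo $(( $(tr -cd ',' < "$1" | wc -c) + 1 ))
}
```

After running the example, `count_leaves examples/species_tree_inference/output/species_trees/inferred_species_tree.newick` prints the number of species in the inferred tree.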

Comparing two (or more) candidate species trees

UNDER CONSTRUCTION

With the alignments

Run with: ./examples/compare_two_species_trees/compare_joint_likelihood.sh

Without the alignments

Run with: ./examples/compare_two_species_trees/compare_speciesrax.sh
