
Command line

Johan Nylander edited this page Feb 28, 2024 · 24 revisions

Please note that we provide two ParGenes scripts, depending on your hardware (e.g. personal computer or cluster). Read this section.

Minimum command line

Specify at least the directory containing the MSAs (-a), the output directory (-o), the number of cores (-c) and the type of the sequences (-d aa or nt).

In addition, you need to specify a model: either with model testing (-m), which automatically determines the best-fit model, or with one of the raxml global parameters options (-r or -R).

example: python pargenes/pargenes.py -a msa_dir -o output_dir -c 32 -d nt -R "--model GTR"
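If you launch ParGenes from a Python script rather than typing the command, the minimum command line can be assembled as an argument list. The following is a minimal sketch: the helper name build_pargenes_command is hypothetical and not part of ParGenes, and it only uses flags documented on this page.

```python
import shlex


def build_pargenes_command(msa_dir, output_dir, cores, data_type,
                           raxml_args=None, use_modeltest=False,
                           script="pargenes/pargenes.py"):
    """Assemble the minimum ParGenes command line as an argv list.

    Hypothetical helper (not part of ParGenes). It only uses flags
    documented on this page: -a, -o, -c, -d, and -m and/or -R.
    """
    if data_type not in ("nt", "aa"):
        raise ValueError("data_type must be 'nt' or 'aa'")
    if not use_modeltest and raxml_args is None:
        raise ValueError("specify a model: use_modeltest=True or raxml_args")
    cmd = ["python", script,
           "-a", msa_dir, "-o", output_dir,
           "-c", str(cores), "-d", data_type]
    if use_modeltest:
        cmd.append("-m")
    if raxml_args is not None:
        # The whole raxml argument string is one argv entry, like -R "..."
        cmd += ["-R", raxml_args]
    return cmd


cmd = build_pargenes_command("msa_dir", "output_dir", 32, "nt",
                             raxml_args="--model GTR")
# shlex.join quotes the -R value so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to start the analysis.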

ParGenes arguments

Unless stated otherwise, we strongly recommend using absolute paths when giving a file or directory location.

Common arguments

Command Meaning
-a, --alignments-dir Directory containing the input MSA files (fasta or phylip). ParGenes will try to parse all the files in this directory.
-o, --output-dir Output directory. If the directory does not exist, ParGenes creates it. Otherwise, ParGenes will abort unless you are running from a checkpoint (see also this page)
-c, --cores Number of cores allocated for this job. Please read this page
-d, --data-type Alignments type: nucleotides or amino acids. Possible values: {nt,aa}.
--dry-run Special mode to parse the MSA, and compute some statistics without running the analysis. In particular, outputs an estimation of the maximum number of cores that a user could assign to this job without losing parallel efficiency. See also this section
--continue Restart the analysis from the last checkpoint (see this page)

Arguments to customize RAxML-NG runs

Command Meaning
-r, --raxml-global-parameters Path to a file containing one single line with the arguments to pass to all the raxml runs. See also this section
-R, --raxml-global-parameters-string Alternative to --raxml-global-parameters: a quoted string with the arguments to pass to all the raxml runs. For instance: -R "--model GTR --brlen unscaled".
--per-msa-raxml-parameters Path to a file containing per-msa raxml parameters. See also this section
-s, --random-starting-trees Number of random starting trees
-p, --parsimony-starting-trees Number of parsimony starting trees
-b, --bs-trees Number of bootstrap trees to compute
--autoMRE autoMRE bootstrap convergence test. You need to specify the maximum number of bootstraps with --bs-trees. Note that in this mode, ParGenes does NOT parallelize over the bootstraps, which might greatly affect parallel efficiency.
--raxml-binary Override raxml-ng binary location

Arguments to customize ModelTest-NG runs

Command Meaning
-m, --use-modeltest Autodetect the model with ModelTest-NG before running raxml
--modeltest-global-parameters A file containing the parameters to pass to ModelTest-NG. See also this section
--per-msa-modeltest-parameters A file containing per-msa modeltest parameters. See also this section
--modeltest-criteria {AICc,AIC,BIC} The criterion to use for best-fit model selection
--modeltest-perjob-cores Number of cores to assign to each modeltest job (at least 4)
--modeltest-binary Override modeltest-ng binary location

Arguments to customize ASTRAL/ASTER run

Command Meaning
--use-astral Run ASTRAL III at the end, to generate a species tree from all the gene trees inferred with ParGenes.
--astral-global-parameters Path to a file containing arguments to pass to Astral. See also this section.
--astral-jar Override ASTRAL jar location.
--use-aster Run ASTER (instead of ASTRAL) at the end, to generate a species tree from all the gene trees inferred with ParGenes.
--aster-bin <name or path> Name or path to programs astral, astral-hybrid, or astral-pro.
--aster-global-parameters <text file> Pass extra parameters to any of the chosen ASTER programs in a text file.

Advanced arguments

Command Meaning
--msa-filter Path to a file with a list of filenames to process. The file should not contain paths, but filenames. ParGenes will only process MSAs that are both present in the list and in the initial input directory.
--core-assignment {high,medium,low} Policy to decide the per-job number of cores (low favors a low per-job number of cores)
--percentage-jobs-double-cores Fraction (between 0 and 1) of jobs that will receive twice as many cores
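
A file for --msa-filter can be generated with a few lines of Python. This is only a sketch: the page says the file holds "a list of filenames", so the one-name-per-line layout is an assumption, and the helper name write_msa_filter and the suffix-based selection rule are illustrative.

```python
import tempfile
from pathlib import Path


def write_msa_filter(msa_dir, filter_path, suffixes=(".fasta", ".phy")):
    """Write the names (not paths) of selected MSA files, one per line.

    Hypothetical helper; the one-name-per-line layout is an assumption.
    """
    names = sorted(p.name for p in Path(msa_dir).iterdir()
                   if p.is_file() and p.suffix in suffixes)
    Path(filter_path).write_text("".join(name + "\n" for name in names))
    return names


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    msa_dir = Path(tmp) / "msa_dir"
    msa_dir.mkdir()
    for name in ("gene1.fasta", "gene2.phy", "notes.txt"):
        (msa_dir / name).touch()
    filter_file = Path(tmp) / "filter.txt"
    selected = write_msa_filter(msa_dir, filter_file)
    filter_text = filter_file.read_text()
    print(filter_text, end="")
```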

Subprograms global options

This section applies to raxml, modeltest, astral and aster. We take the example of raxml.

To pass the same parameters to all raxml runs (for instance if you want to set the same model for all the MSAs), you can use --raxml-global-parameters <file>. The file should contain a single line with all the raxml arguments you want to add.

Example of the content of this file: --model GTR --brlen unscaled

All raxml runs started with ParGenes will be called with these arguments.

The same applies to modeltest options (--modeltest-global-parameters), astral options (--astral-global-parameters), and aster options (--aster-global-parameters).
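
When the global parameters file is produced by a script, it helps to enforce the single-line rule at write time. A minimal sketch, assuming a hypothetical helper write_global_parameters and an illustrative file name (neither is part of ParGenes):

```python
import tempfile
from pathlib import Path


def write_global_parameters(path, arguments):
    """Write subprogram arguments as the single line the file must contain.

    Hypothetical helper (not part of ParGenes).
    """
    line = " ".join(arguments)
    if "\n" in line:
        raise ValueError("the parameters must fit on one single line")
    # End the line with a newline, as a text editor would.
    Path(path).write_text(line + "\n")
    return line


with tempfile.TemporaryDirectory() as tmp:
    params_file = Path(tmp) / "raxml_global_parameters.txt"
    write_global_parameters(params_file,
                            ["--model", "GTR", "--brlen", "unscaled"])
    content = params_file.read_text()
    print(content, end="")
```

The path of the written file is then what you give to --raxml-global-parameters.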

Per-msa subprograms options

This applies to raxml and modeltest. We take the example of raxml.

To apply different options to each MSA, use the option --per-msa-raxml-parameters <file>. The file should contain one line per MSA for which you want to add options. Each line starts with the MSA file name (without its path!) followed by the arguments.

For instance:

msa1.fasta --model partition1.part 
msa2.fasta --model partition2.part 
msa3.fasta --model partition3.part 

The same applies to modeltest options (--per-msa-modeltest-parameters)
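
With many MSAs, the per-msa file is easier to generate than to write by hand. The sketch below follows the line format described above (file name without its path, then the arguments); the helper name write_per_msa_parameters, the input mapping and the output file name are hypothetical.

```python
import tempfile
from pathlib import Path


def write_per_msa_parameters(path, options_by_msa):
    """Write one line per MSA: file name (no directory), then its arguments.

    Hypothetical helper (not part of ParGenes).
    """
    lines = []
    for msa, arguments in options_by_msa.items():
        # Keep only the file name, since names must be given without paths.
        lines.append(f"{Path(msa).name} {arguments}")
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


options = {
    "msa_dir/msa1.fasta": "--model partition1.part",
    "msa_dir/msa2.fasta": "--model partition2.part",
}
with tempfile.TemporaryDirectory() as tmp:
    per_msa_file = Path(tmp) / "per_msa_raxml_parameters.txt"
    written = write_per_msa_parameters(per_msa_file, options)
    per_msa_text = per_msa_file.read_text()
    print(per_msa_text, end="")
```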

Overriding raxml-ng and modeltest-ng binary locations

If you want to use a custom version of raxml-ng or modeltest-ng (for instance if you already have them installed on your machine, or if you want to use a specific version), you can use the --raxml-binary and --modeltest-binary options. In this case, you don't need to install all the dependencies with ./install.sh, but you have to run ./install_scheduler_only.

Please note that if you use the MPI version of ParGenes (pargenes-hpc.py), the binaries must be built as libraries. In this case, when compiling your custom raxml-ng or modeltest-ng repository, add the cmake flag -DBUILD_AS_LIBRARY=ON, for instance:

cmake -DUSE_TERRAPHAST=OFF -DUSE_MPI=ON -DBUILD_AS_LIBRARY=ON ..

If you use the non-MPI version of ParGenes, the binary is the normal executable.

examples:

python pargenes/pargenes.py -a msa_dir -o output_dir -c 32 -d nt -R "--model GTR" --raxml-binary /home/benoit/raxml-ng/bin/raxml-ng
python pargenes/pargenes-hpc.py -a msa_dir -o output_dir -c 512 -d nt -R "--model GTR" --raxml-binary /home/benoit/raxml-ng/bin/raxml-ng-mpi.so

If you built your own raxml-ng library, you can test it with:

cd tests # from the ParGenes repository root directory
./test_custom_raxml_library.sh path_to_your_library