Command line
Please note that we provide two ParGenes scripts, depending on your hardware (e.g. personal computer or cluster). Read this section.
Specify at least the directory containing the MSAs (`-a`), the output directory (`-o`), the number of cores (`-c`) and the type of the sequences (`-d aa` or `-d nt`).
In addition, you need to specify a model: either with model testing (`-m`), to automatically determine the best-fit model, or with the raxml global parameters option (`-r` or `-R`).
example: python pargenes/pargenes.py -a msa_dir -o output_dir -c 32 -d nt -R "--model GTR"
Unless stated otherwise, we strongly recommend using absolute paths when giving a file or directory location.
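One way to follow the absolute-path recommendation is to resolve relative locations in the shell before building the command. The sketch below is only an illustration: it works in a scratch directory, the names `msa_dir` and `output_dir` are the placeholders from the example above, and it prints the resulting call instead of running ParGenes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the sketch is self-contained.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir -p msa_dir   # stand-in for an existing alignment directory

# Resolve an existing directory to an absolute path.
abs_msa_dir="$(cd msa_dir && pwd)"

# The output directory does not exist yet (ParGenes creates it),
# so resolve its parent and append the final path component.
out_rel="output_dir"
abs_out_dir="$(cd "$(dirname "$out_rel")" && pwd)/$(basename "$out_rel")"

# Print (rather than run) the resulting call.
echo "python pargenes/pargenes.py -a $abs_msa_dir -o $abs_out_dir -c 32 -d nt -R \"--model GTR\""
```

Resolving the parent of the output directory, rather than the directory itself, avoids creating it beforehand, since ParGenes aborts on an existing output directory unless it is resuming from a checkpoint.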

| Command | Meaning |
|---|---|
| `-a`, `--alignments-dir` | Directory containing the input MSA files (fasta or phylip). ParGenes will try to parse all the files in this directory. |
| `-o`, `--output-dir` | Output directory. If the directory does not exist, ParGenes creates it. Otherwise, ParGenes aborts unless you are running from a checkpoint (see also this page). |
| `-c`, `--cores` | Number of cores allocated for this job. Please read this page. |
| `-d`, `--data-type` | Alignment type: nucleotides or amino acids. Possible values: {`nt`, `aa`}. |
| `--dry-run` | Special mode that parses the MSAs and computes some statistics without running the analysis. In particular, it outputs an estimate of the maximum number of cores that a user could assign to this job without losing parallel efficiency. See also this section. |
| `--continue` | Restart the analysis from the last checkpoint (see this page). |

| Command | Meaning |
|---|---|
| `-r`, `--raxml-global-parameters` | Path to a file containing one single line with the arguments to pass to all the raxml runs. See also this section. |
| `-R`, `--raxml-global-parameters-string` | Alternative to `--raxml-global-parameters`: a quoted string with the arguments to pass to all the raxml runs. For instance: `-R "--model GTR --brlen unscaled"`. |
| `--per-msa-raxml-parameters` | Path to a file containing per-MSA raxml parameters. See also this section. |
| `-s`, `--random-starting-trees` | Number of random starting trees. |
| `-p`, `--parsimony-starting-trees` | Number of parsimony starting trees. |
| `-b`, `--bs-trees` | Number of bootstrap trees to compute. |
| `--autoMRE` | autoMRE bootstrap convergence test. You need to specify the maximum number of bootstraps with `--bs-trees`. Note that in this mode, ParGenes does NOT parallelize over the bootstraps, which might greatly affect parallel efficiency. |
| `--raxml-binary` | Override the raxml-ng binary location. |

| Command | Meaning |
|---|---|
| `-m`, `--use-modeltest` | Autodetect the model with ModelTest-NG before running raxml. |
| `--modeltest-global-parameters` | A file containing the parameters to pass to ModelTest-NG. See also this section. |
| `--per-msa-modeltest-parameters` | A file containing per-MSA modeltest parameters. See also this section. |
| `--modeltest-criteria {AICc,AIC,BIC}` | The criterion to use for best-fit model selection. |
| `--modeltest-perjob-cores` | Number of cores to assign to each modeltest job (at least 4). |
| `--modeltest-binary` | Override the modeltest-ng binary location. |

| Command | Meaning |
|---|---|
| `--use-astral` | Run ASTRAL III at the end, to generate a species tree from all the gene trees inferred with ParGenes. |
| `--astral-global-parameters` | Path to a file containing arguments to pass to Astral. See also this section. |
| `--astral-jar` | Override the ASTRAL jar location. |
| `--use-aster` | Run ASTER (instead of ASTRAL) at the end, to generate a species tree from all the gene trees inferred with ParGenes. |
| `--aster-bin <name or path>` | Name or path of the ASTER program to use: `astral`, `astral-hybrid`, or `astral-pro`. |
| `--aster-global-parameters <text file>` | Text file with extra parameters to pass to the chosen ASTER program. |

| Command | Meaning |
|---|---|
| `--msa-filter` | Path to a file with a list of filenames to process. The file should contain filenames only, not paths. ParGenes will only process MSAs that are present both in the list and in the input directory. |
| `--core-assignment {high,medium,low}` | Policy to decide the per-job number of cores (`low` favors a low per-job number of cores). |
| `--percentage-jobs-double-cores` | Fraction (between 0 and 1) of jobs that will receive twice as many cores. |

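The `--msa-filter` file lists bare filenames rather than paths. A minimal sketch of building such a list from an alignment directory follows. The assumptions are loud ones: the list is written one filename per line, the alignments of interest end in `.fasta`, and the names `msa_dir` and `msa_filter.txt` are placeholders, not names required by ParGenes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with a few empty placeholder alignments,
# so that the sketch runs on its own.
workdir="$(mktemp -d)"
msa_dir="$workdir/msa_dir"
filter_file="$workdir/msa_filter.txt"   # hypothetical file name
mkdir -p "$msa_dir"
touch "$msa_dir/gene1.fasta" "$msa_dir/gene2.fasta" "$msa_dir/gene3.phy"

# Write the bare filenames (no directory part) of the .fasta files.
: > "$filter_file"
for path in "$msa_dir"/*.fasta; do
  [ -e "$path" ] || continue   # skip the unexpanded pattern if nothing matches
  basename "$path" >> "$filter_file"
done

cat "$filter_file"
```

The resulting file can then be passed with `--msa-filter`; here `gene3.phy` is left out of the list, so ParGenes would skip it even though it sits in the input directory.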
This paragraph applies to raxml, modeltest, astral and aster. We take the example of raxml.
To pass the same parameters to all raxml runs (for instance, to set the same model for all the MSAs), use `--raxml-global-parameters <file>`. The file should contain a single line with all the raxml arguments you want to add.
Example of the content of this file:
--model GTR --brlen unscaled
All raxml runs started with ParGenes will be called with these arguments.
The same applies to modeltest options (`--modeltest-global-parameters`), astral options (`--astral-global-parameters`), and aster options (`--aster-global-parameters`).
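As a rough sketch of the steps above, the following script writes a one-line global parameters file and checks that it really contains a single line. The file name `raxml_global_parameters.txt` is a placeholder chosen for this illustration, and the ParGenes call is only printed, not executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
params_file="$workdir/raxml_global_parameters.txt"   # hypothetical file name

# Write all raxml arguments on one single line.
printf '%s\n' '--model GTR --brlen unscaled' > "$params_file"

# Sanity check: the file must contain exactly one line.
line_count="$(wc -l < "$params_file" | tr -d ' ')"
if [ "$line_count" -ne 1 ]; then
  echo "expected exactly one line in $params_file, found $line_count" >&2
  exit 1
fi

# Print (rather than run) the corresponding ParGenes call.
echo "python pargenes/pargenes.py -a msa_dir -o output_dir -c 32 -d nt -r $params_file"
```

For a short argument list, the `-R` option with a quoted string gives the same result without a separate file.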
This applies to raxml and modeltest. We take the example of raxml.
To apply different options to each MSA, use the option `--per-msa-raxml-parameters <file>`.
The file should contain one line per MSA for which you want to add options. Each line starts with the MSA file name (without its path!) followed by the arguments.
For instance:
msa1.fasta --model partition1.part
msa2.fasta --model partition2.part
msa3.fasta --model partition3.part
The same applies to modeltest options (`--per-msa-modeltest-parameters`).
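When there are many MSAs, the per-MSA file can be generated rather than written by hand. The sketch below reproduces the three-line example above and checks that no line starts with a path. The numbering scheme (`msaN.fasta` paired with `partitionN.part`) and the file name `per_msa_raxml_parameters.txt` are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
per_msa_file="$workdir/per_msa_raxml_parameters.txt"   # hypothetical file name

# One line per MSA: bare file name first, then the raxml arguments.
for i in 1 2 3; do
  printf 'msa%s.fasta --model partition%s.part\n' "$i" "$i"
done > "$per_msa_file"

# Sanity check: the first field must be a file name, not a path.
if ! awk '$1 ~ /\// { bad = 1 } END { exit bad }' "$per_msa_file"; then
  echo "MSA names in $per_msa_file must not contain a path" >&2
  exit 1
fi

cat "$per_msa_file"
```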
If you want to use a custom version of raxml-ng or modeltest-ng (for instance if you already have them installed on your machine, or if you want to use a specific version), you can use the `--raxml-binary` and `--modeltest-binary` options. In this case, you don't need to install all the dependencies with `./install.sh`, but you have to run `./install_scheduler_only`.
Please note that if you use the MPI version of ParGenes (`pargenes-hpc.py`), the binaries should be built as libraries. In this case, when compiling your custom raxml-ng or modeltest-ng repository, please add the cmake flag `-DBUILD_AS_LIBRARY=ON`:
cmake -DUSE_TERRAPHAST=OFF -DUSE_MPI=ON -DBUILD_AS_LIBRARY=ON ..
If you use the non-MPI version of ParGenes, the binary is the normal executable.
examples:
python pargenes/pargenes.py -a msa_dir -o output_dir -c 32 -d nt -R "--model GTR" --raxml-binary /home/benoit/raxml-ng/bin/raxml-ng
python pargenes/pargenes-hpc.py -a msa_dir -o output_dir -c 512 -d nt -R "--model GTR" --raxml-binary /home/benoit/raxml-ng/bin/raxml-ng-mpi.so
If you built your own raxml-ng library, you can test it with:
cd tests # from the ParGenes repository root directory
./test_custom_raxml_library.sh path_to_your_library