Command line

BenoitMorel edited this page Nov 11, 2018 · 24 revisions

Minimum command line

Specify at least the directory containing the MSAs (-a), the output directory (-o), the number of cores (-c) and the type of the sequences (-d aa or nt).

In addition, you need to specify a model: either use modeltest (-m) to determine the best-fit model automatically, or pass the model through one of the raxml global parameters options (-r or -R).

example: python pargenes/pargenes.py -a msa_dir -o output_dir -c 32 -d nt -R "--model GTR"
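When ParGenes is launched from a script, the minimum command line can be assembled as an argument list. The following is a minimal sketch, not part of ParGenes: the helper name `build_pargenes_command` and its parameters are hypothetical, and only the flags described above (-a, -o, -c, -d, -m, -R) are used.

```python
import shlex


def build_pargenes_command(msa_dir, output_dir, cores, data_type,
                           raxml_args=None, use_modeltest=False,
                           script="pargenes/pargenes.py"):
    """Assemble the minimum ParGenes command line as a list of arguments.

    Hypothetical helper: it only mirrors the rules stated in this page.
    """
    if data_type not in ("nt", "aa"):
        raise ValueError("data_type must be 'nt' or 'aa'")
    # A model is mandatory: either modeltest (-m) or raxml arguments (-R).
    if not use_modeltest and not raxml_args:
        raise ValueError("specify a model: use_modeltest=True or raxml_args")
    cmd = ["python", script,
           "-a", msa_dir,
           "-o", output_dir,
           "-c", str(cores),
           "-d", data_type]
    if use_modeltest:
        cmd.append("-m")
    if raxml_args:
        # -R takes all raxml arguments as one quoted string.
        cmd += ["-R", raxml_args]
    return cmd


cmd = build_pargenes_command("msa_dir", "output_dir", 32, "nt",
                             raxml_args="--model GTR")
print(shlex.join(cmd))  # shell-quoted form of the command
```

The list can be handed to `subprocess.run(cmd, check=True)` to start the run.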

Global raxml (or modeltest) options

To pass the same parameters to all raxml runs (for instance, to set the same model for all the MSAs), use --raxml-global-parameters <file>. The file should contain a single line with all the raxml arguments you want to add.

Example of the content of this file: --model GTR --brlen unscaled

The same applies to modeltest options (--modeltest-global-parameters).
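A global parameters file can also be generated from a script. The sketch below is an illustration only: the function name `write_global_parameters` and the file name `raxml_global.txt` are hypothetical, and ending the single line with a newline is an assumption about what ParGenes accepts.

```python
import tempfile
from pathlib import Path


def write_global_parameters(path, arguments):
    """Write raxml (or modeltest) arguments as the single line of a parameters file.

    Hypothetical helper. Returns the absolute path of the written file.
    """
    line = " ".join(arguments)
    if "\n" in line or "\r" in line:
        raise ValueError("arguments must fit on a single line")
    target = Path(path).resolve()
    # Assumption: a trailing newline after the single line is tolerated.
    target.write_text(line + "\n")
    return target


# Demonstration in a temporary directory so that no stray file is left behind.
with tempfile.TemporaryDirectory() as tmp:
    params = write_global_parameters(Path(tmp) / "raxml_global.txt",
                                     ["--model", "GTR", "--brlen", "unscaled"])
    print(params.read_text(), end="")
```

The returned absolute path can then be given to --raxml-global-parameters (or to --modeltest-global-parameters for a modeltest file).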

Per-msa raxml (or modeltest) options

To apply different options to each MSA, use the option --per-msa-raxml-parameters <file>. The file should contain one line per MSA for which you want to add options. Each line starts with the MSA file name (without its path!) followed by the arguments.

For instance:

msa1.fasta --model partition1.part 
msa2.fasta --model partition2.part 
msa3.fasta --model partition3.part 

The same applies to modeltest options (--per-msa-modeltest-parameters).
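With many MSAs, the per-MSA file is easier to generate than to write by hand. The following sketch builds its content from a mapping of MSA paths to argument strings. The function name `format_per_msa_parameters` and the `/data/msas` paths are hypothetical; the only rules taken from this page are "one line per MSA" and "file name without its path".

```python
import os


def format_per_msa_parameters(msa_to_args):
    """Return the content of a per-MSA parameters file.

    msa_to_args maps an MSA path to the argument string for that MSA.
    Hypothetical helper, not part of ParGenes.
    """
    lines = []
    seen = set()
    for msa_path, args in msa_to_args.items():
        # Each line must start with the file name only, without its directory.
        name = os.path.basename(msa_path)
        if name in seen:
            raise ValueError("duplicate MSA file name: " + name)
        seen.add(name)
        lines.append(name + " " + args)
    return "\n".join(lines) + "\n"


content = format_per_msa_parameters({
    "/data/msas/msa1.fasta": "--model partition1.part",
    "/data/msas/msa2.fasta": "--model partition2.part",
    "/data/msas/msa3.fasta": "--model partition3.part",
})
print(content, end="")
```

Writing `content` to a file and passing that file to --per-msa-raxml-parameters reproduces the example above.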

ParGenes arguments

Unless stated otherwise, we strongly recommend using absolute paths when giving a file or directory location.

Common arguments

Command Meaning
-a, --alignments-dir Directory containing the input MSA files (fasta or phylip). ParGenes will try to parse all the files in this directory.
-o, --output-dir Output directory. If the directory does not exist, ParGenes creates it. Otherwise, ParGenes will abort unless you are running from a checkpoint (see the --continue option).
-c, --cores Number of cores allocated for this job. Do not exceed the number of physical cores available. Should be at least 2.
-d, --data-type Alignments type: nucleotides or amino acids. Possible values: {nt,aa}.
--dry-run Special mode to parse the MSA, and compute some statistics without running the analysis. In particular, outputs an estimation of the maximum number of cores that a user could assign to this job without losing parallel efficiency. See also this section
--continue Restart the analysis from the last checkpoint. Apart from this argument and the number of cores (--cores), please avoid changing any of the program inputs. For instance, use this option when your previous run stopped because of a hardware failure or because it reached the wall-time limit.
--scheduler Defines the scheduling strategy. Possible values are: {split,onecore,openmp}. Please read this section
-r, --raxml-global-parameters Path to a file containing one single line with the arguments to pass to all the raxml runs. For instance, the file can contain: --model GTR --brlen unscaled
-R, --raxml-global-parameters-string Alternative to --raxml-global-parameters: a quoted string with the arguments to pass to all the raxml runs. For instance: -R "--model GTR --brlen unscaled".
-a, -- todo
-a, -- todo
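Since --continue expects the same inputs as the interrupted run, one way to restart is to reuse the original argument list and change nothing but the number of cores. This is a minimal sketch with a hypothetical helper name, `continue_command`; it assumes the cores were given as `-c N` or `--cores N` in two separate tokens and does not handle a `--cores=N` form.

```python
def continue_command(original_cmd, cores=None):
    """Return a copy of original_cmd that restarts from the last checkpoint.

    Hypothetical helper: only the core count may change, all other
    arguments are kept exactly as in the original run.
    """
    cmd = list(original_cmd)
    if cores is not None:
        for i, token in enumerate(cmd[:-1]):
            if token in ("-c", "--cores"):
                cmd[i + 1] = str(cores)  # replace the value after the flag
                break
    if "--continue" not in cmd:
        cmd.append("--continue")
    return cmd


original = ["python", "pargenes/pargenes.py", "-a", "msa_dir", "-o", "output_dir",
            "-c", "32", "-d", "nt", "-R", "--model GTR"]
restarted = continue_command(original, cores=16)
print(restarted)
```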

Other arguments (todo: move them to the table above)

usage: pargenes.py 
  
  --msa-filter MSA_FILTER
                        A file containing the names of the msa files to
                        process
  --core-assignment {high,medium,low}
                        Policy to decide the per-job number of cores (low
                        favors a low per-job number of cores)
  --per-msa-raxml-parameters PER_MSA_RAXML_PARAMETERS
                        A file containing per-msa raxml parameters
  -s RANDOM_STARTING_TREES, --random-starting-trees RANDOM_STARTING_TREES
                        The number of starting trees
  -p PARSIMONY_STARTING_TREES, --parsimony-starting-trees PARSIMONY_STARTING_TREES
                        The number of starting parsimony trees
  -b BOOTSTRAPS, --bs-trees BOOTSTRAPS
                        The number of bootstrap trees to compute
  --percentage-jobs-double-cores PERCENTAGE_JOBS_DOUBLE_CORES
                        Percentage (between 0 and 1) of jobs that will receive
                        twice as many cores
  -m, --use-modeltest   Autodetect the model with modeltest
  --modeltest-global-parameters MODELTEST_GLOBAL_PARAMETERS
                        A file containing the parameters to pass to modeltest
  --per-msa-modeltest-parameters PER_MSA_MODELTEST_PARAMETERS
                        A file containing per-msa modeltest parameters
  --modeltest-criteria {AICc,AIC,BIC}
                        Model selection criterion used by modeltest
  --modeltest-perjob-cores MODELTEST_CORES
                        Number of cores to assign to each modeltest job (at
                        least 4)