Launching MSFragger

Sarah Haynes edited this page Jul 31, 2019 · 14 revisions

Run MSFragger in the command line

Windows users:

Ensure that you have placed MSFragger.jar in your working directory and have modified the parameter file to reference your protein database.

Decide how much system memory to make available to MSFragger. This is set with the Java maximum heap size option -Xmx (e.g. -Xmx3700M for 3700 MB or -Xmx20G for 20 GB).

MSFragger takes the parameter file as its first argument, followed by one or more MS/MS data files:

java -Xmx20g -jar <path to msfragger.jar file> <path to fragger.params file> <path to mzML/.raw files>

The -Xmx flag is important: it ensures that MSFragger has enough memory to perform the search efficiently. We recommend allocating at least 8 GB for standard tryptic digestions.

-Xmx20g specifies the maximum memory assigned to the Java virtual machine. In this example, the maximum value is 20 GB. This should be changed to suit your computer configuration.
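
The command above can also be assembled from a script, which makes it easier to change the heap size per machine. The following is a minimal sketch in Python, not part of MSFragger: the function names `build_msfragger_command` and `run_msfragger` and the `heap` parameter are hypothetical, and it assumes `java` is on your PATH.

```python
import subprocess
from typing import List, Sequence


def build_msfragger_command(jar: str, params: str, data_files: Sequence[str],
                            heap: str = "20g") -> List[str]:
    """Return the argument list for a single MSFragger run (hypothetical helper).

    heap is the value placed after -Xmx, e.g. "20g" or "3700M".
    """
    if not data_files:
        raise ValueError("at least one MS/MS data file is required")
    # Order follows the usage shown above:
    # JVM options, the jar, the parameter file, then the data files.
    return ["java", f"-Xmx{heap}", "-jar", jar, params, *data_files]


def run_msfragger(jar: str, params: str, data_files: Sequence[str],
                  heap: str = "20g") -> int:
    """Launch MSFragger and return its exit code (raises if the run fails)."""
    cmd = build_msfragger_command(jar, params, data_files, heap)
    return subprocess.run(cmd, check=True).returncode
```

For example, `run_msfragger("MSFragger.jar", "fragger.params", ["sample1.mzML"], heap="8g")` would start a search with an 8 GB heap; the file names here are placeholders.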

Detailed command line options can be displayed with:

java -jar <path to msfragger.jar file>

Linux users:

A FragPipe-equivalent shell script can be found here.

Performance notes

Database splitting

When searching very large sequence databases, performing nonspecific searches, and/or specifying many variable modifications, it may be necessary to use the database splitting option in FragPipe. This option requires a Python installation. When running from the command line, download the Python script (msfragger_pep_split.py) and run MSFragger with the following command:

python3 <path to msfragger_pep_split.py file> <num> "java -Xmx20g -jar" <path to msfragger.jar file> <path to fragger.params file> <path to mzML/mzXML/MGF files>

Replace <num> with the number of slices for database splitting (e.g. 4), and adjust the maximum allowed memory as described above.
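
When the split search is launched from another script, note that the quoted "java -Xmx20g -jar" part of the command is passed to msfragger_pep_split.py as a single argument. The sketch below builds the equivalent argument list; `build_split_command` is a hypothetical helper, and using `sys.executable` in place of `python3` is an assumption that the current interpreter is the one you want to run the split script with.

```python
import sys
from typing import List, Sequence


def build_split_command(split_script: str, num_slices: int, jar: str,
                        params: str, data_files: Sequence[str],
                        heap: str = "20g") -> List[str]:
    """Return the argument list for a database-split run (hypothetical helper)."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    if not data_files:
        raise ValueError("at least one MS/MS data file is required")
    # The Java launcher prefix stays one argument, mirroring the quoted
    # "java -Xmx20g -jar" string in the command shown above.
    java_prefix = f"java -Xmx{heap} -jar"
    return [sys.executable, split_script, str(num_slices), java_prefix,
            jar, params, *data_files]
```

The resulting list can be passed to `subprocess.run` without additional shell quoting.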

Batch processing

MSFragger allows multiple MS/MS input files to be processed in a batch. Passing multiple files to MSFragger at once allows it to reuse the fragment index for subsequent MS/MS runs. This is particularly important for narrow window searches, in which the search itself may take only fractions of a second.

On computers or clusters with many processor cores, we highly recommend setting MSFragger to process files sequentially with all available processor cores rather than running multiple instances of MSFragger in parallel (assigning a smaller number of cores to each). This reduces initialization time and allows the fragment index to be reused, while also reducing overall memory requirements.
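
One way to follow this advice is to collect every data file first and pass them all to a single MSFragger invocation, instead of looping and starting one Java process per file. The following is a minimal sketch under stated assumptions: `build_batch_command` is a hypothetical helper, the data files are assumed to sit in one directory, and the `*.mzML` pattern is only an example.

```python
from pathlib import Path
from typing import List


def build_batch_command(jar: str, params: str, data_dir: str,
                        heap: str = "20g", pattern: str = "*.mzML") -> List[str]:
    """Return one MSFragger command covering every matching file in data_dir."""
    # Sort so the processing order is reproducible between runs.
    files = sorted(str(p) for p in Path(data_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {data_dir}")
    # A single invocation lets MSFragger build the fragment index once
    # and reuse it for each file in the list.
    return ["java", f"-Xmx{heap}", "-jar", jar, params, *files]
```

Pass the returned list to `subprocess.run` to start the batch; all files are then searched by one MSFragger process.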