SDRF to MaxQuant analysis

Marie Locard-Paulet edited this page Jul 8, 2021 · 2 revisions

Example for re-analysing SDRF-annotated data with MaxQuant

Here, we show how to re-analyse the proteomic standard data set (publication) using the annotations in the SDRF file. The procedure can easily be adapted to other data sets.

We used the following versions: sdrf-pipelines (0.0.14) and MaxQuant (1.6.10.43, https://www.maxquant.org/). We recommend using Conda for the installation.

Data download

You need to download the SDRF file, a database that contains the yeast proteome and the UPS proteins (e.g. this one), and the raw data files from PRIDE.
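Before continuing, it can help to confirm that every raw file referenced in the SDRF file was actually downloaded. The following Python sketch assumes that the SDRF file is tab-separated and lists the file names in a column named comment[data file] (the usual SDRF convention; check the header of your file). The function names are illustrative and are not part of sdrf-pipelines.

```python
import csv
import io
from pathlib import Path


def raw_files_in_sdrf(sdrf_text):
    """Return the unique entries of the data file column, in order of appearance."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = [name.strip().lower() for name in next(reader)]
    # Assumption: the column is called "comment[data file]".
    column = header.index("comment[data file]")
    names = []
    for row in reader:
        if len(row) > column and row[column] and row[column] not in names:
            names.append(row[column])
    return names


def missing_raw_files(sdrf_text, raw_dir):
    """Return the SDRF data files that are not present in raw_dir."""
    return [name for name in raw_files_in_sdrf(sdrf_text)
            if not (Path(raw_dir) / name).exists()]


# Example call (requires sdrf.tsv and the raw files on disk):
# text = Path("sdrf.tsv").read_text(encoding="utf-8")
# print(missing_raw_files(text, "PATH_TO_RAW_FILES"))
```

An empty list from missing_raw_files means that all files named in the SDRF file were found in the given folder.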

Create MaxQuant parameter file

The following command adds the experimental design, file paths and available search parameters from the SDRF file to a MaxQuant parameter file with default settings.

parse_sdrf convert-maxquant -s sdrf.tsv -f $PWD/yeast_UPS.fasta -r PATH_TO_RAW_FILES

Here, we assume that the files sdrf.tsv and yeast_UPS.fasta are located in the current folder. Do not forget to change PATH_TO_RAW_FILES accordingly. ​

Important: Always use absolute paths for the fasta file and the folder with the raw files, as MaxQuant can have issues with relative paths. You might need to replace the $PWD variable if you are working in a Windows or macOS environment. The command produces a MaxQuant parameter file named mqpar.xml.
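If $PWD is not available in your shell, the absolute paths can also be built in Python. This is a minimal sketch that only assembles the command shown above with the same -s, -f and -r options; the helper name build_convert_command is hypothetical.

```python
import os
import shlex


def build_convert_command(sdrf, fasta, raw_dir):
    """Build the parse_sdrf call with absolute fasta and raw-folder paths."""
    return [
        "parse_sdrf", "convert-maxquant",
        "-s", sdrf,
        "-f", os.path.abspath(fasta),    # replaces $PWD/yeast_UPS.fasta
        "-r", os.path.abspath(raw_dir),  # absolute path to the raw files
    ]


command = build_convert_command("sdrf.tsv", "yeast_UPS.fasta", "PATH_TO_RAW_FILES")
print(shlex.join(command))  # printable form for POSIX shells
```

To run the command directly instead of printing it, the list can be passed to subprocess.run(command, check=True).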

The resulting mqpar.xml starts with the following lines:

<?xml version="1.0" encoding="utf-8"?>
<MaxQuantParams xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<fastaFiles>
		<FastaFileInfo>
			<fastaFilePath>/home/veit/Test_sdrf_MQ/yeast_UPS.fasta</fastaFilePath>
			<identifierParseRule>&gt;([^\s]*)</identifierParseRule>
			<descriptionParseRule>&gt;(.*)</descriptionParseRule>
			<taxonomyParseRule></taxonomyParseRule>
			<variationParseRule></variationParseRule>
			<modificationParseRule></modificationParseRule>
			<taxonomyId></taxonomyId>
		</FastaFileInfo>
	</fastaFiles>
	<fastaFilesProteogenomics></fastaFilesProteogenomics>
	<fastaFilesFirstSearch></fastaFilesFirstSearch>
	<fixedSearchFolder></fixedSearchFolder>
	<andromedaCacheSize>350000</andromedaCacheSize>
	<advancedRatios>True</advancedRatios>
	<pvalThres>0.005</pvalThres>
	<neucodeRatioBasedQuantification>False</neucodeRatioBasedQuantification>
	<neucodeStabilizeLargeRatios>False</neucodeStabilizeLargeRatios>
	<rtShift>False</rtShift>
	<separateLfq>False</separateLfq>
	<lfqStabilizeLargeRatios>True</lfqStabilizeLargeRatios>

​ The mqpar.xml for the UPS example can be found here.

Note: Check the sdrf-pipelines documentation for further options, such as setting the temporary folder or the number of threads to speed up the MaxQuant analysis.
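Since relative paths can cause problems, it may be worth checking the generated mqpar.xml before starting a long run. The sketch below reads the fastaFilePath elements shown in the excerpt above and reports any that are not absolute. It uses only the Python standard library; the function names are illustrative.

```python
import os
import xml.etree.ElementTree as ET


def fasta_paths(mqpar_data):
    """Return the text of every <fastaFilePath> element in the parameter file."""
    root = ET.fromstring(mqpar_data)  # accepts the file content as bytes or str
    return [el.text.strip() for el in root.iter("fastaFilePath") if el.text]


def relative_paths(paths):
    """Return the paths that are not absolute on the current platform."""
    return [p for p in paths if not os.path.isabs(p)]


# Example call (requires mqpar.xml in the current folder):
# from pathlib import Path
# print(relative_paths(fasta_paths(Path("mqpar.xml").read_bytes())))
```

An empty result means that all fasta entries are absolute. Note that the check depends on the platform where it runs, so it should be executed on the machine that will run MaxQuant.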

Run MaxQuant

The standard command-line procedure is:

maxquant mqpar.xml

Running the full UPS data set takes from several hours to a day, depending on the computer. The output files are written to a subfolder named combined inside the directory that contains the raw files.
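Once the run has finished, the results can be listed with a few lines of Python. This sketch assumes only what is stated above, namely that the output is in a combined subfolder of the raw file directory; list_outputs is a hypothetical helper name.

```python
from pathlib import Path


def list_outputs(raw_dir):
    """List files below <raw_dir>/combined, relative to that folder."""
    combined = Path(raw_dir) / "combined"
    if not combined.is_dir():
        return []  # no results folder found at this location
    return sorted(p.relative_to(combined).as_posix()
                  for p in combined.rglob("*") if p.is_file())


# Example call:
# for name in list_outputs("PATH_TO_RAW_FILES"):
#     print(name)
```

An empty list indicates that the combined folder does not exist yet or that the wrong raw file directory was given.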