diff --git a/.nojekyll b/.nojekyll
index 112f50c..80ab445 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-e2e7074b
\ No newline at end of file
+48aed234
\ No newline at end of file
diff --git a/TeachingModule/AnalysisMSData_FragPipe.html b/TeachingModule/AnalysisMSData_FragPipe.html
index 5db9a84..d569efc 100644
--- a/TeachingModule/AnalysisMSData_FragPipe.html
+++ b/TeachingModule/AnalysisMSData_FragPipe.html
@@ -20,6 +20,40 @@
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
+/* CSS for syntax highlighting */
+pre > code.sourceCode { white-space: pre; position: relative; }
+pre > code.sourceCode > span { line-height: 1.25; }
+pre > code.sourceCode > span:empty { height: 1.2em; }
+.sourceCode { overflow: visible; }
+code.sourceCode > span { color: inherit; text-decoration: inherit; }
+div.sourceCode { margin: 1em 0; }
+pre.sourceCode { margin: 0; }
+@media screen {
+div.sourceCode { overflow: auto; }
+}
+@media print {
+pre > code.sourceCode { white-space: pre-wrap; }
+pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; }
+}
+pre.numberSource code
+ { counter-reset: source-line 0; }
+pre.numberSource code > span
+ { position: relative; left: -4em; counter-increment: source-line; }
+pre.numberSource code > span > a:first-child::before
+ { content: counter(source-line);
+ position: relative; left: -1em; text-align: right; vertical-align: baseline;
+ border: none; display: inline-block;
+ -webkit-touch-callout: none; -webkit-user-select: none;
+ -khtml-user-select: none; -moz-user-select: none;
+ -ms-user-select: none; user-select: none;
+ padding: 0 4px; width: 4em;
+ }
+pre.numberSource { margin-left: 3em; padding-left: 4px; }
+div.sourceCode
+ { }
+@media screen {
+pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
+}
@@ -243,11 +277,11 @@
Download Data
We recommend opening this material inside the Proteomics Sandbox so that you can copy & paste or download the file directly into the environment.
-After you save the list of URLs into a file named urls.txt
. you can use the following code in the terminal:
-wget -i urls.txt
-If you added your own private folder to the UCloud session, you could now move the data into that folder for better management of the data you’re working with.
-Next, we can launch FragPipe, which is located on the desktop. In this tutorial, we are using FragPipe version 22.0 in the October 2024 version of the Proteomics Sandbox Application.
-Now that we have launched FragPipe, we need to configure the settings prior to running the analysis. Therefore, we have provided some guiding questions to help you set up the settings in FragPipe:
+After saving the list of URLs to a file named urls.txt
, you can use the following command in the terminal. Make sure your terminal is in the directory where urls.txt
is located before running it, so the file can be found:
+wget -i urls.txt
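The download step can be sketched in the terminal as follows. This is a minimal sketch: it recreates urls.txt from the five URLs listed above and sanity-checks it before the actual download (shown as a trailing comment, since it fetches large raw files):

```shell
# Recreate urls.txt with the five sample-file URLs listed above
cat > urls.txt <<'EOF'
https://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool1_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool2_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150318_TMT_pool3_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150322_TMT_pool4_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150329_TMT_pool5_300ugIPG37-49_7of15ul_fr01.raw
EOF

# Sanity check: exactly five URLs, one per line
wc -l urls.txt
grep -c '^https://' urls.txt

# Then download them all into the current directory:
# wget -i urls.txt
```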
+If you added your own private folder to the UCloud session, you can now move the data into that folder for better data management.
+Next, we can launch FragPipe, which is located on the desktop. In this tutorial, we are using FragPipe version 22.0 within the October 2024 version of the Proteomics Sandbox application, available here.
+Now that FragPipe is launched, we need to configure it before running the analysis. To help you choose the right settings in FragPipe, we have provided some guiding questions:
Getting started with FragPipe
@@ -264,17 +298,17 @@ Getting star
Some of the information you will need in this section can be found in Supplementary Information to the study. Open the Supplementary Information and go to page 25, Supplementary Methods.
-Go to the “Workflow” tab to set up the workflow for the analysis and import the data you have just downloaded.
+Go to the Workflow
tab to set up the workflow for the analysis and import the data you just downloaded.
Which workflow should you select? Hint: What labeling method was used in the study?
How does the labeling method affect data processing?
-Click “Load workflow” after you have found and selected the correct workflow to be used.
-Next, add your files by clicking on “Add files” and locate them in the designated folder for your raw files that you previously created. Assign each file to a separate experiment by clicking “Consecutive”.
-Go to the “Quant (Isobaric)” tab. Here, you need to provide annotations for TMT channels. Use the five pool annotations that you downloaded from this page. You will need to upload them to Ucloud and specify the corresponding annotation file for each experiement in order.
-Now you should relocate to the “Database” tab. Here you can either download or browse for an already preexisting database file. In this case, we will simply download the latest database file by clicking the “Download” button in FragPipe. Add contaminants and decoys.
+Click Load workflow
after you have selected the appropriate workflow.
+Next, add your files by clicking on Add files
and locating them in the designated folder for your raw files. Assign each file to a separate experiment by clicking Consecutive
.
+Go to the Quant (Isobaric)
tab. Here, you need to provide annotations for TMT channels. Use the five pool annotations that you downloaded from this page. You will need to upload them to UCloud and specify the corresponding annotation file for each experiment in order.
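For orientation, a TMT annotation file as used by FragPipe's isobaric quantitation is a plain-text file with one line per channel: the channel label followed by a sample name, separated by whitespace. The sketch below assumes this layout and uses hypothetical sample names; the real names come from the five pool annotation files downloaded above:

```
126 sample_A
127N sample_B
127C sample_C
```

One such line follows for every remaining channel in the plex; the downloaded annotation files already follow this layout, so you only need to assign the right file to each experiment.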
+Now, navigate to the Database
tab. Here you can either download a new database file or browse for an existing one. In this case, we will download the latest database file by clicking the Download
button in FragPipe. Be sure to add contaminants and decoys.
What is the purpose of the database file used in FragPipe, and why is it important?
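The decoys added to the database are what make FDR estimation possible: the number of decoy hits passing a score threshold approximates how many target hits at that threshold are false. A minimal illustrative sketch of this target-decoy calculation (not FragPipe's actual implementation):

```python
def estimate_fdr(scores, labels, threshold):
    """Estimate FDR at a score threshold from target/decoy PSM labels.

    scores: PSM scores (higher = better match)
    labels: "target" or "decoy" for each PSM
    """
    targets = sum(1 for s, l in zip(scores, labels) if l == "target" and s >= threshold)
    decoys = sum(1 for s, l in zip(scores, labels) if l == "decoy" and s >= threshold)
    # Decoy hits above the threshold approximate the number of false target hits
    return decoys / targets if targets else 0.0

# Example: one decoy among two accepted targets at threshold 8 -> estimated FDR 0.5
print(estimate_fdr([10, 9, 8, 7], ["target", "target", "decoy", "target"], 8))
```

Lowering the threshold admits more decoys (and hence more false targets), which is why search tools report results filtered to a chosen FDR, commonly 1%.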
diff --git a/search.json b/search.json
index a2c77ab..f7c4bae 100644
--- a/search.json
+++ b/search.json
@@ -53,7 +53,7 @@
"href": "teachingmodule.html#part-2-analysis-of-ms-data-using-fragpipe",
"title": "Breast Cancer Proteomics Module",
"section": "Part 2: Analysis of MS Data Using FragPipe",
- "text": "Part 2: Analysis of MS Data Using FragPipe\nIn this section of the teaching module, we will work with data from the paper. The first task is to download sample files from the paper, guided by the questions provided below:\n\nWhere can the data be found?\n\n\nWhat is the ProteomeXChange database?\n\n\nWhat accession code is used for the data deposited in ProteomeXChange?\n\nBy examining the accession code for the data deposited on ProteomeXChange, we can access and download the data using FTP.\n\nWhat is FTP, and what is its functionality?\n\nFor downloading the data, we will use the Proteomics Sandbox Application on UCloud. This platform allows us to access the necessary storage capacity as well as the computational power required to execute this process.\nThe Proteomics Sandbox Application is a virtual environment that includes multiple software tools, including FragPipe for analyzing proteomics data.\nYou can find the Proteomics Sandbox Application on UCloud here.\nFirst, we will download the data for the sample files to be used in FragPipe. Then, we will launch FragPipe to run the first analysis of the data. Before doing so, we have some questions regarding FragPipe and its usability:\n\nWhat is FragPipe, and what are its applications?\n\n\nIf FragPipe were not used for this part of the teaching module, which alternative software tools could be employed? Please provide a few examples.\n\n\nWhat are the benefits of using FragPipe?\n\nNow that we know what we want to do and why, it is time to start the Proteomics Sandbox application, or job. 
Simple analyses in FragPipe may only require 8 GB of RAM, while large-scale or complex analyses may require 24 GB of memory or more (FragPipe Documentation), which is why we will allocate 24 GB for this exercise.\nIn UCloud, the settings should look like this:\n\n\n\n\n\nBefore submitting the job, it is highly recommended to create a personal folder to securely store both your data and the results generated by FragPipe. Follow the step-by-step guide below for an effortless setup:\n\nFirst, click on the vibrant blue Add folder button.\nNext, select the exact directory you wish to mount, as illustrated below:\n\n\n\n\n\n\nUpon clicking, a window similar to the one below will appear. Here, you have the option to either create a specific folder within a particular drive in the workspace you’ve chosen or simply select the entire drive itself. In this example, the drive is labeled as Home and the workspace is My workspace.\n\n\n\n\n\n\n\n\n\n\n\nCaution\n\n\n\nMake sure to allocate the right number of hours before submitting the job. If the time runs out, the job will be canceled, and all progress will be lost. However, you can always extend the job duration if more time is required after submission.\nTime can pass quickly when working, so we recommend initially allocating 2 hours for the job. Now, we are ready to submit the job and launch the virtual environment of the Proteomics Sandbox Application.\n\n\n\nDownload Data from the Paper\nInitially, we will need to download the paper’s data. For this exercise, we will only use one sample file from each Plex Set/Pool.\nWe will use the terminal in the virtual environment for downloading the data.\nNow, we can access the FTP server where the data is located. You will need the server address from the correct FTP-server, which can be found on the site for the accession code PXD008841 in ProteomeXchange, previously visited. 
At the bottom of the page, you will find the FTP-server address where the data is stored.\n\nPlease locate the address.\n\nClick on the “Dataset FTP location” link.\n\nWe now have access to the data stored on the FTP server. Please provide a brief description of the contents of the folder on the FTP server.\n\nTo download one sample file from each of the Plex Sets, we will need these URLs only:\nhttps://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool1_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool2_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150318_TMT_pool3_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150322_TMT_pool4_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150329_TMT_pool5_300ugIPG37-49_7of15ul_fr01.raw\n\n\n(You can also download this list here).\n\n\n\n\n\n\nNote\n\n\n\nWe recommend to open this material inside Proteomics Sandbox to be able to copy & paste or download the file directly into the environment.\n\n\nAfter you save the list of URLs into a file named urls.txt. you can use the following code in the terminal:\nwget -i urls.txt\nIf you added your own private folder to the UCloud session, you could now move the data into that folder for better management of the data you’re working with.\nNext, we can launch FragPipe, which is located on the desktop. In this tutorial, we are using FragPipe version 22.0 in the October 2024 version of the Proteomics Sandbox Application.\nNow that we have launched FragPipe, we need to configure the settings prior to running the analysis. Therefore, we have provided some guiding questions to help you set up the settings in FragPipe:\n\n\nGetting started with FragPipe\n\n\n\n\n\n\nNote\n\n\n\nSome of the information you will need in this section can be found in Supplementary Information to the study. 
Open the Supplementary Information and go to page 25, Supplementary Methods.\n\n\nGo to the “Workflow” tab to set up the workflow for the analysis and import the data you have just downloaded.\n\nWhich workflow should you select? Hint: What labeling method was used in the study?\n\n\nHow does the labeling method affect data processing?\n\nClick “Load workflow” after you have found and selected the correct workflow to be used.\nNext, add your files by clicking on “Add files” and locate them in the designated folder for your raw files that you previously created. Assign each file to a separate experiment by clicking “Consecutive”.\nGo to the “Quant (Isobaric)” tab. Here, you need to provide annotations for TMT channels. Use the five pool annotations that you downloaded from this page. You will need to upload them to Ucloud and specify the corresponding annotation file for each experiement in order.\nNow you should relocate to the “Database” tab. Here you can either download or browse for an already preexisting database file. In this case, we will simply download the latest database file by clicking the “Download” button in FragPipe. Add contaminants and decoys.\n\nWhat is the purpose of the database file used in FragPipe, and why is it important?\n\n\nWhich organism should you choose when downloading the database file?\n\n\nDescribe the relationship between decoys and false discovery rate (FDR) by answering the following questions:\n\n\nWhat are decoys?\n\n\nWhy should you include decoys?\n\n\nWhat role do decoys play in estimating the FDR?\n\n\n\nNext, you can go to the “MSFragger” tab to adjust the parameter settings for the search and matching of the theoretical and experimental peptide spectra. The search parameters to be used are listed in Supplementary Methods.\n\nWhat parameters did you set?\n\nWhen all settings have been obtained, MSFragger should look something like this:\n\nWhat is MSFragger? 
What does it do?\n\n\nHow does MSFragger operate?\n\n\n\n\n\n\n\nNote\n\n\n\nYou can also skip configuring MSFragger manually and just use this parameter file. You will need to upload it to UCloud and then load it on the “MSFragger” tab in FragPipe.\n\n\nFinally, we can navigate to the “Run” tab and run the analysis. For that, we must choose an output directory for the results of the search made by FragPipe. Once you have adjusted that, you are ready to click on “Run”.\nThis process might take some time, so make sure that you still have enough hours allocated on your job on UCloud—otherwise, it will get terminated. Meanwhile, you can answer these questions:\n\nWhat are your expectations regarding the output results? Consider the implications of the number of files provided for this search in your response.\n\n\nCan the output from this analysis be reliably used for downstream applications given the limited number of sample files? Justify your answer.\n\n\nWhat does it signify that the sample tissues have been fractionated as described in Supplementary Information?\n\n\n\nOutline the fractionation process utilized.\n\n\nExplain the study design associated with this research.\n\n\nIn your opinion, will increasing the number of fractions improve proteome coverage? Justify your reasoning.\n\n\nWhen the run in FragPipe is done, please locate the output results and get an overview of the output.\n\nWhat types of output are generated by FragPipe?\n\nFor the downstream analysis, we will use the output from the list of combined proteins, which we will explore further in the following section.\n\n\nInterpretation and Analysis of FragPipe Results\nFor this part, we will use output files based on a run with FragPipe using all sample files (i.e., 5x72 raw files). That file can be downloaded here???\nNow, we will look at the output from FragPipe, where we will use the file named combined_proteins.tsv. Initially, we will explore the contents of the file locally. 
Therefore, you should download the file from UCloud and view it locally in a file editor such as Excel.\nYou can download the file by clicking on the file in your output directory in the UCloud interface, from where you can choose to download it.\n\nProvide a concise overview of the table’s contents. What information is represented in the rows and columns?\n\nFor the downstream analysis, we will use the columns containing the TMT intensities across the proteins identified.\nFor that we will use OmicsQ, which is a toolkit for quantitative proteomics. OmicsQ can be used to facilitate the processing of quantitative data from Omics type experiments. Additionally, it also serves as an entrypoint for using apps like PolySTest [SCHWAMMLE20201396] for statistical testing, VSClust for clustering and ComplexBrowser for the investigation of the behavior of protein complexes."
+ "text": "Part 2: Analysis of MS Data Using FragPipe\nIn this section of the teaching module, we will work with data from the paper. The first task is to download sample files from the paper, guided by the questions provided below:\n\nWhere can the data be found?\n\n\nWhat is the ProteomeXChange database?\n\n\nWhat accession code is used for the data deposited in ProteomeXChange?\n\nBy examining the accession code for the data deposited on ProteomeXChange, we can access and download the data using FTP.\n\nWhat is FTP, and what is its functionality?\n\nFor downloading the data, we will use the Proteomics Sandbox Application on UCloud. This platform allows us to access the necessary storage capacity as well as the computational power required to execute this process.\nThe Proteomics Sandbox Application is a virtual environment that includes multiple software tools, including FragPipe for analyzing proteomics data.\nYou can find the Proteomics Sandbox Application on UCloud here.\nFirst, we will download the data for the sample files to be used in FragPipe. Then, we will launch FragPipe to run the first analysis of the data. Before doing so, we have some questions regarding FragPipe and its usability:\n\nWhat is FragPipe, and what are its applications?\n\n\nIf FragPipe were not used for this part of the teaching module, which alternative software tools could be employed? Please provide a few examples.\n\n\nWhat are the benefits of using FragPipe?\n\nNow that we know what we want to do and why, it is time to start the Proteomics Sandbox application, or job. 
Simple analyses in FragPipe may only require 8 GB of RAM, while large-scale or complex analyses may require 24 GB of memory or more (FragPipe Documentation), which is why we will allocate 24 GB for this exercise.\nIn UCloud, the settings should look like this:\n\n\n\n\n\nBefore submitting the job, it is highly recommended to create a personal folder to securely store both your data and the results generated by FragPipe. Follow the step-by-step guide below for an effortless setup:\n\nFirst, click on the vibrant blue Add folder button.\nNext, select the exact directory you wish to mount, as illustrated below:\n\n\n\n\n\n\nUpon clicking, a window similar to the one below will appear. Here, you have the option to either create a specific folder within a particular drive in the workspace you’ve chosen or simply select the entire drive itself. In this example, the drive is labeled as Home and the workspace is My workspace.\n\n\n\n\n\n\n\n\n\n\n\nCaution\n\n\n\nMake sure to allocate the right number of hours before submitting the job. If the time runs out, the job will be canceled, and all progress will be lost. However, you can always extend the job duration if more time is required after submission.\nTime can pass quickly when working, so we recommend initially allocating 2 hours for the job. Now, we are ready to submit the job and launch the virtual environment of the Proteomics Sandbox Application.\n\n\n\nDownload Data from the Paper\nInitially, we will need to download the paper’s data. For this exercise, we will only use one sample file from each Plex Set/Pool.\nWe will use the terminal in the virtual environment for downloading the data.\nNow, we can access the FTP server where the data is located. You will need the server address from the correct FTP-server, which can be found on the site for the accession code PXD008841 in ProteomeXchange, previously visited. 
At the bottom of the page, you will find the FTP-server address where the data is stored.\n\nPlease locate the address.\n\nClick on the “Dataset FTP location” link.\n\nWe now have access to the data stored on the FTP server. Please provide a brief description of the contents of the folder on the FTP server.\n\nTo download one sample file from each of the Plex Sets, we will need these URLs only:\nhttps://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool1_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool2_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150318_TMT_pool3_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150322_TMT_pool4_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150329_TMT_pool5_300ugIPG37-49_7of15ul_fr01.raw\n\n\n(You can also download this list here).\n\n\n\n\n\n\nNote\n\n\n\nWe recommend to open this material inside Proteomics Sandbox to be able to copy & paste or download the file directly into the environment.\n\n\nAfter saving the list of URLs to a file named urls.txt, you can use the following command in the terminal. Make sure you are in the correct directory where urls.txt is located before running the code below to ensure the file is found correctly:\nwget -i urls.txt\nIf you added your own private folder to the UCloud session, you can now move the data into that folder for better data management.\nNext, we can launch FragPipe, which is located on the desktop. In this tutorial, we are using FragPipe version 22.0 within the October 2024 version of the Proteomics Sandbox application, available here.\nNow that FragPipe is launched, we need to configure the settings before running the analysis. 
To assist you in setting up the settings in FragPipe, we have provided some guiding questions:\n\n\nGetting started with FragPipe\n\n\n\n\n\n\nNote\n\n\n\nSome of the information you will need in this section can be found in Supplementary Information to the study. Open the Supplementary Information and go to page 25, Supplementary Methods.\n\n\nGo to the Workflow tab to set up the workflow for the analysis and import the data you just downloaded.\n\nWhich workflow should you select? Hint: What labeling method was used in the study?\n\n\nHow does the labeling method affect data processing?\n\nClick Load workflow after you have selected the appropriate workflow.\nNext, add your files by clicking on Add files and locating them in the designated folder for your raw files. Assign each file to a separate experiment by clicking Consecutive.\nGo to the Quant (Isobaric) tab. Here, you need to provide annotations for TMT channels. Use the five pool annotations that you downloaded from this page. You will need to upload them to UCloud and specify the corresponding annotation file for each experiment in order.\nNow, navigate to the Database tab. Here you can either download a new database file or browse for an existing one. In this case, we will download the latest database file by clicking the Download button in FragPipe. Be sure to add contaminants and decoys.\n\nWhat is the purpose of the database file used in FragPipe, and why is it important?\n\n\nWhich organism should you choose when downloading the database file?\n\n\nDescribe the relationship between decoys and false discovery rate (FDR) by answering the following questions:\n\n\nWhat are decoys?\n\n\nWhy should you include decoys?\n\n\nWhat role do decoys play in estimating the FDR?\n\n\n\nNext, you can go to the “MSFragger” tab to adjust the parameter settings for the search and matching of the theoretical and experimental peptide spectra. 
The search parameters to be used are listed in Supplementary Methods.\n\nWhat parameters did you set?\n\nWhen all settings have been obtained, MSFragger should look something like this:\n\nWhat is MSFragger? What does it do?\n\n\nHow does MSFragger operate?\n\n\n\n\n\n\n\nNote\n\n\n\nYou can also skip configuring MSFragger manually and just use this parameter file. You will need to upload it to UCloud and then load it on the “MSFragger” tab in FragPipe.\n\n\nFinally, we can navigate to the “Run” tab and run the analysis. For that, we must choose an output directory for the results of the search made by FragPipe. Once you have adjusted that, you are ready to click on “Run”.\nThis process might take some time, so make sure that you still have enough hours allocated on your job on UCloud—otherwise, it will get terminated. Meanwhile, you can answer these questions:\n\nWhat are your expectations regarding the output results? Consider the implications of the number of files provided for this search in your response.\n\n\nCan the output from this analysis be reliably used for downstream applications given the limited number of sample files? Justify your answer.\n\n\nWhat does it signify that the sample tissues have been fractionated as described in Supplementary Information?\n\n\n\nOutline the fractionation process utilized.\n\n\nExplain the study design associated with this research.\n\n\nIn your opinion, will increasing the number of fractions improve proteome coverage? Justify your reasoning.\n\n\nWhen the run in FragPipe is done, please locate the output results and get an overview of the output.\n\nWhat types of output are generated by FragPipe?\n\nFor the downstream analysis, we will use the output from the list of combined proteins, which we will explore further in the following section.\n\n\nInterpretation and Analysis of FragPipe Results\nFor this part, we will use output files based on a run with FragPipe using all sample files (i.e., 5x72 raw files). 
That file can be downloaded here???\nNow, we will look at the output from FragPipe, where we will use the file named combined_proteins.tsv. Initially, we will explore the contents of the file locally. Therefore, you should download the file from UCloud and view it locally in a file editor such as Excel.\nYou can download the file by clicking on the file in your output directory in the UCloud interface, from where you can choose to download it.\n\nProvide a concise overview of the table’s contents. What information is represented in the rows and columns?\n\nFor the downstream analysis, we will use the columns containing the TMT intensities across the proteins identified.\nFor that we will use OmicsQ, which is a toolkit for quantitative proteomics. OmicsQ can be used to facilitate the processing of quantitative data from Omics type experiments. Additionally, it also serves as an entrypoint for using apps like PolySTest [SCHWAMMLE20201396] for statistical testing, VSClust for clustering and ComplexBrowser for the investigation of the behavior of protein complexes."
},
{
"objectID": "teachingmodule.html#part-3-data-screening-multi-variate-analysis-and-clustering",
@@ -88,7 +88,7 @@
"href": "TeachingModule/AnalysisMSData_FragPipe.html",
"title": "Clinical Proteomics",
"section": "",
- "text": "In this section of the teaching module, we will work with data from the paper. The first task is to download sample files from the paper, guided by the questions provided below:\n\nWhere can the data be found?\n\n\nWhat is the ProteomeXChange database?\n\n\nWhat accession code is used for the data deposited in ProteomeXChange?\n\nBy examining the accession code for the data deposited on ProteomeXChange, we can access and download the data using FTP.\n\nWhat is FTP, and what is its functionality?\n\nFor downloading the data, we will use the Proteomics Sandbox Application on UCloud. This platform allows us to access the necessary storage capacity as well as the computational power required to execute this process.\nThe Proteomics Sandbox Application is a virtual environment that includes multiple software tools, including FragPipe for analyzing proteomics data.\nYou can find the Proteomics Sandbox Application on UCloud here.\nFirst, we will download the data for the sample files to be used in FragPipe. Then, we will launch FragPipe to run the first analysis of the data. Before doing so, we have some questions regarding FragPipe and its usability:\n\nWhat is FragPipe, and what are its applications?\n\n\nIf FragPipe were not used for this part of the teaching module, which alternative software tools could be employed? Please provide a few examples.\n\n\nWhat are the benefits of using FragPipe?\n\nNow that we know what we want to do and why, it is time to start the Proteomics Sandbox application, or job. Simple analyses in FragPipe may only require 8 GB of RAM, while large-scale or complex analyses may require 24 GB of memory or more (FragPipe Documentation), which is why we will allocate 24 GB for this exercise.\nIn UCloud, the settings should look like this:\n\n\n\n\n\nBefore submitting the job, it is highly recommended to create a personal folder to securely store both your data and the results generated by FragPipe. 
+ "text": "In this section of the teaching module, we will work with data from the paper. The first task is to download sample files from the paper, guided by the questions provided below:\n\nWhere can the data be found?\n\n\nWhat is the ProteomeXChange database?\n\n\nWhat accession code is used for the data deposited in ProteomeXChange?\n\nBy examining the accession code for the data deposited on ProteomeXChange, we can access and download the data using FTP.\n\nWhat is FTP, and what is its functionality?\n\nFor downloading the data, we will use the Proteomics Sandbox Application on UCloud. This platform allows us to access the necessary storage capacity as well as the computational power required to execute this process.\nThe Proteomics Sandbox Application is a virtual environment that includes multiple software tools, including FragPipe for analyzing proteomics data.\nYou can find the Proteomics Sandbox Application on UCloud here.\nFirst, we will download the data for the sample files to be used in FragPipe. Then, we will launch FragPipe to run the first analysis of the data. Before doing so, we have some questions regarding FragPipe and its usability:\n\nWhat is FragPipe, and what are its applications?\n\n\nIf FragPipe were not used for this part of the teaching module, which alternative software tools could be employed? Please provide a few examples.\n\n\nWhat are the benefits of using FragPipe?\n\nNow that we know what we want to do and why, it is time to start the Proteomics Sandbox application, or job. Simple analyses in FragPipe may only require 8 GB of RAM, while large-scale or complex analyses may require 24 GB of memory or more (FragPipe Documentation), which is why we will allocate 24 GB for this exercise.\nIn UCloud, the settings should look like this:\n\n\n\n\n\nBefore submitting the job, it is highly recommended to create a personal folder to securely store both your data and the results generated by FragPipe. 
Follow the step-by-step guide below for an effortless setup:\n\nFirst, click on the vibrant blue Add folder button.\nNext, select the exact directory you wish to mount, as illustrated below:\n\n\n\n\n\n\nUpon clicking, a window similar to the one below will appear. Here, you have the option to either create a specific folder within a particular drive in the workspace you’ve chosen or simply select the entire drive itself. In this example, the drive is labeled as Home and the workspace is My workspace.\n\n\n\n\n\n\n\n\n\n\n\nCaution\n\n\n\nMake sure to allocate the right number of hours before submitting the job. If the time runs out, the job will be canceled, and all progress will be lost. However, you can always extend the job duration if more time is required after submission.\nTime can pass quickly when working, so we recommend initially allocating 2 hours for the job. Now, we are ready to submit the job and launch the virtual environment of the Proteomics Sandbox Application.\n\n\n\nDownload Data from the Paper\nInitially, we will need to download the paper’s data. For this exercise, we will only use one sample file from each Plex Set/Pool.\nWe will use the terminal in the virtual environment for downloading the data.\nNow, we can access the FTP server where the data is located. You will need the server address from the correct FTP-server, which can be found on the site for the accession code PXD008841 in ProteomeXchange, previously visited. At the bottom of the page, you will find the FTP-server address where the data is stored.\n\nPlease locate the address.\n\nClick on the “Dataset FTP location” link.\n\nWe now have access to the data stored on the FTP server. 
Please provide a brief description of the contents of the folder on the FTP server.\n\nTo download one sample file from each of the Plex Sets, we will need these URLs only:\nhttps://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool1_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool2_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150318_TMT_pool3_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150322_TMT_pool4_300ugIPG37-49_7of15ul_fr01.raw\nhttps://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150329_TMT_pool5_300ugIPG37-49_7of15ul_fr01.raw\n\n(You can also download this list here).\n\n\n\n\n\n\nNote\n\n\n\nWe recommend opening this material inside the Proteomics Sandbox so that you can copy & paste or download the file directly into the environment.\n\n\nAfter saving the list of URLs to a file named urls.txt, you can use the following command in the terminal. Make sure you are in the directory containing urls.txt before running it:\nwget -i urls.txt\nIf you added your own private folder to the UCloud session, you can now move the data into that folder for better data management.\nNext, we can launch FragPipe, which is located on the desktop. In this tutorial, we are using FragPipe version 22.0 within the October 2024 version of the Proteomics Sandbox Application, available here.\nNow that FragPipe is launched, we need to configure the settings before running the analysis. To help you configure FragPipe, we have provided some guiding questions:\n\n\nGetting started with FragPipe\n\n\n\n\n\n\nNote\n\n\n\nSome of the information you will need in this section can be found in Supplementary Information to the study. 
Open the Supplementary Information and go to page 25, Supplementary Methods.\n\n\nGo to the Workflow tab to set up the workflow for the analysis and import the data you just downloaded.\n\nWhich workflow should you select? Hint: What labeling method was used in the study?\n\n\nHow does the labeling method affect data processing?\n\nClick Load workflow after you have selected the appropriate workflow.\nNext, add your files by clicking on Add files and locating them in the designated folder for your raw files. Assign each file to a separate experiment by clicking Consecutive.\nGo to the Quant (Isobaric) tab. Here, you need to provide annotations for TMT channels. Use the five pool annotations that you downloaded from this page. You will need to upload them to UCloud and specify the corresponding annotation file for each experiment in order.\nNow, navigate to the Database tab. Here you can either download a new database file or browse for an existing one. In this case, we will download the latest database file by clicking the Download button in FragPipe. Be sure to add contaminants and decoys.\n\nWhat is the purpose of the database file used in FragPipe, and why is it important?\n\n\nWhich organism should you choose when downloading the database file?\n\n\nDescribe the relationship between decoys and false discovery rate (FDR) by answering the following questions:\n\n\nWhat are decoys?\n\n\nWhy should you include decoys?\n\n\nWhat role do decoys play in estimating the FDR?\n\n\n\nNext, go to the MSFragger tab to adjust the search parameters used to match experimental spectra against theoretical peptide spectra. The search parameters to be used are listed in Supplementary Methods.\n\nWhat parameters did you set?\n\nWhen all settings have been configured, the MSFragger tab should look something like this:\n\nWhat is MSFragger? 
What does it do?\n\n\nHow does MSFragger operate?\n\n\n\n\n\n\n\nNote\n\n\n\nYou can also skip configuring MSFragger manually and just use this parameter file. You will need to upload it to UCloud and then load it on the MSFragger tab in FragPipe.\n\n\nFinally, we can navigate to the Run tab and run the analysis. For that, we must choose an output directory for the results of the search made by FragPipe. Once you have set that, you are ready to click on Run.\nThis process might take some time, so make sure that you still have enough hours allocated on your job on UCloud; otherwise, it will get terminated. Meanwhile, you can answer these questions:\n\nWhat are your expectations regarding the output results? Consider the implications of the number of files provided for this search in your response.\n\n\nCan the output from this analysis be reliably used for downstream applications given the limited number of sample files? Justify your answer.\n\n\nWhat does it signify that the sample tissues have been fractionated as described in Supplementary Information?\n\n\n\nOutline the fractionation process utilized.\n\n\nExplain the study design associated with this research.\n\n\nIn your opinion, will increasing the number of fractions improve proteome coverage? Justify your reasoning.\n\n\nWhen the FragPipe run is done, please locate the output files and get an overview of them.\n\nWhat types of output are generated by FragPipe?\n\nFor the downstream analysis, we will use the output from the list of combined proteins, which we will explore further in the following section.\n\n\nInterpretation and Analysis of FragPipe Results\nFor this part, we will use output files based on a run with FragPipe using all sample files (i.e., 5×72 raw files). That file can be downloaded here???\nNow, we will look at the output from FragPipe, where we will use the file named combined_proteins.tsv. Initially, we will explore the contents of the file locally. 
Therefore, you should download the file from UCloud and view it locally in a spreadsheet editor such as Excel.\nYou can download it by clicking on the file in your output directory in the UCloud interface and choosing Download.\n\nProvide a concise overview of the table’s contents. What information is represented in the rows and columns?\n\nFor the downstream analysis, we will use the columns containing the TMT intensities across the identified proteins.\nFor that, we will use OmicsQ, a toolkit for quantitative proteomics. OmicsQ facilitates the processing of quantitative data from omics-type experiments. It also serves as an entry point for apps like PolySTest [SCHWAMMLE20201396] for statistical testing, VSClust for clustering, and ComplexBrowser for investigating the behavior of protein complexes."
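The download step described earlier in this section can be sketched end-to-end in the terminal. This is a minimal sketch: the folder name raw_files is an example choice, not part of the original instructions, and the actual wget call is left commented out because each .raw file is large.

```shell
# Create an example target folder and write the five sample-file URLs into it
mkdir -p raw_files
cat > raw_files/urls.txt <<'EOF'
https://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool1_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOS2U_20140410_TMTpool2_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150318_TMT_pool3_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150322_TMT_pool4_300ugIPG37-49_7of15ul_fr01.raw
https://storage.jpostdb.org/JPST000265/HJOSLO2U_QEHF_20150329_TMT_pool5_300ugIPG37-49_7of15ul_fr01.raw
EOF
# -i reads URLs from the file, -c resumes interrupted downloads,
# -P writes the files into the target folder. Uncomment to start:
# wget -c -P raw_files -i raw_files/urls.txt
```

Keeping the URL list in a file next to the raw data makes it easy to re-run the download if the UCloud job is cancelled before all files arrive.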
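The decoy questions above can be made concrete with a toy calculation. The counts below are invented for illustration, not taken from this study, and decoys/targets is only one common way to estimate the FDR at a score threshold.

```shell
# Toy FDR estimate: if a search accepts 950 target matches and 50 decoy
# matches above a score threshold, the decoys approximate the number of
# false positives among the targets.
targets=950   # accepted target (forward-database) matches
decoys=50     # accepted decoy (reversed/shuffled-database) matches
awk -v t="$targets" -v d="$decoys" 'BEGIN { printf "estimated FDR = %.3f\n", d / t }'
# prints: estimated FDR = 0.053
```

Raising the score threshold removes decoys faster than targets, which is how search tools filter results to a chosen FDR such as 1%.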
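Before opening combined_proteins.tsv in Excel, you can get a quick first look from the terminal. The snippet below uses a tiny mock file standing in for the real FragPipe output; the column names and values are illustrative only, and the real table has many more columns and thousands of rows.

```shell
# Mock stand-in for FragPipe's combined_proteins.tsv (illustrative columns)
printf 'Protein\tGene\tsample1 Intensity\nsp|P02768|ALBU_HUMAN\tALB\t1.2e9\n' > combined_proteins.tsv
# List the column names, one per line
head -n 1 combined_proteins.tsv | tr '\t' '\n'
# Count protein rows (all lines except the header)
tail -n +2 combined_proteins.tsv | wc -l
```

The same two commands work unchanged on the real file, so you can check how many proteins were quantified and which intensity columns to feed into OmicsQ.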
},
{
"objectID": "TeachingModule/Preliminarywork.html",
diff --git a/sitemap.xml b/sitemap.xml
index b76c35a..eed6cd1 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,54 +2,54 @@
https://hds-sandbox.github.io/proteomics-sandbox/Create SDRF.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.692Z
https://hds-sandbox.github.io/proteomics-sandbox/coursematerials.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.696Z
https://hds-sandbox.github.io/proteomics-sandbox/contributors.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.696Z
https://hds-sandbox.github.io/proteomics-sandbox/teachingmodule.html
- 2024-11-12T13:46:53.700Z
+ 2024-11-12T13:59:58.740Z
https://hds-sandbox.github.io/proteomics-sandbox/colabfold.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.696Z
https://hds-sandbox.github.io/proteomics-sandbox/TeachingModule/AnalysisMSData_FragPipe.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.692Z
https://hds-sandbox.github.io/proteomics-sandbox/TeachingModule/Preliminarywork.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.696Z
https://hds-sandbox.github.io/proteomics-sandbox/TeachingModule/DataScreening_Multivariate.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.696Z
https://hds-sandbox.github.io/proteomics-sandbox/sdrf.html
- 2024-11-12T13:46:53.700Z
+ 2024-11-12T13:59:58.740Z
https://hds-sandbox.github.io/proteomics-sandbox/fragpipe.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.696Z
https://hds-sandbox.github.io/proteomics-sandbox/setup.html
- 2024-11-12T13:46:53.700Z
+ 2024-11-12T13:59:58.740Z
https://hds-sandbox.github.io/proteomics-sandbox/index.html
- 2024-11-12T13:46:53.696Z
+ 2024-11-12T13:59:58.732Z
https://hds-sandbox.github.io/proteomics-sandbox/gettingstarted.html
- 2024-11-12T13:46:53.656Z
+ 2024-11-12T13:59:58.696Z
diff --git a/teachingmodule.html b/teachingmodule.html
index 6480258..85f069f 100644
--- a/teachingmodule.html
+++ b/teachingmodule.html
@@ -22,6 +22,40 @@
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
+/* CSS for syntax highlighting */
+pre > code.sourceCode { white-space: pre; position: relative; }
+pre > code.sourceCode > span { line-height: 1.25; }
+pre > code.sourceCode > span:empty { height: 1.2em; }
+.sourceCode { overflow: visible; }
+code.sourceCode > span { color: inherit; text-decoration: inherit; }
+div.sourceCode { margin: 1em 0; }
+pre.sourceCode { margin: 0; }
+@media screen {
+div.sourceCode { overflow: auto; }
+}
+@media print {
+pre > code.sourceCode { white-space: pre-wrap; }
+pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; }
+}
+pre.numberSource code
+ { counter-reset: source-line 0; }
+pre.numberSource code > span
+ { position: relative; left: -4em; counter-increment: source-line; }
+pre.numberSource code > span > a:first-child::before
+ { content: counter(source-line);
+ position: relative; left: -1em; text-align: right; vertical-align: baseline;
+ border: none; display: inline-block;
+ -webkit-touch-callout: none; -webkit-user-select: none;
+ -khtml-user-select: none; -moz-user-select: none;
+ -ms-user-select: none; user-select: none;
+ padding: 0 4px; width: 4em;
+ }
+pre.numberSource { margin-left: 3em; padding-left: 4px; }
+div.sourceCode
+ { }
+@media screen {
+pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
+}
@@ -731,11 +765,11 @@ Download Data
We recommend to open this material inside Proteomics Sandbox to be able to copy & paste or download the file directly into the environment.
-After you save the list of URLs into a file named urls.txt
. you can use the following code in the terminal:
-wget -i urls.txt
-If you added your own private folder to the UCloud session, you could now move the data into that folder for better management of the data you’re working with.
-Next, we can launch FragPipe, which is located on the desktop. In this tutorial, we are using FragPipe version 22.0 in the October 2024 version of the Proteomics Sandbox Application.
-Now that we have launched FragPipe, we need to configure the settings prior to running the analysis. Therefore, we have provided some guiding questions to help you set up the settings in FragPipe:
+After saving the list of URLs to a file named urls.txt
, you can use the following command in the terminal. Make sure you are in the correct directory where urls.txt
is located before running the code below to ensure the file is found correctly:
+
+If you added your own private folder to the UCloud session, you can now move the data into that folder for better data management.
+Next, we can launch FragPipe, which is located on the desktop. In this tutorial, we are using FragPipe version 22.0 within the October 2024 version of the Proteomics Sandbox application, available here.
+Now that FragPipe is launched, we need to configure the settings before running the analysis. To assist you in setting up the settings in FragPipe, we have provided some guiding questions:
Getting started with FragPipe
@@ -752,17 +786,17 @@ Getting star
Some of the information you will need in this section can be found in Supplementary Information to the study. Open the Supplementary Information and go to page 25, Supplementary Methods.
-Go to the “Workflow” tab to set up the workflow for the analysis and import the data you have just downloaded.
+Go to the Workflow
tab to set up the workflow for the analysis and import the data you just downloaded.
Which workflow should you select? Hint: What labeling method was used in the study?
How does the labeling method affect data processing?
-Click “Load workflow” after you have found and selected the correct workflow to be used.
-Next, add your files by clicking on “Add files” and locate them in the designated folder for your raw files that you previously created. Assign each file to a separate experiment by clicking “Consecutive”.
-Go to the “Quant (Isobaric)” tab. Here, you need to provide annotations for TMT channels. Use the five pool annotations that you downloaded from this page. You will need to upload them to Ucloud and specify the corresponding annotation file for each experiement in order.
-Now you should relocate to the “Database” tab. Here you can either download or browse for an already preexisting database file. In this case, we will simply download the latest database file by clicking the “Download” button in FragPipe. Add contaminants and decoys.
+Click Load workflow
after you have selected the appropriate workflow.
+Next, add your files by clicking on Add files
and locating them in the designated folder for your raw files. Assign each file to a separate experiment by clicking Consecutive
.
+Go to the Quant (Isobaric)
tab. Here, you need to provide annotations for TMT channels. Use the five pool annotations that you downloaded from this page. You will need to upload them to UCloud and specify the corresponding annotation file for each experiment in order.
+Now, navigate to the Database
tab. Here you can either download a new database file or browse for an existing one. In this case, we will download the latest database file by clicking the Download
button in FragPipe. Be sure to add contaminants and decoys.
What is the purpose of the database file used in FragPipe, and why is it important?