Absolute quantification of prokaryotes in the microbiome by 16S rRNA qPCR or ddPCR
This repo contains the following directories:
src contains source code
scripts contains scripts
qpcr_specific_analysis.py follows the steps in "qPCR-specific analysis" in the protocol to perform quality control and calculate 16S rRNA copies per reaction from qPCR data
ddpcr_specific_analysis.py follows the steps in "ddPCR-specific analysis" in the protocol to perform quality control and calculate 16S rRNA copies per reaction from ddPCR data
universal_analysis.py follows the steps in "Universal analysis" in the protocol to assess controls and calculate 16S rRNA copies per dry gram of input stool
artifacts contains key general files that the scripts rely on
examples contains examples
qpcr_v1 includes toy data input files, bash files to run the scripts, toy data example expected output, and a template Jupyter notebook
ddpcr_v1 includes toy data input files, bash files to run the scripts, toy data example expected output, and a template Jupyter notebook
manuscripts contains data and code specific to associated manuscripts
Typical installation time: 5-10 minutes.
Clone the repository
git clone https://github.com/bhattlab/absolute-abundance-16s.git
Install dependencies
Install conda (https://developers.google.com/earth-engine/guides/python_install-conda/), if not already installed. This will provide Python.
Then execute the following commands.
pip install click
pip install matplotlib
pip install pandas
pip install scipy
pip install seaborn
pip install openpyxl
pip install jupyterlab
Note: Most versions of packages should work. For a list of the specific package versions used during testing, refer to requirements_pinned.txt. Tested on Linux Ubuntu 20.04.
Typical run time: less than 10 seconds.
Navigate to the qPCR example directory
cd absolute-abundance-16s/examples/qpcr_v1
Then make an output directory
mkdir output
Perform qPCR-specific analysis
bash qpcr_specific_analysis.sh
And then use the output from qPCR-specific analysis for universal analysis
bash universal_analysis_after_qpcr.sh
As each analysis script proceeds, it will output key information to the command line. The output files will appear in output and can be compared to those in output_expected.
Navigate to the ddPCR example directory
cd absolute-abundance-16s/examples/ddpcr_v1
Then make an output directory
mkdir output
Perform ddPCR-specific analysis
bash ddpcr_specific_analysis.sh
And then use the output from ddPCR-specific analysis for universal analysis
bash universal_analysis_after_ddpcr.sh
As each analysis script proceeds, it will output key information to the command line. The output files will appear in output and can be compared to those in output_expected.
We highly recommend analyzing data interactively. One way to do this is with a notebook in Jupyter Lab.
To see a notebook version of either the qPCR or ddPCR example analysis, first navigate to absolute-abundance-16s.
Then launch Jupyter Lab:
jupyter lab
Then open http://localhost:8888/lab in your browser window.
Open the file examples/qpcr_v1/qpcr_notebook_template.ipynb for qPCR (including universal analysis) or examples/ddpcr_v1/ddpcr_notebook_template.ipynb for ddPCR (including universal analysis).
These are set up to view the example data, but you may duplicate the notebooks and edit the file paths for your own data. The input files (see Prepare input files) and parameters (see User adjustable parameters) are the same as in the scripts. The notebooks do not automatically save any outputs, but they can be edited to do so.
1. qPCR data Excel spreadsheet with two columns
column name | description |
Well | Well number on the 384-well plate, e.g. A1 through P24 |
Cq | Numerical value or Undetermined if a given well was empty or was a failed technical replicate |
2. qPCR layout Excel spreadsheet with seven columns
column name | description |
Well96 | Well number on the 96-well plate, e.g. A1 through H12 |
Name | Must be unique for each sample, control, and standard dilution. Should be NIST_A for component A of NIST etc. and NIST_mix_A_R for the NIST mixture from Reagent Setup. Other specific names do not matter. |
LiquidHandlerDilution | Numerical value that corresponds to the liquid handler or 96-well format dilution, e.g. 1000 for 1:1000 dilution. Should be 1 if no dilution is performed. |
SinglePipettorDilution | Numerical value that corresponds to the single pipettor dilution, e.g. 1000 for 1:1000 dilution. Should be 1 if no dilution is performed. |
Type | Options in qPCR are "PCRPos" (e.g. for NIST), "PCRNeg" (e.g. for no template control), "DNAPos" (e.g. for Zymo mock), "DNANeg" (e.g. for extraction from water), "Pvul" (e.g. for P. vulgatus standard curve dilution points), "Fpra" (e.g. for F. prausnitzii standard curve dilution points), and "Sample" (e.g. for all samples). |
uLAdded | The number of diluted uL of sample, control, or standard added to the reaction. This value is 6 uL in our protocol. |
ElutionVolume | The elution volume (uL) in the final step of DNA extraction. Must be provided for Type=DNANeg, DNAPos, or Sample. This value is 100 uL in our protocol. |
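As a rough illustration of how the layout columns relate to each other, the sketch below scales a 16S rRNA copies-per-reaction value back to the undiluted DNA. It assumes the two dilution factors are applied in series (so they multiply) and that copies per DNA extraction is copies per undiluted uL times ElutionVolume. The function name back_calculate is hypothetical, the input numbers are made up, and the scripts may organize this calculation differently.

```python
def back_calculate(copies_per_rxn, ul_added, liquid_handler_dilution,
                   single_pipettor_dilution, elution_volume_ul=None):
    """Scale copies per reaction back to the undiluted DNA (illustrative only)."""
    # Assumption: the two dilutions are applied in series, so they multiply.
    total_dilution = liquid_handler_dilution * single_pipettor_dilution
    copies_per_undiluted_ul = copies_per_rxn / ul_added * total_dilution
    result = {
        "total_dilution": total_dilution,
        "copies_per_undiluted_uL": copies_per_undiluted_ul,
    }
    # ElutionVolume is only provided for Type = DNANeg, DNAPos, or Sample.
    if elution_volume_ul is not None:
        result["copies_per_extraction"] = copies_per_undiluted_ul * elution_volume_ul
    return result


# Made-up example: 6000 copies in a reaction that received 6 uL of a 1:1000 dilution,
# from an extraction eluted in 100 uL.
example = back_calculate(6000, ul_added=6, liquid_handler_dilution=1000,
                         single_pipettor_dilution=1, elution_volume_ul=100)
print(example)
```

With no dilution performed, both dilution columns are 1 and the total dilution is 1.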
1. ddPCR data Excel spreadsheet with four columns
column name | description |
Well | Well number on the 96-well plate, e.g. A01 through H12 |
Accepted Droplets | The number of accepted droplets, i.e. the sum of positive droplets and negative droplets |
Positives | The number of positive droplets, i.e. droplets above the threshold set in QX Manager |
Negatives | The number of negative droplets, i.e. droplets below the threshold set in QX Manager |
2. ddPCR layout Excel spreadsheet with seven columns
column name | description |
Well96 | Well number on the 96-well plate, e.g. A1 through H12 |
Name | Must be unique for each sample and control. Should be NIST_A for component A of NIST etc. and NIST_mix_A_R for the NIST mixture from Reagent Setup. Other specific names do not matter. |
LiquidHandlerDilution | Numerical value that corresponds to the liquid handler or 96-well format dilution, e.g. 1000 for 1:1000 dilution. Should be 1 if no dilution is performed. |
SinglePipettorDilution | Numerical value that corresponds to the single pipettor dilution, e.g. 1000 for 1:1000 dilution. Should be 1 if no dilution is performed. |
Type | Options in ddPCR are "PCRPos" (e.g. for NIST), "PCRNeg" (e.g. for no template control), "DNAPos" (e.g. for Zymo mock), "DNANeg" (e.g. for extraction from water), and "Sample" (e.g. for all samples). |
uLAdded | The number of diluted uL of sample, control, or standard added to the reaction. This value is 6 uL in our protocol. |
ElutionVolume | The elution volume (uL) in the final step of DNA extraction. Must be provided for Type=DNANeg, DNAPos, or Sample. This value is 100 uL in our protocol. |
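The droplet counts in the ddPCR data spreadsheet can be turned into copies per reaction with the standard Poisson correction used for digital PCR. The sketch below is a minimal version of that textbook calculation, not a copy of ddpcr_specific_analysis.py. The default droplet volume (0.795 nL), reaction volume (22 uL), minimum accepted droplets (10000), and minimum negative droplets (10) are the script defaults listed in its help text; the function names and the exact way the script combines these values are assumptions.

```python
import math


def ddpcr_copies_per_rxn(positives, negatives,
                         droplet_volume_nl=0.795, rxn_volume_ul=22.0):
    """Poisson-corrected copies per reaction from droplet counts (illustrative)."""
    accepted = positives + negatives
    if negatives == 0:
        raise ValueError("no negative droplets: well is saturated, dilute further")
    # Mean copies per droplet, from the fraction of droplets that are negative.
    copies_per_droplet = -math.log(negatives / accepted)
    # Convert the droplet volume from nL to uL to get copies per uL of reaction.
    copies_per_ul = copies_per_droplet / (droplet_volume_nl / 1000.0)
    return copies_per_ul * rxn_volume_ul


def well_passes_droplet_qc(positives, negatives,
                           min_accepted=10000, min_negatives=10):
    """Check the droplet-count thresholds (assumed to be inclusive)."""
    return (positives + negatives) >= min_accepted and negatives >= min_negatives


# Made-up example well: 5000 positive and 10000 negative droplets.
print(round(ddpcr_copies_per_rxn(5000, 10000)))
print(well_passes_droplet_qc(5000, 10000))
```

The logarithm is what makes the estimate robust to droplets that contain more than one template copy, which is why wells with very few negative droplets are treated as too concentrated.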
1. The output from qPCR-specific or ddPCR-specific analysis, specifically the "for_universal_analysis" sheet
2. Weights Excel spreadsheet with six columns
column name | description |
Name | Must correspond to the Name from the qPCR or ddPCR layout file |
empty_wt | The mass (g) of the empty tube used for drying to measure moisture content |
filled_wt | The mass (g) of the drying tube containing the wet stool aliquot, before drying |
dry_wt | The mass (g) of the drying tube containing the same stool aliquot, after drying |
empty_PB | The mass (g) of the empty PowerBead tube for DNA extraction |
filled_PB | The mass (g) of the PowerBead tube with the wet stool for DNA extraction in it |
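To show how the six weights columns fit together, here is a minimal sketch of one plausible way to get from tube masses to 16S rRNA copies per dry gram. It assumes the water fraction measured on the drying aliquot also applies to the aliquot used for DNA extraction. The function names and the example masses are hypothetical, and universal_analysis.py may structure the arithmetic differently.

```python
import math


def water_fraction(empty_wt, filled_wt, dry_wt):
    """Fraction of the wet stool mass that was lost on drying."""
    wet_mass = filled_wt - empty_wt
    dried_mass = dry_wt - empty_wt
    return (wet_mass - dried_mass) / wet_mass


def dry_extraction_input(empty_pb, filled_pb, water_frac):
    """Dry-equivalent mass (g) of the stool placed in the PowerBead tube."""
    wet_input = filled_pb - empty_pb
    return wet_input * (1.0 - water_frac)


# Made-up masses in grams.
wf = water_fraction(empty_wt=1.000, filled_wt=1.100, dry_wt=1.025)
dry_input = dry_extraction_input(empty_pb=2.000, filled_pb=2.200, water_frac=wf)

# Made-up copies per DNA extraction, divided by the dry-equivalent input.
copies_per_dry_gram = 1e8 / dry_input
print(round(wf, 3), round(dry_input, 3), f"{copies_per_dry_gram:.2e}")
```

Because the dry mass is a small difference between two weighings, small weighing errors matter more as the water fraction approaches 1.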
Typical run time: less than 10 seconds.
Create a new directory for the prepared input files from the previous step and create another directory for the script output.
Edit and execute the following command for qPCR-specific analysis to perform quality control and calculate 16S rRNA copies per reaction.
python full_path_to/scripts/qpcr_specific_analysis.py -q full_path_to/qpcr_data.xlsx -qs Sheet1 -l full_path_to/qpcr_layout.xlsx -ls Sheet1 -f full_path_to/artifacts/format_conversion.tsv -o outputfolder --pcopy insertpcopy --fcopy insertfcopy
The script will output key information to the command line and files to the specified output directory. The qPCR-specific analysis script saves plots of the standard curve after each step:
filename | description |
standard_curve_visualization_post_step_76.pdf | Initial visualization |
standard_curve_visualization_post_step_77.pdf | After removing standard curve dilution points with a large technical replicate span |
standard_curve_visualization_post_step_78.pdf | After removing concentrated dilution points that plateau |
standard_curve_visualization_post_step_79.pdf | After removing dilute dilution points too near the limit of blank |
standard_curve_visualization_final_post_step_81.pdf | The resulting dilution points and final regression line |
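For readers unfamiliar with qPCR standard curves, the sketch below fits the usual straight line of Cq against log10 of copies per reaction and then inverts it to estimate copies from a Cq. It is a generic least-squares fit on synthetic, perfectly linear data, not the script's implementation. The slope bounds (-3.58 to -3.11) and minimum R squared (0.98) are the script defaults from its help text; the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, cqs):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(cqs) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cqs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def curve_passes(slope, r_squared, most_steep=-3.58, least_steep=-3.11, min_r2=0.98):
    """Apply the default slope and R squared limits (assumed inclusive)."""
    return most_steep <= slope <= least_steep and r_squared >= min_r2


def copies_from_cq(cq, slope, intercept):
    """Invert the fitted line to estimate copies per reaction."""
    return 10 ** ((cq - intercept) / slope)


# Synthetic 10-fold dilution series with a slope of exactly -3.3.
std_copies = [1e7, 1e6, 1e5, 1e4, 1e3]
std_cqs = [38 - 3.3 * math.log10(c) for c in std_copies]
slope, intercept, r_squared = fit_standard_curve(std_copies, std_cqs)
print(round(slope, 2), round(intercept, 2), curve_passes(slope, r_squared))
```

A slope near -3.32 corresponds to a doubling of product every cycle, which is why the allowed slope range brackets that value.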
At the end, the qPCR-specific analysis script outputs an Excel spreadsheet qpcr_specific_output.xlsx with the following sheets:
sheet name | description |
for_universal_analysis | All samples and controls that passed quality control |
removed_wells_tech_reps_fails | All wells removed because they do not have at least two successful technical replicates |
removed_standards | Standard curve dilution points removed due to a large technical replicate span, being too concentrated, or being too dilute |
removed_high_variation_samples | Samples and controls removed due to technical replicate variation |
removed_samples | Samples and controls removed due to being too concentrated, too dilute, or low confidence but not undiluted |
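The technical replicate checks behind the removed_wells_tech_reps_fails and removed_high_variation_samples sheets can be pictured with the sketch below. It assumes an Undetermined Cq is represented as None, that at least two numeric replicates are required, and that the two closest replicates must differ by no more than the default of 2 cycles (--max-cq-diff-sample-closest-two-reps). The function names are hypothetical and the script's exact logic may differ.

```python
def closest_two_reps(cqs):
    """Return the pair of numeric Cq values that are closest together, or None."""
    valid = sorted(cq for cq in cqs if cq is not None)
    if len(valid) < 2:
        return None  # fewer than two successful technical replicates
    # After sorting, the closest pair is always a pair of neighbours.
    neighbours = zip(valid, valid[1:])
    return min(neighbours, key=lambda pair: pair[1] - pair[0])


def passes_replicate_qc(cqs, max_diff=2.0):
    """True if two replicates succeeded and the closest two agree within max_diff."""
    pair = closest_two_reps(cqs)
    return pair is not None and (pair[1] - pair[0]) <= max_diff


print(passes_replicate_qc([20.1, 20.4, 25.0]))  # closest pair agrees well
print(passes_replicate_qc([20.1, None, None]))  # only one successful replicate
print(passes_replicate_qc([20.0, 23.0, None]))  # two replicates, 3 cycles apart
```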
After reviewing these sheets, edit and execute the following command for universal analysis to assess controls and calculate 16S rRNA copies per dry gram of input stool.
python full_path_to/scripts/universal_analysis.py -d outputfolder/qpcr_specific_output.xlsx -ds for_universal_analysis -w full_path_to/weights.xlsx -ws Sheet1 -n full_path_to/artifacts/nist_expected_values_03262024.xlsx -ns Sheet1 -o outputfolder
The universal analysis script outputs an Excel spreadsheet univ_analysis_output.xlsx with the following sheets:
sheet name | description |
from_universal_analysis | All samples |
nist_measured_expected | NIST positive PCR controls measured to expected |
neg_dna_extract_controls | Negative DNA extraction controls |
pos_dna_extract_controls | Positive DNA extraction controls |
extract_input_outside_range | List of samples with DNA extraction input outside of desired range. |
drying_input_outside_range | List of samples with drying aliquot (e.g. for stool moisture content) outside of desired range. |
dry_stool_amount_low | List of samples with a small mass of dried stool after drying, which increases the error |
water_fraction_over_cutoff | List of samples with a water fraction over the cutoff, which similarly indicates increased error |
Note: unlike in qPCR- and ddPCR-specific analysis, the samples identified in the latter four sheets of universal analysis are not removed. It is up to the user to assess the situation on a case-by-case basis.
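To make the four flag sheets concrete, here is a small sketch that applies the default thresholds from the universal analysis help text to one sample and returns the names of the sheets it would land in. Whether the range limits are inclusive is an assumption, as are the function and variable names; the sample is flagged, not removed, matching the note above.

```python
# Default thresholds taken from the universal_analysis.py help text.
DEFAULT_LIMITS = {
    "extract_min_input": 0.15,      # g of stool for DNA extraction
    "extract_max_input": 0.25,
    "drying_min_input": 0.075,      # g of stool dried for moisture content
    "drying_max_input": 0.125,
    "min_dried_dry_mass": 0.008,    # g of stool remaining after drying
    "water_fraction_cutoff": 0.9,
}


def flag_sample(extract_input_g, drying_input_g, dried_mass_g, water_frac,
                limits=DEFAULT_LIMITS):
    """Return the flag sheet names that apply to one sample (illustrative)."""
    flags = []
    if not limits["extract_min_input"] <= extract_input_g <= limits["extract_max_input"]:
        flags.append("extract_input_outside_range")
    if not limits["drying_min_input"] <= drying_input_g <= limits["drying_max_input"]:
        flags.append("drying_input_outside_range")
    if dried_mass_g < limits["min_dried_dry_mass"]:
        flags.append("dry_stool_amount_low")
    if water_frac > limits["water_fraction_cutoff"]:
        flags.append("water_fraction_over_cutoff")
    return flags


print(flag_sample(0.20, 0.10, 0.025, 0.75))  # within every default range
print(flag_sample(0.30, 0.10, 0.005, 0.95))  # flagged on three criteria
```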
Edit and execute the following command for ddPCR-specific analysis to perform quality control and calculate 16S rRNA copies per reaction.
python full_path_to/scripts/ddpcr_specific_analysis.py -d full_path_to/ddpcr_data.xlsx -ds Sheet1 -l full_path_to/ddpcr_layout.xlsx -ls Sheet1 -o outputfolder
The script will output key information to the command line and files to the specified output directory. At the end, the ddPCR-specific analysis script outputs an Excel spreadsheet ddpcr_specific_output.xlsx with the following sheets:
sheet name | description |
for_universal_analysis | All samples and controls that passed quality control |
too_few_droplets | Wells with a low number of droplets |
too_concentrated | Samples and controls removed due to being too concentrated (i.e. need more dilution) |
too_dilute | Samples and controls removed due to being too dilute (i.e. need less dilution) |
After reviewing these sheets, edit and execute the following command for universal analysis to assess controls and calculate 16S rRNA copies per dry gram of input stool.
python full_path_to/scripts/universal_analysis.py -d outputfolder/ddpcr_specific_output.xlsx -ds for_universal_analysis -w full_path_to/weights.xlsx -ws Sheet1 -n full_path_to/artifacts/nist_expected_values_03262024.xlsx -ns Sheet1 -o outputfolder
See the qPCR section for information on the universal analysis script output.
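If you prefer to drive the two-step ddPCR workflow from Python rather than typing the commands, the sketch below assembles the same argument lists shown above. It uses only the flags that appear in those commands; the helper function names are hypothetical and the full_path_to placeholders still need to be replaced with real paths before anything is run.

```python
import shlex


def ddpcr_specific_command(data, layout, output_dir,
                           data_sheet="Sheet1", layout_sheet="Sheet1",
                           script="full_path_to/scripts/ddpcr_specific_analysis.py"):
    """Argument list for the ddPCR-specific analysis step."""
    return ["python", script, "-d", data, "-ds", data_sheet,
            "-l", layout, "-ls", layout_sheet, "-o", output_dir]


def universal_command(specific_output, weights, nist_expected, output_dir,
                      weights_sheet="Sheet1", nist_sheet="Sheet1",
                      script="full_path_to/scripts/universal_analysis.py"):
    """Argument list for the universal analysis step."""
    return ["python", script, "-d", specific_output, "-ds", "for_universal_analysis",
            "-w", weights, "-ws", weights_sheet,
            "-n", nist_expected, "-ns", nist_sheet, "-o", output_dir]


ddpcr_cmd = ddpcr_specific_command("full_path_to/ddpcr_data.xlsx",
                                   "full_path_to/ddpcr_layout.xlsx", "outputfolder")
uni_cmd = universal_command("outputfolder/ddpcr_specific_output.xlsx",
                            "full_path_to/weights.xlsx",
                            "full_path_to/artifacts/nist_expected_values_03262024.xlsx",
                            "outputfolder")

# Print the commands for review; pass each list to subprocess.run(cmd, check=True) to execute.
print(shlex.join(ddpcr_cmd))
print(shlex.join(uni_cmd))
```

Building the commands as lists avoids shell quoting problems when paths contain spaces.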
The scripts provide default values for all parameters, which align with the recommendations provided in the protocol. However, if the user would like to modify parameters in a particular case, a full list of parameters that can be passed on the command line and their descriptions can be found in the help text of each script. Use the following commands to see the help text (also displayed below) in the terminal window.
python scripts/qpcr_specific_analysis.py --help
python scripts/ddpcr_specific_analysis.py --help
python scripts/universal_analysis.py --help
Usage: qpcr_specific_analysis.py [OPTIONS]
Options:
-q, --qpcr-path qPCR data path to Excel file. [required]
-qs, --qpcr-sheet qPCR data Excel sheet name. [required]
-l, --layout-path 96-well layout path to Excel file. [required]
-ls, --layout-sheet 96-well layout Excel sheet name. [required]
-f, --format-conversion-path format conversion file path. [required]
-o, --output-path output folder path. [required]
--pcopy 16S rRNA copies in uLAdded (e.g. 6 uL) of stock P. vulgatus standard plasmid.
[100000000<=x<=1000000000000; required]
--fcopy 16S rRNA copies in uLAdded (e.g. 6 uL) of stock F. prausnitzii standard plasmid.
[100000000<=x<=1000000000000; required]
--num-of-tech-reps number of qPCR technical replicates. 2 replicates will select Rep1 and Rep2 (e.g.
A1, A2) from the format conversion file and 3 replicates will select Rep1, Rep2, and
Rep3 (e.g. A1, A2, B1) from the format conversion file. [default: 3; 2<=x<=3]
--max-cq-span-ntc maximum span of median Cq values of all no template controls. [default: 2;
0.1<=x<=3.3]
--max-cq-span-standard-dil-pt maximum span of Cq values of each standard dilution point. [default: 2;
0.1<=x<=3.3]
--max-standard-dil-pts-removed-tech-var
maximum number of standard dilution points removed for technical replicate
variation. [default: 1; 0<=x<=4]
--min-cq-gap-conc-standards minimum Cq gap between concentrated points of the standard curve (e.g. no plateau).
[default: 3.11; 2.81<=x<=3.41]
--cq-cutoff-conc-standards maximum Cq value to consider gap between concentrated points of the standard curve.
[default: 15; 8<=x<=18]
--cq-standards-sep-lob minimum Cq separation between most dilute standard dilution point and limit of
blank. [default: 2; 1<=x<=6.6]
--max-fold-change-pvul-fpra maximum fold change between P. vulgatus and F. prausnitzii standard curves.
[default: 2; 0.1<=x<=4]
--most-steep-slope-allowed most steep slope allowed for the standard curve. [default: -3.58; -3.98<=x<=-3.31]
--least-steep-slope-allowed least steep slope allowed for the standard curve. [default: -3.11; -3.29<=x<=-2.71]
--min-r-squared minimum R squared value for the standard curve. [default: 0.98; 0.93<=x<=0.999999]
--cq-non-standards-sep-lob minimum Cq separation between samples or controls and limit of blank. [default: 2;
1<=x<=6.6]
--overhang-allowed determines whether samples are allowed to overhang the dilute end of the standard
curve, while remaining the minimum distance from the limit of blank. [default:
False]
--max-copies-rxn-lob maximum acceptable apparent 16S rRNA copies per reaction of the no template
controls. [default: 500; 5<=x<=3000]
--max-cq-diff-sample-closest-two-reps
maximum difference between the Cq values of the two closest technical replicates for
samples or controls. [default: 2; 0.1<=x<=3.3]
--cq-low-conf-sep-lob Cq separation between samples or controls and limit of blank to decide a measurement
is low confidence. [default: 3.3; 1<=x<=6.6]
--help Show this message and exit.
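The --num-of-tech-reps help text gives A1, A2, B1 as the 384-well positions for Rep1, Rep2, and Rep3. The sketch below shows one layout consistent with that example, in which each 96-well position maps to a 2x2 block on the 384-well plate. This quadrant pattern is an assumption made for illustration: the actual mapping is defined by artifacts/format_conversion.tsv, and the function name is hypothetical.

```python
ROWS_96 = "ABCDEFGH"
ROWS_384 = "ABCDEFGHIJKLMNOP"


def wells_384_for(well96, num_reps=3):
    """384-well positions assumed to hold the replicates of one 96-well position."""
    if num_reps not in (2, 3):
        raise ValueError("num_reps must be 2 or 3")
    row = ROWS_96.index(well96[0].upper())  # raises ValueError for rows outside A-H
    col = int(well96[1:])
    if not 1 <= col <= 12:
        raise ValueError("column must be between 1 and 12")
    top, left = 2 * row, 2 * col - 1        # top-left corner of the 2x2 block
    block = [
        f"{ROWS_384[top]}{left}",           # Rep1
        f"{ROWS_384[top]}{left + 1}",       # Rep2
        f"{ROWS_384[top + 1]}{left}",       # Rep3
    ]
    return block[:num_reps]


print(wells_384_for("A1"))       # matches the A1, A2, B1 example in the help text
print(wells_384_for("H12", 2))
```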
Usage: ddpcr_specific_analysis.py [OPTIONS]
Options:
-d, --ddpcr-path ddPCR data path to Excel file. [required]
-ds, --ddpcr-sheet ddPCR data Excel sheet name. [required]
-l, --layout-path 96-well layout path to Excel file. [required]
-ls, --layout-sheet 96-well layout Excel sheet name. [required]
-o, --output-path output folder path. [required]
--droplet-volume volume (nL) of each droplet. [default: 0.795; 0.75<=x<=0.92]
--rxn-volume volume (uL) of ddPCR reaction as setup on the pre-droplet plate. [default: 22;
20<=x<=30]
--min-accepted-droplets minimum number of accepted droplets per well. [default: 10000; 5000<=x<=20000]
--max-copies-rxn-lob maximum 16S rRNA copies per reaction of the no template controls. [default: 25;
0<=x<=100]
--max-copies-rxn-span-ntc maximum span (e.g. fold change) of 16S rRNA copies per reaction of all no template
controls. [default: 4; 1<=x<=10]
--min-negative-droplets minimum number of negative droplets per well. [default: 10; 0<=x<=100]
--copies-rxn-loq-mult multiplied by the limit of blank to define the limit of quantification, under which
samples or controls are removed. [default: 4; 2<=x<=100]
--help Show this message and exit.
Usage: universal_analysis.py [OPTIONS]
Options:
-d, --data-path qPCR- or ddPCR-specific analysis output Excel file path. [required]
-ds, --data-sheet qPCR- or ddPCR-specific analysis output Excel sheet name (e.g.
'for_universal_analysis'). [required]
-w, --weights-path sample weights Excel file path. [required]
-ws, --weights-sheet sample weights Excel sheet name. [required]
-n, --nist-expected-path NIST expected values Excel file path. Must contain 16S rRNA copies per undiluted uL
in a column titled 'copies_uL_expected' and a name that matches the name from the
layout for qPCR- or ddPCR-specific analysis in a column titled 'Name'. [required]
-ns, --nist-expected-sheet NIST expected values Excel sheet name. [required]
-o, --output-path output folder path. [required]
--nist-max-fold-diff maximum fold difference between measured and expected NIST control 16S rRNA copies
per undiluted uL. [default: 5.0; 1.01<=x<=10]
--neg-extract-ctrl-max-copies maximum 16S rRNA copies per DNA extraction for negative DNA extraction controls.
[default: 5000.0; 500<=x<=30000]
--pos-extract-ctrl-max-span maximum span (e.g. fold change) of 16S rRNA copies per DNA extraction for positive
DNA extraction controls. [default: 2; 1.1<=x<=10]
--extract-max-input maximum amount of stool (g) from which to extract DNA. [default: 0.25;
0.025<=x<=0.5]
--extract-min-input minimum amount of stool (g) from which to extract DNA. [default: 0.15;
0.025<=x<=0.5]
--drying-max-input maximum amount of stool (g) to dry for stool moisture content. [default: 0.125;
0.025<=x<=0.5]
--drying-min-input minimum amount of stool (g) to dry for stool moisture content. [default: 0.075;
0.025<=x<=0.5]
--min-dried-dry-mass minimum dried amount of stool (g) from drying for stool moisture content. [default:
0.008; 0.002<=x<=0.05]
--water-fraction-cutoff cutoff for water fraction, given the error in 16S rRNA copies per dry gram as the
water fraction increases. [default: 0.9; 0.8<=x<=0.99]
--help Show this message and exit.