Files for the NHGRI Genomics Short Course - Microbiome - Computational Lab
Now LIVE: Microbiome Virtual Lab Exploration
One of the most basic questions a microbiome researcher can ask is: “What is in my sample?” There are many ways to answer that question, and the method you choose determines how much biological detail you can resolve. While the microbiome refers to the collection of bacteria, fungi, viruses, protists and metazoans in a sample, researchers are often specifically interested in the bacterial component. Identification of bacteria is based on a taxonomic hierarchy. For instance, most people are familiar with the bacterium Escherichia coli. E. coli is in the family Enterobacteriaceae and the phylum Proteobacteria; here is its full taxonomic hierarchy:
Bacteria (Kingdom); Proteobacteria (Phylum); Gammaproteobacteria (Class); Enterobacterales (Order); Enterobacteriaceae (Family); Escherichia (Genus); Escherichia coli (Species)
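A lineage written in this "Name (Rank); Name (Rank)" style is easy to turn into a lookup table, which is handy when comparing samples at a chosen rank. The following is a minimal Python sketch; the function name `parse_lineage` is hypothetical and the format it assumes is only the one shown on the line above, not the output format of any particular classifier.

```python
def parse_lineage(lineage):
    """Parse 'Name (Rank); Name (Rank); ...' into a {rank: name} dict."""
    ranks = {}
    for entry in lineage.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split 'Escherichia coli (Species)' at the last opening parenthesis.
        name, _, rank = entry.rpartition("(")
        ranks[rank.rstrip(")").strip().lower()] = name.strip()
    return ranks


# The E. coli lineage exactly as written above.
ECOLI = (
    "Bacteria (Kingdom); Proteobacteria (Phylum); Gammaproteobacteria (Class); "
    "Enterobacterales (Order); Enterobacteriaceae (Family); Escherichia (Genus); "
    "Escherichia coli (Species)"
)

lineage = parse_lineage(ECOLI)
print(lineage["family"])  # Enterobacteriaceae
```

Keying by rank makes it straightforward to ask the same question ("which phylum?", "which genus?") of every sequence in a sample.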
While taxonomic levels often stop at “species”, there are additional taxonomic levels that allow scientists to categorize bacteria in finer detail (e.g., strains). The level of detail you can get from a microbiome experiment depends on the experimental method; see Maiden et al., Nature Reviews 2013.
The files below are part of a larger lesson plan that introduces students to microbiome analysis using 16S rRNA sequences. Please visit NHGRI Genomics Short Course for the full lesson plan.
- FASTA sequence files: ba04826.sub100.fasta, st06686.sub100.fasta, to10842.sub100.fasta, vf03604.sub100.fasta, DOK03.fasta
- RDP classifier results: ba_rdp_result.pdf, st_rdp_result.pdf, to_rdp_result.pdf, vf_rdp_result.pdf, DOK03_rdp_result.pdf
- DOK03_piechart.xlsx - Example pie chart derived from the DOK03 data. Note that pie charts are easy to make but aren't great for representing data (see https://en.wikipedia.org/wiki/Pie_chart)
- nihms424103.pdf - Review paper by Grice and Segre
- key.txt - Key describing each of the sequence files (also reproduced below)
The FASTA files were generated by subsampling a larger sequence file produced by pyrosequencing on a Roche 454 instrument. Subsampling was done using seqkit:
seqkit sample -n 100 input.fasta > output.sub100.fasta
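For readers without seqkit, the idea behind this step (keep 100 sequences chosen at random) can be sketched in plain Python. This is an illustration of uniform subsampling without replacement, not seqkit's implementation; the helper names `read_fasta` and `subsample` are hypothetical, and the demo uses made-up reads held in memory rather than any of the files listed above.

```python
import io
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample(records, n, seed=None):
    """Return n records drawn uniformly without replacement, in input order."""
    records = list(records)
    if n >= len(records):
        return records
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    keep = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in keep]


# Made-up demo input: ten short reads, not real 16S data.
demo_fasta = "".join(f">read{i}\nACGT{'A' * i}\n" for i in range(10))
records = list(read_fasta(io.StringIO(demo_fasta)))
subset = subsample(records, 3, seed=1)
print(len(records), len(subset))  # 10 3
```

With a real file, pass an open file handle to `read_fasta` instead of the `io.StringIO` object; `len(records)` then gives the number of sequences in that file.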
ba04826.sub100.fasta
These sequences are from the skin, specifically the back. Students should be able to tell that this is a skin site because it is 97% Actinobacteria. The back, in particular, is considered an oily site and is dominated by the bacterial genus Propionibacterium (now also called Cutibacterium).
st06686.sub100.fasta
These sequences are from stool. Students should be able to tell from the dominance of Firmicutes (74%) and Bacteroidetes (22%).
to10842.sub100.fasta
These sequences are from the mouth (tongue). Streptococcus (29%) is a common bacterial genus in the mouth.
vf03604.sub100.fasta
These sequences are from the skin, specifically the forearm. You can tell it is skin because of the dominant Actinobacteria (63%). Unlike the back, the forearm is considered more of a dry site and shows more diversity of bacterial genera.
DOK03.fasta is a published agricultural soil microbiome data set, included here as an example to work through in class. It has 1904 sequences and was downloaded from the mothur website.
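The percentages quoted in the key above are tallies of per-sequence taxonomic assignments. A minimal Python sketch of that tally follows, assuming the assignments have already been extracted into a list with one phylum name per sequence (the classifier results here are PDFs, so that extraction step is not shown). The function name `composition` is hypothetical, and the example list is constructed from the stool percentages in the key, with the remaining 4 of 100 sequences labeled "unreported" because the key does not name them.

```python
from collections import Counter


def composition(assignments):
    """Return {taxon: percent of sequences}, most abundant first."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in counts.most_common()}


# Constructed from the stool entry in the key: 74 Firmicutes and
# 22 Bacteroidetes out of 100 sequences; the rest are not named there.
calls = ["Firmicutes"] * 74 + ["Bacteroidetes"] * 22 + ["unreported"] * 4

percentages = composition(calls)
for taxon, pct in percentages.items():
    print(f"{taxon}: {pct:.0f}%")
```

A sorted table like this carries the same information as a pie chart and is usually easier to compare across samples.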