forked from microbiome/OMA
-
Notifications
You must be signed in to change notification settings - Fork 0
/
01-introduction.Rmd
186 lines (135 loc) · 5.84 KB
/
01-introduction.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
# (PART) Introduction {-}
# Data Infrastructure {#data-introduction}
```{r setup, echo=FALSE, results="asis"}
library(rebook)
chapterPreamble()
```
The
[`SummarizedExperiment`](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html)
(`SE`) is a widely used class for analyzing data obtained by common sequencing
techniques. `SE` is common structure for several Bioconductor packages that are
used for analyzing RNAseq, ChIp-Seq data. `SE` class is also used in R packages
for analyzing microarrays, flow cytometry, proteomics, single-cell sequencing
data and many more. The single-cell analysis is facilitated by
[SingelCellExperiment
class](https://bioconductor.org/packages/release/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html)
(`SCE`), which allows the user to store results of dimensionality reduction and
alternative experiments. Alternative experiments (`altExps`) can be differently
processed data within the analysis workflows.
Recently,
[`TreeSummarizedExperiment`](https://www.bioconductor.org/packages/release/bioc/html/TreeSummarizedExperiment.html)
(`TSE`),
[`MicrobiomeExperiment`](https://github.com/FelixErnst/MicrobiomeExperiment)
were developed to extend the `SE` and `SCE` class for incorporating hierarchical
information (including phylogenetic tree) and reference sequences.
The `mia` package implements tools using these classes for analysis of
microbiome sequencing data.
## Installation
Install the development version from GitHub using `remotes` R package.
```{r eval=FALSE, message=FALSE}
# install remotes
#install.packages("remotes")
BiocManager::install("FelixErnst/mia")
```
### Packages
1. `mia` : Microbiome analysis tools
2. `miaViz` : Microbiome analysis specific visualization
**See also:**
[`microbiome`](https://bioconductor.org/packages/devel/bioc/html/microbiome.html)
## Background
The `phyloseq` package and class was around before the [`SummarizedExperiment`](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html)
and the derived
[`TreeSummarizedExperiment`](https://www.bioconductor.org/packages/release/bioc/html/TreeSummarizedExperiment.html)
class. To make the transition easy
here is short description how `phyloseq` and `*Experiment` classes relate to
each other.
`assays` : This slot is similar to `otu_table` in `phyloseq`. In a `SummarizedExperiment`
object multiple assays, raw counts, transformed counts can be stored. See also
[`MultiAssayExperiment`](https://bioconductor.org/packages/release/bioc/html/MultiAssayExperiment.html) for storing data from multiple `experiments` such as RNASeq, Proteomics, etc.
`rowData` : This slot is similar to `tax_table` in `phyloseq` to store taxonomic information.
`colData` : This slot is similar to `sample_data` in `phyloseq` to store information related to samples.
`rowTree` : This slot is similar to `phy_tree` in `phyloseq` to store phylogenetic tree.
In this book, you will come across terms like `FeatureIDs` and `SampleIDs`.
`FeatureIDs` : These are basically OTU/ASV ids which are row names in `assays` and `rowData`.
`SampleIDs` : As the name suggests, these are sample ids which are column names in `assays` and row names in `colData`.
`FeatureIDs` and `SampleIDs` are used but the technical terms `rownames` and
`colnames` are encouraged to be used, since they relate to actual objects we
work with.
## Loading experimental microbiome data
## Metadata
## Microbiome and tree data specific aspects
```{r, message=FALSE}
library(mia)
data("GlobalPatterns")
se <- GlobalPatterns
se
```
### Assays
The `assays` slots contains the experimental data as count matrices. Multiple
matrices can be stored the result of `assays` is actually a list of matrices.
```{r}
assays(se)
```
Individual assays can be accessed via `assay`
```{r}
assay(se, "counts")[1:5,1:7]
```
To illustrate the use of multiple assays, the relative abundance data can be
calcualted and stored along the original count data using `relAbundanceCounts`.
```{r}
se <- relAbundanceCounts(se)
assays(se)
```
Now there are two assays available in the `se` object, `counts` and
`relabundance`.
```{r}
assay(se, "relabundance")[1:5,1:7]
```
### colData
`colData` contains data on the samples.
```{r coldata}
colData(se)
```
### rowData
`rowData` contains data on the features of the samples analyzed. Of particular
interest for the microbiome field this is used to store taxonomic information.
```{r rowdata}
rowData(se)
```
### rowTree
Phylogenetic trees also play an important role for the microbiome field. The
`TreeSummarizedExperiment` class is able to keep track of feature and node
relations via two functions, `rowTree` and `rowLinks`.
A tree can be accessed via `rowTree` as `phylo` object.
```{r rowtree}
rowTree(se)
```
The links to the individual features are available through `rowLinks`.
```{r rowlinks}
rowLinks(se)
```
Please note that there can be a 1:1 relationship between tree nodes and
features, but this is not a must have. This means there can be features, which
are not linked to nodes, and nodes, which are not linked to features. To change
the links in an existing object, the `changeTree` function is available.
## Data conversion
Sometimes custom solutions are need for analyzing the data. `mia` contains a
few functions to help in these situations.
### Tidy data
For several custom analysis and visualization packages such as those from the
`tidyverse` the `SE` data can be converted to long data.frame format with
`meltAssay`.
```{r}
library(mia)
molten_se <- meltAssay(se,
add_row_data = TRUE,
add_col_data = TRUE,
abund_values = "relabundance")
molten_se
```
## Conclusion
Some wrapping up...
## Session Info {-}
```{r sessionInfo, echo=FALSE, results='asis'}
prettySessionInfo()
```