Skip to content

Commit

Permalink
did assgn 7
Browse files Browse the repository at this point in the history
  • Loading branch information
RoyBoy432 committed Feb 25, 2017
1 parent cd5d4bc commit 2e47bc1
Show file tree
Hide file tree
Showing 5 changed files with 1,034 additions and 48 deletions.
2 changes: 1 addition & 1 deletion Week3-Beta/beta_assignment.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ In the R code chunk below, provide the code to:
4. load the `vegan` R package (be sure to install if needed).

```{r}
rm(list = ls())
#rm(list = ls())
getwd()
#setwd("/Users/rzmogerr/GitHub/QB2017_Moger-Reischer/Week3-Beta")
#I'll load all of the packages, following the style of the handout
Expand Down
Binary file modified Week4-Spatial/PondsMap.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
226 changes: 179 additions & 47 deletions Week7-PhyloCom/PhyloCom_assignment.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
title: "Phylogenetic Diversity - Communities"
author: "RZ Moger-Reischer; BIOL-Z 620: Quantitative Biodiversity, Indiana University"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: pdf_document
output: html_document
geometry: margin=2.54cm
---

Expand Down Expand Up @@ -44,6 +44,7 @@ In the R code chunk below, provide the code to:
#rm(list = ls())
#getwd()
#setwd("~/GitHub/QB2017_Moger-Reischer/Week7-PhyloCom/")
#setwd("C:\\Users\\rmoge\\GitHub\\QB2017_Moger-Reischer\\Week7-PhyloCom")
package.list <- c('picante', 'ape', 'seqinr', 'vegan', 'fossil','devtools', 'simba')
for (package in package.list) {
Expand Down Expand Up @@ -329,11 +330,16 @@ c. Why might MPD show less variation than UniFrac?

> ***Answer 4a***:
>
> For all pairs of taxa in two samples, the mean pairwise distance looks at all those pairwise distances and takes average. On the other hand, UniFrac distance looks at the entire tree topology, and then calculates what proporortion of it comprises branches which are part of both samples and what proportion of its branch length is occupied by only one of those two samples.
> ***Answer 4b***:
> The UniFrac distance can vary a lot, but the mean pair distance changes only a little.
> ***Answer 4c***:
> The mean pairwise distance is an average of all pairs---an average of many calculations. Contrariwise, the UniFrac distance between two samples is a measurement of a single ratio. I would expect that a metric which is an average will tend to be less variable.

### B. Visualizing Phylogenetic Beta-Diversity
Now that we have our phylogenetically based community resemblance matrix, we can visualize phylogenetic diversity among samples using the same techniques that we used in the $\beta$-diversity module from earlier in the course.
Expand All @@ -343,11 +349,13 @@ In the R code chunk below, do the following:
2. calculate the explained variation for the first three PCoA axes.

```{r}
#1
pond.pcoa <- cmdscale(dist.uf, eig = T, k = 3)
#2
explainvar1 <- round(pond.pcoa$eig[1] / sum(pond.pcoa$eig), 3) * 100
explainvar2 <- round(pond.pcoa$eig[2] / sum(pond.pcoa$eig), 3) * 100
explainvar3 <- round(pond.pcoa$eig[3] / sum(pond.pcoa$eig), 3) * 100
sum.eig <- sum(explainvar1, explainvar2, explainvar3)
```

Expand All @@ -360,28 +368,58 @@ In the R code chunk below, do the following:
4. customize the plot.

```{r}
par(mar = c(5, 5, 1, 2) + 0.1)
#1, #2
plot(pond.pcoa$points[ ,1], pond.pcoa$points[ ,2],xlim = c(-0.2, 0.2), ylim = c(-.16, 0.16),xlab = paste("PCoA 1 (", explainvar1, "%)", sep = ""),ylab = paste("PCoA 2 (", explainvar2, "%)", sep = ""),pch = 16, cex = 2.0, type = "n", cex.lab = 1.5, cex.axis = 1.2, axes = FALSE)
axis(side = 1, labels = T, lwd.ticks = 2, cex.axis = 1.2, las = 1)
axis(side = 2, labels = T, lwd.ticks = 2, cex.axis = 1.2, las = 1)
abline(h = 0, v = 0, lty = 3)
box(lwd = 2)
#3
points(pond.pcoa$points[ ,1], pond.pcoa$points[ ,2],
pch = 19, cex = 3, bg = "#000000", col = "#e1793d")
text(pond.pcoa$points[ ,1], pond.pcoa$points[ ,2],
labels = row.names(pond.pcoa$points))
```

In the following R code chunk:
1. perform another PCoA on taxonomic data using an appropriate measure of dissimilarity, and
2. calculate the explained variation on the first three PCoA axes.

```{r}
dist.db <- vegdist(comm, method = "bray")
pond.pcoa.tax <- cmdscale(dist.db, eig = T, k = 3)
#2
explainvar1 <- round(pond.pcoa.tax$eig[1] / sum(pond.pcoa.tax$eig), 3) * 100
explainvar2 <- round(pond.pcoa.tax$eig[2] / sum(pond.pcoa.tax$eig), 3) * 100
explainvar3 <- round(pond.pcoa.tax$eig[3] / sum(pond.pcoa.tax$eig), 3) * 100
sum.eig.tax <- sum(explainvar1, explainvar2, explainvar3)
par(mar = c(5, 5, 1, 2) + 0.1)
#1, #2
plot(pond.pcoa$points[ ,1], pond.pcoa$points[ ,2],xlim = c(-0.2, 0.2), ylim = c(-.16, 0.16),xlab = paste("PCoA 1 (", explainvar1, "%)", sep = ""),ylab = paste("PCoA 2 (", explainvar2, "%)", sep = ""),pch = 16, cex = 2.0, type = "n", cex.lab = 1.5, cex.axis = 1.2, axes = FALSE)
axis(side = 1, labels = T, lwd.ticks = 2, cex.axis = 1.2, las = 1)
axis(side = 2, labels = T, lwd.ticks = 2, cex.axis = 1.2, las = 1)
abline(h = 0, v = 0, lty = 3)
box(lwd = 2)
#3
points(pond.pcoa$points[ ,1], pond.pcoa$points[ ,2],
pch = 19, cex = 3, bg = "#000000", col = "#299d09")
text(pond.pcoa$points[ ,1], pond.pcoa$points[ ,2],
labels = row.names(pond.pcoa$points))
```

In the following R code chunk:
1. perform another PCoA on taxonomic data using an appropriate measure of dissimilarity, and
2. calculate the explained variation on the first three PCoA axes.

***Question 5***: Using a combination of visualization tools and percent variation explained, how does the phylogenetically based ordination compare or contrast with the taxonomic ordination? What does this tell you about the importance of phylogenetic information in this system?

> ***Answer 5***:
> The PCoA clusterings look mostly similar. However, for the taxonomic ordination, the first two PCoA axes explain a lot more (40.4%) of the variation than do the first two axes for the phylogenetic ordination (15.5%). This could mean that phylogenetic information is not very important for this system.
### C. Hypothesis Testing

**i. Categorical Approach**
Expand All @@ -391,8 +429,10 @@ In the R code chunk below, do the following:

```{r}
watershed <- env$Location
adonis(dist.uf ~ watershed, permutations = 999)
cata<-adonis(vegdist(decostand(comm, method = "log"),method = "bray") ~ watershed,permutations = 999)
```
Expand All @@ -405,7 +445,9 @@ In the R code chunk below, do the following:

```{r}
envs <- env[, 5:19]
envs <- envs[, -which(names(envs) %in% c("TDS", "Salinity", "Cal_Volume"))]
env.dist <- vegdist(scale(envs), method = "euclid")
```
Expand All @@ -415,7 +457,7 @@ In the R code chunk below, do the following:

```{r}
my.mantel<-mantel(dist.uf, env.dist)
```

Last, conduct a distance-based Redundancy Analysis (dbRDA).
Expand All @@ -426,22 +468,43 @@ In the R code chunk below, do the following:
3. plot the dbRDA results

```{r}
#1
ponds.dbrda <- vegan::dbrda(dist.uf ~ ., data = as.data.frame(scale(envs)))
#2
anova(ponds.dbrda, by = "axis")
ponds.fit <- envfit(ponds.dbrda, envs, perm = 999)
ponds.fit
# Calculate explained variation
dbrda.explainvar1 <- round(ponds.dbrda$CCA$eig[1] / sum(c(ponds.dbrda$CCA$eig, ponds.dbrda$CA$eig)), 3) * 100
dbrda.explainvar2 <- round(ponds.dbrda$CCA$eig[2] / sum(c(ponds.dbrda$CCA$eig, ponds.dbrda$CA$eig)), 3) * 100
#3
par(mar = c(5, 5, 4, 4) + 0.1)
plot(scores(ponds.dbrda, display = "wa"), xlim = c(-2, 2), ylim = c(-2, 2), xlab = paste("dbRDA 1 (", dbrda.explainvar1, "%)", sep = ""), ylab = paste("dbRDA 2 (", dbrda.explainvar2, "%)", sep = ""), pch = 16, cex = 2.0, type = "n", cex.lab = 1.5, cex.axis = 1.2, axes = FALSE)
# Axes
axis(side = 1, labels = T, lwd.ticks = 2, cex.axis = 1.2, las = 1)
axis(side = 2, labels = T, lwd.ticks = 2, cex.axis = 1.2, las = 1)
abline(h = 0, v = 0, lty = 3)
box(lwd = 2)
# labels, points
points(scores(ponds.dbrda, display = "wa"),
pch = 19, cex = 3, bg = "#4cf076", col = "#4cf076")
text(scores(ponds.dbrda, display = "wa"),
labels = row.names(scores(ponds.dbrda, display = "wa")), cex = 0.5)
#vectors
vectors <- scores(ponds.dbrda, display = "bp")
arrows(0, 0, vectors[,1] * 2, vectors[, 2] * 2,lwd = 2, lty = 1, length = 0.2, col = "#2f5cb5")
text(vectors[,1] * 2, vectors[, 2] * 2, pos = 3,labels = row.names(vectors))
axis(side = 3, lwd.ticks = 2, cex.axis = 1.2, las = 1, col = "#2f5cb5", lwd = 2.2, at = pretty(range(vectors[, 1])) * 2, labels = pretty(range(vectors[, 1])))
axis(side = 4, lwd.ticks = 2, cex.axis = 1.2, las = 1, col = "#2f5cb5", lwd = 2.2, at = pretty(range(vectors[, 2])) * 2, labels = pretty(range(vectors[, 2])))
```

***Question 6***: Based on the multivariate procedures conducted above, describe the phylogenetic patterns of $\beta$-diversity for bacterial communities in the Indiana ponds.

> ***Answer 6***:
> Watershed of origin has a significant influence on phylogenetic distribution of species (P = 0.006). Among the environmental variables, it looks like depth; ORP; temperature; conductivity; pH; and chlorophyll a are significant predictors of phylogenetic composition.
## 6) SPATIAL PHYLOGENETIC COMMUNITY ECOLOGY

### A. Phylogenetic Distance-Decay (PDD)
Expand All @@ -456,13 +519,21 @@ In the R code chunk below, do the following:

```{r}
#1
long.lat <- as.matrix(cbind(env$long, env$lat))
coord.dist <- earth.dist(long.lat, dist = TRUE)
#2
bray.curtis.dist <- 1 - vegdist(comm)
#3
unifrac.dist <- 1 - dist.uf
# Transform all distances into list format:
unifrac.dist.ls <- liste(unifrac.dist, entry = "unifrac")
bray.curtis.dist.ls <- liste(bray.curtis.dist, entry = "bray.curtis")
coord.dist.ls <- liste(coord.dist, entry = "geo.dist")
env.dist.ls <- liste(env.dist, entry = "env.dist")
#4
df <- data.frame(coord.dist.ls, bray.curtis.dist.ls[, 3], unifrac.dist.ls[, 3], env.dist.ls[, 3])
names(df)[4:6] <- c("bray.curtis", "unifrac", "env.dist")
```

Expand All @@ -474,7 +545,23 @@ In the R code chunk below, do the following:

```{r}
par(mfrow=c(2, 1), mar = c(1, 5, 2, 1) + 0.1, oma = c(2, 0, 0, 0))
#1
plot(df$geo.dist, df$bray.curtis, xlab = "", xaxt = "n", las = 1, ylim = c(0.1, 0.9), ylab="Bray-Curtis Similarity", main = "Distance Decay", col = "#4189b4")
#3
DD.reg.bc <- lm(df$bray.curtis ~ df$geo.dist)
summary(DD.reg.bc)
abline(DD.reg.bc , col = "#00eaf3", lwd = 2)
par(mar = c(2, 5, 1, 1) + 0.1)
#2
plot(df$geo.dist, df$unifrac, xlab = "", las = 1, ylim = c(0.1, 0.9), ylab = "Unifrac Similarity", col = "#805239")
#3
DD.reg.uni <- lm(df$unifrac ~ df$geo.dist)
summary(DD.reg.uni)
abline(DD.reg.uni, col = "#00eaf3", lwd = 2)
mtext("Geographic Distance (km)", side = 1, adj = 0.55, line = 0.5, outer = TRUE)
Expand All @@ -488,14 +575,17 @@ In the R code chunk below, test if the trend lines in the above distance decay r

```{r}
diffslope(df$geo.dist, df$unifrac, df$geo.dist, df$bray.curtis)
```

***Question 7***: Interpret the slopes from the taxonomic and phylogenetic DD relationships. If there are differences, hypothesize why this might be.

> ***Answer 7***:
> The slopes are statistically significantly different. However, the difference is quite small in magnitude (0.001603). Therefore one might believe it would be wise to investigate the biology from which the statistical difference arises---it is possible that this statistically significant difference is not biologically meaningful.
> That said, the magnitude of the slopes is very small. And in fact the absolute value of the slope for the DD is an order of magnitude larger than that of the slope of the PDD. It appears that there is biological significance. In addition, for the PDD the slope is slightly positive (although the regression itself is not significant (P = 0.07738))---when accounting phylogeny there is no decrease in similarity over geographic distance. Whereas for the taxonomic DD there is a statistically signifcant (P = 0.0262) decrease in similarity across geographic distance.
### B. Phylogenetic diversity-area relationship (PDAR)

Expand All @@ -505,12 +595,35 @@ In the R code chunk below, write a function to generate the PDAR.

```{r}
PDAR <- function(comm, tree){
areas <- c()
diversity <- c()
num.plots <- c(2, 4, 8, 16, 32, 51)
for (i in num.plots){
areas.iter <- c()
diversity.iter <- c()
for (j in 1:10){
pond.sample <- sample(51, replace = FALSE, size = i)
area <- 0
sites <- c()
for (k in pond.sample) {
area <- area + pond.areas[k]
sites <- rbind(sites, comm[k, ])
}
areas.iter <- c(areas.iter, area)
psv.vals <- psv(sites, tree, compute.var = FALSE)
psv <- psv.vals$PSVs[1]
diversity.iter <- c(diversity.iter, as.numeric(psv))
}
diversity <- c(diversity, mean(diversity.iter))
areas <- c(areas, mean(areas.iter))
print(c(i, mean(diversity.iter), mean(areas.iter)))
}
return(cbind(areas, diversity))
}
```
Expand All @@ -525,20 +638,39 @@ In the R code chunk below, do the following:
5. customize the PDAR plot.

```{r}
#1
pond.areas <- as.vector(pi * (env$Diameter/2)^2)
#2
pdar <- PDAR(comm, phy)
pdar <- as.data.frame(pdar)
pdar$areas <- sqrt(pdar$areas)
#3
Pearson <- cor.test(pdar$areas, pdar$diversity, method = "pearson")
P <- round(Pearson$estimate, 2)
P.pval <- round(Pearson$p.value, 3)
Spearman <- cor.test(pdar$areas, pdar$diversity, method = "spearman")
rho <- round(Spearman$estimate, 2)
rho.pval <- round(Spearman$p.value, 3)
#4
plot.new()
par(mfrow=c(1, 1), mar = c(1, 5, 2, 1) + 0.1, oma = c(2, 0, 0, 0))
plot(pdar[, 1], pdar[, 2], xlab = "Area", ylab = "PSV", ylim = c(0, 1),main = "Phylogenetic Diversity-Area Relationship",col = "#a96632", pch = 16, las = 1)
legend("topleft", legend= c(paste("Spearman Correlation = ", rho, "; p = ", rho.pval, sep = ""),paste("Pearson Correlation = ", P, "; p = ", P.pval, sep = "")),bty = "n", col = "#a96632")
dufit<-lm(pdar$diversity~pdar$areas )
abline(.4266,-0.00002398, col="#dbd4ab")
```

***Question 8***: Compare your observations of the microbial PDAR and SAR in the Indiana ponds?
How might you explain the differences between the taxonomic (SAR) and phylogenetic (PDAR)?

> ***Answer 8***:
> The z value for the SAR is 0.144. For the PDAR, however, the slope is essentially zero (very slightly negative, slope=-0.00002398). This makes sense. The SAR should always be positive, because you should not 'lose' species as area increases. Contrariwise, it is possible that as area increases the spatial extent will start to include new species that are very closely related to species which have already been sampled---perhaps these close relatives are occupying the same niche, but are located somewhere adjacent. In such a circumstance there is phylogenetic clustering of trait values, and local competitive exclusion among closely related species.
## SYNTHESIS

Ignoring technical or methodological constraints, discuss how phylogenetic information could be useful in your own research. Specifically, what kinds of phylogenetic data would you need? How could you use it to answer important questions in your field? In your response, feel free to consider not only phylogenetic approaches related to phylogenetic community ecology, but also those we discussed last week in the PhyloTraits module, or any other concepts that we have not covered in this course.
Ignoring technical or methodological constraints, discuss how phylogenetic information could be useful in your own research. Specifically, what kinds of phylogenetic data would you need? How could you use it to answer important questions in your field? In your response, feel free to consider not only phylogenetic approaches related to phylogenetic community ecology, but also those we discussed last week in the PhyloTraits module, or any other concepts that we have not covered in this course.

> Let us imagine that there are epigenetic mechanisms regulating a dormancy response to starvation in bacteria. There are a few ways I could use phylogenetic information. Firstly, I could construct a tree and assess the clustered/"overdispersed" pattern of the trait(i.e. up- or down-regulation of methylation when starved)'s appearance on the tree topology. Secondly, within and between methyltransferase families, I would like to BLAST flanking genes for each dormancy-related methylation motif in a custom library all-against-all search. By identifying reciprocal best hits, I would be able to determine which SPECIFIC motifs are phylogenetically conserved. (Too, I might be able to learn which up- and down-stream GENEs could be involved in the dormancy response.)
854 changes: 854 additions & 0 deletions Week7-PhyloCom/PhyloCom_assignment.html

Large diffs are not rendered by default.

Binary file modified Week7-PhyloCom/PhyloCom_assignment.pdf
Binary file not shown.

0 comments on commit 2e47bc1

Please sign in to comment.