Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WGCNA Part 2: Running WGCNA #360

Merged
merged 16 commits into from
Nov 20, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 100 additions & 0 deletions 04-advanced-topics/network-analysis_rnaseq_01_wgcna.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,9 @@ library(magrittr)

# We'll need this for finding gene modules
library(WGCNA)

# We'll be making some plots
library(ggplot2)
```

## Import and set up data
Expand Down Expand Up @@ -320,6 +323,103 @@ normalized_counts <- assay(dds_norm) %>%
t() # Transpose this data
```

## Determine parameters for WGCNA

To identify which genes are in the same modules, WGCNA first creates a weighted network to define which genes are near each other.
The measure of "adjacency" it uses is based on the correlation matrix, but requires the definition of a threshold value, which in turn depends on a "power" parameter that defines the exponent used when transforming the correlation values.
The choice of power parameter will affect the number of modules identified, and the WGCNA modules provides the `pickSoftThreshold()` function to help identify good choices for this parameter.

```{r}
sft <- pickSoftThreshold(normalized_counts,
dataIsExpr = TRUE,
corFnc = cor,
networkType = "signed"
)
```

This `sft` object has a lot of information, we will want to plot some of it to figure out what our `power` soft-threshold should be.
We have to first calculate a measure of the model fit, the signed $R^2$, and make that a new variable.

```{r}
sft_df <- data.frame(sft$fitIndices) %>%
dplyr::mutate(model_fit = -sign(slope) * SFT.R.sq)
```

Now, let's plot the model fitting by the `power` soft threshold so we can decide on a soft-threshold for power.

```{r}
ggplot(sft_df, aes(x = Power, y = model_fit, label = Power)) +
# Plot the points
geom_point() +
# We'll put the Power labels slightly above the data points
geom_text(nudge_y = 0.1) +
# We will plot what WGCNA recommends as an R^2 cutoff
geom_hline(yintercept = 0.80, col = "red") +
# Just in case our values are low, we want to make sure we can still see the 0.80 level
ylim(c(min(sft_df$model_fit), 1)) +
# We can add more sensible labels for our axis
xlab("Soft Threshold (power)") +
ylab("Scale Free Topology Model Fit, signed R^2") +
ggtitle("Scale independence") +
# This adds some nicer aesthetics to our plot
theme_classic()
```

Using this plot we can decide on a power parameter.
WGCNA's authors recommend using a `power` that has an signed $R^2$ above `0.80`, otherwise they warn your results may be too noisy to be meaningful.

If you have multiple power values with signed $R^2$ above `0.80`, then picking the one at an inflection point, in other words where the $R^2$ values seem to have reached their saturation [@Zhang2005].
You want to a `power` that gives you a big enough $R^2$ but is not excessively large.

So using the plot above, going with a power soft-threshold of `16`!

If you find you have all very low $R^2$ values this may be because there are too many genes with low expression values that are cluttering up the calculations.
You can try returning to [gene filtering step](#define-a-minimum-counts-cutoff) and choosing a more stringent cutoff (you'll then need to re-run the transformation and subsequent steps to remake this plot to see if that helped).

## Run WGCNA!

We will use the `blockwiseModules()` function to find gene co-expression modules in WGCNA, using `16` for the `power` argument like we determined above.

This next step takes some time to run.
The `blockwise` part of the `blockwiseModules()` function name refers to that these calculations will be done on chunks of your data at a time to help with conserving computing resources.

Here we are using the default `maxBlockSize`, 5000 but, you may want to adjust the `maxBlockSize` argument depending on your computer's memory.
The authors of WGCNA recommend running [the largest block your computer can handle](https://peterlangfelder.com/2018/11/25/blockwise-network-analysis-of-large-data/) and they provide some approximations as to GB of memory of a laptop and what `maxBlockSize` it should be able to handle:

> • If the reader has access to a large workstation with more than 4 GB of memory, the parameter maxBlockSize
can be increased. A 16GB workstation should handle up to 20000 probes; a 32GB workstation should handle
perhaps 30000. A 4GB standard desktop or a laptop may handle up to 8000-10000 probes, depending on
operating system and other running programs.

[@Langfelder2016]

```{r}
bwnet <- blockwiseModules(normalized_counts,
maxBlockSize = 5000, # What size chunks (how many genes) the calculations should be run in
TOMType = "signed", # topological overlap matrix
power = 16, # soft threshold for network construction
numericLabels = TRUE, # Let's use numbers instead of colors for module labels
randomSeed = 1234, # there's some randomness associated with this calculation
# so we should set a seed
)
```

The `TOMtype` argument specifies what kind of topological overlap matrix (TOM) should be used to make gene modules.
You can safely assume for most situations a `signed` network represents what you want -- we want WGCNA to pay attention to directionality.
However if you suspect you may benefit from an `unsigned` network, where positive/negative is ignored see [this article](https://peterlangfelder.com/2018/11/25/signed-or-unsigned-which-network-type-is-preferable/) to help you figure that out [@Langfelder2018].

There are a lot of other settings you can tweak -- look at `?blockwiseModules` help page as well as the [WGCNA tutorial](https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/) [@Langfelder2016].

## Write main WGCNA results object to file

We will save our whole results object to an RDS file in case we want to return to our original WGCNA results.

```{r}
readr::write_rds(bwnet,
file = file.path("results", "SRP133573_wgcna_results.RDS")
)
```

_Next sections addressed in upcoming PR_

# Resources for further learning
Expand Down
Loading