Advanced Usage introduction simplification #378

Draft
wants to merge 8 commits into
base: staging
152 changes: 42 additions & 110 deletions 04-advanced-topics/network-analysis_rnaseq_01_wgcna.Rmd
@@ -26,164 +26,96 @@ As with many clustering and network methods, there are some parameters that may

# How to run this example

For general information about our tutorials and the basic software packages you will need, please see our ['Getting Started' section](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-this-tutorial-is-structured).
We recommend taking a look at our [Resources for Learning R](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#resources-for-learning-r) if you have not written code in R before.
For general information about our tutorials and the basic software packages required, please see our ['Getting Started' pages](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html).
Below are some brief instructions about the files and directory structure this example expects.
If you need more detailed instructions about how to obtain the data files from refine.bio, please consult one of the earlier examples, such as this one about [clustering and heatmaps](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/clustering_rnaseq_01_heatmap.html).

## Obtain the `.Rmd` file

To run this example yourself, [download the `.Rmd` for this analysis by clicking this link](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/differential_expression_rnaseq_01_rnaseq.Rmd).
## Directory structure and required files

Clicking this link will most likely send this to your downloads folder on your computer.
Move this `.Rmd` file to where you would like this example and its files to be stored.
To run this example yourself, [download the `.Rmd` for this analysis by clicking this link](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/differential_expression_rnaseq_01_rnaseq.Rmd) and move the `.Rmd` file to your preferred analysis folder.

You can open this `.Rmd` file in RStudio and follow the rest of these steps from there. (See our [section about getting started with R notebooks](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-to-get-and-use-rmds) if you are unfamiliar with `.Rmd` files.)

## Set up your analysis folders

Good file organization is helpful for keeping your data analysis project on track!
We have set up some code that will automatically set up a folder structure for you.
Run this next chunk to set up your folders!

If you have trouble running this chunk, see our [introduction to using `.Rmd`s](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-to-get-and-use-rmds) for more resources and explanations.

```{r}
# Create the data folder if it doesn't exist
if (!dir.exists("data")) {
  dir.create("data")
}

# Define the file path to the plots directory
plots_dir <- "plots" # Can replace with path to desired output plots directory

# Create the plots folder if it doesn't exist
if (!dir.exists(plots_dir)) {
  dir.create(plots_dir)
Contributor

We still will need to create these directories though. (Unless you've put this part somewhere else)

}

# Define the file path to the results directory
results_dir <- "results" # Can replace with path to desired output results directory

# Create the results folder if it doesn't exist
if (!dir.exists(results_dir)) {
  dir.create(results_dir)
}
```

In the same place you put this `.Rmd` file, you should now have three new empty folders called `data`, `plots`, and `results`!

## Obtain the dataset from refine.bio

For general information about downloading data for these examples, see our ['Getting Started' section](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#how-to-get-the-data).

Go to this [dataset's page on refine.bio](https://www.refine.bio/experiments/SRP140558).

Click the "Download Now" button on the right side of this screen.

<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/download-now.png" width=200>

Fill out the pop-up window with your email address and agree to our Terms and Conditions:

<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/download-email.png" width=500>

We are going to use non-quantile normalized data for this analysis.
Contributor

I'm worried about deleting this piece in particular. We need them to know whether or not they should download quantile normalized data and where to find that.

Member Author

I do cover that on line 40, but it could be made more prominent.

Contributor

I think screenshots help. I guess bigger question is just because someone's an "advanced topics" user, can we assume they know how to download data from refine.bio and know the refine.bio options more readily?

Contributor

I guess I think even advanced topics users will appreciate screenshots. Though I do agree with cutting back on the file path hand holding.

Member Author

That was my thought, or they could go to another example to get this information. 🤷🏼

Contributor

> go to another example to get this information. 🤷🏼

If we think they might do this, then we should probably just keep the screenshots here too. I think screenshots help decrease brain glucose usage for simple things.

To get this data, you will need to check the box that says "Skip quantile normalization for RNA-seq samples".
Note that this option will only be available for RNA-seq datasets.
The data we are using is from the SRA project SRP140558 as processed by refine.bio.
To obtain this data, go to the [dataset's page on refine.bio](https://www.refine.bio/experiments/SRP140558/identification-of-transcription-factor-relationships-associated-with-androgen-deprivation-therapy-response-and-metastatic-progression-in-prostate-cancer) and click the "Download Now" button.
Follow the instructions, being sure to select "Skip quantile normalization for RNA-seq samples".

<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/skip-quantile-normalization.png" width=500>

It may take a few minutes for the dataset to process.
You will get an email when it is ready.

## About the dataset we are using for this example

For this example analysis, we will use this [acute viral bronchiolitis dataset](https://www.refine.bio/experiments/SRP140558).
The data that we downloaded from refine.bio for this analysis has 62 paired peripheral blood mononuclear cell RNA-seq samples obtained from 31 patients.
Samples were collected from each patient at two time points: during their first, acute bronchiolitis visit (abbreviated "AV") and during their recovery, post-convalescence visit (abbreviated "CV").

## Place the dataset in your new `data/` folder

refine.bio will send you a download button in the email when it is ready.
Follow the prompt to download a zip file that has a name with a series of letters and numbers and ends in `.zip`.
Double clicking should unzip this for you and create a folder of the same name.

<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/download-folder-structure.png" width=400>

For more details on the contents of this folder see [these docs on refine.bio](http://docs.refine.bio/en/latest/main_text.html#downloadable-files).

The `<experiment_accession_id>` folder has the data and metadata TSV files you will need for this example analysis.
Experiment accession IDs usually look something like `GSE1235` or `SRP12345`.

Copy and paste the `SRP140558` folder into your newly created `data/` folder.
Once you have downloaded and unzipped the refine.bio dataset, place the `SRP140558` folder in a `data` subdirectory of your analysis folder.

## Check out our file structure!
We will also create `plots` and `results` folders for future use.
The analysis folder will have the following content:

Your new analysis folder should contain:

- The example analysis `.Rmd` you downloaded
- A folder called "data" which contains:
- The example analysis `.Rmd` notebook
- A folder called `data` which contains:
- The `SRP140558` folder which contains:
- The gene expression matrix TSV
- The metadata TSV
- A folder for `plots` (currently empty)
- A folder for `results` (currently empty)
- A `plots` folder
- A `results` folder

Your example analysis folder should now look something like this (except with the respective experiment accession ID and analysis notebook name you are using):
Your analysis folder will end up looking something like this (except with the respective experiment accession ID and analysis notebook name you are using):

<img src="https://github.com/AlexsLemonade/refinebio-examples/raw/40e47f4d3f39283effbd9843a457168061be9680/template/screenshots/analysis-folder-structure.png" width=400>

In order for our example here to run without a hitch, we need these files to be in these locations so we've constructed a test to check before we get started with the analysis.
These chunks will declare your file paths and double check that your files are in the right place.
## Define file paths

First we will declare our file paths to our data and metadata files, which should be in our data directory.
This is handy because if we want to switch the dataset we are using for this analysis (see the next section for more on this), we will only have to change the file paths here to get started.
We will define variables for the files and directories we are using in the chunk below.

```{r}
# Define the file path to the data directory
data_dir <- file.path("data", "SRP140558") # Replace with accession number which will be the name of the folder the files will be in
# Define the file path to the accession data directory
data_dir <- file.path("data", "SRP140558")

# Define the file path to the refine.bio expression matrix
data_file <- file.path(data_dir, "SRP140558.tsv")
# Define the file path to the refine.bio metadata file
metadata_file <- file.path(data_dir, "metadata_SRP140558.tsv")

# Declare the file path to the gene expression matrix file using the data directory saved as `data_dir`
data_file <- file.path(data_dir, "SRP140558.tsv") # Replace with file path to your dataset
# Define the file path to the plots directory
# (create it if missing)
plots_dir <- "plots"
if (!dir.exists(plots_dir)) {
  dir.create(plots_dir)
}

# Declare the file path to the metadata file using the data directory saved as `data_dir`
metadata_file <- file.path(data_dir, "metadata_SRP140558.tsv") # Replace with file path to your metadata
# Define the file path to the results directory
results_dir <- "results"
if (!dir.exists(results_dir)) {
  dir.create(results_dir)
}
```

Now that our file paths are declared, we can use the `file.exists()` function to check that the files are where we specified above.
It is always worth checking that the paths we defined above are correct and the files are where we expect!

```{r}
# Check if the gene expression matrix file is at the file path stored in `data_file`
# Check the gene expression matrix file
file.exists(data_file)

# Check if the metadata file is at the file path stored in `metadata_file`
# Check for the metadata file
file.exists(metadata_file)
```

If the chunk above printed out `FALSE` to either of those tests, you won't be able to run this analysis _as is_ until those files are in the appropriate place.
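
When adapting this notebook, it may be more convenient to have it report exactly which files are missing rather than only printing `TRUE` or `FALSE`.
The chunk below is a minimal sketch of one way to do that and is not part of the original example: `find_missing_files()` is a hypothetical helper name, and the demonstration uses temporary files rather than the real refine.bio files.

```{r}
# Hypothetical helper (not part of the original notebook):
# return the subset of `paths` that do not exist on disk
find_missing_files <- function(paths) {
  paths[!file.exists(paths)]
}

# Demonstrate with one temporary file that exists and one path that does not
existing_file <- tempfile(fileext = ".tsv")
file.create(existing_file)
absent_file <- file.path(tempdir(), "not_a_real_file.tsv")

# Only the path that is absent should be reported
missing_files <- find_missing_files(c(existing_file, absent_file))
missing_files
```

In the notebook itself, one could call `find_missing_files(c(data_file, metadata_file))` and pass any non-empty result to `stop()` so that the analysis halts before the later steps run.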

If the concept of a "file path" is unfamiliar to you, we recommend taking a look at our [section about file paths](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#an-important-note-about-file-paths-and-Rmds).

# Using a different refine.bio dataset with this analysis?
## Using a different refine.bio dataset with this analysis?

If you'd like to adapt an example analysis to use a different dataset from [refine.bio](https://www.refine.bio/), we recommend placing the files in the `data/` directory you created and changing the filenames and paths in the notebook to match these files (we've put comments to signify where you would need to change the code).
We suggest saving plots and results to `plots/` and `results/` directories, respectively, as these are automatically created by the notebook.
From here you can customize this analysis example to fit your own scientific questions and preferences.

### Sample size

Keep in mind when using a different refine.bio dataset with this example, that WGCNA requires at least 15 samples to produce a meaningful result [according to its authors](https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html).
Keep in mind when using a different refine.bio dataset with this example that WGCNA requires at least 15 samples to produce a meaningful result [according to its authors](https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html).
So you will need to make sure the dataset you use is sufficiently large.
However, note that very large datasets will be difficult to run locally (on a personal laptop) due to the required computing power.
While you can adjust some parameters to make this more doable on a laptop, it may decrease the reliability of your result if taken to an extreme (more on this parameter, called `maxBlockSize`, in the [`Run WGCNA!` section](#run-wgcna)).
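
If you are swapping in a different dataset, a quick sample count before running WGCNA may save some time.
The chunk below is a rough sketch of such a check, assuming the metadata table has one row per sample; `toy_metadata` and its `sample_id` column are made up for illustration and stand in for the real metadata file.

```{r}
# Minimum number of samples suggested by the WGCNA authors' FAQ (linked above)
min_samples <- 15

# Toy stand-in for a metadata table with one row per sample;
# the column name and values are made up for illustration
toy_metadata <- data.frame(
  sample_id = paste0("sample_", 1:12),
  stringsAsFactors = FALSE
)

# Count the samples and compare against the suggested minimum
n_samples <- nrow(toy_metadata)
enough_samples <- n_samples >= min_samples

# Print a note if the dataset is likely too small for WGCNA
if (!enough_samples) {
  message(
    "Only ", n_samples, " samples found; the WGCNA authors suggest at least ",
    min_samples, "."
  )
}
```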

### Microarray vs RNA-seq
### Microarray vs. RNA-seq

WGCNA can be used with both RNA-seq and microarray datasets so long as they are well normalized and filtered.
In this example we use RNA-seq and [normalize and transform the data with DESeq2's `vst()`](https://alexslemonade.github.io/refinebio-examples/03-rnaseq/00-intro-to-rnaseq.html#deseq2-transformation-methods), which is not only a method and package we recommend in general, but is also the [authors' specific recommendation for using WGCNA with RNA-seq data](https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html#:~:text=Can%20WGCNA%20be%20used%20to,Yes.&text=Whether%20one%20uses%20RPKM%2C%20FPKM,were%20processed%20the%20same%20way.).

If you end up wanting to run WGCNA with a microarray dataset, the normalization done by refine.bio _should_ be sufficient, but you will likely want to [apply a minimum expression filter](#define-a-minimum-counts-cutoff) as we do in this example.
If you have trouble finding a `power` parameter that yields a sufficient R^2 even after applying a stringent cutoff, you may want to look into using a different dataset.
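
To illustrate what a minimum expression filter does, here is a small base R sketch on a made-up matrix.
It is not the filtering code used later in this example: `toy_expression`, the gene and sample names, and the cutoff of 20 are all assumptions chosen for illustration, and an appropriate cutoff for real data (particularly microarray data, which is on a different scale) would need to be chosen separately.

```{r}
# Toy expression matrix: genes as rows, samples as columns (made-up values)
toy_expression <- matrix(
  c(
    0, 1, 0, 2,
    10, 12, 9, 15,
    100, 80, 120, 95
  ),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(
    c("gene_a", "gene_b", "gene_c"),
    c("sample_1", "sample_2", "sample_3", "sample_4")
  )
)

# Hypothetical cutoff chosen only for this toy data
min_total_expression <- 20

# Keep genes whose summed expression across all samples meets the cutoff
keep_genes <- rowSums(toy_expression) >= min_total_expression
filtered_expression <- toy_expression[keep_genes, , drop = FALSE]

# Genes that passed the filter
rownames(filtered_expression)
```

Using `drop = FALSE` keeps the result a matrix even if only one gene passes the filter.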

***

<!-- Do not delete this line --> <a name="analysis" style="padding-top:56px;margin-top:-56px;">&nbsp;</a>
