Merge branch 'predictor-plots'
Conflicts:
	README.md
	README.txt
shraddhapai committed Sep 13, 2017
2 parents 16a506e + c6d4615 commit ff6e449
Showing 609 changed files with 790,598 additions and 1,133 deletions.
85 changes: 70 additions & 15 deletions README.md
@@ -11,21 +11,28 @@ The `examples/` folder contains R code that should just run once both `netDx/` a
For more information and FAQ, visit **http://netdx.org**

* [Install netDx](#install-netdx)
* [Prerequisites](#prerequisites)
* [Test functionality](#test-functionality)
* [Known issues with compiling pdfs](#known-issues-with-compiling-pdfs)
* [Run breastcancer LumA example](#run-breastcancer-luma-example)
* [Run Medulloblastoma example](#mblastoma)
* [Run breastcancer LumA example](#brca)
* [See full list of examples](#other-examples)
* [Known issues with compiling pdfs](#pdfissue)

**Other useful information:**
* [Read the netDx preprint at bioRxiv](https://doi.org/10.1101/084418): Pai et al. (2016). netDx: Patient classification using integrated patient similarity networks. https://doi.org/10.1101/084418
* Once you have run the included netDx examples, [read the user manual](http://netdx-manual.readthedocs.io/en/latest/) to learn how to design features or predictors.

## Install netDx

### Prerequisites
**netDx has been tested on Mac OS/X and on Linux systems. For now we recommend you run netDx on these operating systems.** Future versions of netDx will have Windows support.

You must have Java, Python and R installed. Within R, you must have BioConductor installed. This section helps you figure out which
of these you need to install. If you already have all these, skip to the next section.
You must have Java, Python and R installed. Within R, you must have BioConductor installed. To plot the results of the predictor run, including network visualizations such as the EnrichmentMap and integrated patient similarity network, you will need Cytoscape with the latest EnrichmentMap and AutoAnnotate apps installed. This section helps you figure out which of these you need to install.

If you already have all these, skip to the next section.
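
As a quick check before working through the subsections below, the following sketch reports which of the command-line prerequisites are on your `PATH`. It is illustrative only: `check_tool` is a hypothetical helper name, not part of netDx, and it tests only that each command exists, not which version is installed.

```shell
#!/bin/sh
# check_tool NAME: report whether NAME is available on the PATH.
# (Hypothetical helper for illustration; not part of netDx.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the three command-line prerequisites are absent.
missing=0
for tool in java python R; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing prerequisite(s) missing"
```

Any tool reported as missing can then be installed by following the matching subsection. Bioconductor and Cytoscape are not covered by this check and still need to be verified separately.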

#### Java (1.8+ recommended, but will probably work on 1.6+)
netDx uses the GeneMANIA algorithm to integrate patient networks and recommend patients by similarity (Mostafavi and Morris (2008). *Genome Biol* 9:Suppl 1). GeneMANIA is currently implemented in Java, making this interpreter a requirement for netDx.
The engine netDx uses to integrate patient networks and recommend patients by similarity is implemented in Java, making this interpreter a requirement for netDx.

At the command line, run `java -version`. You should see output like this:
```
@@ -36,7 +43,7 @@ Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
If you don't see this kind of output, you may need to first [install java](https://java.com/en/).

#### Python (2.7 recommended)
netDx uses legacy Python scripts in creating the GeneMANIA database, so for now a Python interpreter is required to run netDx. Future versions of netDx will not have this requirement.
A legacy script to create the database of input patient networks was implemented in Python. For now, netDx requires Python to run but future versions of netDx will not have this requirement.

At command line, run `python --version`. You should see output like this:

@@ -70,22 +77,58 @@ $ R
```
Say `yes` to all dependencies that need to be installed.

#### Cytoscape (3.5.1+ recommended)
To generate and visualize patient similarity networks and enrichment maps from netDx output, you need Cytoscape. If Cytoscape is not in your Applications directory, install the latest version from http://www.cytoscape.org.

With Cytoscape open, install the following Apps:
* AutoAnnotate v1.2 (http://apps.cytoscape.org/apps/autoannotate) *Earlier versions may not work.*
* Enrichment Map v3.0.0 (http://apps.cytoscape.org/apps/enrichmentmap) *Earlier versions may not work.*

## Install `netDx` and `netDx.examples`
This section assumes you have Java, Python, R and Bioconductor installed. From command-line, download the git repo for these packages and install them. In the code below, output from intermediate steps is omitted for clarity.

*Note: For now, R package dependencies must be separately installed using the install.packages() call as shown below. netDx will be submitted to CRAN following publication; thereafter, dependencies can be automatically installed with the call to install netDx.*

On Debian-based Linux systems (which use `apt-get`), you may need one or more of these system libraries, which several of the R packages depend on:

```
sudo apt-get install zlib1g-dev libssl-dev libssh2-1-dev libcurl4-openssl-dev
```

If you are on an RPM-based system, you may need to run this instead:
```
sudo yum install libcurl-devel
```

Now we install the necessary R packages:
```
$ cd netDx-master/
$ R
> install.packages(c('devtools','curl'))
> install.packages(c("bigmemory","foreach","combinat","doParallel","ROCR","pracma","RColorBrewer","reshape2"))
> devtools::install_github("igraph/rigraph") # install from CRAN has a bug and can fail (31 Aug 2017).
> devtools::install_github('cytoscape/cytoscape-automation/for-scripters/R/r2cytoscape')
> devtools::install_github('BaderLab/Easycyrest/[email protected]')
> install.packages("netDx",type="source",repos=NULL)
> install.packages("netDx.examples",type="source",repos=NULL)
> install.packages("knitr") # needed to run examples
```
On Unix systems you may need to install the libraries below at command-line:
```
$ sudo apt-get install libssl-dev # for openssl & httr
$ sudo apt-get install libxml2-dev # provides xml2-config, for XML & r2cytoscape
```

```
>install.packages(c("openssl","httr","RJSONIO"))
>devtools::install_github('cytoscape/cytoscape-automation/for-scripters/R/r2cytoscape')
>devtools::install_github('BaderLab/Easycyrest/[email protected]')
```

Note: On Unix systems, installing `httr` requires a prior install of the `openssl` R package. If the `openssl` install fails with a message like `ERROR: configuration failed for package 'openssl'`, you will need to install the OpenSSL development library for your system (for example, `libssl-dev` on Debian and Ubuntu).


<a name="mblastoma"></a>
## Test functionality
Run the medulloblastoma vignette to make sure the netDx pipeline works from end to end.
Each vignette is in Sweave format (`.Rnw`). To run these, you need to have both `netDx` and `netDx.examples` installed. You will also need the R package `knitr` to compile the Sweave file. If you have [RStudio](https://www.rstudio.com/home/) installed (highly recommended), you should be able to open the `.Rnw` file and click `Compile PDF`. Alternatively, you may run the vignette through an interactive R session:
@@ -98,6 +141,20 @@ $ R
```
This should generate `Medulloblastoma.pdf` in the `examples/` directory.

## Run BreastCancer LumA example
This vignette is presented in the netDx manuscript. Here we start with 348 primary tumours from the Cancer Genome Atlas, and build a predictor for Luminal A subtype classification (The Cancer Genome Atlas (2012). *Nature.* **490**:61-70). This example illustrates feature selection using a simple design in which networks are scored out of 10 based on a single round of 10-fold cross validation. On a MacBook Air laptop (late 2014), this vignette takes ~1.5 hours to run to completion. You may speed it up by running it on a machine with more processors and changing the `numCores` variable in the vignette.

**We do not recommend running it on a Mac with less than 8Gb RAM. A Unix machine manages memory differently and may require as much as 32Gb RAM. If such a machine is not available, set `numCores=2L` in the `.Rnw` file before running.**

```
$ cd netDx/examples/
$ R
> require(knitr)
> knit2pdf("BreastCancer.Rnw")
```
**NOTE:** The vignette will generate a pdf file. All intermediate files will be stored in the `TCGA_BRCA/` subdirectory of the examples directory.

<a name="pdfissue"></a>
## Known issues with compiling pdfs

#### (Linux)
@@ -119,13 +176,11 @@ ln -s /Library/TeX/Distributions/.DefaultTeX/Contents/Programs/texbin /usr/texbi
Step 3. Now, in the terminal check the value of `echo $PATH`. Make sure that `/usr/texbin` is present. If it isn't present, then you need to add `/usr/texbin` to your PATH variable. This can be done by updating the `PATH` variable in `~/.bashrc`.
However, if you find yourself having to mess with the PATH variable, try reinstalling the [MacTex](http://tug.org/mactex/) package.
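
The check in Step 3 can be sketched as a small shell test. This is a minimal illustration, assuming a POSIX shell; `path_contains` is a hypothetical helper name, not part of MacTeX or netDx, and the script only prints a suggestion rather than editing `~/.bashrc`.

```shell
#!/bin/sh
# path_contains DIR: succeed if DIR is one of the colon-separated PATH entries.
# (Hypothetical helper for illustration; not part of MacTeX or netDx.)
path_contains() {
    # Wrapping PATH and the pattern in colons forces a whole-entry match.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains /usr/texbin; then
    echo "/usr/texbin is already on PATH"
else
    echo "/usr/texbin is not on PATH; consider adding this line to ~/.bashrc:"
    echo 'export PATH="$PATH:/usr/texbin"'
fi
```

Matching on whole entries avoids a false positive from a directory such as `/usr/texbin2`, which a plain substring search of `$PATH` would accept.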

## Run BreastCancer LumA example
This vignette is presented in the netDx manuscript. Here we start with 348 primary tumours from the Cancer Genome Atlas, and build a predictor for Luminal A subtype classification (The Cancer Genome Atlas (2012). *Nature.* **490**:61-70). This example illustrates feature selection using a simple design in which networks are scored out of 10 based on a single round of 10-fold cross validation. On a MacBook Air laptop (late 2014), this vignette takes ~1.5 hours to run to completion. You may speed it up by running it on a machine with more processors and changing the `numCores` variable in the vignette. We do not recommend running it on a machine with less than 8Gb RAM.
<a name="brca"></a>

```
$ cd netDx/examples/
$ R
> require(knitr)
> knit2pdf("BreastCancer.Rnw")
```
**NOTE:** The vignette will generate a pdf file. All intermediate files will be stored in the `TCGA_BRCA/` subdirectory of the examples directory.
## Other examples
The `examples/` directory contains R notebooks (`.Rmd`) that teach basic functionality useful for building any predictor. These include plotting predictor results and running nested cross-validation, the design we recommend using after feature design.

The R notebooks must be run from within Rstudio. Install [Rstudio](https://www.rstudio.com/products/rstudio/download/) if necessary.

We have also posted the results of running all the examples [here](http://netdx.org/index.php/examples/).
131 changes: 0 additions & 131 deletions README.txt

This file was deleted.

3 changes: 2 additions & 1 deletion examples/BreastCancer.Rnw
@@ -2,6 +2,7 @@
\usepackage{caption}

\begin{document}
\SweaveOpts{concordance=TRUE}

\title{netDx use case \linebreak Integrate gene expression and CNV for \linebreak binary classification of breast tumour}
\author{Shraddha Pai}
@@ -29,7 +30,7 @@ The workflow is shown in Figure 1. The algorithm proceeds in two steps:

\begin{figure}[ht]
\begin{center}
\includegraphics[width=\textwidth]{tcga_brca.png}
\includegraphics[width=\textwidth]{images/tcga_brca.png}
\caption{netDx workflow for a binary tumour classifier from gene expression and CNV data. \newline A. Two sets of patient similarity networks are generated: the first based on correlation of gene expression in cellular pathways (magenta), and the second based on shared overlap of CNVs in cellular pathways (teal). Each datatype generates $\sim$1,000--2,000 networks, and these are integrated into a single database by GeneMANIA. \newline B. Feature selection is carried out separately for the `LumA' class and for the `other' class. The integrated database is queried with GeneMANIA 10 times; each time a different 9/10th of the training ``+'' samples is used as the query. A network's score is the frequency with which GeneMANIA marks it as being informative. Networks scoring 9 or 10 out of 10 are feature selected. Before patient classification, two enriched databases are constructed (orange and grey cylinders); each contains the feature-selected networks and both training and test samples. \newline C. Patient similarity to a class is ranked by running a query against the class database; this is done once per class. Test patients are assigned to the class for which they have the highest-ranking similarity.}
\end{center}
\end{figure}
3 changes: 2 additions & 1 deletion examples/Medulloblastoma.Rnw
@@ -1,6 +1,7 @@
\documentclass{article}

\begin{document}
\SweaveOpts{concordance=TRUE}

\title{netDx use case: 4-way classification: Medulloblastoma tumour subtype}
\author{Shraddha Pai}
@@ -15,7 +16,7 @@ The netDx workflow is shown in Figure 1. We use gene signatures identified by pr

\begin{figure}[ht]
\begin{center}
\includegraphics[width=\textwidth]{mblastoma.png}
\includegraphics[width=\textwidth]{images/mblastoma.png}
\caption{netDx workflow for a 4-way classification of medulloblastoma tumour from known gene signatures. \newline A. A GeneMANIA database is built for each subtype. The networks are at the gene-level, with a custom similarity metric (see Section 3); each subtype-specific database contains networks for the genes in the corresponding subtype signature. The networks and database contain all training and test patients. \newline B. A sample is classified by running four GeneMANIA queries, one per subtype. In each case, the query comprises training samples for that subtype. The query results in a subtype-specific similarity ranking. After such ranks have been obtained for all four subtypes, the sample is assigned to the class for which it has the highest rank.}
\end{center}
\end{figure}