---
title: Compute context options for R Server on HDInsight - Azure | Microsoft Docs
description: Learn about the different compute context options available to users with R Server on HDInsight
services: HDInsight
documentationcenter: ''
author: jeffstokes72
manager: jhubbard
editor: cgronlun
ms.assetid: 0deb0b1c-4094-459b-94fc-ec9b774c1f8a
ms.service: HDInsight
ms.custom: hdinsightactive
ms.devlang: R
ms.topic: article
ms.tgt_pltfrm: na
ms.workload: data-services
ms.date: 02/28/2017
ms.author: jeffstok
---
Microsoft R Server on Azure HDInsight provides the latest capabilities for R-based analytics. It can use data that's stored in HDFS in a container in your Azure Blob storage account, in an Azure Data Lake Store, or on the local Linux file system. Because R Server is built on open-source R, the R-based applications you build can leverage any of the 8000+ open-source R packages. They can also use the routines in ScaleR, Microsoft's big data analytics package that's included with R Server.
The edge node of a cluster provides a convenient place to connect to the cluster and run your R scripts. With an edge node, you have the option of running ScaleR’s parallelized distributed functions across the cores of the edge node server. You also have the option to run them across the nodes of the cluster by using ScaleR’s Hadoop Map Reduce or Spark compute contexts.
In general, an R script that's run in R Server on the edge node runs within the R interpreter on that node. The exceptions are those steps that call a ScaleR function. The ScaleR calls run in a compute environment that's determined by how you set the ScaleR compute context. When you run your R script from an edge node, the possible values of the compute context are local sequential ('local'), local parallel ('localpar'), Map Reduce, and Spark.
The 'local' and 'localpar' options differ only in how rxExec calls are executed. Both execute other rx-function calls in parallel across all available cores unless you specify otherwise through the ScaleR numCoresToUse option, for example rxOptions(numCoresToUse=6). The following table summarizes the compute context options:
Compute context | How to set | Execution context |
---|---|---|
Local sequential | rxSetComputeContext('local') | Parallelized execution across the cores of the edge node server, except for rxExec calls, which are executed serially |
Local parallel | rxSetComputeContext('localpar') | Parallelized execution across the cores of the edge node server |
Spark | RxSpark() | Parallelized distributed execution via Spark across the nodes of the HDI cluster |
Map Reduce | RxHadoopMR() | Parallelized distributed execution via Map Reduce across the nodes of the HDI cluster |
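The following is a minimal sketch of switching among these compute contexts from an R session on the edge node; the numCoresToUse value is only an example.

```R
# Switch among ScaleR compute contexts from an R session on the edge node.
library(RevoScaleR)   # loaded automatically in R Server sessions

# Local sequential: rx functions run in parallel across the edge node's cores,
# but rxExec calls run one at a time.
rxSetComputeContext("local")

# Local parallel: rxExec calls are also parallelized across the cores.
rxSetComputeContext("localpar")

# Optionally limit how many cores ScaleR functions may use.
rxOptions(numCoresToUse = 6)

# Spark: parallelized, distributed execution across the nodes of the cluster.
rxSetComputeContext(RxSpark())

# Map Reduce: parallelized, distributed execution via Hadoop Map Reduce.
rxSetComputeContext(RxHadoopMR())

# Check which compute context is currently active.
rxGetComputeContext()
```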
If you want parallelized execution for performance, you have three options: local parallel ('localpar'), Spark, or Map Reduce. Which option you choose depends on the nature of your analytics work, and on the size and location of your data.
Currently, there is no formula that tells you which compute context to use. There are, however, some guiding principles that can help you make the right choice, or at least help you narrow down your choices before you run a benchmark. These guiding principles include:
- The local Linux file system is faster than HDFS.
- Repeated analyses are faster if the data is local, and if it's in XDF.
- It's preferable to stream small amounts of data from a text data source; if the amount of data is larger, convert it to XDF prior to analysis (see the sketch after this list).
- The overhead of copying or streaming the data to the edge node for analysis becomes unmanageable for very large amounts of data.
- Spark is faster than Map Reduce for analysis in Hadoop because it keeps data in memory across operations by using Spark RDDs.
- The Spark compute context uses the Spark DAG to distribute work across the nodes of the cluster, and it provides a number of options for persisting intermediate results. Because spawning new tasks is an expensive process, reusing persisted results often yields performance increases over Map Reduce for many types of workloads.
- Spark runs under YARN for resource management, providing greater flexibility on selecting the number of nodes on which to run tasks.
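To illustrate the principles about local data and XDF, the following sketch imports a text file that has been copied to the edge node into XDF before repeated analysis. The file paths and the ArrDelay column are hypothetical.

```R
# A sketch of importing a local CSV file to XDF for faster repeated analysis.
# The file paths and column name are hypothetical.
library(RevoScaleR)

rxSetComputeContext("localpar")

csvFile <- "/tmp/airline.csv"   # text data copied to the edge node
xdfFile <- "/tmp/airline.xdf"   # XDF output on the local Linux file system

# Import once; the XDF file is a compact binary format optimized for ScaleR.
airlineXdf <- rxImport(inData = csvFile, outFile = xdfFile, overwrite = TRUE)

# Repeated analyses against the XDF file avoid re-parsing the CSV each time.
rxGetInfo(airlineXdf, getVarInfo = TRUE)
rxSummary(~ArrDelay, data = airlineXdf)
```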
Given these principles, some general rules of thumb for selecting a compute context are:
- If the amount of data to analyze is small and does not require repeated analysis, then stream it directly into the analysis routine and use 'local' or 'localpar'.
- If the amount of data to analyze is small or medium-sized and requires repeated analysis, then copy it to the local file system, import it to XDF, and analyze it via 'local' or 'localpar'.
- If the amount of data to analyze is large, then import it to a Spark DataFrame by using RxHiveData or RxParquetData, or to XDF in HDFS (unless storage is an issue), and analyze it by using the Spark compute context, as shown in the sketch after this list.
- SparkR provides access to native Spark capabilities, including a growing number of predictive analytics algorithms available in Spark. Use it only if you encounter an insurmountable problem with the Spark compute context, because SparkR is generally slower.
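As a sketch of the large-data case, the following shows how you might reference data that already lives on the cluster and run a distributed analysis in the Spark compute context. The Hive table, HDFS paths, and column names are hypothetical.

```R
# A sketch of analyzing a large data set in the Spark compute context.
# The Hive table, HDFS paths, and column names are hypothetical.
library(RevoScaleR)

rxSetComputeContext(RxSpark())

# Option 1: reference data that is already stored as a Hive table or Parquet file.
hiveData    <- RxHiveData(table = "flightdata")
parquetData <- RxParquetData(file = "/share/flightdata.parquet")

# Option 2: import text data in HDFS to XDF in HDFS for repeated analysis.
hdfs    <- RxHdfsFileSystem()
csvHdfs <- RxTextData("/share/flightdata.csv", fileSystem = hdfs)
xdfHdfs <- RxXdfData("/share/flightdataXdf", fileSystem = hdfs)
rxImport(inData = csvHdfs, outFile = xdfHdfs, overwrite = TRUE)

# Run a distributed analysis, for example a linear model of arrival delay.
rxLinMod(ArrDelay ~ DayOfWeek, data = xdfHdfs)
```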
For more information and examples of ScaleR compute contexts, see the inline help in R on the rxSetComputeContext method, for example:
```R
> ?rxSetComputeContext
```
You can also refer to the “ScaleR Distributed Computing Guide” that's available from the R Server MSDN library.
In this article, you learned about the compute context options that are available when you run R Server on the edge node of an HDInsight cluster, and some guidelines for choosing among them. Now you can read the following articles to discover other ways of working with R Server on HDInsight: