---
title: Read and Write Phoenix Data from a Spark cluster - Azure HDInsight | Microsoft Docs
services: hdinsight
tags: azure-portal
keywords: spark, Apache Phoenix
---

# Read and Write Phoenix Data from a Spark cluster

Apache HBase data can be queried either with its low-level API of scans, gets, and puts, or with SQL syntax using Apache Phoenix. Phoenix is an API for HBase that uses a JDBC driver (rather than Hadoop MapReduce) to extend the HBase key-value store with features that make it similar to a relational database: a SQL query engine, a metadata repository, and an embedded JDBC driver. Phoenix was originally developed at Salesforce and was subsequently open-sourced as an Apache project. Note that Phoenix is designed to work only with HBase data.
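To illustrate, a Phoenix query looks like standard SQL issued over JDBC. The sketch below is illustrative, not from this article: the ZooKeeper quorum address and the `us_population` table are assumptions you would replace with your own cluster and schema.

```scala
import java.sql.DriverManager

// Assumed ZooKeeper quorum for the HBase cluster; replace with your own host.
val conn = DriverManager.getConnection(
  "jdbc:phoenix:zk0-hbase.example.com:2181:/hbase")
val stmt = conn.createStatement()

// Phoenix exposes HBase data through ANSI-style SQL; this table is hypothetical.
val rs = stmt.executeQuery(
  "SELECT state, SUM(population) AS pop FROM us_population GROUP BY state")
while (rs.next()) {
  println(s"${rs.getString("state")}: ${rs.getLong("pop")}")
}
conn.close()
```

The point of the sketch is that any JDBC-capable client can query HBase this way, without writing scan/get/put code against the HBase API.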

Apache Spark is a convenient and performant alternative for querying and modifying data stored in HBase. This cross-cluster access is enabled by the Spark-HBase Connector (also called the SHC). See Using Spark to Query HBase for details on this approach.

> [!IMPORTANT]
> As of this writing (June 2017), HDInsight does not support the open source Apache Spark plugin for Phoenix. You are advised to use the Spark-HBase Connector to support querying HBase from Spark at this time.
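With the Spark-HBase Connector approach recommended above, an HBase table is mapped to a Spark DataFrame through a catalog definition. The sketch below is a minimal, assumed example: the `Contacts` table, its column families, and the catalog fields are illustrative, not part of this article.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Hypothetical catalog mapping an HBase table "Contacts" to DataFrame columns.
// "cf" is the HBase column family, "col" the column qualifier.
def catalog = s"""{
  "table":{"namespace":"default", "name":"Contacts"},
  "rowkey":"key",
  "columns":{
    "rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
    "officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
    "personalName":{"cf":"Personal", "col":"Name", "type":"string"}
  }
}"""

val spark = SparkSession.builder().appName("shc-read").getOrCreate()

// Read the HBase table as a DataFrame via the SHC data source.
val df = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

df.show()
```

Writes work the same way in reverse: a DataFrame with matching columns can be saved through the same catalog with `df.write`.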

## See Also