| title | description | services | documentationcenter | tags | keywords |
| --- | --- | --- | --- | --- | --- |
| Read and Write Phoenix Data from a Spark cluster - Azure HDInsight \| Microsoft Docs | | hdinsight | | azure-portal | spark, Apache Phoenix |
Apache HBase data can be queried either with its low-level API of scans, gets, and puts, or with SQL syntax using Apache Phoenix. Phoenix is a SQL layer over HBase that uses a JDBC driver (rather than Hadoop MapReduce) to extend the HBase key-value store with features that make it similar to a relational database: a SQL query engine, a metadata repository, and an embedded JDBC driver. Phoenix was originally developed at Salesforce and was subsequently open-sourced as an Apache project. Note that Phoenix is designed to work only with HBase data.
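Because Phoenix exposes a standard JDBC driver, it can be queried with ordinary `java.sql` code. The sketch below is illustrative only: the ZooKeeper quorum address and the `Company` table are hypothetical placeholders, and the code needs an HBase cluster with Phoenix installed (plus the Phoenix client JAR on the classpath) to actually run.

```scala
import java.sql.DriverManager

object PhoenixQuerySketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical ZooKeeper quorum; replace with your cluster's quorum string.
    val conn = DriverManager.getConnection(
      "jdbc:phoenix:zk0-hbase.example.com:2181:/hbase-unsecure")
    try {
      val stmt = conn.createStatement()
      // Phoenix uses UPSERT rather than INSERT; commits are explicit by default.
      stmt.executeUpdate(
        "CREATE TABLE IF NOT EXISTS Company (id BIGINT PRIMARY KEY, name VARCHAR)")
      stmt.executeUpdate("UPSERT INTO Company VALUES (1, 'Contoso')")
      conn.commit()

      val rs = stmt.executeQuery("SELECT name FROM Company WHERE id = 1")
      while (rs.next()) println(rs.getString("name"))
    } finally {
      conn.close()
    }
  }
}
```

The point of the sketch is that Phoenix queries look like plain relational SQL over JDBC, even though the data lives in HBase.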
Apache Spark can serve as a convenient, performant alternative for querying and modifying data stored in HBase. This cross-cluster access is enabled by the Spark-HBase Connector (also called SHC). See Using Spark to Query HBase for details on this approach.
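With SHC, an HBase table is mapped to a Spark DataFrame through a JSON catalog. The sketch below assumes a hypothetical `Company` table with a `cf1` column family; the table name, column family, and columns are placeholders, and the code requires a Spark cluster with the SHC package and access to an HBase cluster to run.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object ShcReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shc-read").getOrCreate()
    import spark.implicits._

    // Hypothetical catalog: maps the HBase row key and the cf1:name column
    // to DataFrame columns "id" and "name". Adjust for your own schema.
    val catalog = s"""{
      |"table":{"namespace":"default", "name":"Company"},
      |"rowkey":"key",
      |"columns":{
        |"id":{"cf":"rowkey", "col":"key", "type":"string"},
        |"name":{"cf":"cf1", "col":"name", "type":"string"}
      |}
    }""".stripMargin

    val df = spark.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()

    // Filters like this can be pushed down to HBase by the connector.
    df.filter($"name" === "Contoso").show()
  }
}
```

The design point: unlike Phoenix's JDBC path, SHC reads HBase through Spark's data source API, so DataFrame operations and predicate pushdown apply directly.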
- IMPORTANT: As of this writing (June 2017), HDInsight does not support the open-source Apache Spark plugin for Phoenix. Use the Spark-HBase Connector instead to query HBase from Spark.
- Using Spark to Query HBase
- Spark HBase Connector
- Phoenix Spark dependency list
- Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
- New Features in Apache Phoenix
- Apache Spark Plugin for Apache Phoenix