feat: add JWT support to PyHive (#1)
* feat: add HTTP and HTTPS to hive (dropbox#385)

* feat: add https protocol

* support HTTP

* fix: make hive https py2 compat (dropbox#389)

* fix: make hive https py2 compat

* fix lint

* Update README.rst (dropbox#423)

* chore: rename Trino entry point (dropbox#428)

* Support for Presto decimals (dropbox#430)

* Support for Presto decimals

* lower

* Use str type for driver and name in HiveDialect (dropbox#450)

PyHive's HiveDialect usage of bytes for the name and driver fields is not the norm and is causing issues upstream: apache/superset#22316
Even other dialects within PyHive use strings. SQLAlchemy does not strictly require a string, but all the stock dialects return a string, so I figure it is heavily implied.

I think the risk of breaking something upstream with this change is low (but it is there ofc). I figure in most cases we just make someone's `str(dialect.driver)` expression redundant.

Examples for some of the other stock sqlalchemy dialects (name and driver fields using str):
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/pysqlite.py#L501
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/base.py#L1891
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/base.py#L2383
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/mysqldb.py#L113
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/pymysql.py#L59
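
To illustrate why a bytes-valued `name` can surprise callers on Python 3, here is a minimal sketch that uses neither PyHive nor SQLAlchemy. The two classes and the `is_hive` helper are hypothetical stand-ins for illustration, not the real dialect code:

```python
class BytesDialect:
    """Hypothetical stand-in for a dialect that exposes bytes fields."""
    name = b"hive"
    driver = b"thrift"


class StrDialect:
    """Hypothetical stand-in for a dialect that exposes str fields."""
    name = "hive"
    driver = "thrift"


def is_hive(dialect):
    """Caller-side check that compares the dialect name against a str."""
    return dialect.name == "hive"


# On Python 3, bytes and str never compare equal, so the first check fails.
bytes_result = is_hive(BytesDialect)
str_result = is_hive(StrDialect)
print(bytes_result, str_result)
```

With str fields, a caller that previously wrapped the value in a conversion before comparing no longer needs to.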

* Correcting Iterable import for python 3.10 (dropbox#451)

* changing drivers to support hive, presto and trino with sqlalchemy>=2.0 (dropbox#448)

* Revert "changing drivers to support hive, presto and trino with sqlalchemy>=2.0 (dropbox#448)" (dropbox#452)

This reverts commit b0206d3.

* Update __init__.py (dropbox#453)

dropbox@1c1da8b

dropbox@1f99552

* use pure-sasl with python 3.11 (dropbox#454)

* minimal changes for sqlalchemy 2.0 support (dropbox#457)

* update readme to reflect recent changes (dropbox#459)

* Update README.rst (dropbox#475)

* Update README.rst (dropbox#476)

* feat: JWT support

* Add CI to build package

---------

Co-authored-by: Daniel Vaz Gaspar <[email protected]>
Co-authored-by: Bogdan <[email protected]>
Co-authored-by: serenajiang <[email protected]>
Co-authored-by: Usiel Riedl <[email protected]>
Co-authored-by: Multazim Deshmukh <[email protected]>
Co-authored-by: nicholas-miles <[email protected]>
7 people authored Aug 8, 2024
1 parent d6e7140 commit 32ee963
Showing 23 changed files with 1,366 additions and 321 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -14,3 +14,4 @@ cover/
.cache/
*.iml
/scripts/.thrift_gen
.python-version
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,4 @@
0.7.0a
======

- Add support for JWT authentication.
94 changes: 94 additions & 0 deletions Jenkinsfile
@@ -0,0 +1,94 @@
LIB_NAME = 'PyHive'
String currentVersion = ""


podTemplate(
imagePullSecrets: ['preset-pull'],
nodeUsageMode: 'NORMAL',
containers: [
containerTemplate(
alwaysPullImage: true,
name: 'ci',
image: 'preset/ci:latest',
ttyEnabled: true,
command: 'cat',
resourceRequestCpu: '100m',
resourceLimitCpu: '200m',
resourceRequestMemory: '1000Mi',
resourceLimitMemory: '2000Mi',
),
containerTemplate(
alwaysPullImage: true,
name: 'py-ci',
image: 'preset/python:3.8.9-ci',
ttyEnabled: true,
command: 'cat'
)
]
) {
node(POD_LABEL) {
container('py-ci') {
stage('Checkout') {
checkout scm
}

stage('Tests') {
sh(script: 'pip install -e . && pip install -r requirements-dev.txt', label: 'install dependencies')
parallel(
check: {
currentVersion = sh(
script: "python setup.py --version",
returnStdout: true,
label: 'Get current version'
).trim()
def retVal = sh(
script: "curl -I -f https://pypi.devops.preset.zone/${LIB_NAME}/${LIB_NAME}-${currentVersion}.tar.gz",
returnStatus: true,
label: 'Check for existing tarball'
)
// If the thing exists, we should bail as we don't want to overwrite
if (retVal == 0) {
error("Version ${currentVersion} of ${LIB_NAME} already exists! Version bump required.")
}
}
)
}
}

container('py-ci') {
stage('Package Release') {
if (env.BRANCH_NAME.startsWith("PR-")) {
def shortGitRev = sh(
returnStdout: true,
script: 'git rev-parse --short HEAD'
).trim()
def pullRequestVersion = "${currentVersion}+${env.BRANCH_NAME}.${shortGitRev}"
sh(script:"sed -i \'s/version = ${currentVersion}/version = ${pullRequestVersion}/g\' setup.cfg", label: 'Changing version for PR')
sh(script:"echo PR version: ${pullRequestVersion}", label: 'PR Release candidate version')
}
sh(script: 'python setup.py sdist --formats=gztar', label: 'Bundling release')
sh(script: "mkdir -p dist/${LIB_NAME} && mv dist/*.gz dist/${LIB_NAME}", label: 'Setup release folder')
}
}

container('ci') {
stage('Upload Release') {
withCredentials([
[
$class : 'AmazonWebServicesCredentialsBinding',
credentialsId : 'ci-user',
accessKeyVariable: 'AWS_ACCESS_KEY_ID',
secretKeyVariable: 'AWS_SECRET_ACCESS_KEY',
]
]) {
if ((env.BRANCH_NAME == 'master') || (env.BRANCH_NAME.startsWith("PR-"))) {
sh(script: "aws s3 sync ./dist s3://preset-pypi", label: "Uploading to s3")
}
else {
echo "Skipping upload as this isn't master..."
}
}
}
}
}
}
93 changes: 72 additions & 21 deletions README.rst
@@ -1,24 +1,31 @@
.. image:: https://travis-ci.org/dropbox/PyHive.svg?branch=master
:target: https://travis-ci.org/dropbox/PyHive
.. image:: https://img.shields.io/codecov/c/github/dropbox/PyHive.svg
========================================================
PyHive project has been donated to Apache Kyuubi
========================================================

You can follow it's development and report any issues you are experiencing here: https://github.com/apache/kyuubi/tree/master/python/pyhive



Legacy notes / instructions
===========================

======
PyHive
======
**********


PyHive is a collection of Python `DB-API <http://www.python.org/dev/peps/pep-0249/>`_ and
`SQLAlchemy <http://www.sqlalchemy.org/>`_ interfaces for `Presto <http://prestodb.io/>`_ and
`Hive <http://hive.apache.org/>`_.
`SQLAlchemy <http://www.sqlalchemy.org/>`_ interfaces for `Presto <http://prestodb.io/>`_ ,
`Hive <http://hive.apache.org/>`_ and `Trino <https://trino.io/>`_.

Usage
=====
**********

DB-API
------
.. code-block:: python
from pyhive import presto # or import hive or import trino
cursor = presto.connect('localhost').cursor()
cursor = presto.connect('localhost').cursor() # or use hive.connect or use trino.connect
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print cursor.fetchone()
print cursor.fetchall()
@@ -54,7 +61,7 @@ In Python 3.7 `async` became a keyword; you can use `async_` instead:
SQLAlchemy
----------
First install this package to register it with SQLAlchemy (see ``setup.py``).
First install this package to register it with SQLAlchemy, see ``entry_points`` in ``setup.py``.

.. code-block:: python
@@ -64,12 +71,33 @@ First install this package to register it with SQLAlchemy (see ``setup.py``).
# Presto
engine = create_engine('presto://localhost:8080/hive/default')
# Trino
engine = create_engine('trino://localhost:8080/hive/default')
engine = create_engine('trino+pyhive://localhost:8080/hive/default')
# Hive
engine = create_engine('hive://localhost:10000/default')
# SQLAlchemy < 2.0
logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
print select([func.count('*')], from_obj=logs).scalar()
# Hive + HTTPS + LDAP or basic Auth
engine = create_engine('hive+https://username:password@localhost:10000/')
logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
print select([func.count('*')], from_obj=logs).scalar()
# SQLAlchemy >= 2.0
metadata_obj = MetaData()
books = Table("books", metadata_obj, Column("id", Integer), Column("title", String), Column("primary_author", String))
metadata_obj.create_all(engine)
inspector = inspect(engine)
inspector.get_columns('books')
with engine.connect() as con:
data = [{ "id": 1, "title": "The Hobbit", "primary_author": "Tolkien" },
{ "id": 2, "title": "The Silmarillion", "primary_author": "Tolkien" }]
con.execute(books.insert(), data[0])
result = con.execute(text("select * from books"))
print(result.fetchall())
Note: query generation functionality is not exhaustive or fully tested, but there should be no
problem with raw SQL.

@@ -89,7 +117,7 @@ Passing session configuration
'session_props': {'query_max_run_time': '1234m'}}
)
create_engine(
'trino://user@host:443/hive',
'trino+pyhive://user@host:443/hive',
connect_args={'protocol': 'https',
'session_props': {'query_max_run_time': '1234m'}}
)
@@ -104,27 +132,30 @@ Passing session configuration
)
Requirements
============
************

Install using

- ``pip install 'pyhive[hive]'`` for the Hive interface and
- ``pip install 'pyhive[presto]'`` for the Presto interface.
- ``pip install 'pyhive[hive]'`` or ``pip install 'pyhive[hive_pure_sasl]'`` for the Hive interface
- ``pip install 'pyhive[presto]'`` for the Presto interface
- ``pip install 'pyhive[trino]'`` for the Trino interface

Note: ``'pyhive[hive]'`` extras uses `sasl <https://pypi.org/project/sasl/>`_ that doesn't support Python 3.11, See `github issue <https://github.com/cloudera/python-sasl/issues/30>`_.
Hence PyHive also supports `pure-sasl <https://pypi.org/project/pure-sasl/>`_ via additional extras ``'pyhive[hive_pure_sasl]'`` which support Python 3.11.

PyHive works with

- Python 2.7 / Python 3
- For Presto: Presto install
- For Trino: Trino install
- For Presto: `Presto installation <https://prestodb.io/docs/current/installation.html>`_
- For Trino: `Trino installation <https://trino.io/docs/current/installation.html>`_
- For Hive: `HiveServer2 <https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2>`_ daemon

Changelog
=========
*********
See https://github.com/dropbox/PyHive/releases.

Contributing
============
************
- Please fill out the Dropbox Contributor License Agreement at https://opensource.dropbox.com/cla/ and note this in your pull request.
- Changes must come with tests, with the exception of trivial things like fixing comments. See .travis.yml for the test environment setup.
- Notes on project scope:
@@ -134,8 +165,28 @@ Contributing
- We prefer having a small number of generic features over a large number of specialized, inflexible features.
For example, the Presto code takes an arbitrary ``requests_session`` argument for customizing HTTP calls, as opposed to having a separate parameter/branch for each ``requests`` option.

Tips for test environment setup
****************************************
You can setup test environment by following ``.travis.yaml`` in this repository. It uses `Cloudera's CDH 5 <https://docs.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_download_510.html>`_ which requires username and password for download.
It may not be feasible for everyone to get those credentials. Hence below are alternative instructions to setup test environment.

You can clone `this repository <https://github.com/big-data-europe/docker-hive/blob/master/docker-compose.yml>`_ which has Docker Compose setup for Presto and Hive.
You can add below lines to its docker-compose.yaml to start Trino in same environment::
trino:
image: trinodb/trino:351
ports:
- "18080:18080"
volumes:
- ./trino:/etc/trino

Note: ``./trino`` for docker volume defined above is `trino config from PyHive repository <https://github.com/dropbox/PyHive/tree/master/scripts/travis-conf/trino>`_

Then run::
docker-compose up -d

Testing
=======
*******
.. image:: https://travis-ci.org/dropbox/PyHive.svg
:target: https://travis-ci.org/dropbox/PyHive
.. image:: http://codecov.io/github/dropbox/PyHive/coverage.svg?branch=master
@@ -154,7 +205,7 @@ WARNING: This drops/creates tables named ``one_row``, ``one_row_complex``, and `
database called ``pyhive_test_database``.

Updating TCLIService
====================
********************

The TCLIService module is autogenerated using a ``TCLIService.thrift`` file. To update it, the
``generate.py`` file can be used: ``python generate.py <TCLIServiceURL>``. When left blank, the
2 changes: 2 additions & 0 deletions dev_requirements.txt
@@ -12,6 +12,8 @@ pytest-timeout==1.2.0
requests>=1.0.0
requests_kerberos>=0.12.0
sasl>=0.2.1
pure-sasl>=0.6.2
kerberos>=1.3.0
thrift>=0.10.0
#thrift_sasl>=0.1.0
git+https://github.com/cloudera/thrift_sasl # Using master branch in order to get Python 3 SASL patches
2 changes: 1 addition & 1 deletion pyhive/__init__.py
@@ -1,3 +1,3 @@
from __future__ import absolute_import
from __future__ import unicode_literals
__version__ = '0.6.3'
__version__ = '0.7.0a'
7 changes: 6 additions & 1 deletion pyhive/common.py
@@ -18,6 +18,11 @@
from future.utils import with_metaclass
from itertools import islice

try:
from collections.abc import Iterable
except ImportError:
from collections import Iterable


class DBAPICursor(with_metaclass(abc.ABCMeta, object)):
"""Base class for some common DB-API logic"""
@@ -245,7 +250,7 @@ def escape_item(self, item):
return self.escape_number(item)
elif isinstance(item, basestring):
return self.escape_string(item)
elif isinstance(item, collections.Iterable):
elif isinstance(item, Iterable):
return self.escape_sequence(item)
elif isinstance(item, datetime.datetime):
return self.escape_datetime(item, self._DATETIME_FORMAT)
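
A minimal sketch of the import fallback from this hunk, runnable outside PyHive. The `collections.Iterable` alias was removed in Python 3.10, so the ABC has to come from `collections.abc` there. The `is_sequence_like` helper is hypothetical: it only mirrors the `isinstance(item, Iterable)` branch and excludes strings explicitly, since `escape_item` handles strings in an earlier branch:

```python
try:
    # Python 3.3+; the only location that works on Python 3.10 and later.
    from collections.abc import Iterable
except ImportError:
    # Python 2 fallback, where collections.abc does not exist.
    from collections import Iterable


def is_sequence_like(item):
    """Return True for iterables that are not text or bytes (hypothetical helper)."""
    return isinstance(item, Iterable) and not isinstance(item, (str, bytes))


print(is_sequence_like([1, 2, 3]), is_sequence_like("abc"), is_sequence_like(5))
```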