
AWS Glue Libraries are additions and enhancements to Spark for ETL operations.


rbarham/aws-glue-libs

 
 


aws-glue-libs

This repository contains libraries used in the AWS Glue service. These libraries extend Apache Spark with additional data types and operations for ETL workflows. They are used in code generated by the AWS Glue service and can be used in scripts submitted with Glue jobs.

Content

  • awsglue -- This Python package includes the Python interfaces to the AWS Glue ETL library.

Running gluepyspark shell, gluesparksubmit and pytest locally

The Glue ETL jars are now available through Maven from an S3-backed Maven repository. We use the copy-dependencies goal in Maven to download all the dependencies that Glue needs locally.

Install Apache Maven from the following location: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz

Install the Spark distribution for your Glue version from the matching location:

  • Glue version 0.9: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-0.9/spark-2.2.1-bin-hadoop2.7.tgz
  • Glue version 1.0: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz

Set the SPARK_HOME environment variable to the location where you extracted the Spark archive:

  • Glue version 0.9: export SPARK_HOME=/home/$USER/spark-2.2.1-bin-hadoop2.7
  • Glue version 1.0: export SPARK_HOME=/home/$USER/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8
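If you switch between Glue versions, the pairing of archive and SPARK_HOME directory can be kept in one place. The following is a minimal sketch, not part of this repository: the `SPARK_BUILDS` table and the `spark_home_for` helper are hypothetical names, and the URLs and directory names are copied from the instructions above.

```python
import os

# Glue version -> (Spark archive URL, directory name used for SPARK_HOME).
# Both values are taken from the setup instructions; nothing here is
# discovered automatically.
SPARK_BUILDS = {
    "0.9": (
        "https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-0.9/spark-2.2.1-bin-hadoop2.7.tgz",
        "spark-2.2.1-bin-hadoop2.7",
    ),
    "1.0": (
        "https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz",
        "spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8",
    ),
}


def spark_home_for(glue_version, base_dir):
    """Return the SPARK_HOME path for a Glue version (hypothetical helper)."""
    try:
        _url, dirname = SPARK_BUILDS[glue_version]
    except KeyError:
        raise ValueError("unsupported Glue version: %r" % (glue_version,)) from None
    return os.path.join(base_dir, dirname)


if __name__ == "__main__":
    # Print an export line that can be pasted into a shell.
    home = os.path.expanduser("~")
    print("export SPARK_HOME=" + spark_home_for("1.0", home))
```

The helper only builds the path; downloading and extracting the archive is still done by hand as described above.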

The gluepytest script assumes that the pytest module is installed and available on the PATH.
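Before running the scripts, it may help to confirm that the two prerequisites named so far (SPARK_HOME and pytest) are in place. The check below is an illustrative sketch using only the Python standard library; `preflight` is a hypothetical helper and is not shipped with this repository.

```python
import os
import shutil


def preflight(environ=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    `environ` and `which` are injectable so the check can be exercised
    without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    problems = []

    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is not a directory: " + spark_home)

    # gluepytest expects a `pytest` executable to be resolvable on the PATH.
    if which("pytest") is None:
        problems.append("pytest was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```

If pytest turns out to be missing, one common way to install it is `pip install pytest`.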

  • Glue shell: ./bin/gluepyspark
  • Glue submit: ./bin/gluesparksubmit
  • pytest: ./bin/gluepytest
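As an illustration of the kind of script these commands are meant for, here is a small job that builds a DynamicFrame from in-memory rows, so it needs no S3 access or Data Catalog. It is a sketch under assumptions rather than an official sample: the file name `sample_job.py`, the `sample_rows` helper, and the data are made up, and it assumes the local setup above has put pyspark and awsglue on the Python path.

```python
def sample_rows():
    """Tiny in-memory dataset (made-up values) so no external storage is needed."""
    return [("alice", 34), ("bob", 45)]


def main():
    # Imported inside main() so the file can be loaded without Spark/awsglue.
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame

    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    # Build a Spark DataFrame, then wrap it as a Glue DynamicFrame.
    df = spark.createDataFrame(sample_rows(), ["name", "age"])
    dyf = DynamicFrame.fromDF(df, glue_context, "people")

    dyf.printSchema()
    print("row count:", dyf.count())


if __name__ == "__main__":
    try:
        main()
    except ImportError as exc:
        # Outside the Glue wrappers the libraries are usually not importable.
        print("Glue libraries not available (%s); run via ./bin/gluesparksubmit" % exc)
```

Assuming gluesparksubmit forwards its arguments to spark-submit, the script would be run as `./bin/gluesparksubmit sample_job.py`.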

Licensing

The libraries in this repository are licensed under the Amazon Software License (the "License"). They may not be used except in compliance with the License, a copy of which is included here in the LICENSE file.
