
Contributing to dbt-spark

  1. About this document
  2. Getting the code
  3. Running dbt-spark in development
  4. Testing
  5. Updating Docs
  6. Adding CHANGELOG Entry
  7. Submitting a Pull Request

About this document

This document is a guide intended for folks interested in contributing to dbt-spark. Below, we document the process by which members of the community should create issues and submit pull requests (PRs) in this repository. It is not intended as a guide for using dbt-spark, and it assumes a certain level of familiarity with Python concepts such as virtualenvs, pip, Python modules, and so on. This guide assumes you are using macOS or Linux and are comfortable with the command line.

For those wishing to contribute, we highly suggest reading dbt-core's contribution guide if you haven't already. Almost all of the information there applies to contributing here, too!

Signing the CLA

Please note that all contributors to dbt-spark must sign the Contributor License Agreement (CLA) to have their pull request merged into the dbt-spark codebase. If you are unable to sign the CLA, the dbt-spark maintainers will unfortunately be unable to merge your pull request. You are, however, welcome to open issues and comment on existing ones.

Getting the code

You will need git in order to download and modify the dbt-spark source code. You can find directions on how to install git here.

External contributors

If you are not a member of the dbt-labs GitHub organization, you can contribute to dbt-spark by forking the dbt-spark repository. For a detailed overview on forking, check out the GitHub docs on forking. In short, you will need to:

  1. fork the dbt-spark repository
  2. clone your fork locally
  3. check out a new branch for your proposed changes
  4. push changes to your fork
  5. open a pull request against dbt-labs/dbt-spark from your forked repository
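
Concretely, that workflow might look like the following sketch (replace <your-username> with your GitHub username; the branch name is arbitrary):

# clone your fork and create a branch for your changes
git clone https://github.com/<your-username>/dbt-spark.git
cd dbt-spark
git checkout -b fix/my-change

# after committing your changes, push the branch to your fork
git push origin fix/my-change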

dbt Labs contributors

If you are a member of the dbt Labs GitHub organization, you will have push access to the dbt-spark repo. Rather than forking dbt-spark to make your changes, just clone the repository, check out a new branch, and push directly to that branch.

Running dbt-spark in development

Installation

First, make sure that you set up your virtualenv as described in Setting up an environment. Ensure you have the latest version of pip installed with pip install --upgrade pip. Next, install dbt-spark and its development dependencies:

pip install -e . -r dev-requirements.txt

When dbt-spark is installed this way, any changes you make to the dbt-spark source code will be reflected immediately in your next dbt-spark run.

To confirm you have the correct version of dbt-core installed, run dbt --version and which dbt.
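
End to end, the setup might look like this (a minimal sketch; the env directory name is just a convention):

# create and activate a virtualenv
python3 -m venv env
source env/bin/activate
pip install --upgrade pip

# install dbt-spark in editable mode along with dev dependencies
pip install -e . -r dev-requirements.txt

# confirm the installation
which dbt
dbt --version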

Testing

Initial Setup

dbt-spark uses test credentials specified in a test.env file in the root of the repository. This test.env file is git-ignored, but please be extra careful to never check in credentials or other sensitive information when developing. To create your test.env file, copy the provided example file, then supply your relevant credentials.

cp test.env.example test.env
$EDITOR test.env
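
The variables you need depend on which profile you plan to test against. The names below are purely illustrative — copy the actual variable names from test.env.example:

# hypothetical placeholders — see test.env.example for the real variable names
DBT_DATABRICKS_HOST_NAME=<your-workspace>.cloud.databricks.com
DBT_DATABRICKS_TOKEN=<your-access-token>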

Test commands

There are a few methods for running tests locally.

dagger

To run functional tests we rely on dagger, which launches one or more containers to test against.

pip install -r dagger/requirements.txt
python dagger/run_dbt_spark_tests.py --profile databricks_sql_endpoint --test-path tests/functional/adapter/test_basic.py::TestSimpleMaterializationsSpark::test_base

--profile: required, this is the kind of Spark connection to test against

options:

  • "apache_spark"
  • "spark_session"
  • "spark_http_odbc"
  • "databricks_sql_endpoint"
  • "databricks_cluster"
  • "databricks_http_cluster"

--test-path: optional, this is the path to the test file you want to run. If not specified, all tests will be run.
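
For example, omitting --test-path runs the full functional suite; here against a local Apache Spark container:

# run every functional test against the apache_spark profile
python dagger/run_dbt_spark_tests.py --profile apache_spark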

pytest

Finally, you can also run a specific test or group of tests using pytest directly (if you have all the dependencies set up on your machine). With a Python virtualenv active and dev dependencies installed you can do things like:

# run all functional tests
python -m pytest --profile databricks_sql_endpoint tests/functional/
# run specific functional tests
python -m pytest --profile databricks_sql_endpoint tests/functional/adapter/test_basic.py
# run all unit tests in a file
python -m pytest tests/unit/test_adapter.py
# run a specific unit test
python -m pytest tests/unit/test_adapter.py::TestSparkAdapter::test_profile_with_database
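
pytest's standard selection flags work here as well; for example, -k filters tests by name (a generic pytest feature, nothing dbt-spark-specific):

# run only the unit tests whose names match a keyword expression
python -m pytest -k "profile" tests/unit/test_adapter.py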

Updating Docs

Many changes will require an update to the dbt-spark docs. Here are some useful resources:

  • Docs are here.
  • The docs repo for making changes is located here.
  • The changes made are likely to impact one or both of Spark Profile or Spark Configs.
  • We ask every community member who makes a user-facing change to open an issue or PR regarding doc changes.

Adding CHANGELOG Entry

We use changie to generate CHANGELOG entries. Note: Do not edit the CHANGELOG.md directly. Your modifications will be lost.

Follow the steps to install changie for your system.

Once changie is installed and your PR is created, simply run changie new and changie will walk you through the process of creating a changelog entry. Commit the file that's created and your changelog entry is complete!
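
For example (a sketch that assumes entries land in .changes/unreleased/, as in other dbt adapter repos — check where changie actually writes the file):

# answer the interactive prompts, then commit the generated entry
changie new
git add .changes/unreleased/
git commit -m "Add changelog entry"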

You don't need to worry about which dbt-spark version your change will go into. Just create the changelog entry with changie, and open your PR against the main branch. All merged changes will be included in the next minor version of dbt-spark. The Core maintainers may choose to "backport" specific changes in order to patch older minor versions. In that case, a maintainer will take care of that backport after merging your PR, before releasing the new version of dbt-spark.

Submitting a Pull Request

dbt Labs provides a CI environment to test changes to the dbt-spark adapter, and periodic checks against the development version of dbt-core, through GitHub Actions.

A dbt-spark maintainer will review your PR. They may suggest code revisions for style or clarity, or request that you add unit or functional tests. These are good things! We believe that, with a little bit of help, anyone can contribute high-quality code.

Once all requests have been addressed and questions answered, the dbt-spark maintainer can trigger CI testing.

Once all tests are passing and your PR has been approved, a dbt-spark maintainer will merge your changes into the active development branch. And that's it! Happy developing 🎉