Welcome! There are many ways to contribute, including submitting bug reports, improving documentation, submitting feature requests, reviewing new submissions, or contributing code that can be incorporated into the project.
Please keep the following principles in mind during development:
- Some companies still use old Spark versions, like 2.3.1, so compatibility with them should be kept where possible, e.g. by adding branches for different Spark versions.
- Different users use onETL in different ways: some use only DB connectors, others only file connectors. Connector-specific dependencies should be optional.
- Instead of creating classes with a lot of different options, prefer splitting them into smaller classes (e.g. an options class, a context manager, etc.) and combining them using composition.
Install Git first; please follow the GitHub instructions for setting it up.
If you are not a member of a development team building onETL, you should create a fork before making any changes.
To create a fork, please follow the GitHub instructions.
Open a terminal and run these commands:
git clone git@github.com:myuser/onetl.git -b develop
cd onetl
Create a virtualenv and install dependencies:
python -m venv venv
source venv/bin/activate
pip install -U wheel
pip install -U pip setuptools
pip install -U \
-r requirements/core.txt \
-r requirements/ftp.txt \
-r requirements/hdfs.txt \
-r requirements/kerberos.txt \
-r requirements/s3.txt \
-r requirements/sftp.txt \
-r requirements/webdav.txt \
-r requirements/dev.txt \
-r requirements/docs.txt \
-r requirements/tests/base.txt \
-r requirements/tests/clickhouse.txt \
-r requirements/tests/kafka.txt \
-r requirements/tests/mongodb.txt \
-r requirements/tests/mssql.txt \
-r requirements/tests/mysql.txt \
-r requirements/tests/postgres.txt \
-r requirements/tests/oracle.txt \
-r requirements/tests/pydantic-2.txt \
-r requirements/tests/spark-3.5.4.txt
# TODO: remove after https://github.com/zqmillet/sphinx-plantuml/pull/4
pip install sphinx-plantuml --no-deps
Install pre-commit hooks:
pre-commit install --install-hooks
Check that the pre-commit hooks run:
pre-commit run
Build the image for running tests:
docker-compose build
Start all containers with dependencies:
docker-compose up -d
You can also start only a limited set of dependencies:
docker-compose up -d mongodb
Run tests:
docker-compose run --rm onetl ./run_tests.sh
You can pass additional arguments; they will be passed to pytest:
docker-compose run --rm onetl ./run_tests.sh -m mongodb -lsx -vvvv --log-cli-level=INFO
You can also start an interactive bash session and run tests from it:
docker-compose run --rm onetl bash
./run_tests.sh -m mongodb -lsx -vvvv --log-cli-level=INFO
See the logs of the test container:
docker-compose logs -f onetl
Stop all containers and remove created volumes:
docker-compose down -v
Warning
To run HDFS tests locally, you should add the following line to your /etc/hosts (the file path depends on the OS):
# HDFS server returns container hostname as connection address, causing error in DNS resolution
127.0.0.1 hdfs
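If you recreate test environments often, a small helper can add this entry only when it is missing. This is a minimal sketch: the function name `ensure_hdfs_host` is hypothetical, and the hosts file path is an argument because its location depends on the OS.

```shell
# Hypothetical helper: append "127.0.0.1 hdfs" to a hosts file unless a line
# mapping 127.0.0.1 to "hdfs" (as the first hostname) is already present.
# Writing to /etc/hosts usually requires root privileges.
ensure_hdfs_host() {
    local hosts_file="${1:-/etc/hosts}"
    # -s: stay quiet if the file does not exist yet
    if grep -qsE '^127\.0\.0\.1[[:space:]]+hdfs([[:space:]]|$)' "$hosts_file"; then
        return 0
    fi
    # Add a newline first if the file is non-empty and does not end with one
    if [ -s "$hosts_file" ] && [ -n "$(tail -c 1 "$hosts_file")" ]; then
        printf '\n' >> "$hosts_file"
    fi
    printf '127.0.0.1 hdfs\n' >> "$hosts_file"
}
```

Since `sudo` cannot call a shell function directly, one option is `sudo bash -c "$(declare -f ensure_hdfs_host); ensure_hdfs_host /etc/hosts"`.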
Note
To run Oracle tests, you need to install Oracle instantclient and pass its path to the ONETL_ORA_CLIENT_PATH and LD_LIBRARY_PATH environment variables, e.g. ONETL_ORA_CLIENT_PATH=/path/to/client64/lib.
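A minimal sketch of setting both variables in the current shell session, assuming the instantclient libraries live in `/path/to/client64/lib` (a placeholder, as in the example above). Prepending to `LD_LIBRARY_PATH` keeps any existing entries and avoids a trailing colon when the variable was empty.

```shell
# Placeholder path: replace it with the actual instantclient location
export ONETL_ORA_CLIENT_PATH=/path/to/client64/lib
# Prepend the client path; ${VAR:+...} adds ":<old value>" only if LD_LIBRARY_PATH is set and non-empty
export LD_LIBRARY_PATH="${ONETL_ORA_CLIENT_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```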
Note
To run Greenplum tests, you should:
- Download VMware Greenplum connector for Spark
- Either move it to ~/.ivy2/jars/, or pass its file path to CLASSPATH
- Set environment variable ONETL_GP_PACKAGE_VERSION=local.
On Linux, you may have to set environment variable SPARK_EXTERNAL_IP to the IP of the onetl_onetl network gateway:
export SPARK_EXTERNAL_IP=$(docker network inspect onetl_onetl --format '{{ (index .IPAM.Config 0).Gateway }}')
This is because in some cases Spark does not properly detect the host machine IP address, so Greenplum segments cannot connect to Spark executors.
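The CLASSPATH variant of the Greenplum setup can be wrapped in a function. This is a sketch only: the function name is hypothetical, and the jar path you pass should be the file you actually downloaded (its name is not specified here).

```shell
# Hypothetical helper: expose a locally downloaded Greenplum connector jar
# through CLASSPATH and tell the tests to use the local package.
use_local_greenplum_connector() {
    local jar_path="$1"
    if [ ! -f "$jar_path" ]; then
        echo "Greenplum connector jar not found: $jar_path" >&2
        return 1
    fi
    # Prepend the jar, keeping any existing CLASSPATH entries
    export CLASSPATH="${jar_path}${CLASSPATH:+:${CLASSPATH}}"
    export ONETL_GP_PACKAGE_VERSION=local
}
```

For example, `use_local_greenplum_connector ~/Downloads/greenplum-connector.jar`, where the path is hypothetical. Copying the jar to `~/.ivy2/jars/` instead makes the CLASSPATH step unnecessary.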
Start all containers with dependencies:
docker-compose up -d
You can also start only a limited set of dependencies:
docker-compose up -d mongodb
Load environment variables with connection properties:
source .env.local
Run tests:
./run_tests.sh
You can pass additional arguments; they will be passed to pytest:
./run_tests.sh -m mongodb -lsx -vvvv --log-cli-level=INFO
Stop all containers and remove created volumes:
docker-compose down -v
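A note on the `source .env.local` step above: plain `source` makes variables visible to child processes (such as pytest started by `./run_tests.sh`) only if the file uses `export` or the variables were already exported. If `.env.local` contains bare `KEY=value` lines, wrapping the sourcing in `set -a` / `set +a` exports them automatically. A minimal sketch; the function name is hypothetical.

```shell
# Hypothetical helper: source a dotenv-style file and export every variable it assigns
load_env_file() {
    local env_file="${1:-.env.local}" status=0
    set -a                          # auto-export all variables assigned from now on
    . "$env_file" || status=$?      # keep the exit code of sourcing
    set +a                          # restore the default behaviour
    return "$status"
}
```

Usage: `load_env_file .env.local && ./run_tests.sh`.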
Build documentation using Sphinx:
cd docs
make html
Then open docs/_build/index.html in a browser.
Please create a new GitHub issue for any significant changes and enhancements that you wish to make. Provide the feature you would like to see, why you need it, and how it will work. Discuss your ideas transparently and get community feedback before proceeding.
Significant changes that you wish to contribute to the project should be discussed first in a GitHub issue that clearly outlines the changes and benefits of the feature.
Small changes can be submitted directly to the GitHub repository as a pull request.
Commit your changes:
git commit -m "Commit message"
git push
Then open the GitHub interface and create a pull request. Please follow the guide in the PR body template.
After the pull request is created, it gets a corresponding number, e.g. 123 (pr_number).
onETL uses towncrier for changelog management.
To submit a change note about your PR, add a text file into the docs/changelog/next_release folder. It should contain an explanation of what applying this PR will change in the way end-users interact with the project. One sentence is usually enough but feel free to add as many details as you feel necessary for the users to understand what it means.
Use the past tense for the text in your fragment because, combined with others, it will be a part of the "news digest" telling the readers what changed in a specific version of the library since the previous version.
You should also use reStructuredText syntax for highlighting code (inline or block) and for linking parts of the docs or external sites.
If you wish to sign your change, feel free to add -- by :github:user:`github-username` at the end (replace github-username with your own!).
Finally, name your file following the convention that Towncrier understands: it should start with the number of an issue or a PR followed by a dot, then a patch type, like feature, doc, misc, etc., and end with the .rst suffix. If you need to add more than one fragment, you may add an optional sequence number (delimited with another period) between the type and the suffix.
In general, the name will follow the <pr_number>.<category>.rst pattern, where the categories are:
- feature: Any new feature
- bugfix: A bug fix
- improvement: An improvement
- doc: A change to the documentation
- dependency: Dependency-related changes
- misc: Changes internal to the repo like CI, test and build changes
A pull request may have more than one of these components, for example a code change may introduce a new feature that deprecates an old feature, in which case two fragments should be added. It is not necessary to make a separate documentation fragment for documentation changes accompanying the relevant code changes.
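A quick local check of a fragment file name against the `<pr_number>.<category>[.<sequence>].rst` pattern. This sketch hardcodes the categories listed above; the authoritative list lives in pyproject.toml, so the function (hypothetical name) may need updating if that list changes.

```shell
# Hypothetical helper: succeed if a file name follows
# <pr_number>.<category>[.<sequence>].rst with a known category (bash-only: uses [[ =~ ]])
is_valid_fragment_name() {
    local name pattern
    name=$(basename -- "$1")
    pattern='^[0-9]+\.(feature|bugfix|improvement|doc|dependency|misc)(\.[0-9]+)?\.rst$'
    [[ "$name" =~ $pattern ]]
}
```

For example, `for f in docs/changelog/next_release/*.rst; do is_valid_fragment_name "$f" || echo "Unexpected name: $f"; done`. The folder may also contain files that are not fragments, so treat the output as a hint.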
Examples of changelog fragment contents:
Added a ``:github:user:`` role to Sphinx config -- by :github:user:`someuser`
Fixed behavior of ``WebDAV`` connector -- by :github:user:`someuser`
Added support of ``timeout`` in ``S3`` connector
-- by :github:user:`someuser`, :github:user:`anotheruser` and :github:user:`otheruser`
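A fragment is just a text file, so it can be created directly from the repository root. A sketch assuming PR number 123 (the example number used above) and the bugfix category; the function name is hypothetical and the message is a placeholder.

```shell
# Hypothetical helper: write a one-line change note into docs/changelog/next_release
# Usage: new_fragment <pr_number> <category> <message>
new_fragment() {
    local pr_number="$1" category="$2" message="$3"
    local dir="docs/changelog/next_release"
    local file="${dir}/${pr_number}.${category}.rst"
    # Refuse to overwrite an existing fragment (use a sequence number instead)
    if [ -e "$file" ]; then
        echo "Fragment already exists: $file" >&2
        return 1
    fi
    mkdir -p "$dir"
    printf '%s\n' "$message" > "$file"
    echo "$file"
}
```

Single quotes keep the reStructuredText backticks intact: `new_fragment 123 bugfix 'Fixed behavior of ``WebDAV`` connector -- by :github:user:`someuser`'`.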
Tip
See pyproject.toml for all available categories (tool.towncrier.type).
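To print the configured categories without opening the file, here is a rough sketch. It assumes each category is declared as a separate `[[tool.towncrier.type]]` table with a `directory = "..."` key (towncrier's usual layout). It is a line-based approximation, not a TOML parser, and the function name is hypothetical.

```shell
# Hypothetical helper: print the "directory" value of each [[tool.towncrier.type]] table.
list_changelog_categories() {
    local pyproject="${1:-pyproject.toml}"
    awk '
        # Any table header switches state; only towncrier type tables are of interest
        /^[ \t]*\[/ { in_type = ($0 ~ /^[ \t]*\[\[tool\.towncrier\.type\]\]/) }
        in_type && /^[ \t]*directory[ \t]*=/ {
            # Print the text between the first pair of double quotes
            if (match($0, /"[^"]*"/)) print substr($0, RSTART + 1, RLENGTH - 2)
        }
    ' "$pyproject"
}
```

The top-level `directory` key of `[tool.towncrier]` (the fragments folder) is skipped because only lines inside type tables are considered.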
To skip the change notes check, just add the ci:skip-changelog label to the pull request.
Before making a release from the develop branch, follow these steps:
- Check out the develop branch and update it to the latest state
git checkout develop
git pull -p
- Back up NEXT_RELEASE.rst
cp "docs/changelog/NEXT_RELEASE.rst" "docs/changelog/temp_NEXT_RELEASE.rst"
- Build the Release notes with Towncrier
VERSION=$(cat onetl/VERSION)
towncrier build "--version=${VERSION}" --yes
- Rename the changelog file after the release version number
mv docs/changelog/NEXT_RELEASE.rst "docs/changelog/${VERSION}.rst"
- Remove the towncrier marker line above the version number heading in the ${VERSION}.rst file
awk '!/^.*towncrier release notes start/' "docs/changelog/${VERSION}.rst" > temp && mv temp "docs/changelog/${VERSION}.rst"
- Update Changelog Index
awk -v version=${VERSION} '/DRAFT/{print;print " " version;next}1' docs/changelog/index.rst > temp && mv temp docs/changelog/index.rst
- Restore the NEXT_RELEASE.rst file from backup
mv "docs/changelog/temp_NEXT_RELEASE.rst" "docs/changelog/NEXT_RELEASE.rst"
- Commit and push changes to the develop branch
git add .
git commit -m "Prepare for release ${VERSION}"
git push
- Merge the develop branch into master, WITHOUT squashing
git checkout master
git pull
git merge develop
git push
- Add a git tag to the latest commit in the master branch
git tag "$VERSION"
git push origin "$VERSION"
- Update the version in the develop branch after the release:
git checkout develop
NEXT_VERSION=$(echo "$VERSION" | awk -F. '/[0-9]+\./{$NF++;print}' OFS=.)
echo "$NEXT_VERSION" > onetl/VERSION
git add .
git commit -m "Bump version"
git push