This page is for anybody who wishes to contribute code to Data Prepper. Welcome!
First, please read our contribution guide for more information on how to contribute to Data Prepper.
Building Data Prepper requires JDK 11 or 17. The Data Prepper Gradle build runs in a Java 11 or 17 JVM, but uses Gradle toolchains to compile the Java code using Java 11. If you have a JDK 11 installed locally, Gradle will use your installed JDK 11. If you do not, Gradle will install JDK 11.
All main source code builds on JDK 11, so it must be compatible with Java 11. The test code (unit and integration tests) runs using JDK 11.
The assemble task will build the Jar files and create a runnable distribution without running the integration tests. If you are just looking to build Data Prepper from source, this build is faster than running the integration test suite.
To build the project from source, run the following command from the project root:
./gradlew assemble
Running the build command will assemble the Jar files needed for running Data Prepper. It will also run the integration test suite.
To build, run the following command from the project root:
./gradlew build
Before running Data Prepper, check that the configuration files (see the configuration docs for more information) have been put in the respective folders under the Data Prepper home directory. When building from source, the Data Prepper home directory is at release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64 ($VERSION is the current version as defined in gradle.properties). The configuration files should be put in the following folders:
- data-prepper-config.yaml in the config/ folder
- pipelines.yaml in the pipelines/ folder
Go to the Data Prepper home directory:
cd release/archives/linux/build/install/opensearch-data-prepper-$VERSION-linux-x64
Data Prepper can then be run with the following command:
bin/data-prepper
You can also supply your own pipeline configuration file path followed by the server configuration file path, but the support for this method will be dropped in a future release. Example:
bin/data-prepper pipelines.yaml data-prepper-config.yaml
Additionally, the Log4j 2 configuration file is read from config/log4j2.properties in the application's home directory.
In some cases, you may wish to build a local Docker image and run it. This is useful if you are making a change to the Docker image, want to run a bleeding-edge Docker image, or need a custom-built Docker image of Data Prepper.
To build the Docker image, run:
./gradlew clean :release:docker:docker
If successful, the Docker image will be available locally.
The repository is opensearch-data-prepper and the tag is the current version as defined in gradle.properties.
You can run the following command in Linux environments to see your Data Prepper Docker images:
docker images | grep opensearch-data-prepper
The results will look somewhat like the following:
opensearch-data-prepper 2.0.0-SNAPSHOT 3e81ef26250c 23 hours ago 566MB
If you build a local Docker image, you can run it using a variation on the following command. You may wish to change the ports you map depending on your specific pipeline configuration.
docker run \
-p 21890:21890 \
-v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
-v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml \
opensearch-data-prepper:2.0.0-SNAPSHOT
When you are ready to contribute a change to Data Prepper, please create a GitHub Pull Request (PR). Your PR should target main.
The Data Prepper maintainers will review your PR and merge it once it is approved.
Some changes containing bug fixes or security fixes may be eligible for a patch release. If you believe your change should be a patch release, please see Backporting.
The Data Prepper maintainers use the main branch for the next upcoming release (major or minor). Near the time of the next release, we create a release branch for that upcoming release (e.g. 1.2). We perform our release builds from this branch. Any patch releases also build from that release branch.
When you create a PR which targets main and need this change as a patch to a previous version of Data Prepper, use the auto backport GitHub Action. All you need to do is add the label backport <version> to your PR which is targeting main. After the PR is merged, the GitHub Action will create a new PR to cherry-pick those changes into the <version> branch.
A Data Prepper maintainer will need to approve and merge the backported code into the target branch.
The auto-generated PR will be on a branch named backport/backport-<original PR number>-to-<version>.
Data Prepper supports patch releases only on the latest version (e.g. 2.1) and on the last version for the previous major release (e.g. 1.4 after 2.0 has been released). These releases are only for bug fixes or security fixes. Please use backports only for bug and security fixes and only targeting candidate releases. You can ask about backporting in your PR or by creating a GitHub issue to request that a previous change be backported.
Documentation is very important for users of Data Prepper and contributors. We are using the following conventions for documentation.
- Document features in markdown. Plugins should have detailed documentation in a README.md file in the plugin project directory. Documentation for all of Data Prepper should be in the docs directory.
- Provide Javadocs for all public classes, methods, and fields. Plugins need not follow this guidance since their classes are generally not exposed.
- Avoid commenting within code, unless it is required to understand that code.
For the most part, we use common Java conventions. Here are a few things to keep in mind.
- Use descriptive names for classes, methods, fields, and variables.
- Avoid abbreviations unless they are widely accepted
- Use final on all variables which are not reassigned
- Wildcard imports are not allowed.
- Static imports are preferred over qualified imports when using static methods
- Prefer creating non-static methods whenever possible. Static methods should generally be avoided, as they are often used as a shortcut. Sometimes static methods are the best solution, such as when using a builder.
- Public utility or “common” classes are not permitted.
  - They are fine in test code
  - They are fine if package protected
- Use Optional for return values if the value may not be present. This should be preferred to returning null.
- Do not create checked exceptions, and do not throw checked exceptions from public methods whenever possible. In general, if you call a method with a checked exception, you should wrap that exception into an unchecked exception.
  - Throwing checked exceptions from private methods is acceptable.
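Several of the conventions above can be illustrated together. The following is a minimal sketch, not code from Data Prepper: the class names FirstLineReader and CodeConventionsExample are hypothetical and exist only for this example. It shows final on variables that are not reassigned, an Optional return value instead of null, and a checked IOException thrown from a private method and wrapped into an unchecked exception before it leaves the public method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;

// Hypothetical demo entry point; none of these classes exist in Data Prepper.
class CodeConventionsExample {
    public static void main(final String[] args) {
        final FirstLineReader reader = new FirstLineReader(new StringReader("\n  entry-pipeline  \n"));
        // Prints "entry-pipeline": the blank first line is skipped and the value is trimmed.
        System.out.println(reader.readFirstNonBlankLine().orElse("<none>"));
    }
}

// Hypothetical class used only to illustrate the conventions.
class FirstLineReader {
    private final Reader source;

    FirstLineReader(final Reader source) {
        this.source = source;
    }

    // Public method: returns Optional rather than null and throws no checked exception.
    public Optional<String> readFirstNonBlankLine() {
        try {
            return findFirstNonBlankLine();
        } catch (final IOException e) {
            // Wrap the checked exception into an unchecked one at the public boundary.
            throw new UncheckedIOException("Unable to read from source", e);
        }
    }

    // Private method: throwing a checked exception here is acceptable.
    private Optional<String> findFirstNonBlankLine() throws IOException {
        final BufferedReader bufferedReader = new BufferedReader(source);
        String line; // reassigned in the loop, so it cannot be final
        while ((line = bufferedReader.readLine()) != null) {
            final String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                return Optional.of(trimmed);
            }
        }
        return Optional.empty();
    }
}
```

Callers then handle the missing case explicitly, for example with orElse or ifPresent, rather than checking for null.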
Please use the following formatting guidelines:
- Java indent is 4 spaces. No tabs.
- Maximum line width is 140 characters
- We place opening braces at the end of the line, rather than on their own line
The official formatting rules for this project are committed as a Checkstyle configuration in config/checkstyle/checkstyle.xml.
If you are using IntelliJ, you can use the unofficial Checkstyle IDEA plugin. These instructions may be useful for configuring the rules.
- You should first raise an issue in the Data Prepper project if you are interested in adding a new dependency to the core projects.
- Avoid using dependencies which provide similar functionality to existing dependencies.
  - For example, this project uses Jackson, so do not add Gson
- If core Java has the function or feature, prefer it over an external library. For example, do not use Guava’s hash code and equals methods when Java’s Objects class already provides them.
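As an illustration of preferring core Java over an external library, the following sketch implements equals and hashCode using only java.util.Objects. The EndpointAddress and ObjectsExample classes are hypothetical and are not part of Data Prepper; they are here only to show the pattern.

```java
import java.util.Objects;

// Hypothetical demo entry point; none of these classes exist in Data Prepper.
class ObjectsExample {
    public static void main(final String[] args) {
        final EndpointAddress first = new EndpointAddress("localhost", 21890);
        final EndpointAddress second = new EndpointAddress("localhost", 21890);
        // Prints "true": both fields match, so the two instances are equal.
        System.out.println(first.equals(second));
    }
}

// Hypothetical value class used only to illustrate the guideline.
class EndpointAddress {
    private final String host;
    private final int port;

    EndpointAddress(final String host, final int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(final Object other) {
        if (this == other) {
            return true;
        }
        if (other == null || getClass() != other.getClass()) {
            return false;
        }
        final EndpointAddress that = (EndpointAddress) other;
        // Objects.equals is null-safe, so no external helper is needed.
        return port == that.port && Objects.equals(host, that.host);
    }

    @Override
    public int hashCode() {
        // Objects.hash combines the same fields that equals compares.
        return Objects.hash(host, port);
    }
}
```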
We have the following categories for tests:
- Unit tests - Test a single class in isolation.
- Integration tests - Test a large component or set of classes in isolation.
- End-to-end tests - Tests which run an actual Data Prepper. They should generally be in the e2e-test project.
Testing Guidelines:
- Use JUnit 5 for all new test suites
  - You are encouraged to update existing JUnit 4 tests to JUnit 5, but this is not necessary.
- Use Hamcrest for assertions
- Use Mockito for mocking
- Each class should have a unit test.
  - Unit test class names should end with Test.
- Each large component should have an integration test.
  - A good example is a plugin. Plugins should have their own integration tests which integrate all of the plugin’s classes. However, these tests do not run a full Data Prepper.
  - Integration test class names should end with IT.
- Test names should indicate what is being tested. If a test fails, we should be able to look at the test name and have a good idea of what failed, with minimal context about the code being written.
  - Two good approaches may be used, depending on what you are testing:
    - methodUnderTest_condition_result
    - test_when_something_condition_then_something_else
  - Please avoid generic test names like “testSuccess”
- Our Gradle builds use Groovy, so follow our normal Java styles in the build files. For example, use camel case rather than snake case.
- Use plain Groovy strings (single quotes) unless you need string interpolation. If you need string interpolation, use a GString (double quotes).
Before merging in your PR, the Data Prepper continuous integration (CI) builds must pass. These builds run as GitHub Actions.
If an Action is failing, please view the log and determine what is causing your commit to fail. If a test fails, please check the Summary section of that Action. There may be artifacts for the test results. You can download these and view the result information. Additionally, many builds have a Unit Test Results job which includes a summary of the results.
We have the following pages for specific development guidance on the topics: