[FEATURE] Add PPL Sanity Tests Job #718
Labels: enhancement (New feature or request), Lang:PPL (Pipe Processing Language support), testing (test related feature)
Is your feature request related to a problem?
We need a comprehensive testing framework to validate **PPL** commands in the Spark environment, ensuring that each new PPL (Spark) release meets critical requirements.
This testing job should be deployable on any Spark-PPL-compatible setup and should automate the dataset setup, reducing friction for developers and testers.
Additionally, this can evolve into a multi-step project, eventually introducing TPC-H-based performance benchmarking as well as extended validation scenarios.
What solution would you like?
We propose creating a Spark PPL Sanity Job that includes the following components:
Dataset Generator:
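A minimal sketch of what the dataset-generator step could look like. The function names, fields, and CSV shape are illustrative assumptions, not the project's actual API; the key property shown is determinism, so PPL results stay reproducible across runs.

```python
# Hypothetical dataset-generator sketch: names and schema are assumptions.
import csv
import io
import random

def generate_sanity_dataset(rows=100, seed=42):
    """Produce a small, deterministic dataset for repeatable PPL sanity runs."""
    rng = random.Random(seed)  # fixed seed -> identical data on every run
    return [
        {"id": i, "age": rng.randint(18, 90), "country": rng.choice(["US", "DE", "IL"])}
        for i in range(rows)
    ]

def to_csv(records):
    """Serialize the records so a Spark job can load them as a test table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "age", "country"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Because the generator is seeded, the same expected results can be asserted on every release without regenerating fixtures.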
Endpoint API:
* Input Parameters: Schema & Catalog names for Spark and OpenSearch integration.
* Test Scopes: Define the scope for tests: sanity checks, specific PPL commands, or performance-focused tests.
* Reporting: Define the format of the test results: detailed reports, performance summaries, or pass/fail results for sanity tests.
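The input contract above can be sketched as a small parameter object. The field names and the exact scope/format values are assumptions for illustration; the point is validating parameters up front so a misconfigured job fails fast rather than mid-run.

```python
# Hedged sketch of the job's input parameters; names/values are assumptions.
from dataclasses import dataclass

TEST_SCOPES = {"sanity", "command", "performance"}
REPORT_FORMATS = {"detailed", "performance-summary", "pass-fail"}

@dataclass
class SanityJobParams:
    spark_schema: str         # schema name on the Spark side
    opensearch_catalog: str   # catalog name for OpenSearch integration
    scope: str = "sanity"
    report: str = "pass-fail"

    def __post_init__(self):
        # Reject unknown scopes/formats at construction time.
        if self.scope not in TEST_SCOPES:
            raise ValueError(f"unknown test scope: {self.scope}")
        if self.report not in REPORT_FORMATS:
            raise ValueError(f"unknown report format: {self.report}")
```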
Extendable Framework:
Plugin Strategy: Developers should be able to extend the framework by adding new types of tests. This would involve a modular architecture where each test can be a standalone plugin or module.
Grammar Extensions: It should be possible to add new PPL commands or grammar rules for testing without changing the project's architecture or packaging; testing content should be defined as an additional resource of the project.
Multi-step Test Jobs:
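A multi-step job can be sketched as an ordered runner that stops on the first failure, since later steps (e.g. running tests) depend on earlier ones (e.g. dataset setup). The step names below are illustrative assumptions.

```python
# Minimal multi-step runner sketch; step names are illustrative.
def run_job(steps):
    """Execute (name, fn) steps sequentially; return per-step pass/fail."""
    results = {}
    for name, fn in steps:
        ok = bool(fn())
        results[name] = ok
        if not ok:
            break  # later steps depend on earlier ones, so stop here
    return results
```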
Why is this needed?
Example Use Cases & Existing Test Frameworks:
Proposed Architecture:
Dataset Creation Step:
Parameterized Spark Job:
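The parameterized job's command-line surface might look like the following. The flag names and allowed values are assumptions mirroring the schema/catalog/scope parameters described above, not an actual spark-submit contract from the project.

```python
# Hypothetical CLI sketch for the parameterized job; flag names are assumptions.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="PPL sanity test job")
    parser.add_argument("--schema", required=True, help="Spark schema name")
    parser.add_argument("--catalog", required=True, help="OpenSearch catalog name")
    parser.add_argument("--scope", default="sanity",
                        choices=["sanity", "command", "performance"])
    parser.add_argument("--report", default="pass-fail",
                        choices=["detailed", "performance-summary", "pass-fail"])
    return parser
```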
Modular Test Components: sanity tests verifying that each PPL command (`eval`, `head`, `sort`) returns the expected results on a known dataset.
Do you have any additional context?