Sparkle is a meta-framework built on top of Apache Spark, designed to streamline data engineering workflows and accelerate the delivery of data products. Developed by DataChef, Sparkle focuses on three main areas:
- Improving Developer Experience (DevEx) 🚀
- Reducing Time to Market ⏱️
- Easy Maintenance 🔧
With these goals in mind, Sparkle has enabled DataChef to deliver functional data products from day one, allowing for seamless handovers to internal teams.
Sparkle enhances the developer experience by abstracting away non-business-critical aspects of Spark application development. It achieves this through:
- Sophisticated Configuration Mechanism: Simplifies the setup and configuration of Spark applications, allowing developers to focus solely on business logic.
- Automatic Functional Tests 🧪: Generates tests for each application automatically, based on predefined input and output fixtures. This ensures that the application behaves as expected without requiring extensive manual testing.
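The fixture-based testing idea can be illustrated with a small, dependency-free sketch. It is not Sparkle's actual test generator: the names `normalise`, `check_against_fixtures`, and `add_totals` are hypothetical, and plain lists of dictionaries stand in for Spark DataFrames so the example runs without a Spark session.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Frame = List[Row]  # stand-in for a Spark DataFrame (assumption for this sketch)


def normalise(frame: Frame) -> Frame:
    """Sort rows so the comparison does not depend on row order."""
    return sorted(frame, key=lambda row: repr(sorted(row.items())))


def check_against_fixtures(
    transform: Callable[[Frame], Frame],
    input_fixture: Frame,
    expected_output: Frame,
) -> bool:
    """Run the transformation on the input fixture and compare it
    with the expected output fixture."""
    return normalise(transform(input_fixture)) == normalise(expected_output)


def add_totals(rows: Frame) -> Frame:
    """Hypothetical business logic: compute an order total per row."""
    return [{"sku": r["sku"], "total": r["price"] * r["quantity"]} for r in rows]


# Predefined fixtures; in a real project these would typically live in files.
input_fixture = [
    {"sku": "a", "price": 2, "quantity": 3},
    {"sku": "b", "price": 5, "quantity": 1},
]
expected_output = [
    {"sku": "b", "total": 5},
    {"sku": "a", "total": 6},
]

passed = check_against_fixtures(add_totals, input_fixture, expected_output)
print("fixture test passed:", passed)
```

Because the check needs only a transformation and a pair of fixtures, a framework can generate one such test per application without the developer writing test code by hand.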
Sparkle significantly reduces the time to market by automating the deployment and testing processes. This allows data engineers to concentrate exclusively on developing the business logic, with all other aspects handled by Sparkle:
- Automated Testing ✅: Ensures that all applications are robust and ready for deployment without manual intervention.
- Seamless Deployment 🚢: Automates the deployment pipeline, reducing the time needed to bring new data products to market.
Sparkle simplifies maintenance by thoroughly testing the framework itself and by abstracting away requirements that are not part of the business logic. The result is a reliable system that is easy to maintain:
- Abstraction of Non-Business Logic 📦: Because application code contains only business logic, there is less Spark-specific complexity to maintain.
- Heavily Tested Framework 🔍: All non-business functionalities are thoroughly tested, reducing the risk of bugs and ensuring a stable environment for data applications.
The Sparkle framework operates on a principle similar to Function as a Service (FaaS). Developers instantiate a Sparkle application that receives a list of input DataFrames, and their code only has to transform these DataFrames according to the business logic. Sparkle then automatically writes the output of this transformation to the desired destination.
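The FaaS-like flow described here (read inputs, apply business logic, write output) can be sketched as follows. This is a minimal illustration under stated assumptions, not Sparkle's real API: `MiniApp`, `PaidOrders`, `readers`, and `writer` are hypothetical names, and lists of dictionaries stand in for Spark DataFrames so the sketch runs without Spark installed.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Frame = List[Row]  # stand-in for a Spark DataFrame (assumption for this sketch)


class MiniApp:
    """Hypothetical FaaS-style application base class; not Sparkle's real API."""

    def __init__(
        self,
        readers: List[Callable[[], Frame]],
        writer: Callable[[Frame], None],
    ) -> None:
        self.readers = readers  # each reader yields one input frame
        self.writer = writer    # receives the transformed output

    def process(self, inputs: List[Frame]) -> Frame:
        """Business logic; the only method an application has to implement."""
        raise NotImplementedError

    def run(self) -> None:
        """Framework side: load the inputs, call the logic, write the result."""
        inputs = [read() for read in self.readers]
        self.writer(self.process(inputs))


class PaidOrders(MiniApp):
    """Example application: keep only the orders that have been paid."""

    def process(self, inputs: List[Frame]) -> Frame:
        (orders,) = inputs  # this application expects exactly one input
        return [row for row in orders if row["status"] == "paid"]


def read_orders() -> Frame:
    # In a real deployment this would read from a table, topic, or file.
    return [
        {"id": 1, "status": "paid"},
        {"id": 2, "status": "cancelled"},
    ]


sink: Frame = []  # in-memory destination standing in for a real one
app = PaidOrders(readers=[read_orders], writer=sink.extend)
app.run()
print(sink)
```

The application class contains only the transformation; reading and writing are injected, so the same logic can run against test fixtures or production sources without changes.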
Sparkle is currently under heavy development, and we are continuously working on improving and expanding its capabilities.
To stay updated on our progress and access the latest information, follow us on LinkedIn and GitHub.
We welcome contributions from the community! If you're interested in contributing to Sparkle, please check our GitHub repository for more details on how you can get involved.
Sparkle is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
For more information, questions, or feedback, feel free to reach out to us on LinkedIn or open an issue on our GitHub repository.
Thank you for your interest in Sparkle! We're excited to have you join us on this journey to revolutionize data engineering with Apache Spark. 🎉