Skip to content

Commit

Permalink
chore: added some readme to nb
Browse files Browse the repository at this point in the history
  • Loading branch information
Molier committed Aug 27, 2024
1 parent 98d5124 commit 1ac9494
Showing 1 changed file with 36 additions and 0 deletions.
36 changes: 36 additions & 0 deletions pandera_poc.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,42 @@
"from openenergyid.pandera_poc.models import InputModel, OutputModel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# polars pandera\n",
"\n",
"## Usage and Reasoning for Using Polars and Pandera\n",
"\n",
"Polars and Pandera are two powerful libraries in the Python ecosystem that are commonly used for data manipulation and validation tasks.\n",
"\n",
"### [Polars](https://youtu.be/J0wpRP-ExVg)\n",
"\n",
"Polars is a blazingly fast DataFrame library for manipulating structured data. Since the core is written in Rust, you get the performance of C/C++ while providing SDKs in other languages like Python.\n",
"The main advantages of using Polars are:\n",
"\n",
"1. **Performance**: Polars is built on top of Rust, a systems programming language known for its speed and memory safety. This allows Polars to achieve high performance and handle large datasets efficiently.\n",
"\n",
"2. **Ease of Use**: Polars provides a simple and intuitive API that is similar to pandas, making it easy for users familiar with pandas to transition to Polars. It also offers a rich set of functions and methods for data manipulation, allowing users to perform complex operations with ease.\n",
"\n",
"3. **Parallel and Distributed Computing**: Polars supports parallel and distributed computing, allowing users to leverage multiple cores or even distributed clusters to process data faster. This makes it a great choice for handling big data workloads.\n",
"\n",
"### Pandera\n",
"\n",
"Pandera is a lightweight library for data validation in Python. It provides a declarative way to define data validation rules and apply them to pandas DataFrames or Series. Pandera allows you to define complex validation rules, such as checking for missing values, data types, ranges, and more.\n",
"\n",
"The main advantages of using Pandera are:\n",
"\n",
"1. **Data Quality Assurance**: Pandera helps ensure the quality and integrity of your data by allowing you to define validation rules and apply them to your datasets. This helps catch data inconsistencies, missing values, or incorrect data types early in the data processing pipeline.\n",
"\n",
"2. **Declarative Syntax**: Pandera uses a declarative syntax to define validation rules, making it easy to express complex validation logic in a concise and readable manner. This allows you to define rules once and apply them to multiple datasets or columns.\n",
"\n",
"3. **Integration with Pandas**: Pandera seamlessly integrates with pandas, one of the most popular data manipulation libraries in Python. This allows you to leverage the power of pandas for data manipulation tasks while using Pandera for data validation.\n",
"\n",
"Polars and Pandera are powerful libraries that complement each other in the data processing pipeline. Polars provides efficient data manipulation capabilities, while Pandera ensures data quality and integrity through validation rules. Together, they enable you to handle large datasets efficiently and ensure the reliability of your data."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down

0 comments on commit 1ac9494

Please sign in to comment.