Releases: unionai-oss/pandera
Release v0.21.0: Reduce import and schema creation runtime, add docsearch search bar
⭐️ Highlights
This release optimizes import and schema creation runtime: importing pandera and creating a schema (without doing any validation) now takes ~5 ms, down from >800 ms. It also updates the docs to use docsearch for a better search experience.
- Defer backend registration to validation time by @cosmicBboy in #1818
- Reduce import overhead to improve runtime by @cosmicBboy in #1821
- Add docsearch by @cosmicBboy in #1814
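The idea behind deferring backend registration can be sketched in plain Python. This is an illustration of the general pattern only, not pandera's actual implementation: the `_BACKENDS` registry, the `Schema` class, and the list-of-ints backend below are all hypothetical names made up for the example.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a data type to its validation backend.
_BACKENDS: Dict[type, Callable[[object], object]] = {}
_registered = False


def _validate_int_list(data):
    """Toy backend: accept only lists of ints."""
    if not all(isinstance(x, int) for x in data):
        raise TypeError("expected a list of ints")
    return data


def _register_default_backends() -> None:
    """Populate the registry once.

    In a real library this is where heavy, backend-specific imports
    would happen, so delaying the call keeps import time low.
    """
    global _registered
    if _registered:
        return
    _BACKENDS[list] = _validate_int_list
    _registered = True


class Schema:
    def __init__(self, name: str) -> None:
        # Cheap: no backend lookup or registration at construction time.
        self.name = name

    def validate(self, data):
        # Registration is deferred until the first validation call.
        _register_default_backends()
        backend = _BACKENDS[type(data)]
        return backend(data)


schema = Schema("prices")
registered_before = bool(_BACKENDS)  # nothing registered yet
result = schema.validate([8, 12, 10])
registered_after = bool(_BACKENDS)  # registered on first validate()
```

The trade-off is that the first `validate()` call pays the registration cost, while code that only imports the library or builds schemas does not.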
What's Changed
- Upgrade multimethod to 1.12 by @jskrzypek in #1803
- bugfix: validation_enabled configuration correctly disables polars validation by @cosmicBboy in #1813
- Support Enum by @gab23r in #1798
- Revert/1803 by @cosmicBboy in #1815
- Add docsearch by @cosmicBboy in #1816
- Correct spelling in index.md by @galenseilis in #1819
- Bugfix/1760 Bad type hint for argument unique for DataFrameSchema by @Jarek-Rolski in #1817
- accept expr in default value by @gab23r in #1820
- fix: 🐛 add coerce value to pyarrow dtypes by @aaravind100 in #1850
- feat: add overload. by @yassun7010 in #1823
New Contributors
- @jskrzypek made their first contribution in #1803
- @galenseilis made their first contribution in #1819
- @Jarek-Rolski made their first contribution in #1817
- @yassun7010 made their first contribution in #1823
Full Changelog: v0.20.4...v0.21.0
Release v0.20.4: Bugfixes to polars & pyspark backends and more
What's Changed
- Bugfix/1732: Fix misleading error when columns are missing and lazy=True by @benlee1284 in #1752
- Bugfix/1644: refactor geopandas and pyarrow dtypes to avoid top-level import by @cosmicBboy in #1753
- regex column errors should report the correct column name by @cosmicBboy in #1754
- bugfix/1657: use rename instead of select in polars check backend by @cosmicBboy in #1757
- make sure registered checks supports error kwarg by @cosmicBboy in #1756
- make sure optional generic types are supported by @cosmicBboy in #1758
- fix: SQLModel table model not validated by @AlpAribal in #1696
- Restore accidentally-deleted use of "breakpoint()" by @deepyaman in #1763
- Swap types-pkg_resources with types-setuptools by @deepyaman in #1779
- Add support for Spark Connect dataframes by @filipeo2-mck in #1775
- feat: select_columns reorders columns by default by @ldacey in #1783
- Update Polars dtype test to generate more examples by @deepyaman in #1770
- bugfix/1784 polars DataFrameModel.to_json_schema() fails on DateTime column by @AlpAribal in #1789
- fix pd.ArrowDtype use in pandera engine for old pd versions by @cosmicBboy in #1792
- Reexport polars function to match pyright expectation by @gab23r in #1797
New Contributors
- @benlee1284 made their first contribution in #1752
- @ldacey made their first contribution in #1783
- @gab23r made their first contribution in #1797
Full Changelog: v0.20.3...v0.20.4
Release v0.20.3: polars integration cleanup, docs updates, bugfixes
What's Changed
- update dtype api reference docs by @cosmicBboy in #1745
- handle deprecated methods/arguments in polars v1 by @cosmicBboy in #1746
- handle case when pandera is run with optimized python mode by @cosmicBboy in #1749
Full Changelog: v0.20.2...v0.20.3
Release v0.20.2: Complete pyarrow coverage, support polars v1
⭐️ Highlights:
- feat: add remaining pyarrow types by @aaravind100 in #1720
- Bugfix/1724: Add support for polars v1 by @cosmicBboy in #1725
What's Changed
- Depend on OpenJDK>8.0.0 for PySpark support by @billyvinning in #1701
- Update polars checks.py to avoid calling the check function multiple times by @jcadam14 in #1719
- str checks use plain string instead of re.Pattern by @cosmicBboy in #1729
- Document Field instance reuse workaround by @lundybernard in #1730
- add pyarrow docs by @cosmicBboy in #1739
- fix typing docs by @cosmicBboy in #1740
- Update docs: setup deps for algolia, modify pandera banner, fix API ref by @cosmicBboy in #1741
New Contributors
Full Changelog: v0.20.1...v0.20.2
Release v0.20.1: Bugfix for pyarrow dependency error
What's Changed
- fix: raising type error when pyarrow is not installed by @aaravind100 in #1717
- feat: add pyarrow list and struct to pandas engine by @aaravind100 in #1699
Full Changelog: v0.20.0...v0.20.1
Release v0.20.0: Pyarrow dtype support
⭐️ Highlights
- Pandera now supports pyarrow datatypes in the pandera validation engine! Big shoutout to @aaravind100 for the heavy lifting here.
- Added compatibility for numpy v2.
- Added compatibility for polars v1.
- pandera.SchemaModel is now deprecated; use pandera.DataFrameModel instead.
What's Changed
- Bugfix/1631: Series[Annotated[...]] DataFrameModel types should correctly create a DataFrameSchema by @cosmicBboy in #1633
- Add missing pandas import line. by @kyleweise in #1635
- add pandas pyarrow backend support by @aaravind100 in #1628
- bugfix: timezone-agnostic datetime in polars works in DataFrameModel by @cosmicBboy in #1638
- fix pandas pyarrow string validation by @aaravind100 in #1636
- Bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #1619
- Updating Old pandas-stubs Link in Documentation by @bustosalex1 in #1648
- Bugfix: add missing reason_code for pyspark backend by @melvinkokxw in #1646
- change pandas engine to be numpy>2 compat by @cosmicBboy in #1690
- Minor documentation fix by @poulter7 in #1643
- perf: dataframe-level checks, fix polars tests by @cosmicBboy in #1702
- Docs: fix missing import in data conversion code cell by @billyvinning in #1700
- fix: DataFrameSchema repr formatting by @AlpAribal in #1694
- Fix coercion errors for polars=1.0.0 by @MariusMerkleQC in #1706
- Solve deprecation warning on with_context by @MariusMerkleQC in #1705
- fix: default values set before coercion by @sanzoghenzo in #1708
- remove deprecated SchemaModel by @cosmicBboy in #1711
- Fix mismatched quotes, standardize CONTRIBUTING.md by @deepyaman in #1712
- Run CI on PRs to ibis-dev; stop for polars-dev by @deepyaman in #1713
- enable black for py311 by @lundybernard in #1697
- Updates to improve TryPandera documentation by @hendera2 in #1668
New Contributors
- @kyleweise made their first contribution in #1635
- @aaravind100 made their first contribution in #1628
- @bustosalex1 made their first contribution in #1648
- @melvinkokxw made their first contribution in #1646
- @poulter7 made their first contribution in #1643
- @billyvinning made their first contribution in #1700
- @AlpAribal made their first contribution in #1694
- @MariusMerkleQC made their first contribution in #1706
- @sanzoghenzo made their first contribution in #1708
- @lundybernard made their first contribution in #1697
- @hendera2 made their first contribution in #1668
Full Changelog: v0.19.2...v0.20.0
Release 0.19.3: Polars dtype bugfixes
What's Changed
- bugfix: timezone-agnostic datetime in polars works in DataFrameModel by @cosmicBboy in #1638
- Bugfix/1631: Series[Annotated[...]] DataFrameModel types should correctly create a DataFrameSchema by @cosmicBboy in #1633
- Add missing pandas import line. by @kyleweise in #1635
New Contributors
- @kyleweise made their first contribution in #1635
Full Changelog: v0.19.2...v0.19.3
Release v0.19.2: Bugfix on correctly checking nullable Floats
What's Changed
- bugfix: nullable check float dtype handles nan and null by @cosmicBboy in #1627
Full Changelog: v0.19.1...v0.19.2
Release 0.19.1: Bugfixes and docs fixes
What's Changed
- Bugfix/1616: Polars data container validation by @cstabnick in #1623
- use google colab instead of jupyterlite by @cosmicBboy in #1618
New Contributors
- @cstabnick made their first contribution in #1623
Full Changelog: v0.19.0...v0.19.1
Release 0.19.0: Polars validation support
✨ Highlights ✨
📣 Pandera now supports validation of polars.DataFrame and polars.LazyFrame 🐻‍❄️!
You can now do this:
```python
import pandera.polars as pa
import polars as pl


class Schema(pa.DataFrameModel):
    state: str
    city: str
    # price must fall within the range given by min_value and max_value
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})


lf = pl.LazyFrame(
    {
        'state': ['FL', 'FL', 'FL', 'CA', 'CA', 'CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
# validate() returns a LazyFrame; collect() materializes it
Schema.validate(lf).collect()
```
And of course you can do functional validation with decorators like so:
```python
from pandera.typing.polars import LazyFrame


# Reuses pa, pl, Schema, and lf from the previous example.
@pa.check_types
def function(lf: LazyFrame[Schema]) -> LazyFrame[Schema]:
    return lf.filter(pl.col("state").eq("CA"))


function(lf).collect()
```
You can read more about the integration here. Not all pandera features are supported at this point, but depending on community demand/contributions we'll slowly add them. To learn more about what's currently supported, check out this table.
Special shoutout to @AndriiG13 and @FilipAisot for their contributions on the built-in checks and polars datatypes, respectively, and to @evanrasmussen9, @baldwinj30, @obiii, @Filimoa, @philiporlando, @r-bar, @alkment, @jjfantini, and @robertdj for their early feedback and bug reports during the 0.19.0 beta.
What's Changed
- Support polars DataFrames, LazyFrames by @cosmicBboy, @AndriiG13, and @FilipAisot in #1373
- bugfix: optional columns in polars schema should no longer raise errors when not present by @cosmicBboy in #1532
- check_nullable does not uselessly compute isna() anymore in pandas backend by @smarie in #1538
- Polars LazyFrames are validated at the schema-level by default by @cosmicBboy in #1534
- Enable from_format_kwargs for dict format by @ektar in #1539
- Convert docs to myst by @cosmicBboy in #1542
- fix README(tab to space) by @np-yoe in #1544
- pandas DataFrameModel accepts python generic types by @cosmicBboy in #1547
- Backend registration happens at schema initialization by @cosmicBboy in #1548
- do not format if test is not necessary by @mattB1989 in #1530
- Register default backends when restoring state by @alkment in #1550
- Bump actions/setup-python from 4 to 5 by @dependabot in #1452
- fix: prevent environment pollution when importing pyspark by @sam-goodwin in #1552
- use rst to speed up api docs generation by @cosmicBboy in #1557
- Add _GenericAlias.__call__ patch by @cosmicBboy in #1561
- support typeguard < 3 for better compatibility by @cosmicBboy in #1563
- Add parse function to DataFrameModel in #1181
- localize GenericAlias patch to DataFrameBase subclasses by @cosmicBboy in #1571
- Bump idna from 3.4 to 3.7 by @dependabot in #1569
- docs: fix typo in env var name by @alekseik1 in #1562
- polars: fix element-wise checks, register backends by @cosmicBboy in #1572
- remove pytest ignore on modin, dask, pyspark tests with pandas >= 2 by @cosmicBboy in #1573
- make sure check name is propagated to error report by @cosmicBboy in #1574
- update ci to run pyspark, modin, dask with pandas >= v2 by @cosmicBboy in #1575
- use sphinx-design instead of sphinx-panels by @cosmicBboy in #1581
- Update bug_report.md by @philiporlando in #1585
- bugfix: polars column core checks now return check output by @cosmicBboy in #1586
- make pandera.typing.Series[TYPE] error in polars DataFrameModel more readable by @cosmicBboy in #1588
- implement timezone agnostic polars_engine.DateTime type by @cosmicBboy in #1589
- fix pyspark import error by @cosmicBboy in #1591
- fix pyspark tests when run on full test suite by @cosmicBboy in #1593
- Bugfix/1580 by @cosmicBboy in #1596
- Set pandas_io.from_frictionless_schema to use a raw string for docs by @mark-thm in #1597
- Add a generic Series type for polars by @baldwinj30 in #1595
- Add StructType and DDL extraction from Pandera schemas by @filipeo2-mck in #1570
- Clean up typing for pandas GenericDtype by @cosmicBboy in #1601
- Adding warning for unique in pyspark field and a test showing the issue as well as config when it works. by @zippeurfou in #1592
- bugfix/1607: coercion error should correctly report relevant failure cases by @cosmicBboy in #1608
- Create a common DataFrameSchema class, update mypy used in pre-commit by @cosmicBboy in #1609
- Dataframe column schema by @cosmicBboy in #1611
- bugfix: column-level coercion is properly implemented by @cosmicBboy in #1612
- update docs for polars by @cosmicBboy in #1613
- fix: properly coerce dtypes for columns with regex=True by @tesslinden in #1602
- rewrite Check class docstrings to remove pandas assumption by @cosmicBboy in #1614
- add tests for polars decorators by @cosmicBboy in #1615
New Contributors
- @smarie made their first contribution in #1538
- @ektar made their first contribution in #1539
- @np-yoe made their first contribution in #1544
- @alkment made their first contribution in #1550
- @sam-goodwin made their first contribution in #1552
- @alekseik1 made their first contribution in #1562
- @philiporlando made their first contribution in #1585
- @mark-thm made their first contribution in #1597
- @baldwinj30 made their first contribution in #1595
- @zippeurfou made their first contribution in #1592
- @tesslinden made their first contribution in #1602
Full Changelog: v0.18.3...v0.19.0