Releases: unionai-oss/pandera
v0.14.0: ✍️ Pandera Internals Rewrite [phase 1]
⭐️ Highlights
The main highlight of this release is that phase 1 of the Pandera internals re-write is complete 🎉🚀! This is a backwards-compatible re-write (unit tests FTW 😅) that should just work with your existing pandera code. Please submit bug reports if you encounter any regressions that weren't covered by the current test suite.
PRs #913, #1109, and #1110 address #381 and essentially decouple pandas-specific logic from the pandera schema specification. In summary:
- The pandera schema specifications are defined in pandera.api, containing:
  - schema base classes in pandera.api.base
  - pandera schema classes in pandera.api.pandas
  - the global check and hypothesis namespace in pandera.api.checks.Check and pandera.api.hypotheses.Hypothesis
  - decorators are provided in pandera.api.extensions to be able to register builtin and custom checks/hypotheses
- The pandera backend validation logic is defined in pandera.backends, containing:
  - backend base classes in pandera.backends.base
  - pandas-specific backend validators in pandera.backends.pandas
Now, all pandas-specific logic is isolated to specific modules, where support for additional non-pandas-compliant schema specifications and their associated backends can be implemented either as 1st-party-maintained libraries (see issues for supporting polars and ibis) or 3rd party libraries.
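To make the specification/backend split more concrete, here is a minimal sketch of the general pattern in plain Python. It is not pandera's actual code: ToySchema, ToyBackend, DictOfListsBackend, and register_backend are hypothetical names invented for illustration, and the "data container" is just a dict of lists rather than a pandas object.

```python
from typing import Any, Callable, ClassVar, Dict, List, Optional, Type


class ToyBackend:
    """Hypothetical backend base class: validates one kind of data container."""

    def validate(self, data: Any, schema: "ToySchema") -> Any:
        raise NotImplementedError


class ToySchema:
    """Hypothetical schema specification: says what to check, not how."""

    # Registry mapping a container type to the backend that validates it.
    _backends: ClassVar[Dict[type, Type[ToyBackend]]] = {}

    def __init__(
        self,
        column: str,
        dtype: type,
        checks: Optional[List[Callable[[Any], bool]]] = None,
    ) -> None:
        self.column = column
        self.dtype = dtype
        self.checks = checks or []

    @classmethod
    def register_backend(cls, container_type: type, backend: Type[ToyBackend]) -> None:
        cls._backends[container_type] = backend

    def validate(self, data: Any) -> Any:
        # The specification only dispatches; the backend does the actual work.
        backend_cls = self._backends.get(type(data))
        if backend_cls is None:
            raise TypeError(f"no backend registered for {type(data).__name__}")
        return backend_cls().validate(data, self)


class DictOfListsBackend(ToyBackend):
    """Backend for a plain dict mapping column names to lists of values."""

    def validate(self, data: Dict[str, list], schema: ToySchema) -> Dict[str, list]:
        if schema.column not in data:
            raise ValueError(f"column {schema.column!r} not found")
        for value in data[schema.column]:
            if not isinstance(value, schema.dtype):
                raise ValueError(f"{value!r} is not of type {schema.dtype.__name__}")
            for check in schema.checks:
                if not check(value):
                    raise ValueError(f"{value!r} failed a check")
        return data


# Supporting a new container type means registering a backend for it;
# the schema specification itself does not change.
ToySchema.register_backend(dict, DictOfListsBackend)

schema = ToySchema("age", int, checks=[lambda v: v >= 0])
validated = schema.validate({"age": [31, 45, 27]})
```

In this sketch, a container type without a registered backend raises a TypeError, which mirrors the idea that each data framework needs its own backend implementation.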
🛣 Rewrite Roadmap
The bulk of the re-write is complete, but some items are still outstanding:
- Write validation backends for the existing pandas-like frameworks (dask, pyspark.pandas, modin). This may lead to refactoring some of the abstractions that came out of the rewrite.
- Write an alpha version of the pandera-ibis package, which will create a schema specification and validation backends for ibis data structures (see issue #1105)
- Document the process of writing your own 3rd party libraries based on pandera for any arbitrary statistical data container.
What's Changed
- Bugfix/996: strict="filter" doesn't work on spark dataframes by @nwoodbury in #1001
- unpin pandas-stubs version by @williamjamir in #1000
- add PR messages, DCO to contributing guide by @cosmicBboy in #1006
- Turn failure-cases to string to avoid hashing unhashable objects by @a-recknagel in #1014
- not require coerce==True when for PydandticModels by @the-matt-morris in #1011
- Schema Model manipulation docs by @a-recknagel in #1012
- Fix handling of decimals with scale=0 by @a-recknagel in #1010
- Add Union support to check_types: Bugfix/977 by @kr-hansen in #995
- Bugfix/997 by @joepatol in #1017
- update mypy plugin and tests by @cosmicBboy in #1007
- fix issue where @check_types-decorated function is an iterable by @cosmicBboy in #1022
- fix mypy extra unit tests, pin pandas-stubs for dev env by @cosmicBboy in #1056
- Feature/511: Copy columns in DataFrameSchema.__init__() by @NickCrews in #1055
- Unpinning ray from requirements-dev.txt by @erichamers in #1052
- core and backend pandera API internals rewrite by @cosmicBboy in #913
- Small fix to example by @brl0 in #1083
- Coerce dt indexes and series by @cristianmatache in #1057
- correctly type-check strings by @cosmicBboy in #1106
- fix lazy validation issue with regex columns if no column found by @cosmicBboy in #1107
- fix(dtypes.py): correction at function is_numeric docstring by @HenriqueAJNB in #1100
- internals rewrite: clean up checks and hypothesis functionality by @cosmicBboy in #1109
- rename pandera.core to pandera.api by @cosmicBboy in #1110
New Contributors
- @nwoodbury made their first contribution in #1001
- @williamjamir made their first contribution in #1000
- @kr-hansen made their first contribution in #995
- @joepatol made their first contribution in #1017
- @erichamers made their first contribution in #1052
- @brl0 made their first contribution in #1083
- @HenriqueAJNB made their first contribution in #1100
Full Changelog: v0.13.4...v0.14.0
Release 0.13.4: JSON serialization and bugfixes
What's Changed
- make GenericDtype TypeVar bound union of supported types by @cosmicBboy in #960
- Use Self in type hints for DataFrameSchema by @NickCrews in #961
- CI-improvements by @NickCrews in #962
- move jupyterlite_sphinx to pip deps by @cosmicBboy in #973
- Fix custom check extensions doc by @a-recknagel in #975
- Raise error on typo in extensions.register_check_method's statistics argument by @a-recknagel in #985
- Add class name for methods in validation error message (#980) by @Drumbits in #982
- check_output works with schema where coerce=True by @cosmicBboy in #979
- make df strategy less complex by @cosmicBboy in #989
- fix tz deprecation by @cristianmatache in #972
- De/serializate data frame schema to/from JSON by @KiaXdice in #924
- fix to_script output by @cosmicBboy in #994
New Contributors
- @a-recknagel made their first contribution in #975
- @Drumbits made their first contribution in #982
- @KiaXdice made their first contribution in #924
Contributors
Shoutout to all the contributors of this release 🎉
Full Changelog: v0.13.3...v0.13.4
Beta Release: v0.13.4b0
Release 0.13.3: Fix Decimal Type
What's Changed
- Fix decimal by @cosmicBboy in #956
- add date and decimal types to typing.common module by @cosmicBboy in #958
Full Changelog: v0.13.2...v0.13.3
Release 0.13.2: Fix modin tests
Release 0.13.1: Bugfix on "Try Pandera" Jupyterlite Deployment
Release 0.13.0: Option to Report All Errors, "Try Pandera" with Jupyterlite
Highlights ⭐️
- try pandera: add jupyterlite notebooks, add support for py3.7 (#951) @cosmicBboy
- Feature/922 add other ways to report unique errors as an argument (#914) @ng-henry
What's Changed 📈
- Bugfix/910: Support ordered=True in yaml schemas (#943) @dstumpy
- docs: Fix typo in pyspark.rst (#948) @smoothml
- update rename_columns not to error on {key: key, ...} rename_dict (#941) @hsorsky
- Fix #937: Handle empty MultiIndex validation (#938) @davidandreoletti
- Fix infer_schema for 'empty' dataframes (#944) @tpvasconcelos
- Bugfix/Fix with_pydantic mypy error (#934) @brrm
- Updating Fugue section docs (#927) @kvnkho
New Contributors 🎉
Shout out to all the first-time contributors!
Full Changelog: v0.12.0...v0.13.0
Beta Release: v0.13.0b1
Beta Release v0.13.0b0
Release 0.12.0: Logical Types, New Built-in Check, Bugfixes, Doc Improvements
Highlights ⭐️
This release features:
- Support for Logical Data Types #798: These data types check the actual values of the data container at runtime to support data types like "URL", "Name", etc.
- Check.unique_values_eq #858: Make sure that all of the values in the data container cover the entire domain of the specified finite set of values.
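As a rough illustration of what the unique_values_eq check expresses, the following plain-Python sketch compares the set of values observed in the data with the specified finite set. The standalone function below is a hypothetical stand-in, not pandera's implementation, and it assumes strict set equality, so values outside the specified domain also fail.

```python
from typing import Hashable, Iterable


def unique_values_eq(data: Iterable[Hashable], values: Iterable[Hashable]) -> bool:
    """Return True if the unique values in `data` are exactly the set `values`."""
    return set(data) == set(values)


# Every value in the domain appears at least once, and nothing else does.
full_coverage = unique_values_eq(["a", "b", "a", "c"], ["a", "b", "c"])

# "c" never appears, so the domain is not fully covered.
missing_value = unique_values_eq(["a", "b", "a"], ["a", "b", "c"])

# "d" is outside the domain (fails under the strict-equality assumption).
extra_value = unique_values_eq(["a", "b", "d"], ["a", "b"])
```

Here full_coverage is True, while missing_value and extra_value are both False. In pandera itself the check is exposed as Check.unique_values_eq and attached to a column or schema rather than called directly on a list.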
What's Changed 📈
- Lazy SchemaErrors contain schema name by @fleimgruber in 0d10f39
- Support for logical data types by @jeffzi in #798
- fix for Index of type category fails on validation by @kuutsav in #840
- Add new check unique_values_eq by @johnkangw in #858
- Add from records to panderas dataframe #850 by @borissmidt in #859
- Doc fix: incorrect default value by @plague006 in #862
- Handle cases of reset_index level being None or an empty list by @plague006 in #865
- fixing unique multi index in SchemaModel by @mattB1989 in #870
- Adding description and title to column serializations by @dantheand in #877
- Fix modin and pyspark CI by @jeffzi and @cosmicBboy in #886
- Add pandas_engine.Date by @jeffzi in #887
- fix typo in docs by @jonwiggins in #895
- Update strict type-hints by @the-matt-morris in #898
- fix strategies ci by @cosmicBboy in #899
- Bugfix/882 don't coerce datatypes twice by @ng-henry in #901
- bugfix/904: ignore_na only ignores df records if all are Nan by @cosmicBboy in #909
- fix sphinx docs by @cosmicBboy in #912
- ExtensionDtype path should follow documentation by @pepelovesvim in #915
- pin pandas-stubs version, bump mypy by @cosmicBboy in #916
- Docs/867 by @the-matt-morris in #919
New Contributors 🎉
- @johnkangw made their first contribution in #858
- @pepelovesvim made their first contribution in #915
- @dantheand made their first contribution in #877
- @jonwiggins made their first contribution in #895
- @kuutsav made their first contribution in #840
- @borissmidt made their first contribution in #859
- @plague006 made their first contribution in #862
- @mattB1989 made their first contribution in #870
- @the-matt-morris made their first contribution in #898
- @ng-henry made their first contribution in #901
Full Changelog: v0.11.0...v0.12.0