Releases: unionai-oss/pandera
0.7.1: Add unique option to DataFrameSchema
Enhancements
- add support for Any annotation in schema model (#594)
- add support for timezone-aware datetime strategies (#595)
- unique keyword arg: replace and deprecate allow_duplicates (#580)
- Add support for empty data type annotation in SchemaModel (#602)
- support frictionless primary keys with multiple fields (#608)
Bugfixes
- unify typing.DataFrame class definitions (#576)
- schemas with multi-index columns correctly report errors (#600)
- strategies module supports undefined checks in regex columns (#599)
- fix validation of check raising error without message (#613)
Docs Improvements
- Tutorial: docs/scaling - Bring Pandera to Spark and Dask (#588)
Repo Improvements
- use virtualenv instead of conda in ci (#578)
Dependency Changes
Contributors
🎉🎉 Big shout out to all the contributors on this release 🎉🎉
- @admackin
- @jeffzi
- @tfwillems
- @fkrull8
- @kvnkho
0.7.0: Pandera Type System Overhaul
Enhancements
- Add support for frictionless schemas (#454) [docs]
- decouple pandera and pandas dtypes (#559) [docs]
- Unify dataframe definitions to fix auto-complete #576
- Report all failure cases when coercing dtypes fails (#584)
Bugfixes
- Handle case of pandas.DataFrame with pandas.MultiIndex in pandera.error_formatters.reshape_failure_cases (#560)
- Add 'ordered.setter' decorator (#567)
- Fix decorators on classmethods (#568)
- better handling of datetime/timedelta in serialize/deserialize (#585)
Docs Improvements
- Update contributing guide ccca82f
- Add documentation build to contributing guide 361fec0
- Fix virtualenv instructions in contributing guide ed74a65
- Feature/coroutines docs (#570)
- Add frictionless documentation (#579)
- use python primitive types in docs where possible (#581)
Repo Improvements
Contributors
Big shout out to ✨ @mattHawthorn, @vinisalazar, @cristianmatache, @TColl, @jeffzi, @admackin, and @benkeesey ✨ for your contributions on this release 🎉🎉🎉
0.6.5: Support coroutines, regex matching on non-str column names, bugfixes
Enhancements
- Raise error if check_obj.index is MultiIndex when using pandera.Index (#483)
- support decorators for coroutines (#546)
- added py.typed and typed Series descriptor (#543)
- select non-str column names with regex=True (#551)
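To give a feel for what supporting decorators on coroutines involves, here is a minimal standard-library sketch. It is not pandera's implementation: check_result and load_rows are hypothetical names, and the predicate stands in for schema validation. The decorator detects coroutine functions and awaits them before checking the result, so the wrapped function stays awaitable.

```python
import asyncio
import functools
import inspect


def check_result(predicate):
    """Hypothetical decorator: validate a function's return value."""

    def decorator(fn):
        if inspect.iscoroutinefunction(fn):
            # Coroutine functions need an async wrapper that awaits the
            # result before validating it.
            @functools.wraps(fn)
            async def async_wrapper(*args, **kwargs):
                result = await fn(*args, **kwargs)
                if not predicate(result):
                    raise ValueError(f"{fn.__name__} returned invalid output")
                return result

            return async_wrapper

        # Plain functions are called and validated directly.
        @functools.wraps(fn)
        def sync_wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not predicate(result):
                raise ValueError(f"{fn.__name__} returned invalid output")
            return result

        return sync_wrapper

    return decorator


@check_result(lambda rows: len(rows) > 0)
async def load_rows():
    await asyncio.sleep(0)  # stand-in for real asynchronous I/O
    return [1, 2, 3]


rows = asyncio.run(load_rows())
```
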
Bugfixes
- check decorators support non-DataFrame types (#510)
- lazy validation correctly reports all errors (#528)
- don't drop duplicates for series failure cases (#535)
- custom dataframe-level checks don't corrupt data-synthesis strategy #550
Contributors
Thanks to @jekwatt @cristianmatache @lkadin for your first-time contributions! 🎉🎉🎉
0.6.4: Support dataframe-level checks in SchemaModel Config, Bugfixes
New Features
- Allow attaching registered dataframe checks by using Config field names (#478)
Bugfixes
- alias propagation works correctly on empty subclass (#446)
- Add missing inplace arg to SchemaModel's validate (#450)
- fix check_types decorator should return results from validate (#458)
- Dataframe schemas in yaml do not require any field (#479)
- coerce=True and pandas_dtype=None should be a noop (#476)
Doc Improvement
- update documentation css to fit mobile (#447)
- add copy button to docs (#448)
- link documentation to github (#449)
Infrastructure Changes
0.6.3: Bugfixes, update docs
New Features
- add new method SchemaModel.to_yaml to serialize SchemaModels to yaml #428
Bugfixes
- preserve pandas extension types during validation (#443)
- Fix to_yaml serialization dropping global checks (#428) 🎉 first contribution @antonl 🎉
- fix empty data type not supported for serialization (#435)
- fix empty SchemaModel (#434)
- add doc about attributes excluded by SchemaModel (#436) @jeffzi
- fix DataFrameSchema comparison with non-DataFrameSchema (#431) @jeffzi
- schema serialization handles non-PandasDtype (#424)
- pa.Object coerce should preserve object type (#423)
Documentation
0.6.2: SchemaModel and synthesis bugfixes
New Feature
- Add SchemaModel column name access through class attributes (#388) @jespercodes @jeffzi 🎉
- Parametrized PandasExtensionType types (#389) @jeffzi 🎉
- adding filter argument to strict parameter (#401) @ktroutman
- feature/341: improve str and repr methods for schemas (#413)
Bugfixes
- fix py3.6 optional + literal dtypes in SchemaModel (#379) @jeffzi 🎉
- Fix minimally required packaging version (#380) contribution #1️⃣ @probberechts 🎉
- prevent mypy Check getattr error for registered checks 920a98c
- Compatibility with numpy 1.20 (#395) @jeffzi
- dataframe strategies can generate regex columns (#402)
- bugfix: df data synthesis with size=None, fix CI (#410)
- bugfix: SeriesSchema raises SchemaErrors on lazy validation (#412)
Repo Improvements
0.6.1: coercion and required column bugfixes
0.6.0: Data Synthesis Strategies, Schema Enhancements
🎉🎉🎉 Thanks to @jeffzi, @ktroutman, @m1so for your contributions! 🎉🎉🎉
Enhancements
- Improve memory efficiency of validation process (#360)
- Add column order validation (#352)
- Implement data synthesis strategies using hypothesis (#344)
- Add support for aliases in SchemaModel (#329)
- Add support for optional name validation of single-index (#326)
- Move columns to multiindex: add reset_index, set_index method to DataFrameSchema (#319)
- Add support for Python 3.9 (#307)
Bugfixes
- typing.DataFrame should expect annotation input (#318)
Deprecations
- SchemaErrors.schema_errors has been changed to failure_cases, and the schema_errors attribute now contains a list of dicts containing schema errors and reason codes. This is a breaking change, but is a minor part of the API and is fairly straightforward to fix (#360).
Documentation Improvements
- Add required columns documentation for schema models (#362)
- Fix docs: schema examples (#347)
- Add documentation for dataframeschema transformations (#333)
- Fix deprecated SchemaErrorReport references in docs (#310)
- Fix SchemaModel dtype example (#309)
Repo Improvements
0.5.1: bugfix - add packaging dependency
pandera relied on the packaging package to get version information to determine pandas legacy status. This was an implicit sub-dependency of one of pandera's dependencies, which was apparently dropped and led to a bug: #335. This bugfix version explicitly adds packaging.
0.5.0: Class-based API for DataFrame Typing
Enhancements
- Implement class-based API for pydantic-style schema definitions 786b504. Big thanks to @jeffzi 🎉
- Add inplace=False argument to schema.validate method to prevent mutation of original dataframe 586ebf3.
- Make pandera optional extensions [hypothesis], [io], [all] available c4716a0. Thanks @amitripshtos and @jeffzi 🎉
- Add support for complex number data types 50e86e4 thanks @ferhah 🎉
- Add support for numpy scalar types a519db5
- Add check_io decorator for checking inputs and outputs of a function 913cbd7
- Throw SchemaError with column name instead of ValueError for nulls in int series f7b03e3 thanks @TheCleric 🎉
Bugfixes
- Bugfix io.to_script and to_yaml: Ignoring serializing Checks with lambda functions da9c3a5 thanks @ferhah 🎉