Bug fix for float precision calculation using categorical data with trailing 0s #1125

Merged
merged 1 commit into from
Apr 15, 2024

@SchadtJ SchadtJ commented Mar 25, 2024

Bug fix for #1048 (comment).

The float precision calculation errors out for categorical data when one of the values has leading or trailing zeros. The regex operation strips these zeros, and the resulting value falls outside the categorical's list of possible values.
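
To make the failure mode concrete, here is a minimal standard-library sketch of the mechanism described above. The pattern `STRIP_ZEROS` and the helper `strip_zeros` are hypothetical simplifications for illustration, not the profiler's actual regex or code; the sample values are the ones used in the test added by this PR.

```python
import re

# Hypothetical, simplified stand-in for the zero-stripping regex
# (an assumption for illustration; not the library's real pattern).
STRIP_ZEROS = re.compile(r"^0+|0+$")


def strip_zeros(value: str) -> str:
    """Remove leading and trailing zeros from a numeric string."""
    return STRIP_ZEROS.sub("", value)


# String categories matching the values in the PR's test case.
categories = ["202209", "202210", "202211"]

# Strip zeros from every category value.
stripped = [strip_zeros(v) for v in categories]

# Any stripped value that is no longer one of the original categories
# is the kind of value a categorical column cannot hold.
outside = [s for s in stripped if s not in categories]

print(stripped)
print(outside)
```

Only `"202210"` ends in a zero, so it is the only value that changes, and the changed value is not one of the declared categories. Under this reading, a fix needs to run the stripping on plain string data rather than on the categorical values themselves.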

Passing tests:
dataprofiler-bug

@SchadtJ SchadtJ requested a review from a team as a code owner March 25, 2024 02:02
CLAassistant commented Mar 25, 2024

CLA assistant check
All committers have signed the CLA.

@SchadtJ SchadtJ changed the title from "<WIP> Bug fix for float precision calculation using categorical data with trailing 0s" to "Bug fix for float precision calculation using categorical data with trailing 0s" Mar 25, 2024
@taylorfturner taylorfturner enabled auto-merge (squash) March 25, 2024 12:40
@taylorfturner taylorfturner added the "Bug" (Something isn't working) label Apr 2, 2024
Comment on lines +215 to +219
categorical_series = pd.Series(
[202209, 202210, 202211], dtype="category"
).apply(str)
float_profiler = FloatColumn("Name")
float_profiler.update(categorical_series)

testing this locally so may have some additional comments, but initial thought is that there should definitely be some assert statements here to validate this is actually working as intended post-change @SchadtJ @scottiegarcia

@taylorfturner taylorfturner merged commit d3159bd into capitalone:dev Apr 15, 2024
5 checks passed
@taylorfturner

reverting this accidental merge -- @SchadtJ please reopen into dev. Thanks!

micdavis pushed a commit that referenced this pull request May 20, 2024
* Replace snappy with cramjam (#1091)

* add downloads tile (#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------

Co-authored-by: Taylor Turner <[email protected]>

* Quick fix for dependency max pins (#1120)

* Fix dask_expr

* Keras and Tensorflow version fix

* Keras and Tensorflow version fix

* Fix keras bug

* pre-commit fix (#1122)

* docs: update test link to latest version (#1114)

* docs: add contributor notes on where to find documentation branches (#1113)

* docs: add contributor notes on where to find documentation branches

* docs: update documentation wording to spell out why `dev-gh-pages` and `gh-pages` branches exist for staging content

* docs: add note on fork

Co-authored-by: Taylor Turner <[email protected]>

* Update .github/CONTRIBUTING.md

Co-authored-by: Taylor Turner <[email protected]>

---------

Co-authored-by: Taylor Turner <[email protected]>

* update black version (#1131)

* Add memray max version (#1132)

* Bug fix for float precision calculation using categorical data with trailing zeros. (#1125)

* Revert "Bug fix for float precision calculation using categorical data with t…" (#1133)

This reverts commit d3159bd.

* fix

* make up to date

* yep, shouldn't change

* bump version

---------

Co-authored-by: Gábor Lipták <[email protected]>
Co-authored-by: abajpai15 <[email protected]>
Co-authored-by: Patrick Carlson <[email protected]>
Co-authored-by: James Schadt <[email protected]>
JGSweets pushed a commit to JGSweets/data-profiler that referenced this pull request Jun 5, 2024
taylorfturner added a commit that referenced this pull request Jun 6, 2024
* Replace snappy with cramjam (#1091)

* add downloads tile (#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------

Co-authored-by: Taylor Turner <[email protected]>

* pre-commit fix (#1122)

* Bug fix for float precision calculation using categorical data with trailing zeros. (#1125)

* Revert "Bug fix for float precision calculation using categorical data with t…" (#1133)

This reverts commit d3159bd.

* refactor: move layers outside of class

* refactor: update model to keras 3.0

* fix: manifest

* fix: bugs in compile and train

* fix: bug in load_from_library

* fix: bugs in CharCNN

* refactor: loading tf model labeler

* fix: bug in data_labeler identification

* fix: update model to use proper softmax layer names

* fix: formatting

* fix: remove unused line

* refactor: drop support for 3.8

* fix: comments

* fix: comment

---------

Co-authored-by: Gábor Lipták <[email protected]>
Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: James Schadt <[email protected]>
micdavis pushed a commit that referenced this pull request Jun 14, 2024
* refactor: Upgrade the models to use keras 3.0 (#1138)

* Replace snappy with cramjam (#1091)

* add downloads tile (#1085)

* Replace snappy with cramjam

* Delete test_no_snappy

---------

Co-authored-by: Taylor Turner <[email protected]>

* pre-commit fix (#1122)

* Bug fix for float precision calculation using categorical data with trailing zeros. (#1125)

* Revert "Bug fix for float precision calculation using categorical data with t…" (#1133)

This reverts commit d3159bd.

* refactor: move layers outside of class

* refactor: update model to keras 3.0

* fix: manifest

* fix: bugs in compile and train

* fix: bug in load_from_library

* fix: bugs in CharCNN

* refactor: loading tf model labeler

* fix: bug in data_labeler identification

* fix: update model to use proper softmax layer names

* fix: formatting

* fix: remove unused line

* refactor: drop support for 3.8

* fix: comments

* fix: comment

---------

Co-authored-by: Gábor Lipták <[email protected]>
Co-authored-by: Taylor Turner <[email protected]>
Co-authored-by: James Schadt <[email protected]>

* Fix Tox (#1143)

* tox new

* update

* update

* update

* update

* update

* update

* update

* update tox.ini

* update

* update

* remove docs

* empty retrigger

* update (#1146)

* bump version

* update 3.11

* remove dist/

---------

Co-authored-by: JGSweets <[email protected]>
Co-authored-by: Gábor Lipták <[email protected]>
Co-authored-by: James Schadt <[email protected]>
Labels
Bug Something isn't working

3 participants