Polars feature updates #1119

Merged


abajpai15
Contributor

Added Polars support to datetime_column_profile and unstructured_labeler.

abajpai15 requested a review from a team as a code owner March 22, 2024 16:30
taylorfturner enabled auto-merge (squash) March 22, 2024 16:30
auto-merge was automatically disabled March 22, 2024 16:43

Head branch was pushed to by a user without write access

abajpai15 force-pushed the polars_feature_rebase branch 2 times, most recently from 8fb2da9 to fdf3171, April 3, 2024 21:37
Comment on lines 259 to +260
len_df = len(df_series)

-is_row_datetime = pd.Series(np.full((len(df_series)), False))
+is_row_datetime = pd.Series(np.full((len_df), False))
Contributor


nice catch

Comment on lines 712 to 724
if type(data) is pl.DataFrame:
    words = (
        [
            w.strip(string.punctuation)
            for w in row.str.to_lowercase().str.split(by=" ")
        ]
        for row in data
    )
else:
    words = (
        [w.strip(string.punctuation) for w in row.lower().split()]
        for row in data
    )
Contributor


pre-commit made it like this?

Contributor Author


Yep

Comment on lines 726 to 734
if type(data) is pl.DataFrame:
    words = (
        [w.strip(string.punctuation) for w in row.str.split(by=" ")]
        for row in data
    )
else:
    words = (
        [w.strip(string.punctuation) for w in row.split()] for row in data
    )
Contributor


90% of this is similar... could make this more modular. Maybe in a follow-up PR, or if you can crank it out quickly, you can do it here.
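As a rough sketch of the refactor suggested in that comment, the two non-Polars branches (which differ only in whether each row is lowercased first) could share a single helper. `split_words` and its `lowercase` parameter are hypothetical names, not code from this PR, and the sketch assumes each row is a plain Python string; the Polars branches would still need their own splitting step.

```python
import string
from typing import Iterable, Iterator, List


def split_words(rows: Iterable[str], lowercase: bool = False) -> Iterator[List[str]]:
    """Hypothetical helper: yield punctuation-stripped words for each row.

    Assumes every row is a plain Python str (the non-Polars branch).
    """
    for row in rows:
        if lowercase:
            # Matches the first block, which lowercases before splitting.
            row = row.lower()
        # Split on whitespace, then trim punctuation from both ends of each word.
        yield [w.strip(string.punctuation) for w in row.split()]
```

Called as `split_words(data, lowercase=True)`, it mirrors the first `else` branch; with the default, it mirrors the second, so the duplicated list comprehension lives in one place.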

dataprofiler/profilers/unstructured_text_profile.py (outdated, resolved)
micdavis previously approved these changes Apr 5, 2024
taylorfturner enabled auto-merge (squash) April 8, 2024 14:43
taylorfturner enabled auto-merge (squash) April 22, 2024 17:12
taylorfturner merged commit 503efa2 into capitalone:feature/polars Apr 22, 2024
5 checks passed
4 participants