Skip to content
This repository has been archived by the owner on Dec 21, 2018. It is now read-only.

[REVIEW] Replace function #106

Open
wants to merge 36 commits into
base: master
Choose a base branch
from
Open

[REVIEW] Replace function #106

wants to merge 36 commits into from

Conversation

gcca
Copy link
Contributor

@gcca gcca commented Aug 22, 2018

API proposal for the replace function.

@GPUtester
Copy link
Collaborator

Can one of the admins verify this patch?

@mike-wendt
Copy link
Contributor

ok to test

@mike-wendt mike-wendt added the 2 - In Progress Currenty a work in progress label Aug 22, 2018
@scopatz
Copy link

scopatz commented Aug 22, 2018

API seems reasonable to me

@gcca gcca changed the title [WIP] Replace function [REVIEW] Replace function Sep 13, 2018
@@ -0,0 +1,110 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to a separate bench folder as we discussed in the binary ops PR review?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And separated into benchmarks and unit tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,143 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add more documentation/comments to the tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsakharnykh Can you approve your review since this is resolved?

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing the function name or at least improving the documentation of what it does. One other minor suggestion.

/// \param[in] values contains the replacement values
///
/// Note that `to_replace` and `values` are related by the index
gdf_error gdf_replace(gdf_column * column,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface is confusing to me. Why does it have column and to_replace parameters? What happens to column? What happens to to_replace? Why not just make a gdf_copy(out_column, in_column)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see. The meaning of "replace" is not clear. It's not a copy. The semantics are actually those of "find_and_replace_all": For each value in to_replace, find all instances of that value in column and replace it with the corresponding value in values. This should be made clear in the header documentation for the function. Consider changing the name to gdf_find_and_replace_all() or something like that...

@@ -0,0 +1,43 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're adding google benchmark in this PR? Shouldn't that be orthogonal to gdf_replace, and therefore a separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the first benchmark I've seen added to libgdf...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since no other PR depends of this, I put google benchmark here. But I could make another PR if it's relevant.

src/replace.cu Outdated
if (NotSameDType(column, to_replace, values)) { return GDF_CUDA_ERROR; }

switch (column->dtype) {
#define WHEN(DTYPE) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like calling this macro "WHEN" -- it's not descriptive. Something like REPLACE_CASE would be clearer. Then you would have

switch (column->type) {
    REPLACE_CASE(INT8);
    REPLACE_CASE(INT16);
    // etc.
}

@mike-wendt mike-wendt added 4 - Needs Rework Additional work is needed migrate to cudf and removed 2 - In Progress Currenty a work in progress labels Oct 24, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
4 - Needs Rework Additional work is needed migrate to cudf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants