-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: DH-18351: Add CumCountWhere()
and RollingCountWhere()
features to UpdateBy
#6566
base: main
Are you sure you want to change the base?
feat: DH-18351: Add CumCountWhere()
and RollingCountWhere()
features to UpdateBy
#6566
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self-review
engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/UpdateByOperatorFactory.java
Outdated
Show resolved
Hide resolved
engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountFilter.java
Outdated
Show resolved
Hide resolved
...ble/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountWhereOperator.java
Show resolved
Hide resolved
...ble/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountWhereOperator.java
Outdated
Show resolved
Hide resolved
@@ -341,6 +341,21 @@ public void testAggCountWhere() { | |||
assertEquals(6L, counts.get(0)); | |||
counts = ColumnVectors.ofLong(doubleCounted, "filter15"); | |||
assertEquals(6L, counts.get(0)); | |||
|
|||
// Get a static set table for use in dynamic where filters (contains 0-3) | |||
final QueryTable setTable = (QueryTable) TableTools.newTable(col("sym", 1, 2, 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed that AggCountWhere didn't test DynamicWhereFilter
in CI, corrected here.
java-client/session/src/test/java/io/deephaven/client/impl/UpdateBySpecBuilderTest.java
Outdated
Show resolved
Hide resolved
@@ -230,6 +230,7 @@ static Count AggCount(String resultColumn) { | |||
* values that pass the supplied {@code filters}. | |||
* | |||
* @param resultColumn The {@link Count#column() output column} name | |||
* @param filters The filters to apply to the input columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected missing param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does code coverage look like?
engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/UpdateByOperator.java
Show resolved
Hide resolved
proto/proto-backplane-grpc/src/main/proto/deephaven_core/proto/table.proto
Outdated
Show resolved
Hide resolved
proto/proto-backplane-grpc/src/main/proto/deephaven_core/proto/table.proto
Outdated
Show resolved
Hide resolved
@@ -665,7 +665,8 @@ def agg_all_by(self, agg: Aggregation, by: Union[str, List[str]]) -> Table: | |||
""" | |||
return super(Table, self).agg_all_by(agg, by) | |||
|
|||
def update_by(self, ops: Union[UpdateByOperation, List[UpdateByOperation]], by: Union[str, List[str]]) -> Table: | |||
def update_by(self, ops: Union[UpdateByOperation, List[UpdateByOperation]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have coverage of the None case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might also need to update the doc string for by
with something like defaults to None, meaning...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpwright yes, tested in test_updateby.py
for both client and server side.
filters (Union[str, Filter, List[str], List[Filter]], optional): the filter condition | ||
expression(s) or Filter object(s) | ||
rev_ticks (int): the look-behind window size (in rows/ticks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default matches formula, but is not listed as a default in the doc.
What ist he intention for = 0; =0 to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not have a default for rev_ticks
, corrected.
For timed windows, have rev==fwd==0
means that the window will contain all rows with exactly matching (to the nanosecond) timestamps. For ticks, rev==fwd==0
is undefined, it means a zero-size window and will probably break in interesting ways depending on the operator.
...ble/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountWhereOperator.java
Show resolved
Hide resolved
...ble/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountWhereOperator.java
Outdated
Show resolved
Hide resolved
TestHelper.assertWhereInt(actualIt, expectedIt, val -> val > 10 && val <= 50); | ||
} | ||
|
||
// Test on String column (representing all Object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having simple tests for Instant and Boolean are worthwhile in my experience, there is stuff that can go wrong with reinterpretation. This is maybe not necessary here though, because we might be covered by where testing. I go back and forth on this, because we do need to create fake tables in some circumstances or could have problems with the ChunkFilter not matching what the aggs read.
Tested Boolean column sources to see if performance is affected by re-interpretation, mostly appears in CumSum:
|
CumCountWhere()
and RollingCountWhere()
features to UpdateBy
CumCountWhere()
and RollingCountWhere()
features to UpdateBy
Groovy Examples
Python Examples
Performance Notes
TL:DR Performance compares very well.
RollingCountWhere()
has near identical performance to the comparison benchmarks (can be faster depending on the complexity of the filter.CumCountWhere()
also compares well toEma()
but can't catch up to zero-keyCumSum()
, which is is remarkably fast.Comparing
CumCountWhere
toCumSum
andEma
:Comparing
RollingCountWhere
toRollingCount
andRollingSum
: