-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[euphoria-core] #259 Hints are not runtime specific. #268
Conversation
5443b02
to
f16d343
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I din't walk though the test, I think that they will need more tuning and adding more generic tests after the core part is done. Now I think that the most important change would be to move the hints from Dataset
to operator.
|
||
/** | ||
* A dataset abstraction. | ||
* | ||
* @param <T> type of elements of this data set | ||
*/ | ||
@Audience(Audience.Type.CLIENT) | ||
public interface Dataset<T> extends Serializable { | ||
public interface Dataset<T> extends OutputHintAware<OutputHint>, Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand purpose of this interface. It looks odd in the context of client API. We should definitely keep this set on operator, not dataset, because operator is not (directly) part of client API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interface removed. Hints are now on operator
@@ -39,9 +42,9 @@ | |||
* @return a dataset representing the output of the given operator | |||
*/ | |||
public static <IN, OUT> Dataset<OUT> createOutputFor( | |||
Flow flow, Dataset<IN> input, Operator<IN, OUT> op) { | |||
Flow flow, Dataset<IN> input, Operator<IN, OUT> op, Set<OutputHint> outputHints) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set output hints on operator, there is no need for additional parameter here. On Beam branch, I already further simplified this method, because the parameter flow
is redundant, because it can be retrieved from both input
and op
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hints are now on operator
import javax.annotation.Nullable; | ||
import java.util.Collection; | ||
import java.util.Collections; | ||
import java.util.Set; | ||
|
||
/** | ||
* {@code PCollection} that is input of a {@code Flow}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is not part of this PR, can we fix the comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in #271
@Override | ||
public Set<OutputHint> getHints() { | ||
return Collections.emptySet(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another +1 for moving hints to operator, this useless method will disappear. Moreover, the hints are called OutputHint
, and Dataset
has no "output". Operator does.
@@ -106,24 +105,13 @@ | |||
* | |||
* @return the dataset representing the new operator's output | |||
*/ | |||
default Dataset<V> outputValues() { | |||
default Dataset<V> outputValues(OutputHint... outputHints) { | |||
return MapElements | |||
.named("extract-values") | |||
.of(output()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hints should be set here I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't, output Dataset is with values. There would hint get lost. (last operator is MapElements). I added test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would the hint get lost? The graph can be arbitrarily complicated and as a general rule, this should yeild the same runtime behavior:
Dataset input = ...;
Dataset<Pair> output = ReduceByKey.of(input)
...
.output(hints);
MapElements.of(input)
.using(Pair::getSecond)
.output()
.persist(...)
it should behave the same way as
Dataset input = ...;
ReduceByKey.of(input)
...
.outputValues(hints)
.persist(...)
That is because non-shuffle operators should forward hints to downstream Datasets (unless overridden). That is due to the construction of the hints. The operator that the hint applies to is really RBK
, not the MapElements
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, for now Hints
doesn't propagate downstream in flow. Because we don't know if Hint
will be still valid in next operator.
Flow flow = input.getFlow(); | ||
FlatMap<IN, OUT> map = new FlatMap<>(name, flow, input, functor, evtTimeFn); | ||
FlatMap<IN, OUT> map = new FlatMap<>(name, flow, input, functor, evtTimeFn, Sets.newHashSet | ||
(outputHints)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening parentheses placement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -426,6 +443,11 @@ public boolean isCombinable() { | |||
return valueExtractor; | |||
} | |||
|
|||
@Override | |||
public Dataset<Pair<KEY, OUT>> output(OutputHint... outputHints) { | |||
return output(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks odd. Why two methods, when the varargs version can handle the zero argument version as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed, because it cannot be reached (we are using Builders instead)
import java.util.Set; | ||
|
||
@Audience(Audience.Type.INTERNAL) | ||
public interface HintAware<HINT extends Hint> { | ||
public interface OutputHintAware<HINT extends OutputHint> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would drop this whole interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
/** | ||
* Extra information for runner about Dataset size | ||
*/ | ||
public enum SizeHint implements OutputHint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use @Audience annotation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
assertTrue(join.listInputs().stream().anyMatch(input -> ((Dataset) input).getHints().contains(new | ||
TestHint()))); | ||
assertTrue(join.listInputs().stream().anyMatch(input -> ((Dataset) input).getHints().contains(new | ||
TestHint2()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines wrap strangely.
0e18e81
to
9482640
Compare
9482640
to
6883329
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I have just few minor comments, good job 👍
} | ||
|
||
@SuppressWarnings("unchecked") | ||
ReduceByKey(String name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would prefer sticking with two constructors
Collections.emptySet()); | ||
} | ||
|
||
ReduceStateByKey(String name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
((Dataset) input).getProducer().getHints().contains(new Util.TestHint()))); | ||
assertTrue(join.listInputs().stream().anyMatch(input -> | ||
((Dataset) input).getProducer().getHints().contains(new Util.TestHint2()))); | ||
assertEquals(2, ((Dataset) join.listInputs().stream().findFirst().get()).getProducer().getHints().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reckon multilining this would be beneficial :)
this.valueExtractor = valueExtractor; | ||
} | ||
|
||
@Override | ||
public DAG<Operator<?, ?>> getBasicOps() { | ||
ReduceByKey<IN, KEY, Long, Long, W> reduceByKey = | ||
new ReduceByKey<>(getName(), input.getFlow(), input, | ||
keyExtractor, valueExtractor, windowing, Sums.ofLongs()); | ||
keyExtractor, valueExtractor, windowing, Sums.ofLongs(), this.getHints()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should stick with the same coding style :) Can we remove this.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
fixed tests [euphoria-core] hints in getBasicOps are only on last operator [euphoria-core] code style corrections [euphoria-core] code style corrections 2
43ae4b2
to
0fa9ff8
Compare
[euphoria-core] #259 Hints are not runtime specific.
#259 first part of unifying hints
implemented
Its breaking change because
withHints
on Join operator no longer exists