Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Current iteration #242

Merged
merged 26 commits into from
Sep 17, 2018
Merged

Current iteration #242

merged 26 commits into from
Sep 17, 2018

Conversation

mvanaken
Copy link
Contributor

@mvanaken mvanaken commented Jul 31, 2018

Implementation of current iteration for Rep, Repn and While loops. The loops are 0-based, meaning that the first iteration of a loop returns 0 when CURRENT_ITERATION is used.
This issue is scoped to only implement referencing to the current iteration. Current iteration of nested loops are supported, but the implementation to reference to a parent loop from within nested loops is handled in a different issue (#240).

TODO based on review comments:

  • Include token check within iterations, to ensure that the closeBranch is done with the same token as the addBranch?
  • Improve last test of CurrentIterationTests, which is that a 0-sized def should not influence the current iteration count.

@mvanaken mvanaken requested review from jvdb and rdvdijk July 31, 2018 21:52
@mvanaken mvanaken added this to the 7.2.0 milestone Jul 31, 2018
@codecov
Copy link

codecov bot commented Jul 31, 2018

Codecov Report

Merging #242 into master will decrease coverage by 0.14%.
The diff coverage is 96.36%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #242      +/-   ##
============================================
- Coverage       100%   99.85%   -0.15%     
- Complexity     1016     1025       +9     
============================================
  Files            90       92       +2     
  Lines          1381     1387       +6     
  Branches        141      141              
============================================
+ Hits           1381     1385       +4     
- Misses            0        1       +1     
- Partials          0        1       +1
Impacted Files Coverage Δ Complexity Δ
...src/main/java/io/parsingdata/metal/token/Post.java 100% <100%> (ø) 10 <2> (ø) ⬇️
.../src/main/java/io/parsingdata/metal/token/Sub.java 100% <100%> (ø) 20 <1> (ø) ⬇️
...src/main/java/io/parsingdata/metal/token/RepN.java 100% <100%> (ø) 11 <4> (-4) ⬇️
.../src/main/java/io/parsingdata/metal/token/Seq.java 100% <100%> (ø) 12 <1> (ø) ⬇️
.../src/main/java/io/parsingdata/metal/token/Pre.java 100% <100%> (ø) 10 <1> (ø) ⬇️
.../src/main/java/io/parsingdata/metal/token/Tie.java 100% <100%> (ø) 16 <1> (ø) ⬇️
...java/io/parsingdata/metal/token/IterableToken.java 100% <100%> (ø) 12 <12> (?)
...rc/main/java/io/parsingdata/metal/token/Token.java 100% <100%> (ø) 15 <1> (+1) ⬆️
.../src/main/java/io/parsingdata/metal/token/Cho.java 100% <100%> (ø) 11 <1> (ø) ⬇️
...rc/main/java/io/parsingdata/metal/token/While.java 100% <100%> (ø) 9 <5> (-5) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f432c33...5442e7a. Read the comment docs.

@mvanaken mvanaken requested a review from ccreeten August 1, 2018 06:48
}

private Trampoline<ParseGraph> findCurrentIterable(final ParseItem item, final ParseGraph iterableCandidate) {
if (!item.isGraph()) { return complete(() -> iterableCandidate); }
Copy link
Collaborator

@ccreeten ccreeten Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if I would do something like rep(def("val", CURRENT_ITERATION)) or rep(tie(any("val"), CURRENT_ITERATION))? Because when I eval the expression, there is no value to stop at? Seems like it would blow up?

Copy link
Contributor Author

@mvanaken mvanaken Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean infinite loop, no, but there will be a null pointer exception, because the head becomes null. Nice catch! Will add item == null to this if statement and add tests.

Copy link
Contributor Author

@mvanaken mvanaken Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see your point now. Because a def of size 0 is not added to the parseGraph. However, then the iterable is in fact empty, which will return an empty value and the parsing will fail, see line 50. So it will not blow up.

It is a nice discussion though what you would expect to be "an iteration". Should a parsing attempt, where eventually 0 bytes are parsed, be considered an iteration? Also, would we expect rep(def("val", add(CURRENT_ITERATION, con(1)))) to work?

Copy link
Collaborator

@ccreeten ccreeten Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if I read the code as is, I think your first comment does happen (I indeed meant something like a NPE)? Because there is no value yet when you eval CURRENT_ITERATION, and the only way the recursion stops is if it encounters a value? Could you add these samples as a test?

As to the discussion, I would probably expect your example to work, because we are in fact in an iteration? I also think conceptually a 0 bytes parsed should still be considered an iteration, as I feel it is the iteration that happens, not what happens inside the iteration (wow, confusing sentence 😄). What would be the downsides of this, or what problems would it pose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented CURRENT_INDEX differently. Instead of extracting it from ParseGraph, it is kept in the ParseState. I added tests for examples similar to rep(def("val", add(CURRENT_ITERATION, con(1)))).

I will need to improve the code a little before I commit.


/**
* A {@link ValueExpression} that represents the current iteration in an
* iterable {@link io.parsingdata.metal.token.Token} (e.g. when inside a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps import the class names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -0,0 +1,84 @@
/*
* Copyright 2013-2016 Netherlands Forensic Institute
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return ImmutableList.create(Optional.of(createFromNumeric(currentIteration, new Encoding())));
}

private Trampoline<ParseGraph> findCurrentIterable(final ParseItem item, final ParseGraph iterableCandidate) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like some inline comments or javadoc stating what these things do 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not applicable anymore. ;) These methods are removed, because of the new implementation.


@Override
public ImmutableList<Optional<Value>> eval(final ParseState parseState, final Encoding encoding) {
final ParseGraph currentIterable = findCurrentIterable(parseState.order, ParseGraph.EMPTY).computeResult();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool way to calculate the iteration by the way 😄

…within ParseState, instead of extracting it out of the ParseGraph. The iteration is now independent on the parse result.
Copy link
Contributor

@rdvdijk rdvdijk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me! Please see my comments.

Also: I think you need to cover one more branch somewhere to get coverage back to 100%.

.gitignore Outdated
@@ -1,5 +1,6 @@
# Maven
target/
out/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, where did this come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good question! Apparently, IntelliJ has created this directory during build. Unclear why though. Wasn't planning to commit this. Will remove it.

if (parseState.iterations.head == null) {
return ImmutableList.create(Optional.empty());
}
return ImmutableList.create(Optional.of(createFromNumeric(parseState.iterations.head, new Encoding())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default new Encoding() is probably good enough here. For a minute I thought we needed to pass the encoding on, but I think this is correct.

Side note: Don't we have a DEFAULT_ENCODING available somewhere? I wonder how often we do new Encoding() like this.

Copy link
Contributor Author

@mvanaken mvanaken Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is not a default encoding as constant available. We do this 6 times in core; 4 in Shorthand, 1 in CurrentOffset, and 1 in CurrentIteration. In test scope we call it 7 times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point: #245

return Arrays.asList(new Object[][] {
{ "[0, 1, 2, 3, 255] rep(CURRENT_ITERATION), def(255)", seq(rep(VALUE_EQ_ITERATION), VALUE_EQ_255), stream(0, 1, 2, 3, 255), enc(), true },
{ "[0, 1, 2, 3] repn=4(CURRENT_ITERATION)", repn(VALUE_EQ_ITERATION, con(4)), stream(0, 1, 2, 3), enc(), true },
{ "[255, 0, 1, 2, 3, 255] def(255), while<3(CURRENT_ITERATION), def (255)", seq(VALUE_EQ_255, whl(VALUE_EQ_ITERATION, not(eq(con(3)))), VALUE_EQ_255), stream(255, 0, 1, 2, 3, 255), enc(), true },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: the description of the test should actually be while!=3 instead of while<3 right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. I will adjust it.

{ "[0, 1] seq(!CURRENT_ITERATION, ...)", seq(VALUE_NOT_EQ_ITERATION, VALUE_NOT_EQ_ITERATION), stream(0, 1), enc(), true },
{ "[0] !CURRENT_ITERATION", VALUE_NOT_EQ_ITERATION, stream(0), enc(), true },
{ "[0 | 0, 1 | 0, 0, 2 | 0, 0, 0, 3] rep(CURRENT_ITERATION)", rep(def("value", add(CURRENT_ITERATION, con(1)), eqNum(CURRENT_ITERATION))), stream(0, 0, 1, 0, 0, 2, 0, 0, 0, 3), enc(), true },
{ "[1, 1, 0, 1 | 0 | 2 | 3] repn=4(size), rep(def(any, sizeRef(CURRENT_INDEX)))", seq(repn(def("size", 1), con(4)), rep(def("value", nth(ref("size"), CURRENT_ITERATION)))), stream(1, 1, 0, 1, 0, 2, 3), enc(), true },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to read this last case over a few times until I understood what is going on here. Would it be possible to add a separate unit test to make absolutely sure that this works as intended? The expression nth(ref("size"), CURRENT_ITERATION) will result in three times a 1 and one time a 0. But did that happen in the expected order, for example?

Copy link
Contributor Author

@mvanaken mvanaken Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes, it happed in the expected order, which is why I omitted the 1, but I do understand why it is confusing. I wanted to add a test that resulted in a 0-size def, which does not result into an item within the parseGraph, but does count as a iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know when you've added that test 👍

Copy link
Contributor

@jvdb jvdb Sep 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've improved the above test to verify this behaviour. It now does the following:

  1. Iterate over a list of sizes (repn=4, read in the values (1, 1, 0, 1)).
  2. Iterate again for 4 times (repn=4), this time only parsing a value if not(eq(nth(ref("size"), CURRENT_ITERATION), so if the CURRENT_ITERATIONth value in the list is non-zero (this happens in iterations 0, 1 and 3.
  3. Then parse a value of 1 byte that must be equal to CURRENT_ITERATION.

The fact that the parsed and value-checked values are 0, 1 and 3 verifies that CURRENT_ITERATION continues counting even if an iteration does not produce a value (when is implemented with a zero-size def inside a cho).

I agree with the original comment that the test is a little complex, maybe now even more so. We should probably simplify it, but for now I think verifying behaviour is most important.

@@ -36,43 +37,55 @@
public final ParseGraph order;
public final BigInteger offset;
public final Source source;
public final ImmutableList<BigInteger> iterations;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that now could happen is that closeBranch(Token) is called with a Token that was not the one used for addBranch(Token). This would of course be a bug, but since these methods are public, we might need to guard for it. We could fix that by keeping track which token is used for the iteration. E.g. something like:

public final ImmutableList<Pair<Token, BigInteger>> iterations;

Whenever we call closeBranch(Token), we can verify that the Token of the current iteration is indeed the one being passed. We could even pass Token to iter() and be sure that we are iterating the expected Token (but that might be overkill).

(Note: We have a Pair class in Selection, but it is not generic.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! I'm usually not a fan of using a Pair class, but I cannot think of an alternative, other then creating a new class, which would most likely be implemented as the Pair class. ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it in this case be an identity comparison instead of a (semantic) equality?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a good point... If we want to really make sure we are using the same object, instead of just an equal one.

}

public static ParseState createFromByteStream(final ByteStream input) {
return createFromByteStream(input, ZERO);
}

public ParseState addBranch(final Token token) {
return new ParseState(order.addBranch(token), source, offset);
if (token.isIterable()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a oneliner... Does @jvdb approve of this heresy? 🤣

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could, but should it? ;) The code now clearly shows the difference between the two states, which will not be as clear if it is a oneliner. @jvdb will approve this, I'm sure! :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be entirely honest, seeing multiple identical addBranch()-invocations does hurt a little... But I'm not religious about this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so you both would prefer:

return new ParseState(order.addBranch(token), source, offset, token.isIterable() ?
 iterations.add(ZERO) : iterations);

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that yes 👍

Copy link
Contributor

@jvdb jvdb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little worried when looking at the top of the diff and saw the addition of a list to the ParseState. But it does make the implementation of CurrentIteration so much simpler (and more correct in corner cases) that it's absolutely worth the added data (which is going to be negligible in terms of memory usage/speed anyway).

I think @rdvdijk's point about checking that the correct Token is closed is a valid point though, so please let's have a look at doing that.

Furthermore, a very nice optimization of the iterable tokens by using a base class. The amount of red in the diff of the affected classes says enough!

I do hope however that we can fix the BetterCodeHub score though, so let's look at that as well.

In short: excellent work!!

this.order = checkNotNull(order, "order");
this.source = checkNotNull(source, "source");
this.offset = checkNotNegative(offset, "offset");
this.iterations = checkNotNull(iterations, "iteration");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, missing s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

return new ParseState(order.add(parseReference), source, offset, iterations);
}

public ParseState iter() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterate/addIteration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, iterate it is.

/**
* A {@link ValueExpression} that represents the 0-based current iteration in an
* iterable {@link Token} (e.g. when inside a {@link Rep} or {@link RepN}).
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also mention it can be checked if a Token is iterable, using a method reference link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.


public final Token token;

IterableToken(String name, final Token token, Encoding encoding) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing final.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

this.token = checkNotNull(token, "token");
}

protected final Optional<ParseState> parse(final Environment environment, final Function<Environment, Boolean> stopCondition, final Function<Environment, Optional<ParseState>> ifIterationFails) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function<Env, Boolean> = Predicate<Env> :) (unless you would want it to return a null)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tnx!

.map(nextParseState -> intermediate(() -> iterate(environment.withParseState(nextParseState), count - 1)))
.orElseGet(() -> complete(Util::failure));
final BigInteger count = counts.head.get().asNumeric();
return parse(environment, env -> env.parseState.iterations.head.compareTo(count) >= 0, env -> failure());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a method like ParseState#currentIteration = iterations.head?

return complete(() -> success(environment.parseState.closeBranch(this)));
}
return token
.parse(environment)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My head is a bit foggy, but isn't this an unsafe recursive call? If you would just nest iterable tokens which do not stop at the first iteration?

@ccreeten
Copy link
Collaborator

The updated current iteration implementation is very nicely done by the way 😄

{ "[0, 1] seq(!CURRENT_ITERATION, ...)", seq(VALUE_NOT_EQ_ITERATION, VALUE_NOT_EQ_ITERATION), stream(0, 1), enc(), true },
{ "[0] !CURRENT_ITERATION", VALUE_NOT_EQ_ITERATION, stream(0), enc(), true },
{ "[0 | 0, 1 | 0, 0, 2 | 0, 0, 0, 3] rep(CURRENT_ITERATION)", rep(def("value", add(CURRENT_ITERATION, con(1)), eqNum(CURRENT_ITERATION))), stream(0, 0, 1, 0, 0, 2, 0, 0, 0, 3), enc(), true },
{ "[1, 1, 0, 1 | 0 | 2 | 3] repn=4(size), rep(def(any, sizeRef(CURRENT_INDEX)))", seq(repn(def("size", 1), con(4)), rep(def("value", nth(ref("size"), CURRENT_ITERATION)))), stream(1, 1, 0, 1, 0, 2, 3), enc(), true },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me know when you've added that test 👍

@jvdb jvdb merged commit 762b9f1 into master Sep 17, 2018
@jvdb jvdb deleted the current-index branch September 17, 2018 12:58
@jvdb jvdb removed this from the 8.0.0 milestone Sep 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants