Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement option to start traversals at a path #358

Merged
merged 3 commits into from
Mar 6, 2022

Conversation

hannahhoward
Copy link
Collaborator

@hannahhoward hannahhoward commented Feb 12, 2022

Goals

This is an alternative way to "resume selector traversals" - similar to #354 - but tailored to different needs.

#354 is ideal for someone writing a CAR file who needs to stop on a block boundary and wants to retain information about blocks traversed so they can resume writing at a point.

This is ideal for someone who wants to maintain as little state information as possible and simply execute a selector as if it started at a specific path -- even the visits will not get called before the path. It's deal for a scenario like graphsync resumption -- where transmitting a list of "do not send these blocks" has proved quite problematic in the past (cause of the network overhead on large DAGs)

Implementation

It's really simple: you can now set a StartAtPath on the TraversalConfig

The progress keeps an additional piece of state of whether you're "past the startup path" at which point you can just not worry about checking

The code is a wee bit imperative and tricky, but it's tested and works.

For Discussion

Why not do it in a selector itself? Selectors are stateless and I don't believe you can execute this in a selector as a result.

@warpfork
Copy link
Collaborator

I think my main concern/ask for either this or #354 isn't even so much the code at this point -- it's: can we get a serial fixture to appear in ipld/ipld with it?

I realize some of this may feel like it's golang specific APIs, but I doubt it's going to stay that way -- other languages will need to replicate the behavior at some point, and we might as well be ready for them with fixtures. Also, personally, I'm finding serial fixtures a lot easier to review for this stuff.

There's a spec_test.go file in the same directory that should provide an example, although it may need some extending to test a new feature like this.

Copy link
Collaborator

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understood this, and it does what it does with very minimal added state, so I give it a tentative +1.

I'd still be much happier if we got purely serial fixtures for this, though. If I look at these tests and I'm not a go programmer, I don't think I could easily reverse engineer what they mean.

traversal/fns.go Outdated
@@ -37,15 +37,17 @@ type Progress struct {
Path datamodel.Path
Link datamodel.Link
}
Budget *Budget // If present, tracks "budgets" for how many more steps we're willing to take before we should halt.
SeenLinks map[datamodel.Link]struct{} // Set used to remember which links have been visited before, if Cfg.LinkVisitOnlyOnce is true.
PastStartAtPath bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like a comment here about what this state means.

traversal/fns.go Outdated Show resolved Hide resolved
Co-authored-by: Eric Myhre <[email protected]>
@hannahhoward
Copy link
Collaborator Author

hannahhoward commented Feb 17, 2022

re: serial fixtures

I think I could certainly implement a serial fixture test BUT I'm hesitant to do so.

I read over our selector spec and it actually does NOT outline our assumptions about execution and ordering: namely serial, depth first traversal. This strikes me as a more significant issue if we want to put any kind of "start at path" into the spec -- how can I know what skipping to a path means when I haven't defined the order of the nodes visited?

The current fixtures are technically valid with the lack of ordering for execution in the spect cause the header simply reads "expected visit events" (while the actual go test looks at ordering, the language-ing of "expected visit events" does not explicitly mean 'in this order').

Anyway, this seems like spec work that should be done -- especially since GraphSync's DDOS protections rely on predictable ordering. But it feels outside the auspices of this PR to do it.

@hannahhoward
Copy link
Collaborator Author

I believe this is ready to merge pending discussion of spec tests.

@warpfork
Copy link
Collaborator

Alright, SGTM.

FWIW, we do have several selector fixtures which touch on ordering. But it may well be true that it's not highlighted as much as it should be in other parts of the spec.

I'll quit pushing on spec improvements for here for now.

@warpfork
Copy link
Collaborator

warpfork commented Mar 1, 2022

To clarify: this is effectively mergable.

If we use this as a dry run of the recently proposed contribution and merge policy:

  • I already +1'd this. (I'm declining to give it a +2 because I reviewed it only briefly, and because I'm not thrilled about the lack of language-agnostic fixtures that are easy to review.)
  • You can self+1 this, which gives a net +2, and that's good enough to merge it.
  • Or you can poke someone else (perhaps @willscott ; I know he's already looked at it) to give it a +1, gives a net +2, and that's good enough to merge it.

I'm not clicking merge, because at the current count, it's got a net +1 (which is from me; the author can do a self+1, but it's not the default). But you have two routes for getting it to +2, and when it does so, you have the merge button available to click.

@willscott
Copy link
Member

(have been bugged. looks reasonable to me.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants