-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement option to start traversals at a path #358
Conversation
I think my main concern/ask for either this or #354 isn't even so much the code at this point -- it's: can we get a serial fixture to appear in ipld/ipld with it? I realize some of this may feel like it's golang specific APIs, but I doubt it's going to stay that way -- other languages will need to replicate the behavior at some point, and we might as well be ready for them with fixtures. Also, personally, I'm finding serial fixtures a lot easier to review for this stuff. There's a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understood this, and it does what it does with very minimal added state, so I give it a tentative +1.
I'd still be much happier if we got purely serial fixtures for this, though. If I look at these tests and I'm not a go programmer, I don't think I could easily reverse engineer what they mean.
traversal/fns.go
Outdated
@@ -37,15 +37,17 @@ type Progress struct { | |||
Path datamodel.Path | |||
Link datamodel.Link | |||
} | |||
Budget *Budget // If present, tracks "budgets" for how many more steps we're willing to take before we should halt. | |||
SeenLinks map[datamodel.Link]struct{} // Set used to remember which links have been visited before, if Cfg.LinkVisitOnlyOnce is true. | |||
PastStartAtPath bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like a comment here about what this state means.
Co-authored-by: Eric Myhre <[email protected]>
re: serial fixtures I think I could certainly implement a serial fixture test BUT I'm hesitant to do so. I read over our selector spec and it actually does NOT outline our assumptions about execution and ordering: namely serial, depth first traversal. This strikes me as a more significant issue if we want to put any kind of "start at path" into the spec -- how can I know what skipping to a path means when I haven't defined the order of the nodes visited? The current fixtures are technically valid with the lack of ordering for execution in the spect cause the header simply reads "expected visit events" (while the actual go test looks at ordering, the language-ing of "expected visit events" does not explicitly mean 'in this order'). Anyway, this seems like spec work that should be done -- especially since GraphSync's DDOS protections rely on predictable ordering. But it feels outside the auspices of this PR to do it. |
I believe this is ready to merge pending discussion of spec tests. |
Alright, SGTM. FWIW, we do have several selector fixtures which touch on ordering. But it may well be true that it's not highlighted as much as it should be in other parts of the spec. I'll quit pushing on spec improvements for here for now. |
To clarify: this is effectively mergable. If we use this as a dry run of the recently proposed contribution and merge policy:
I'm not clicking merge, because at the current count, it's got a net +1 (which is from me; the author can do a self+1, but it's not the default). But you have two routes for getting it to +2, and when it does so, you have the merge button available to click. |
(have been bugged. looks reasonable to me.) |
Goals
This is an alternative way to "resume selector traversals" - similar to #354 - but tailored to different needs.
#354 is ideal for someone writing a CAR file who needs to stop on a block boundary and wants to retain information about blocks traversed so they can resume writing at a point.
This is ideal for someone who wants to maintain as little state information as possible and simply execute a selector as if it started at a specific path -- even the visits will not get called before the path. It's deal for a scenario like graphsync resumption -- where transmitting a list of "do not send these blocks" has proved quite problematic in the past (cause of the network overhead on large DAGs)
Implementation
It's really simple: you can now set a StartAtPath on the TraversalConfig
The progress keeps an additional piece of state of whether you're "past the startup path" at which point you can just not worry about checking
The code is a wee bit imperative and tricky, but it's tested and works.
For Discussion
Why not do it in a selector itself? Selectors are stateless and I don't believe you can execute this in a selector as a result.