-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support shard-based routing #89
Conversation
* setup two shard peer nodes * set new .env for sharded nodes * add intervalmap data structure * add parsing of shard env variables
* scaffold new Proxies to handle shards * handle shard routing * update PruningOrDefaultProxies logs to Trace
also, prevent uint64 conversion underflows by routing any other block tag that is encoded to a negative number to the default proxy
// NewIntervalURLMap creates a new IntervalMap from a map of interval endpoint => url. | ||
// The intervals are exclusive of their endpoint. | ||
// ie. if the lowest value endpoint in the map is 10, the interval is for all numbers 1 through 9. | ||
func NewIntervalURLMap(urlByEndHeight map[uint64]*url.URL) IntervalURLMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to enforce line 9 comments of // The intervals are defined by their endpoints and must not overlap.
might be handy to validate the map is discontinuous and return an error in this new function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i don't follow. the intervals are only defined by one point (the endpoint), so the only way you couldn't create a valid interval is if you set the same endpoint to two different URLs.
are you suggesting i panic if they set the same endpoint multiple times for the same host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks though! drawing your attention to this made me realize i was still on the branch that had exclusive endpoints. because of some simplifications i made to the shard command, it was actually easier to make them inclusive of the end block. just updated the code & tests to reflect that change (that i already had in the docs 😅)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically I'm saying there are properties we are expecting this type to have that will effect how the service functions and is configured by hands by human, those two things in conjunction make me think it's worth adding some validation that the parsed config is not only syntactically correct but semantically correct - whether that be inclusive / exclusive or unique (fwiw I was referring to testing the inclusive property in terms that the supplied interval map doesn't have any gaps, e.g. config for a shard for blocks 10-50, and for shard 100-latest since this is something configured by a human operator and human operators err but the case of duplicate endpoints is also a good edge case to validate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha. at this point the value has already been parsed into a valid map. the encoding is such that you can't define non-continuous ranges of shards (only endblocks are defined. the first shard starts at height 1, and then each shard covers from the previous shard's endBlock + 1 to its defined end block).
i've added some more validations to the parsing of the env value to hopefully catch misunderstandings or misconfigurations:
- error if a host's shards are not ordered by end block (implies a misunderstanding in how the shard block ranges are defined)
- error if a host has multiple shards defined for the same end block (prevents quiet overrides of the first defined value(s))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽 so far, will review again when docs added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, one request for additional validation around shard interval config
parsing will error in cases of: * unordered shards * multiple shards for same endblock
Shard based routing allows multiple backends for a single host. This extends the rudimentary sharding implemented by height-based routing (pruning vs default node routing) to support routing to nodes that have fixed sets of heights in their data.
Previously, we supported backing an endpoint with pruning & archive nodes:
This PR allows breaking "everything else" down further. For example, if there are two shards each with a million blocks of data:
This is encoded in the new
PROXY_SHARD_BACKEND_HOST_URL_MAP
env variable as"Active" terminology is used to indicate that although a full-archive node would suffice, the node could be pruned of the first 2M blocks because that requests for those data are fielded by the shards.
Shards are ordered & are defined by their end blocks. It is assumed all blocks from 1 to the last shard's end block are contained within the collection of shards.
Opening PR as draft until i have the documentation updated.