[RFC] Implement "related events" API. #238

Lukasa · 2016-06-10T12:37:42Z

This is a potential solution to #234.

The problem we're trying to address here is twofold. Firstly, users like mitmproxy (represented by @Kriechi and @mhils) have a desire to reproduce frames as accurately as possible when they proxy the connection. This means that it's important to them to tell the difference between the two frame sequence of frames that goes [HEADERS(END_HEADERS), PRIORITY] and the one frame one that goes [HEADERS(PRIORITY, END_HEADERS)].

Secondly, we want to begin to introduce to hyper-h2 this notion of events happening "simultaneously": that is, where multiple state transitions happen in such a way that they all occur at the same time, meaning that the user cannot reliably determine the allowed actions unless they have processed all the events that were emitted at that time. Mostly this is a pretty benign issue, but it would be nice to lazily consume events rather than emit them all at once (see #228 for more on this).

That makes this change both an attempt to address the first point cleanly and a potential precursor to the second. In essence, you can think of this change as something of a "beta" API for one of the proposed changes in #228.

In this specific change, events that can semantically occur simultaneously with each other now carry a related_events property. This property is a frozenset (immutability is good!) of all the events that occurred simultaneously.

This PR is a RFC because I'm not 100% sure I'm happy with the API as it stands. In particular, there are a few warts that I'd like feedback on.

Only some events have related_events fields. This is done primarily for clarity (to suggest that only some events can be affected by simultaneity), but it potentially means that code that works with the events gets pretty ugly, because sometimes accessing event.related_events will raise an AttributeError, based only on type(event). So we may want to add related_events to all events.
related_events contains a frozenset of all the events that occurred at once, meaning that event in event.related_events is always True. That's simple and requires the least work in the lower-level library (which is good, because this is a hot section of code), but it leads to the possibility of surprises if the event is handled incautiously (e.g. if you process event and then process event.related_events you may process event twice). So we may want to create different frozensets for each event.

Anyway, I'd like people's feedback on this. @python-hyper/contributors, can I get your input here?

Kriechi · 2016-06-10T13:38:17Z

Thanks for taking a swing at this! Really appreciated! 👍

related_events contains a frozenset of all the events that occurred at once

Why did you choose this behaviour? Sounds strange to me - on first thought...
Not sure why I would want to the the event again while looping through related_events

I have my API-user hat on at the moment:
If I get a RequestReceived event, I have to loop through related_events and call isinstance each time to check if I might have gotten a StreamEnded event. That's kinda annoying - unless you have a clean idea on how this API is intended to be used?

Since you already have a very nice and comprehensive definition of which events can be related with each other (in your test comments) - how about baking this directly into the events?
E.g. A RequestReceived event could have a stream_endend = True/False and priority_weight, ... attribute.

Or maybe I just have a very simplistic view for a general purpose API design? 😄

Lukasa · 2016-06-10T14:35:02Z

Since you already have a very nice and comprehensive definition of which events can be related with each other (in your test comments) - how about baking this directly into the events?

Yeah, so this is a thing I've been thinking about. What I'm most worried about is the question of aliasing. Specifically, if we emit DataReceived with stream_ended=True, should we then emit StreamEnded? If not, users are exposed to a subtle bug wherein they may not notice state changes. If we do, then users need to be alert for the possibility that they may receive multiple instances of redundant information.

This isn't so bad with StreamEnded, but it's potentially somewhat more problematic with PriorityUpdated, which could cause incautious users to process the same priority update twice, churning a whole chunk of state for no good reason.

One possibility is to have events have specific flags for their possible related events (e.g. a DataReceived event could have a stream_ended_event property that is optionally filled with the event that will follow DataReceived). This has all the same issues as above, but you can at least keep a temporary set around and check whether you've processed an event already.

Otherwise, I can't see a system based on "aggregating" events that isn't awkward. Frankly, the least awkward mode here might be to hang the triggering frame off any event fired by that frame. That at least gives mitmproxy what it needs to rebuild equivalent frames, though we again need to worry about the wacky internal gymnastics h2 performs with regard to HEADERS frames.

Kriechi · 2016-06-11T12:17:22Z

events have specific flags for their possible related events (e.g. a DataReceived event could have a stream_ended_event property that is optionally filled

Sounds good!

I'm not sure how other people are receiving data and passing it to h2, but in mitmproxy we only read one frame at a time, and process it. This makes dealing with events much simpler, because it keeps the dependencies between streams limited.

Lukasa · 2016-06-11T13:24:29Z

How are you ensuring you read only one frame?

Kriechi · 2016-06-11T13:37:37Z

We are parsing the frame header and then read the exact length of the frame from the socket.
https://github.com/mitmproxy/mitmproxy/blob/c801f81373d5935f60bf950b1f266911a94ecf60/netlib/http/http2/framereader.py#L6

Lukasa · 2016-06-12T11:04:42Z

Ok, that makes sense. I think that's a relatively unusual approach: most implementations will just feed bytes directly into the state machine without checking where frame boundaries are. Similarly, however, most implementations don't really care about simultaneity except as it might cause unexpected ProtocolErrors to show up.

So I remain a bit unsure about how best to move this forward, but we can also prototype the interface based on set properties and see how the implementation there looks.

Lukasa · 2016-06-28T09:29:10Z

Alright, we decided to go with #240 instead.

Lukasa added 2 commits June 10, 2016 13:17

Add related_events property to some events.

c5d6002

Store related events.

8adcd12

Lukasa mentioned this pull request Jun 14, 2016

[RFC] Alternative approach to "related events". #240

Merged

Lukasa closed this Jun 28, 2016

Lukasa deleted the related-events branch June 28, 2016 09:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Implement "related events" API. #238

[RFC] Implement "related events" API. #238

Lukasa commented Jun 10, 2016 •

edited by Kriechi

Loading

Kriechi commented Jun 10, 2016

Lukasa commented Jun 10, 2016

Kriechi commented Jun 11, 2016

Lukasa commented Jun 11, 2016

Kriechi commented Jun 11, 2016

Lukasa commented Jun 12, 2016

Lukasa commented Jun 28, 2016

[RFC] Implement "related events" API. #238

[RFC] Implement "related events" API. #238

Conversation

Lukasa commented Jun 10, 2016 • edited by Kriechi Loading

Kriechi commented Jun 10, 2016

Lukasa commented Jun 10, 2016

Kriechi commented Jun 11, 2016

Lukasa commented Jun 11, 2016

Kriechi commented Jun 11, 2016

Lukasa commented Jun 12, 2016

Lukasa commented Jun 28, 2016

Lukasa commented Jun 10, 2016 •

edited by Kriechi

Loading