Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Implement "related events" API. #238

Closed
wants to merge 2 commits into from
Closed

Conversation

Lukasa
Copy link
Member

@Lukasa Lukasa commented Jun 10, 2016

This is a potential solution to #234.

The problem we're trying to address here is twofold. Firstly, users like mitmproxy (represented by @Kriechi and @mhils) have a desire to reproduce frames as accurately as possible when they proxy the connection. This means that it's important to them to tell the difference between the two frame sequence of frames that goes [HEADERS(END_HEADERS), PRIORITY] and the one frame one that goes [HEADERS(PRIORITY, END_HEADERS)].

Secondly, we want to begin to introduce to hyper-h2 this notion of events happening "simultaneously": that is, where multiple state transitions happen in such a way that they all occur at the same time, meaning that the user cannot reliably determine the allowed actions unless they have processed all the events that were emitted at that time. Mostly this is a pretty benign issue, but it would be nice to lazily consume events rather than emit them all at once (see #228 for more on this).

That makes this change both an attempt to address the first point cleanly and a potential precursor to the second. In essence, you can think of this change as something of a "beta" API for one of the proposed changes in #228.

In this specific change, events that can semantically occur simultaneously with each other now carry a related_events property. This property is a frozenset (immutability is good!) of all the events that occurred simultaneously.

This PR is a RFC because I'm not 100% sure I'm happy with the API as it stands. In particular, there are a few warts that I'd like feedback on.

  1. Only some events have related_events fields. This is done primarily for clarity (to suggest that only some events can be affected by simultaneity), but it potentially means that code that works with the events gets pretty ugly, because sometimes accessing event.related_events will raise an AttributeError, based only on type(event). So we may want to add related_events to all events.
  2. related_events contains a frozenset of all the events that occurred at once, meaning that event in event.related_events is always True. That's simple and requires the least work in the lower-level library (which is good, because this is a hot section of code), but it leads to the possibility of surprises if the event is handled incautiously (e.g. if you process event and then process event.related_events you may process event twice). So we may want to create different frozensets for each event.

Anyway, I'd like people's feedback on this. @python-hyper/contributors, can I get your input here?

@Kriechi
Copy link
Member

Kriechi commented Jun 10, 2016

Thanks for taking a swing at this! Really appreciated! 👍

related_events contains a frozenset of all the events that occurred at once

Why did you choose this behaviour? Sounds strange to me - on first thought...
Not sure why I would want to the the event again while looping through related_events

I have my API-user hat on at the moment:
If I get a RequestReceived event, I have to loop through related_events and call isinstance each time to check if I might have gotten a StreamEnded event. That's kinda annoying - unless you have a clean idea on how this API is intended to be used?

Since you already have a very nice and comprehensive definition of which events can be related with each other (in your test comments) - how about baking this directly into the events?
E.g. A RequestReceived event could have a stream_endend = True/False and priority_weight, ... attribute.

Or maybe I just have a very simplistic view for a general purpose API design? 😄

@Lukasa
Copy link
Member Author

Lukasa commented Jun 10, 2016

Since you already have a very nice and comprehensive definition of which events can be related with each other (in your test comments) - how about baking this directly into the events?

Yeah, so this is a thing I've been thinking about. What I'm most worried about is the question of aliasing. Specifically, if we emit DataReceived with stream_ended=True, should we then emit StreamEnded? If not, users are exposed to a subtle bug wherein they may not notice state changes. If we do, then users need to be alert for the possibility that they may receive multiple instances of redundant information.

This isn't so bad with StreamEnded, but it's potentially somewhat more problematic with PriorityUpdated, which could cause incautious users to process the same priority update twice, churning a whole chunk of state for no good reason.

One possibility is to have events have specific flags for their possible related events (e.g. a DataReceived event could have a stream_ended_event property that is optionally filled with the event that will follow DataReceived). This has all the same issues as above, but you can at least keep a temporary set around and check whether you've processed an event already.

Otherwise, I can't see a system based on "aggregating" events that isn't awkward. Frankly, the least awkward mode here might be to hang the triggering frame off any event fired by that frame. That at least gives mitmproxy what it needs to rebuild equivalent frames, though we again need to worry about the wacky internal gymnastics h2 performs with regard to HEADERS frames.

@Kriechi
Copy link
Member

Kriechi commented Jun 11, 2016

events have specific flags for their possible related events (e.g. a DataReceived event could have a stream_ended_event property that is optionally filled

Sounds good!

I'm not sure how other people are receiving data and passing it to h2, but in mitmproxy we only read one frame at a time, and process it. This makes dealing with events much simpler, because it keeps the dependencies between streams limited.

@Lukasa
Copy link
Member Author

Lukasa commented Jun 11, 2016

How are you ensuring you read only one frame?

@Kriechi
Copy link
Member

Kriechi commented Jun 11, 2016

We are parsing the frame header and then read the exact length of the frame from the socket.
https://github.com/mitmproxy/mitmproxy/blob/c801f81373d5935f60bf950b1f266911a94ecf60/netlib/http/http2/framereader.py#L6

@Lukasa
Copy link
Member Author

Lukasa commented Jun 12, 2016

Ok, that makes sense. I think that's a relatively unusual approach: most implementations will just feed bytes directly into the state machine without checking where frame boundaries are. Similarly, however, most implementations don't really care about simultaneity except as it might cause unexpected ProtocolErrors to show up.

So I remain a bit unsure about how best to move this forward, but we can also prototype the interface based on set properties and see how the implementation there looks.

@Lukasa
Copy link
Member Author

Lukasa commented Jun 28, 2016

Alright, we decided to go with #240 instead.

@Lukasa Lukasa closed this Jun 28, 2016
@Lukasa Lukasa deleted the related-events branch June 28, 2016 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants