-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v.next architectural considerations #2
Comments
Thanks for this, @BertKleewein! Regarding:
Keep in mind that you can mix different QoS levels with the same MQTT connection for different operations. It's hard for me to imagine how we'd handle this while separating out a QoS 2 specific module. |
Another discussion today. Some notes on important things: What happens when connection drops (socket error or no pingresp)?We don’t want to automatically reconnect (at the level of the If we take this as a given, there are two things to consider about handling dropped connections.
Session state may be different than
|
I think we can start the sequence all over again by using the
Maybe I'm missing something here, but in case i., the user was presented with an error. At that point, the user is aware and has the agency to try it again. So would we need to keep it in the offline store? |
Some thoughts on packet sequencingWe believe that we are going to use the "packet sequence" model to inform our client implementation. Here's the PR which shows off the basic idea. Shout out to @BertKleewein for driving this. TL;DR: We keep a container of state machines, where each state machine represents an in-flight MQTT operation. When a packet comes in or goes out, it will trigger a state transition for its corresponding operation. This allows us to make sense of all the packets going in and out and keep the code paths for different operations isolated. Client-initiated sequencesSome operations are initiated by the client (what I'm calling client-initiated sequences), while others are initiated by the server (server-initiated sequences). This post mainly concerns client-initiated sequences. My thinking is that there are six control packet types that can kick off a client-initiated sequence:
Three main types of client-initiated sequencesI believe we can break up client-initiated sequences into three major cases. I will call them fire-and-forget, non-overlapping, and overlapping sequences. Fire-and-forgetFire-and-forget sequences are sequences where the client sends a single packet and doesn't care about any other packet with respect to that operation. In other words, they are singleton packet sequences. There are two fire-and-forget sequences:
These operations are simple enough to where we may not even need to store the state machine in the container. Non-overlappingNon-overlapping sequences are sequences where no other sequence of the same initial control packet type can be in-flight at the same time:
Since these sequences are non-overlapping, we don't need to use unique numerical keys to store them, we can just use the name of the control packet type as the key instead. OverlappingOverlapping sequences are sequences where multiple of them are allowed to be in-flight at the same time. As such, they need unique identifiers, and the Packet ID of the initial packet in the sequence can serve this purpose. In other words, a client-initiated sequence is overlapping if and only if its initial packet has a packet ID:
|
@vishnureddy17 your comment all looks good. The only consideration is I'm trying to understand how Will Messages during Disconnect affect the sequencing of packets... I don't think it will contradict what you are currently saying tho. |
@vishnureddy17, another non-overlapping sequence on MQTT5 is for AUTH exchanges when re-authenticating. From what I can tell, this is like a CONNECT exchange, except without the CONNECT packet being sent. This doesn't affect your design, but it will affect the code around the sequencer. For example, I'm not sure if you can publish or subscribe between CONNECT and CONNACK, but you can continue do these things between AUTH and CONNACK if you're re-authenticating. |
@BertKleewein Good to know. I just assumed that AUTH packets were always preceded by a CONNECT. |
Revisiting our previous thinkingAfter a more careful reading of the MQTT spec and discussion with @BertKleewein, it seems like the following was misguided:
In particular, the MQTT spec seems to forbid retrying operations (quote taken from MQTTv5 spec, but a similar statement is in v3.1.1):
(emphasis mine) Now that we know that we shouldn't retry, we no longer need to be concerned about timer efficiency. We only need a timer for the keepalive mechanism. (There might be some other places where a timer is needed, but the number of timers needed will be small nevertheless). |
Some thoughts on error checking and validationChecking that individual packets are well-formed and respect the MQTT specificationFor checking individual packets, this should be done by mqtt-packet, which is one of the core dependencies MQTT.js and a package that the maintainers of MQTT.js can make updates to as needed. An error would be raised here if either we receive an invalid packet from the server (either maliciously or because there is a bug in the server implementation) or the MQTT.js client attempts to send an invalid packet to the server (which should only happen due to a bug in the client implementation). In either case, we should immediately shut down the connection. Checking that packets make sense in contextIt is possible that an individual packet is valid when looked at in isolation, but is problematic due in context due to:
Such problematic packets should be detected by the packet sequencer. If the packet in question is being sent to the MQTT.js client by the server, we should immediately shut down the connection. Otherwise, if the packet is being sent by the MQTT.js client to the server (which would imply that there is a bug at a higher level in the MQTT.js implementation), we should bubble up the error to the caller, but keep the connection alive since no invalid packets were sent over the wire in that case. Checking that expected packets are received in a reasonable amount of timeIf the MQTT.js client is expecting a packet (e.g. waiting for a PINGRESP after sending a PINGREQ, waiting for a PUBACK after sending a QoS 1 PUBLISH) and doesn't receive the packet from the server in a reasonable amount of time (which should be configurable by the user), the client should disconnect and the user should have the ability to attempt to reconnect with the same session. Note that this can happen even if the user chooses to disable keepalive. User errorThe user of the MQTT.js client will not directly use the packet sequencer but will instead interface with a layer above the sequencer. This upper layer should detect any user error and throw errors as appropriate. |
An open question on the MQTT specIn MQTT, the PUBLISH (where QoS > 0), SUBSCRIBE, and UNSUBSCRIBE, packets require a unique Packet Identifier. It is important to free up packet IDs for reuse, as there is a maximum of 2^16 unique packet identifiers that can be used simultaneously. If the client sends a publish where QoS > 0 and disconnects before the QoS handling is complete, the MQTT spec requires the client to retry the publish upon connecting with the existing session. This mechanism allows us to free up the packet ID, as it gives us an opportunity to restart the QoS process. An open question we have is how do we handle un-acked SUBSCRIBEs and UNSUBSCRIBEs? The MQTT spec does not mention the client having to retry un-acked SUBSCRIBEs and UNSUBSCRIBEs. If we disconnect before receiving the SUBACK or UNSUBACK, do we just lose those packet identifiers for the rest of that session? Are we allowed to retry the SUBSCRIBE and UNSUBSCRIBE to give us an opportunity to get back those packet identifiers? |
@vishnureddy17 Dropping this here since I just encountered it and may be of consideration regarding timers mqttjs/MQTT.js#1257 |
EDIT (@vishnureddy17): As we continue with our vNext work, our thoughts on some of these topics have changed. As such there are corrections and updates in this thread, and some of the earlier posts in the thread may be outdated.
We had a few architectural discussions today. I’m writing them here for posterity and tracking.
Message Stores as a separate component.
There are at least two different places we need to store messages. One is “in flight” messages. The other is “unable-to-send” messages. These need to be two different stores. Deciding what to do with “in-filght” messages (e.g. re-publish, timeout, etc) is the responsibility of mqtt.js. Deciding what to do with “unable-to-send” messages (e.g. fail, persist to disk, keep in ram and retry when connected) is more complex because different customers have different needs. We should support custom message stores for “unable-to-send” messages.
Architecting for QOS-2.
QOS-0 is easy, QOS-1 is a small addition, QOS-2 becomes more complex. We should figure out how we’re going to handle QOS-2 while we’re working on QOS-1. My intuition says that you might be able to shoehorn QOS-1 into the core transport. QOS-2 is complex enough that it warrants a separate state machine to keep track of packet states. If you try to intermingle QOS-2 logic with other code, it will quickly turn into spaghetti.
Re-publishing with the same MID.
For QOS-1 and QOS-2, if you need to re-publish a packet, you need to use the same MID. The current MQTT.JS doesn’t seem to support this, so you can get into a cascading failure scenario where PUBACK packets come a little late, and the code continuously re-publishes with new MIDs and ignores the PUBACKs with older MIDs. This realization has serious implications for design.
Efficient use of timers.
If MQTT.JS code keeps track of timeouts on a per-packet basis, there could potentially be 2^16 timers going. Instead of using individual timers for each packet, we need to be smarter and use a single timer that is smart enough to handle all of these packets. This timer could use a fixed period (e.g., fire once a second and handle all of the expirations that happened in that one second period) or it could use a variable period (e.g. keep track of the next expiration, then re-set with the expiration after that when the first expiration fires).
External transports.
The core mqtt.js package should include some set of standard transport implementations (eg. TCP, TCP-TLS, websockets). Other transports that exist in the current version should e supported via some sort of “pluggable transport interface”. It should not be necessary to update the core library to add a new transport.
Using “this is more than P0 or P1” as a hint to make a separate component.
Many of the features, such as offline message store and QOS-2, are not included in initial releases because they’re “less important” or “not part of core functionality”. We should use this as an indicator that we might want this functionality separated out of the code and moved into a separate module (either a separate .ts file or a different package).
@mqtt
namespace.We already have mqtt-packet and mqtt.js packages. If we start adding more packages, we should consider namespacing.
The text was updated successfully, but these errors were encountered: