Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of a response that comes after MRP resends end. #29640

Merged
merged 2 commits into from
Oct 9, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions src/messaging/ExchangeContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ bool ExchangeContext::IsResponseExpected() const
void ExchangeContext::SetResponseExpected(bool inResponseExpected)
{
mFlags.Set(Flags::kFlagResponseExpected, inResponseExpected);
SetWaitingForResponseOrAck(inResponseExpected);
pidarped marked this conversation as resolved.
Show resolved Hide resolved
}

void ExchangeContext::UseSuggestedResponseTimeout(Timeout applicationProcessingTimeout)
Expand Down Expand Up @@ -489,6 +490,9 @@ void ExchangeContext::NotifyResponseTimeout(bool aCloseIfNeeded)
}
#endif // CONFIG_DEVICE_LAYER && CHIP_CONFIG_ENABLE_ICD_SERVER

// Grab the value of WaitingForResponseOrAck() before we mess with our state.
bool mrpTimedOut = WaitingForResponseOrAck();
bzbarsky-apple marked this conversation as resolved.
Show resolved Hide resolved

SetResponseExpected(false);

// Hold a ref to ourselves so we can make calls into our delegate that might
Expand All @@ -502,9 +506,8 @@ void ExchangeContext::NotifyResponseTimeout(bool aCloseIfNeeded)
{
// If we timed out _after_ getting an ack for the message, that means
// the session is probably fine (since our message and the ack got
// through), so don't mark the session defunct unless we have an
// un-acked message here.
if (IsMessageNotAcked())
// through), so don't mark the session defunct unless MRP also timed out.
if (mrpTimedOut)
{
if (mSession->IsSecureSession() && mSession->AsSecureSession()->IsCASESession())
{
Expand Down Expand Up @@ -596,7 +599,7 @@ CHIP_ERROR ExchangeContext::HandleMessage(uint32_t messageCounter, const Payload
return CHIP_NO_ERROR;
}

if (IsMessageNotAcked())
if (IsWaitingForAck())
{
// The only way we can get here is a spec violation on the other side:
// we sent a message that needs an ack, and the other side responded
Expand Down
12 changes: 8 additions & 4 deletions src/messaging/ReliableMessageContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -55,14 +55,14 @@ ReliableMessageMgr * ReliableMessageContext::GetReliableMessageMgr()
return static_cast<ExchangeContext *>(this)->GetExchangeMgr()->GetReliableMessageMgr();
}

void ReliableMessageContext::SetMessageNotAcked(bool messageNotAcked)
void ReliableMessageContext::SetWaitingForAck(bool waitingForAck)
{
mFlags.Set(Flags::kFlagMessageNotAcked, messageNotAcked);
mFlags.Set(Flags::kFlagWaitingForAck, waitingForAck);

#if CONFIG_DEVICE_LAYER && CHIP_CONFIG_ENABLE_ICD_SERVER
DeviceLayer::ChipDeviceEvent event;
event.Type = DeviceLayer::DeviceEventType::kICDMsgAckSyncEvent;
event.AckSync.awaitingAck = messageNotAcked;
event.AckSync.awaitingAck = waitingForAck;
CHIP_ERROR status = DeviceLayer::PlatformMgr().PostEvent(&event);
if (status != CHIP_NO_ERROR)
{
Expand Down Expand Up @@ -103,7 +103,11 @@ CHIP_ERROR ReliableMessageContext::FlushAcks()
void ReliableMessageContext::HandleRcvdAck(uint32_t ackMessageCounter)
{
// Msg is an Ack; Check Retrans Table and remove message context
if (!GetReliableMessageMgr()->CheckAndRemRetransTable(this, ackMessageCounter))
if (GetReliableMessageMgr()->CheckAndRemRetransTable(this, ackMessageCounter))
{
SetWaitingForResponseOrAck(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just set SetWaitingForAck(false) here, instead?

Copy link
Contributor Author

@bzbarsky-apple bzbarsky-apple Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because what if ackMessageCounter is an ack for a totally different message counter value than the one we are waiting for an ack for? As in, it's some stale ack that was sent for a previous message, delayed in transit, and is now getting delivered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, isn't HandleRcvdAck actually processing the Ack for the message it is expecting one for? So, setting the flag to false is essentially acknowledging an MRP Ack, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, isn't HandleRcvdAck actually processing the Ack for the message it is expecting one for?

Nope, it's processing an ack if we got an ack. It doesn't check what it's an ack for; that's checked down in CheckAndRemRetransTable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. But I meant where it is being currently set inside HandleRcvdAck() in your change after checking the retransmit table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that is indeed the "We got the ack we were expecting" case. But in that case, IsWaitingForAck has already been set false, no? So I am not sure I understand the question here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, precisely. IsWaitingForAck is set to false when the RetransEntry is destroyed. So, gotMRPAck in ExchangeContext.cpp:494 should be assigned to !IsWaitingForAck() instead, isn't it? If we need separate tracking of App response and MRP Ack, we can achieve that using the existing ResponseExpected(for app response) and WaitingForAck(for MRP), no? My point was whether there is a need for ResponseOrAck flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsWaitingForAck is set to false when the RetransEntry is destroyed

Which happens if either we get the ack or we stop waiting for the ack.

In other words, IsWaitingForAck tells you nothing about whether we got an ack. At least after this PR, or before #29173

Fundamentally, we have the following states that we need to be able to tell apart in OnResponseTimeout and in "message received" handling:

  1. We have not yet gotten an ack, we have not timed out MRP.
  2. We have not yet gotten an ack, but we have timed out MRP.
  3. We got an ack.

OnResponseTimeout needs to handle state 3 different from states 1 and 2, right? It cannot use the "got response" bit for that, since that bit is false by assumption in OnResponseTimeout, so we need a bit to tell those cases apart.

"message received" handling needs to handle state 1 differently from state 2: In state 1 it needs to drop messages that are acking the wrong thing, while in state 2 it needs to process them because it has no way to tell that it's the wrong thing. This needs a second bit to tell the cases apart, yes?

}
else
{
// This can happen quite easily due to a packet with a piggyback ack
// being lost and retransmitted.
Expand Down
38 changes: 30 additions & 8 deletions src/messaging/ReliableMessageContext.h
Original file line number Diff line number Diff line change
Expand Up @@ -114,11 +114,11 @@ class ReliableMessageContext
*/
bool IsAckPending() const;

/// Determine whether there is message hasn't been acknowledged.
bool IsMessageNotAcked() const;
/// Determine whether the reliable message context is waiting for an ack.
bool IsWaitingForAck() const;

/// Set whether there is a message hasn't been acknowledged.
void SetMessageNotAcked(bool messageNotAcked);
/// Set whether the reliable message context is waiting for an ack.
void SetWaitingForAck(bool waitingForAck);

/// Set if this exchange is requesting Sleepy End Device active mode
void SetRequestingActiveMode(bool activeMode);
Expand All @@ -136,6 +136,9 @@ class ReliableMessageContext
ReliableMessageMgr * GetReliableMessageMgr();

protected:
bool WaitingForResponseOrAck() const;
void SetWaitingForResponseOrAck(bool waitingForResponseOrAck);

enum class Flags : uint16_t
{
/// When set, signifies that this context is the initiator of the exchange.
Expand All @@ -147,8 +150,10 @@ class ReliableMessageContext
/// When set, automatically request an acknowledgment whenever a message is sent via UDP.
kFlagAutoRequestAck = (1u << 2),

/// When set, signifies there is a message which hasn't been acknowledged.
kFlagMessageNotAcked = (1u << 3),
/// When set, signifies the reliable message context is waiting for an
/// ack: a message that needs an ack has been sent, no ack has been
/// received, and we have not yet run out of MRP retries.
kFlagWaitingForAck = (1u << 3),

/// When set, signifies that there is an acknowledgment pending to be sent back.
kFlagAckPending = (1u << 4),
Expand All @@ -172,6 +177,13 @@ class ReliableMessageContext

/// When set, ignore session being released, because we are releasing it ourselves.
kFlagIgnoreSessionRelease = (1u << 10),

/// When set:
///
/// (1) We sent a message that expected a response (hence
/// IsResponseExpected() is true).
/// (2) We have received neither a response nor an ack for that message.
kFlagWaitingForResponseOrAck = (1u << 11),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this flag? Because once we receive the Message response, we would be practically done with the Exchange and not care about MRP Acks. And there is only one timeout for the Message response(compared to multiple for MRP retries)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would be practically done with the Exchange

That's not at all true. Many exchanges have long trains of messages going back and forth.

For the rest, see #29640 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that we do. But we do not have multiple outstanding sent messages. So, an Exchange is a StopAndWait protocol and one and only one message is waiting for a delivery Ack at a given time.
I might have overlooked the case where there are multiple back-and-forth over a single exchange, but my point was about the single sent outstanding message. It is not a sliding window where a received response is not acking an outstanding sent message(since there is only one outstanding sent message).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Again, we could treat a non-duplicate response with a piggyback ack as an implicit ack for "whatever the last thing we sent was, even if we have no way to check whether the piggyback is for the right thing", as #29640 (comment) describes. If you think that would be clearer, we can do that. We still need two flags to track the "no ack" and "we have no way to check" states, which are not identical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is technically true in our stop-and-wait scenario, that a received response is actually a true acknowledgment for the message sent earlier(even if all MRP retries have been exhausted), we can ignore the piggybacked Ack in the response(if MRP timeout has happened), but send the response up to the application if the response timeout has not happened and the exchange is still open. For the 2 states, I think WaitingForAck for MRPAck and IsResponseExpected for app response should cover the 2 scenarios, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if MRP timeout has happened

This parenthetical is really important: There is nothing tracking whether this has happened right now. We could have a bit to track this, if we want, and then that bit plus the "waiting for ack" bit plus the "response expected" bit would let us detect the states we want.

For the 2 states

There are more than 2 states. See #29640 (comment)

};

BitFlags<Flags> mFlags; // Internal state flags
Expand Down Expand Up @@ -216,9 +228,9 @@ inline bool ReliableMessageContext::IsAckPending() const
return mFlags.Has(Flags::kFlagAckPending);
}

inline bool ReliableMessageContext::IsMessageNotAcked() const
inline bool ReliableMessageContext::IsWaitingForAck() const
{
return mFlags.Has(Flags::kFlagMessageNotAcked);
return mFlags.Has(Flags::kFlagWaitingForAck);
}

inline bool ReliableMessageContext::HasPiggybackAckPending() const
Expand Down Expand Up @@ -251,5 +263,15 @@ inline bool ReliableMessageContext::IsEphemeralExchange() const
return mFlags.Has(Flags::kFlagEphemeralExchange);
}

inline bool ReliableMessageContext::WaitingForResponseOrAck() const
{
return mFlags.Has(Flags::kFlagWaitingForResponseOrAck);
}

inline void ReliableMessageContext::SetWaitingForResponseOrAck(bool waitingForResponseOrAck)
{
mFlags.Set(Flags::kFlagWaitingForResponseOrAck, waitingForResponseOrAck);
}

} // namespace Messaging
} // namespace chip
12 changes: 3 additions & 9 deletions src/messaging/ReliableMessageMgr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,12 +49,12 @@ namespace Messaging {
ReliableMessageMgr::RetransTableEntry::RetransTableEntry(ReliableMessageContext * rc) :
ec(*rc->GetExchangeContext()), nextRetransTime(0), sendCount(0)
{
ec->SetMessageNotAcked(true);
ec->SetWaitingForAck(true);
}

ReliableMessageMgr::RetransTableEntry::~RetransTableEntry()
{
ec->SetMessageNotAcked(false);
ec->SetWaitingForAck(false);
}

ReliableMessageMgr::ReliableMessageMgr(ObjectPool<ExchangeContext, CHIP_CONFIG_MAX_EXCHANGE_CONTEXTS> & contextPool) :
Expand Down Expand Up @@ -158,12 +158,6 @@ void ReliableMessageMgr::ExecuteActions()
// Do not StartTimer, we will schedule the timer at the end of the timer handler.
mRetransTable.ReleaseObject(entry);

// Dropping our entry marked the exchange as not having an un-acked
// message... but of course it _does_ have an un-acked message and
// we have just given up on waiting for the ack.

ec->GetReliableMessageContext()->SetMessageNotAcked(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind-of raises a protocol question in the sense that, by not setting MessageNotAcked to true, we are stating that we are still open to receiving an acknowledgment of delivery via the response, although all our retransmits have been exhausted. This clearly is an indicator of misconfiguration of MRP timeouts or the application response timeout, isn't it? Ideally, if we have exhausted all retries, it should go up as an OnSendFailure() callback for the application to take a necessary action(of closing the exchange, etc) even if the ResponseTimeout is outstanding, isn't it?
Ideally, the ResponseTimeout should be some function of the MRP intervals and number of retries with some additional buffer time, and not entirely decoupled from the MRP values.

MRP, at its level has a window within which it is expected to process acknowledgments to its sent messages and if that window passes, should it still process late response messages as Acks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind-of raises a protocol question in the sense that, by not setting MessageNotAcked to true

There is no more "message not acked" flag, precisely because the name did not accurately reflect what the flag did. What the flag did (and still does after this PR, under the new name) is track "We have not gotten an ack, and if we do get an ack for the relevant message counter value we will clear this flag". That last "if ..." happens to be false once we reach this code, which is why setting the flag to true here, which was added recently, was wrong.

This clearly is an indicator of misconfiguration of MRP timeouts or the application response timeout, isn't it?

Well... The application response timeout can be quite long for a long-running operation. MRP timeouts are quite short (4s for all resends) for devices that don't claim to be sleepy. Realistically, the use of sleepy params or MRP timeouts is kind of broken, but that's a brokenness required by the spec, unfortunately, which has been reaised as a spec issue.

Ideally, if we have exhausted all retries, it should go up as an OnSendFailure() callback for the application to take a necessary action(of closing the exchange, etc) even if the ResponseTimeout is outstanding, isn't it?

Honestly: that would break the world as things stand, as long as we are following the spec-required algorithm here.

Ideally, the ResponseTimeout should be some function of the MRP intervals and number of retries with some additional buffer time, and not entirely decoupled from the MRP values.

It's not, generally, decoupled. The default behavior is that the app provides how much time to allow on top of the MRP timeout. So the response timeout is nearly always the MRP timeout + some app-defined constant.

should it still process late response messages as Acks

That is the big question, yes. The current implementation does not, because it has no way to tell whether they are in a sane way.

Copy link
Contributor

@pidarped pidarped Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So, app response timeout is well correlated with MRP timeout which is good.

However, OnSendFailure() is not implemented in our SDK. Curious what happens in scenarios where there is a critical issue in sending the message out(say the key is wrong and the message cannot be encrypted) and so all MRP retransmits fail. In that case, the application is forced to wait out the full ExchangeContext timeout when it could have been notified of a Send failure.

Also, regarding the ResponseTimeout, especially in Exchanges where there are multiple back-and-forth messaging, there is a distinction between the ResponseTimeout and a timeout for the entire Exchange. The former should, ideally capture the timeout for the current outstanding sent app message, whereas the latter would be a sum of all the individual back-and-forth. So, for each new send on the same exchange, the response timeout should be re-set for each new send. Not sure if we do this, or the Response timeout is essentially a timeout for all message send and receives.

That is the big question, yes. The current implementation does not, because it has no way to tell whether they are in a sane way.

I think, we can ignore the MRPAcking part of it since that timeout has already passed, but still send the response to the application if it is still waiting and the Exchange is valid. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, OnSendFailure() is not implemented in our SDK.

Correct.

Curious what happens in scenarios where there is a critical issue in sending the message out(say the key is wrong and the message cannot be encrypted)

That's a synchronous failure from SendMessage.

there is a distinction between the ResponseTimeout and a timeout for the entire Exchange

We have no concept of "timeout for the entire Exchange" right now.

for each new send on the same exchange, the response timeout should be re-set for each new send.

Yep, that's what we do. Or rather you can configure it once, and the timer is restarted for each send. Or you can change it as you go (and CASE does that, in fact).


return Loop::Continue;
}

Expand Down Expand Up @@ -204,7 +198,7 @@ void ReliableMessageMgr::Timeout(System::Layer * aSystemLayer, void * aAppState)

CHIP_ERROR ReliableMessageMgr::AddToRetransTable(ReliableMessageContext * rc, RetransTableEntry ** rEntry)
{
VerifyOrDie(!rc->IsMessageNotAcked());
VerifyOrDie(!rc->IsWaitingForAck());

*rEntry = mRetransTable.CreateObject(rc);
if (*rEntry == nullptr)
Expand Down
14 changes: 14 additions & 0 deletions src/messaging/tests/MessagingContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -163,12 +163,26 @@ CHIP_ERROR MessagingContext::CreateSessionBobToAlice()
mBobFabricIndex, mAliceAddress, CryptoContext::SessionRole::kInitiator);
}

CHIP_ERROR MessagingContext::CreateCASESessionBobToAlice()
{
return mSessionManager.InjectCaseSessionWithTestKey(mSessionBobToAlice, kBobKeyId, kAliceKeyId, GetBobFabric()->GetNodeId(),
GetAliceFabric()->GetNodeId(), mBobFabricIndex, mAliceAddress,
CryptoContext::SessionRole::kInitiator);
}

CHIP_ERROR MessagingContext::CreateSessionAliceToBob()
{
return mSessionManager.InjectPaseSessionWithTestKey(mSessionAliceToBob, kAliceKeyId, GetBobFabric()->GetNodeId(), kBobKeyId,
mAliceFabricIndex, mBobAddress, CryptoContext::SessionRole::kResponder);
}

CHIP_ERROR MessagingContext::CreateCASESessionAliceToBob()
{
return mSessionManager.InjectCaseSessionWithTestKey(mSessionAliceToBob, kAliceKeyId, kBobKeyId, GetAliceFabric()->GetNodeId(),
GetBobFabric()->GetNodeId(), mAliceFabricIndex, mBobAddress,
CryptoContext::SessionRole::kResponder);
}

CHIP_ERROR MessagingContext::CreatePASESessionCharlieToDavid()
{
return mSessionManager.InjectPaseSessionWithTestKey(mSessionCharlieToDavid, kCharlieKeyId, 0xdeadbeef, kDavidKeyId,
Expand Down
8 changes: 5 additions & 3 deletions src/messaging/tests/MessagingContext.h
Original file line number Diff line number Diff line change
Expand Up @@ -139,9 +139,11 @@ class MessagingContext : public PlatformMemoryUser
const FabricInfo * GetAliceFabric() { return mFabricTable.FindFabricWithIndex(mAliceFabricIndex); }
const FabricInfo * GetBobFabric() { return mFabricTable.FindFabricWithIndex(mBobFabricIndex); }

CHIP_ERROR CreateSessionBobToAlice();
CHIP_ERROR CreateSessionAliceToBob();
CHIP_ERROR CreateSessionBobToFriends();
CHIP_ERROR CreateSessionBobToAlice(); // Creates PASE session
CHIP_ERROR CreateCASESessionBobToAlice();
CHIP_ERROR CreateSessionAliceToBob(); // Creates PASE session
CHIP_ERROR CreateCASESessionAliceToBob();
CHIP_ERROR CreateSessionBobToFriends(); // Creates PASE session
CHIP_ERROR CreatePASESessionCharlieToDavid();
CHIP_ERROR CreatePASESessionDavidToCharlie();

Expand Down
Loading