Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminate client transaction upon transport error #3806

Merged
merged 4 commits into from
Jan 2, 2024

Conversation

trengginas
Copy link
Member

#3805 will prevent immediate transaction termination upon transport error.
This will introduce new behavioral change which is an issue.

Provided an outgoing INVITE without patch

22:27:26.194    inv05253D64  .UAC invite session created for dialog dlg05253D64
22:27:26.194     sip_util.c  .Request msg INVITE/cseq=8155 (tdta05262D64) created.
22:27:26.194    inv05253D64  ..Sending Request msg INVITE/cseq=8155 (tdta05262D64)
22:27:26.194    dlg05253D64  ...Sending Request msg INVITE/cseq=8155 (tdta05262D64)
22:27:26.194    tsx0530BEE4  ....Transaction created for Request msg INVITE/cseq=8155 (tdta05262D64)
22:27:26.194    tsx0530BEE4  ...Sending Request msg INVITE/cseq=8155 (tdta05262D64) in state Null
22:27:26.194  sip_resolve.c  ....Target '192.168.1.114:0' type=TCP resolved to '192.168.1.114:5060' type=TCP (TCP transport)
22:27:26.194   tcpc06C06114  ....TCP client transport created
22:27:26.194   tcpc06C06114  ....TCP transport 192.168.100.51:63251 is connecting to 192.168.1.114:5060...
22:27:26.194   pjsua_core.c  ....TX 955 bytes Request msg INVITE/cseq=8155 (tdta05262D64) to TCP 192.168.1.114:5060:
22:27:26.198    tsx0530BEE4  ....State changed from Null to Calling, event=TX_MSG
22:27:26.198    dlg05253D64  .....Transaction tsx0530BEE4 state changed to Calling
22:27:26.198    inv05253D64  ......State changed from NULL to CALLING, event=TSX_STATE
22:27:26.198    pjsua_app.c  .......Call 0 state changed to CALLING
22:27:26.198    pjsua_app.c  .......Tsx tsx0530BEE4 state changed: Calling code: 0
22:27:31.204   sound_port.c !EC suspended because of inactivity
22:27:47.244   tcpc06C06114  TCP connect() error: [code=130060]: Connection timed out (WSAETIMEDOUT)
22:27:47.244    tsx0530BEE4  Failed to send Request msg INVITE/cseq=8155 (tdta05262D64)! err=130060 (Connection timed out (WSAETIMEDOUT))
22:27:47.244    tsx0530BEE4  State changed from Calling to Terminated, event=TRANSPORT_ERROR
22:27:47.244    dlg05253D64  .Transaction tsx0530BEE4 state changed to Terminated
22:27:47.244    inv05253D64  ..State changed from CALLING to DISCONNECTED, event=TSX_STATE
22:27:47.244  pjsua_media.c  ...Call 0: deinitializing media..

With patch

22:33:35.824    inv05451564  .UAC invite session created for dialog dlg05451564
22:33:35.824     sip_util.c  .Request msg INVITE/cseq=7217 (tdta05460564) created.
22:33:35.824    inv05451564  ..Sending Request msg INVITE/cseq=7217 (tdta05460564)
22:33:35.824    dlg05451564  ...Sending Request msg INVITE/cseq=7217 (tdta05460564)
22:33:35.824    tsx0550BEE4  ....Transaction created for Request msg INVITE/cseq=7217 (tdta05460564)
22:33:35.824    tsx0550BEE4  ...Sending Request msg INVITE/cseq=7217 (tdta05460564) in state Null
22:33:35.824  sip_resolve.c  ....Target '192.168.1.114:0' type=TCP resolved to '192.168.1.114:5060' type=TCP (TCP transport)
22:33:35.824   tcpc06E06114  ....TCP client transport created
22:33:35.824   tcpc06E06114  ....TCP transport 192.168.100.51:63302 is connecting to 192.168.1.114:5060...
22:33:35.824   pjsua_core.c  ....TX 955 bytes Request msg INVITE/cseq=7217 (tdta05460564) to TCP 192.168.1.114:5060:
22:33:35.837    tsx0550BEE4  ....State changed from Null to Calling, event=TX_MSG
22:33:35.837    dlg05451564  .....Transaction tsx0550BEE4 state changed to Calling
22:33:35.837    inv05451564  ......State changed from NULL to CALLING, event=TSX_STATE
22:33:35.837    pjsua_app.c  .......Call 0 state changed to CALLING
22:33:35.837    pjsua_app.c  .......Tsx tsx0550BEE4 state changed: Calling code: 0
22:33:56.871   tcpc06E06114  TCP connect() error: [code=130060]: Connection timed out (WSAETIMEDOUT)
22:33:56.871    tsx0550BEE4  Failed to send Request msg INVITE/cseq=7217 (tdta05460564)! err=130060 (Connection timed out (WSAETIMEDOUT))
22:33:56.871   tcpc06E06114  TCP send() error, sent=-130060
22:33:56.872    pjsua_app.c  SIP TCP transport is disconnected from 192.168.1.114:5060: Connection timed out (WSAETIMEDOUT) [status=130060]
22:33:56.872    pjsua_acc.c  Disconnected notification for transport tcpc06E06114
22:33:56.872 sip_transport.  .Transport tcpc06E06114 shutting down, force=0
22:33:56.873 sip_transport.  Transport tcpc06E06114 is being destroyed due to timeout in idle timer
22:33:56.873   tcpc06E06114  TCP transport destroyed with reason 130060: Connection timed out (WSAETIMEDOUT)
22:34:07.848    tsx0550BEE4  Timeout timer event
22:34:07.848    tsx0550BEE4  .State changed from Calling to Terminated, event=TIMER
22:34:07.848    dlg05451564  ..Transaction tsx0550BEE4 state changed to Terminated
22:34:07.848    inv05451564  ...State changed from CALLING to DISCONNECTED, event=TSX_STATE
22:34:07.848  pjsua_media.c  ....Call 0: deinitializing media..

The outgoing INVITE transaction will be terminated by timeout timer and the call disconnected notification will require additional time.

As stated in RFC 6026 (https://www.rfc-editor.org/rfc/rfc6026.html), section 7.1:

A server transaction MUST NOT discard transaction state based only on
   encountering a non-recoverable transport error when sending a
   response.  Instead, the associated INVITE server transaction state
   machine MUST remain in its current state.  (Timers will eventually
   cause it to transition to the "Terminated" state).  This allows
   retransmissions of the INVITE to be absorbed instead of being
   processed as a new request.

And section 7.2:

Note that it is not necessary to preserve client transaction state
   upon the detection of unrecoverable transport errors.  Existing
   requirements ensure the TU has been notified, and the new
   requirements in this document ensure that any received retransmitted
   response will be dropped since there will no longer be any matching
   transaction state.

Upon transport error, only server transaction that must not be discarded/terminated.

This PR will terminate client transaction upon transport error.

@trengginas trengginas added this to the release-2.15 milestone Dec 19, 2023
@trengginas trengginas self-assigned this Dec 19, 2023
@sauwming
Copy link
Member

Sorry, it looked like the previous PR introduced backward incompatibility and failed tests. Could you please fix it as well so that it passed the CI tests?

@trengginas
Copy link
Member Author

Failed pjsip-test:

@sauwming
Copy link
Member

For 100 response, we need to immediately terminate it.
https://datatracker.ietf.org/doc/html/rfc3261#section-17.2.1

Why does 100 response need to be treated differently?

@trengginas
Copy link
Member Author

Currently, retransmission will be done only for > 100 responses.

/* Retransmit provisional response every 1 minute if this is

This is based on https://datatracker.ietf.org/doc/html/rfc3261#section-17.2.1 (see Figure 7: INVITE server transaction)

This will make 100 response to linger (no timeout/retransmission) if it fails to send.
Based on the above figure, then 100 responses should be terminated immediately.
The other alternative would be to retransmit it.

@sauwming
Copy link
Member

This is based on https://datatracker.ietf.org/doc/html/rfc3261#section-17.2.1 (see Figure 7: INVITE server transaction)
This will make 100 response to linger (no timeout/retransmission) if it fails to send. Based on the above figure, then 100 responses should be terminated immediately. The other alternative would be to retransmit it.

Ah okay, I understand now that currently we don't retransmit 100, but why doesn't the timeout work?

After re-reading both RFCs, I'm still not entirely sure as to what we should do upon 100 failure. Based on my understanding: - the server tsx shouldn't be terminated, - timeout must still run. Not sure about retransmission, I believe we need to do this also, but maybe there's a reason right now we intentionally don't do this?

@trengginas
Copy link
Member Author

Because timer timeout isn't scheduled for provisional responses.
On tests, response code > 100 will be terminated when sending retransmission failed:

PJ_LOG(2,(tsx->obj_name,

Regarding rfc6062, the title stated "Correct Transaction Handling for 2xx Responses". so, it should only apply to 2xx responses?

@sauwming
Copy link
Member

I believe the title of RFC 6026 states its primary or main content, which doesn't necessarily mean that it will only affect 2xx responses. Section 8 of the RFC elaborates all the revisions, and Figure 7 shows the overall updated INVITE server transaction flow.

So, I believe the sip_tsx modification in the previous commit 035bad7 was already correct. We don't need to terminate upon 1xx failure.

As for the 100 retransmission, I suggest to comment the 100 condition to activate retransmission for 100, and test if it introduces compatibility issue.

/* Retransmit provisional response every 1 minute if this is

If it doesn't cause incompatibility, I think it should be okay.

@nanangizz
Copy link
Member

I believe the title of RFC 6026 states its primary or main content, which doesn't necessarily mean that it will only affect 2xx responses. Section 8 of the RFC elaborates all the revisions, and Figure 7 shows the overall updated INVITE server transaction flow.

Yes, the updates on the diagrams seem to affect other responses/tsx-states too.

So, I believe the sip_tsx modification in the previous commit 035bad7 was already correct. We don't need to terminate upon 1xx failure.

Agree.

As for the 100 retransmission, I suggest to comment the 100 condition to activate retransmission for 100, and test if it introduces compatibility issue.

/* Retransmit provisional response every 1 minute if this is

If it doesn't cause incompatibility, I think it should be okay.

Not sure about this, but I won't vote for 100 retransmission. It is supposed to be unreliable, immediate (to cease INVITE retransmission) and hop-by-hop (101-199 will be forwarded by proxy, but 100 will not), even PRACK's RFC prohibits 100 reliability (ref), so I guess it is better to leave it as is (i.e: no retransmission).


Btw, it seems we now support rfc6026 partially in the recent changes, haven't really checked the details myself, but would we consider to support it fully (e.g: add new tsx state "Accepted", etc)?

@sauwming
Copy link
Member

Not sure about this, but I won't vote for 100 retransmission. It is supposed to be unreliable, immediate (to cease INVITE retransmission) and hop-by-hop (101-199 will be forwarded by proxy, but 100 will not), even PRACK's RFC prohibits 100 reliability (ref), so I guess it is better to leave it as is (i.e: no retransmission).

Okay, let's go with this.

Btw, it seems we now support rfc6026 partially in the recent changes, haven't really checked the details myself, but would we consider to support it fully (e.g: add new tsx state "Accepted", etc)?

For now, I plan to put in our roadmap for further discussion, especially since it seems quite major.

@sauwming
Copy link
Member

sauwming commented Jan 2, 2024

List of changes in the latest commit:

  • sip transaction

    • transport_callback() need to handle the case to ignore sending failure for UAS tsx.
    • tsx_send_msg() should not treat the error as permanent for UAS tsx.
  • pjsip test
    Now the expected test result will be timeout, instead of transport failure

    • For test 10 (failed transport in TRYING state), the expected result is timeout by app (since tsx will not timeout before it sends a final response).
    • For test 11 and 12 (failed transport in PROCEEDING/COMPLETED state), the timeout is by the tsx (PJSIP_SC_REQUEST_TIMEOUT).

@sauwming sauwming merged commit 4dfbdf7 into master Jan 2, 2024
35 checks passed
@sauwming sauwming deleted the terminate-client-transaction branch January 2, 2024 07:57
dshamaev-intermedia pushed a commit to intermedia-net/pjproject that referenced this pull request Mar 22, 2024
dshamaev-intermedia pushed a commit to intermedia-net/pjproject that referenced this pull request Apr 30, 2024
* Terminate client transaction upon transport error
* Fixed UAS tsx tp error handling and PJSIP test
dshamaev-intermedia added a commit to intermedia-net/pjproject that referenced this pull request Apr 30, 2024
* Terminate client transaction upon transport error (pjsip#3806)

* Terminate client transaction upon transport error
* Fixed UAS tsx tp error handling and PJSIP test

* version bump

---------

Co-authored-by: Riza Sulistyo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants