Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

02-subpub-qos1-message-expiry.py test regression #3124

Closed
mattst88 opened this issue Oct 5, 2024 · 6 comments
Closed

02-subpub-qos1-message-expiry.py test regression #3124

mattst88 opened this issue Oct 5, 2024 · 6 comments

Comments

@mattst88
Copy link

mattst88 commented Oct 5, 2024

On v2.0.19 the 02-subpub-qos1-message-expiry.py test fails:

mattst88@framework ~/projects/mosquitto/test/broker (v2.0.19~3^2~4) $ ./02-subpub-qos1-message-expiry.py
FAIL: Received incorrect puback 1.
Received: PUBACK, rl=3, mid=1
Expected: PUBACK, rl=2, mid=1
1728092536: mosquitto version 2.0.19 starting
1728092536: Using default config.
1728092536: Starting in local only mode. Connections will only be possible from clients running on this machine.
1728092536: Create a configuration file which defines a listener to allow remote access.
1728092536: For more details see https://mosquitto.org/documentation/authentication-methods/
1728092536: Opening ipv4 listen socket on port 1888.
1728092536: Opening ipv6 listen socket on port 1888.
1728092536: mosquitto version 2.0.19 running
1728092536: New connection from ::1:48630 on port 1888.
1728092536: Client <unknown> closed its connection.
1728092536: New connection from 127.0.0.1:55338 on port 1888.
1728092536: New client connected from 127.0.0.1:55338 as subpub-qos0-test (p5, c0, k60).
1728092536: No will message specified.
1728092536: Sending CONNACK to subpub-qos0-test (0, 0)
1728092536: Received SUBSCRIBE from subpub-qos0-test
1728092536: 	subpub/qos1 (QoS 1)
1728092536: subpub-qos0-test 1 subpub/qos1
1728092536: Sending SUBACK to subpub-qos0-test
1728092536: Client subpub-qos0-test closed its connection.
1728092536: New connection from 127.0.0.1:55354 on port 1888.
1728092536: New client connected from 127.0.0.1:55354 as helper (p5, c1, k60).
1728092536: No will message specified.
1728092536: Sending CONNACK to helper (0, 0)
1728092536: Received PUBLISH from helper (d0, q1, r0, m1, 'subpub/qos1', ... (8 bytes))
1728092536: Sending PUBACK to helper (m1, rc16)
1728092536: mosquitto version 2.0.19 terminating

Traceback (most recent call last):
  File "/home/mattst88/projects/mosquitto/test/broker/./02-subpub-qos1-message-expiry.py", line 78, in <module>
    do_test()
  File "/home/mattst88/projects/mosquitto/test/broker/./02-subpub-qos1-message-expiry.py", line 74, in do_test
    print("proto_ver=%d" % (proto_ver))
                            ^^^^^^^^^
NameError: name 'proto_ver' is not defined

(I've filed issue 3123 about the proto_ver issue, which is only apparent when the test fails)

I bisected the failure to commit 1914b3e ("Use separate trees for shared/normal subs").

I've reconfirmed the bisect as well.

The bisect log is:

git bisect start
# status: waiting for both good and bad commits
# good: [3923526c6b4c048bbecad2506c4c9963bc46cd36] Bump version, add webpage.
git bisect good 3923526c6b4c048bbecad2506c4c9963bc46cd36
# status: waiting for bad commit, 1 good commit known
# bad: [5eb40ee3d691fb3c2dc222685e7ffcf6e6a69a79] Fix build WITH_ADNS=yes
git bisect bad 5eb40ee3d691fb3c2dc222685e7ffcf6e6a69a79
# good: [f6d0ddcae019a299a8d520c7932bad3f811a9bb2] Clear SSL error state to avoid spurious error reporting.
git bisect good f6d0ddcae019a299a8d520c7932bad3f811a9bb2
# bad: [ae7a804dadac8f2aaedb24336df8496a9680fda9] Fix crash on bridge using remapped topic being sent a crafted packet.
git bisect bad ae7a804dadac8f2aaedb24336df8496a9680fda9
# good: [d3975c85257c8978fd1c357ffb0bff8d2eacb7fb] fix unused variable warning in net__socket_connect_tls()
git bisect good d3975c85257c8978fd1c357ffb0bff8d2eacb7fb
# good: [9a5c2bc14dde13477c24f51e672651bdf3abd649] Add comment on thread safety to mosquitto_broker.h
git bisect good 9a5c2bc14dde13477c24f51e672651bdf3abd649
# bad: [1914b3ee2a18102d0a94cbdbbfeae1afa03edd17] Use separate trees for shared/normal subs
git bisect bad 1914b3ee2a18102d0a94cbdbbfeae1afa03edd17
# good: [3bb6c9ad51f712864dea63529e0b55661c2a9e84] Remove superfluous function arguments.
git bisect good 3bb6c9ad51f712864dea63529e0b55661c2a9e84
# first bad commit: [1914b3ee2a18102d0a94cbdbbfeae1afa03edd17] Use separate trees for shared/normal subs

I build with

make -j20 -l20 CC=x86_64-pc-linux-gnu-gcc CXX=x86_64-pc-linux-gnu-g++ 'CLIENT_LDFLAGS=-Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs' LIB_SUFFIX=64 WITH_BRIDGE=yes WITH_PERSISTENCE=yes WITH_SRV=yes WITH_SYSTEMD=yes WITH_TLS=yes WITH_WEBSOCKETS=no WITH_WRAP=no

and then run the test suite with the same command with test appended.

@kent-bowling-bytetrail
Copy link

I'm running into something possibly related, and wanted to quickly point out it appears to happen in your screenshot above. If I need to reproduce it in something like Python, I can do that etc.

When you're publishing from "helper" while "subpub-qos0-test" is subscribed, you're getting an "rc16" on the PUBACK, which I believe is a 0x10 from the MQTT specs for "no matching subscribers" (despite the fact you can see someone is subscribed above etc.).

I'm working on a project where we send sort of a "header" and then more chunks if someone is listening. When I seemingly moved to 2.0.19, this bit of code no longer worked and it never thinks anyone is subscribed. When I use the 2.0.18 docker image, I don't have this issue.

image

@kent-bowling-bytetrail
Copy link

Actually disregard. In your case it does appear it closed subpub-qos0-test closed its connection before the PUBLISH happened, so rc16 makes sense. Still not sure why I'm getting it in my code on 2.0.19 and not 2.0.18, but I will dig deeper.

@ralight
Copy link
Contributor

ralight commented Oct 8, 2024

This is the same issue as your other report, so I'm going to close it. Thanks for the analysis!

@ralight ralight closed this as completed Oct 8, 2024
@mattst88
Copy link
Author

mattst88 commented Oct 9, 2024

This is the same issue as your other report, so I'm going to close it. Thanks for the analysis!

Not really.

This report is "the test fails when it didn't in the last release".

The other report is "some test programs use an undefined variable when they fail".

They're related, but they're definitely different.

@ralight
Copy link
Contributor

ralight commented Oct 9, 2024

@mattst88 Sorry, I got you and Kent mixed up. Kent submitted an issue which is the same as this: #3128, that's what I meant.

@kent-bowling-bytetrail
Copy link

I apologize if I added confusion to all of this. I did originally think this was the same thing I was seeing, but I did change my mind later in my 2nd comment. The logs show the subscriber disconnecting before the rc16 comes back (possibly disconnecting due to the Python exception and the missing proto_ver?). I originally missed this. If the subscriber is disconnected, I think rc16 is appropriate.

To be clear, I can't say this is the same thing as my bug (like I originally thought), but I also obviously defer to your experience overall. I personally latched onto this one too quickly because I was Googling for "rc16" and saw a newly submitted bug for the new version 😄.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants