Session restart example #218
base: master
@@ -222,7 +222,7 @@
 */
#ifndef Z_BATCH_SIZE_RX
#define Z_BATCH_SIZE_RX \
-    65535 // Warning: changing this value can break the communication
+    3072 // Warning: changing this value can break the communication
I do not intend to merge the changes made to this file. They are only there to make session opening work, since my microcontroller doesn't have enough RAM for the default value.
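Because the default in the diff is wrapped in `#ifndef Z_BATCH_SIZE_RX`, it can probably be overridden from the build system instead of by editing the header, for example with a `-DZ_BATCH_SIZE_RX=3072` compiler flag (in PlatformIO, via `build_flags`). The sketch below only demonstrates the preprocessor guard pattern; `rx_buffer` and `rx_buffer_size()` are hypothetical and not zenoh-pico's actual buffer handling. The header's own warning that changing this value can break communication still applies.

```c
#include <stddef.h>

/* Simulates building with -DZ_BATCH_SIZE_RX=3072 instead of editing the header. */
#define Z_BATCH_SIZE_RX 3072

/* Same guard pattern as the config header in the diff:
   the default is only used when nothing was defined earlier. */
#ifndef Z_BATCH_SIZE_RX
#define Z_BATCH_SIZE_RX 65535
#endif

/* Hypothetical receive buffer sized from the macro (illustration only). */
static unsigned char rx_buffer[Z_BATCH_SIZE_RX];

/* Returns the size the buffer actually got after preprocessing. */
size_t rx_buffer_size(void) {
    return sizeof rx_buffer;
}
```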
I was able to reproduce the error again in the debugger, but got a slightly different error from the one previously mentioned in this PR.
Sadly, gdb doesn't show the call stack from before the signal handler was called, whereas with the Zephyr commit I pointed to earlier, it did. Maybe that was added between 2.7.1 and 3.3.0, or there is a config option that needs to be turned on and was magically turned on behind my back? I don't know. Anyway, using `addr2line` I was able to pinpoint exactly which line triggered the signal handler by doing:
And it pointed to
Hi @dcorbeil. It seems to me that it fails on the 3rd open. Can you confirm that this is consistent with other failed runs?
@cguimaraes My mind immediately went to a memory leak exhausting the memory as well, so I checked the heap using this function:
and everything seems OK: there are always at least 10k bytes available. When using Zephyr > 3.3.0, I was able to get the call stack of exactly where it fails, and it always fails at
If the memory stays stable, I wouldn't say it is a leak. It might still be a memory fragmentation issue. If I am not mistaken, Zephyr also has a function that tells you the largest contiguous block of memory. Can you double-check that?
Are you talking about …? For fun, I tried allocating 700 int pointers and then freeing them right after, to try to fragment the memory. No change. It still crashes at the 3rd open.
I would prefer to see the statistics with …. When you allocate and deallocate right after, the memory may end up in much the same state as before.
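To illustrate the distinction between total free memory and the largest contiguous block, here is a host-side simulation. It is not Zephyr's allocator: it assumes a hypothetical first-fit heap of 64 equal-sized blocks. Allocating and then freeing everything leaves one contiguous free region, while freeing every other block leaves plenty of free space in total but no room for even a two-block allocation.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_BLOCKS 64 /* hypothetical heap made of 64 equal-sized blocks */

static bool used[HEAP_BLOCKS];

/* First-fit allocation of n contiguous blocks; returns start index or -1. */
static int heap_alloc(size_t n) {
    size_t run = 0;
    for (size_t i = 0; i < HEAP_BLOCKS; i++) {
        run = used[i] ? 0 : run + 1;
        if (run == n) {
            size_t start = i + 1 - n;
            for (size_t j = start; j <= i; j++) {
                used[j] = true;
            }
            return (int)start;
        }
    }
    return -1;
}

/* Releases n blocks starting at index start. */
static void heap_free(int start, size_t n) {
    for (size_t j = 0; j < n; j++) {
        used[(size_t)start + j] = false;
    }
}

/* Total number of free blocks, wherever they are. */
static size_t total_free(void) {
    size_t count = 0;
    for (size_t i = 0; i < HEAP_BLOCKS; i++) {
        count += used[i] ? 0 : 1;
    }
    return count;
}

/* Length of the longest run of adjacent free blocks. */
static size_t largest_free_run(void) {
    size_t best = 0, run = 0;
    for (size_t i = 0; i < HEAP_BLOCKS; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best) {
            best = run;
        }
    }
    return best;
}
```

After allocating all 64 blocks and freeing every even-indexed one, `total_free()` reports 32 while `largest_free_run()` reports 1, so `heap_alloc(2)` returns -1. This is why a healthy-looking free byte count does not rule out fragmentation.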
Here is a log with a call to
@dcorbeil can you share your project with us? Not only the
I think I am able to replicate the same issue. @dcorbeil did you try the most recent Zephyr versions, 3.3.0 or 3.4.0?
@cguimaraes I tried Zephyr 3.3.0 and 3.4.0 with similar results. Of course, when using 3.4.0, I had to pass the
Hi,
A couple of weeks back I asked on the Discord how to handle the situation where, in a multi-router network, the router that the zenoh-pico client is connected to goes down. The recommendation I was given was to check whether the session is still valid by periodically calling z_check(). I tried doing just that, but it didn't work: even when the router that the zenoh-pico client was connected to went down for whatever reason, z_check() kept returning true. For the publisher use case, I ended up using the return value from z_publisher_put() instead. This works well for one restart. The problem I run into is that the second restart triggers a USAGE FAULT.
Setup:
- z_pub_session_retry.c
- zenohd v0.7.0-rc
Steps to reproduce:
- z_pub_session_retry.c example from this PR
- zenohd on two different machines that can function as a zenoh router (I started it by running RUST_LOG=DEBUG zenohd; the debug output makes it very easy to know which one is connected to the zenoh-pico client)
- USAGE FAULT on the microcontroller
Expected behavior:
My intent with this PR is to easily show the code that I'm using and to find a solution to this problem.
I initially ran into a slightly different issue than this one while using a fairly recent version of Zephyr (30efd04), but for the sake of simplicity and to remove variables, I'm reporting the issue as if I'm using PlatformIO. I suppose you are all using PlatformIO for zenoh-pico development?
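The publisher workaround described above (reopen the session when z_publisher_put() fails, since z_check() kept returning true) boils down to a small loop. The sketch below is host-side only and does not reproduce the USAGE FAULT. `mock_session_t` and the `mock_open`/`mock_put`/`mock_close` functions are hypothetical stand-ins for z_open() plus publisher declaration, z_publisher_put(), and the teardown before z_close(), with a counter that simulates a router being down for a number of puts. Making the control flow explicit may help when checking that everything from one session is released before the next open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a zenoh-pico session plus its publisher. */
typedef struct {
    bool open;     /* true while a session is established */
    int opens;     /* number of times the session was (re)opened */
    int fail_puts; /* simulated: how many puts fail (router down) */
} mock_session_t;

/* Stand-in for opening the session and declaring the publisher. */
static bool mock_open(mock_session_t *s) {
    s->open = true;
    s->opens++;
    return true;
}

/* Stand-in for undeclaring the publisher and closing the session. */
static void mock_close(mock_session_t *s) {
    s->open = false;
}

/* Stand-in for z_publisher_put(): non-zero return means failure here. */
static int8_t mock_put(mock_session_t *s) {
    if (s->fail_puts > 0) {
        s->fail_puts--;
        return -1;
    }
    return 0;
}

/* Publishes once; on failure, tears the session down and reopens it,
   giving up after max_reopens additional attempts. */
static int8_t publish_with_reopen(mock_session_t *s, int max_reopens) {
    for (int attempt = 0; attempt <= max_reopens; attempt++) {
        if (!s->open && !mock_open(s)) {
            continue; /* open failed: try again on the next attempt */
        }
        if (mock_put(s) == 0) {
            return 0;
        }
        mock_close(s); /* release the old session before reopening */
    }
    return -1;
}
```

With two simulated failed puts and up to three reopens, the loop succeeds on the third session. In the real example, the `mock_close()` step is where the full teardown belongs (in zenoh-pico's own examples, presumably stopping the read and lease tasks before z_close()), which is worth checking given that the crash appears only on a later reopen.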