nav2 test analysis #38
Hello @evshary, this is very helpful! May I ask which tests you are referring to? I was trying to make a remote connection to my robot over wifi. I also tried both ends of the 1.0.0 PR (ros2#276), i.e., Rolling and dev/1.0.0, and both of them are unstable. With your results I can see clearly what's going wrong. If I skip the navigation, i.e., run only ROS 2 control for the motors and the lidar along with its filters, everything works fine. I can visualize the robot remotely very smoothly, and the odometry also updates properly if I move the robot with a joystick. But as soon as I try to run the nav2 stack, things go wrong. |
Hi @alireza-moayyedi |
Hi @evshary, Well, that's a surprise to be honest. This is the exact use case that I am trying to work out: On the robot side:
On a separate computer:
Expected behavior:
Actual behavior in rolling:
Actual behavior in dev/1.0.0:
I am certain that this is an rmw issue, because if I connect the separate computer directly to the robot with an ethernet cable and use CycloneDDS with an explicit peers address list and an explicit network interface, then everything works very smoothly and I can easily initialize and control the robot remotely. Of course, the downside is that I then have to follow the robot with my laptop in hand. Regarding the release, I am using the latest apt release:
|
Thank you for the detailed steps. I didn't see anything weird.
For the dev/1.0.0 version, perhaps you could share the logs with us. |
Hi @evshary, As suggested, I tried to narrow it down further and here are my findings (everything run with dev/1.0.0):
So I guess at this point we can conclude that something is going wrong with communication over wifi. Therefore, I tried to dig deeper. First, to rule out a faulty office wifi, I set up a separate router (2.4 GHz) to which only my computer and the robot were connected. I still saw the same issues as I reported originally. Here are some logs that might be relevant:
Next, I connected a display to the robot and tried running rviz simultaneously on both the robot and the remote computer to check whether they behaved differently. On the robot I managed to get the map to load in the robot's rviz while the remote computer was still not loading it (though not easily, as I explain below). Surprisingly, after giving the initial pose in the robot's rviz, amcl started to work properly, and in the remote rviz I could also see topics such as the costmaps in the map frame (still no map). I drove around a bit and it seemed stable. Here is the remote rviz showing some topics in the map frame after initializing the localization in the robot's rviz:
So then I got more suspicious of the map server and started digging deeper into it. As I mentioned earlier, it was difficult to get the map to show in the robot's rviz while also trying to visualize it in the remote rviz. I noticed some irregular behavior when I ran rviz on the remote computer first and then started the nav2 stack on the robot. For some reason, this caused the map server not to load properly:
So now, in order to make it work, I need to first run nav2 on the robot, initialize the localization in the robot's rviz, and only then run rviz on the remote.
This got me wondering whether the /map topic needs some further tuning in the zenoh router's configuration to accommodate the topic's bandwidth. Or maybe this is actually related to the rviz plugin that you mentioned, in which case I should test building nav2 from source including that fix.
Sorry for the long posts, and I very much appreciate your patience. Unfortunately, I have not yet found anyone around me who has successfully set up the Zenoh rmw in combination with nav2 for a remote connection. Therefore, I have decided to dig deeper into it myself and report it directly to you here. |
Hi @alireza-moayyedi Thank you for the detailed description. It helps a lot. I will investigate it. Feel free to share with us if there is anything else you find. |
@alireza-moayyedi you can try to tune the If you don't know the topic type name and hash, you can replace each with |
@alireza-moayyedi
1st host:
ros2 run rmw_zenoh_cpp rmw_zenohd
ros2 launch nav2_bringup rviz_launch.py
2nd host:
ros2 run rmw_zenoh_cpp rmw_zenohd
ros2 launch nav2_bringup tb3_simulation_launch.py headless:=False use_rviz:=False
It seems like everything works well. However, I found there is an issue if we run the nav2 simulation first and then rviz2. |
Hi @evshary @JEnoch, I tested the scenario @evshary described, with the difference that I used nav2's apt release. I know that it has the rviz plugin bug which is fixed in source, but it was manageable. Similar to your results, everything worked smoothly between the host and the remote. So it got me wondering what the difference is between my robot and the tb3 simulation. I compared the two map files and I noticed that:
Then I thought let's swap the maps and see what happens. So I did the following tests:
So I am guessing some configuration is not properly set to accommodate the 7.9 MB map over wifi, since it works fine when using an ethernet connection or when keeping everything on the host. I took a look at the default router config, and max_message_size: 1073741824 seems to be fine. I wonder what should be changed. @JEnoch Thanks for the tip; I will test your suggestion next. But considering my observation, do you suggest changing anything else? |
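To make the size comparison above concrete, here is a minimal sketch of the arithmetic. It assumes the 7.9 MB figure is a decimal-megabyte size of the map data and that the map is published as a nav_msgs/OccupancyGrid with one int8 per cell. The function names and the example grid dimensions are illustrative only and do not come from the thread.

```python
# Value quoted from the default router config in the comment above.
MAX_MESSAGE_SIZE = 1073741824

# Assumption: "7.9 MB" read as decimal megabytes of map data.
MAP_BYTES = 7_900_000


def occupancy_grid_bytes(width: int, height: int) -> int:
    """Payload size of an OccupancyGrid's data field (one int8 per cell)."""
    return width * height


def fits(payload_bytes: int, limit: int = MAX_MESSAGE_SIZE) -> bool:
    """True if a payload is within the configured message size limit."""
    return payload_bytes <= limit


if __name__ == "__main__":
    print("limit is exactly 1 GiB:", MAX_MESSAGE_SIZE == 2**30)
    print("map fits in limit:", fits(MAP_BYTES))
    print("fraction of limit used:", MAP_BYTES / MAX_MESSAGE_SIZE)
    # Hypothetical grid dimensions, only to show how cell count maps to bytes.
    print("example 2800 x 2800 grid:", occupancy_grid_bytes(2800, 2800), "bytes")
```

Under these assumptions the map uses well under 1% of the limit, which is consistent with the observation that max_message_size itself is unlikely to be the constraint.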
@JEnoch Following up on my previous message: I tried the downsampling, but unfortunately it did not help. |
Thank you for those detailed tests! RUST_LOG=zenoh=trace ros2 service call /map_server/load_map nav2_msgs/srv/LoadMap "{map_url: /ros/maps/map.yaml}" 2>&1 | tee service_call.log I guess the Also, can you share your 7.9 MB map somewhere so we can test it with the simulation and analyse further? |
Hello @JEnoch , Just to summarize, the problem is not the loading of the map by map server, but rather communicating the map over wifi. In other words:
So to make sure we are not missing anything, I have logged and piped the following as you requested:
I will be sending the outputs as well as our map to you and @evshary by email. Thanks for your efforts! @evshary Can you by any chance test the tb3 (or any other remote/host setup you have) with our map as the input? |
OK, I managed to reproduce the issue with simpler ROS 2 examples.
However, I can use pure Zenoh examples (both Rust and C) to send a 100 MB payload with the same configuration and environment. |
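The reproduction used above is not shown in the thread. As a rough, hypothetical sketch of what a "simpler ROS 2 example" with a large payload could look like, the rclpy publisher below sends one large std_msgs/UInt8MultiArray message. The node name, topic name, payload size, and the reliable, transient-local, depth-1 QoS are all assumptions chosen to resemble a latched map topic; they are not the maintainers' actual test.

```python
import array


def make_payload(size_bytes: int) -> array.array:
    """Build a zero-filled uint8 buffer of the requested size."""
    return array.array("B", bytes(size_bytes))


def main(size_bytes: int = 8_000_000, keep_alive_sec: float = 30.0) -> None:
    # ROS 2 imports live here so the helper above works without ROS installed.
    import time

    import rclpy
    from rclpy.qos import DurabilityPolicy, QoSProfile, ReliabilityPolicy
    from std_msgs.msg import UInt8MultiArray

    rclpy.init()
    node = rclpy.create_node("large_payload_pub")  # hypothetical node name
    # Assumed QoS: reliable and transient local, so late subscribers still
    # receive the single published message.
    qos = QoSProfile(
        depth=1,
        reliability=ReliabilityPolicy.RELIABLE,
        durability=DurabilityPolicy.TRANSIENT_LOCAL,
    )
    pub = node.create_publisher(UInt8MultiArray, "large_payload", qos)

    msg = UInt8MultiArray()
    msg.data = make_payload(size_bytes)
    pub.publish(msg)
    node.get_logger().info(f"published {len(msg.data)} bytes")

    # Stay alive for a bounded time so a remote subscriber can connect.
    deadline = time.monotonic() + keep_alive_sec
    while rclpy.ok() and time.monotonic() < deadline:
        rclpy.spin_once(node, timeout_sec=0.5)

    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    try:
        import rclpy  # noqa: F401
    except ImportError:
        print("rclpy not found; source a ROS 2 environment first")
    else:
        main()
```

Running this on one host and subscribing to the topic with a matching QoS from the other host, once over ethernet and once over wifi, would be one way to check whether payload size alone triggers the problem.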
Hello @evshary, Nice! Should this be addressed on a separate repo/issue (I assume https://github.com/ros2/rmw_zenoh/issues)? Or will you look into it yourself? |
I haven't verified it with the rolling branch yet, but I will keep looking into it for sure. At the very least, we hope this can be fixed as part of the upgrade to Zenoh 1.0 in rmw_zenoh. |
@alireza-moayyedi Now we have a branch to fix the issue. It would be great if you could give it a try and let us know whether it works for you. Thank you! |
Hi @evshary, I spent some time with the robot and tested different scenarios. The map now loads but the performance is very poor. To be more precise:
|
Hi @alireza-moayyedi
I will see how I can reproduce the issue on my side. |
Hi @evshary, I had limited time to check everything you asked for thoroughly, so I need to run more tests later this week to double-check. But from the few tests I did yesterday, I observed that:
|
No worries. We always appreciate the early adopters who give us valuable feedback. For 1, it's interesting. I will try the latest dev/1.0.0 on my side again, but at least the branch adjust_qos works on my side. Besides, it would be great if you could share the specs of your robot (I mean the computer on your robot). I'm currently using a laptop and an IPC to run the simulation, and both are powerful enough. I just want to check whether the issue is related to a resource-limited device. |
Okay, I've verified it again. It still works on my side with this commit 435186a. Let's focus on the simulation first and see what the difference is between our environments. Then we can move on to your real robot. |