malloc(): unaligned tcache chunk detected error #186

Closed
CihatAltiparmak opened this issue May 29, 2024 · 8 comments
Labels: bug (Something isn't working)

@CihatAltiparmak
Contributor

Tested On

  • ROS version: ROS 2 Iron (from the Docker image)
  • Latest version of rmw_zenoh_cpp

Description

Hello,
I want to report the bug below, which occurred when I ran the tests of ros2_control and ros2_controllers using this dockerfile. The error seems to occur in this line of one of the joint_trajectory_controller package's test files. I looked through the existing issues for a relevant one but couldn't find any, so I decided to open a new issue here. For more details, you can also look at this comment.

[Thread 0x7904777fe640 (LWP 20062) exited]
[       OK ] OnlyEffortTrajectoryControllers/TestTrajectoryActionsTestParameterized.test_allow_nonzero_velocity_at_trajectory_end_true/0 (2017 ms)
[ RUN      ] OnlyEffortTrajectoryControllers/TestTrajectoryActionsTestParameterized.test_allow_nonzero_velocity_at_trajectory_end_true/1
malloc(): unaligned tcache chunk detected

Thread 1 "test_trajectory" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=133060925205056) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace 
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=133060925205056) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=133060925205056) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=133060925205056, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007904a997b476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007904a99617f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007904a99c2676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7904a9b14b77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007904a99d9cfc in malloc_printerr (str=str@entry=0x7904a9b17d20 "malloc(): unaligned tcache chunk detected") at ./malloc/malloc.c:5664
#7  0x00007904a99de3dc in tcache_get (tc_idx=<optimized out>) at ./malloc/malloc.c:3195
#8  __GI___libc_malloc (bytes=512) at ./malloc/malloc.c:3313
#9  0x00007904a9c3298c in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007904a8229b58 in __gnu_cxx::new_allocator<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> > >::allocate (
    this=0x6105b6da66d0, __n=64) at /usr/include/c++/11/ext/new_allocator.h:127
#11 0x00007904a822946f in std::allocator_traits<std::allocator<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> > > >::allocate (
    __a=..., __n=64) at /usr/include/c++/11/bits/alloc_traits.h:464
#12 0x00007904a8228dfe in std::_Deque_base<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> >, std::allocator<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> > > >::_M_allocate_node (this=0x6105b6da66d0) at /usr/include/c++/11/bits/stl_deque.h:562
#13 0x00007904a829718b in std::_Deque_base<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> >, std::allocator<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> > > >::_M_create_nodes (this=0x6105b6da66d0, __nstart=0x6105b6d31f88, __nfinish=0x6105b6d31f90)
    at /usr/include/c++/11/bits/stl_deque.h:663
#14 0x00007904a82959cc in std::_Deque_base<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> >, std::allocator<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> > > >::_M_initialize_map (this=0x6105b6da66d0, __num_elements=0) at /usr/include/c++/11/bits/stl_deque.h:637
#15 0x00007904a82947f4 in std::_Deque_base<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> >, std::allocator<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> > > >::_Deque_base (this=0x6105b6da66d0) at /usr/include/c++/11/bits/stl_deque.h:439
#16 0x00007904a8293772 in std::deque<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> >, std::allocator<std::unique_ptr<rmw_zenoh_cpp::rmw_zenoh_event_status_t, std::default_delete<rmw_zenoh_cpp::rmw_zenoh_event_status_t> > > >::deque (this=0x6105b6da66d0) at /usr/include/c++/11/bits/stl_deque.h:834
#17 0x00007904a8293b11 in rmw_zenoh_cpp::EventsManager::EventsManager (this=0x6105b6da64e0) at /ws/src/rmw_zenoh/rmw_zenoh_cpp/src/detail/event.hpp:111
--Type <RET> for more, q to quit, c to continue without paging--
#18 0x00007904a8293de9 in rmw_zenoh_cpp::rmw_publisher_data_t::rmw_publisher_data_t (this=0x6105b6da63e0) at /ws/src/rmw_zenoh/rmw_zenoh_cpp/src/detail/rmw_data_types.hpp:88
#19 0x00007904a827e5b7 in rmw_create_publisher (node=0x6105b7204720, type_supports=0x7904a95bdb80, topic_name=0x6105b6e442b0 "/rosout", qos_profile=0x7fff652e1eb0, publisher_options=0x7fff652e1f30)
    at /ws/src/rmw_zenoh/rmw_zenoh_cpp/src/rmw_zenoh.cpp:504
#20 0x00007904a9e2c046 in rcl_publisher_init (publisher=publisher@entry=0x7fff652e1ea8, node=node@entry=0x6105b710d3c0, type_support=type_support@entry=0x7904a95bdb80, 
    topic_name=topic_name@entry=0x7904a9e3c950 "/rosout", options=options@entry=0x7fff652e1eb0) at ./src/rcl/publisher.c:111
#21 0x00007904a9e2c93a in rcl_logging_rosout_init_publisher_for_node (node=node@entry=0x6105b710d3c0) at ./src/rcl/logging_rosout.c:275
#22 0x00007904a9e2cb12 in rcl_logging_rosout_init_publisher_for_node (node=node@entry=0x6105b710d3c0) at ./src/rcl/logging_rosout.c:235
#23 0x00007904a9e2d0db in rcl_node_init (node=0x6105b710d3c0, name=<optimized out>, namespace_=<optimized out>, context=0x6105b6afedb0, options=0x6105b6e00120) at ./src/rcl/node.c:297
#24 0x00007904a9f78be3 in rclcpp::node_interfaces::NodeBase::NodeBase(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<rclcpp::Context>, rcl_node_options_s const&, bool, bool, std::shared_ptr<rclcpp::CallbackGroup>) () from /opt/ros/iron/lib/librclcpp.so
#25 0x00007904a9f79028 in rclcpp::Node::Node(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rclcpp::NodeOptions const&) () from /opt/ros/iron/lib/librclcpp.so
#26 0x00007904a9f7a828 in rclcpp::Node::Node(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rclcpp::NodeOptions const&) () from /opt/ros/iron/lib/librclcpp.so
#27 0x00006105b4ce597b in __gnu_cxx::new_allocator<rclcpp::Node>::construct<rclcpp::Node, char const (&) [22]> (this=0x7fff652e8fcf, __p=0x6105b71d5d30) at /usr/include/c++/11/ext/new_allocator.h:162
#28 0x00006105b4ce09b0 in std::allocator_traits<std::allocator<rclcpp::Node> >::construct<rclcpp::Node, char const (&) [22]> (__a=..., __p=0x6105b71d5d30) at /usr/include/c++/11/bits/alloc_traits.h:516
#29 0x00006105b4cda9b2 in std::_Sp_counted_ptr_inplace<rclcpp::Node, std::allocator<rclcpp::Node>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<char const (&) [22]> (this=0x6105b71d5d20, __a=...)
    at /usr/include/c++/11/bits/shared_ptr_base.h:519
#30 0x00006105b4cd1ba4 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<rclcpp::Node, std::allocator<rclcpp::Node>, char const (&) [22]> (this=0x7fff652e9178, __p=@0x7fff652e9170: 0x0, 
    __a=...) at /usr/include/c++/11/bits/shared_ptr_base.h:650
#31 0x00006105b4cc832c in std::__shared_ptr<rclcpp::Node, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<rclcpp::Node>, char const (&) [22]> (this=0x7fff652e9170, __tag=...)
    at /usr/include/c++/11/bits/shared_ptr_base.h:1342
#32 0x00006105b4cbce55 in std::shared_ptr<rclcpp::Node>::shared_ptr<std::allocator<rclcpp::Node>, char const (&) [22]> (this=0x7fff652e9170, __tag=...) at /usr/include/c++/11/bits/shared_ptr.h:409
#33 0x00006105b4caf263 in std::allocate_shared<rclcpp::Node, std::allocator<rclcpp::Node>, char const (&) [22]> (__a=...) at /usr/include/c++/11/bits/shared_ptr.h:863
#34 0x00006105b4ca25e5 in std::make_shared<rclcpp::Node, char const (&) [22]> () at /usr/include/c++/11/bits/shared_ptr.h:879
#35 0x00006105b4c925dc in test_trajectory_controllers::TrajectoryControllerTest::SetUp (this=0x6105b6b84ac0)
    at /ws/src/ros-controls/ros2_controllers/joint_trajectory_controller/test/test_trajectory_controller_utils.hpp:207
--Type <RET> for more, q to quit, c to continue without paging--
#36 0x00006105b4c96539 in TestTrajectoryActions::SetUp (this=0x6105b6b84ac0) at /ws/src/ros-controls/ros2_controllers/joint_trajectory_controller/test/test_trajectory_actions.cpp:62
#37 0x00006105b4c97cec in TestTrajectoryActionsTestParameterized::SetUp (this=0x6105b6b84ac0) at /ws/src/ros-controls/ros2_controllers/joint_trajectory_controller/test/test_trajectory_actions.cpp:211
#38 0x00006105b4d3e32b in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x6105b6b84ac0, method=&virtual testing::Test::SetUp(), location=0x6105b4d6afe3 "SetUp()")
    at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:2433
#39 0x00006105b4d379b5 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x6105b6b84ac0, method=&virtual testing::Test::SetUp(), location=0x6105b4d6afe3 "SetUp()")
    at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:2469
#40 0x00006105b4d1210a in testing::Test::Run (this=0x6105b6b84ac0) at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:2503
#41 0x00006105b4d12bc9 in testing::TestInfo::Run (this=0x6105b6b29350) at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:2684
#42 0x00006105b4d1332a in testing::TestSuite::Run (this=0x6105b6b177d0) at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:2816
#43 0x00006105b4d1f949 in testing::internal::UnitTestImpl::RunAllTests (this=0x6105b6b12b90) at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:5338
#44 0x00006105b4d3f3b2 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x6105b6b12b90, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x6105b4d1f552 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x6105b4d6b9a0 "auxiliary test code (environments or event listeners)") at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:2433
#45 0x00006105b4d38c07 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x6105b6b12b90, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x6105b4d1f552 <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x6105b4d6b9a0 "auxiliary test code (environments or event listeners)") at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:2469
#46 0x00006105b4d1e12b in testing::UnitTest::Run (this=0x6105b4de6b00 <testing::UnitTest::GetInstance()::instance>) at /opt/ros/iron/src/gtest_vendor/src/gtest.cc:4925
#47 0x00006105b4d098e4 in RUN_ALL_TESTS () at /opt/ros/iron/src/gtest_vendor/include/gtest/gtest.h:2473
#48 0x00006105b4d0985d in main (argc=1, argv=0x7fff652e9958) at /opt/ros/iron/src/gmock_vendor/src/gmock_main.cc:63
(gdb) 

How to reproduce this error

Build the given dockerfile and run the test using the commands below.

Shell 1:

docker build -f ros2_control_testing.Dockerfile -t ros2_control_testing .
docker run -it ros2_control_testing
# Inside docker container, just run
cd ws
source /opt/ros/iron/setup.bash
source install/setup.bash
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
gdb ./build/joint_trajectory_controller/test_trajectory_actions
## Inside gdb
run

Shell 2 (only needed to run rmw_zenohd):

docker exec -it <running_ros2_container_name> bash
source /opt/ros/iron/setup.bash
source install/setup.bash
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
ros2 run rmw_zenoh_cpp rmw_zenohd

Let me know if I can help further with solving this error.

@Yadunund
Member

Thanks for the detailed logs. I will try to reproduce the error.

@Yadunund added the bug label Jun 5, 2024
@Yadunund
Member

I tried with the latest rmw_zenoh and haven't been able to reproduce the problem. All the tests run, albeit with several failures. See output: https://gist.github.com/Yadunund/5e483a8fabc4f85c30a14b28b43e9ed3

@CihatAltiparmak
Contributor Author

CihatAltiparmak commented Jun 20, 2024

Hello @Yadunund, sorry for the bad dockerfile. I forgot to add a flag to the apt command and to install gdb. This updated dockerfile should work.

I want to add some information. To reproduce this error, you may need to run the test several times, because the process sometimes hangs; when that happens, run the tests again. I think this is related to the rmw_wait problem: unfortunately, the failing test uses MultiThreadedExecutor, which brings about data races inside rmw_wait. I observed hangs when MultiThreadedExecutor runs with rmw_zenoh, while SingleThreadedExecutor runs well with rmw_zenoh. For more information, you can take a look here. (It seems you already took a look 😄 By the way, I successfully ran MoveIt with rmw_zenoh using SingleThreadedExecutor and some additional modifications.)

So this issue may be related to the rmw_wait issue.
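
Since the crash is intermittent and some runs hang instead, the "run it again until it fails" procedure described above could be scripted roughly like this. It is only a sketch: run_until_failure is a hypothetical helper name (not part of rmw_zenoh or ros2_controllers), it assumes the coreutils timeout command is available, and the attempt count and timeout are arbitrary values to adjust.

```shell
# Hypothetical helper: re-run a command until it fails, retrying on hangs.
# Usage: run_until_failure MAX_ATTEMPTS TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns the first non-zero, non-timeout exit status, or 0 if none occurred.
run_until_failure() (
  # The function body is a subshell, so these variables stay local to it.
  max_attempts=$1
  timeout_s=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=0
    # coreutils `timeout` exits with 124 when it had to stop the command.
    timeout "$timeout_s" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
      echo "attempt $attempt: no exit after ${timeout_s}s (possible hang), retrying" >&2
    elif [ "$status" -ne 0 ]; then
      echo "attempt $attempt: failed with exit status $status" >&2
      exit "$status"
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure in $max_attempts attempts" >&2
  exit 0
)
```

Inside the container, after sourcing the workspace and exporting RMW_IMPLEMENTATION as in Shell 1, it could be called as run_until_failure 20 300 ./build/joint_trajectory_controller/test_trajectory_actions. A process killed by SIGABRT is typically reported by the shell as exit status 134 (128 + 6), so that status would point at an abort like the one in the backtrace. Note that ordinary test failures also return a non-zero status and will stop the loop.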

@Yadunund
Member

I've run the tests several times in a row and haven't run into the issue yet. But there are known issues in rmw_wait, so I won't discount what you're observing. We're working on some improvements on that front, and I'll ping here once we have a branch for you to test against.

@CihatAltiparmak
Contributor Author

CihatAltiparmak commented Jun 26, 2024

Just as friendly feedback: I ran the failing test case again as soon as your rmw_wait updates were merged. Now I get a segmentation fault related to a different malloc error, malloc(): smallbin double linked list corrupted. My backtrace is below. I compiled with colcon build --symlink-install --packages-up-to rmw_zenoh_cpp --cmake-args -DCMAKE_BUILD_TYPE=Debug. In addition, I verified that rmw_fastrtps_cpp doesn't produce any segmentation fault and the process ends normally. I'm patiently waiting for your developments 👀. By the way, congrats on the rmw_wait improvements 🎉.

https://gist.githubusercontent.com/CihatAltiparmak/15ade30aef5903274802b9e1ea2cebc0/raw/2db93c36e9dcb2b57f0f0b39c6276926203a5990/rmw_wait_issue_another_malloc_problem_commit_e73df29cf80723f0e49e09d1f72cff5f1d342744.txt
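
Because the abort message differs between runs (unaligned tcache chunk detected earlier, smallbin double linked list corrupted here), it may help to collect the distinct glibc malloc diagnostics across saved test logs. A minimal sketch, assuming the test output was saved to plain-text files; malloc_diagnostics is a hypothetical helper name:

```shell
# Hypothetical helper: list the distinct glibc "malloc(): ..." abort messages
# found in the given log files (or on stdin when no file is given).
malloc_diagnostics() {
  # -x matches whole lines only, so gdb frames that merely quote the message
  # (for example the malloc_printerr frame) are not picked up.
  cat "$@" | grep -x 'malloc(): .*' | sort -u
}
```

For example, malloc_diagnostics run1.log run2.log (hypothetical file names) would print one line per distinct message. Both messages come from glibc's heap consistency checks, so seeing different ones across runs is consistent with the same underlying memory corruption being detected at different points.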

@clalancette
Collaborator

@CihatAltiparmak I wasn't able to reproduce the exact error you show in that gist, but I was able to reproduce a related error. I have a fix locally that works around that issue and allows all of the joint_trajectory_controller tests to pass. That said, I'm not sure the fix I have is the right one, so I need to talk to the Zenoh developers a bit more about it.

@clalancette
Collaborator

Since we've merged in #228, I'm going to close this out as fixed. @CihatAltiparmak give it another test, and if you are still seeing problems please feel free to reopen.

@CihatAltiparmak
Contributor Author

CihatAltiparmak commented Jul 6, 2024

Hello @clalancette,
I ran the test cases again with your updates and didn't see any malloc-related issues. However, I found another problem and decided to track it in a separate issue, #240. Unfortunately, I cannot trigger the new bug in #240 as frequently this time.
