Legion: non-deterministic assertion #1790

Closed
Tracked by #1032
mariodirenzo opened this issue Nov 11, 2024 · 5 comments

@mariodirenzo

The HTR++ CI intermittently trips the following assertion:

prometeo.exec: /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:10931: virtual void Legion::Internal::InnerContext::receive_created_region_contexts(const std::vector<Legion::Internal::RegionNode*>&, const std::vector<Legion::Internal::EqKDTree*>&, std::set<Legion::Internal::RtEvent>&, const Legion::Internal::ShardMapping*, Legion::ShardID): Assertion `mapping == __null' failed.

I haven't been able to collect backtraces.
I'll update the issue as soon as I have more data.

@elliottslaughter, can you please add this issue to #1032?

@mariodirenzo
Author

Here is a backtrace for the assertion. I also have a hung process with a debugger attached, so I can extract further information if needed.

(gdb) bt
#0  0x00007f64b1f929fd in nanosleep () from /lib64/libc.so.6
#1  0x00007f64b1f92894 in sleep () from /lib64/libc.so.6
#2  0x0000000006088bf8 in Realm::realm_freeze (signal=6) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/runtime_impl.cc:206
#3  <signal handler called>
#4  0x00007f64b1f03387 in raise () from /lib64/libc.so.6
#5  0x00007f64b1f04a78 in abort () from /lib64/libc.so.6
#6  0x00007f64b1efc1a6 in __assert_fail_base () from /lib64/libc.so.6
#7  0x00007f64b1efc252 in __assert_fail () from /lib64/libc.so.6
#8  0x0000000005a4ea48 in Legion::Internal::InnerContext::receive_created_region_contexts (this=0x7f5c9c283590, created_nodes=..., created_trees=..., applied_events=..., mapping=0x7f5c9c285cb0, source_shard=0) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:10931
#9  0x0000000005a4e873 in Legion::Internal::InnerContext::invalidate_created_requirement_contexts (this=0x7f5c9c28adc0, is_top=false, applied_events=..., shard_mapping=0x7f5c9c285cb0, source_shard=0) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:10912
#10 0x0000000005a4e296 in Legion::Internal::InnerContext::invalidate_region_tree_contexts (this=0x7f5c9c28adc0, is_top_level_task=false, applied=..., mapping=0x7f5c9c285cb0, source_shard=0) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:10855
#11 0x0000000005cc36da in Legion::Internal::ShardTask::trigger_task_commit (this=0x7f5c9c2869b0) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_tasks.cc:8057
#12 0x0000000005ca98a0 in Legion::Internal::TaskOp::trigger_children_committed (this=0x7f5c9c2869b0, precondition=...) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_tasks.cc:2029
#13 0x0000000005a48b99 in Legion::Internal::InnerContext::register_child_commit (this=0x7f5c9c28adc0, op=0x7f5c9409db00) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:9361
#14 0x0000000005b3c12e in Legion::Internal::Operation::commit_operation (this=0x7f5c9409db00, do_deactivate=true, wait_on=...) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_ops.cc:2301
#15 0x0000000005b5caea in Legion::Internal::DeletionOp::trigger_commit (this=0x7f5c9409db00) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_ops.cc:10648
#16 0x0000000005a47b7f in Legion::Internal::InnerContext::process_trigger_commit_queue (this=0x7f5c9c28adc0) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:9089
#17 0x0000000005a540a4 in Legion::Internal::InnerContext::handle_trigger_commit_queue (args=0x7f5c70ef7d00) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:12148
#18 0x0000000005244b21 in Legion::Internal::Runtime::legion_runtime_task (args=0x7f5c70ef7d00, arglen=12, userdata=0x90a4070, userlen=8, p=...) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/runtime.cc:32493
#19 0x000000000606b69e in Realm::LocalTaskProcessor::execute_task (this=0x8e758c0, func_id=4, task_args=...) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/proc_impl.cc:1176
#20 0x00000000060cf752 in Realm::Task::execute_on_processor (this=0x7f5c70ef7b80, p=...) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/tasks.cc:326
#21 0x00000000060d3666 in Realm::KernelThreadTaskScheduler::execute_task (this=0x8e75cb0, task=0x7f5c70ef7b80) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/tasks.cc:1421
#22 0x00000000060d24f7 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x8e75cb0) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/tasks.cc:1160
#23 0x00000000060d2b0a in Realm::ThreadedTaskScheduler::scheduler_loop_wlock (this=0x8e75cb0) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/tasks.cc:1272
#24 0x00000000060d995c in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock> (obj=0x8e75cb0) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/threads.inl:97
#25 0x00000000060e5707 in Realm::KernelThread::pthread_entry (data=0x7f5c819d98b0) at /home/gitlab-runner/legion-debug-cmake/runtime/realm/threads.cc:854
#26 0x00007f64b3ff6ea5 in start_thread () from /lib64/libpthread.so.0
#27 0x00007f64b1fcbb0d in clone () from /lib64/libc.so.6
(gdb) f 8
#8  0x0000000005a4ea48 in Legion::Internal::InnerContext::receive_created_region_contexts (this=0x7f5c9c283590, created_nodes=..., created_trees=..., applied_events=..., mapping=0x7f5c9c285cb0, source_shard=0) at /home/gitlab-runner/legion-debug-cmake/runtime/legion/legion_context.cc:10931
10931	      assert(mapping == NULL);
(gdb) p mapping
$1 = (const Legion::Internal::ShardMapping *) 0x7f5c9c285cb0
(gdb) p *mapping
$2 = {<Legion::Internal::Collectable> = {references = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 1}, static is_always_lock_free = true}}, address_spaces = {<std::_Vector_base<unsigned int, std::allocator<unsigned int> >> = {
      _M_impl = {<std::allocator<unsigned int>> = {<__gnu_cxx::new_allocator<unsigned int>> = {<No data fields>}, <No data fields>}, <std::_Vector_base<unsigned int, std::allocator<unsigned int> >::_Vector_impl_data> = {_M_start = 0x7f5c9c285ce0, _M_finish = 0x7f5c9c285ce8, _M_end_of_storage = 0x7f5c9c285ce8}, <No data fields>}}, <No data fields>}}

@lightsighter
Contributor

I think this is just an overzealous assertion that should be removed. What happens when you remove it?

@mariodirenzo
Author

The CI works fine if I remove the assertion.

@lightsighter
Contributor

Removed.

@mariodirenzo
Author

Thanks.
