Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

volk_get_index broken, stuck in infinite loop #516

Closed
marcusmueller opened this issue Sep 20, 2021 · 17 comments · Fixed by #685
Closed

volk_get_index broken, stuck in infinite loop #516

marcusmueller opened this issue Sep 20, 2021 · 17 comments · Fixed by #685
Labels

Comments

@marcusmueller
Copy link
Member

While debugging gnuradio/gnuradio#5013, I stumbled across volk_get_index calling itself indefinitely when using volk_8u_x2_encodeframepolar_8u (at least on Deb11). No matter the reason for finding the right implementation of that kernel failing, this function mustn't go into infinite recursion.

Maybe, we should just actually return -1; and deal with that (by aborting) in the calling functions volk_rank_archs and volk.tmpl.c: ${kern.name}_manual. That would at least make debugging easier.

(gdb) bt
#0  volk_get_index (impl_name=<optimized out>, n_impls=<optimized out>, impl_names=<optimized out>) at ./lib/volk_rank_archs.c:44
#1  volk_rank_archs (kern_name=kern_name@entry=0x7fd65caaf788 "volk_8u_x2_encodeframepolar_8u", impl_names=impl_names@entry=0x7fd65cb8da90 <volk_machine_avx2_64_mmx_orc+67632>, impl_deps=impl_deps@entry=0x7fd65cb8db50 <volk_machine_avx2_64_mmx_orc+67824>,
    alignment=alignment@entry=0x7fd65cb8dbb0 <volk_machine_avx2_64_mmx_orc+67920>, n_impls=n_impls@entry=4, align=align@entry=true) at ./lib/volk_rank_archs.c:68
#2  0x00007fd65c986088 in __init_volk_8u_x2_encodeframepolar_8u () at ./obj-x86_64-linux-gnu/lib/volk.c:10997
#3  __volk_8u_x2_encodeframepolar_8u (frame=0x10d7a80 "", temp=0x12c1da0 "", frame_size=16) at ./obj-x86_64-linux-gnu/lib/volk.c:11022
#4  0x00007fd65b998230 in gr::fec::code::polar_decoder_sc_systematic::generic_work(void*, void*) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#5  0x00007fd65b95a37f in gr::fec::decoder_impl::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#6  0x00007fd65cf2de7a in gr::block_executor::run_one_iteration() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#7  0x00007fd65cfa159a in gr::tpb_thread_body::tpb_thread_body(std::shared_ptr<gr::block>, std::shared_ptr<boost::barrier>, int) () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#8  0x00007fd65cf92b44 in gr::thread::thread_body_wrapper<gr::tpb_container>::operator()() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#9  0x00007fd65d1bf787 in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:179
#10 0x00007fd65f0c3ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007fd65ee56def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) n

Thread 3 "fec_decoder4" hit Breakpoint 2, volk_get_index (impl_name=<optimized out>, n_impls=<optimized out>, impl_names=<optimized out>) at ./lib/volk_rank_archs.c:36
36	in ./lib/volk_rank_archs.c
(gdb) bt
#0  volk_get_index (impl_name=<optimized out>, n_impls=<optimized out>, impl_names=<optimized out>) at ./lib/volk_rank_archs.c:36
#1  volk_rank_archs (kern_name=kern_name@entry=0x7fd65caaf788 "volk_8u_x2_encodeframepolar_8u", impl_names=impl_names@entry=0x7fd65cb8da90 <volk_machine_avx2_64_mmx_orc+67632>, impl_deps=impl_deps@entry=0x7fd65cb8db50 <volk_machine_avx2_64_mmx_orc+67824>,
    alignment=alignment@entry=0x7fd65cb8dbb0 <volk_machine_avx2_64_mmx_orc+67920>, n_impls=n_impls@entry=4, align=align@entry=true) at ./lib/volk_rank_archs.c:68
#2  0x00007fd65c986088 in __init_volk_8u_x2_encodeframepolar_8u () at ./obj-x86_64-linux-gnu/lib/volk.c:10997
#3  __volk_8u_x2_encodeframepolar_8u (frame=0x10d7a80 "", temp=0x12c1da0 "", frame_size=16) at ./obj-x86_64-linux-gnu/lib/volk.c:11022
#4  0x00007fd65b998230 in gr::fec::code::polar_decoder_sc_systematic::generic_work(void*, void*) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#5  0x00007fd65b95a37f in gr::fec::decoder_impl::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) () from /gnuradio/build/gr-fec/lib/libgnuradio-fec.so.3.10.0git
#6  0x00007fd65cf2de7a in gr::block_executor::run_one_iteration() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#7  0x00007fd65cfa159a in gr::tpb_thread_body::tpb_thread_body(std::shared_ptr<gr::block>, std::shared_ptr<boost::barrier>, int) () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#8  0x00007fd65cf92b44 in gr::thread::thread_body_wrapper<gr::tpb_container>::operator()() () from /gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime.so.3.10.0git
#9  0x00007fd65d1bf787 in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:179
#10 0x00007fd65f0c3ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007fd65ee56def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)

This is a tail recursion in volk_get_index. Sadly, it will never terminate (until it causes memory exhaustion in ctest, I guess).

volk_get_index is a bag of mixed emotions for me:

int volk_get_index(const char* impl_names[], // list of implementations by name
                   const size_t n_impls,     // number of implementations available
                   const char* impl_name     // the implementation name to find
)
{
    unsigned int i;
    for (i = 0; i < n_impls; i++) {
        if (!strncmp(impl_names[i], impl_name, 20)) {
            return i;
        }
    }
    // TODO return -1;
    // something terrible should happen here
    fprintf(stderr, "Volk warning: no arch found, returning generic impl\n");
    return volk_get_index(impl_names, n_impls, "generic"); // but we'll fake it for now
}
  1. strncmp(a,b,20): We'll happily compare equal impls that are only different after the first 20 characters.
  2. It never checks whether the impl that can't be found is the _generic one. If that is the case, it just recurses to looking up the generic one, and we end up where we are.

What is confusing is why neither volk_get_info nor the volk_profile tool know of volk_8u_x2_encodeframepolar_8u_generic. In fact, the latter doesn't seem to want to know any implementations of that at all.

Originally posted by @marcusmueller in gnuradio/gnuradio#5013 (comment)

@marcusmueller marcusmueller changed the title volk_get_index broken, needs to be fixed volk_get_index broken, stuck in infinite loop Sep 20, 2021
@jdemel
Copy link
Contributor

jdemel commented Sep 21, 2021

Thanks for the report and investigation!

If we could come up with a system that is smarter than "iterate over a list of strings and compare", that'd be great.

A couple of issues:

  1. We search for a substring and just hope the kernel name does not exceed 20 characters. What would be best practice in these cases? Rely on a string terminator?
  2. We never check if "generic" is findable. We should do so in case the first loop does not return.
  3. We need to decide what to do in case, this whole thing fails. return -1 seems to be correct.
  4. Our code to find, load and assign function pointers for our kernels is rather obscure. I assume we want to think about options to improve it. Memory leak in volk_load_preferences #440 is another obstacle and I'm not yet convinced threadsafe: Load preferences threadsafe #453 would be the correct approach to fix it.

@marcusmueller
Copy link
Member Author

We search for a substring and just hope the kernel name does not exceed 20 characters. What would be best practice in these cases? Rely on a string terminator?

Well, ideally, both the list of implementation and the substring would already be sanitized, so strcmp would work.

We never check if "generic" is findable. We should do so in case the first loop does not return.

agreed.

We need to decide what to do in case, this whole thing fails. return -1 seems to be correct.

But it needs to come with some error handling.

Our code to find, load and assign function pointers for our kernels is rather obscure. I assume we want to think about options to improve it.

Agreed. I especially think the way the impls of a kernel are stored is inelegant and doesn't reflect our needs: there's separate lists for _u and _a kernels, instead of one sorted-by-speed list, where the byte-aligment is just an integer property of the implementation description, that upon constructing things can be checked. In essence, a sorted list of the following struct would make more sense:

struct impl_descriptor {
  unsigned int rank; // from benchmarking, or 0 if not benchmarked yet
  machine_emum machine; //e.g. GENERIC, or MMX. Don't lug around strings for comparison.
  short alignment;
  function_ptr impl; // the actual function pointer we'll call
  char* full_machine_name;  // or string or whatever, something like generic_presorting or sse4_alternative, or zeroptr if "pure" machine
  // leaves us room for future extension, e.g. a field for in-placeness.
};

We could then have a kernel_descriptor and use it to select the best kernel:

struct kernel_descriptor {
  char* name; // 0-terminated

  //sorted list of impls, sorted by volk_profile rank, followed by machine, followed by alignment
  struct impl_descriptor **implementations; // 0-terminated list of pointers to impl descriptors
};

struct impl_descriptor* find_best_match(struct kernel_descriptor* kernel,
                                        enum machine mach,
                                        short aligment) {
  struct impl_descriptor *current_impl = kernel->implementations[0];
  while(current_impl != 0) {
    if(current_impl->machine == mach && current_impl->alignment <= alignment)
      return current_impl;
    current_impl++;
  }
}

honestly, the whole magic would be in sorting **implementations, and that would be 4 lines of code in C++11, but is an error-prone ride in C; not quite sure why we need our initialization to be C, not C++?

@jdemel
Copy link
Contributor

jdemel commented Sep 23, 2021

Regarding C vs. C++:
Everything that goes into libvolk... is written in C. At least it is supposed to.
All the apps and tools are written in C++. e.g. tests, volk_profile, ...
Further, we provide a C++ wrapper around libvolk. That one needs some fixes that break our API.

If we want to change this, we need to discuss it and make a conscious decision.

An immediate fix here might be:

// add this if statement
if (strncmp(impl_name, "generic", 20)) {
    return -1;
}
// proceed 
return volk_get_index(impl_names, n_impls, "generic");

Though, it may cause issues wherever the return goes.

We might just up the character limit

if (!strncmp(impl_names[i], impl_name, 42)) // instead of 20

Or we could figure out how many characters are available in impl_names[i] and use that value. I assume all impl names are \0 terminated.

Besides, I'd like to sketch an idea

Most users just use the function pointer, or _u and _a versions thereof. In these cases, we need to load the correct pointers on the first call and then just leave them there. Our current mechanism works but has issues like the one we discuss.

e.g. volk_profile needs the _manual function that enables us to select a specific implementation. However, it calls this function as well.
I'd like to split this into 2 parts. One function that returns the desired function pointer and one that does manual. So users may use the function pointers manually. We had request about that already.
Also, we can better separate the "Find the correct impl" and the "profile an impl" part.

jdemel added a commit to jdemel/volk that referenced this issue Sep 26, 2021
This function results in an infinite loop on Debian 11 for some impls.
This is a first step to fix it.

Fix gnuradio#516

Signed-off-by: Johannes Demel <[email protected]>
jdemel added a commit to jdemel/volk that referenced this issue Sep 26, 2021
This function results in an infinite loop on Debian 11 for some impls.
This is a first step to fix it.

Fix gnuradio#516

Signed-off-by: Johannes Demel <[email protected]>
jdemel added a commit to jdemel/volk that referenced this issue Oct 2, 2021
This function results in an infinite loop on Debian 11 for some impls.
This is a first step to fix it.

Fix gnuradio#516

Signed-off-by: Johannes Demel <[email protected]>
jdemel added a commit to jdemel/volk that referenced this issue Oct 2, 2021
This function results in an infinite loop on Debian 11 for some impls.
This is a first step to fix it.

Fix gnuradio#516

Signed-off-by: Johannes Demel <[email protected]>
@jdemel
Copy link
Contributor

jdemel commented Oct 4, 2021

@marcusmueller Thanks for:

podman run --cap-add=SYS_PTRACE -it --rm gnuradio/ci:debian-11-3.10

I'll try it as soon as I can.

@jdemel
Copy link
Contributor

jdemel commented Oct 8, 2021

I'd like to add a few more specifics. I work in a Docker Container:

FROM debian:bullseye
RUN         apt-get update && apt-get install -y libvolk2-dev libvolk2-bin libvolk2-doc

I build and run the container with:

docker build -t debian11-bin .
docker run -it --rm debian11-bin

In that container, I run volk_profile with various args. Most run through without an issue. The one that causes an issue:

# volk_profile -n -R polar   

RUN_VOLK_TESTS: volk_8u_x3_encodepolarpuppet_8u(131071,1987)
generic completed in 1374.23 ms
u_ssse3 completed in 732.772 ms
u_avx2 completed in 718.598 ms
a_ssse3 completed in 741.293 ms
a_avx2 completed in 620.256 ms
Best aligned arch: a_avx2
Best unaligned arch: u_avx2
RUN_VOLK_TESTS: volk_32f_8u_polarbutterflypuppet_32f(131071,1987)
^C

I have to kill the last process because nothing ever happens. Except a threads causes 100% CPU load and the fans on my machine yell at me. On the same machine, source build, but outside that container:

volk_profile -n -R polarbutter 

RUN_VOLK_TESTS: volk_32f_8u_polarbutterflypuppet_32f(131071,1987)
generic completed in 6013.26 ms
u_avx completed in 894.084 ms
u_avx2 completed in 861.248 ms
Best aligned arch: u_avx2
Best unaligned arch: u_avx2

So it works on a Ubuntu 20.04 host. @maitbot reports that volk_profile runs flawlessly on Debian 11 machines. This might be a container issue. Anywaus, let's try a Debian 11 build in a container.

FROM debian:bullseye
RUN apt-get update && apt-get install -y build-essential git cmake python3-mako python3-distutils liborc-dev

Run the container and build VOLK inside:

docker build -t volk-debian11 .
docker run -it -v "$(pwd)":/opt --rm volk-debian11
# cd /opt
# mkdir docker-build
# cd docker-build
# cmake ..
# make -j8
# ./apps/volk_profile -n -R polarbutter
Warning: this IS a dry-run. Config will not be written!
RUN_VOLK_TESTS: volk_32f_8u_polarbutterflypuppet_32f(131071,1987)
generic completed in 5502.05 ms
u_avx completed in 872.164 ms
u_avx2 completed in 877.575 ms
Best aligned arch: u_avx
Best unaligned arch: u_avx
Warning: this was a dry-run. Config not generated

I toggle 2 build options in my VOLK build

--- i/CMakeLists.txt
+++ w/CMakeLists.txt
@@ -49,6 +49,7 @@ if(HAVE_C_LIMITED_RANGE)
 endif(HAVE_C_LIMITED_RANGE)
 
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
+#set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -fsanitize=undefined")
 set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
 add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)

Specifically, I add the build flag -fsanitize=undefined. If it is commented out, volk_profile -n -R polarbutter goes through as expected. With -fsanitize=undefined, RUN_VOLK_TESTS: volk_32f_8u_polarbutterflypuppet_32f(131071,1987) just hangs. On the plus side, the issue should be easier to reproduce. However, we still don't know what exactly breaks.

@marcusmueller
Copy link
Member Author

marcusmueller commented Oct 8, 2021 via email

@jdemel
Copy link
Contributor

jdemel commented Oct 8, 2021

Summary: I couldn't reproduce any of those issues. See below.

Unfortunately, I was to quick. Toggling -fsanitize=undefined has a serious performance impact, but is probably not the cause of this issue.
With -fsanitize=undefined

RUN_VOLK_TESTS: volk_32f_8u_polarbutterflypuppet_32f(131071,100)
generic completed in 6923.07 ms
u_avx completed in 921.302 ms
u_avx2 completed in 921.018 ms
Best aligned arch: u_avx2
Best unaligned arch: u_avx2
Warning: this was a dry-run. Config not generated

without -fsanitize=undefined:

Warning: this IS a dry-run. Config will not be written!
RUN_VOLK_TESTS: volk_32f_8u_polarbutterflypuppet_32f(131071,100)
generic completed in 310.311 ms
u_avx completed in 45.4236 ms
u_avx2 completed in 58.3035 ms
Best aligned arch: u_avx
Best unaligned arch: u_avx
Warning: this was a dry-run. Config not generated

The generic version takes >22x the execution time. The u_avx implementation >20x time. With the default 1987 iterations, it should not be a surprise that "the kernel just hangs" is a result. The volk_profile test with the Debian 11 binary shows the same poor performance but finishes.

The only clear indicator for the infinite loop would be if your terminal is flooded with

Volk warning: no arch found, returning generic impl

Even with the GNU Radio CI container: gnuradio/ci:debian-11-3.10 I can't reproduce this infinite recursion. I tried volk_profile -R encodepolar -i 100 as well. The same result, it is slow but it finishes.

I can't reproduce the issue here or gnuradio/gnuradio#5013 .

@marcusmueller
Copy link
Member Author

it should not be a surprise that "the kernel just hangs" is a result.

Hm, could you attach a debugger inside the container or at least perf top on the outside? We shouldn't be guessing here, when we have the tools to get backtraces ;)

@argilo
Copy link
Member

argilo commented Oct 15, 2022

I did a bit of digging, and this seems to be due to a problematic debian patch: gnuradio/gnuradio#5013 (comment)

@argilo
Copy link
Member

argilo commented Oct 15, 2022

Would it be possible to have the build fail if any kernel is missing a generic implementation?

@jdemel
Copy link
Contributor

jdemel commented Oct 17, 2022

Actually, this would be a good requirement. It may require a bit of digging where to implement this check.

Besides, thanks for your investigation. This sheds quite a bit of light on the issue.

This seems to be the culprit: Debian specific patch to VOLK

@jdemel
Copy link
Contributor

jdemel commented Dec 25, 2022

I suggest to close this issue because we could trace the source to a patch outside this repo. I hope this bug won't reproduce with later releases.

jdemel added a commit to jdemel/volk that referenced this issue Jan 14, 2023
This function results in an infinite loop on Debian 11 for some impls.
This is a first step to fix it.

Fix gnuradio#516

Signed-off-by: Johannes Demel <[email protected]>
@argilo
Copy link
Member

argilo commented Jul 17, 2023

I agree that this issue can be closed off, since it was not a bug in VOLK itself.

It might still be worth improving the build process so that it fails if any kernel is missing a generic implementation, but that could be tracked in a separate issue.

@jdemel
Copy link
Contributor

jdemel commented Sep 15, 2023

We found the root cause of this bug outside VOLK. I'm closing this issue now.

@jdemel jdemel closed this as completed Sep 15, 2023
@argilo
Copy link
Member

argilo commented Oct 12, 2023

Unfortunately this is not entirely fixed. There are some kernels which do not have an implementation named generic and invoking any of them while VOLK_GENERIC is set still results in an infinite loop. The kernels are:

  • volk_32f_asin_32f: only has u_generic
  • volk_32f_exp_32f: only has u_generic
  • volk_32f_s32f_add_32f: generic is inaccessible due to a bug; fixed in Fix include guards #641
  • volk_32fc_s32f_magnitude_16i: generic is inaccessible due to a bug; fixed in Fix include guards #641
  • volk_32u_reverse_32u: no generic at all

@argilo
Copy link
Member

argilo commented Oct 12, 2023

Initially I thought that asserting the presence of a generic implementation in every kernel (in volk_kernel_defs.py) would be a good idea, but the presence of volk_32u_reverse_32u (with variously named implementations and no generic) suggests to me that maybe kernels aren't required to have a generic implementation and therefore code that assumes the presence of generic is incorrect:

// TODO return -1;
// something terrible should happen here
fprintf(stderr, "Volk warning: no arch found, returning generic impl\n");
return volk_get_index(impl_names, n_impls, "generic"); // but we'll fake it for now

// If we've defined VOLK_GENERIC to be anything, always return the
// 'generic' kernel. Used in GR's QA code.
char* gen_env = getenv("VOLK_GENERIC");
if (gen_env) {
return volk_get_index(impl_names, n_impls, "generic");
}

@argilo
Copy link
Member

argilo commented Oct 12, 2023

run_volk_tests also searches for the generic implementation so that it can use it as the reference implementation, but it at least falls back to index 0 when there isn't one:

volk/lib/qa_utils.cc

Lines 716 to 723 in a26a1b8

// and now compare each output to the generic output
// first we have to know which output is the generic one, they aren't in order...
size_t generic_offset = 0;
for (size_t i = 0; i < arch_list.size(); i++) {
if (arch_list[i] == "generic") {
generic_offset = i;
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants