Windows multithreading test failure #55

frankplow · 2023-04-02T15:44:43Z

Had some time so did a little bit more research on the Windows CI (#52) test failure.

memsetting the entire lc->sao_buffer like bab47ca does not fix the issue, so I don't think the issue is related to #26. With this change, valgrind and clang's address sanitiser don't report any memory issues.

I have compiled FFmpeg directly with MSVC/MSYS2 (i.e. not via FFVS-Project-Generator) and the problem is similar so I don't think it's anything to do with the build files. I haven't yet got the gcc/MSYS2 toolchain or MinGW gcc cross-compilation working unfortunately.

I can't get the LTRP_A_ERICSSON_3 failure to reproduce on my machine, so I don't think there's anything special about this test. The tests which fail most frequently on my machine are:

LMCS_A_Dolby_3.bit
WPP_B_Sharp_2.bit

The failures only occur when running tests concurrently; they do not occur when running the tests individually or with a single thread. I don't know whether this points towards libavcodec/vvc_thread.c at all? ~~This line is part of what is preventing cross-compilation at the moment. Should it not be testing for compiler using _MSC_VER or something instead of checking the OS?~~ See #57 for fix. Don't believe this is related.


nuomi2021 · 2023-04-03T14:01:25Z

Not sure whether MSYS2 has valgrind. If it does, maybe you can use it to do some memory checks.

frankplow · 2023-04-05T15:45:39Z

This issue is partially due to the lack of atomic operations for 8-bit types with MSVC winnt.h. Fixing this will require an upstream change (see patch here) and then changing VVCFrameThread.avails to an atomic_uint *.
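To illustrate the second half of that change, here is a minimal sketch of per-unit availability flags held in an array of word-sized atomics rather than 8-bit values. It uses standard C11 <stdatomic.h>; the AvailFlags struct and the avail_flags_* helpers are hypothetical names for illustration only, not the actual VVCFrameThread code, and the real layout of avails may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-frame availability flags.
 * Each entry is a full atomic_uint, so no 8-bit atomic operation is needed. */
typedef struct AvailFlags {
    atomic_uint *avails;  /* one word-sized atomic per unit */
    size_t       count;   /* number of entries in avails */
} AvailFlags;

/* Allocate the array and zero every flag word. Returns 0 on success, -1 on failure. */
static int avail_flags_init(AvailFlags *f, size_t count)
{
    f->avails = malloc(count * sizeof(*f->avails));
    if (!f->avails)
        return -1;
    f->count = count;
    for (size_t i = 0; i < count; i++)
        atomic_init(&f->avails[i], 0);
    return 0;
}

/* Atomically set the given bits for one unit. */
static void avail_flags_set(AvailFlags *f, size_t idx, unsigned bits)
{
    assert(idx < f->count);
    atomic_fetch_or(&f->avails[idx], bits);
}

/* Return 1 if all of the given bits are set for one unit, 0 otherwise. */
static int avail_flags_test(AvailFlags *f, size_t idx, unsigned bits)
{
    assert(idx < f->count);
    return (atomic_load(&f->avails[idx]) & bits) == bits;
}

/* Release the array. */
static void avail_flags_free(AvailFlags *f)
{
    free(f->avails);
    f->avails = NULL;
    f->count  = 0;
}
```

The trade-off is memory: each flag takes sizeof(unsigned) bytes instead of one, in exchange for relying only on word-sized atomic operations.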

With these patches + bab47ca applied, the errors are mostly gone except for LTRP_A_ERICSSON_3 – maybe there is something special about this test case after all? I can now reproduce the errors when a single test is run, rather than as a part of the suite, and the decoded MD5 is different each time. MSYS2 does not have valgrind unfortunately. I might try generating a VS solution with FFVS-Project-Generator and debugging with VS – I see how that's handy already!

nuomi2021 · 2023-04-06T11:15:45Z

> LTRP_A_ERICSSON_3

Since it always passes on Linux, it may also be related to some invalid read/write. Maybe you can try valgrind on Linux for this file and see what happens.

> I see how that's handy already!

😊

frankplow · 2023-04-06T14:26:58Z

The LTRP_A_ERICSSON_3 failure also affects Linux when assembly optimisations are enabled. I have created a new issue #59 for this.

nuomi2021 · 2023-04-30T03:41:55Z

Tried the current code b1c8bd1 with SLICES_A_HUAWEI_3.bit. We can still reproduce it, but the mismatched frame is different every time, even if I use a single thread.
Not easy to debug.

frankplow · 2023-04-30T08:45:06Z

@nuomi2021 Is that with the memset to fix #26 (like bab47ca)?

nuomi2021 · 2023-05-02T14:41:46Z

Not sure. The memset will affect thread scheduling, so even if the result is OK with the memset, it does not mean we have found the root cause.
If we can find a way to reproduce this with a single-threaded application, like checkasm, it may help us debug.

frankplow · 2023-05-05T11:59:48Z

@nuomi2021 Sometimes, like here, ffvvc-test / windows/msvc/no asm fails so I don't think it is related to assembly optimisations.

There seem to be some bitstreams which fail much more frequently than others - maybe we could try identifying these and any similarities between them which may be suspect?

nuomi2021 · 2023-05-05T13:05:13Z

They are multi-slice or multi-tile clips. But the weird thing is that the failed blocks are not at the slice/tile boundary. Pretty hard to find out what's happening. A possible way to isolate the issue, in my mind:

1. Check the failure history and put all failure-prone clips into a tmp directory.
2. Set s->nb_fcs to 1 to disable threading.
3. Run "ffmpeg.py tmp".
4. If it fails, it may not be a multithreading issue.
5. Try running rr (https://rr-project.org/) to capture a recording.
6. Replay the rr recording to debug.

frankplow mentioned this issue Apr 2, 2023

Change Win32 #56

Closed

frankplow changed the title ~~MSVC Test Failure~~ Windows multithreading test failure Apr 6, 2023

frankplow mentioned this issue Apr 9, 2023

alf: correct stride typo to fix LTRP_A_ERICSSON_3.bit #60

Merged

nuomi2021 mentioned this issue Jun 19, 2023

inter: move refPicList from FrameContext to SliceContext #95

Merged

nuomi2021 closed this as completed in #95 Jun 19, 2023
