Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run crashes on Linux if run outside of main thread #70

Closed
alexhroom opened this issue Aug 21, 2024 · 15 comments
Closed

run crashes on Linux if run outside of main thread #70

alexhroom opened this issue Aug 21, 2024 · 15 comments

Comments

@alexhroom
Copy link
Collaborator

On Linux (including IDAaaS), attempting to run the following script will produce a segfault:

from concurrent.futures import ThreadPoolExecutor

import RATapi as RAT

executor = ThreadPoolExecutor()
future = executor.submit(RAT.examples.DSPC_standard_layers)
project, results = future.result()

This is an issue because RasCAL-2 needs long-running code (e.g. RAT.run()) to run outside the main loop, as the main loop is used for handling and updating the GUI itself. It seems to run successfully on Windows (@DrPaulSharp managed to run it okay). If anyone else would like to run the script on whatever devices they have to hand that'd be useful!

Various observations:

  • This happens for any of the standard_layers or custom_XY examples but NOT the custom_layers examples which run successfully.
  • On DSPC_standard_layers it happens no matter what procedure is chosen.
  • From where it crashes, it seems that we enter the C++ code before it breaks; if using a non-calculate procedure, the Running [procedure] message is produced before it segfaults.
@DrPaulSharp
Copy link
Collaborator

Thanks Alex, I can confirm that I can run the test ok on windows, but I get a seg fault on linux (specifically through using the Windows Subsystem for Linux).

@alexhroom
Copy link
Collaborator Author

some debugging: running in gdb python3:
for custom XY:

Thread 80 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff4de00640 (LWP 11047)]
0x00007ffff38780d3 in RAT::SLDFunction (x=2.8571428571428572, SLD=..., sldVal=...) at cpp/RAT/coder_array.h:164
164	    T& operator[](SZ _index) {

for standard layers:

Thread 80 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff4de00640 (LWP 11879)]
0x00007ffff387fc56 in RAT::allocateLayersForContrast (contrastLayers=..., outParameterisedLayers=..., useImaginary=useImaginary@entry=0 '\000', thisContrastLayers_data=thisContrastLayers_data@entry=0x2af80, thisContrastLayers_size=thisContrastLayers_size@entry=0x7fff4ddbe568) at cpp/RAT/allocateLayersForContrast.cpp:56
56	          thisContrastLayers_data[i1 + n * i] = 0.0;

@alexhroom
Copy link
Collaborator Author

it seems that there are various issues around the GIL when it comes to calling Python from inside C threads?
python/cpython#111034
https://docs.python.org/3/c-api/init.html#non-python-created-threads
unsure if the problem may be that

@StephenNneji
Copy link
Contributor

Irrespective of this issue, I will recommend using a Process instead of a python thread as the latter will affect the responsiveness of the GUI. The main problem with the python process is that it uses pickling to copy inputs i.e. Project and Controls but because of some issue the Project class cannot be pickled (Apologies, for some reason we did not open an issue on GitHub).

We could either try to fix the pickling issue so we can use the run function

RAT.run(project, control)

or we could just pickle the inputs to RATMain

RATapi.rat_core.RATMain(problem_definition, cells, limits, cpp_controls, priors )

@alexhroom
Copy link
Collaborator Author

alexhroom commented Aug 22, 2024

@StephenNneji in the GUI code i was using a QtCore.QThread, which seems to be the recommended way to keep long-running things 'out of the way' without affecting the GUI. the issue above uses a regular Python thread in order to be a smaller reproduceable example (and to confirm that the segfault is on our end and not Qt's!)

@StephenNneji
Copy link
Contributor

As far as I know, QThread are just a wrapper on the python thread https://stackoverflow.com/questions/1595649/threading-in-a-pyqt-application-use-qt-threads-or-python-threads except if something has been changed in PyQt6

@alexhroom
Copy link
Collaborator Author

alexhroom commented Aug 22, 2024

yeah, with the main benefits of the wrapper being that it handles signals/slots and so on to avoid the thread choking up the GUI! as in the post:

The main difference is that QThreads are better integrated with Qt (asynchrnous signals/slots, event loop, etc.).

will try it with a ProcessPoolExecutor when we can get the thing pickled correctly anyway and see if it's more performant

@DrPaulSharp
Copy link
Collaborator

Memory checking shows an Invalid write of size 8 in the following code from allocateLayersForContrast.cpp:

      n = coder::internal::intlength(contrastLayers.size(0), contrastLayers.size
        (1));
      thisContrastLayers_size[0] = n;
      thisContrastLayers_size[1] = 5;
      for (i = 0; i < 5; i++) {
        for (i1 = 0; i1 < n; i1++) {
          thisContrastLayers_data[i1 + n * i] = 0.0;
        }
      }

Specifically, the line thisContrastLayers_data[i1 + n * i] = 0.0;. This line is merely the initialisation of the array, so does not read in any RAT data. As far as I can tell, the array indices are correct, and the array is of type double, so I'm not sure what the cause of the invalid memory access is.

@DrPaulSharp
Copy link
Collaborator

We also find a memory issue in SLDFunction.cpp. In this code, we construct arrays for all values less than and greater than a particular function. It seems to me that there is a possibility that in the way this is done in the C++ code there is the possibility of an empty array, but this doesn't occur in the matlab code.

@alexhroom
Copy link
Collaborator Author

alexhroom commented Sep 2, 2024

fascinatingly, it works completely fine if RAT is imported inside the thread:

from concurrent.futures import ThreadPoolExecutor


def thread_func():
    import RATapi as RAT

    project, results = RAT.examples.DSPC_standard_layers()
    return project, results

executor = ThreadPoolExecutor()
future = executor.submit(thread_func)
project, results = future.result()

doesn't segfault. even stranger is that if you import it outside the thread and then again inside it does segfault, but i may just be missing some detail of python import implementation here:

from concurrent.futures import ThreadPoolExecutor

import RATapi as RAT

def thread_func():
    import RATapi as THREADRAT

    project, results = THREADRAT.examples.DSPC_standard_layers()
    return project, results

executor = ThreadPoolExecutor()
future = executor.submit(thread_func)
project, results = future.result()

@DrPaulSharp
Copy link
Collaborator

DrPaulSharp commented Sep 2, 2024

I've had a fresh look at allocateLayersForContrast.m in the MATLAB source:

image

@arwelHughes Can you please confirm whether the code on line 35 should instead read: thisContrastLayers(i, :) = []; instead of thisContrastLayers(1, :) = [];. I can't guarantee this bug is causing the memory issues, but we should look to fix it urgently regardless.

@arwelHughes
Copy link

arwelHughes commented Sep 3, 2024 via email

@DrPaulSharp
Copy link
Collaborator

Thanks Arwel. Yes, the disp lines are for debugging.

@alexhroom
Copy link
Collaborator Author

since we've merged #74 i've found that RATMain does not produce a segfault in a separate process (only a separate thread). so this issue is a lot less urgent since we can use processes for multiprocessing in the GUI!

@DrPaulSharp
Copy link
Collaborator

closed by #78

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants