Julia binaries of HiGHS hang after solving a simple LP on Windows #1044

Closed
jajhall opened this issue Dec 13, 2022 · 32 comments

@jajhall
Member

jajhall commented Dec 13, 2022

No description provided.

@jajhall
Member Author

jajhall commented Dec 13, 2022

I've just downloaded the v1.4.0 Windows binaries, and reproduced the errant behaviour: occasional "hanging" after HiGHS has solved the problem (forrest6). Yet I don't get it for (say) v1.2.1

I can't recreate it with a local build of HiGHS v1.4.0, which is unfortunate, as debugging via creation of the binaries is not possible.

@pjaborges

pjaborges commented Dec 16, 2022

I experienced the same.
On Windows 7 Enterprise it always works.
It sometimes stalls on Windows 10 Enterprise.

@jajhall
Member Author

jajhall commented Dec 16, 2022

Thanks: that its behaviour varies from one Windows variant to another seems only to add to the difficulty of tracking down what's happening.

I've suggested that the other user use the driver

https://github.com/ERGO-Code/HiGHS/blob/master/app/RunHighs.cpp

to create a local version of the command-line executable. It would have to be compiled, and then linked to the static library that comes with the executable that's failing to terminate. However, others are using the static library successfully to call HiGHS from their own C++ code.

Another work-around is to build HiGHS from source locally. Once CMake and the C++ compiler are set up properly, this works fine.

Finally, for the Python-oriented, HiGHS is in PyPI so HiGHS can be run using

pip install highspy

and then writing a Python driver. Note that it may be necessary to update "pip", and even "python"
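A minimal sketch of such a Python driver follows. It assumes the `highspy` calls `Highs()`, `readModel`, `run`, `getModelStatus`, `modelStatusToString` and `getInfo` behave as in the HiGHS Python documentation; the function name `solve_lp` is only illustrative and is not part of HiGHS.

```python
def solve_lp(model_path):
    """Read a model file with HiGHS and return (status_text, objective_value)."""
    # Imported inside the function so this file still loads before
    # `pip install highspy` has been run.
    import highspy

    h = highspy.Highs()
    # readModel accepts the usual HiGHS input formats (for example .mps or .lp).
    status = h.readModel(str(model_path))
    if status == highspy.HighsStatus.kError:
        raise RuntimeError(f"HiGHS could not read {model_path}")
    h.run()
    status_text = h.modelStatusToString(h.getModelStatus())
    objective = h.getInfo().objective_function_value
    return status_text, objective
```

Calling `solve_lp("model.mps")` (a placeholder path) should return the model status as text together with the objective value.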

@pjaborges

pjaborges commented Dec 17, 2022

I did some more tests (win 7) that may help track this:

With this options setup for file output, I get the message but HiGHS doesn't terminate, even though the file is written correctly to the selected directory.
image

With this setting, the file is also written to the selected directory, but HiGHS does not terminate.
image

If no options for solution output are used it works properly (but fails in win10 as mentioned previously).

@pjaborges

pjaborges commented Dec 17, 2022

It seems to me that something is definitely preventing highs from terminating.
I call highs.exe from my C# console app by starting a process with arguments. The issues above disappear once I call the Dispose method (which frees the resources used by the process) on the process. Maybe it's the connection to the output file?

@guifcoelho
Contributor

Hello there! I had the same issue with the static executable for Windows but the one with shared libraries worked fine.

@jajhall
Member Author

jajhall commented Dec 19, 2022

Thanks for your observations @pjaborges and @guifcoelho. The last "action" in
https://github.com/ERGO-Code/HiGHS/blob/master/app/RunHighs.cpp
writes out the model being solved if write_model_to_file=true

When write_model_to_file=true and the static executable hangs, the model is written OK. Since it also hangs occasionally when write_model_to_file=false, it seems safe to assume that HiGHS reaches line 83 of RunHighs.cpp.

This isn't going to be fixed in the short term, so the best advice for Windows users for whom HiGHS hangs in this way is to create their own executable by compiling and linking RunHighs.cpp.

@jajhall
Member Author

jajhall commented Feb 9, 2023

Comments on #1137 give hope of fixing this

@jannicklange
Contributor

I also stumbled over this in HiGHS 1.4.2.
I use the following options file:

write_solution_style=4
log_file=Highs_logs\2023-04-14_15-24-50-682_highs_0.log

I started HiGHS as an external process within a C# application. Sometimes the process would not exit.
The workaround that I used was to keep looking for the solution file, and kill the HiGHS process once the solution file was no longer accessed by HiGHS.
I.e. call this code in a while-loop (with some sensible Task.Delay) until no exception is thrown :D

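try
{
    // FileShare.None requests exclusive access, so Open throws
    // while HiGHS still holds the solution file.
    using (var dummy = fileName.Open(FileMode.Open, FileAccess.Read, FileShare.None))
    {
        // Nothing to read: a successful open is the signal we need.
    }
}
catch
{
    this._logger.Log(LogLevel.Debug, "Accessed file while it was still opened by HiGHS");
    continue; // retry on the next pass of the enclosing while-loop
}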
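For readers driving highs.exe from Python rather than C#, a similar watchdog can be sketched as below. This is not the same test as the exclusive-open check above: it treats the solution file as finished once its modification time has stopped changing for a grace period, and only then kills a process that is still alive. The function name `run_and_reap` and its parameters are hypothetical, and any solution file left over from a previous run should be deleted before calling it.

```python
import os
import subprocess
import time


def run_and_reap(cmd, solution_path, grace_s=2.0, poll_s=0.2, hard_timeout_s=3600.0):
    """Run cmd and kill it if it lingers after its solution file goes quiet.

    Returns (exit_code, was_killed).
    """
    proc = subprocess.Popen(cmd)
    started = time.monotonic()
    last_mtime = None     # last observed modification time of the solution file
    stable_since = None   # when that modification time was first observed
    while True:
        exit_code = proc.poll()
        if exit_code is not None:
            return exit_code, False  # the process terminated on its own
        now = time.monotonic()
        try:
            mtime = os.stat(solution_path).st_mtime_ns
        except FileNotFoundError:
            mtime = None  # solution not written yet
        if mtime != last_mtime:
            last_mtime = mtime
            stable_since = now if mtime is not None else None
        quiet = stable_since is not None and now - stable_since >= grace_s
        if quiet or now - started >= hard_timeout_s:
            proc.kill()
            proc.wait()
            return proc.returncode, True
        time.sleep(poll_s)
```

The grace period is a judgment call: it has to be longer than any pause HiGHS might take while writing the file, which this sketch cannot know.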

@jajhall
Member Author

jajhall commented Apr 14, 2023

Oh dear @jannicklange, I'm sorry you've had to do something so ugly. Can you not set threads=1 and prevent this?

I think we're going to have to set something like threads=4 by default, and modify it internally if it (or the value set by a user) exceeds half the threads available

@e-zaline

Hello,
I seem to have a similar problem.

     546484     2.9477613845e+04 Pr: 0(0); Du: 1948(0.458696) 1713s
     548358     2.9475743023e+04 Pr: 0(0); Du: 0(1.90613e-09) 1722s
     548358     2.9475743023e+04 Pr: 0(0); Du: 0(1.90613e-09) 1722s
WARNING: Number of threads available = 4 < 8 = Simplex concurrency to be used: Parallel performance may be less than anticipated
Using EKK parallel dual simplex solver - PAMI with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38972e-09) 1724s
WARNING:    Increasing Markowitz threshold to 0.5
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38508e-09) 1774s
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38508e-09) 1825s
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38508e-09) 1875s

I am using Windows 11. I can provide an example if need be.
Thank you!

@jajhall
Member Author

jajhall commented Sep 20, 2023

Thanks, but it's easy to reproduce. We just haven't got around to identifying an internal fix.

Just set 'threads=1'

@odow
Collaborator

odow commented Dec 8, 2023

@jhay778 just encountered this while working on OpenSolver. Setting threads=1 seemed to fix it. It hung pretty frequently. It looked like some of the threads were not getting cleaned up properly after the solution file was written.

They were using https://github.com/JuliaBinaryWrappers/HiGHSstatic_jll.jl/releases/tag/HiGHSstatic-v1.6.0%2B0

@odow changed the title from "HiGHS v1.4.0 occasionally hanging after solving simple LP on Windows" to "Julia binaries of HiGHS hang after solving a simple LP on Windows" on Dec 8, 2023
@odow self-assigned this on Dec 8, 2023
@jhay778

jhay778 commented Dec 14, 2023

Running highs.exe and observing it in Process Monitor, it seems that 9 threads are opened and only 2 are closed.
I can force close through task manager, causing all threads to close and process to exit. It solves the input model and writes to the solution file, but just does not terminate correctly.

Setting threads = 1 bypasses the issue for me.
Setting threads = 2 causes hanging infrequently.
Setting threads = 3 or 4 causes hanging more frequently.
Setting threads = 10 or 0 causes hanging essentially every call (most likely because only 9 threads are created when threads are unlimited anyway).

@jajhall
Member Author

jajhall commented Dec 14, 2023

Thanks, this is useful information in addressing this issue

@NPC

NPC commented Jul 8, 2024

@jhay778 (I know it was a while ago, so if you can remember) did you set threads via the command line, or via the parameters file? I tried the command line and get a “no such parameter” error.

PS This is still happening, v1.7.1 highs.exe on Win11. I hope to post more data once I've investigated more.

@jajhall
Member Author

jajhall commented Jul 9, 2024

No, threads cannot be set as a command line parameter, only in a file

https://ergo-code.github.io/HiGHS/stable/executable/#Command-line-options
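Since threads has to go in a file, one way to script the workaround is to generate the options file and pass it with `--options_file`. The sketch below assumes the `--options_file` and `--model_file` command-line options described on the page linked above; the helper names and the file paths are hypothetical.

```python
from pathlib import Path


def write_options_file(path, options):
    """Write one name=value pair per line, the form used by HiGHS options files."""
    lines = [f"{name}={value}" for name, value in options.items()]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def highs_command(executable, model_file, options_file):
    """Build the argument list for running the HiGHS executable on one model."""
    return [
        str(executable),
        "--options_file", str(options_file),
        "--model_file", str(model_file),
    ]
```

For example, `write_options_file("highs_options.txt", {"threads": 1})` followed by `highs_command("highs.exe", "model.mps", "highs_options.txt")` gives a command list that can be handed to `subprocess.run`.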

@NPC

NPC commented Jul 12, 2024

@jajhall here are my observations, perhaps they'll help, but I understand that Windows is not your typical environment, so it's hard to solve issues you don't even observe.

The issue occurs on very simple LP problems, infrequently. In my stress test, it usually hangs only after hundreds of iterations (adding a sleep between calls, as suggested in the issue linked in this discussion, doesn't seem to help). Here's the MPS I used in stress testing:

NAME          001-basic
ROWS
 N  _OBJ_
 L  R3
 L  R2
COLUMNS
    C1        _OBJ_     -.89000000000   R2        1.00000000000
    C1        R3        1.00000000000
RHS
    RHS1      R2       100.00000000000   R3       100.00000000000
BOUNDS
 UP BND1      C1        100.00000000000
ENDATA

It seems to only require presolve, here's the log (the “failed” iteration, I can't see any difference from the “good” ones):

Running HiGHS 1.7.1 (git hash: 43329e528): Copyright (c) 2024 HiGHS under MIT licence terms
LP   001-basic has 2 rows; 1 cols; 2 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [9e-01, 9e-01]
  Bound  [1e+02, 1e+02]
  RHS    [1e+02, 1e+02]
Presolving model
0 rows, 0 cols, 0 nonzeros  0s
0 rows, 0 cols, 0 nonzeros  0s
Presolve : Reductions: rows 0(-2); columns 0(-1); elements 0(-2) - Reduced to empty
Solving the original LP from the solution after postsolve
Model   status      : Optimal
Objective value     : -8.9000000000e+01
HiGHS run time      :          0.00

Setting parallel=off does NOT help, oddly, but threads=1 helps. So it's not even the main LP solver that causes the issue, but something more, ahem, infrastructural (which you likely already knew).

Also, on my desktop PC with 12 physical CPU cores (24 max threads) only threads=1 helps. But on my laptop with 8 cores (16 threads) it looks like threads=2 is stable. This is probably useless to you, but still odd.
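A stress test of this kind can be sketched in a few lines of Python. This is not the harness used above, only an illustration: it runs the same command repeatedly and counts every run that fails to exit within a timeout as a hang. The name `count_hangs` and the timeout value are assumptions.

```python
import subprocess


def count_hangs(cmd, runs, timeout_s):
    """Run cmd `runs` times and count the runs that exceed timeout_s seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            # On timeout, subprocess.run kills the child before raising.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
                check=False,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

With a model that solves in well under a second, a timeout of a minute or so leaves little doubt that a timed-out run was really stuck rather than slow.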

PS In all of the above I'm using highs.exe, I switched to it from calling the DLL from C# after not being able to resolve #1547 (comment). HiGHS instability on Windows is a concern, to be honest, but so far we've been able to find workarounds, and your advice is always highly appreciated.

PPS Have you considered adding timestamps to each .log line? It's not a big deal, but would help seeing when the last log entry was updated, plus relative timings between lines. For appended logs it would help identify individual runs (including date).

@galabovaa
Contributor

Thank you for the observations and comments. I will try again to dig further into this one. You are right in saying that HiGHS instability on Windows is a concern.

@NPC

NPC commented Sep 10, 2024

@galabovaa Thank you for agreeing that this is a real concern. The problem of the process hanging is becoming increasingly disruptive for us, as it started to affect our customers. I'll try to introduce our own measures, e.g. attempt to kill the process after it saves a solution (ugh!), but an update from you would be highly appreciated if you had an opportunity to investigate this.

@jajhall
Member Author

jajhall commented Sep 10, 2024

Setting the HiGHS option threads=1 appears to prevent it from hanging, and has minimal impact on performance.

@NPC

NPC commented Sep 10, 2024

@jajhall performance drops by ≈30% for me, not negligible. But you are right, it seems to be the price to pay for stability. I just keep hoping that this is a temporary “solution”.

@jajhall
Member Author

jajhall commented Sep 10, 2024

Solving a MIP?

@NPC

NPC commented Sep 10, 2024

LP

@jajhall
Member Author

jajhall commented Sep 10, 2024

Ah, with parallel simplex. I assume that people aren't using that.

If you're solving one-off LPs - or not hot starting after modifications - the interior point solver may well be faster. Have you tried it?

@NPC

NPC commented Sep 10, 2024

@jajhall just tried ipm again (thanks for the reminder, I didn't check it for a long time!), it's more than x2 slower than parallel simplex with threads=1 (and x3 slower than parallel simplex with threads=0, sigh).

PS Btw, parallel simplex with threads=1 is still about 10% faster than serial simplex, so I plan to continue using it, even if you think it's an unpopular option.

@jajhall
Member Author

jajhall commented Sep 10, 2024

You've got an interesting model there, particularly as parallel simplex with threads=1 is still about 10% faster than serial simplex. The expectation is that it's slower than vanilla dual simplex.

threads=0 means that HiGHS manages half the number of threads on the machine, but (usually) will only use 8 in the dual simplex solver. It's possible that setting threads=8 will get you the old performance, but avoid the issue of "hanging".

All this uses the default dual simplex concurrency of 8. This may not be the best for your problems. You can experiment by lowering this value by setting (for example)
`simplex_max_concurrency=4`.

Larger values for dual simplex concurrency are unlikely to give better performance, so it's not possible to set simplex_max_concurrency to a value greater than 8.
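The arithmetic described in this comment can be written out as a small sketch. It only restates the comment (threads=0 means half the machine's threads, the default concurrency is 8, and 8 is the maximum) together with the earlier log warning "Number of threads available = 4 < 8"; it is not taken from the HiGHS source, and the function name is hypothetical.

```python
def describe_thread_setup(threads_option, hardware_threads, simplex_max_concurrency=8):
    """Return (threads_available, fewer_threads_than_concurrency).

    Assumption: with threads=0, half of the hardware threads are available;
    otherwise the option value is used as given.
    """
    if simplex_max_concurrency > 8:
        raise ValueError("simplex_max_concurrency cannot exceed 8")
    if threads_option > 0:
        available = threads_option
    else:
        available = max(1, hardware_threads // 2)
    # The condition under which the log warns that parallel performance
    # may be less than anticipated.
    return available, available < simplex_max_concurrency
```

Under these assumptions, `describe_thread_setup(0, 8)` gives 4 available threads with the warning condition true, while `describe_thread_setup(8, 24)` gives 8 threads and no warning.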

Would you be able to share a typical model with me somehow?

@NPC

NPC commented Sep 10, 2024

Thank you for your advice! Setting simplex_max_concurrency to 4 still causes the issue to happen on my PC (in a stress test of 5,000 iterations), sadly.

In general, even if it worked, I don't think I can trust results from just one hardware configuration. I don't have control over customer systems and can't tailor stress-tested custom configs for each one. E.g. see my feedback from earlier tests, “…on my desktop PC with 12 physical CPU cores (24 max threads) only threads=1 helps. But on my laptop with 8 cores (16 threads) it looks like threads=2 is stable.” (I meant that on the laptop 1 is stable, and 2 is stable as well.)

Even with threads=1 I'm not fully confident it solves the issue completely… But it seems to help for now, until I hear otherwise.

Hard to say what's a typical model, but here's an example of a “slow” MPS I use for performance tests: 63593-slow.zip. My timings:

  • Simplex parallel threads=0: 319s
    • threads=1: 481s
  • Simplex serial: 550s
  • Interior point: 1068s
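As a quick check, the ratios quoted earlier in the thread can be recomputed from these four timings. The dictionary keys are just labels chosen for this sketch.

```python
# Wall-clock timings in seconds, copied from the list above.
timings_s = {
    "parallel simplex, threads=0": 319,
    "parallel simplex, threads=1": 481,
    "serial simplex": 550,
    "interior point": 1068,
}


def slowdown(slower, faster):
    """Return how many times longer `slower` takes than `faster`."""
    return timings_s[slower] / timings_s[faster]


# Interior point vs parallel simplex with threads=1: about 2.22x.
ipm_vs_threads1 = slowdown("interior point", "parallel simplex, threads=1")
# Interior point vs parallel simplex with threads=0: about 3.35x.
ipm_vs_threads0 = slowdown("interior point", "parallel simplex, threads=0")
# Serial simplex vs parallel simplex with threads=1: about 1.14x.
serial_vs_threads1 = slowdown("serial simplex", "parallel simplex, threads=1")
# threads=1 vs threads=0 within parallel simplex: about 1.51x.
threads1_vs_threads0 = slowdown("parallel simplex, threads=1", "parallel simplex, threads=0")
```

These figures are consistent with the "more than x2" and "x3" slowdowns reported for the interior point solver, and they put the cost of threads=1 on this model at roughly 1.5 times the threads=0 run time.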

Thanks again for your quick responses and for looking into this.

@jajhall
Member Author

jajhall commented Sep 11, 2024

Thanks for this.

The IPM solver fails after 800s, so reverts to serial simplex - which solves the problem in a further 775s

Vanilla dual simplex takes 745s

Dual simplex with concurrency/threads of 4 takes 516s

It'll be an interesting test for our new IPM solver

@jajhall
Member Author

jajhall commented Sep 26, 2024

Closed by #1942

@jajhall closed this as completed on Sep 26, 2024
@NPC

NPC commented Oct 21, 2024

@jajhall @galabovaa my tests so far confirm that this issue is fixed in 1.8.0, thank you!

@jajhall
Member Author

jajhall commented Oct 21, 2024

> my tests so far confirm that this issue is fixed in 1.8.0, thank you!

Great: thanks for letting us know!
