es.tell() Takes Forever to Run and Returns "Process finished with exit code 137 (interrupted by signal 9:SIGKILL)" #272

deringezgin · 2024-10-22T03:11:12Z

Hi everyone,

I am trying to optimize the weights of an LSTM using CMA-ES. In my current code, I create the LSTM model, initialize random weights, and create the CMA-ES object.

Following this, I ask CMA-ES for solutions and compute a fitness value for each one. Once I have fitness values for all solutions of the generation, I update the "cma.CMAEvolutionStrategy" object using tell.

During this process, the program uses excessive memory, around 80 GB. Moreover, when execution reaches es.tell(), the program takes forever to respond and then exits with the code 137 error in the title.

This is pseudocode for what I am doing:

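import cma

# LSTM, INPUT_SIZE, OUTPUT_SIZE, sigma and get_fitness are assumed to be defined elsewhere.
model = LSTM(
    input_size=INPUT_SIZE,
    hidden_size=128,
    output_size=OUTPUT_SIZE,
    num_lstm_layers=1,
    num_fc_layers=3,
    fc_hidden_size=64,
)

start_weights = model.get_weights()  # initial weight vector handed to CMA-ES
es = cma.CMAEvolutionStrategy(start_weights, sigma)
for i in range(100):
    solutions = es.ask()  # sample one generation of candidate weight vectors
    gen_fitness = [get_fitness(solution) for solution in solutions]
    es.tell(solutions, gen_fitness)  # update the search distribution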

I hope that this is enough information to explain the problem, and I hope that you can help me with it. My program crashes in the first iteration of es.tell(), so this is not a memory piling-up issue.


nikohansen · 2024-10-23T08:25:05Z

What is the size of start_weights?

deringezgin · 2024-10-23T14:57:40Z

Hi, thank you so much for your reply.

In the current configuration, the length of the LSTM start_weights is 102470.

I hope that this information helps. Have a nice day!

nikohansen · 2024-10-23T22:13:08Z

It looks like you are running out of memory, which grows quadratically with the above number (hence, if I am not mistaken, on the order of a few hundred GB).
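As a rough sanity check of the quadratic growth, the following sketch estimates the size of a single dense covariance matrix for this dimension, assuming 8-byte floats. How many such matrices the library actually holds at once is not stated here, so the total is left open; the helper name dense_matrix_gib is only illustrative.

```python
def dense_matrix_gib(n, bytes_per_entry=8):
    """Memory of one dense n-by-n matrix in GiB (assumes float64 entries)."""
    return n * n * bytes_per_entry / 2**30


n = 102470  # length of start_weights reported above

one_matrix_gib = dense_matrix_gib(n)   # a single full n-by-n matrix
diagonal_mib = n * 8 / 2**20           # a diagonal model stores only n entries

print(f"one dense {n} x {n} matrix: {one_matrix_gib:.1f} GiB")
print(f"diagonal of the same matrix: {diagonal_mib:.2f} MiB")
```

One matrix alone already lands near the roughly 80 GB observed, so a few working copies would plausibly reach a few hundred GB.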

I suggest replacing

es = cma.CMAEvolutionStrategy(start_weights, sigma)

with

es = cma.CMAEvolutionStrategy(start_weights, sigma, {'CMA_diagonal': True})

and see how it goes.

deringezgin · 2024-10-23T22:40:58Z

Thank you for your answer.

I tried this, and the program now runs with this configuration.

What would be the downsides of optimizing such a model using a diagonal covariance matrix? I hope that you can clarify this for me. I would like to know what changes.

Thank you so much for your help.

nikohansen · 2024-10-24T10:47:33Z

The diagonal model is the middle one in Figure 1 of the tutorial.
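As I understand it, the practical limitation is that a diagonal model scales each coordinate separately but cannot represent correlations between coordinates. The NumPy sketch below illustrates this in two dimensions; the scales, the 45 degree rotation, and the sample count are arbitrary choices for illustration and have nothing to do with the LSTM problem itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20000

# Diagonal (axis-parallel) model: one scale per coordinate, no correlation.
scales = np.array([3.0, 0.5])
diag_samples = rng.standard_normal((n_samples, 2)) * scales

# Full model: the same ellipsoid rotated by 45 degrees, so the two
# coordinates become strongly correlated.
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
full_samples = diag_samples @ rotation.T

corr_diag = np.corrcoef(diag_samples.T)[0, 1]
corr_full = np.corrcoef(full_samples.T)[0, 1]
print(f"diagonal model correlation: {corr_diag:+.3f}")
print(f"full model correlation:     {corr_full:+.3f}")
```

If the fitness landscape has strong dependencies between weights, a diagonal model has to approximate them with axis-parallel scaling only, which is the price paid for storing n values instead of an n-by-n matrix.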

nikohansen · 2024-10-24T11:53:44Z

Otherwise, these samplers are alternatives for high dimensions that do learn correlations too.

deringezgin · 2024-10-25T03:44:31Z

Thank you for your responses. I will check them out!
