Process crashes when using two NLE instances sequentially (on MacOS for Debug Builds). #254

heiner · 2021-09-20T12:16:06Z

🐛 Bug

The test in #253 should pass but fails on MacOS for Debug builds.

To Reproduce

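```python
import random

import gym
import nle  # noqa: F401  (importing nle registers the NetHack* environments with gym)


ACTIONS = [0, 1, 2]


def main():
    # Create two environments up front; only one of them is stepped at a time.
    envs = [gym.make("NetHackScore-v0") for _ in range(2)]

    env, *queue = envs
    env.reset()

    num_resets = 1

    while num_resets < 10:
        _, _, done, _ = env.step(random.choice(ACTIONS))
        if done:
            print("one env done")
            # Rotate: park the finished env and switch to the other one.
            queue.append(env)
            env = queue.pop(0)
            print("about to reset one env")
            env.reset()
            num_resets += 1


main()
```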

Environment

```
Collecting environment information...
NLE version: 0.7.3+08b9280
PyTorch version: 1.9.0
Is debug build: No
CUDA used to build PyTorch: None

OS: Mac OSX 11.5.1
GCC version: Could not collect
CMake version: version 3.20.0

Python version: 3.8
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] numpysane==0.34
[pip3] torch==1.9.0
[conda] blas 1.0 mkl
[conda] mkl 2019.4 233
[conda] mkl-service 2.3.0 py38h9ed2024_0
[conda] mkl_fft 1.3.0 py38ha059aab_0
[conda] mkl_random 1.1.1 py38h959d312_0
[conda] pytorch 1.9.0 py3.8_0 pytorch
```


heiner · 2021-09-24T10:10:07Z

This appears to only trigger on my personal machine, not on CI or for anyone else. Closing for now.

heiner · 2021-11-29T21:22:15Z

OK, this does break on CI as well, but only (1) on MacOS, and (2) when using a Debug build: https://github.com/facebookresearch/nle/runs/4359406818?check_suite_focus=true

heiner · 2022-01-25T19:34:49Z

Issue demonstrated in #290.
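PR #311 (commit d75684b) makes NLE fail if libnethack is already resident before it is dlopened. As an illustration of how such a residency check can work, and not NLE's actual implementation, here is a minimal Python sketch using ctypes and `RTLD_NOLOAD`. It assumes a POSIX platform, relies on the CPython-private `_ctypes.dlclose`, and `is_resident` is a hypothetical helper name.

```python
import ctypes
import ctypes.util
import os

import _ctypes  # CPython-private module; exposes dlclose() on POSIX


def is_resident(libname: str) -> bool:
    """Return True if `libname` is already loaded into this process.

    dlopen() with RTLD_NOLOAD never loads anything: it returns a handle
    only if the library is already resident and NULL otherwise, which
    ctypes surfaces as OSError.
    """
    try:
        lib = ctypes.CDLL(libname, mode=os.RTLD_NOLOAD)
    except OSError:
        return False
    # Every successful dlopen() handle should be balanced by dlclose(),
    # so that the check itself does not keep the library pinned in memory.
    _ctypes.dlclose(lib._handle)
    return True


if __name__ == "__main__":
    libc = ctypes.util.find_library("c")  # e.g. "libc.so.6" on glibc Linux
    print(libc, libc is not None and is_resident(libc))
```

Balancing the probe with `dlclose` matters here: a check that leaked a reference would itself prevent the library from ever being unloaded.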

heiner · 2022-02-08T19:55:14Z

This is related to the dlopen/dlclose dance not actually unloading the library in this specific case (which dlclose never guaranteed), as seen in this issue.

Possible solution: https://gist.github.com/heiner/bc78064fec32174e1a216dbd5fbc6503
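The point about dlclose can be checked directly: dlclose only drops a reference, and POSIX does not require the implementation to actually remove the object from memory. The following minimal sketch (POSIX only; it uses the CPython-private `_ctypes.dlopen`/`_ctypes.dlclose`, and `still_resident_after_dlclose` is a hypothetical name) opens a library, closes it, and then probes with `RTLD_NOLOAD` to see whether it stayed mapped.

```python
import ctypes.util
import os

import _ctypes  # CPython-private; exposes dlopen()/dlclose() on POSIX


def still_resident_after_dlclose(libname: str) -> bool:
    """dlopen `libname`, dlclose it again, and report whether it stayed loaded.

    dlclose() only drops one reference; unloading is never guaranteed,
    so True is a perfectly legal outcome.
    """
    handle = _ctypes.dlopen(libname, os.RTLD_NOW | os.RTLD_LOCAL)
    _ctypes.dlclose(handle)
    try:
        # RTLD_NOLOAD: succeeds only if the library is still resident.
        probe = _ctypes.dlopen(libname, os.RTLD_NOW | os.RTLD_NOLOAD)
    except OSError:
        return False
    _ctypes.dlclose(probe)  # undo the reference taken by the probe
    return True


if __name__ == "__main__":
    libc = ctypes.util.find_library("c")
    if libc is not None:
        print(libc, still_resident_after_dlclose(libc))
```

For libc the answer is always True, since the interpreter itself depends on it. For a library like libnethack the outcome depends on the platform and build, which is why a reset scheme that relies on dlclose followed by dlopen giving fresh global state is fragile.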

JupiLogy · 2023-03-16T15:33:07Z

Hi, just wondering if it crashes with an error message at all. I'm getting a Segmentation fault when running nle, specifically when the Nethack.reset() function is called - though it's not every time. Not sure if it's a separate issue. My MWE frustratingly didn't have the issue.

EDIT: I got it working by reducing the action space as I noticed it was specifically happening when executing specific actions.

heiner mentioned this issue Sep 21, 2021

Tests for NLE when using two environments sequentially or in threads #253

Merged

heiner closed this as completed Sep 24, 2021

heiner reopened this Nov 29, 2021

heiner changed the title from "Process crashes when using two NLE instances sequentially." to "Process crashes when using two NLE instances sequentially (on MacOS for Debug Builds)." Nov 29, 2021

heiner pushed a commit that referenced this issue Feb 2, 2022

Fail if libnethack is resident before dlopening.

0c491f9

This is the issue in #254.

heiner mentioned this issue Feb 2, 2022

Fail if libnethack is resident before dlopening. #311

Merged

heiner pushed a commit that referenced this issue Feb 4, 2022

Fail if libnethack is resident before dlopening. (#311)

d75684b

Fail if libnethack is resident before dlopening. This is the issue in #254

heiner pushed a commit that referenced this issue Feb 10, 2022

Reset NetHack by overriding rw segments of dynamic library in memory.

bf81c8f

Based on investigation in https://gist.github.com/heiner/bc78064fec32174e1a216dbd5fbc6503 Fixes #254.
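Commit bf81c8f resets NetHack by overwriting the writable (rw) segments of the dynamic library in memory, which sidesteps the need for dlclose to actually unload it. The core idea, snapshotting the writable bytes once while they are pristine and copying them back on every reset, can be illustrated with the toy Python sketch below. It is not the commit's code: a ctypes array stands in for the library's rw segments, `SegmentSnapshot` is a hypothetical name, and locating the real segments of a loaded library (which depends on the platform's executable format) is out of scope.

```python
import ctypes


class SegmentSnapshot:
    """Save and restore the bytes of a writable memory region.

    Hypothetical stand-in for a library's rw segments: it only needs a
    start address and a size, and does not discover them itself.
    """

    def __init__(self, address: int, size: int):
        self.address = address
        self.size = size
        # Copy the region's current (pristine) contents into Python bytes.
        self.saved = ctypes.string_at(address, size)

    def restore(self) -> None:
        # Write the pristine bytes back over whatever the program changed.
        ctypes.memmove(self.address, self.saved, self.size)


# Toy "global state" standing in for a library's writable data.
globals_region = (ctypes.c_int * 4)(1, 2, 3, 4)
snap = SegmentSnapshot(ctypes.addressof(globals_region), ctypes.sizeof(globals_region))

globals_region[0] = 99  # the "game" mutates its globals
globals_region[3] = -7
snap.restore()  # reset in place, without dlclose/dlopen
print(list(globals_region))  # [1, 2, 3, 4]
```

Because the restore is a plain memory copy, it behaves the same whether or not the loader would have unloaded the library, which is exactly the property the dlclose-based reset lacked.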

heiner mentioned this issue Feb 10, 2022

Reset NetHack by overriding rw segments of dynamic library in memory. #312

Closed

heiner mentioned this issue May 6, 2024

TODOs 2024 heiner/nle#6

Open

5 tasks

StephenOman mentioned this issue Sep 9, 2024

Process crashes when using two NLE instances sequentially (on MacOS for Debug Builds) heiner/nle#31

Open
