This repository has been archived by the owner on May 6, 2024. It is now read-only.

Process crashes when using two NLE instances sequentially (on MacOS for Debug Builds). #254

Open
heiner opened this issue Sep 20, 2021 · 5 comments


heiner commented Sep 20, 2021

🐛 Bug

The test in #253 should pass but fails on MacOS for Debug builds.

To Reproduce

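import random

import gym
import nle  # noqa: F401  (importing nle registers the NetHack envs with gym)


ACTIONS = [0, 1, 2]


def main():
    # Create two NLE instances up front; only one is stepped at a time.
    envs = [gym.make("NetHackScore-v0") for _ in range(2)]

    env, *queue = envs
    env.reset()

    num_resets = 1

    while num_resets < 10:
        _, _, done, _ = env.step(random.choice(ACTIONS))
        if done:
            print("one env done")
            # Rotate: park the finished env and switch to the other instance.
            queue.append(env)
            env = queue.pop(0)
            print("about to reset one env")
            # The crash is triggered by using the instances sequentially like this.
            env.reset()
            num_resets += 1


main()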

Environment

Collecting environment information...
NLE version: 0.7.3+08b9280
PyTorch version: 1.9.0
Is debug build: No
CUDA used to build PyTorch: None

OS: Mac OSX 11.5.1
GCC version: Could not collect
CMake version: version 3.20.0

Python version: 3.8
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] numpysane==0.34
[pip3] torch==1.9.0
[conda] blas 1.0 mkl
[conda] mkl 2019.4 233
[conda] mkl-service 2.3.0 py38h9ed2024_0
[conda] mkl_fft 1.3.0 py38ha059aab_0
[conda] mkl_random 1.1.1 py38h959d312_0
[conda] pytorch 1.9.0 py3.8_0 pytorch


heiner commented Sep 24, 2021

This appears to only trigger on my personal machine, not on CI or for anyone else. Closing for now.

heiner closed this as completed Sep 24, 2021

heiner commented Nov 29, 2021

OK, this does break on CI as well, but only (1) on MacOS, and (2) when using a Debug build: https://github.com/facebookresearch/nle/runs/4359406818?check_suite_focus=true

heiner reopened this Nov 29, 2021
heiner changed the title from "Process crashes when using two NLE instances sequentially." to "Process crashes when using two NLE instances sequentially (on MacOS for Debug Builds)." Nov 29, 2021

heiner commented Jan 25, 2022

Issue demonstrated in #290.

heiner pushed a commit that referenced this issue Feb 2, 2022
heiner pushed a commit that referenced this issue Feb 4, 2022
Fail if libnethack is resident before dlopening.

This is the issue in #254

heiner commented Feb 8, 2022

This is related to the dlopen/dlclose dance not actually unloading the library in this specific case (which dlclose was never guaranteed to do), as in this issue.

Possible solution: https://gist.github.com/heiner/bc78064fec32174e1a216dbd5fbc6503
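
For anyone who wants to check the "library is still resident" condition from Python, here is a minimal diagnostic sketch. It is not the contents of the gist above and not NLE's own check. It assumes a platform where Python exposes os.RTLD_NOLOAD (Linux and macOS typically do), and the helper name is_resident is made up for illustration.

```python
import ctypes
import os


def is_resident(library_name: str) -> bool:
    """Return True if `library_name` is already loaded in this process.

    Hypothetical helper for diagnostics only, not part of NLE.
    """
    # RTLD_NOLOAD makes dlopen return a handle only if the library is
    # already mapped; it never loads anything new.
    noload = getattr(os, "RTLD_NOLOAD", None)
    if noload is None:
        raise NotImplementedError("RTLD_NOLOAD is not available on this platform")
    try:
        # Caveat: a successful probe takes an extra reference that ctypes
        # never releases, so the probe itself keeps the library resident.
        ctypes.CDLL(library_name, mode=noload)
    except OSError:
        return False
    return True
```

Calling is_resident with the path of the shared object right after the environment that loaded it has been closed would show whether dlclose really unloaded it. The exact path NLE loads from is not stated in this thread, so it has to be looked up separately.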


JupiLogy commented Mar 16, 2023

Hi, just wondering if it crashes with an error message at all. I'm getting a Segmentation fault when running nle, specifically when the Nethack.reset() function is called - though it's not every time. Not sure if it's a separate issue. My MWE frustratingly didn't have the issue.

EDIT: I got it working by reducing the action space as I noticed it was specifically happening when executing specific actions.
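
On the question of whether the crash produces any message: a segmentation fault normally kills the process without a Python traceback. One general way to get at least the Python-side stack is the standard library's faulthandler module. The sketch below is a suggestion rather than something used in this thread; the helper name and the default log location are assumptions.

```python
import faulthandler
import os
import tempfile


def enable_crash_traces(log_path=None):
    """Write Python tracebacks to a log file if the process hits a fatal signal.

    Hypothetical helper; the default log file name is an arbitrary choice.
    """
    if log_path is None:
        log_path = os.path.join(tempfile.gettempdir(), "nle_crash.log")
    log_file = open(log_path, "w")
    # faulthandler dumps tracebacks on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL.
    faulthandler.enable(file=log_file, all_threads=True)
    # The file must stay open while the handler is enabled, so return it
    # and let the caller keep the reference.
    return log_file
```

Calling enable_crash_traces() once at the top of the script, before creating any environment, should leave a traceback in the log file showing which Python call was active when the segfault happened.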
