This repository has been archived by the owner on May 6, 2024. It is now read-only.

Glyph ID handling #21

Open
heiner opened this issue May 12, 2020 · 8 comments

@heiner
Contributor

heiner commented May 12, 2020

Currently, each dungeon tile (ignoring the char/color/specials observation that's also available) is an int16 in the range 0 to nethack.MAX_GLYPH - 1, with nethack.MAX_GLYPH == 5976. We use an embedding lookup table of that size with embedding_dim == 32. That's 5976 * 32 == 191232 floating-point parameters; at 32 bits each, that is 191232 * 32 == 6119424 bits, or ~0.76 MB (~0.38 MB at 16 bits). That doesn't seem like much, but there are some issues with the embedding itself. Also, it does not give the agent any cue that certain ids (e.g., dog and large dog) are more closely related than others (large dog vs. wall).
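A quick check of the size arithmetic, assuming a dense table with one row per glyph id and either 4 bytes (float32, PyTorch's default dtype) or 2 bytes (a 16-bit float type) per parameter. `embedding_table_bytes` is a hypothetical helper used only for illustration.

```python
MAX_GLYPH = 5976     # nethack.MAX_GLYPH, as quoted above
EMBEDDING_DIM = 32


def embedding_table_bytes(num_rows: int, dim: int, bytes_per_param: int) -> int:
    """Bytes needed to store a dense num_rows x dim parameter table."""
    return num_rows * dim * bytes_per_param


params = MAX_GLYPH * EMBEDDING_DIM                          # number of parameters
fp32 = embedding_table_bytes(MAX_GLYPH, EMBEDDING_DIM, 4)   # float32
fp16 = embedding_table_bytes(MAX_GLYPH, EMBEDDING_DIM, 2)   # float16 / bfloat16
print(params, fp32, fp16)  # 191232 764928 382464
```

Either way the table stays under 1 MB, so memory is not the main concern; the structure of the id space is.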

The glyphs are organized as follows: first come all the monsters (NUMMONS many, which is 381), then pets (again NUMMONS many, because in theory every monster can be tame), then a single glyph for an invisible monster (at GLYPH_INVIS_OFF, which is 762), then a glyph for each "detected" monster (again NUMMONS many). For some obscure reason, corpses come next, which are not monsters (but there are NUMMONS many of them), followed by ridden monsters, which are monsters (NUMMONS many). The check glyph_is_monster(glyph) does this:

#define glyph_is_monster(glyph)                            \
    (glyph_is_normal_monster(glyph) || glyph_is_pet(glyph) \
     || glyph_is_ridden_monster(glyph) || glyph_is_detected_monster(glyph))

This makes a list like [i for i in range(nethack.MAX_GLYPH) if nethack.glyph_is_monster(i)] have length nethack.NUMMONS*4 == 1524, but the ids in it are not contiguous (the invisible glyph and the corpse block sit in between).
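To see where the 1524 comes from and why the monster ids are not contiguous, the offset chain can be rebuilt in plain Python. This is a sketch that hard-codes NetHack 3.6 values and follows the chain of offset definitions in include/display.h. It is an assumption that NUM_OBJECTS == 453, MAXEXPCHARS == 9 and EXPL_MAX == 7 match the build in question (they are not stated in this issue); real code should take the constants from the NetHack build, e.g. through nle.nethack, rather than hard-code them.

```python
# Assumed NetHack 3.6 constants.
NUMMONS = 381
NUM_OBJECTS = 453
MAXPCHARS = 96
MAXEXPCHARS = 9
EXPL_MAX = 7
NUM_ZAP = 8
WARNCOUNT = 6

# Each block starts where the previous one ends.
GLYPH_MON_OFF = 0
GLYPH_PET_OFF = GLYPH_MON_OFF + NUMMONS
GLYPH_INVIS_OFF = GLYPH_PET_OFF + NUMMONS
GLYPH_DETECT_OFF = GLYPH_INVIS_OFF + 1
GLYPH_BODY_OFF = GLYPH_DETECT_OFF + NUMMONS
GLYPH_RIDDEN_OFF = GLYPH_BODY_OFF + NUMMONS
GLYPH_OBJ_OFF = GLYPH_RIDDEN_OFF + NUMMONS
GLYPH_CMAP_OFF = GLYPH_OBJ_OFF + NUM_OBJECTS
GLYPH_EXPLODE_OFF = GLYPH_CMAP_OFF + (MAXPCHARS - MAXEXPCHARS)
GLYPH_ZAP_OFF = GLYPH_EXPLODE_OFF + MAXEXPCHARS * EXPL_MAX
GLYPH_SWALLOW_OFF = GLYPH_ZAP_OFF + (NUM_ZAP << 2)
GLYPH_WARNING_OFF = GLYPH_SWALLOW_OFF + (NUMMONS << 3)
GLYPH_STATUE_OFF = GLYPH_WARNING_OFF + WARNCOUNT
MAX_GLYPH = GLYPH_STATUE_OFF + NUMMONS


def glyph_is_monster(glyph: int) -> bool:
    """Python version of the C macro: normal, pet, ridden or detected monster."""
    return (
        GLYPH_MON_OFF <= glyph < GLYPH_PET_OFF          # normal monsters
        or GLYPH_PET_OFF <= glyph < GLYPH_INVIS_OFF     # pets
        or GLYPH_RIDDEN_OFF <= glyph < GLYPH_OBJ_OFF    # ridden monsters
        or GLYPH_DETECT_OFF <= glyph < GLYPH_BODY_OFF   # detected monsters
    )


monster_glyphs = [g for g in range(MAX_GLYPH) if glyph_is_monster(g)]
print(MAX_GLYPH, GLYPH_INVIS_OFF, len(monster_glyphs))  # 5976 762 1524
# Not contiguous: the invisible glyph and the corpse block are skipped.
print(GLYPH_INVIS_OFF in monster_glyphs, GLYPH_BODY_OFF in monster_glyphs)  # False False
```

Getting MAX_GLYPH == 5976 back out of the chain is a useful sanity check that the assumed constants match the build described here.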

Cf. https://github.com/fairinternal/NetHack/blob/rl/win/rl/helper.cc#L37 for a list of the offsets and take a look at the comment in https://github.com/fairinternal/NetHack/blob/rl/include/display.h#L235 explaining this.

After the monster blocks come NUM_OBJECTS == 453 object glyphs, then MAXPCHARS - MAXEXPCHARS == 96 - 9 == 87 cmap entries for dungeon features, then MAXEXPCHARS * EXPL_MAX == 9 * 7 == 63 explosion glyphs, then zap beams (NUM_ZAP << 2 == 8 << 2 == 32 of them). Then there are NUMMONS << 3 == 3048 (!) "swallow" glyphs. That's a lot for something that basically never happens to our agents. Then there are WARNCOUNT == 6 warning glyphs and finally NUMMONS statue glyphs.
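The swallow block is really a packed pair of values. A minimal sketch, assuming NetHack 3.6's swallow_to_glyph(mnum, loc) packing, which puts the engulfing monster's index in the high bits and one of 8 border positions in the low 3 bits. GLYPH_SWALLOW_OFF == 2541 is derived from the display.h offset chain rather than stated in this issue, the position names are an assumed ordering of the S_sw_* symbols, and decode_swallow is a hypothetical helper.

```python
from typing import Tuple

# Assumed NetHack 3.6 values.
NUMMONS = 381
GLYPH_SWALLOW_OFF = 2541
NUM_SWALLOW = NUMMONS << 3  # 8 border pieces per monster -> 3048 glyphs

# Assumed order of the S_sw_* border symbols (top-left ... bottom-right).
SWALLOW_POSITIONS = ("tl", "tc", "tr", "ml", "mr", "bl", "bc", "br")


def decode_swallow(glyph: int) -> Tuple[int, int]:
    """Split a swallow glyph into (engulfing monster index, border position 0..7)."""
    rel = glyph - GLYPH_SWALLOW_OFF
    if not 0 <= rel < NUM_SWALLOW:
        raise ValueError(f"{glyph} is not a swallow glyph")
    # Monster index in the high bits, border position in the low 3 bits.
    return rel >> 3, rel & 7


mon, pos = decode_swallow(GLYPH_SWALLOW_OFF + (5 << 3) + 2)
print(mon, SWALLOW_POSITIONS[pos])  # 5 tr
```

Read this way, the 3048 swallow ids carry only 8 distinct positions plus a monster index that could reuse whatever monster featurization the agent already has.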

As a graphic representation, the glyph ids are:

MMMMMMPPPPPPDDDDDD%%%%%RRRRRROOOOOOOCXZSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSTTTTTT
MonstsPets--DetectBody-RiddenObjectsCXZSwaaaaaaalllllllllllooooooooooowwww-----------Statue

Where

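from nle.nethack import (  # assumes NLE exposes NetHack's glyph offsets
    GLYPH_MON_OFF, GLYPH_PET_OFF, GLYPH_INVIS_OFF, GLYPH_DETECT_OFF,
    GLYPH_BODY_OFF, GLYPH_RIDDEN_OFF, GLYPH_OBJ_OFF, GLYPH_CMAP_OFF,
    GLYPH_EXPLODE_OFF, GLYPH_ZAP_OFF, GLYPH_SWALLOW_OFF, GLYPH_WARNING_OFF,
    GLYPH_STATUE_OFF, MAX_GLYPH,
)
# Block start offset -> label; comments give each block's share of MAX_GLYPH.
glyph_labels = {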
    GLYPH_MON_OFF: "M",  # 6.38%
    GLYPH_PET_OFF: "P",  # 6.38%
    GLYPH_INVIS_OFF: " ",  # 0.02%
    GLYPH_DETECT_OFF: "D",  # 6.38%
    GLYPH_BODY_OFF: "%",  # 6.38%
    GLYPH_RIDDEN_OFF: "R",  # 6.38%
    GLYPH_OBJ_OFF: "O",  # 7.58%
    GLYPH_CMAP_OFF: "C",  # 1.46%
    GLYPH_EXPLODE_OFF: "X",  # 1.05%
    GLYPH_ZAP_OFF: "Z",  # 0.54%
    GLYPH_SWALLOW_OFF: "S",  # 51.00%
    GLYPH_WARNING_OFF: "W",  # 0.10%
    GLYPH_STATUE_OFF: "T",  # 6.38%
    MAX_GLYPH: "-",
}

More than half of all glyph ids are swallow glyphs!
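The percentages in the comments follow directly from the block sizes. A small sketch that recomputes them, assuming NetHack 3.6 values for the constants not spelled out in this issue (NUM_OBJECTS == 453, MAXEXPCHARS == 9, EXPL_MAX == 7); block_sizes and shares are hypothetical names.

```python
NUMMONS = 381

# Size of each block, in glyph-id order (labels as in glyph_labels).
block_sizes = {
    "M": NUMMONS,       # monsters
    "P": NUMMONS,       # pets
    " ": 1,             # invisible monster
    "D": NUMMONS,       # detected monsters
    "%": NUMMONS,       # corpses
    "R": NUMMONS,       # ridden monsters
    "O": 453,           # objects (NUM_OBJECTS, assumed)
    "C": 96 - 9,        # cmap (MAXPCHARS - MAXEXPCHARS)
    "X": 9 * 7,         # explosions (MAXEXPCHARS * EXPL_MAX)
    "Z": 8 << 2,        # zap beams (NUM_ZAP << 2)
    "S": NUMMONS << 3,  # swallow
    "W": 6,             # warnings (WARNCOUNT)
    "T": NUMMONS,       # statues
}

max_glyph = sum(block_sizes.values())
shares = {label: 100 * n / max_glyph for label, n in block_sizes.items()}

print(max_glyph)  # 5976
for label, pct in shares.items():
    print(f"{label!r}: {pct:.2f}%")
```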

We should rethink the featurization of the glyph ids.

heiner self-assigned this May 12, 2020
heiner pushed a commit that referenced this issue Jul 15, 2020
This is OS dependent (mail daemon etc). We should really start
doing a better featurization (cf. #21).
heiner mentioned this issue Jul 29, 2020
@aleSuglia

Hey @heiner, I'm changing the original agent implementation and I was thinking of using a different embedding representation. Is this issue still valid? I saw that the PyTorch people closed the issue on their side.

@heiner
Contributor Author

heiner commented Nov 4, 2020

The PyTorch issue has to do with the speed of embeddings and is more of an aside.

This issue here describes the fact that glyph ids are not necessarily great for ML (e.g., over half of all glyph ids are of type "swallow", 99% of which will never show up in the actual game).

We are experimenting with ways to preprocess these glyphs in our agent code.

@aleSuglia

Thanks for clarifying this. So I assume this doesn't have an effect on the NeurIPS code release, right?

@dmadeka
Contributor

dmadeka commented Jul 30, 2021

Is there a mapping between the glyph id and the monster name? It would help to featurize attributes of the monster.
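A minimal sketch of the glyph-to-monster-index half of this, assuming NetHack 3.6 offsets. The helper name and MONSTER_BLOCK_STARTS are hypothetical; turning the index into a name (or attributes) still requires NetHack's mons[] table, which this sketch does not access, and swallow glyphs, which also encode a monster, are left out for brevity.

```python
from typing import Optional

NUMMONS = 381
# Assumed NetHack 3.6 start offsets of the blocks that are indexed by monster type.
MONSTER_BLOCK_STARTS = {
    "monster": 0,
    "pet": 381,
    "detected": 763,
    "corpse": 1144,
    "ridden": 1525,
    "statue": 5595,
}


def glyph_to_monster_index(glyph: int) -> Optional[int]:
    """Index into NetHack's mons[] table for glyphs tied to a monster type, else None."""
    for start in MONSTER_BLOCK_STARTS.values():
        if start <= glyph < start + NUMMONS:
            return glyph - start
    return None


print(glyph_to_monster_index(381 + 42))  # 42 (a pet)
print(glyph_to_monster_index(762))       # None (the invisible-monster glyph)
```

Corpses and statues are included because they also name a monster type; whether an agent should treat them like live monsters is a separate featurization choice.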

@heiner
Contributor Author

heiner commented Jul 30, 2021

@dmadeka
Contributor

dmadeka commented Jul 30, 2021

Amazing - thank you!

Is there a doc for what's exposed through the FFI? (I'm guessing you all are using pybind11.)

@heiner
Contributor Author

heiner commented Jul 30, 2021

Not documentation per se but the actual source code shouldn't be too hard to read: https://github.com/facebookresearch/nle/blob/master/win/rl/pynethack.cc#L502

@dmadeka
Contributor

dmadeka commented Jul 30, 2021

No, it isn't too hard to read! Thanks!
