Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise error if there are conflicting keys after merge #1053

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

RaulWCosta
Copy link

What?

Change the default behavior of merge to raise an error when there is a conflict in the key names after merge.
Right now, if there is a key conflict after merge, the last key specified silently overwrites the other. As shown in the test case below.

https://github.com/omry/omegaconf/blob/master/tests/test_create.py#L416

def test_yaml_merge() -> None:
    cfg = OmegaConf.create(
        dedent(
            """\
            a: &A
                x: 1
            b: &B
                y: 2
            c:
                <<: *A
                <<: *B
                x: 3
                z: 1
            """
        )
    )
    assert cfg == {"a": {"x": 1}, "b": {"y": 2}, "c": {"x": 3, "y": 2, "z": 1}}

Why?

I think there are two main points to justify this change:

  1. Duplicated keys are syntactically invalid for YAML files, as shown in the quote below (from https://yaml.org/spec/1.2-old/spec.html#id2764044)

The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique.

  1. This behavior can cause unexpected problems in running time for OmegaConf's users.

I executed the tests in a Windows 10 machine, using Python 3.10.8 running in a Conda env.

I hope this PR is useful :)

@omry
Copy link
Owner

omry commented Jan 18, 2023

Can you reproduce the problem without using yaml anchors?

@RaulWCosta
Copy link
Author

Can you reproduce the problem without using yaml anchors?

Without anchors, i.e. by loading a file with direct conflict in the keys (not 100% that was your question), it raises an error as expected. The behavior in this PR also raises an error for that case.

Do you think would be good to add a test case for that scenario?

@omry
Copy link
Owner

omry commented Jan 19, 2023

Generally speaking, anchors sucks, and you should use interpolations instead.
We are considering disabling anchors support by default to address #794.

z: 1
"""
)
)
assert cfg == {"a": {"x": 1}, "b": {"y": 2}, "c": {"x": 3, "y": 2, "z": 1}}
assert cfg == {"a": {"x": 1}, "b": {"y": 2}, "c": {"x": 1, "y": 2, "w": 3, "z": 1}}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this change in behavior?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous test would raise an error considering the new behavior of anchor merge

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there previous test would not raise an error, why is the fix for it to change the expected return value?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the behavior proposed by this PR the previous test raises an error.

This PR intend to raise an error in the case there is conflict in the keys after resolving the anchors. In this test (before the change proposed) inside c: {} we have a conflict in the x key.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused because your fix for the raised error is to change the expected output instead of expecting the error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the expected output to still have an "it works" case for the anchor resolution (I created another test case to check the raising of the error). Do you think that makes sense? Otherwise, I can surely modify it to expect an error.

@RaulWCosta
Copy link
Author

Generally speaking, anchors sucks, and you should use interpolations instead. We are considering disabling anchors support by default to address #794.

Got it! I haven't tested that scenario with interpolation, but I can do it today :)

@RaulWCosta
Copy link
Author

Generally speaking, anchors sucks, and you should use interpolations instead. We are considering disabling anchors support by default to address #794.

I didn't manage to make the example you added in #794 run with OmegaConf.load (maybe there is some aditional step that I'm not aware of?)

# test.yml
lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]

Output I got:

{'lol1': 'lol', 'lol2': ['${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}']}

Besides that, it looks like this topic is an ongoing discussion in the PR. So maybe, while this is not defined, would be useful to fix this behavior in merge.

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 27, 2023

I didn't manage to make the example you added in #794 run with OmegaConf.load (maybe there is some aditional step that I'm not aware of?)

@RaulWCosta see the docs on OmegaConf interpolations.

from omegaconf import OmegaConf

yaml_data = """
# test.yml
lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]
"""

config = OmegaConf.create(yaml_data)

assert config.lol2[0] == "lol"  # lazily access interpolated value ${lol1} -> "lol"

# Try printing `config` before and after calling OmegaConf.resolve:

print(config)
# prints {'lol1': 'lol', 'lol2': ['${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}']}

# Calling resolve would likely fill up your ram if you have lol1,lol2,...lol10 defined
OmegaConf.resolve(config)

print(config)
# prints {'lol1': 'lol', 'lol2': ['lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol']}

@omry
Copy link
Owner

omry commented Jan 28, 2023

@Jasha10, I am actually not sure interpolation can be used here: OmegaConf does not provide a parallel to YAML merge (e.g. <<: *A).

@RaulWCosta
Copy link
Author

I didn't manage to make the example you added in #794 run with OmegaConf.load (maybe there is some aditional step that I'm not aware of?)

@RaulWCosta see the docs on OmegaConf interpolations.

from omegaconf import OmegaConf

yaml_data = """
# test.yml
lol1: "lol"
lol2: ["${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}","${lol1}"]
"""

config = OmegaConf.create(yaml_data)

assert config.lol2[0] == "lol"  # lazily access interpolated value ${lol1} -> "lol"

# Try printing `config` before and after calling OmegaConf.resolve:

print(config)
# prints {'lol1': 'lol', 'lol2': ['${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}', '${lol1}']}

# Calling resolve would likely fill up your ram if you have lol1,lol2,...lol10 defined
OmegaConf.resolve(config)

print(config)
# prints {'lol1': 'lol', 'lol2': ['lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol', 'lol']}

Appreciate!

@RaulWCosta
Copy link
Author

Hey @omry really appreciate your time! Do you think it would be best if I close this PR?

@RaulWCosta RaulWCosta closed this May 15, 2023
@omry
Copy link
Owner

omry commented May 15, 2023

I would still like to get this reviewed.
I am not currently working on OmegaConf but there might be some new maintainers coming soon.

@omry omry reopened this May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants