Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use before assignment of calldata struct instance inside a function does not throw an error #15483

Open
haoyang9804 opened this issue Oct 7, 2024 · 7 comments
Labels

Comments

@haoyang9804
Copy link
Contributor

haoyang9804 commented Oct 7, 2024

Description

The following program is a trivially correct one. I copy data from calldata to memory.

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S memory s3;
    s3 = s;
  }
}

Then I mutate this test program into the following:

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S calldata s2;
    S memory s3;
    s3 = s2; // fail: This variable is of calldata pointer type and can be accessed without prior assignment, which would lead to undefined behaviour.
  }
}

This test program causes an error, saying that This variable is of calldata pointer type and can be accessed without prior assignment, which would lead to undefined behaviour.. This is understandable. calldata is used to receive data from other contracts. So the calldata in a function without initialization is empty and should be initialized first.

I continue the mutation by initializing s2 first like the below and it passed the compilation.

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S calldata s2 = s;
    S memory s3;
    s3 = s2; // pass
  }
}

Now I wonder if the compiler can find an incorrect initialization of a calldata slot. So I initialize the calldata slot with itself, an initialized calldata slot, to get the following program:

contract A {
  struct S {
    int a;
  }
  function f (S calldata s) public pure {
    S calldata s2;
    s2 = s2;
    S memory s3;
    s3 = s2; // pass
  }
}

Interestingly, the above test program does not trigger an expected error with message such as This variable is of calldata pointer type and can be accessed without prior assignment, which would lead to undefined behaviour. but passed the compilation.

Environment

  • Compiler version:0.8.28-develop.2024.9.30+commit.33b67f0a
  • Operating system: macos

Steps to Reproduce

Just compile the above programs and you will reproduce the compilation results.

@ekpyron
Copy link
Member

ekpyron commented Oct 7, 2024

This is somewhat intentional as a deliberate workaround to allow complex initialization patterns. I.e. the following is correct in the sense that there can never be an uninitialized access, but it's in general infeasible to detect this at compile-time:

contract C {
    struct A {
        uint256 x;
    }
    struct B {
        A a1;
        A a2;
    }
    function f(B calldata b, bool c) external pure returns (uint256) {
        A calldata a;
        if (c) {
            a = b.a1;
        }
        if (!c) {
            a = b.a2;
        }
        return a.x;
    }
}

Which is why deliberately assigning to self can be used to work around the detection, i.e. this works as deliberate workaround:

contract C {
    struct A {
        uint256 x;
    }
    struct B {
        A a1;
        A a2;
    }
    function f(B calldata b, bool c) external pure returns (uint256) {
        A calldata a;
        a = a;
        if (c) {
            a = b.a1;
        }
        if (!c) {
            a = b.a2;
        }
        return a.x;
    }
}

Now the question is whether allowing this mode of workaround is generally a good thing - the idea was that it's rather unlikely that an assignment to self happens unintentionally and is thus safe to allow for allowing otherwise valid patterns through analysis. But since this is a known workaround against this kind of analysis, we'd likely need to consider changing the behavior here a breaking change.

@haoyang9804
Copy link
Contributor Author

haoyang9804 commented Oct 7, 2024

I see. Thanks for the clarification.

@ekpyron
Copy link
Member

ekpyron commented Oct 7, 2024

Not sure if this is specifically documented, but what happens for c ? a : b is that you need a common type between a and b - which is the case just if a and b have the same type - or if a's type is implicitly convertible to the type of b - or if b's type is implicitly convertible to the type of a. In the latter two cases, the respective implicit conversion will happen and the ternary gets the "common type".

So e.g. for a in memory and b in calldata, c ? a : b will be in memory and b will be copied from calldata to memory.

@haoyang9804
Copy link
Contributor Author

Not sure if this is specifically documented, but what happens for c ? a : b is that you need a common type between a and b - which is the case just if a and b have the same type - or if a's type is implicitly convertible to the type of b - or if b's type is implicitly convertible to the type of a. In the latter two cases, the respective implicit conversion will happen and the ternary gets the "common type".

So e.g. for a in memory and b in calldata, c ? a : b will be in memory and b will be copied from calldata to memory.

Following this understanding, I found an issue:

contract C{
  struct S {
    int a;
  }
  S st;
  function f(S calldata cd) public {
    st = cd; // pass, S calldata -> S storage
    true ? st : cd; // fail: True expression's type struct C.S storage pointer does not match false expression's type struct C.S calldata.
  }
}

Does it reveal the inconsistency of type checking?

@ekpyron
Copy link
Member

ekpyron commented Oct 22, 2024

That's a tricky example :-). st in st = cd, i.e. on the left-hand-side of an assignment, is a storage reference - assigning to it will trigger a deep copy from cd to st. However st in true ? st : cd (so in right-hand-side context) is a storage pointer. And there is no conversion from S calldata -> S storage (pointer).

You can also see the following fail:

contract C{
  struct S {
    int a;
  }
  S st;
  function f(S calldata cd) public {
    S storage sp = st;
    sp = cd;
  }
}

If that passed, then c ? st : cd would also be expected to pass - but it doesn't.

Now that's the technical explanation in terms of the type system. But on a higher-level, it would be very weird behaviour if c ? st : cd was allowed and typed as S storage. The only way to interpret that semantically would be that if c is false, it would be a storage pointer to st, but then what happens if c is true? You need a storage pointer then as well - where should we point? You certainly won't expect cd to be copied to st in that case.

The only way to make c ? st : cd work semantically would be to copy both arguments to memory - but the current typing rules don't allow for that, and for good reason, since you're not guaranteed that each pair of types has a unique type that both are implicitly convertible to.

@haoyang9804
Copy link
Contributor Author

That's a tricky example :-). st in st = cd, i.e. on the left-hand-side of an assignment, is a storage reference - assigning to it will trigger a deep copy from cd to st. However st in true ? st : cd (so in right-hand-side context) is a storage pointer. And there is no conversion from S calldata -> S storage (pointer).

You can also see the following fail:

contract C{
  struct S {
    int a;
  }
  S st;
  function f(S calldata cd) public {
    S storage sp = st;
    sp = cd;
  }
}

If that passed, then c ? st : cd would also be expected to pass - but it doesn't.

Now that's the technical explanation in terms of the type system. But on a higher-level, it would be very weird behaviour if c ? st : cd was allowed and typed as S storage. The only way to interpret that semantically would be that if c is false, it would be a storage pointer to st, but then what happens if c is true? You need a storage pointer then as well - where should we point? You certainly won't expect cd to be copied to st in that case.

The only way to make c ? st : cd work semantically would be to copy both arguments to memory - but the current typing rules don't allow for that, and for good reason, since you're not guaranteed that each pair of types has a unique type that both are implicitly convertible to.

Really thanks for your patient clarification. It helps a lot for my building the semantics model of Solidity to generate Solidity programs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants
@ekpyron @haoyang9804 and others