Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maxstack calculation #3

Open
dmcclean opened this issue Feb 8, 2011 · 4 comments
Open

Maxstack calculation #3

dmcclean opened this issue Feb 8, 2011 · 4 comments

Comments

@dmcclean
Copy link
Contributor

dmcclean commented Feb 8, 2011

Partition III, Section 1.7.4 requires correct CIL instruction sequences include a limit on the height the stack may reach during execution of that sequence.

Relatedly, Section 1.7.5 requires that the shape of the stack is the same at each location in a valid instruction sequence no matter what series of branches resulted in entry to that location.

Doing an analysis of compliance with 1.7.5 gets you the answer to 1.7.4 as a side effect, and abstract interpretation to do the 1.7.5 analysis is only a little bit more complicated (track stack shapes including types, instead of just stack height), so it might be worth doing both. Doing the analysis for the .maxstack directive is actually required for correctness; unfortunately ilasm doesn't perform this analysis on its own, it relies on the input IL. (ilasm does do some other nice things though, including rewriting "long" branches to "short" branches where possible if you use the /optimize switch)

This is a medium-sized project, between writing the abstract interpretation and combing through the documents to record the behavior of each instruction.

@tomlokhorst
Copy link
Owner

I just read those two sections, and I agree it's a good idea to do those simultaneously.
The abstract interpreter might be a bit more complicated than the maxstack analysis, but I think a large chunk, if not most, of the work will be looking up the stack behaviour for all of the instructions.

The documentation for each instruction already contains a description of stack behaviour in textual form. If there's a separate data structure describing stack behaviour, the documentation could potentially even be generated from that.

I guess (off the top of my head) we need something like:

data StackEffect = StackEffect
  { pop  :: [TypeShape]
  , push :: [TypeShape]
  }

stackEffect :: OpCode -> StackEffect
stackEffect Add = StackEffect [undefined, undefined] [undefined]
stackEffect (CallVirt rt _ _ _ ps) =
   StackEffect (concatMap primitiveTypeToTypeShape ps)
               (primitiveTypeToTypeShape rt)

primitiveTypeToTypeShape :: PrimitiveType -> [TypeShape]
primitiveTypeToTypeShape Void = []
primitiveTypeToTypeShape _    = [undefined]

@dmcclean
Copy link
Contributor Author

dmcclean commented Feb 8, 2011

The machine-readable version (as a C header file) is in Partition VI, Annex C, Section C.2. It describes both the stack behavior and whether control goes straight through or potentially/unconditionally branches, etc.

This analysis probably does get more complicated in the face of exception handling code, that is a whole other ball of wax.

@dmcclean
Copy link
Contributor Author

dmcclean commented Feb 8, 2011

The file is opcodes.def, look in %ProgramFiles%\Microsoft SDKs\Windows\v7.0A\include

Your mileage may vary depending on which OS you are on, on 64-bit it goes in the x86 program files directory, and that version number can change too.

@dmcclean
Copy link
Contributor Author

I translated the opcodes.def file into Haskell. I'm working on the abstract analysis part for stack height only (not stack shape) because it is simpler to start with.

The one quirky thing is that the pop behavior of the ret opcode depends on the declared return type of the method where the opcode appears. All of the others either have a fixed behavior or have one that depends on their MethodRef (or for calli, which is also slightly squirrelly, its CallSiteDescr) parameter.

I'm making the analysis run in the error monad because the spec disallows a fair number of strange control flow patterns (Partition III, Section 1.7.2, 1.7.5, 1.7.6) specifically for the purpose of simplifying this analysis. I'm going to return an error if the maxstack calculation happens to discover a violation of those rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants