Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete Transition to Control Flow Regions #1676

Open
wants to merge 113 commits into
base: main
Choose a base branch
from

Conversation

phschaad
Copy link
Collaborator

@phschaad phschaad commented Oct 8, 2024

This PR completes the transition to hierarchical control flow regions in DaCe. By nature of the significance in change that is brought through the transition to hierarchical control flow regions, this PR is rather substantial. An exhaustive listing of all adaptations is not feasible, but the most important changes and adaptations are listed below:

  • Change the default of the Python frontend to generate SDFGs using experimental CFG blocks. A subsequent PR will remove the option to not use experimental CFG blocks entirely, but this was left to a separate PR to avoid growing this one even more than it already has.
  • The option to write a pass or transformation that is not compatible with experimental blocks has been removed, forcing new transformations and passes to consider them in their design.
  • Simplifications to loop related transformations by adapting explicit loop regions.
  • Add a new pass base type, ControlFlowRegionPass: This pass works like StatePass or ScopePass, and can be extended to write a pass that applies recursively to each control flow region of an SDFG. An option can be set to either apply bottom-up or top-down.
  • A pass has been added to dead code elimination to prune empty or falsy conditional branches.
  • Include a control flow raising pass in the simplification pipeline, ensuring that even SDFGs generated without the explicit use of experimental blocks are raised to the new style SDFGs.
  • Adapt all passes and transformations currently in main DaCe to work with SDFGs containing experimental CFG blocks.
  • Almost all transformations and analyses now expect that experimental blocks are used for regular / reducible control flow, meaning some control flow analyses have been ditched to improve overall performance and reliability of DaCe, and remove redundancy.
  • Ensure all compiler backends correctly handle experimental blocks.
  • Adapt state propagation into a separate pass that has been made to use experimental blocks. Legacy state propagation has been left in for now, including tests that ensure it works as intended, to avoid making this PR even larger. However, it is planned to remove this in a subsequent PR soon.
  • A block fusion pass has been added to the simplification pipeline. This operates similar to StateFusion, but fuses no-op general control flow blocks (empty states or control flow blocks) with other control flow blocks. This reduces the number of nodes and edges in CFGs further.
  • Numerous bugfixes with respect to experimental blocks and analyses around them, thanks to the ability to now run the entire CI pipeline with them.

Note: The FV3 integration test fails and will continue to fail with this PR, since GT4Py cartesian, which is used by PyFV3, does not consider experimental blocks in their design. Since DaCe v1.0.0 will be released without this PR in it, my suggestion is to limit the application of the FV3 integration and regression tests to PRs which are made to a specific v1.0.0 maintenance branch, which is used for fixes to v1.0.0.

@phschaad phschaad added the no-ci Do not run any CI or actions for this PR label Oct 8, 2024
@tbennun tbennun removed their request for review October 29, 2024 18:30
@phschaad
Copy link
Collaborator Author

phschaad commented Nov 25, 2024

Now that 1.0 is released it is time to revisit this. I would be in favor of merging this ASAP due to the humongous change, which makes divergence a real issue. There are additionally several future additions which are at this point based on this branch, further complicating things when left unmerged. @tbennun thoughts?

@tbennun
Copy link
Collaborator

tbennun commented Nov 25, 2024

Sounds good, why are some tests failing?

@alexnick83
Copy link
Contributor

I have started a review but I don't think I will be able to finish it before sometime next week.

@phschaad
Copy link
Collaborator Author

@tbennun The only tests that are failing are expected to fail (FV3):

Note: The FV3 integration test fails and will continue to fail with this PR, since GT4Py cartesian, which is used by PyFV3, does not consider experimental blocks in their design. Since DaCe v1.0.0 will be released without this PR in it, my suggestion is to limit the application of the FV3 integration and regression tests to PRs which are made to a specific v1.0.0 maintenance branch, which is used for fixes to v1.0.0.

@tbennun
Copy link
Collaborator

tbennun commented Nov 26, 2024

@tbennun The only tests that are failing are expected to fail (FV3):

Note: The FV3 integration test fails and will continue to fail with this PR, since GT4Py cartesian, which is used by PyFV3, does not consider experimental blocks in their design. Since DaCe v1.0.0 will be released without this PR in it, my suggestion is to limit the application of the FV3 integration and regression tests to PRs which are made to a specific v1.0.0 maintenance branch, which is used for fixes to v1.0.0.

Why are they failing though (with the missing symbol)? Is this strictly a GT4Py API use problem or is there something else we can do with used_symbols?

@phschaad
Copy link
Collaborator Author

phschaad commented Nov 29, 2024

This is very hard to debug, but I did some investigating. I suspect the error observed in the pipeline is a delayed symptom of one (or more likely, many) earlier issues. When I set up the entire pyFV3-DaCe CI pipeline locally and attempt running it, there are various issues that appear before I even get to the error observed here. For instance:

  • GT4Py seems to have a bug in the DaCe backend during deserialization of certain nodes where default property values are not handled correctly (probably went under the radar so far because we only recently switched to not saving default values)
  • There are various instances where GT4Py expects node type X (usually SDFGState) but receives node type Y (usually a subclass of ControlFlowBlock)

Seeing this I conclude that this is not something we can easily address in this PR, but rather there will need to be a concentrated effort to adapt GT4Py to DaCe 1.x individually. The errors in the CI are not verbose or fine grained enough to fix these issues. I would argue that part of the motivation behind the 1.0 release was to be able to accept such a breakage on main, since there is a stable - and still maintainable - version of DaCe for GT4Py to use while adapting to DaCe 1.x (dace.next 🙃 )

@phschaad
Copy link
Collaborator Author

phschaad commented Nov 29, 2024

@FlorianDeconinck tagging you so you are aware of this discussion and can chime in :-)

@FlorianDeconinck
Copy link
Contributor

Quick notes:

  • First issue should be fixed. I pushed something a few months ago on this issue, I am a bit wondering why it's back -_-
  • We are in the process of cleaning up the bridge between gt4py.cartesian and dace which hopefully puts us in a nicer spot to fix all of it.

As always - and with 1.0 we indeed have a safe place to be- we are not here to block progress. We can unhook the test. My only ask would be to retain some future-Phillipp time so we can have support in moving to the Control Flow Regions

@phschaad
Copy link
Collaborator Author

phschaad commented Dec 2, 2024

My only ask would be to retain some future-Phillipp time so we can have support in moving to the Control Flow Regions

Absolutely, I am happy to help in adapting this change. In most places the adaptation is rather easy and involves small API changes.

@FlorianDeconinck
Copy link
Contributor

Ping @twicki @romanc - the future is breaking the bridge (in its current state).

When we unhook the bridge, please log an issue on the DaCe base I can link to on our side to re-hook everything once we transitioned.

@FlorianDeconinck
Copy link
Contributor

@tbennun One thing to note for us, is how this potentially interact with the Schedule Tree work. We would want a first version of the bridge to V1 SDFG so we can do a true A-Z and preliminary bench. Then we are happy to move into V2 realm, update our bridge and get the stree along

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 👀 In review
5 participants