
Add pipeline composition RFC #723

Draft · wants to merge 7 commits into main

Conversation

Hardcode84 (Contributor):

Please review these guidelines to help with the review process:

  • Have you provided a meaningful PR description?
  • Have you added a test, a reproducer, or a reference to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • If this PR is a work in progress, are you filing the PR as a draft?
  • Have you organized your commits logically and ensured each can be built by itself?

```
  LogicalResult run(Operation *op);
};
```
The `PipelineSchedule` object encapsulates the compiled pipeline graph. Its main method, `LogicalResult run(Operation *op);`, follows the existing MLIR `PassManager::run`.
Contributor:

What do you mean by the compiled pipeline graph?

Contributor Author:

The PipelineGraph object is populated with a set of pipelines and their dependencies, and then it compiles them into an internal representation which runs those pipelines in order, according to those dependencies.

Reviewer:

Is the schedule the result of the linearization of the DAG or is it the class that will linearize it?

Contributor Author:

PipelineSchedule is the linearized DAG; createPipelineSchedule does the linearization.
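As a rough illustration of what that linearization could do, here is a standalone sketch (the data structures and the name-ordered tie-breaking are assumptions drawn from this thread, not the actual implementation):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Pipeline graph keyed by name; an edge maps a pipeline to the
// pipelines that must run after it.
using Graph = std::map<std::string, std::set<std::string>>;

// Kahn's topological sort with an alphabetically ordered ready set,
// so ties are broken deterministically regardless of registration
// order. Returns an empty vector if the graph has a cycle.
std::vector<std::string> linearize(const Graph &deps) {
  std::map<std::string, int> inDegree;
  for (const auto &node : deps) {
    inDegree.emplace(node.first, 0);
    for (const std::string &succ : node.second)
      ++inDegree[succ];
  }
  std::set<std::string> ready; // ordered set -> stable tie-breaking
  for (const auto &entry : inDegree)
    if (entry.second == 0)
      ready.insert(entry.first);
  std::vector<std::string> order;
  while (!ready.empty()) {
    std::string node = *ready.begin();
    ready.erase(ready.begin());
    order.push_back(node);
    auto it = deps.find(node);
    if (it == deps.end())
      continue;
    for (const std::string &succ : it->second)
      if (--inDegree[succ] == 0)
        ready.insert(succ);
  }
  // Emitting fewer nodes than were registered means a dependency
  // cycle; a real implementation would report a proper error here.
  if (order.size() != inDegree.size())
    return {};
  return order;
}
```

For example, frontend pipelines feeding bufferization feeding backend pipelines come out in that order, with unrelated pipelines ordered alphabetically.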

```
  ArrayRef<StringRef> predecessors,
  ArrayRef<StringRef> successors,
  ArrayRef<StringRef> jumpTargets,
  std::function<void(OpPassManager &)> populateFunc);
```
@chencha3 (Contributor), Apr 19, 2024:

So the pipeline is a set of Patterns populated in populateFunc?

Contributor Author:

A pipeline is a set of passes.


## Motivation

TBD use cases from IREE, TPP
Contributor:

It would help the reader to start with a motivation. I assume that the dependency-based graph avoids mistakes when a user configures the pipeline manually and unintentionally breaks a dependency. Is that correct?

Contributor Author:

Expanded motivation.

After the user has populated the graph object, they must call the `createPipelineSchedule` method to compile the resulting graph into a runnable schedule.
`createPipelineSchedule` builds a DAG from the pipeline dependencies provided by the user and tries to find a linear execution order that satisfies these dependencies.

If two pipelines have no direct or indirect dependencies, the order in which they are executed is unspecified, but stable.
Reviewer:

I think that's asking too much of this framework. I'd say "stability depends on the passes accepting canonical forms from each other", and we should make sure we always run canonicalization between DAG nodes.

Contributor Author:

Here I only meant that the order of pipelines/passes is stable regardless of the order in which `registerPipeline` calls were made (in the POC implementation I'm just sorting by pipeline name first to make it stable), but yes, I can remove this for more implementation freedom.

Passes inside a pipeline can set this attribute to indicate that they want the compilation flow to jump to a specific point.
After the current pipeline finishes, the runtime checks whether the module object has the attribute set and, if it does, jumps to the selected pipeline and clears the attribute.

Setting the attribute to a value which wasn't in `jumpTargets` for the current pipeline results in an error and aborts the compilation flow.
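To make these jump semantics concrete, here is a hypothetical standalone model of the runtime loop (the module attribute is stood in for by a return value; none of the names come from the RFC):

```cpp
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each pipeline body may return the name of a
// pipeline to jump to (standing in for the module attribute), which
// the runtime validates against the declared jumpTargets before
// transferring control.
struct PipelineNode {
  std::string name;
  std::set<std::string> jumpTargets;
  std::function<std::optional<std::string>()> body;
};

// Runs the linearized schedule; returns false if a pipeline requests
// a jump to a target it did not declare (aborting the flow).
bool runSchedule(const std::vector<PipelineNode> &schedule,
                 std::vector<std::string> &trace) {
  size_t i = 0;
  while (i < schedule.size()) {
    const PipelineNode &node = schedule[i];
    trace.push_back(node.name);
    std::optional<std::string> target = node.body();
    if (!target) { // no jump requested: continue in schedule order
      ++i;
      continue;
    }
    if (!node.jumpTargets.count(*target))
      return false; // undeclared target: abort compilation flow
    size_t j = 0; // jump to the selected pipeline, "clearing" the attribute
    while (j < schedule.size() && schedule[j].name != *target)
      ++j;
    if (j == schedule.size())
      return false; // target not present in the schedule
    i = j;
  }
  return true;
}
```

A pipeline that returns a target not listed in its `jumpTargets` makes `runSchedule` fail, mirroring the error/abort behavior described above.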
Reviewer:

`jumpTargets` seems to be used for control flow.

I'd create conditional and looping semantics instead, as a type of sub-graph.

For example, a pipeline node that lists a bunch of passes (or sub-nodes) and has an arity (e.g. until-converge-max-n). Or another that has two sub-nodes with a select based on an IR property (e.g. DLTI target information).

Giving users the ability to jump to arbitrary targets is a foot-gun that we might not want to create.

Contributor Author:

FYI, for looping until convergence/fixed point I've added llvm/llvm-project#87166.

It works fine for simple cases like canonicalization+CSE, but in numba-mlir I had a dozen passes in the potential loop from multiple different pipelines, so I wanted explicit control over when to loop.

@rengolin left a comment:

Some items are missing from our discussion, mainly how to build the DAG and how to schedule it.

Building the DAG

Building a DAG is simply taking all passes and inserting each into the first available slot (similar to tree insertion). Since there is no implicit ordering for passes, this may cost O(n^2).

We could reduce the complexity by creating sub-graphs inside sub-graphs and connecting the super-graphs together.

For example:

  • All passes before bufferization form a sub-graph that leads into bufferization. There is no implicit order (it needs to be scheduled). This is equivalent to saying that bufferization depends on all of those passes, but explicitly joins all nodes into a single one.
  • Bufferization as a node with all cleanups
  • Same for vectorization, lowering, etc.
```
         /----\        /----\        /----\
Ingress -------- Buff -------- Vect -------- Lower -> HW
         \----/        \----/        \----/
```

Where Buff, Vect and Lower are fixed sequences of passes (per target, so can be conditional).

Scheduling

Each of those sub-graphs above will need to be scheduled. This is just graph scheduling, and can be much simpler if we hide loops and conditionals inside nodes.

Loops become a single node that is guaranteed to finish (run until convergence, but stop hard at N iterations, where N is configurable but less than a global MaxN).

If we follow the sub-graph design, then scheduling is always restricted to the sub-graph. This works well with a recursive algorithm that schedules the outer-most graph, then descends into sub-graphs, expanding them in linear form.
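The bounded loop node described above might be sketched like this (illustrative names; `bodyChanged` is an assumed callback reporting whether the iteration changed anything):

```cpp
#include <functional>

// Sketch of a loop node that hides iteration inside a single DAG
// node: run the body until it reports no further changes, but stop
// hard after maxIters iterations. Returns true if a fixed point was
// reached within the bound.
bool runUntilConverged(const std::function<bool()> &bodyChanged,
                       int maxIters) {
  for (int i = 0; i < maxIters; ++i)
    if (!bodyChanged()) // nothing changed: fixed point reached
      return true;
  return false; // hit the hard iteration cap without converging
}
```

The caller-supplied cap corresponds to the configurable N (below a global MaxN) mentioned above, guaranteeing the node terminates.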

Cleaning up

After the graph is linear (with potential loop and conditional nodes), we can start the cleanup, for example, de-duplicating passes that have no writable transforms in between.

Failure

Failure can happen at any stage above, and the error message must make clear which stage failed and what happened: creating a DAG or sub-DAG, scheduling a sub-graph, etc.

```
void registerPipeline(
    StringRef name,
    ArrayRef<StringRef> predecessors,
    ArrayRef<StringRef> successors,
```
Reviewer:

I'd also avoid having both predecessors and successors. This feels like duplication and is hard to get right on larger graphs.

What I had in mind is just:

  • Dependencies: Passes that you must run before (analyses and transforms)
  • Post-clean up: Canonicalization that can help the following passes

Dependencies can be bundles or specific passes. Bundles can be just a list of passes (e.g. buff+ownership), a loop, or a conditional (see below). Both bundles and passes have deps/cleanups, and we can simplify the graph after linearization.

Post-cleanups would also be simplified (de-duped) if one pass lists a pass as its cleanup and the following pass lists it as its dependency.
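One way this de-duplication could be sketched, assuming the linearized schedule is a flat list of pass names (an assumption, not the proposal's actual representation):

```cpp
#include <string>
#include <vector>

// Sketch of post-linearization de-duplication: when one pass emits
// "canonicalize" as its cleanup and the next pass lists
// "canonicalize" as a dependency, the expanded sequence contains it
// twice in a row; collapsing adjacent duplicates keeps one copy.
std::vector<std::string>
dedupeAdjacent(const std::vector<std::string> &passes) {
  std::vector<std::string> out;
  for (const std::string &p : passes)
    if (out.empty() || out.back() != p)
      out.push_back(p);
  return out;
}
```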

Contributor Author:

Regarding having both predecessors and successors, consider the following (hypothetical) pipeline:

```
numpy-to-linalg  torch-to-linalg
            \      /
          bufferization
            /      \
  linalg-to-cpu  linalg-to-gpu
```

We don't want bufferization to know about the specific *-to-linalg pipelines, as they are frontend details irrelevant to bufferization, and we don't want it to know about linalg-to-* either, as those are backend details. So the pipeline registrations should look like:

```
numpy-to-linalg: [], [bufferization]
torch-to-linalg: [], [bufferization]
bufferization: [], []
linalg-to-cpu: [bufferization], []
linalg-to-gpu: [bufferization], []
```
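A sketch of how such one-sided registrations could be normalized into a single edge set (a hypothetical model, not the RFC's actual API):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Edge a -> b means "a runs before b". Both predecessor and successor
// declarations fold into the same edge set, so no pipeline needs to
// know about both its frontends and its backends.
using EdgeSet = std::map<std::string, std::set<std::string>>;

void addPipeline(EdgeSet &edges, const std::string &name,
                 const std::vector<std::string> &predecessors,
                 const std::vector<std::string> &successors) {
  edges[name]; // make sure the node exists even without edges
  for (const std::string &p : predecessors)
    edges[p].insert(name); // p runs before name
  for (const std::string &s : successors)
    edges[name].insert(s); // name runs before s
}
```

Registering the five pipelines above this way leaves bufferization itself declaring nothing, yet it ends up correctly wired between the frontends and backends.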

Hardcode84 (Contributor Author) commented May 7, 2024:

Subgraphs are useful by themselves, but regarding encapsulating control flow into subgraphs, let's say we have the following pipeline:

```
frontend
    |
    V
python-to-standard
    |
    V
lower-to-llvm
```

```
frontend: [], [], []
python-to-standard: [frontend], [lower-to-llvm], []
lower-to-llvm: [], [], []
```

Now, an (external) user wants to add a numpy-to-linalg stage, and both the python-to-standard and numpy-to-linalg stages must run until a fixed point.

```
frontend
    |
    V
python-to-standard
   | ^
   V |
numpy-to-linalg
    |
    V
lower-to-llvm
```

With jumps they can just do:

```
numpy-to-linalg: [python-to-standard], [lower-to-llvm], /*jump*/[python-to-standard]
... bufferization and such...
```

And the rest of the pipeline stays unchanged.

With subgraphs, they would have to extract the existing python-to-standard stage, wrap both in a subgraph, and reinsert it into the pipeline.
