Use rust gates for ConsolidateBlocks #12704
This commit moves the ConsolidateBlocks transpiler pass to use Rust gates. Instead of generating the unitary matrices for the gates in a 2q block on the Python side and passing that list to a Rust function, this commit passes a list of DAGOpNodes to Rust and generates the matrices inside the Rust function directly. This is similar to what was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides making it faster to get the matrix for standard gates, it also reduces the eager construction of Python gate objects, which was a significant source of overhead after Qiskit#12459. To that end, this builds on the thread of work in Qiskit#12692 and Qiskit#12701, which changed the access patterns for other passes to minimize eager gate object construction.
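As a rough illustration of what "generating the matrices inside the Rust function" amounts to, the sketch below folds a list of gates on a two-qubit block into a single 4x4 unitary. It is not the code from this PR: the `Gate` enum, the `(re, im)` tuple used for complex numbers, the `X` and `CX` constants, and the basis ordering (index = 2 * b1 + b0) are all assumptions made for a dependency-free example, whereas the real pass works on `Array2<Complex64>` and looks matrices up from the instruction.

```rust
// Illustrative stand-ins only: the real pass works on ndarray's
// Array2<Complex64>; here a complex number is a (re, im) tuple.
type C = (f64, f64);
type Mat2 = [[C; 2]; 2];
type Mat4 = [[C; 4]; 4];

const ZERO: C = (0.0, 0.0);
const ONE: C = (1.0, 0.0);

// Example matrices (assumed ordering: basis index = 2 * b1 + b0).
const X: Mat2 = [[ZERO, ONE], [ONE, ZERO]];
// CX with control on qubit 0 and target on qubit 1.
const CX: Mat4 = [
    [ONE, ZERO, ZERO, ZERO],
    [ZERO, ZERO, ZERO, ONE],
    [ZERO, ZERO, ONE, ZERO],
    [ZERO, ONE, ZERO, ZERO],
];

/// Hypothetical description of one gate in a block on qubits {0, 1}.
enum Gate {
    OneQ { matrix: Mat2, qubit: usize },
    TwoQ { matrix: Mat4 },
}

fn cmul(a: C, b: C) -> C {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Embed a 1q matrix into the 2q space, acting as identity on the other qubit.
fn expand_1q(m: &Mat2, qubit: usize) -> Mat4 {
    assert!(qubit < 2, "block only has qubits 0 and 1");
    let mut out = [[ZERO; 4]; 4];
    for row in 0..4 {
        for col in 0..4 {
            let (r0, r1) = (row & 1, row >> 1);
            let (c0, c1) = (col & 1, col >> 1);
            out[row][col] = match qubit {
                0 if r1 == c1 => m[r0][c0],
                1 if r0 == c0 => m[r1][c1],
                _ => ZERO,
            };
        }
    }
    out
}

fn matmul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[ZERO; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            for k in 0..4 {
                let p = cmul(a[i][k], b[k][j]);
                out[i][j] = (out[i][j].0 + p.0, out[i][j].1 + p.1);
            }
        }
    }
    out
}

/// Multiply the gates together in circuit order (later gates act from the left).
fn block_to_matrix(gates: &[Gate]) -> Mat4 {
    let mut acc = [[ZERO; 4]; 4];
    for i in 0..4 {
        acc[i][i] = ONE;
    }
    for gate in gates {
        let m = match gate {
            Gate::OneQ { matrix, qubit } => expand_1q(matrix, *qubit),
            Gate::TwoQ { matrix } => *matrix,
        };
        acc = matmul(&m, &acc);
    }
    acc
}
```

For example, passing an X on qubit 0 followed by the CX above gives a matrix whose first column sends basis state 0 to basis state 3. The point of the change described above is that this kind of loop can fetch each matrix on the Rust side rather than receiving a prebuilt list of matrices from Python.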
let mut matrix: Array2<Complex64> = match bit_map
    .map_bits(first_node.instruction.qubits.bind(py).iter())?
    .map(|x| x as u8)
    .collect::<SmallVec<[u8; 2]>>()
    .as_slice()
Why not this?
    .map_bits(first_node.instruction.qubits.bind(py).iter())?
    .collect::<Vec<_>>()
    .as_slice()
Storage should not be an issue here, and u8 at best won't be more performant. Maybe the 2 here in SmallVec<[u8; 2]> allows the compiler to optimize?
I was thinking it was for typing reasons, but I realized I was confusing this with the 2q decomposer and the euler 1q decomposer, which are using SmallVec<u8> as the output type. I agree that in this case just collecting to a Vec of u32 is fine. I can change it.
That's what I guessed. It was not a straight copy and paste, but something like that, from a situation where it is important that the type be u8.
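A minimal sketch of the pattern agreed on in this thread: collect the mapped qubit indices into a plain `Vec<u32>` and match on the resulting slice. The function name and the returned descriptions are hypothetical, and the iterator argument stands in for whatever `map_bits` yields in the real code.

```rust
/// Hypothetical helper: classify a gate by where its qubits sit in a 2q block.
/// `indices` stands in for the iterator of mapped block indices.
fn describe_qubits<I: Iterator<Item = u32>>(indices: I) -> &'static str {
    // The temporary Vec lives until the end of the match, so borrowing it
    // with as_slice() in the scrutinee is fine.
    match indices.collect::<Vec<_>>().as_slice() {
        [0] => "1q gate on block qubit 0",
        [1] => "1q gate on block qubit 1",
        [0, 1] => "2q gate in block order",
        [1, 0] => "2q gate with qubits swapped",
        _ => "not a valid position in a 2q block",
    }
}
```

Nothing in the match depends on the element type being `u8` or on the collection being a `SmallVec`, which is the reviewer's point: the slice patterns work the same on `&[u32]`.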
-    for (op_matrix, q_list) in op_list.into_iter().skip(1) {
-        let op_matrix = op_matrix.as_array();
+    for node in op_list.into_iter().skip(1) {
+        let op_matrix = get_matrix_from_inst(py, &node.instruction)?;
Line 91 would convey more information as [..] => unreachable!()
@@ -71,8 +126,42 @@ pub fn change_basis(matrix: ArrayView2<Complex64>) -> Array2<Complex64> {
     trans_matrix
 }

 #[pyfunction]
Line 50 could be [..] => None
These two suggestions, doing [..] => None, change lines not yet touched by this PR. Whether to do it here depends on whether in general you favor cleaning these things up in a PR or doing them in a separate PR. An argument against is that this is not strictly part of the point of the PR. An argument for is that a PR to do these things would not be a high priority, so it might never be done. So I favor making the change. OTOH, the arguments for leaving these as they are are not unreasonable.
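For readers unfamiliar with the suggested arm, here is a small sketch of a slice match that ends in `[..] => None`. The function is hypothetical and is not one of the functions being reviewed; it only shows that `[..]` is a catch-all for slices of any length, equivalent in behavior to `_` while making it explicit that the scrutinee is a slice.

```rust
/// Hypothetical lookup: position of a single-qubit gate inside a 2q block.
fn position_in_block(qubits: &[u32]) -> Option<usize> {
    match qubits {
        [0] => Some(0),
        [1] => Some(1),
        // Any other slice (empty, longer, or a different index) ends up here.
        [..] => None,
    }
}
```

Where the earlier arms are known to cover every input that can actually occur, the same final arm can be written as `[..] => unreachable!()`, as suggested in the earlier comment.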
This PR does not touch |
I would want to check with @kevinhartman and @jakelishman before making this change since there is a fair amount in flux around using |
imo if we were to touch any meaningful parts of I haven't currently touched |
I was on the fence about it. But I think a separate PR is good. Especially because, yeah, there is a possible change in performance. In fact, I would add a method on |
I have a half-done PR that's reworking some of the |
* Use rust gates for ConsolidateBlocks
* Add rust filter function for DAGCircuit.collect_2q_runs()
* Update crates/accelerate/src/convert_2q_block_matrix.rs
Co-authored-by: John Lapeyre
Details and comments
TODO:
This PR is built on top of #12692 and #12701 and will need to be rebased after both merge. To view the contents of just this PR, you can look at the last commit on the branch (I'll be force pushing updates to that last commit): e449357