Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rust gates for ConsolidateBlocks #12704

Merged

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Jul 1, 2024

Summary

This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in #12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after #12459. To that end this builds on the thread of work in
the two PRs #12692 and #12701 which changed the access patterns for
other passes to minimize eager gate object construction.

Details and comments

TODO:

This PR is built on top of #12692 and #12701 and will need to rebased after both merge. To view the contents of just this PR you can look at the last commit on the branch (I'll be force pushing updates to that last commit): e449357

@mtreinish mtreinish added performance Changelog: None Do not include in changelog Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels Jul 1, 2024
@mtreinish mtreinish added this to the 1.2.0 milestone Jul 1, 2024
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@mtreinish mtreinish marked this pull request as draft July 1, 2024 21:56
@mtreinish mtreinish added the on hold Can not fix yet label Jul 1, 2024
@mtreinish mtreinish force-pushed the py-access-rs-data-consolidate-blocks branch from 0a1cddb to e449357 Compare July 2, 2024 21:59
@mtreinish mtreinish changed the title [WIP] Use rust gates for ConsolidateBlocks Use rust gates for ConsolidateBlocks Jul 2, 2024
@mtreinish mtreinish marked this pull request as ready for review July 2, 2024 22:04
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@coveralls
Copy link

coveralls commented Jul 2, 2024

Pull Request Test Coverage Report for Build 9768512668

Details

  • 119 of 202 (58.91%) changed or added relevant lines in 10 files are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.05%) to 89.737%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/optimization/consolidate_blocks.py 9 10 90.0%
crates/circuit/src/bit_data.rs 0 3 0.0%
crates/accelerate/src/convert_2q_block_matrix.rs 45 49 91.84%
crates/circuit/src/operations.rs 15 30 50.0%
crates/circuit/src/circuit_instruction.rs 5 35 14.29%
crates/circuit/src/dag_node.rs 22 52 42.31%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 5 93.38%
Totals Coverage Status
Change from base Build 9767176201: -0.05%
Covered Lines: 64632
Relevant Lines: 72024

💛 - Coveralls

This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.
@mtreinish mtreinish force-pushed the py-access-rs-data-consolidate-blocks branch from e449357 to 6567728 Compare July 3, 2024 16:35
@coveralls
Copy link

coveralls commented Jul 3, 2024

Pull Request Test Coverage Report for Build 9781648307

Details

  • 57 of 65 (87.69%) changed or added relevant lines in 4 files are covered.
  • 11 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.01%) to 89.829%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/optimization/consolidate_blocks.py 9 10 90.0%
crates/circuit/src/bit_data.rs 0 3 0.0%
crates/accelerate/src/convert_2q_block_matrix.rs 45 49 91.84%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/expr.rs 1 94.02%
crates/qasm2/src/lex.rs 4 92.11%
crates/qasm2/src/parse.rs 6 97.61%
Totals Coverage Status
Change from base Build 9780558630: -0.01%
Covered Lines: 65152
Relevant Lines: 72529

💛 - Coveralls

@mtreinish mtreinish removed on hold Can not fix yet labels Jul 3, 2024
@coveralls
Copy link

coveralls commented Jul 3, 2024

Pull Request Test Coverage Report for Build 9782984075

Details

  • 87 of 95 (91.58%) changed or added relevant lines in 4 files are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 89.841%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/optimization/consolidate_blocks.py 9 10 90.0%
crates/circuit/src/bit_data.rs 0 3 0.0%
crates/accelerate/src/convert_2q_block_matrix.rs 73 77 94.81%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 4 92.62%
Totals Coverage Status
Change from base Build 9782487328: 0.02%
Covered Lines: 65174
Relevant Lines: 72544

💛 - Coveralls

@jlapeyre jlapeyre self-requested a review July 3, 2024 20:33
Comment on lines 80 to 84
let mut matrix: Array2<Complex64> = match bit_map
.map_bits(first_node.instruction.qubits.bind(py).iter())?
.map(|x| x as u8)
.collect::<SmallVec<[u8; 2]>>()
.as_slice()
Copy link
Contributor

@jlapeyre jlapeyre Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not this?

        .map_bits(first_node.instruction.qubits.bind(py).iter())?
        .collect::<Vec<_>>()
        .as_slice()

Storage should not an issue here, and u8 at best won't be more performant. Maybe the 2 here in SmallVec<[u8; 2] allows the compiler to optimize?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it was for typing reasons, but I realized I was confusing this with the 2q decomposer and the euler 1q decomposer which are using SmallVec<u8> as the output type. I agree in this case just collecting to a vec of u32 is fine. I can change it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I guessed. It was not a straight copy and paste, but something like that, from a situation where it is important that the type be u8.

for (op_matrix, q_list) in op_list.into_iter().skip(1) {
let op_matrix = op_matrix.as_array();
for node in op_list.into_iter().skip(1) {
let op_matrix = get_matrix_from_inst(py, &node.instruction)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 91 would convey more information as [..] => unreachable!()

@@ -71,8 +126,42 @@ pub fn change_basis(matrix: ArrayView2<Complex64>) -> Array2<Complex64> {
trans_matrix
}

#[pyfunction]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 50 could be [..] => None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two suggestions, doing [..] => None, change lines not yet touched by this PR. Whether to do it here depends on whether in general you favor cleaning these things up in a PR or doing them in a separate PR. An argument against is that this is not strictly part of the point of the PR. An argument for is that a PR to do these things would not be a high priority, so it might never be done. So I favor making the change. OTOH, the arguments for leaving these as is are not unreasonable.

@jlapeyre
Copy link
Contributor

jlapeyre commented Jul 5, 2024

This PR does not touch map_bits itself. But you might want to consider
this change. (jlapeyre@71e92a6) This simplifies how map_bits is implemented and how it is called.

@mtreinish
Copy link
Member Author

This PR does not touch map_bits itself. But you might want to consider this change. (jlapeyre@71e92a6) This simplifies how map_bits is implemented and how it is called.

I would want to check with @kevinhartman and @jakelishman before making this change since there is a fair amount in flux around using BitData between #12550 and #12730. I was trying to avoid chaning the rust interface for dealing with bit data too much in this PR.

@jakelishman
Copy link
Member

imo if we were to touch any meaningful parts of BitData (beyond just making it public), then we'd do it in a separate PR - this PR doesn't logically need to change the interfaces, and doing it separately would let us track performance changes.

I haven't currently touched BitData, though I imagine it probably will change a bit in coming PRs, so best to just leave it un-churned right now.

@jlapeyre
Copy link
Contributor

jlapeyre commented Jul 8, 2024

I was on the fence about it. But map_bits was called four times in circuit_data, and two calls were added here in convert_2q_block_matrix. There are no other calls.

I think a separate PR is good. Especially because, yeah, there is a possible change in performance.

In fact, I would add a method on PackedInstruction (if that's the right struct) because every call to map_bits starts with an instruction (or higher).

@jakelishman
Copy link
Member

I have a half-done PR that's reworking some of the Interner interfaces to avoid a couple of unnecessary allocations. I think I'm likely to touch BitData as part of that.

@coveralls
Copy link

coveralls commented Jul 8, 2024

Pull Request Test Coverage Report for Build 9844736287

Details

  • 86 of 94 (91.49%) changed or added relevant lines in 4 files are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.03%) to 89.868%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/optimization/consolidate_blocks.py 9 10 90.0%
crates/circuit/src/bit_data.rs 0 3 0.0%
crates/accelerate/src/convert_2q_block_matrix.rs 72 76 94.74%
Files with Coverage Reduction New Missed Lines %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py 2 88.35%
crates/qasm2/src/lex.rs 3 92.88%
Totals Coverage Status
Change from base Build 9844511268: 0.03%
Covered Lines: 65665
Relevant Lines: 73068

💛 - Coveralls

@jlapeyre jlapeyre added this pull request to the merge queue Jul 8, 2024
Merged via the queue into Qiskit:main with commit 4fe9dbc Jul 8, 2024
15 checks passed
@mtreinish mtreinish deleted the py-access-rs-data-consolidate-blocks branch July 11, 2024 17:22
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
* Use rust gates for ConsolidateBlocks

This commit moves to use rust gates for the ConsolidateBlocks transpiler
pass. Instead of generating the unitary matrices for the gates in a 2q
block Python side and passing that list to a rust function this commit
switches to passing a list of DAGOpNodes to the rust and then generating
the matrices inside the rust function directly. This is similar to what
was done in Qiskit#12650 for Optimize1qGatesDecomposition. Besides being faster
to get the matrix for standard gates, it also reduces the eager
construction of Python gate objects which was a significant source of
overhead after Qiskit#12459. To that end this builds on the thread of work in
the two PRs Qiskit#12692 and Qiskit#12701 which changed the access patterns for
other passes to minimize eager gate object construction.

* Add rust filter function for DAGCircuit.collect_2q_runs()

* Update crates/accelerate/src/convert_2q_block_matrix.rs

---------

Co-authored-by: John Lapeyre <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog mod: transpiler Issues and PRs related to Transpiler performance Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants