
Avoid overhead for synthesized nodes lookup #13424

Merged: 1 commit into Qiskit:main on Nov 12, 2024

Conversation

mtreinish (Member)

Summary

After #12550, a hash implementation was added to DAGOpNode so that identical instances of DAG nodes can be used in a set or as dict keys. This was needed because #12550 changed DAGCircuit so that DAGOpNode instances are just a Python view of the data contained in the nodes of a DAG, whereas prior to #12550 the actual DAGOpNode objects were returned by reference from DAG methods. However, this hash implementation has additional overhead compared to the object-identity-based hashing used before, which caused a regression in some cases for high-level synthesis when it checks for nodes it has already synthesized. This commit addresses that by changing the dict key to be the node id instead of the node object; hashing an integer is significantly faster than hashing the node object.
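For illustration only, here is a minimal sketch of the keying change under stated assumptions: synthesize(node) is a hypothetical stand-in for the pass's per-node synthesis routine, while synthesized_nodes and the private node._node_id attribute come from the diff shown further down.

# Hedged sketch, not the actual HighLevelSynthesis pass code.
# synthesize(node) is hypothetical; it returns (new_operation, context) or None.
synthesized_nodes = {}

for node in dag.topological_op_nodes():  # dag is assumed to be a DAGCircuit
    result = synthesize(node)
    if result is not None:
        synthesized, synthesized_context = result
        # Old key: the DAGOpNode view itself, whose content-based __hash__
        # runs on every insertion and lookup:
        #   synthesized_nodes[node] = (synthesized, synthesized_context)
        # New key: the node's integer index within this DAG; hashing a small
        # int is significantly cheaper.
        synthesized_nodes[node._node_id] = (synthesized, synthesized_context)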

@mtreinish mtreinish added the performance and Changelog: None (do not include in changelog) labels Nov 12, 2024
@mtreinish mtreinish added this to the 1.3.0 milestone Nov 12, 2024
@qiskit-bot (Collaborator)

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@raynelfss (Contributor)

Thank you for this addition! This makes more sense. Is this improvement noticeable when benchmarking runtime?

@mtreinish (Member, Author)

I did a quick asv run and it didn't flag anything as an improvement or a regression. But there is a noticeable improvement in some Benchpress benchmarks: for example, qiskit_gym/abstract_transpile/test_hamiltonians.py::TestWorkoutAbstractHamiltonians::test_hamiltonians[ham_ham_JW-18-all-to-all] goes from 5.5106774509768 sec with 1.3.0rc1 to 3.2585 sec with this PR.
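As a rough standalone illustration of why integer keys are cheaper (this is not the asv or Benchpress setup mentioned above; FakeOpNode is a made-up stand-in for a node view with a content-based hash):

import timeit

class FakeOpNode:
    """Made-up stand-in for a DAG node view whose hash derives from its contents."""

    def __init__(self, node_id, name, qubits):
        self._node_id = node_id
        self.name = name
        self.qubits = qubits

    def __hash__(self):
        # Content-based hash, analogous to the overhead described in the summary.
        return hash((self.name, self.qubits))

    def __eq__(self, other):
        return (self.name, self.qubits) == (other.name, other.qubits)

nodes = [FakeOpNode(i, "cx", (i, i + 1)) for i in range(1000)]
by_node = {n: i for i, n in enumerate(nodes)}         # keyed by node objects
by_id = {n._node_id: i for i, n in enumerate(nodes)}  # keyed by integer node ids

print("object keys:", timeit.timeit(lambda: [by_node[n] for n in nodes], number=1000))
print("int keys:   ", timeit.timeit(lambda: [by_id[n._node_id] for n in nodes], number=1000))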

@mtreinish mtreinish added the stable backport potential label (the bug might be minimal and/or important enough to be ported to stable) Nov 12, 2024
@coveralls

Pull Request Test Coverage Report for Build 11800658258

Details

  • 3 of 3 (100.0%) changed or added relevant lines in 1 file are covered.
  • 15 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.01%) to 88.922%

Files with Coverage Reduction | New Missed Lines | %
crates/qasm2/src/expr.rs      | 1                | 94.02%
crates/qasm2/src/parse.rs     | 6                | 97.62%
crates/qasm2/src/lex.rs       | 8                | 91.48%
Totals Coverage Status
Change from base Build 11784569909: -0.01%
Covered Lines: 79053
Relevant Lines: 88902

💛 - Coveralls

@kevinhartman kevinhartman (Contributor) left a comment


LGTM, thanks!

@@ -382,7 +382,7 @@ def _run(

             # If the synthesis changed the operation (i.e. it is not None), store the result.
             if synthesized is not None:
-                synthesized_nodes[node] = (synthesized, synthesized_context)
+                synthesized_nodes[node._node_id] = (synthesized, synthesized_context)
Contributor


This looks fine in this case since synthesized_nodes is only used with one DAG (and not e.g. shared by the recursive control flow block handling).

But this optimization is something we should be careful about when applying in other places, since the semantics of key uniqueness shift from global to local within a specific DAG (which means that keys will clobber each other if node indices from more than one DAG are used in the same map). This was responsible for a bug we had in the visualization code after the initial DAG port to Rust.
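A hedged toy example of the clobbering risk described above; it uses only public QuantumCircuit/converter APIs plus the private _node_id attribute that appears in the diff, and the (id(dag), node._node_id) composite key is just one possible way to disambiguate, not something this PR does:

from qiskit.circuit import QuantumCircuit
from qiskit.converters import circuit_to_dag

qc_a = QuantumCircuit(1)
qc_a.h(0)
qc_b = QuantumCircuit(1)
qc_b.x(0)

dag_a = circuit_to_dag(qc_a)
dag_b = circuit_to_dag(qc_b)

# Node indices are only unique within a single DAG, so the first op node of
# each DAG likely gets the same index and the entries clobber each other.
shared = {}
for dag in (dag_a, dag_b):
    for node in dag.op_nodes():
        shared[node._node_id] = node.op.name
print(shared)  # likely a single entry, the second DAG having overwritten the first

# One possible way to keep a map that spans DAGs unambiguous: include the DAG.
safe = {}
for dag in (dag_a, dag_b):
    for node in dag.op_nodes():
        safe[(id(dag), node._node_id)] = node.op.name
print(safe)  # two entries, one per (dag, node) pair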

@Cryoris Cryoris added this pull request to the merge queue Nov 12, 2024
Merged via the queue into Qiskit:main with commit 8c6ad02 Nov 12, 2024
19 checks passed
mergify bot pushed a commit that referenced this pull request Nov 12, 2024
(cherry picked from commit 8c6ad02)
github-merge-queue bot pushed a commit that referenced this pull request Nov 12, 2024
(cherry picked from commit 8c6ad02)

Co-authored-by: Matthew Treinish <[email protected]>