Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use non-atomic scatterview in hist #980

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

pgrete
Copy link
Collaborator

@pgrete pgrete commented Nov 29, 2023

PR Summary

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • Change is breaking (API, behavior, ...)
    • Change is additionally added to CHANGELOG.md in the breaking section
    • PR is marked as breaking
    • Short summary API changes at the top of the PR (plus optionally with an automated update/fix script)
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@BenWibking
Copy link
Collaborator

Are you compiling with -munsafe-fp-atomics? It should be safe to use this flag, since (I think) we don't depend on coherent memory while GPU kernels are running (https://www.exascaleproject.org/wp-content/uploads/2022/02/crusher_tips_and_tricks-final.pdf)

@pgrete
Copy link
Collaborator Author

pgrete commented Nov 30, 2023

Oh, good point(er). I'll test that.

@BenWibking
Copy link
Collaborator

Did that help?

@pgrete
Copy link
Collaborator Author

pgrete commented Dec 1, 2023

Indeed it did, so I'm now wondering if the performance issue we saw with Ascent on Frontier are also related to this.

@BenWibking
Copy link
Collaborator

Indeed it did, so I'm now wondering if the performance issue we saw with Ascent on Frontier are also related to this.

Ascent histograms are computed on the host, using unified memory support. I can't think of an obvious place it would otherwise use device atomics. Or are you thinking of a different performance issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants