Kernel logger upgrades for better output and better code comparisons #276
base: develop
Yeah, yeah, I'll fix clang later. Right now I want feedback.
The CI check formatting-check with clang-tidy is fine to not pass for now for the sake of a PR review. However, the CI check for simple-build is showing Also, JM2C, but it may be good to separate this out into two PRs. I think the first change does not depend on whether the Kokkos Tools user sets |
@vlkale This is a debugging tool that fences all over the place. Host-side conditionals are basically irrelevant in terms of the performance hit the fences create. As for the compile error, that's because the headers are no longer consistent with the headers in Kokkos. I'll add a fix for that in a second.
Signed-off-by: Chris Siefert <[email protected]>
Compiles just fine now...
Signed-off-by: Chris Siefert <[email protected]>
if (eid.type == DeviceType::Serial)
  device_label += "Serial";
else if (eid.type == DeviceType::OpenMP)
  device_label += "OpenMP";
else if (eid.type == DeviceType::Cuda)
  device_label += "Cuda";
else if (eid.type == DeviceType::HIP)
  device_label += "HIP";
else if (eid.type == DeviceType::OpenMPTarget)
  device_label += "OpenMPTarget";
else if (eid.type == DeviceType::HPX)
  device_label += "HPX";
else if (eid.type == DeviceType::Threads)
  device_label += "Threads";
else if (eid.type == DeviceType::SYCL)
  device_label += "SYCL";
else if (eid.type == DeviceType::OpenACC)
  device_label += "OpenACC";
else if (eid.type == DeviceType::Unknown)
  device_label += "Unknown";
else
  device_label += "Unknown to KokkosTools";
I wouldn't be opposed to pushing this part (or even the whole function) to Kokkos_Profiling_Interface.hpp. Getting a string out of the device ID doesn't seem to be specific to the kernel logger tool (also see https://github.com/kokkos/kokkos-tools/pull/265/files#diff-839f34fcb31addd9a48252bebf4d37cf674f6fcdd16539c7039813192674b9c0R165-R187).
Concur. I also have to maintain this function in Trilinos, so I would love to put it in Kokkos_Profiling_Interface.hpp.
If @crtrott doesn't object, I'll open a Kokkos PR with that.
Co-authored-by: Daniel Arndt <[email protected]>
True about fencing (aside: maybe in the future this tool could reduce fencing further, e.g., by allowing global fencing to be turned off, if some Kokkos users want that). Stepping back, though, and setting efficiency aside, I think I may actually be misunderstanding the purpose. I thought you were handling a case where the output of kernel-logger not only differs across Kokkos backends for a particular Kokkos application like LAMMPS, but is also non-deterministic for a given Kokkos backend. Did I get that right from "comparing two versions of the code with roughly the same flow, but some different kernel calls here and there. Specifically, code paths with and without UnifiedMemory."?
Thanks. I just realized that this is the issue that Kokkos Tools GitHub Issue #275 brings up.
I'm interested in using the tool to track down cases where the code flow is deterministic but the answer is not, due to an unintended race condition.
using namespace Kokkos::Tools::Experimental;
std::string device_label("(");
ExecutionSpaceIdentifier eid = identifier_from_devid(deviceId);
if (eid.type == DeviceType::Serial)
This is a switch case.
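A minimal sketch of the switch-based form this comment suggests. The `DeviceType` enum below is a local, hypothetical stand-in for `Kokkos::Tools::Experimental::DeviceType` so that the example compiles without the Kokkos headers, and `deviceTypeLabel` is an illustrative name, not a function from the PR. The enumerator names and label strings are taken from the diff under review.

```cpp
#include <string>

// Hypothetical stand-in for Kokkos::Tools::Experimental::DeviceType; the
// enumerators mirror the ones tested in the if/else chain under review.
enum class DeviceType {
  Serial,
  OpenMP,
  Cuda,
  HIP,
  OpenMPTarget,
  HPX,
  Threads,
  SYCL,
  OpenACC,
  Unknown
};

// Illustrative helper (not part of the PR): maps a device type to its label.
inline std::string deviceTypeLabel(DeviceType type) {
  switch (type) {
    case DeviceType::Serial: return "Serial";
    case DeviceType::OpenMP: return "OpenMP";
    case DeviceType::Cuda: return "Cuda";
    case DeviceType::HIP: return "HIP";
    case DeviceType::OpenMPTarget: return "OpenMPTarget";
    case DeviceType::HPX: return "HPX";
    case DeviceType::Threads: return "Threads";
    case DeviceType::SYCL: return "SYCL";
    case DeviceType::OpenACC: return "OpenACC";
    case DeviceType::Unknown: return "Unknown";
  }
  // Reached only for values the tool does not know about yet.
  return "Unknown to KokkosTools";
}
```

Leaving out a `default:` label lets compilers warn (for example with `-Wswitch`) when a new enumerator is added but not handled, while the trailing `return` keeps the original fallback string.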
if (varVal) {
  std::string v = std::string(varVal);
  // default to false
  if (v == "1" || v == "ON" || v == "on" || v == "TRUE" || v == "true" ||
Most of the time, it seems that this repo wants either 0 or 1 for the value of an environment variable.
Then use atoi.
I think it is important that all environment variables follow consistent conventions.
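A minimal sketch of the 0/1 convention with `atoi`, assuming an unset variable means "off". `parseFlag` is a hypothetical helper introduced here for illustration. `suppressCounts` and the variable name `KOKKOS_TOOLS_LOGGER_SUPPRESS_COUNTS` come from the diff, but this body is a sketch, not the PR's code.

```cpp
#include <cstdlib>

// Hypothetical helper: interprets an environment-variable value under the
// 0/1 convention. A null pointer (variable not set) yields the default.
inline bool parseFlag(const char* value, bool defaultValue = false) {
  if (value == nullptr) return defaultValue;
  return std::atoi(value) != 0;
}

// Reads the variable once and caches the result for later calls.
inline bool suppressCounts() {
  static const bool value =
      parseFlag(std::getenv("KOKKOS_TOOLS_LOGGER_SUPPRESS_COUNTS"));
  return value;
}
```

One consequence of this choice: `atoi` returns 0 for non-numeric strings, so values such as `ON` or `true` would be read as disabled rather than enabled.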
bool suppressCounts() {
  static bool value = [](){
    const char* varVal = std::getenv("KOKKOS_TOOLS_LOGGER_SUPPRESS_COUNTS");
Please update your PR description, in which you're currently mentioning KOKKOS_PROFILE_SUPPRESS_COUNTS.
"execution identifier %llu\n",
devID, (unsigned long long)(*kID));
deviceIdToString(devID).c_str(), (unsigned long long)(output));
Wait... What if we have several devices? For instance, several Cuda GPUs.
This change does not seem to go in the direction of allowing devices to be distinguished.
I would be OKish with:
deviceIdToString(devID).c_str(), (unsigned long long)(output));
printf(
"KokkosP: Executing parallel-scan kernel on device %d (%s) with unique "
"execution identifier %llu\n",
devID, deviceIdToString(devID).c_str(), (unsigned long long)(output));
"execution identifier %llu\n",
devID, (unsigned long long)(*kID));
deviceIdToString(devID).c_str(), (unsigned long long)(output));
deviceIdToString(devID).c_str(), (unsigned long long)(output));
deviceIdToString(devID).c_str(), suppressCounts() ? 0 : *kID);
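A hedged sketch of how the two review suggestions on this print statement could fit together: print both the numeric device ID and its name so that several devices of the same type stay distinguishable, and print 0 in place of the kernel ID when counts are suppressed. `formatKernelLine` and its parameters are hypothetical names for illustration. The sketch assumes the device label is passed without surrounding parentheses and that the kernel ID is a 64-bit unsigned value, so it casts explicitly to match `%llu`.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the PR): builds one kernel-logger line.
// When suppress is true, the execution identifier is reported as 0 so that
// two logs can be diffed without the call counts getting in the way.
inline std::string formatKernelLine(const char* kind, std::uint32_t devID,
                                    const std::string& deviceLabel,
                                    std::uint64_t kernelID, bool suppress) {
  // Cast explicitly so the argument type matches the %llu conversion.
  const unsigned long long shown =
      suppress ? 0ULL : static_cast<unsigned long long>(kernelID);
  char buf[256];
  std::snprintf(buf, sizeof(buf),
                "KokkosP: Executing %s kernel on device %u (%s) with unique "
                "execution identifier %llu\n",
                kind, static_cast<unsigned>(devID), deviceLabel.c_str(),
                shown);
  return std::string(buf);
}
```

With suppression enabled, every line for a given kernel type and device is identical across runs, which is the property that keeps `diff` output small.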
This PR does two things:
Upgrades the output of the kernel logger to look more like the Tpetra DeepCopyTools: rather than printing space IDs, we demangle them and print names.
Allows you to use the KOKKOS_PROFILE_SUPPRESS_COUNTS variable to suppress the kernel counts in the output. This is really useful if you're comparing two versions of the code with roughly the same flow but some different kernel calls here and there, specifically code paths with and without UnifiedMemory. The call counting makes the diff really hard to read, so turning it off makes the comparison easier.