Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing clang-analyzer-core.UndefinedBinaryOperatorResult with templates yields surprising performance improvement #308

Open
bcaddy opened this issue Jul 3, 2023 · 1 comment
Labels
question refactor for code that is functional but needs to be refactored

Comments

@bcaddy
Copy link
Collaborator

bcaddy commented Jul 3, 2023

I was working on clang-tidy checks during the hack session today and ran into something interesting I think we should discuss. The clang-analyzer-core.UndefinedBinaryOperatorResult check checks for cases where a variable might be undefined when it's used and so structures like this tend to trigger it:

  int o1, o2, o3;
  switch (dir) {
    case 0:
      o1 = grid_enum::momentum_x;
      o2 = grid_enum::momentum_y;
      o3 = grid_enum::momentum_z;
      break;
    case 1:
      o1 = grid_enum::momentum_y;
      o2 = grid_enum::momentum_z;
      o3 = grid_enum::momentum_x;
      break;
    case 2:
      o1 = grid_enum::momentum_z;
      o2 = grid_enum::momentum_x;
      o3 = grid_enum::momentum_y;
      break;
  }

// the check flags the usage of o1 on this line
velocity_x = dev_conserved[o1 * n_cells + id];

It is technically correct to flag this since o1 could be uninitialized if dir is not 0-2 even though we know it always will be in that range. An easy way to fix this is if all the kernels with a structure like this take dir as a template argument instead of a regular argument, thus allowing the switch statement to be determined at compile time.

While I don't think it's worth doing that just for this check I tried it out on a single kernel (the new PPMC_VL kernel) since templates can improve the performance of control statements by evaluating them at compile time. While I didn't expect it to help it actually did help. It reduced the execution time by ~4ms per time step, a ~2% improvement. If these results hold when applied to other kernels that could be an easy 10% performance improvement for very little effort. What do you all think?

@bcaddy bcaddy added question refactor for code that is functional but needs to be refactored labels Jul 3, 2023
@bcaddy
Copy link
Collaborator Author

bcaddy commented Jul 3, 2023

@evaneschneider, @alwinm, I'm especially interested in your takes on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question refactor for code that is functional but needs to be refactored
Projects
None yet
Development

No branches or pull requests

1 participant