Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In-place shift of uint vectors corrupts s1 and further components #790

Open
proski opened this issue Dec 30, 2024 · 6 comments
Open

In-place shift of uint vectors corrupts s1 and further components #790

proski opened this issue Dec 30, 2024 · 6 comments
Labels
IGC Issue related to IGC

Comments

@proski
Copy link

proski commented Dec 30, 2024

The following demo should output "4 4" (2 shifted 1 bit to the left in both x and y). Instead, it outputs "4 0".

#include <CL/opencl.h>
#include <stdio.h>

#define DATA_SIZE 1024

#define CODE(...) #__VA_ARGS__

const char *KernelSource = CODE(

    __kernel void shift_test(__global uint *input, __global uint *output) {
      uint2 d0, d1, d2;
      d2 = (uint2)(input[0], input[1]);
      d1 = (uint2)(input[2], input[3]);
      d0 = (uint2)(input[4], input[5]);
      d2 = (d2 << 1) | (d1 >> 31);
      d1 = (d1 << 1) | (d0 >> 31);
      d0 = d0 << 1;

      int i = get_global_id(0);
      if (i == 0) {
        output[i] = d1.x;
      } else {
        output[i] = d1.y;
      }
    }

);

int main(int argc, char **argv) {
  int err;

  unsigned int data[6] = {1, 1, 2, 2, 3, 3};
  unsigned int results[DATA_SIZE];

  size_t global, local;

  cl_platform_id platform;
  cl_device_id device_id;
  cl_context context;
  cl_command_queue commands;
  cl_program program;
  cl_kernel kernel;

  cl_mem input, output;

  unsigned int count = DATA_SIZE;

  err = clGetPlatformIDs(1, &platform, NULL);
  if (err < 0) {
    perror("Couldn't identify a platform");
    exit(1);
  }

  int gpu = 1;
  err = clGetDeviceIDs(platform, gpu ? CL_DEVICE_TYPE_GPU : CL_DEVICE_TYPE_CPU,
                       1, &device_id, NULL);
  if (err != CL_SUCCESS) {
    printf("Error: Failed to create a device group!\n");
    return EXIT_FAILURE;
  }

  context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
  if (!context) {
    printf("Error: Failed to create a compute context!\n");
    return EXIT_FAILURE;
  }

  commands = clCreateCommandQueue(context, device_id, 0, &err);
  if (!commands) {
    printf("Error: Failed to create a command commands!\n");
    return EXIT_FAILURE;
  }

  program = clCreateProgramWithSource(context, 1, (const char **)&KernelSource,
                                      NULL, &err);
  if (!program) {
    printf("Error: Failed to create compute program!\n");
    return EXIT_FAILURE;
  }

  err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
  if (err != CL_SUCCESS) {
    size_t len;
    char *buffer;

    printf("Error: Failed to build program executable!\n");
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, 0, NULL,
                          &len);
    buffer = malloc(len);
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, len, buffer,
                          NULL);
    printf("%s\n", buffer);
    free(buffer);
    exit(1);
  }

  kernel = clCreateKernel(program, "shift_test", &err);
  if (!kernel || err != CL_SUCCESS) {
    printf("Error: Failed to create compute kernel!\n");
    exit(1);
  }

  input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(int) * count, NULL,
                         NULL);
  output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int) * count, NULL,
                          NULL);
  if (!output) {
    printf("Error: Failed to allocate device memory!\n");
    exit(1);
  }

  err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(int) * 6, data,
                             0, NULL, NULL);
  if (err != CL_SUCCESS) {
    printf("Error: Failed to write to source array!\n");
    exit(1);
  }

  err = 0;
  err |= clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
  err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &output);
  if (err != CL_SUCCESS) {
    printf("Error: Failed to set kernel arguments! %d\n", err);
    exit(1);
  }

  err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(local), &local, NULL);
  if (err != CL_SUCCESS) {
    printf("Error: Failed to retrieve kernel work group info! %d\n", err);
    exit(1);
  }

  global = count;
  err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0,
                               NULL, NULL);
  if (err) {
    printf("Error: Failed to execute kernel!\n");
    return EXIT_FAILURE;
  }

  clFinish(commands);

  err = clEnqueueReadBuffer(commands, output, CL_TRUE, 0, sizeof(int) * count,
                            results, 0, NULL, NULL);
  if (err != CL_SUCCESS) {
    printf("Error: Failed to read output array! %d\n", err);
    exit(1);
  }

  printf("results = %d %d\n", results[0], results[1]);

  clReleaseMemObject(input);
  clReleaseMemObject(output);
  clReleaseProgram(program);
  clReleaseKernel(kernel);
  clReleaseCommandQueue(commands);
  clReleaseContext(context);

  return 0;
}

This demonstrates an issue in mfakto, an OpenCL program for Mersenne number factoring.
The mfakto issue: primesearch/mfakto#15

I know that mfakto is working with intel-opencl version 22.14.22890 but exhibits the problem with 23.43.27642. I still see the problem in 24.35.30872, the version shipped by Fedora 41.

@proski proski changed the title In-place shift of uint vectors corrupt s1 and further components In-place shift of uint vectors corrupts s1 and further components Dec 30, 2024
@JablonskiMateusz
Copy link
Contributor

Hi @proski

Could you please check if it compiler related?

If it is working fine on 22.14.22890 please install 22.14.22890, create program with source, dump program binary(https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#clBuildProgram).

The executable binary can be queried using clGetProgramInfo(program, CL_PROGRAM_BINARIES, …​) and can be specified to clCreateProgramWithBinary to create a new program object.

Then please install 23.43.27642 and run the demo but create program with binary instead of from source.

@proski
Copy link
Author

proski commented Jan 4, 2025

I was able to narrow down the breakage by using Ubuntu 22.04 in WSL, where I can install the precompiled packages from github. It turns out the last release (24.48.31907.7) fixed the issue. The original breakage occurred between 23.17.26241.22 and 23.22.26516.18. More specifically:

  • 23.17.26241.22 good
  • 23.22.26516.18 bad
  • 24.45.31740.9 bad
  • 24.48.31907.7 good

I'm attaching a zip file with the binaries for the above releases.
demo-bin.zip

@proski
Copy link
Author

proski commented Jan 4, 2025

By the way, mfakto saves the binary kernel and uses it as long as the configuration remains the same. If I compile and run mfakto and then install another version of Intel Compute Runtime, the behavior of mfakto doesn't change. If I remove the binary, mfakto rebuilds it and then its behavior matches the newly installed Intel Compute Runtime. In other words, the version of Intel Compute Runtime matters (w.r.t. this issue) when compiling the source to the binary, not when running the binary.

proski added a commit to proski/mfakto that referenced this issue Jan 4, 2025
The fallback implementation of amd_bitalign() triggers a bug with Intel Compute
Runtime (NEO) versions from 23.22.26516.18 to 24.45.31740.9 inclusive.

intel/compute-runtime#790

The bug exhibits itself as a failure to find factors in approximately half of
the self-tests using barrett32 kernels. The bug affects all but the first
component of the vectors, so using VectorSize=1 would fix the self-tests.

Add generic_bitalign() that is always implemented using shifts. Use 64-bit
shifts for Intel Compute Runtime, 32-bit shifts for other platforms.

Use generic_bitalign() instead of the equivalent shifts in all cases when the
destination is the same as one of the sources.

Make amd_bitalign() an alias to generic_bitalign() on systems where
amd_bitalign() is not available.
@JablonskiMateusz
Copy link
Contributor

By the way, mfakto saves the binary kernel and uses it as long as the configuration remains the same. If I compile and run mfakto and then install another version of Intel Compute Runtime, the behavior of mfakto doesn't change. If I remove the binary, mfakto rebuilds it and then its behavior matches the newly installed Intel Compute Runtime. In other words, the version of Intel Compute Runtime matters (w.r.t. this issue) when compiling the source to the binary, not when running the binary.

That means that it is related to compiler, not to compute runtime itself.

@JablonskiMateusz JablonskiMateusz added the IGC Issue related to IGC label Jan 4, 2025
@JablonskiMateusz
Copy link
Contributor

Please duplicate the issue to https://github.com/intel/intel-graphics-compiler repo

@proski
Copy link
Author

proski commented Jan 9, 2025

Please duplicate the issue to https://github.com/intel/intel-graphics-compiler repo

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IGC Issue related to IGC
Projects
None yet
Development

No branches or pull requests

2 participants