
Some basic example with Kernel #21

Open
kolkov opened this issue Oct 17, 2017 · 9 comments

@kolkov

kolkov commented Oct 17, 2017

Hi!
Please provide a basic example of how to work with this library properly:
how to create a basic kernel and use it in a GPU computation.

Thanks!

@chewxy
Member

chewxy commented Oct 18, 2017

Yeah I'll get round to doing that.

@kolkov
Author

kolkov commented Oct 18, 2017

It would be better to give two examples: one quite simple, to cover the basics, and one more complex, to show where to go next.
Like in the Nvidia docs.
Thanks!

@cobnst

cobnst commented Dec 5, 2018

@chewxy @kolkov Well, how do we find the example? 0.0

@Icaro-Lima

I'm here 2 years later ... Any examples?

@malkhamis

Maybe this test file can serve as an example 🤔?

@Icaro-Lima

Icaro-Lima commented Aug 22, 2019

Thank you! This is exactly what I need!

@neurlang
Contributor

I got a working kernel here

tl;dr: the order of operations that seems to work is:
initCUDA()

device, err := cu.GetDevice(0)
ctx, err := device.MakeContext(cu.SchedAuto)
err = ctx.Lock()
// then do malloc etc. for long-lived (permanent) device memory
mod, err := cu.LoadData(kernel.PTXreduceCUDA)
fn, err := mod.Function("reduce")
stream, err := cu.MakeStream(cu.DefaultStream)

reduceCUDA()

err := cu.SetCurrentContext(ctx)
// allocate single-use memory / clear permanent memory (memset 0)
err = cu.MemcpyHtoD(d_input_nums, unsafe.Pointer(&input_numbers[0]), inputNumsSize) // copy input data to the device
err = fn.LaunchAndSync(x[1][0], x[1][1], x[1][2], x[0][0], x[0][1], x[0][2], 0, stream, args)
err = cu.MemcpyDtoH(unsafe.Pointer(&result[0]), d_result, resultSize) // copy results back from device to host

destroyCUDA()

cu.MemFree(memory) // free all device allocations
ctx.Unlock()
ctx.Destroy()

the kernel itself is in the kernel/ subdir, compiled to .ptx like this:

#!/bin/bash
nvcc -ptx reduceCUDA.cu -o reduceCUDA.ptx

the kernel does have a sort of main function:

extern "C" __global__ void reduce(uint8_t *d_set, uint32_t *d_nums, uint32_t *alphabet, uint32_t *out) {
// the code
}

Hope this serves as a solid example for the next person.

@chewxy
Member

chewxy commented May 23, 2024

hey @neurlang do you want to put this in a PR as an example?

@neurlang
Contributor

Pushing the hashtron kernel to the main cu repo doesn't feel right: it would shift the maintenance burden to you, I still need to tweak the kernel sometimes, etc. I think I will keep it in my repo instead; anyone can look there anyway.

Speaking of this repo (gorgonia/cu), I would prefer to add more test cases here that stress the code paths my kernel relies on, to make sure the hashtron behavior isn't accidentally broken by you or someone else, now that it works pretty stably.

On the other hand, now that we (or simply I) know how to make a CU kernel, it would be a good idea to write it up on the gorgonia.github.io site as a tutorial on making a CU kernel.

I'll open a separate issue there instead.
