OpenCL Backend #14
So from the small experimentation I have done, I was using https://github.com/cogciprocate/ocl for the actual interaction with the OpenCL runtime. You still need to write your own kernels. Additionally, if you are interested in using OpenCL specifically for AMD cards, the new Vega would support the ROCm stack, which I think has similar "optimized" kernels (e.g. convolutions etc.) for AMD cards. Is there any technical paper about Spearow, or on how exactly it does the things it does? From the examples, it looks like it runs all of the operations synchronously, if I understand correctly?
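To make the "you still need to write your own kernels" point concrete, here is a minimal sketch of the kind of kernel a backend would hand to a runtime wrapper such as `ocl`, paired with a plain-Rust reference implementation that can be tested on machines without an OpenCL device. The kernel source and function names are illustrative, not taken from any existing backend:

```rust
/// OpenCL kernel source that a wrapper like the `ocl` crate would
/// compile and enqueue. SAXPY (y = a*x + y) is purely an illustration.
const SAXPY_KERNEL: &str = r#"
__kernel void saxpy(const float a,
                    __global const float* x,
                    __global float* y) {
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
"#;

/// CPU reference implementation, useful for validating GPU results
/// in tests when no OpenCL runtime is available.
fn saxpy_reference(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi = a * *xi + *yi;
    }
}

fn main() {
    let x = vec![1.0f32, 2.0, 3.0];
    let mut y = vec![10.0f32, 20.0, 30.0];
    saxpy_reference(2.0, &x, &mut y);
    assert_eq!(y, vec![12.0f32, 24.0, 36.0]);
    // In a real backend, SAXPY_KERNEL would be built and enqueued here.
    let _ = SAXPY_KERNEL;
    println!("{:?}", y);
}
```

Keeping a CPU reference next to each kernel also gives the test suite a ground truth to compare device output against.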
@botev the RX4xx series also supports the ROCm OpenCL backend; I have one here and did a few experiments with all 3 stacks on Linux/Fedora. @cathalgarvey I am totally with you; I actually talked a lot to @subversive-owl about it and how to go about it. The API describes all of the operations synchronously; whether things are done async is up to the backend implementation. If necessary, the API can easily be adapted. So yes, this is planned, and it is what I am most excited about after fixing the last fallout bits of the dropout cuDNN implementation and finishing the LSTM/RNN layers. @botev right now there is the
@cathalgarvey I'd be happy to streamline efforts and get more traction. Every pull request is very welcome, and I'd happily talk about architecture and how to share code or merge repositories in a combined effort.
Looking at @jonysy's work on these two repos: https://github.com/lychee-eng/parenchyma-blas ... it looks like the skeleton of a good OpenCL backend already exists for a Leaf-derived project. I don't know how compatible the code in those repos is with coaster/juice as-is, but it's probably a good place to start. Regarding ROCm, I've experimented with it on my GPUs and my experiences haven't been great. It's sometimes unstable, and a recent apt-delivered update made my system so unstable I had to reinstall Ubuntu (and configure it to keep a stable kernel around for future use). I'm awaiting a fix now so I can resume using ROCm, because right now ROCm is the only way to get OpenCL 1.2+ on Ubuntu 16.04+. :( Meanwhile, I'm using Mesa, which gives an OpenCL 1.1 runtime that's stable and appears to work with arrayfire. So I'm going to resume my experiments with Arrayfire-rs for a while; I'd like to make some type-safe wrappings that could be useful for ML. It's a shame they haven't stabilised the Arrayfire-ML repo yet, nor provided Rust bindings to it. But I'd prefer something that's pure Rust and targets many platforms at once, including framework-free CPU (e.g., I don't want to rely on Arrayfire being installed everywhere). Leaf/Juice/Parenchyma are the most mature-looking Rust platform to begin with, I think.
@cathalgarvey I strongly recommend you go with Fedora and just install the OpenCL part of the AMDGPU-PRO packages; that works very well for me and actually allows you to pick either of the ICDs. But this is a bit off-topic. I know about parenchyma, but half a year ago, when I talked to @jonysy, it seemed we were following different goals. I am happy to re-evaluate that. The last time I checked arrayfire-rs it had a lot of open issues and seemed to be pretty slow compared to other frameworks (given the most important transformations used for ML; I am not sure where that information came from, though). OpenCL 1.1 is a total no-go: in many places it is already stated that it is not threadsafe, and as such it is pretty useless. Most vendors managed to get to at least 1.2, though I am eyeing 2.x for the sake of features and ease of implementation. But that is open for discussion; I don't mind the other way round either. PRs welcome.
I'd be happy to see OpenCL 1.2-2.x too; 1.1 would just have been the icing on an already great cake. :) If 1.2 is supported, then it's possible that Coriander could be used to maintain a hybrid CUDA/CL codebase, though Coriander isn't bug-free yet, so a rigorous test suite would be required, and it would probably limit the flexibility of coding in CUDA. Arrayfire-RS is far, far from perfect; the lack of type safety and the poorly documented segfaulting exhausted me the last time I tried to use it. Speed isn't much of a concern when the baseline is "no support for non-CUDA GPUs at all". Clearly, OpenCL or Vulkan etc. would be better than using an intermediary platform, but I'll take whatever I can get! I'm looking at Parenchyma now to see what it can do as-is; it looks like it stalled some months back, unfortunately. There have been compiler changes in Nightly that make some of it illegal now (const fields in traits), so it doesn't compile. I'll need to learn a bit before fixing that.
That was one of the reasons I decided to discard arrayfire-rs. If the path of Vulkan continues as I expect it to, then OpenCL and Vulkan will merge. They are already similar, and with some hackery around Vulkan compute shaders you can already do a lot. But for the time being: stick with OpenCL. I am not gambling, and I am not keen on investing a huge chunk of time in a poorly performing backend; as such, library choice is crucial. We can continue this chat in https://gitter.im/spearow/coaster. I don't see much use in using Coriander: there is no CUDA code, all there is is cuDNN API calls. Also, I'd rather invest effort in OpenCL kernels integrated into native Rust than in a language-dependent abomination on top of C++.
@cathalgarvey I'd be happy to discuss a few more things on gitter/here regarding what has been done and what is planned.
Hi! Busy few days, sorry. But I have been committing a little time to this. One of the problems with bootstrapping OpenCL in languages other than C/C++ is that libraries that aim to make this easy, by providing kernels etc., often write their framework to dynamically generate kernels (using C preprocessing, or a C engine). I gather that OpenCL can perform preprocessing to a certain extent, but the macros/defines would have to be in OpenCL source files. I found something promising in Samsung's sadly-defunct deep learning framework, VELES; the only one to make OpenCL a first-class citizen so far. They have what looks like a full set of "basic" kernels for data and NN operations, between the core and the neural-network extension. And they look like "regular OpenCL" kernels. :) The license is Apache; is that good enough to use in Coaster if they were just copied in and worked around?
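On the preprocessing point: OpenCL's built-in preprocessing is normally driven through build options, since `clBuildProgram` accepts `-D` defines, so one "regular OpenCL" kernel source can be specialized per shape without a separate C preprocessing engine. A small sketch of assembling such an options string in Rust (the helper name and parameters are hypothetical):

```rust
/// Assemble a "-D NAME=value" option string in the style accepted by
/// clBuildProgram, so a single kernel source can be specialized at
/// build time (e.g. per layer shape) without external preprocessing.
fn build_options(defines: &[(&str, String)]) -> String {
    defines
        .iter()
        .map(|(name, value)| format!("-D {}={}", name, value))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Hypothetical example: specialize a convolution kernel.
    let opts = build_options(&[
        ("FILTER_W", 3.to_string()),
        ("FILTER_H", 3.to_string()),
        ("RELU", 1.to_string()),
    ]);
    assert_eq!(opts, "-D FILTER_W=3 -D FILTER_H=3 -D RELU=1");
    println!("{}", opts);
}
```

The string would then be passed as the build-options argument when compiling the program, which is how most C/C++ frameworks specialize otherwise static kernel files.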
I'd like to discuss a similar set of things: using https://github.com/djc/askama to generate structures from templates, fill them as required at runtime, and save those artifacts in a cache so that following runs don't require recompilation. This would even allow merging a few operations into a single kernel, reducing GPU memory access and the introduced latency, but that is step 2. The reason I'd like to stick to dual licensing Apache/MIT is mostly to allow GPL linkage, which I don't want to rule out but which Apache alone does not permit.
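The generate-then-cache idea above can be sketched in plain Rust. This is a simplified stand-in for template-based kernel generation (askama itself compiles templates at Rust compile time; here, runtime string substitution is used instead, and a real backend would cache compiled binaries rather than source). All names and placeholders are illustrative:

```rust
use std::collections::HashMap;

/// Fill placeholders in a kernel template and memoize the result,
/// so repeated runs with the same parameters skip regeneration.
struct KernelCache {
    template: &'static str,
    cache: HashMap<String, String>,
}

impl KernelCache {
    fn new(template: &'static str) -> Self {
        Self { template, cache: HashMap::new() }
    }

    /// Key the cache on the parameter set; generate only on a miss.
    fn get(&mut self, dtype: &str, width: usize) -> &str {
        let key = format!("{}:{}", dtype, width);
        let template = self.template;
        self.cache.entry(key).or_insert_with(|| {
            template
                .replace("{{T}}", dtype)
                .replace("{{W}}", &width.to_string())
        })
    }
}

fn main() {
    let mut cache = KernelCache::new(
        "__kernel void scale(__global {{T}}* v) { /* width {{W}} */ }",
    );
    let first = cache.get("float", 256).to_string();
    assert!(first.contains("__global float*"));
    // A second lookup with identical parameters hits the cache.
    let second = cache.get("float", 256).to_string();
    assert_eq!(first, second);
    println!("{}", first);
}
```

Persisting such a cache to disk, keyed on the parameter set plus device, is what would let subsequent runs skip recompilation entirely.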
Cool crate! So, do you mean using templating to construct kernels on the fly using Rust macros? That seems hard to optimise, though a fully-fledged Rust DSL that compiles into kernels would be the golden fleece for OpenCL as far as I'm concerned. :)
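For what "constructing kernels with Rust macros" could look like at its simplest, here is a toy `macro_rules!` sketch that splices an expression into elementwise kernel source at compile time. This is purely illustrative of the idea, not an optimizing DSL and not code from any of the projects discussed:

```rust
/// Toy macro that expands to OpenCL kernel source at compile time.
/// The body must be a string literal so `concat!` can splice it in.
macro_rules! elementwise_kernel {
    ($name:ident, $body:literal) => {
        concat!(
            "__kernel void ",
            stringify!($name),
            "(__global float* v) { size_t i = get_global_id(0); v[i] = ",
            $body,
            "; }"
        )
    };
}

fn main() {
    const DOUBLE_IT: &str = elementwise_kernel!(double_it, "v[i] * 2.0f");
    assert!(DOUBLE_IT.starts_with("__kernel void double_it"));
    assert!(DOUBLE_IT.contains("v[i] * 2.0f"));
    println!("{}", DOUBLE_IT);
}
```

A real DSL would of course parse Rust expressions rather than splice literals, which is where the hard optimization questions raised above come in.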
@DiamondLovesYou pointed me to his great work on a draft compiler extension that compiles Rust functions to OpenCL, which I am still looking into.
If anyone's interested, I've started working on Parenchyma again... I've updated Leaf to make it compatible, but it's a bit outside my normal area of expertise in terms of implementing new algorithms.
@jonysy I've seen the activity, but I have not looked into it yet. Unfortunately I did not get around to getting much done on juice / the compute framework; this will hopefully change rather soon.
Leaf, and now Spearow, always promised OpenCL but has yet to deliver. I think that's a shame!
There are plenty of great folks working on ML experiments for Rust which feature OpenCL or frameworks thereof, including Arrayfire. I hope they'll forgive me for bringing Spearow to their attention:
@jonysy - One of the other fork-and-maintainers of Leaf
@botev - Past experiments in Autodiff for Rust/ML
@sebcrozet - Too many Rust/Math experiments to count, plus my favourite: rs2cl
@tedsta - Wrote a GPU N-array library and is making a Deep Learning toolkit based on that
@jramapuram - Using arrayfire-rs to make an ML Framework
There are lots of really clever individual efforts on Rust OpenCL ML, but I feel like a good push in one good framework would establish something useful. The above is kind of my dream-team; I hope flattery overcomes the annoyance of being mass-mentioned in here. :)
Any other suggestions?