
Failing in running "00-classification.ipynb" example in GPU mode with ARM Mali GPU #118

Closed
XiaoMaol opened this issue Jul 25, 2017 · 9 comments

XiaoMaol commented Jul 25, 2017

Hey @psyhtest and @gfursin:

I really appreciate your help thus far. Sorry that I have run into more problems again.
I installed Caffe as you suggested, with the command

ck install package:lib-caffe-bvlc-opencl-clblast-universal --env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=2 --env.CAFFE_BUILD_PYTHON=ON --env.CK_MAKE_CMD2="make pycaffe"

I have some problems running the example "00-classification.ipynb" provided by Caffe in the directory

~/CK-TOOLS/lib-caffe-bvlc-opencl-clblast-master-gcc-5.4.0-linux-32/install/examples

I run it with the command

ck xset env tags=lib,caffe && . ./tmp-ck-env.bat && jupyter notebook

I can import caffe without any problems and make the CPU load the weights, deploy, and run.
But I cannot make the GPU run; the program dies at the lines

caffe.set_device(0)  # if we have multiple GPUs, pick the first one
caffe.set_mode_gpu()
net.forward()  # run once before timing to set up memory
%timeit net.forward()

It produced this error message:

Error Message:
I0724 17:09:56.982797 25836 device.cpp:56] CL_DEVICE_HOST_UNIFIED_MEMORY: disabled
std::exception

When I opened the debugger, the error message changed to:
[screenshot: caffe_mali_1_2]

Can you reproduce the error? Have you successfully run the Caffe example on the Mali GPU?

///////////////////////////////////////////////////////////////////////////////////////////////////////////////
P.S.

Suddenly, I was reminded of the command for installing ck-caffe suggested by @psyhtest in #114:

ck install package:lib-caffe-bvlc-opencl-clblast-universal \
  --env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON \
  --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=2

Especially

--env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON 

What does this flag do? Why does it cause the kernel to die?

Furthermore, I have tried installing with the command

ck install package:lib-caffe-bvlc-opencl-clblast-universal \
  --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=2

without the flag

--env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON 

With this build, I can import caffe, but when I try to load the weights, the kernel dies after the line

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

with this error:
[screenshot: ck-caffe-issue3]
The error on the command line is:
[screenshot: ck-caffe-issue4]

It died while attempting to load the weight data.

@XiaoMaol changed the title from 'effects of installation flag "--env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON"?' to 'Failing in running "00-classification.ipynb" example in GPU mode with ARM Mali GPU' on Jul 26, 2017

psyhtest commented Jul 27, 2017

@XiaoMaol First, let me explain what the DISABLE_DEVICE_HOST_UNIFIED_MEMORY flag does. It was introduced as a workaround for a rather annoying ambiguity in the OpenCL specification, which says that the CL_DEVICE_HOST_UNIFIED_MEMORY property of clGetDeviceInfo():

Is CL_TRUE if the device and the host have a unified memory subsystem and is CL_FALSE otherwise.

Now, what exactly a "unified memory subsystem" is remains open to interpretation. People coming from a desktop and server background assume that memory is unified in the OpenCL 2.0 sense: the host and the device can share the same pointers, so no copy between the host address space and the device address space is required. As a consequence, you can see code like this:

#define ZEROCOPY_SUPPORTED(device, ptr, size) \
             (device->is_host_unified())
<...>
              CHECK_EQ(mapped_ptr, cpu_ptr_)
                << "Device claims it support zero copy"
                << " but failed to create correct user ptr buffer";

(see syncedmem.cpp)

Unfortunately, mobile GPU vendors started returning CL_TRUE even for OpenCL 1.x implementations. In their interpretation, the CPU and the GPU in a system-on-chip typically share the same physical memory; therefore, they argue, the memory subsystem is unified. In fact, the CPU and the GPU cannot share the same pointers. You will hopefully see how this is at odds with the expectations of applications like OpenCL Caffe, and leads to checks similar to the above failing at runtime. (This is probably what you are getting when trying to load the weights.) To work around this issue, you can explicitly specify at build time that you wish Caffe to ignore whatever the driver says and assume that the memory subsystem is not unified. As you have probably guessed by now, you use DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON to do that.
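
(If you want to double-check what your driver actually reports for this property, here is a minimal sketch using pyopencl, assuming it is installed on the board; a tool like clinfo shows the same information.)

import pyopencl as cl  # assumption: pyopencl is available; clinfo is an alternative

for platform in cl.get_platforms():
    for device in platform.get_devices():
        unified = device.get_info(cl.device_info.HOST_UNIFIED_MEMORY)
        print(platform.name, '/', device.name,
              ': CL_DEVICE_HOST_UNIFIED_MEMORY =', bool(unified))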

Now, the output you posted seems to suggest that disabling this property leads to an exception:

Error Message:
I0724 17:09:56.982797 25836 device.cpp:56] CL_DEVICE_HOST_UNIFIED_MEMORY: disabled
std::exception

I believe, however, this is just an unfortunate intermixing of log info (which gets printed even when everything goes well) with the exception message.


psyhtest commented Jul 27, 2017

You may now ask why you get this Jupyter error in the first place. The honest answer is that I have no idea :). But if I were to guess, the device query seems to be somewhat different for OpenCL. For example, for program:caffe, the query_gpu_cuda command looks like:

"run_cmd_main": "$<<CK_CAFFE_BIN>>$ device_query --gpu=$<<CAFFE_COMPUTE_DEVICE_ID>>$"

while the query_gpu_opencl command looks like:

"run_cmd_main": "$<<CK_CAFFE_BIN>>$ device_query"

So there may well be some difference in how a device gets selected too.

Maybe @naibaf7 has a better idea?


naibaf7 commented Jul 28, 2017

Without having read the whole context, it is just important to use set_mode_gpu() before set_device(), otherwise it will fail. But this is the only "more strict" rule compared to CUDA Caffe.

@psyhtest

From @XiaoMaol's initial comment, the order of these calls is reversed:

caffe.set_device(0) # if we have multiple GPUs, pick the first one
caffe.set_mode_gpu()
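
So, per @naibaf7's note above, the calls need to be swapped; a sketch against the standard pycaffe API:

caffe.set_mode_gpu()  # select GPU mode first
caffe.set_device(0)   # then pick the device; 0 is the first GPU
net.forward()         # run once before timing to set up memory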

@XiaoMaol

Hey, @naibaf7 and @psyhtest:

I have tried reversing the order of the calls, but it still does not work. I think the problem is somewhere else, but I do not know where.

What other information can I provide to give a better description of the problem? Thanks!

@XiaoMaol

Today I tried Caffe without the Python layer, and this is what I get.

When I try to run with GPU device 0, the program is killed:
[screenshots: caffe_gpu_device_0_trial_1, caffe_gpu_device0_trail_1_result]

When I run the program with device 1, I get the following (notice that the first iteration takes around 6 minutes, then the program is killed):
[screenshots: caffe_gpu_device1_trail1_command, caffe_gpu_device1_trail1_results]

However, when I tried to run on device 1 again, the program seemed to freeze.

When I run Caffe under gdb, the program also just seems to freeze.


naibaf7 commented Jul 31, 2017

Sounds a lot like a faulty driver. Can you try an absolutely minimal network with just one fully connected layer, to see whether Caffe works on that driver at all?
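
As an illustration, here is a minimal sketch using pycaffe's NetSpec (assuming the OpenCL branch exposes the same caffe.net_spec API as upstream pycaffe):

import caffe
from caffe import layers as L

# One dummy input blob feeding a single fully connected layer.
n = caffe.NetSpec()
n.data = L.DummyData(shape=[dict(dim=[1, 3, 8, 8])])
n.fc = L.InnerProduct(n.data, num_output=10)

with open('minimal.prototxt', 'w') as f:
    f.write(str(n.to_proto()))

caffe.set_mode_gpu()                             # mode before device, as noted above
caffe.set_device(0)
net = caffe.Net('minimal.prototxt', caffe.TEST)  # randomly initialised weights are fine here
net.forward()                                    # if even this crashes, the driver is suspect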


XiaoMaol commented Aug 2, 2017

Hey @naibaf7, @psyhtest :
Caffe can run a smaller network. In addition, when I reduce AlexNet's input batch size from 10 to 1, it also works. Thus, it seems to be a memory problem rather than a problem with Caffe itself.
However, I suspect the crash is due to the memory of my device rather than GPU memory.
I am also using the ARM Compute Library, and its AlexNet also crashes when the input batch size is more than 5 or 6 (here is the issue: ARM-software/ComputeLibrary#190).
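
For reference, in the notebook the batch size is controlled by reshaping the input blob; a sketch assuming the standard 00-classification.ipynb setup (227x227 BGR inputs, transformed_image produced by the notebook's transformer):

net.blobs['data'].reshape(1,         # batch size reduced from the notebook default
                          3,         # 3-channel (BGR) images
                          227, 227)  # CaffeNet/AlexNet input size
net.blobs['data'].data[...] = transformed_image
output = net.forward()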

The memory of the device is:
[screenshot from 2017-08-02 01-40-38]

The platform is

Linux odroid 3.10.103-124 #1 SMP PREEMPT Tue Oct 11 11:51:06 UTC 2016 armv7l armv7l armv7l GNU/Linux

I am just wondering: is 2 GB of memory too small to run Caffe properly?

The GPU info is:
[screenshots: mali_gpu_info1, mali_gpu_info2]


psyhtest commented Aug 2, 2017

@XiaoMaol On Odroid and similar system-on-chip platforms, the total system memory (e.g. 2 GB) is shared between all the devices. The GPU doesn't have any dedicated memory like on desktop or server cards. When you run out of this memory, you cannot use the GPU either.
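
A quick way to see how much of that shared memory is actually left before launching the notebook is to read /proc/meminfo; a small sketch (plain Linux, nothing Caffe-specific, falling back to MemFree because MemAvailable only appeared in kernels newer than your 3.10):

# Print total and available system memory; on Odroid this pool is shared with the Mali GPU.
with open('/proc/meminfo') as f:
    info = dict(line.split(':', 1) for line in f)
print('MemTotal:    ', info['MemTotal'].strip())
print('MemAvailable:', info.get('MemAvailable', info['MemFree']).strip())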
