Unable to produce images > 800²px on cards with more than 4GB RAM #39
Comments
One thing I've noticed when running nvidia-smi -lms 100 (100 ms update) is that you see the memory size 'peak' for a moment. Thus while your image may only need 4GB of RAM, the actual usage will peak at something much higher than that for a moment (and, if in excess of your RAM, crash out). I'll let @andersbll comment more completely, but it seems like it is probably related to how the cudarray and deeppy libraries handle memory. Keep in mind cudarray and deeppy are pretty much solo-developer frameworks, so probably not as well refined as torch or caffe. There is a torch implementation of style-transfer; you may want to try that as well. Torch is a much more developed deep learning framework, so it probably handles memory allocations better. At the very least, give it a shot and see if it handles 800 x 800 images. |
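To check whether such a short-lived peak really occurs, one could poll the GPU memory while the conversion runs and keep the maximum. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; `parse_used_mib` and `watch_peak` are hypothetical helper names and not part of cudarray or deeppy. Starting a new `nvidia-smi` process per sample may be slower than the 100 ms loop mode, so very short spikes can still be missed.

```python
import subprocess
import time

# Query only the used memory, one plain integer (MiB) per GPU.
QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_used_mib(output):
    """Parse the CSV output: one integer per non-empty line, one line per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def watch_peak(duration_s=10.0, interval_s=0.1, gpu_index=0):
    """Poll memory usage for duration_s seconds and return the highest value seen (MiB)."""
    peak = 0
    deadline = time.time() + duration_s
    while time.time() < deadline:
        out = subprocess.check_output(QUERY).decode()
        peak = max(peak, parse_used_mib(out)[gpu_index])
        time.sleep(interval_s)
    return peak
```

Calling `watch_peak(duration_s=60)` in a second terminal while the style transfer runs would report the highest sampled usage for GPU 0.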
Thanks for the recommendations, @filmo. I have actually tested a couple of different implementations, including that one (neural-style adam/lbfgs, neural-art, neural-style-tf). That said, neural_artistic_style in my eyes yields the best results of them all by far, which is pretty damn impressive, given the solo developer background. All the more a pity that it is the only tested implementation that suffers from this limit (at least in some configurations). Also I want to say once again that I do not think this is a memory spike. As I said, I am automatically testing many different sizes of images, and a 4GB card can produce the exact same size as a 12GB one. |
To clarify, I do not completely rule out a memory spike, but it wouldn't be one on a "regular" scale. Such a massive spike that does not occur with "<= 4GB" but immediately after "> 4GB", filling up the remaining 8GB of the card, would actually be another description of the problem then. |
Sorry I wasn't clear on what you were asking. I'm using a 980ti with 6GB and I'm able to apply style to images that are 1600 x 1057 and 1750 x 976 in size respectively. I think my 1600 x 1057 is about as big as I can go using this implementation. (I'm using the cudnn4 version of cudarray and deeppy from back in February and have not upgraded to cudnn5 yet. Perhaps that's part of the difference? I also wonder if there's something particular about the Titan?) I agree, I also prefer the images created by this implementation over neural_style. Not sure why there are significant differences, but there definitely are. |
Hi! Thanks for the nice writeup, I wish I could reproduce the error myself. I think you are right @neuralisator, this does not look like an out-of-memory problem. Is this the problem we are trying to solve? In that case, could you insert a Regarding the visual style, I perform layer-level normalization of gradients. In the VGG-net, the features in each layer may exist on different scales. By L1-normalizing the gradient signals, I get a more even contribution across the different layers. I suspect this is the secret sauce. :) |
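The layer-level gradient normalization mentioned above can be illustrated with a small sketch. It assumes "L1-normalizing" means dividing each layer's gradient by the sum of its absolute values; whether deeppy uses the sum or the mean, and where exactly it applies this, is not stated in the thread. `l1_normalize` is a hypothetical name, and plain lists stand in for the real gradient arrays.

```python
def l1_normalize(grad, eps=1e-12):
    """Scale a gradient so that the sum of its absolute values is (about) 1."""
    total = sum(abs(g) for g in grad)
    return [g / (total + eps) for g in grad]


# Two layers whose gradients live on very different scales.
layer_grads = {
    "early_layer": [100.0, -300.0],
    "late_layer": [0.001, 0.003],
}

# After normalization both layers contribute signals of comparable size.
normalized = {name: l1_normalize(g) for name, g in layer_grads.items()}
```

Without the normalization, the first layer in this toy example would dominate the summed gradient by several orders of magnitude.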
Hello @andersbll and thanks for replying so quickly. Yes, the error you linked is the one it's about. Before we get to debugging: About the quality of the images: The style is just applied so much better than in any other implementation :) Now let's debug: I think you meant print(out_shape) ? I modified the code so it looks like this:
I also added a logline to show the currently processed layer (in style_network.py). I resized the tuebingen / starry_night images to an adequate size for the testing. This is the output: And these are the memory stats during the process: |
Regarding speed, maybe the other implementations are using a different optimization method. The original paper uses L-BFGS as far as I remember. It is a bit heavy compared to the first-order method I use (Adam). I have a GPU with 12 GB RAM and I can produce images larger than 800^2 pixels including the images you have attached. Thanks for the output.txt. I can't see anything wrong there. Can you provide me with the output of the commands |
I hope the formatting works better this time:
|
Thanks, can you try to install Anaconda Python and use that instead? |
Will do and report back, it can take a while though. |
Setting everything up with anaconda did the trick. It's beautiful :D Thanks @andersbll - I will give more details tomorrow. |
Hooray, I'm glad to hear that! What version of the package Cython is your old Python installation using? Could you try updating that to the latest and see if it helps? |
The version that is used by the default installation is 0.24. Python 2.7.6 (default, Jun 22 2015, 17:58:13)
Now here's the funny part: I have been writing an install script to do the steps that I did, for reference, and now it's broken again. I can't really make sense of it. It definitely worked, I created several high resolution images. And now I'm getting the exact same error with the anaconda installation. So I'll be busy for a bit, banging my head against the table and trying to figure out what's going on there. I'll report back. |
Sadly, I couldn't get it to work in a reproducible way. As I wasn't / am not deeply familiar with Python package management, the different distributions and the library paths, I probably made a big mess at some point in this installation orgy and accidentally hit a working combination. In retrospect, I shouldn't have touched it anymore, but I wanted to write an install script that works reproducibly, and installing from that broke it a second time, so I am basically back to zero and get the error message every single time. Now I have scripted this and have gone through installing with just about every combination of versions/settings I can imagine. The installation process is as follows:
Without creating a conda environment, just prepending anaconda2/bin to PATH, this will install cudarray with numpy 1.10-4 and cython 0.23-4. I can upgrade to numpy 1.11 and cython 0.24 using conda install; I tried all combinations. I actually started with the virtual environment and installed the newer versions, which didn't work. Then I used the old versions by setting PYTHONPATH, and somewhere after that it worked. So I figured the older versions were working and the newer ones weren't. Well, there was obviously more to it than that. Finally,
I have even run this in nvidia-docker with both regular python and anaconda exclusively installed, and again, no luck. |
To be clear, INSTALL_PREFIX of libcudarray.so would be ~/anaconda2/ and the file would reside in ~/anaconda2/lib then. |
Ok, thanks for the thorough description. I might try to use some other Python installations and see if I can reproduce the error. Just to be sure, you are using a 64-bit version of Python, right? 32-bit might be problematic above 4GB. :) |
Yes, both stock and anaconda python2.7 are reported as 64-bit LSB executable. |
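Besides inspecting the executable, the bitness can also be confirmed from inside the interpreter that actually runs the script. This is a small sketch using only the standard library; `python_bits` and `is_64bit` are hypothetical helper names.

```python
import struct
import sys


def python_bits():
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def is_64bit():
    """True if the interpreter can address more than 4GB."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print("%d-bit Python, 64-bit: %s" % (python_bits(), is_64bit()))
```

Running it once with the stock Python and once with the Anaconda Python shows whether both interpreters really are 64-bit builds.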
In the meantime I installed it on a fresh Linux Mint 17 and a Debian 7.1 installation. The Anaconda variant on Mint failed. On Debian I used pip to install the libraries, which for a change gave me numpy 1.11.1rc1 - it also failed. Running the anaconda version on Debian - failed. If I didn't have the images I created back then, I'd start to think I hallucinated it ever working. How can it be that it fails on every installation? How can it be that it actually did work at some point? It must have used some libraries that were already present on the system or something. This all makes no sense to me. |
I just ran the conversion test with cuda-memcheck, and when it crashes it puts out many of the following error messages. What differs is the thread number, so I assume every thread crashes with the same error here.
|
I think I just found the solution. I didn't export the CUDNN_ENABLED=1 environment variable when installing cudarray but only ran CUDNN_ENABLED=1 make. Now I noticed that the setup.py file checks for the variable again. So I probably exported it manually at some point in between. Now it's working again with the anaconda installation; I didn't check anything else yet, just wanted to post that. I will not touch this now and will verify it in a docker image or on another system later. So basically I was too stupid to follow the installation instructions. Sorry for that. Will post more results later. |
Verified :D this was the problem. It also works with stock python now. However, the error still occurs if CUDNN_ENABLED=0 is set (or not set at all) during cudarray installation. So this might still be worth looking into. @andersbll let me know if you require any more info. Also thanks for your support, and keep up the good work :) I'll ping 2 more people who seemed to have the same problem: @FabienLavocat and @mirzman |
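The reason `CUDNN_ENABLED=1 make` alone is not enough can be shown with a short sketch: a child process only sees a variable if it is part of the environment it is launched with, so a variable passed to one command does not carry over to a later `setup.py` run. The child below is just a Python one-liner standing in for the build steps; `child_sees` is a hypothetical helper name.

```python
import os
import subprocess
import sys

# A child process that reports what it sees for CUDNN_ENABLED.
CHILD = [sys.executable, "-c", "import os; print(os.environ.get('CUDNN_ENABLED'))"]


def child_sees(env):
    """Run the child with the given environment and return what it printed."""
    return subprocess.check_output(CHILD, env=env).decode().strip()


# Start from the current environment, but make sure the variable is absent.
base = {k: v for k, v in os.environ.items() if k != "CUDNN_ENABLED"}

without_var = child_sees(base)                         # like a plain `python setup.py install`
with_var = child_sees(dict(base, CUDNN_ENABLED="1"))   # like running it after `export CUDNN_ENABLED=1`
```

Here `without_var` is the string `None` and `with_var` is `1`, which mirrors the difference between setting the variable for `make` only and exporting it for the whole shell session.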
Ah, great job finding it finally! I admire your persistence. :) I will try to look into |
I exported CUDNN_ENABLED=1. But the problem stays... |
@mirzman just to make sure: export CUDNN_ENABLED=1 has to be set before compiling/installing libcudarray (for both make and setup.py). If you compiled it in the same folder before, I suggest deleting that and creating a new clone of the repo. I haven't tested if you also have to reinstall deeppy, but in case, the same might apply there. Also you have to make sure the freshly compiled libcudarray.so is actually used, and not another old version that might be lingering somewhere. cuda-memcheck shows that (see above). |
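To look for old copies of the library that might be lingering somewhere, a directory walk is one option. This is a minimal sketch; `find_library_copies` is a hypothetical helper, and the search roots in the usage note are assumptions that have to be adapted to the actual install prefix. It only lists candidates and does not tell which copy the dynamic loader picks at run time.

```python
import os


def find_library_copies(roots, name="libcudarray.so"):
    """Return (path, mtime) for every file called `name` below the given roots, newest first."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                path = os.path.join(dirpath, name)
                hits.append((path, os.path.getmtime(path)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, `find_library_copies([os.path.expanduser("~/anaconda2"), "/usr/local/lib"])` would list every copy under those two locations; more than one hit, or a hit older than the last build, is worth a closer look.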
It fails
it outputs "CUDArray: CUDA back-end not available, using NumPy." |
@mirzman if I export CUDNN_ENABLED=1 here on the usual user account and then use sudo, the variable is not set in that context. Please check the compiler output to see if -DCUDNN_ENABLED=1 shows up there. If not, that will probably be your issue. |
Correction: as you don't run sudo before make, that should be fine. But you can't run setup.py with sudo like this. It's the exact same situation that I had in the end. |
If you install with |
make outputs:
also:
|
@mirzman: what is the output of |
@mirzman It's not the same: |
I used sudo -E. Now it outputs:
|
That means it is finally using CUDNN during installation. So far so good. I'm sorry I can't help with the next error you just ran into though, as I never encountered it. But this cudnn/non-cudnn mixup is out of the way now. |
Outdated, but maybe relevant: #16 |
It works! I just uninstalled and installed cudnn several times. |
@mirzman You have it working for larger images with your graphics card? |
Hi, I'm landing on this thread and didn't read the whole thing, but I noticed this: the size limitation seems to be somewhat tied to the combined size of both the subject and the style picture (i.e., for a subject of 800x800 and a style of 800x800, that would give 800x800+800x800=1 280 000 pixels in total). I found the limit to be between 611496 and 738048 for a GT635M (see #42, there is a link to my tests). For now it seems to work: I tried multiple combinations of pictures with a total of around 611496px and it works... I will update the list linked in the issue as soon as I have new high/low limits... I may be totally wrong about my deductions, but then come over to #42 to discuss :) |
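The combined-size arithmetic above can be written down as a small sketch. It assumes the two figures mean "worked at 611496 pixels, failed at 738048 pixels" on the GT635M, which is how the comment reads but is not spelled out; `combined_pixels` and `within_limit` are hypothetical helper names.

```python
def combined_pixels(subject_size, style_size):
    """Total pixel count of the subject and style images, each given as (width, height)."""
    subject_w, subject_h = subject_size
    style_w, style_h = style_size
    return subject_w * subject_h + style_w * style_h


def within_limit(total, works_up_to=611496, fails_from=738048):
    """True/False when the total is on a known side of the observed bounds, None in between."""
    if total <= works_up_to:
        return True
    if total >= fails_from:
        return False
    return None


total = combined_pixels((800, 800), (800, 800))  # 1280000
```

With these bounds, `within_limit(total)` is `False` for the 800x800 pair, and a combination that lands between the two figures returns `None` because it has not been tested.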
@DylanAlloy Yes, now it works. There were 2 problems:
|
@mirzman First off, thanks for the quick reply and working together to help resolve this for everyone who stops by later. What exactly went wrong during cudnn installation? |
@DylanAlloy I don't know :) There were no errors. When I realized I had no ideas left, I decided to uninstall and reinstall cudnn, cuda and so on. After reinstalling cudnn the problem was gone :) |
@mirzman If you don't mind me prying a little for more information, what is the largest image you've processed using the code since you've got the CUDA problems resolved? |
@DylanAlloy I processed a 2298x1280 image. I can test larger images. |
@mirzman that's pretty good. I guess I'll try reinstalling everything. I'm working with a 4GB Amazon nvidia instance so I hope I can figure it out. What amount of memory are you working with? |
@DylanAlloy GeForce GTX TITAN 12GB |
@mirzman Ahhh, I've broken the installation but if you're getting 2298x1280 and I was getting 512x512 earlier, I think it sounds about right given the amount of memory we have. |
@DylanAlloy Are you sure? |
@mirzman True, but at best I'm expecting <1megapixel processing until we know otherwise as per the same math, which is a bummer. Not the fault of the code as my options are simply limited. |
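The "same math" behind the sub-megapixel expectation can be sketched as a linear extrapolation from the figures in this thread. It assumes that the number of pixels that can be processed scales linearly with GPU memory, ignores fixed overhead such as the network weights and the style image, and treats 2298x1280 as a reference point even though it was only the largest size tested, not a proven maximum. `estimate_max_pixels` is a hypothetical helper name.

```python
def pixels_per_gb(width, height, memory_gb):
    """Pixels processed per GB of GPU memory for a known working configuration."""
    return width * height / float(memory_gb)


def estimate_max_pixels(ref_width, ref_height, ref_memory_gb, target_memory_gb):
    """Linearly extrapolate a pixel budget to a card with a different memory size."""
    return int(pixels_per_gb(ref_width, ref_height, ref_memory_gb) * target_memory_gb)


# 2298x1280 on a 12GB card, extrapolated to a 4GB card.
estimate = estimate_max_pixels(2298, 1280, 12, 4)  # 980480
```

That is just under one megapixel, or roughly a 990x990 square image, under the stated assumptions.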
A couple of users (here, here and here) including myself experienced the problem that they cannot create images with resolutions larger than around 800²px even though the GPU RAM should allow larger than that. I am running a script that finds the maximum possible image size that can be computed by divide and conquer. Images are scaled to a certain size and the conversion script is run against them. If it fails or succeeds, it adapts the size until a working image size is found. As the image size is reduced in every fail case, it can be ruled out that the cause of the error is a too large image size, as a size that works will always be found. I am running this on a Titan X with 12GB RAM and monitor it using nvidia-smi. What happens is that the tested images with respective sizes above 800²px do consume memory above 4GB, but even if they don't max it out, that is, reach the 12GB, they will still fail to convert unless the size is back down to what I could formerly also produce on a 4GB GPU. An exception is thrown at some point when the script is updating layer "deeppy.feedforward.activation_layers.ReLU".
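The size search described above can be sketched as a bisection over the image edge length. The actual script is not shown in this issue, so this is only an illustration: `try_convert` is a hypothetical callable that would scale the images to the given size, run the conversion and return whether it succeeded, and the search assumes that if one size fails, every larger size fails too.

```python
def find_max_size(try_convert, low=64, high=4096):
    """Return the largest edge length in [low, high] for which try_convert succeeds, or None."""
    best = None
    while low <= high:
        mid = (low + high) // 2
        if try_convert(mid):
            best = mid        # mid works, so look for something larger
            low = mid + 1
        else:
            high = mid - 1    # mid fails, so shrink the upper bound
    return best
```

With a stand-in such as `find_max_size(lambda size: size <= 800)` the search settles on 800 after about a dozen trial runs, instead of testing every size between the bounds.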
Others reported to be able to create images with larger sizes (here and here). @alexjc noted that he had to "manually free some buffers" to create larger images, maybe he can give us some insights?
I am running Linux mint 17 / nvidia driver version 352.93 / cuda 7.5 / cudnn5 with cudarray master (w/ cudnn5 support, but the error occurs also with cudnn disabled) / deeppy master.