
Error importing cudnn #38

Open
theoeiferman opened this issue Jun 25, 2020 · 20 comments

@theoeiferman

When launching:

```
th src/train.lua -phase train -gpu_id 1 \
  -model_dir model \
  -input_feed -prealloc \
  -data_base_dir data/sample/images_processed/ \
  -data_path data/sample/train_filter.lst \
  -val_data_path data/sample/validate_filter.lst \
  -label_path data/sample/formulas.norm.lst \
  -vocab_file data/sample/latex_vocab.txt \
  -max_num_tokens 150 -max_image_width 500 -max_image_height 160 \
  -batch_size 20 -beam_size 1
```

cudnn is not found. I tried `luarocks install cudnn`, but it still doesn't work.

Screenshot 2020-06-25 at 17 20 28

@da03
Collaborator

da03 commented Jun 25, 2020

Hmm, can you try `require 'cudnn'` in a Torch prompt (via the command `th`) and see if that works?

@theoeiferman
Author

theoeiferman commented Jun 25, 2020

Screenshot 2020-06-25 at 17 33 29
This is the answer I get; I'm not very familiar with the Lua language.

@da03
Collaborator

da03 commented Jun 25, 2020

Sorry, I meant entering `th` first;

then, at the prompt, enter `require 'cudnn'`.
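As a quick one-shot check (a sketch; `th`'s `-e` flag evaluates a Lua string, and wrapping the `require` in `pcall` reports success or failure instead of aborting):

```shell
# Prints "true" if cudnn loads, or "false" followed by the error message if not.
th -e "print(pcall(require, 'cudnn'))"
```

This needs a working Torch installation, so it only tells you whether cudnn itself is the missing piece.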

@theoeiferman
Author

Thanks for the advice! But it still doesn't work. It seems that cutorch is also not installed. I tried `luarocks install cutorch`, but the build failed!

Screenshot 2020-06-25 at 23 29 00

@da03
Collaborator

da03 commented Jun 25, 2020

Oh, it seems to be some issue with installing cudnn. Installing Torch correctly might be hard; would you mind using Docker? Here is a Dockerfile that can be used directly: https://github.com/OpenNMT/OpenNMT/blob/master/Dockerfile

@theoeiferman
Author

OK, I didn't know about Docker; thanks for the tip, it looks great!
I tried to build an image by copying the contents of the Dockerfile you sent me, but I get "unable to prepare context: context must be a directory: /Users/teiferman27/Dockerfile".
When I tried to launch directly `docker build https://github.com/OpenNMT/OpenNMT/blob/master/Dockerfile#L6`:
Screenshot 2020-06-28 at 21 27 52

One more question, if I succeed in launching this Dockerfile: can I then use the Lua language with files on my computer, or just inside the container? Thank you for the advice already!

@da03
Collaborator

da03 commented Jun 28, 2020

For the first question: I think you need to put the Dockerfile inside a folder; then, running `docker build .` inside that folder will generate the image.

For the second question: it allows using Lua inside the Docker container only.
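The build steps above can be sketched as follows (the folder name and image tag are made up for illustration; the raw-file URL is the raw counterpart of the Dockerfile linked earlier):

```shell
mkdir opennmt-build && cd opennmt-build
# fetch the Dockerfile itself (the raw file, not the GitHub HTML page)
curl -LO https://raw.githubusercontent.com/OpenNMT/OpenNMT/master/Dockerfile
# build from the current directory as context
docker build -t opennmt .
```

Passing a directory (here `.`) as the build context is what the earlier "context must be a directory" error was complaining about.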

@da03
Collaborator

da03 commented Jun 28, 2020

BTW, I think you might need to use nvidia-docker (https://github.com/NVIDIA/nvidia-docker) to support using GPUs inside a Docker container.

@theoeiferman
Author

OK, I succeeded with `docker build -t operating_lua .`
After it ran all afternoon to build the image, I tried to launch `docker run operating_lua`, but it just opens and closes in the Docker dashboard... Do you think Docker doesn't support the container, and I need to use nvidia-docker? Thanks for the response.

@da03
Collaborator

da03 commented Jun 29, 2020

I think it should be `nvidia-docker run -it operating_lua /bin/bash`, but it might be better to check the Docker documentation directly.

@theoeiferman
Author

I wanted to install nvidia-docker, but I needed to install the NVIDIA driver first, and it seems that this step requires a Linux operating system, while I am on macOS... I was surprised, because one selling point of Docker is that everyone can run it on every operating system!

Then I tried `docker run -dit operating_lua` and was able to open the container and read inside it:

Screenshot 2020-06-29 at 22 16 15

Moreover, when I tried to see whether the module 'cudnn' is on the system, I got:

Screenshot 2020-06-29 at 22 28 53

I am still confused about how to approach the big picture... Can I import files into the container? Do I really need nvidia-docker?

Thank you again for your time @da03.

@da03
Collaborator

da03 commented Jun 29, 2020

Hmm, I suspect that your CUDA driver version might be too outdated (what's the output of `nvcc --version` and `nvidia-smi`?), which would cause issues both for `require 'cudnn'` and for installing nvidia-docker. There are actually CUDA drivers available for Mac: https://www.nvidia.com/en-us/drivers/cuda/mac-driver-archive/. Fixing the driver version issue might solve all the problems.

@theoeiferman
Author

On the page https://github.com/NVIDIA/nvidia-docker, they talk about Linux:
Screenshot 2020-06-29 at 22 46 21
`nvcc --version` and `nvidia-smi` are unknown commands for now, but probably because I haven't installed nvidia-docker yet? I am going to install the CUDA drivers and then nvidia-docker.

@da03
Collaborator

da03 commented Jun 29, 2020

Oh no, so it seems nvidia-docker does not work on Mac... I have never used GPUs on a Mac, but I think with a proper CUDA installation (https://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x/index.html) you should get both `nvcc --version` and `nvidia-smi` working.

@theoeiferman
Author

I can't install it on my Mac because NVIDIA doesn't support the Mac system anymore.
My Mac may also be too old.
I checked my graphics card in System Information, as shown in
https://www.quantstart.com/articles/Installing-Nvidia-CUDA-on-Mac-OSX-for-GPU-Based-Parallel-Computing/
but I don't have an NVIDIA graphics card in my computer.

I think the incompatibility is also mentioned here:
https://developer.nvidia.com/nvidia-cuda-toolkit-developer-tools-mac-hosts

I am surprised by this CUDA/NVIDIA requirement for using the container, though.

@da03
Collaborator

da03 commented Jun 29, 2020

Oh, that explains it: this code (or cudnn) only supports CUDA and cannot run on systems without NVIDIA GPUs. This version (https://opennmt.net/OpenNMT-py/im2text.html; code can be found at https://github.com/OpenNMT/OpenNMT-py) supports CPU-only training, but doing so would be extremely slow without the parallelism provided by GPUs. Another option is to use cloud compute such as Amazon EC2, Google GCE, or Microsoft Azure and rent a GPU instance.

@theoeiferman
Author

I managed to get another computer, but its GPU is an AMD Radeon, so I can't use the cudnn module... I think this should be mentioned in the prerequisites, since Docker can't solve this hardware issue.

I was about to try the CPU, but I think the link you gave me (https://opennmt.net/OpenNMT-py/im2text.html) lists dependencies like torchvision, and PyTorch is required (so a CUDA-enabled GPU is needed again, no?).

I tried to follow the steps from https://opennmt.net/OpenNMT-py/im2text.html,
but the command `onmt_preprocess` is not found. Is there a step I have missed?

I will probably try to use cloud compute.

But just to be sure (correct me if I am wrong):

@da03
Collaborator

da03 commented Jul 3, 2020

Yes, you are right: OpenNMT-py uses PyTorch, while this project uses LuaTorch. PyTorch does not require GPUs (you can do a CPU-only installation), but again, it might be extremely slow without GPUs.

For the missing `onmt_preprocess` issue, have you installed OpenNMT-py following the instructions here? https://github.com/OpenNMT/OpenNMT-py
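For reference, the pip-based install route usually looks like this (a sketch; the package name and console commands are the ones documented in the OpenNMT-py README of that era):

```shell
pip install OpenNMT-py
# pip installs onmt_* console scripts onto PATH; this should now print usage:
onmt_preprocess -h
```

If `onmt_preprocess` is still not found, the scripts likely landed in a directory that is not on `PATH` (common with user-site installs).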

@theoeiferman
Author

theoeiferman commented Jul 20, 2020

I had issues with the onmt commands because I use a Python environment on Google Colab (you can enable the GPU in the settings, and it seems to be a good free solution).
Installing OpenNMT-py with pip instead of cloning the project made the onmt commands work.

Is Google Colab a good way to perform GPU computations? I am trying to train the model, but it takes a lot of time; do you know how long it should take?

@da03
Collaborator

da03 commented Jul 20, 2020

Yeah, I think so! The only problem is that the runtime gets disconnected if it's idle for a certain period of time, and the instance is freed, so all progress is lost. Therefore, you might want to connect your Google Drive and save progress (checkpoints) to it.
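In a Colab cell, after mounting Drive (e.g. via `drive.mount('/content/drive')` from the `google.colab` module), the checkpoint backup can be sketched like this (the folder name and checkpoint file pattern are made up for illustration):

```shell
# copy the latest training checkpoints to a folder on the mounted Drive
mkdir -p /content/drive/MyDrive/im2latex_checkpoints
cp -v model_step_*.pt /content/drive/MyDrive/im2latex_checkpoints/
```

Re-running this periodically (or pointing the trainer's save path directly at the Drive folder) means a disconnected runtime only loses progress since the last saved step.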
