socket.gaierror: [Errno -2] Name or service not known #2

Open
Chamberlain0w0 opened this issue Aug 3, 2020 · 9 comments

Comments

@Chamberlain0w0

Hello, I tried to run the code on Ubuntu 18.04, following the steps in the QuickStart, and it all went well: I successfully set up the web interface and used 'docker logs -f dq-main' to watch the logs in real time in the terminal. However, when I tried to start a query task through the interface, that is, when I entered the query info and clicked the 'Run new query' button, I got this error in the logs, repeatedly:

DuoquestServer listening on port 6001...
Traceback (most recent call last):
  File "/home/duoquest/duoquest/nlq_client.py", line 21, in connect
    self.conn = Client(address, authkey=self.authkey)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
    s.connect(address)
socket.gaierror: [Errno -2] Name or service not known

I wonder what the problem is. Is there something wrong with my network setting?

@chrisjbaik
Collaborator

Hmm. What's happening is that the dq-main container is trying to connect to the dq-enum container on port 6000 (or whatever host/port is listed under the docker_cfg.ini file).

It could be the network, or maybe the dq-enum container isn't set up correctly. Could you try looking at the logs for that container and see what they say?
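
For reference, here is a minimal sketch (not Duoquest's actual code or config) of the kind of connection dq-main is making, based on the traceback above; the hostname, port, and authkey are assumptions for illustration:

import socket
from multiprocessing.connection import Client

# Hypothetical values standing in for whatever docker_cfg.ini points at.
address = ('dq-enum', 6000)

try:
    conn = Client(address, authkey=b'not-the-real-key')
except socket.gaierror as exc:
    # [Errno -2] Name or service not known: the hostname never resolved.
    # Inside Docker this usually means the target container is not running
    # or is not attached to the same network as dq-main.
    print(exc)

In other words, gaierror is a name-resolution failure rather than a refused connection, so it points at the dq-enum container being down (or on a different Docker network) rather than at a wrong port.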

@Chamberlain0w0
Author

Oh, I looked into the logs of dq-enum and found that as soon as I run dq-main, the dq-enum container stops instantly and reports an error like this:

Loading GloVE word embeddings...
Loading word embedding from /workspace/syntaxSQL/glove/glove.42B.300d.txt
Using fixed embedding
Traceback (most recent call last):
File "main.py", line 258, in
main()
File "main.py", line 169, in main
config.get('syntaxsql', 'glove_path'), args.toy)
File "main.py", line 92, in load_model
table_type='std', use_hs=True)
File "/workspace/syntaxSQL/supermodel.py", line 104, in init
self.multi_sql = MultiSqlPredictor(N_word=N_word,N_h=N_h,N_depth=N_depth,gpu=gpu, use_hs=use_hs)
File "/workspace/syntaxSQL/models/multisql_predictor.py", line 46, in init
self.cuda()
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in cuda
return self._apply(lambda t: t.cuda(device))
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/rnn.py", line 113, in _apply
self.flatten_parameters()
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/rnn.py", line 106, in flatten_parameters
self.batch_first, bool(self.bidirectional))
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

@chrisjbaik
Collaborator

Ah.. My guess is that it might have something to do with the GPU driver version and how it interacts with the CUDA version in the container... What type of GPU/what driver are you using?
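
One way to narrow it down (a generic diagnostic sketch, not part of Duoquest) would be to open a Python shell inside the dq-enum container and check what PyTorch thinks it has:

import torch

print(torch.__version__)               # PyTorch build inside the container
print(torch.version.cuda)              # CUDA version that build was compiled against
print(torch.backends.cudnn.version())  # cuDNN version that build was compiled against
print(torch.cuda.is_available())       # whether the host driver is usable at all

# The failing call boils down to this: moving an RNN to the GPU triggers
# flatten_parameters(), which is where the CuDNN error above gets raised.
rnn = torch.nn.LSTM(10, 20).cuda()
rnn.flatten_parameters()

If that tiny LSTM fails the same way, it's a container/driver mismatch rather than anything in the Duoquest code itself.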

@Chamberlain0w0
Author

I suspected that at first, so I checked the versions, but they seemed to be all right. You see, when I enter nvidia-smi in the terminal, I get:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 6000 Off | 00000000:3B:00.0 Off | Off |
| 31% 34C P0 1W / 260W | 0MiB / 24190MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

The GPU I'm using is a Quadro RTX 6000, the NVIDIA driver version is 418.67, and the CUDA version is 10.1. Although they are not the newest, the combination seems to be compatible 😢😢 Shall I try to update the driver and CUDA?

@chrisjbaik
Collaborator

Good question... I don't have an easy answer for this. I've always had to tinker around with the versions until it worked and it's pretty finicky...

Just to make sure, you did install this right? https://github.com/NVIDIA/nvidia-docker

Also this is the version for the machine I was running on... But I somehow doubt it will make a difference:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+

@Chamberlain0w0
Author

Well, I think I installed nvidia-docker correctly, because I can successfully run the example code they provide in the README. I also finished all the steps in the QuickStart and can bring up the web interface correctly...

I think I'll still try updating the driver and CUDA versions first, as that seems like the only way to fix the problem right now...

Anyway, genuine thanks for your help! 🙏😃

@chrisjbaik
Collaborator

Ah I see. Hope it works! It's really unfortunate because in my ideal world this is exactly the type of problem Docker is supposed to fix :/

@Chamberlain0w0
Author

Hello again, I have updated the driver and CUDA to the newest versions:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+

But the problem is still there. 😭😭 The error reported in the logs is the same:

Loading GloVE word embeddings...
Loading word embedding from /workspace/syntaxSQL/glove/glove.42B.300d.txt
Using fixed embedding
Traceback (most recent call last):
File "main.py", line 258, in
main()
File "main.py", line 169, in main
config.get('syntaxsql', 'glove_path'), args.toy)
File "main.py", line 92, in load_model
table_type='std', use_hs=True)
File "/workspace/syntaxSQL/supermodel.py", line 104, in init
self.multi_sql = MultiSqlPredictor(N_word=N_word,N_h=N_h,N_depth=N_depth,gpu=gpu, use_hs=use_hs)
File "/workspace/syntaxSQL/models/multisql_predictor.py", line 46, in init
self.cuda()
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 258, in cuda
return self._apply(lambda t: t.cuda(device))
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
module._apply(fn)
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/rnn.py", line 113, in _apply
self.flatten_parameters()
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/rnn.py", line 106, in flatten_parameters
self.batch_first, bool(self.bidirectional))
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

@chrisjbaik
Collaborator

chrisjbaik commented Aug 5, 2020

Hmmmm. Again, no easy answer for this, as I had been using an older version of PyTorch in order to use the models/code originally trained for SyntaxSQLNet, which was written for Python 2. Googling pulls this up, which isn't super encouraging: https://discuss.pytorch.org/t/runtimeerror-cudnn-error-cudnn-status-success/28045/18

One thing to try is to build a new Docker image for dq-enum. The Dockerfile is in a git submodule (at enum/syntaxSQL) forked from SyntaxSQLNet (https://github.com/chrisjbaik/syntaxSQL/blob/e89832bdb621fc522be14250504c22d869c9cc1a/Dockerfile). Note that the top of the Dockerfile lists two additional files you should copy into your submodule directory before building: the GloVe pre-trained embeddings and the saved SyntaxSQLNet models, both of which are linked in the submodule's README.md. My suggestion would be to try a different PyTorch Docker image as the base image instead of FROM vanessa/pytorch-dev:py2, and/or to install a different version of the CUDA toolkit in the Dockerfile, and see if you have any luck :/
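
For what it's worth, one generic workaround that comes up in threads like the one linked above (not something Duoquest or SyntaxSQLNet does itself, and it gives up cuDNN's speed) is to disable cuDNN before the model is moved to the GPU, e.g. near the top of main.py in the submodule:

import torch

# Disabling cuDNN makes nn.LSTM fall back to PyTorch's native kernels, so the
# flatten_parameters() call in the traceback returns early instead of touching
# cuDNN. A diagnostic/workaround sketch, not a proper fix; the model will run
# noticeably slower.
torch.backends.cudnn.enabled = False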
