This repository has been archived by the owner on Nov 14, 2018. It is now read-only.

Can't start instances in custom juliabox install on AWS with 0.4.0. #307

Closed
IanButterworth opened this issue Oct 8, 2015 · 8 comments


@IanButterworth

Following the install instructions was seamless: the site loads and OAuth works, but the cog just spins while trying to create an instance until it times out, with no further information on the page.

Any ideas? Is there a good log to look at?
Thanks

Edit: I've looked into JuliaBox/engine/logs/engineinteractive.log, but the errors aren't any more descriptive. Here's a sample:

2015-10-08 04:17:13,525 - DEBUG - juliabox.handlers.main.MainHandler - AJAX monitoring loading of session [ianb_d4cf6fbbfe9b8cd1d83f46a14cf6bc635be78646] user[[email protected]]...
2015-10-08 04:17:16,423 - DEBUG - juliabox.handlers.main.MainHandler - AJAX monitoring loading of session [ianb_d4cf6fbbfe9b8cd1d83f46a14cf6bc635be78646] user[[email protected]]...
2015-10-08 04:17:16,425 - ERROR - juliabox.handlers.main.MainHandler - Could not start instance. Session [ianb_d4cf6fbbfe9b8cd1d83f46a14cf6bc635be78646] for user [[email protected]] didn't load.
2015-10-08 04:17:16,425 - ERROR - juliabox.handlers.main.MainHandler - Could not start instance. Session [ianb_d4cf6fbbfe9b8cd1d83f46a14cf6bc635be78646] for user [[email protected]] didn't load.
2015-10-08 04:17:16,511 - DEBUG - juliabox.handlers.main.MainHandler - Monitoring loading of session [ianb_d4cf6fbbfe9b8cd1d83f46a14cf6bc635be78646] user[[email protected]]...
2015-10-08 04:17:16,513 - ERROR - juliabox.handlers.main.MainHandler - Could not start instance. Session [ianb_d4cf6fbbfe9b8cd1d83f46a14cf6bc635be78646] for user [[email protected]] didn't load.
2015-10-08 04:17:16,513 - ERROR - juliabox.handlers.main.MainHandler - Could not start instance. Session [ianb_d4cf6fbbfe9b8cd1d83f46a14cf6bc635be78646] for user [[email protected]] didn't load.
2015-10-08 04:20:51,196 - INFO - juliabox.interactive.sess_container.SessContainer - Starting container maintenance...
2015-10-08 04:20:51,198 - INFO - juliabox.plugins.vol_loopback.loopback.JBoxLoopbackVol - Loopback Disk free: 10/10

However, /JuliaBox/engine/logs/enginedaemon_err.log gives:

2015-10-08 05:31:01,383 - ERROR - juliabox.srvr_jboxd.JBoxd - Exception in jboxd_method launch_session
Traceback (most recent call last):
  File "/jboxengine/src/juliabox/srvr_jboxd.py", line 24, in wrapper
    f(*args, **kwargs)
  File "/jboxengine/src/juliabox/srvr_jboxd.py", line 125, in launch_session
    SessContainer.launch_by_name(name, email, reuse=reuse)
  File "/jboxengine/src/juliabox/interactive/sess_container.py", line 101, in launch_by_name
    cont = SessContainer._create_new(name, email)
  File "/jboxengine/src/juliabox/interactive/sess_container.py", line 50, in _create_new
    home_disk = VolMgr.get_disk_for_user(email)
  File "/jboxengine/src/juliabox/vol/volmgr.py", line 149, in get_disk_for_user
    disk = plugin.get_disk_for_user(email)
  File "/jboxengine/src/juliabox/plugins/vol_loopback/loopback.py", line 126, in get_disk_for_user
    loopvol.refresh_disk(mark_refreshed=False)
  File "/jboxengine/src/juliabox/plugins/vol_loopback/loopback.py", line 172, in refresh_disk
    self.restore_user_home(True)
  File "/jboxengine/src/juliabox/vol/jbox_volume.py", line 250, in restore_user_home
    with tarfile.open(JBoxVol.USER_HOME_IMG, 'r:gz') as user_home:
  File "/usr/lib/python2.7/tarfile.py", line 1678, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1722, in gzopen
    fileobj = bltn_open(name, mode + "b")
IOError: [Errno 2] No such file or directory: '/jboxengine/data/user_home.tar.gz'

Also, my relevant config is:

"numdisksmax" : 10, # max disks (more than sessions to allow for transitions)
...
"interactive": {
    "numlocalmax": 10  # max concurrent users to support
},

sudo JuliaBox/scripts/install/mount_fs.sh /jboxengine/data 10 200 ${USER}

and docker ps -a returns:

CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES
b0f4a19b3597        juliabox/engineinteractive   "/jboxengine/src/jbox"   18 minutes ago      Up 18 minutes                           engineinteractive_jboxsvc
97b53d6e556a        juliabox/webserver           "/jboxweb/openresty/n"   18 minutes ago      Up 18 minutes                           webserver_jboxsvc
88010feeb1b3        juliabox/engineapi           "/jboxengine/src/jbap"   18 minutes ago      Up 18 minutes                           engineapi_jboxsvc
f590f37e6576        juliabox/enginedaemon        "/jboxengine/src/jbox"   18 minutes ago      Up 18 minutes                           enginedaemon_jboxsvc

@samuelpowell
Contributor

@ianshmean, the error in /JuliaBox/engine/logs/enginedaemon_err.log suggests that the user home directories were never built. This happens in the last part of installation step 3. Did you definitely run JuliaBox/scripts/install/img_create.sh home /jboxengine/data?

At present, a default installation according to the instructions will install 0.3.11 (from the preconfigured docker image), but will also download and install 0.4.0-rc2. If you modified your installation to be based on a 0.4 image, it may be that the subsequent installation of 0.4.0-rc2 performed by the container Dockerfile has caused some conflict. I can't test this at the moment. @tanmaykm, is it obvious to you whether this will be a problem?

You have two options if you want to use 0.4:

  1. Follow the standard installation instructions, and you will get 0.4.0-rc2 in addition to 0.3.11 (you can see how that happens in JuliaBox/container/interactive/Dockerfile)
  2. Repeat what you have done and look carefully for error messages, especially when you execute img_create.sh, and we will try to debug.
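Before retrying, it may help to confirm whether the home image from the traceback is actually present and readable. The following is only a minimal sketch, assuming the path shown in the IOError above; `home_image_ok` is a hypothetical helper written for this check and is not part of JuliaBox.

```python
import os
import tarfile

# Path taken from the IOError in the traceback above.
USER_HOME_IMG = "/jboxengine/data/user_home.tar.gz"


def home_image_ok(path=USER_HOME_IMG):
    """Return True if path exists and opens as a gzip-compressed tar archive."""
    if not os.path.isfile(path):
        return False
    try:
        # Same open mode that appears in the traceback ('r:gz').
        with tarfile.open(path, "r:gz") as archive:
            archive.getmembers()  # force a read of the archive index
        return True
    except (tarfile.TarError, IOError, EOFError):
        return False


if __name__ == "__main__":
    print("home image usable:", home_image_ok())
```

If this reports that the image is not usable, re-running the img_create.sh step and watching its output for errors would be the next thing to try.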

@IanButterworth
Author

I hadn't realised 0.3.11 provided both, so I've gone back and installed 0.3.11 on a fresh server, with greater success. I can now launch JuliaBox instances, but I'm having a websockets issue that prevents the notebook from activating, i.e. "A connection to the notebook server could not be established".

I saw #271 and I think it may be a similar issue. Chrome is telling me:

WebSocket connection to 'ws://ip-xxx-x-x-xx.us-west-2.compute.internal/api/kernels/3741xx-afb6-4d4b-xx31-ca33aexxe/channels?session_id=A048Axx3A4xxxxx' failed: Error in connection establishment: net::ERR_NAME_NOT_RESOLVED

I'm accessing my juliabox via a custom domain i.e. juliabox.domain.com, but as you can see, the websockets are being pointed to the AWS FQDN above, rather than this custom one.
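To make the mismatch concrete, here is a small sketch that pulls the host out of the websocket URL from the Chrome error and compares it with the domain used in the browser. It uses only the standard library (Python 3); the function names are hypothetical and this is a diagnostic aid, not anything JuliaBox itself runs.

```python
from urllib.parse import urlparse

# URL copied from the Chrome error above (already partly redacted).
WS_URL = (
    "ws://ip-xxx-x-x-xx.us-west-2.compute.internal/api/kernels/"
    "3741xx-afb6-4d4b-xx31-ca33aexxe/channels?session_id=A048Axx3A4xxxxx"
)


def websocket_host(ws_url):
    """Return the lower-cased host part of a ws:// or wss:// URL."""
    return urlparse(ws_url).hostname


def hosts_differ(ws_url, site_host):
    """True if the websocket targets a different host than the site."""
    return websocket_host(ws_url) != site_host.lower()


if __name__ == "__main__":
    print("websocket host:", websocket_host(WS_URL))
    print("differs from site:", hosts_differ(WS_URL, "juliabox.domain.com"))
```

The websocket host here is the EC2 private name, which a browser outside the VPC cannot resolve, so the connection fails even though the page itself loads from the custom domain.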

I tried to override Compute.get_alias_hostname and restarted the service, but I'm still getting the same behaviour.

Thanks for the help

@IanButterworth
Author

Also, I'm assuming the websocket connection is happening over port 80, and I have that open on AWS.

@tanmaykm
Member

tanmaykm commented Oct 9, 2015

@ianshmean You need to make your server's IP resolve to juliabox.domain.com instead of the AWS-provided name. Probably by making an entry in /etc/hosts?

@IanButterworth
Author

@tanmaykm perfect, that fixed it. I just added this line to /etc/hosts: 127.0.1.1 juliabox.domain.com juliabox

Thanks all!
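For anyone repeating this fix, a rough way to double-check the hosts entry is to parse the file and see which address each name maps to. This is a sketch only: `parse_hosts` is a hypothetical helper, the sample text reuses the line quoted above, and it assumes the usual hosts-file layout (an address followed by one or more names, with `#` starting a comment).

```python
# Sample content; the second line is the entry quoted in the comment above.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
127.0.1.1 juliabox.domain.com juliabox
"""


def parse_hosts(text):
    """Map each host name in hosts-file text to its IP address."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        ip, hostnames = fields[0], fields[1:]
        for name in hostnames:
            # Keep the first mapping seen for a name.
            names.setdefault(name.lower(), ip)
    return names


if __name__ == "__main__":
    mapping = parse_hosts(SAMPLE_HOSTS)
    print(mapping.get("juliabox.domain.com"))
```

On the server itself, the same function can be pointed at the contents of /etc/hosts, and comparing the result with what `socket.getfqdn()` reports shows which name the machine currently gives for itself. Whether JuliaBox derives its websocket host from that value is an assumption here, not something confirmed in this thread.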

@fqiang

fqiang commented Oct 4, 2016

I am actually running into the same problem: a websocket failure in Firefox for an AWS deployment.

Firefox can’t establish a connection to the server at ws://ip-xxx-xxx-xxx-xxx.us-west-2.compute.internal/api/kernels/0f0c0f5a-e661-4812-b03b-b5ddd8cb2b69/channels?session_id=412C1AC960AF4B8383B60D58191416F0.

I am trying to use the AWS-provided public DNS name and added a line (127.0.0.1 localhost ec2-52-34-13-59.us-west-2.compute.amazonaws.com ec2-52-34-13-59) to the /etc/hosts file, but the websocket still tries to connect to the AWS private DNS name, which cannot be resolved.
@tanmaykm, @ianshmean do you know if it is possible to use the public domain name for accessing the websocket? Or must I have a non-AWS domain name?

@tanmaykm
Member

tanmaykm commented Oct 4, 2016

@fqiang Maybe this would work instead: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-hostname.html ?

@fqiang

fqiang commented Oct 5, 2016

@tanmaykm Thanks for your suggestion. I have now bought a domain and tried the setup on my Google Cloud instance, but still no success.
If I configure the hostname and the /etc/hosts file, I get a blank page. I just opened #472 to describe my observations. Could you take a look if you have time? Any suggestions are welcome.
