- Step by step pipeline
- 1.1. Running jupyter notebook on the cluster
- 1.2. Open the ssh tunnel in your local machine
- 1.3. Jupyter notebook in your local machine browser
- Caveats and considerations
- 2.1. Port uniqueness
- 2.2. Close ssh tunnels
- 2.3. ssh termination
- Automating script
- 3.1. Script potential issues and important details
- Acknowledgements
In this tutorial I'm going to explain the solution I found to run jupyter notebook on the RacimoLab servers remotely from a personal computer. It is mainly based on this blog post, which explains a similar problem. The different steps are summarized in Figure 1 and explained below. The idea is to create an ssh tunnel from a computing node, where your jupyter notebook will be running (Step 1), to your workstation (Step 2), to finally open jupyter notebook in your browser (Step 3). I assume you've already installed jupyter notebook and all its dependencies, and that you know how to log in to the RacimoLab servers.
Finally, I provide a bash script which automates the whole process. You might ask... Why not provide just the script? Well, I decided to post the whole explanation in case someone is interested in the process and maybe wants to improve it. But if you want a short and quick solution, jump to that section.
Notation summary:
- Computers:
  - C -> Computing node in one of the RacimoLab servers. In Figure 1 denoted as `SSSSSS`.
  - L -> Your local machine.
- Ports (P):
  - Cp -> C's port. Here denoted as `9999`.
  - Lp -> L's port. Here denoted as `7777`.
It's important to notice that ports (P) must satisfy 1024 <= P <= 65535. More info about ports can be found here and here.
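If you want to sanity-check a port number before using it, a small shell helper like the following works (this is just an illustration, not part of the pipeline):

```shell
# Check that a chosen port is in the allowed range (1024-65535).
valid_port() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;  # reject anything that is not a number
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

valid_port 9999 && echo "9999 is usable"
valid_port 80 || echo "80 is reserved for privileged services"
```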
Figure 1. Schematic representation to run Jupyter Notebook in RacimoLab servers. Highlighted are:
- Yellow : KU username
- Blue: two digit number out of the 6 possible servers (01, 02, ..., 06)
- Red : C
- Green : Cp
- Purple : Lp
- Log into the RacimoLab server on which you want to run your Jupyter Notebook (in Figure 1.1, replace the Xs highlighted in yellow with your username and the Ys highlighted in blue with the server you want to connect to).
- Activate the desired environment, where you have installed jupyter notebook.
- Run jupyter notebook in `--no-browser` mode. You must also specify Cp, which must be unique to your connection (so be creative :) ), otherwise you will run into conflicts.
ssh XXXXXX@racimocompYYfl
conda activate env
jupyter lab --no-browser --port=9999
- On a new terminal, create an ssh tunnel with the command shown in Figure 1.3. The Xs highlighted in yellow represent your username to connect to the RacimoLab servers. Again, you must choose a new port Lp (in Figure 1.3, represented as 7s highlighted in green) and indicate Cp (in Figure 1.3, represented as 9s highlighted in purple).
ssh XXXXXX@racimocompYYfl -L 7777:localhost:9999 -N
- Open your favourite browser and type `localhost:7777`
- TADAAAA!
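Putting the notation together, the tunnel command from Step 2 can be assembled like this (all values are the examples from Figure 1; swap in your own):

```shell
KU_USER="XXXXXX"  # your KU username
SERVER="YY"       # server number (01..06)
LP=7777           # Lp, your local port
CP=9999           # Cp, the computing node's port

# -L forwards local port Lp to port Cp on the server; -N keeps the
# connection open without running any remote command.
CMD="ssh ${KU_USER}@racimocomp${SERVER}fl -L ${LP}:localhost:${CP} -N"
echo "$CMD"
```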
While Cp and Lp can be the same number (as long as 1024 <= P <= 65535), if multiple users use the same ports on the same "computer", it is going to create conflicts and errors.
Sometimes, when I close the ssh tunnels (Ctrl+C), the process keeps running in the background, meaning that the port is still in use. Then, if I try to open the tunnels again, I get the error that... Surprise! The port is in use. To solve that, I kill the process that is using that particular port with the following command:
for job in `ps aux | egrep 9999 | egrep "ssh" | egrep XXXXXX | awk '{print $2}'`; do kill -9 ${job}; done
This selects, from all running processes, the ones that contain "9999" (the port), "ssh" and "XXXXXX" (your username), and kills them.
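To see exactly what that pipeline selects, here is the same filter applied to some made-up `ps aux`-style lines (PIDs, usernames and commands are invented for this example):

```shell
# Fake ps output: only the first line mentions the port, ssh AND the username.
fake_ps() {
cat <<'EOF'
alice 1234 ssh XXXXXX@racimocomp03fl -L 7777:localhost:9999 -N
alice 1235 vim notes.txt
bob   1236 ssh bob@elsewhere -L 7777:localhost:9999 -N
EOF
}

# Keep only lines mentioning the port, "ssh" and the username, then print
# the PID column (column 2 here, as in `ps aux`).
pids=$(fake_ps | egrep 9999 | egrep "ssh" | egrep XXXXXX | awk '{print $2}')
echo "$pids"   # 1234: only the matching tunnel's PID survives the filters
```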
Sometimes, when the ssh connection has been idle for a while, it automatically closes down. This kills the ssh tunnel. To prevent that, I first start a tmux session, so that even when my ssh session is killed, the process goes on and my jupyter notebook is not stopped while I work.
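A sketch of the `tmux` workflow I mean (the session name `jn` is just an example):

```
tmux new -s jn                        # start a named session on the server
conda activate env                    # inside tmux: activate the environment
jupyter lab --no-browser --port=9999
# detach with Ctrl-b d; jupyter keeps running even if your ssh session dies
tmux attach -t jn                     # later: reattach to the same session
```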
Let me know if you find more problems while using these to run jupyter notebook that are not reported here and if you have improvements and suggestions!
I wrote the raju.sh (yes... not feeling creative enough to give it a better name :) ) bash script, which automates all steps explained in 1. Step by step pipeline and improves on some aspects. If you want to use this script, at the header of the file you will find variables which need to be configured manually to set the defaults (e.g. `u` is your KU username, `e` the environment name, `c` and `l` the port numbers, and `s` the default RacimoLab server number). Once you've configured the default variables, you can run:
bash raju.sh
Alternatively, you can specify any of those variables using arguments. For example, if I want to change the environment and the RacimoLab server, I can do so by:
bash raju.sh -e env2 -s 03
If you want more details on which options you can use, you can run the help command:
bash raju.sh -h
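I won't reproduce raju.sh's internals here, but this defaults-plus-flags pattern is typically implemented with the shell builtin `getopts`. A minimal sketch (the variable names `e` and `s` mirror the defaults described above; the parsing code itself is an illustration, not raju.sh's actual code):

```shell
e="env"   # default conda environment
s="01"    # default server number

parse_args() {
  local OPTIND=1 opt
  while getopts "e:s:" opt; do
    case $opt in
      e) e=$OPTARG ;;
      s) s=$OPTARG ;;
    esac
  done
}

parse_args -e env2 -s 03   # same effect as `bash raju.sh -e env2 -s 03`
echo "environment=$e server=$s"
```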
There are 2 options I want to elaborate on, since I believe they are very handy.
Because the tunnels and the jupyter notebook run in different tmux sessions, I also incorporated a way to kill those tmux sessions to finish all processes. You can do that by running the following:
bash raju.sh -k
This command will kill the tmux sessions, and you can define some other options (in case your previous `raju.sh` run did not use the default options).
Finally, there is the option to re-establish the connection to the jupyter notebook by only re-running the tunnel, in case the connection was lost due to losing Wi-Fi or the VPN dropping. It is important to notice that, for reconnection to work, the browser tab where you had the jupyter notebook running must not be closed.
Once you regain Wi-Fi connection or re-establish your VPN credentials, you just need to run:
bash raju.sh -r
This will just establish the tunnel from your local computer to the RacimoLab server again, enabling you to continue using your jupyter notebook as before. If you had a process running, its output might be lost from your browser. Nonetheless, all variables will be kept and you will be able to continue running cells below without re-running cells above to load previous processes and variables.
- I use `tmux` in order to open a terminal from which I can log out, both on the RacimoLab servers and on my local computer. Make sure you have installed `tmux` in both. I installed `tmux` on my local computer using `brew`, as shown here.
- The pipeline checks if there is already a tmux session running with the defined name. If there is not, `tmux` returns a warning like `failed to connect to server` or `no server running on /private/tmp/tmux-1349466776/default`. This will be printed in your terminal, but it is not a sign of things going wrong; it's just me not knowing how to avoid it being printed :) .
- Try to be creative and change the ports. As I said before, it is important that your port is unique!
- Make sure you can access the RacimoLab servers without typing your password manually. If you want to generate your ssh key, check this link. In my case, the pipeline was getting stuck because accessing a server for the first time required typing a password, which my script was not prepared for.
- The last command opens Google Chrome to access jupyter notebook. If you don't have that browser installed, it might lead to a problem.
- I'm a Mac user, which means my solution and script should work fine for other Mac users, but maybe not on Windows or Linux.
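For reference, passwordless access is usually set up along these lines (the key type and commands are the common defaults; see the linked guide for the details):

```
ssh-keygen -t ed25519                 # generate a key pair on your local machine
ssh-copy-id XXXXXX@racimocompYYfl     # copy your public key to the server
ssh XXXXXX@racimocompYYfl             # should now log in without a password
```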
Please let me know if you encounter problems when you run my pipeline on your computer, and also if you find solutions to your problems; I will post them on this github page so that other users can also benefit from your effort!
I would like to thank Graham Gower for his technical comments on ports and on the proper way to kill a process (Ctrl-C) instead of suspending it (Ctrl-Z) when stopping ssh tunnels. He has also been giving me input on how to automate this whole process, which I hope to achieve soon and update these instructions with.
I thank Antonio Fernandez Guerra for his tricks on how to connect directly to a computing node by customizing the `.ssh/config` file.
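I haven't reproduced his exact setup, but a `.ssh/config` entry of this kind is the usual way to get a short alias for a server (the alias and host names here are illustrative):

```
# ~/.ssh/config
Host racimo03
    HostName racimocomp03fl
    User XXXXXX
```

After that, `ssh racimo03` is enough to log in, and the same alias can be used in the tunnel command.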