
1. Linux Introduction

lukemartinlogan edited this page Jul 10, 2023 · 10 revisions

All TOP500 supercomputers and the majority of cloud systems run Linux. This tutorial covers the basics of using Linux.

1.1. Choosing an OS

1.1.1. Which distro?

There are many Linux distributions out there, and they all have different quirks. The following is a list of distros that we currently use or expect may eventually be used:

  1. Ubuntu: we currently use Ubuntu in our cluster. It's not widely used in HPC, but it's a mature, well-maintained distro. There are many Ubuntu-like systems (e.g., Linux Mint); feel free to use any of them. We recommend using an Ubuntu-like system for the purposes of our research.
  2. CentOS 7/8: CentOS 7 and 8 are used pretty commonly in HPC. However, they have been discontinued. We don't recommend using them for your main OS, but you should be aware that you may have to deal with them for software portability.
  3. Rocky: a replacement for CentOS and a potential candidate for HPC systems.
  4. Alma: another CentOS-like system, which is also a potential candidate for HPC systems.

1.1.2. What if I have Windows?

Microsoft Windows has the Windows Subsystem for Linux (WSL). It's competent for the majority of Linux development. If you don't want to replace your current Windows installation, WSL should be sufficient for most cases of our research. WSL provides an Ubuntu installation. Follow Microsoft's instructions on how to enable WSL.

1.1.3. What if I have a Mac?

Macs are a bit tricky. macOS and Linux are NOT the same thing. Generally speaking, we highly recommend developing on a Linux distribution that is similar to what is used in HPC. Unfortunately, Macs don't have something like WSL. We recommend using either a container (e.g., Docker/Singularity) or a virtual machine (VirtualBox/QEMU) for development. Generally, containers are much faster than VMs by default. We recommend using a container for Ubuntu. We describe Docker here.

1.2. Basics of using a Terminal

In this section, we will use Ubuntu 22.04 as our Linux distro. First, we will discuss the basic aspects of using a Linux terminal.

A terminal provides a way of interacting with the OS using a command prompt. Users type commands from memory instead of using a graphical user interface (GUI). This can be faster since it avoids clicking through menus. It is also often necessary, since many HPC machines are remote and do not support fancy GUIs. There are many commands Linux users should be familiar with in general.

1.2.1. Interacting with the Filesystem

Examples of the basic filesystem operations are as follows:

# Change directory to your home directory
# ~/ is shorthand for home directory
# ~ is a special character used by the terminal
cd ~/

# Create a directory
mkdir hello

# Create directories + subdirectories
mkdir -p hello/hi/hi2

# Change into the "hello" directory
# cd: "change directory"
cd hello

# Create 4 empty files
touch hi.txt
touch hi2.txt
touch hi3.txt hi4.txt

# List the current directory (hello) to view its contents
# NOTE: we are already inside hello, so "ls hello" would fail here
ls

# Change into the "hi2" directory
# NOTE: ./ is optional
# cd hi/hi2 would do the same thing
cd ./hi/hi2

# Go to the parent directory (hi)
cd ..
# Go to hello's parent
cd ../../

# Remove 3 of the files
# NOTE: rm is permanent, data recovery is not really possible
rm hello/hi.txt
rm hello/hi2.txt hello/hi3.txt

# Remove directories and subdirectories
# "r" means to "recursively" delete all data in the directory
# "f" means "force" delete the directory without asking for confirmation
rm -rf hello

# Append a string to a file
# >> is the append operator
# echo adds a newline, so hello.txt contains two lines, each with 25
echo "25" >> hello.txt
echo "25" >> hello.txt

# Create a new file with string as its data
# > truncates a file and replaces its text with the echo'd string
# hello.txt contains 30
echo "30" > hello.txt
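
The difference between >> and > is easy to check for yourself. The following is a minimal sketch, assuming a throwaway directory created with mktemp so that nothing in your home directory is touched; the variable names (scratch, lines_after_append, final_contents) are only illustrative.

```shell
#!/bin/bash
# mktemp -d creates a new, uniquely named directory and prints its path
scratch=$(mktemp -d)
cd "${scratch}"

# Append twice: the file now holds two lines, each containing 25
echo "25" >> hello.txt
echo "25" >> hello.txt
lines_after_append=$(wc -l < hello.txt)

# Truncate and overwrite: the file now holds a single line containing 30
echo "30" > hello.txt
final_contents=$(cat hello.txt)

echo "lines after append: ${lines_after_append}"
echo "final contents: ${final_contents}"

# Clean up the throwaway directory
cd "${HOME}"
rm -rf "${scratch}"
```

Here cat prints the contents of a file and wc -l counts its lines, which makes them handy for confirming what a redirect actually did.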

1.2.2. Environment variables

Environment variables are used to store some sort of information without having to hard-code it each time. Many programs rely on environment variables as a way of passing information to the program.

1.2.2.1. Printing Environment Variables

To set an environment variable and print it, run:

MY_VAR=25
echo ${MY_VAR}

Output:

25

1.2.2.2. Environment Variable Scope

Scope refers to the visibility of a variable. For example, can a program read the environment variable after it has been set?

Let's say we have the following bash script named printenv.sh:

#!/bin/bash

echo ${MY_VAR}

To use this bash script, first change into the directory that contains it:

cd ${SCS_TUTORIAL}/1.1.linux_intro

1.2.2.3. Limited Scope

Run the following code:

MY_VAR=25
bash printenv.sh

Output:

The output is empty. This is because the scope of MY_VAR is limited to the current shell. When launching printenv.sh, a new shell is created and the variable MY_VAR is not passed to it.

1.2.2.4. Pass Environment Variables

To pass environment variables to programs, run the following:

MY_VAR=25
MY_VAR=20 bash printenv.sh
echo ${MY_VAR}

Output:

20
25

In this example, MY_VAR=20 is passed to printenv.sh, which then prints 20. However, MY_VAR=20 does not change the value of MY_VAR in the parent shell. Running echo ${MY_VAR} prints 25, which was the original value.
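
If you do not have printenv.sh at hand, the same behavior can be reproduced in a self-contained way. This is a minimal sketch, assuming bash -c as a stand-in for the script: it starts a child shell that runs the quoted command. The names child_plain and child_prefixed are only illustrative.

```shell
#!/bin/bash
# Start from a clean state so an earlier export does not affect the result
unset MY_VAR

# 1. Plain assignment: visible in this shell only.
#    Single quotes stop the parent shell from expanding ${MY_VAR} itself.
MY_VAR=25
child_plain=$(bash -c 'echo "${MY_VAR}"')

# 2. Prefix assignment: visible to that one command only
child_prefixed=$(MY_VAR=20 bash -c 'echo "${MY_VAR}"')

echo "child (plain):    '${child_plain}'"
echo "child (prefixed): '${child_prefixed}'"
echo "parent:           '${MY_VAR}'"
```

The child sees nothing in the first case and 20 in the second, while the parent shell keeps 25 throughout.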

1.2.2.5. Export Environment Variables

Exporting an environment variable sets its value in the current shell and passes the variable to programs executed from that shell.

export MY_VAR=20
bash printenv.sh
echo ${MY_VAR}

Output:

20
20
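
One detail worth knowing is that a child process receives a copy of an exported variable. The sketch below, which again assumes bash -c as a stand-in for a script, suggests that changing the copy in the child leaves the parent untouched. The name child_value is only illustrative.

```shell
#!/bin/bash
unset MY_VAR
export MY_VAR=20

# The child shell inherits a copy of the exported variable
child_value=$(bash -c 'echo "${MY_VAR}"')

# Changing the copy inside a child does not affect the parent shell
bash -c 'MY_VAR=99'

echo "child:  ${child_value}"
echo "parent: ${MY_VAR}"
```

Both lines print 20: information flows from the parent to the child, not the other way around.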

1.2.2.6. Removing Environment Variables

To remove an environment variable, run:

unset MY_VAR
bash printenv.sh

Output:

The output is an empty line, because MY_VAR no longer exists.
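
An empty line is easy to miss, so it can help to make the "unset" state visible. The following is a small sketch using two standard bash parameter expansions; the placeholder text "<unset>" and the names after_unset and state are only illustrative.

```shell
#!/bin/bash
MY_VAR=25
unset MY_VAR

# ${MY_VAR:-text} expands to "text" when MY_VAR is unset or empty
after_unset="${MY_VAR:-<unset>}"
echo "MY_VAR is ${after_unset}"

# ${MY_VAR+x} expands to "x" only if MY_VAR is set, even to an empty string.
# This distinguishes "set but empty" from "unset".
MY_VAR=""
if [ -n "${MY_VAR+x}" ]; then
  state="set but empty"
else
  state="unset"
fi
echo "MY_VAR is now ${state}"
```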

1.2.2.7. Common Environment Variables

Below we describe some environment variables that are set automatically when you open a terminal:

# HOME represents your home directory
echo "HOME=${HOME}"
cd ${HOME}
# PWD holds the current working directory
# (the pwd command, "print working directory", prints the same path)
# Output will be equivalent to ${HOME} in this case
echo "PWD=${PWD}"
# USER represents your username
echo "USER=${USER}"

This list is not comprehensive, and there are many more variables that are important -- but they will be discussed later.

1.2.2.8. Bashrc: Saving Environment Variables

In your home directory, there is a file called ~/.bashrc. This file is executed every time you open an interactive bash shell (e.g., a new terminal window). This file can be used for storing environment variables.

~/.bashrc contains shell code that initializes the state of a shell. Typically it's best to place environment variables at the bottom of the file. This can be done as follows:

echo "export MY_VAR=30" >> ~/.bashrc
  1. echo "export MY_VAR=30" will print the string "export MY_VAR=30".
  2. >> will append the string "export MY_VAR=30" to ~/.bashrc

If you open ~/.bashrc, you should see that export statement at the bottom of the file.

Note that appending the export statement does NOT rerun the bashrc script. Your current shell will not be updated yet. To execute the bashrc script and update the current shell, run:

source ~/.bashrc
echo ${MY_VAR}

Output:

30
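
Running the echo command more than once appends the same export line again each time. One way to avoid duplicates is to append only when the line is missing. The following is a minimal sketch, assuming a temporary file as a stand-in for ~/.bashrc so that your real file is not modified; the helper add_line_once and the variable rc_file are hypothetical names.

```shell
#!/bin/bash
# Stand-in for ~/.bashrc (hypothetical path, created just for this sketch)
rc_file=$(mktemp)
line='export MY_VAR=30'

add_line_once() {
  # $1: file to edit, $2: line to add
  # grep -q: quiet, -x: match the whole line, -F: treat the pattern as plain text
  if ! grep -qxF -- "$2" "$1"; then
    echo "$2" >> "$1"
  fi
}

add_line_once "${rc_file}" "${line}"
add_line_once "${rc_file}" "${line}"   # second call changes nothing

# grep -c counts matching lines
count=$(grep -cxF -- "${line}" "${rc_file}")
echo "occurrences: ${count}"

# Load the file into the current shell, as with ~/.bashrc
source "${rc_file}"
echo "${MY_VAR}"
```

To use this on the real file, you would pass ~/.bashrc as the first argument instead of the temporary file.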

1.2.3. Simple text editing

There are three main terminal text editors: nano, vim, and emacs. vim and emacs rely heavily on memorizing key bindings. For new users, this is typically challenging. In general, we do not code using terminal text editors; we only use them to make minor changes. We recommend that large changes to files be made in an IDE, office tool, or graphical text editor.

For this reason, we will discuss only the basics of vim and nano. We will not touch emacs, as vim and nano are almost always the default text editors. Generally, we recommend nano since it's simple. In some cases, vim may be the default, so it will be discussed too.

1.2.3.1. Nano

To open or create a file using nano, do the following:

nano ~/hello.txt

The file can be edited immediately (if you have edit rights to the file).

The main keybindings to be aware of are as follows:

  1. "Ctrl s" will save a file (on older versions of nano, use "Ctrl o" and then press enter)
  2. "Ctrl x" will close the file

NOTE: nano is not installed by default on every machine. You may have to install it separately.

1.2.3.2. Vim

To open a file using vim, do the following:

vi ~/hello.txt

When the file is opened, the main keybindings to consider are as follows:

  1. Initially, the file is opened in "normal mode". You must press "i" in order to switch to "insert mode", where you can edit the text.
  2. When you have finished editing, press ESCAPE on your keyboard. This will bring you back to normal mode.
  3. Press ":" to bring you into "command mode".
  4. Then type "wq" to "write" and then "quit". Press enter, and the editor will close.

NOTE: if you accidentally press "Ctrl s", the terminal will freeze its output and appear to ignore everything you type (even commands). To get out of this, press "Ctrl q".

1.3. SSH

SSH is a secure way of connecting to a remote machine. SSH relies on public-private key cryptography to secure the connection. The private key is a secret that only you should know. The public key can be shared freely; it is given to the machines you want to connect to. Generally, RSA is used as the algorithm for generating keys. SSH is the backbone of access to most HPC machines. You cannot access these machines without knowing how SSH works, so we introduce it here.

The following guide demonstrates how to set up SSH for connecting to an SSH server. This guide does NOT discuss how to run an SSH server.

1.3.1. Creating the keys

SSH keys can be given passwords (passphrases), but we recommend against it. We consider the SSH key itself to be secret enough that a password is unnecessary. This is referred to as "passwordless-ssh". Passwordless-ssh is required for many HPC programs.

To create a public/private key pair, run the following command:

ssh-keygen

The default names for the keys are as follows:

  1. The private key is "~/.ssh/id_rsa"
  2. The public key is "~/.ssh/id_rsa.pub"

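You can use other names (it doesn't have to be id_rsa), but we recommend against this in general. Many SSH-based tools become cumbersome with keys that have non-default names. Note that newer OpenSSH releases may default to a different key type (Ed25519), in which case the files are named id_ed25519 and id_ed25519.pub instead.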
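
By default, ssh-keygen asks for a file name and a passphrase interactively. For scripts, the same choices can be given as flags. The following is a sketch under the assumption that you want an RSA key with no passphrase; it writes into a temporary directory (the hypothetical variable key_dir) so that it cannot overwrite an existing key in ~/.ssh.

```shell
#!/bin/bash
# Temporary directory standing in for ~/.ssh
key_dir=$(mktemp -d)

# -q       : suppress progress output
# -t rsa   : key type
# -b 4096  : key size in bits
# -f PATH  : where to write the private key (the public key goes to PATH.pub)
# -N ""    : empty passphrase (passwordless)
ssh-keygen -q -t rsa -b 4096 -f "${key_dir}/id_rsa" -N ""

# Two files are created: id_rsa (private) and id_rsa.pub (public)
ls "${key_dir}"
```

To create your real key this way, you would point -f at ~/.ssh/id_rsa. Be aware that ssh-keygen asks before overwriting an existing key, and overwriting one locks you out of machines that only know the old key.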

1.3.2. Ensuring permissions

SSH is very particular about the permissions of the ~/.ssh directory and the files in that directory. The following describes the permissions that need to be set to make SSH behave.

For convenience, feel free to copy-paste this. A detailed description of what these do is under "How does chmod work?"

sudo chmod 700 ${HOME}/.ssh
sudo chmod 644 ${HOME}/.ssh/id_rsa.pub
sudo chmod 600 ${HOME}/.ssh/id_rsa
sudo chmod 600 ${HOME}/.ssh/authorized_keys
sudo chmod 600 ${HOME}/.ssh/config

1.3.2.1. How does chmod work?

chmod stands for "change mode". It has the following syntax:

sudo chmod [mode] [path]
  • "mode" is a 3-digit code.
  • Each digit is between 0 and 7.
  • The digits have the following meaning: [owner] [group] [other]
  • owner: typically you
  • group: files can be a part of a group. Only one group per file or directory.
  • other: everyone who is neither the owner nor in the group

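A single digit is the sum of read (4), write (2), and execute (1), giving the following values:

  • 0: No permissions
  • 1: Execute only
  • 2: Write only
  • 3: Write and execute (2 + 1 = 3)
  • 4: Read only
  • 5: Read and execute (4 + 1 = 5)
  • 6: Read and write (4 + 2 = 6)
  • 7: Read, write, and execute (4 + 2 + 1 = 7)

Applied to the SSH files, the modes have the following meaning:
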
# The SSH directory
# Owner has read, write, execute permissions.
# No one else can touch this directory.
sudo chmod 700 ${HOME}/.ssh

# The public key
# Owner has read + write permissions.
# Other users can read this file
sudo chmod 644 ${HOME}/.ssh/id_rsa.pub

# The private key
# Owner has read + write permissions
# Nobody else has permissions
sudo chmod 600 ${HOME}/.ssh/id_rsa

# Authorized keys
# Owner has read + write permissions
# Nobody else has permissions
sudo chmod 600 ${HOME}/.ssh/authorized_keys

# User Config
# Owner has read + write permissions
# Nobody else has permissions
sudo chmod 600 ${HOME}/.ssh/config
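
To confirm that a chmod call did what you expect, you can read the mode back. The sketch below assumes GNU stat (as shipped with Ubuntu; the flag differs on macOS/BSD) and works on empty placeholder files in a temporary directory (the hypothetical variable demo_dir) instead of your real ~/.ssh. Since you own these files, sudo is not needed.

```shell
#!/bin/bash
# Temporary directory and empty files standing in for ~/.ssh and its keys
demo_dir=$(mktemp -d)
touch "${demo_dir}/id_rsa" "${demo_dir}/id_rsa.pub"

chmod 700 "${demo_dir}"
chmod 600 "${demo_dir}/id_rsa"
chmod 644 "${demo_dir}/id_rsa.pub"

# stat -c '%a' prints the numeric mode of a file or directory
dir_mode=$(stat -c '%a' "${demo_dir}")
private_mode=$(stat -c '%a' "${demo_dir}/id_rsa")
public_mode=$(stat -c '%a' "${demo_dir}/id_rsa.pub")
echo "${dir_mode} ${private_mode} ${public_mode}"

# ls -l shows the same modes symbolically:
# 700 -> rwx------, 600 -> rw-------, 644 -> rw-r--r--
ls -ld "${demo_dir}" "${demo_dir}/id_rsa" "${demo_dir}/id_rsa.pub"
```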

1.3.3. Key registration

Your key will then have to be registered with the SSH server. This is typically done using the ssh-copy-id command.

ssh-copy-id -f -i ~/.ssh/id_rsa [USERNAME]@[IP]

If the machine has a custom port number, the command's syntax is as follows:

ssh-copy-id -f -i ~/.ssh/id_rsa -p [PORT] [USERNAME]@[IP]
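
Conceptually, registering a key means appending your public key to the ~/.ssh/authorized_keys file on the server. The following sketch only simulates this locally: two temporary directories stand in for the client and server ~/.ssh directories, and the key text is a made-up placeholder, not a usable key. All variable names are illustrative.

```shell
#!/bin/bash
client_dir=$(mktemp -d)   # stands in for ~/.ssh on your machine
server_dir=$(mktemp -d)   # stands in for ~/.ssh on the server

# Placeholder public key line (hypothetical, not a real key)
echo "ssh-rsa AAAAEXAMPLEKEY user@laptop" > "${client_dir}/id_rsa.pub"

# Registration: append the public key to the server's authorized_keys
cat "${client_dir}/id_rsa.pub" >> "${server_dir}/authorized_keys"
chmod 600 "${server_dir}/authorized_keys"

# Count how many registered keys belong to user@laptop
registered=$(grep -c "user@laptop" "${server_dir}/authorized_keys")
echo "registered keys: ${registered}"
```

Only the public key is ever copied; the private key stays on your machine. If ssh-copy-id is not available, a common manual alternative is to pipe the public key over SSH, roughly: cat ~/.ssh/id_rsa.pub | ssh [USERNAME]@[IP] 'cat >> ~/.ssh/authorized_keys'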

1.3.4. Connecting to a machine

To connect to a machine, use the "ssh" command. The command roughly has the following syntax:

ssh -p [PORT] -i [PRIVATE_KEY] [USERNAME]@[IP]
  • [PORT]: Default is 22.
  • [PRIVATE_KEY]: Default is ~/.ssh/id_rsa
  • [USERNAME]: Default is the current user
  • [IP]: The IP address or host name of the machine

Generally, if everything is default (SSH key, port number), the command would look like:

ssh [USERNAME]@[IP]
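
When the port, key, or username is not the default, typing the full command each time gets tedious. The ~/.ssh/config file mentioned under permissions can store these settings per host. The sketch below writes an example entry to a temporary file rather than your real config; the alias mycluster, the host cluster.example.org, and the user alice are all hypothetical.

```shell
#!/bin/bash
# Temporary file standing in for ~/.ssh/config
config_file=$(mktemp)

# Host         : the alias you type on the command line
# HostName     : the IP address or host name of the machine
# User         : the username on that machine
# Port         : the SSH port
# IdentityFile : the private key to use
cat > "${config_file}" << 'EOF'
Host mycluster
    HostName cluster.example.org
    User alice
    Port 22
    IdentityFile ~/.ssh/id_rsa
EOF

chmod 600 "${config_file}"
cat "${config_file}"
```

With such an entry in ~/.ssh/config, running "ssh mycluster" would behave like "ssh -p 22 -i ~/.ssh/id_rsa alice@cluster.example.org".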