Statistics 506, Fall 2016

Linux shell skills


If you are using a Mac or Linux machine, you already have access to a terminal: it is a program installed on your computer. You can use its shell to practice, or you can connect to a U-M Linux server using ssh:

ssh uniqname@login.itd.umich.edu

ssh uniqname@scs.dsc.umich.edu

ssh uniqname@mario.dsc.umich.edu

ssh uniqname@luigi.dsc.umich.edu

where uniqname is your U-M login id. The two machines named “mario” and “luigi” are identical. You can connect to either machine directly, or connect to scs.dsc.umich.edu and you will be routed to whichever machine has lower load.

To be able to connect to these machines you need an AFS home directory. If you do not have one already, you can generate your own AFS home directory by visiting mfile.umich.edu (follow the link to the “AFS self-provisioning tool”).

If you are using Windows, you should download a terminal client. Most people use PuTTY (for now you only need the putty.exe program).

Filesystems

To use a Linux/Unix operating system effectively, perhaps the most important thing to understand is the file system. In Linux, essentially everything is a file. Executable programs, system libraries and data files, network sockets and devices, as well as all of your own documents, data, and configuration information are all files.

The files are organized hierarchically. At the base of everything is the root directory “/”. Every directory can contain files and directories, and those directories contain their own files and directories, and so on. Most of the time you will be working in your home directory. Typing cd will move you to the top level of your home directory. Typing echo $HOME will display the path to your home directory.

Since every program is a file, you can find the file containing the executable code for a given program using the which command. For example, which ls gives you the path to the ls program.
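
As a small sketch of the commands just described, the following moves to the home directory, prints its path, and locates the ls executable. The variable names here and ls_path are illustrative only.

```shell
# Move to the top level of the home directory and record where we are.
cd
here=$(pwd)
echo "Working directory: $here"
echo "HOME is:           $HOME"

# Locate the file that holds the executable code for ls.
ls_path=$(which ls)
echo "ls lives at:       $ls_path"

# The result is an ordinary file that is marked executable.
ls -l "$ls_path"
```

The exact path printed for ls depends on the machine, which is the point of the later remark that /bin/ls is a different file on every machine.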

Networked file systems

On a networked computer, some of the directories may be part of a networked file system, meaning that the directory contents reside on a remote server. When you modify the contents of files in a networked directory, the changes are automatically synchronized with the remote server. For example, the file /afs/umich.edu/user/kshedden/file.txt is the same file regardless of which computer I am working on, since the /afs directory is mounted as a networked file system. However, the file /bin/ls is a different file on every machine.

Most of the time you can work with files on a networked file system the same way you would work with local files. At U-M we use the AFS (Andrew File System). One thing to be aware of is that after a period of time (around one day) your “ticket” for reading and writing remote files will expire. Every time you connect to a new login shell the tickets are refreshed, but if you remain logged in for more than a day you will need to renew your AFS tickets as follows:

kinit -l25h
aklog

If you have a long-running job (i.e. more than 24 hours), you generally should not write output to a file in an AFS directory, because your program will not be able to write to the file system after the tickets have expired. This could lead to your program exiting with an I/O error before completing. Most Linux machines have a /var/tmp or /data directory where you can write files to the local file system. Then, when the job is complete, you can copy the results to your AFS space.
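
The pattern above can be sketched as follows. This is only an illustration under stated assumptions: the loop stands in for a real long-running program, and dest_dir is a temporary directory standing in for a directory in your AFS space. The names scratch_root, work_dir, and dest_dir are hypothetical.

```shell
# Local scratch area: prefer /var/tmp, fall back to /tmp if it is missing.
scratch_root=/var/tmp
[ -d "$scratch_root" ] || scratch_root=/tmp

# Stand-in for a directory in your AFS space (assumption for this demo).
dest_dir=$(mktemp -d)

# Create a private working directory on the local file system.
work_dir=$(mktemp -d "$scratch_root/job.XXXXXX")

# Stand-in for the long-running program: it writes only to local disk.
for i in 1 2 3; do
    echo "step $i done"
done > "$work_dir/output.txt"

# After the job finishes (and with valid tickets), copy the results
# to the destination and remove the local working directory.
cp "$work_dir/output.txt" "$dest_dir/"
rm -r "$work_dir"
```

If the tickets have expired by the time the job ends, renew them first and then run the copy step by hand; the results are still safe on the local disk.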

Here are a few other useful AFS commands

  • fs lq : list your AFS quota and usage

  • fs la <filename> : displays the access information for a file

  • fs sa <file or directory> <group or individual> <permission> : change access properties for a file

Shells

A shell is a user interface, usually in the form of a command line interface (CLI). In the CLI, all the interactions with the machine are in text. The mouse can be used to paste between windows but has no role in navigating within a window.

Several different shells are available on modern Linux systems. The most commonly used shell is probably the “Bash” shell. To determine which shell you are using, type echo $SHELL at the command prompt.

Most of the “commands” that you use while working in the shell (ls, grep, etc.) are not part of the shell itself; they are separate programs that are invoked from the shell.

Some key features that the shell provides are:

  • Program execution: Type the name of any program or command at the shell prompt to execute that program. The directory containing the program file must be listed in your “PATH” environment variable (otherwise you must type the path to the program). To see the state of your PATH variable in the Bash shell, use echo $PATH. If you want to run a program in the background (so that the shell prompt returns immediately), put “&” at the end of the command.

  • Filename wildcards: * matches any sequence of characters and ? matches any single character.

  • Pipes: The pipe symbol | sends the output of one command to another command. For example, consider:

    ls -la | grep 1925

    Working from left to right, the ls -la command lists all files in the current directory, and the | takes the output of this command and uses it as the input for the grep command. Then grep takes its input and retains only the lines that match the string “1925”; those lines are printed to the terminal. The -la options to ls request “long format” output (more detailed information about each file) and information about “all” files (otherwise so-called “dot files”, whose file names begin with ., would not be listed).

  • Redirection: The > symbol redirects output from the terminal to the named file. For example,

    ls -la > out

    lists all the files in the current working directory, and places this listing into the text file named “out”. Specifically, > redirects the “standard output” stream. If you also want to redirect error messages (which are usually sent to the “standard error” stream), then you would use “>&”.

  • Command line history: Different shells handle this in slightly different ways. In Bash, you can use the up/down arrow keys (or ctrl-p and ctrl-n) to cycle through the past commands you have run. You can use the “!” operator to re-run the most recent command beginning with a given string; for example, !ls runs the last command you entered that began with “ls”. Some shells provide tab completion: type the first few characters of a file or program name and press “tab”, and the name will be completed for you.

  • Environment variables: These are variables that determine certain aspects of the operating system’s behavior. Besides PATH and SHELL, other useful environment variables are HOME and LD_LIBRARY_PATH. To extract the value of an environment variable in Bash, prefix its name with $, e.g. $HOME gives you the full path of your home directory.
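
The features in the list above can be tried together in a scratch directory. This is a minimal sketch: the file names are made up for the demonstration, and demo_dir is a hypothetical variable name. It uses 2> to capture only the standard error stream, which is the standard way to redirect that stream by itself.

```shell
# Work in a scratch directory so the example does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch report1925.txt report1926.txt notes.md

# Wildcards: * matches any run of characters, ? matches exactly one.
echo report*.txt
echo report192?.txt

# Pipe: send the listing to grep and keep only lines containing "1925".
matches=$(ls -la | grep 1925)
echo "$matches"

# Redirection: > captures standard output in a file instead of the terminal.
ls -la > out

# ls on a missing file writes its complaint to standard error;
# 2> captures that stream separately from standard output.
ls no_such_file > listing.txt 2> errors.txt

# Environment variables: prefix the name with $ to read the value.
echo "Home directory: $HOME"
echo "Search path:    $PATH"
```

After running this, listing.txt is empty and errors.txt holds the error message, which shows that the two streams really are separate.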

Resources

Discussion about the difference between a “shell” and a “terminal”.

The Bash reference manual

Process control

A Linux server runs many processes (sometimes called “jobs”). A “foreground process” is a process that you have launched from your current terminal, and that is currently blocking the terminal prompt (so you cannot use the shell while the process is running). A “background process” is running without blocking the terminal prompt. Here are some ways you can control how jobs are running on your machine:

  • If you type the name of an executable program at the shell prompt, it will begin running.

  • You can kill the foreground process by typing ctrl-c.

  • You can suspend a running process by typing ctrl-z.

  • Typing jobs will produce a listing of your current jobs, along with their “job numbers”.

  • Type bg followed by the job number to restart a suspended job in the background.

  • Type fg followed by the job number to move any job that is running in the background to the foreground.

  • Type ps to get the process numbers for all of your processes.

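  • Type kill followed by the process number to ask a running or suspended process to exit. If the process does not respond, type kill -9 followed by the process number (the -9 sends the KILL signal, which a process cannot catch or ignore).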

  • Type pkill -9 followed by a string of text to kill all processes whose names contain the text.

  • Type top to see a continuously-updated listing of the processes running on the machine to which you are connected.
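
A short sketch of these process-control commands follows, using sleep as a stand-in for a long-running program. The variable names pid and exit_status are illustrative. The example uses plain kill, which sends the TERM signal; kill -9 is the forceful fallback for a process that ignores TERM.

```shell
# Start a stand-in long-running program in the background with &.
sleep 300 &
pid=$!                      # $! holds the process number of that job
echo "Started sleep with process number $pid"

# jobs lists the jobs of this shell; ps shows the process itself.
jobs
ps -p "$pid"

# Ask the process to exit (kill sends the TERM signal by default).
kill "$pid"

# wait collects the finished job and reports how it ended; a status
# above 128 means the process was terminated by a signal.
wait "$pid" 2>/dev/null
exit_status=$?
echo "sleep ended with status $exit_status"
```

In an interactive shell you could instead start sleep 300 in the foreground, press ctrl-z to suspend it, and then use bg and fg with the job number shown by jobs.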

Some standard Linux tools

These are all programs that you invoke from the command line.

  • cd (change directory); cd .. moves up one level

  • ls (list directory contents)

  • pwd (print working directory path)

  • mkdir (create a directory)

  • rm (remove a file)

  • rmdir (remove an empty directory)

  • mv (move or rename a file)

  • find (find a file)

  • grep (search the contents of a text file)

  • cat (send file contents to standard output, usually the screen; has options to translate special characters)

  • wget (download a file using its URL)

  • ps (list your processes)

  • top (display currently running jobs)

  • fg (move a background job to foreground)

  • bg (resume a suspended job in the background)

  • kill (kill a job)

  • pkill (kill a job matching a pattern)

  • which (return the path for an executable command)

  • scp (copy files or directories from one machine to another)

  • logout and exit (exits the current shell)
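
Several of the tools listed above can be combined in a short session like the one below. This is a sketch only: it runs inside a temporary directory, and the directory and file names (data, words.txt, greek.txt) are invented for the demonstration.

```shell
# Work in a scratch directory so nothing outside it is touched.
scratch=$(mktemp -d)
cd "$scratch"
pwd

mkdir data                          # create a directory
printf 'alpha\nbeta\ngamma\n' > data/words.txt
cat data/words.txt                  # print the file contents
mv data/words.txt data/greek.txt    # rename (move) the file

found=$(find . -name 'greek.txt')   # locate the file by name
echo "find located: $found"

hits=$(grep -c 'a' data/greek.txt)  # count the lines containing "a"
echo "lines containing a: $hits"

rm data/greek.txt                   # remove the file
rmdir data                          # rmdir only removes empty directories
cd ..
rmdir "$scratch"                    # clean up the scratch directory
```

Note the order at the end: rmdir would refuse to remove data while greek.txt is still inside it.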

Resources

GNU Coreutils documentation

Editors

To edit files in the shell you will need to use a text editor. A basic editor that is easy to learn is nano. There are quite a few choices for more advanced editors; many people use vim or emacs. There are many tutorials for text editors online, so we won’t provide another one here. If you want to learn vim, you can type vimtutor into the shell to launch a tutorial program. There are many emacs tutorials; here is one that seems quite useful.

Terminal multiplexers

A terminal multiplexer is a software program that allows you to run multiple shell sessions under the same terminal connection, and allows you to keep a shell session running after the terminal connection has ended.

There are two main terminal multiplexers used on Linux systems: screen has been around for many years and is available on most Linux servers, while tmux is newer and is becoming very popular. Both programs are available on the U-M scs servers.

If you use tmux to keep a session alive on one of the scs machines, you will later want to reconnect to the same machine to continue working. Since the gateway scs.dsc.umich.edu is actually a pool of two machines (“mario” and “luigi”), it is not convenient to use tmux for logins to scs.dsc.umich.edu (since when you reconnect, you may be sent to the other machine). If you plan to use tmux with the scs machine pool, you should choose one of the two machines and connect to it directly by name.
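
A minimal sketch of the tmux workflow follows. The session name stats506_demo is a hypothetical choice, and the example creates the session detached so that it can run without a terminal. In real use you would first connect to one specific machine (for example ssh uniqname@mario.dsc.umich.edu) and create the session interactively.

```shell
# Hypothetical session name; pick any name you like.
session=stats506_demo

if command -v tmux >/dev/null 2>&1 &&
   tmux new-session -d -s "$session" 2>/dev/null; then
    # The session now runs in the background, detached from this terminal,
    # and would keep running after the ssh connection ends.
    tmux list-sessions
    result="session created"
    # Interactively you would reattach later with:
    #   tmux attach -t stats506_demo
    # Clean up the demo session.
    tmux kill-session -t "$session"
else
    result="could not start tmux"
fi
echo "$result"
```

Because a tmux session lives on the machine where it was started, reattaching only works after logging in to that same machine, which is why connecting by name matters.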

Resources

A tmux tutorial

A video tmux tutorial