194 changes: 193 additions & 1 deletion 06-job-control.md
@@ -4,8 +4,200 @@ title: Extra Unix Shell Material
subtitle: Job control
minutes: 5
---

> ## Learning Objectives {.objectives}
>
> * FIX ME

FIX ME
Our next topic is how to control programs *once they're running*. This
is called [job control](glossary.html#job-control). While it's less
important today than it was back in the Dark Ages, it is coming back
into its own as more people begin to leverage the power of computer
networks, as job control allows you to run multiple processes from a single
bash session on a remote server. Job control is also helpful for
running programs on your local computer and trying to deal with run-amok
scripts.

When we talk about controlling programs, what we really mean is
controlling *processes*. A process is a program
that's in memory and executing. Some of the processes on your computer
are yours: they're running programs you explicitly asked for, like your
web browser. Many others belong to the operating system that manages
your computer for you, or, if you're on a shared machine, to other
users. You can use the `ps` command to list them, just as you use `ls`
to list files and directories:

~~~{.input}
$ ps
~~~
~~~{.output}
PID PPID PGID TTY UID STIME COMMAND
2152 1 2152 con 1000 13:19:07 /usr/bin/bash
2276 2152 2276 con 1000 14:53:48 /usr/bin/ps
~~~

Every process has a unique process ID (PID). Remember, this is a
property of the process, not of the program that process is executing:
if you are running three instances of your browser at once, each will
have its own process ID. However, PIDs are drawn from a limited range,
so over the long run the operating system reuses a PID once the process
that held it has exited.
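
To see that a PID belongs to a process rather than to a program, we can start the same program twice and compare. This is only a small sketch: `sleep` stands in for any long-running program, and the variable names are ours.

```shell
# Start the same program twice; each running copy is a separate process.
sleep 2 &
first_pid=$!      # $! holds the PID of the most recent background command
sleep 2 &
second_pid=$!
echo "two sleep processes: PID $first_pid and PID $second_pid"
wait              # let both finish before moving on
```

The two numbers differ even though both processes are running `sleep`.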

Contributor

Is it worth noting that PIDs fall into a certain range and can "overflow" i.e. the same PID can refer to different processes over the uptime of the OS.

Contributor Author

note added - may or may not be technically accurate. :P

The second column in this listing, PPID, shows the ID of each process's
parent. Every process on a computer is spawned by another, which is its
parent (except, of course, for the bootstrap process that runs
automatically when the computer starts up).
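
The shell itself is a process with a parent, and it exposes both IDs as variables. The sketch below assumes a POSIX-style shell; the variable names `shell_pid` and `parent_pid` are ours.

```shell
# $$ expands to the PID of the current shell; $PPID to the PID of its parent.
shell_pid=$$
parent_pid=$PPID
echo "this shell is PID $shell_pid, started by PID $parent_pid"

# A child process started from here has its own PPID. Run directly from
# this shell, it normally reports the shell's PID printed above.
sh -c 'echo "child shell was started by PID $PPID"'
```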

The third column (labelled PGID) is the ID of the *process group* this
process belongs to. We won't discuss process groups in this lecture, but
they're often used to manage sets of related processes. Column 4 shows

Contributor

Break paragraph before Column 4

the ID of the terminal this process is running in. Once upon a time,
this really would have been a terminal connected to a central timeshared
computer. It isn't as important these days, except that if a process is

Contributor

Could slot in a sentence like

On modern unix systems this terminal is virtual and the process (normally) gets its input from and sends its output and error messages to this terminal.

The following sentence can then be shortened to

If a process is ...

a system service, such as a network monitor, `ps` will display a
question mark for its terminal, since it doesn't actually have one.

Column 5 is more interesting: it's the user ID of the user this process
is being run by. This is the user ID the computer uses when checking

Contributor

computer -> operating system

permissions: the process is allowed to access exactly the same things as
the user, no more, no less.

Finally, Column 6 shows when the process started running, and Column 7
shows what program the process is executing. Your version of `ps` may
show more or fewer columns, or may show them in a different order, but
the same information is generally available everywhere.
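
Where `ps` accepts the POSIX `-o` option, we can ask for particular columns in a particular order. This is a sketch under that assumption: not every `ps` supports `-o` (the listing above may come from one that does not), and the variable name `listing` is ours.

```shell
# Request specific columns, in this order: process ID, parent ID,
# process group ID, terminal, user, and command name.
listing=$(ps -o pid,ppid,pgid,tty,user,comm)
echo "$listing"
```

The header line of the output names the columns, so it is easy to check that we got the ones we asked for.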

Contributor

Add a sentence:

If needed, the list of columns and their order can be customized by passing command line options to ps.


> ## Welcome to your keyboard {.callout}
> In what follows, we will use both `Control-N` and `^N` to indicate
> that you should hold down the `Control`/`ctrl` key on your keyboard
> and then type the indicated letter while holding down Control.

The shell provides several commands for stopping, pausing, and resuming
processes. To see them in action, let's run our `analyze` program on our
latest data files.

~~~{.input}
$ ./analyze results*.dat
~~~
~~~{.output}
Processing results01.dat
Processing line 1 out of 1150
~~~

After a few minutes go by, we realize that this is
going to take a while to finish. Being impatient, we kill the process by
typing Control-C. This stops the currently-executing program right away.
Any results it had calculated, but not written to disk, are lost.

~~~{.input}
Processing line 2 out of 1150
^C
~~~

Let's run that same command again, with an ampersand `&` at the end of
the line to tell the shell we want it to run in the
[background](glossary.html#background):

~~~{.input}
$ ./analyze results*.dat &
~~~

When we do this, the shell launches the program as before. Instead of
leaving our keyboard connected to the program's standard input, though,
the shell hangs onto it. This means the shell can give us a fresh
command prompt, and start running other commands, right away. Here, for
example, we're putting some parameters for the next run of the program
in a file:

~~~{.input}
$ cat > params.txt
~~~
~~~{.output}
density: 22.0
viscosity: 0.75
^D
~~~

(Remember, \^D is the shell's way of showing Control-D, which means "end
of input".)
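
The same pattern of starting a job with `&` and carrying on with other work can be sketched in a script. Here `slow_job` is a hypothetical stand-in for `./analyze`, and a temporary file stands in for `params.txt` so that nothing real is overwritten.

```shell
# Hypothetical stand-in for ./analyze: pretend to work, then succeed.
slow_job() { sleep 2; return 0; }

slow_job &                 # run in the background; the shell does not wait
job_pid=$!

# Meanwhile, do other work in the foreground.
params=$(mktemp)
printf 'density: 22.0\nviscosity: 0.75\n' > "$params"

# Block until the background job is done and collect its exit status.
job_status=0
wait "$job_pid" || job_status=$?
echo "background job $job_pid exited with status $job_status"
```

`wait` is how a script catches up with a background job once the foreground work is finished.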

> ## Gotchas with Standard Out {.callout}
>
> If your program prints to stdout on a regular basis, running it in the
> background may be problematic. Even if a program is running in the
> background, it will still print to stdout. You will continue to have
> a fresh prompt to type other commands, but it will be periodically
> interrupted by statements printed by the background process.
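
One common way around this is to redirect the background job's output to a file. The sketch below uses a hypothetical function, `noisy_job`, in place of `./analyze`, and a temporary file as the log.

```shell
# Hypothetical stand-in for ./analyze: print one progress line per step.
noisy_job() {
    for i in 1 2 3
    do
        echo "Processing line $i out of 3"
    done
}

log=$(mktemp)
noisy_job > "$log" 2>&1 &   # send stdout and stderr to the log, not the screen
wait "$!"                   # wait for the job so the log is complete
line_count=$(wc -l < "$log" | tr -d ' ')
echo "captured $line_count lines in $log"
```

The prompt stays clean, and we can read the log whenever we like.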

We're finished with the other processes we were running in the foreground.
Now let's run the `jobs` command, which tells us what
processes are currently running in the background:

~~~{.input}
$ jobs
~~~
~~~{.output}
[1] ./analyze results01.dat results02.dat results03.dat
~~~

Since we're about to go and get coffee, we might as well use the
foreground command, `fg`, to bring our background job into the
foreground:

~~~{.input}
$ fg
~~~
~~~{.output}
Processing line 5 out of 1150
~~~

When `analyze` finishes running, the shell gives us a fresh prompt as
usual. If we had several jobs running in the background, we could
control which one we brought to the foreground using `fg %1`, `fg %2`,
and so on. The IDs are *not* the process IDs. Instead, they are the job
IDs displayed by the `jobs` command.
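
To make the difference between job IDs and process IDs concrete, here is a small sketch in which `sleep` stands in for real work. It assumes a fresh shell, so that the two jobs are numbered 1 and 2; `jobs -l` shows the process ID next to each job number.

```shell
sleep 10 &          # becomes job %1 in a fresh shell
first_pid=$!
sleep 10 &          # becomes job %2
second_pid=$!

jobs -l             # lists each job number together with its PID

kill %2             # refer to a job by its job ID...
kill "$first_pid"   # ...or to the same kind of process by its PID
wait                # collect both jobs once they have exited
```

Job IDs are small numbers private to one shell session, while PIDs are assigned by the operating system.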

The shell gives us one more tool for job control: if a process is
already running in the foreground, Control-Z will pause it and return
control to the shell. We can then use `fg` to resume it in the
foreground, or `bg` to resume it as a background job. For example, let's
run `analyze` again, and then type Control-Z. The shell immediately
tells us that our program has been stopped, and gives us its job number:

~~~{.input}
$ ./analyze results01.dat
~~~
~~~{.output}
Processing results01.dat
Processing line 1 out of 1150
~~~
~~~{.input}
^Z
~~~
~~~{.output}
[1] Stopped ./analyze results01.dat
~~~
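
Pausing and resuming are done with signals: Control-Z sends a stop signal, and `fg` and `bg` send a continue signal. We can imitate this from a script with `kill -STOP` and `kill -CONT`. This is only a sketch: the loop is a hypothetical stand-in for `./analyze`, and `-STOP` is a close relative of the signal Control-Z sends rather than the very same one.

```shell
log=$(mktemp)
# Hypothetical stand-in for ./analyze: append one line per second.
( while :; do echo tick >> "$log"; sleep 1; done ) &
worker=$!

sleep 2
kill -STOP "$worker"        # pause the worker, much as Control-Z would
sleep 1
paused_before=$(wc -l < "$log" | tr -d ' ')
sleep 2
paused_after=$(wc -l < "$log" | tr -d ' ')

kill -CONT "$worker"        # resume it, which is what bg and fg do
sleep 2
resumed=$(wc -l < "$log" | tr -d ' ')

kill "$worker"              # clean up
wait "$worker" 2>/dev/null || true
echo "while paused: $paused_before -> $paused_after, after resuming: $resumed"
```

The line count stays the same while the worker is stopped and starts growing again once it is continued.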

If we type `bg %1`, the shell starts the process running again, but in
the background. We can check that it's running using `jobs`, and kill it
while it's still in the background using `kill` and the job number. For
`analyze`, this has the same effect as bringing it to the foreground and
then typing Control-C (`kill` sends a terminate signal by default, while
Control-C sends an interrupt signal, but either one ends `analyze`):

~~~{.input}
$ bg %1
$ jobs
~~~
~~~{.output}
[1] ./analyze results01.dat
~~~
~~~{.input}
$ kill %1
~~~
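
`kill` also accepts a process ID, and the shell can tell us afterwards how the process ended. In this sketch `sleep` stands in for `analyze`, and the variable names are ours. With no signal named, `kill` sends SIGTERM (signal 15), and the shell reports the death of a signalled process as 128 plus the signal number.

```shell
sleep 30 &
victim=$!

kill "$victim"                      # no signal named, so SIGTERM is sent

term_status=0
wait "$victim" || term_status=$?    # 128 + 15 = 143 for death by SIGTERM
echo "process $victim ended with status $term_status"
```

A status above 128 is a handy sign that a job was killed rather than finishing on its own.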

Job control was important when users only had one terminal window at a
time. It's less important now: if we want to run another program, it's
easy enough to open another window and run it there. However, these
ideas and tools are making a comeback, as they're often the easiest way
to run and control programs on remote computers without having several windows open, each running a different process.

17 changes: 17 additions & 0 deletions data/job-control/analyze
@@ -0,0 +1,17 @@
#!/bin/bash

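# Process every file named on the command line, not just the first one.
for file in "$@"
do
    echo "Processing $file"
    # set variables for loop
    counter=0
    # count the lines; tr strips the padding some versions of wc add
    total=$(wc -l < "$file" | tr -d ' ')
    # loop through file lines, doing analysis
    while IFS= read -r line
    do
        counter=$((counter+1))
        echo "Processing line $counter out of $total"
        echo "$line" >> "$file.analysis"
        sleep 180
    done < "$file"
done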
3 changes: 3 additions & 0 deletions data/job-control/results01.dat
@@ -0,0 +1,3 @@
10 20 30 40
5 6 7 8
69 43 72 121
150 changes: 150 additions & 0 deletions data/job-control/results02.dat
@@ -0,0 +1,150 @@
TACATACGAA
AAAGGTTCAG
GCCCGTTAGT
ACTCTTCTGG
TGCTCGTCAA
ATGGGATGCT
TCTTCGCTTG
GTAAGGGAGT
TAGAATAAGA
AAGGTGGCTT
TTATGACTTT
AAAACCAATA
ACATAACAAA
GCGACGCTTA
TCCCCGATCT
CCGATTGGGA
TTATCTCAGG
TCAGGTTAGC
GTTTAATGGA
AGGGCACTCT
TTTAGGGGCG
TAGTCGCCGA
GGATGATAGT
CCCCAGGTTA
CCCCTGGGGT
CTGACCGCAC
TGTGGCGCTC
CAACGTGCCT
TCCTCCGCAA
CTACAGTTAG
CAGGCACTAC
AAGCTTATAG
TGCCTAAGCC
TATCCCGTCG
TTTGGGTAGG
GCTACATGGA
ACACTCCGCG
ATGCAGCTCA
CACCTATCAT
TCGGGAGCTA
TGGCTAGTGT
GCGGTGAACG
CTGCTCTGAG
TTTTGAGGAC
CTCGTCGTCT
TATTTCATCC
GTCTTATGGT
ATAATCGATA
TGAATCAAGT
TGAGGAACCT
GCTCGAGCCG
AGATTGGCCA
AGAAGTTGGT
AGTGGTGTCG
ATACGCCGGT
CGCCCTAAGA
AGTTGGTAGC
ATGGAATTGT
AAGGTGGGGC
TCTGACGAAC
GACACGACTT
TAGTCAGCCG
AACTGTCATA
TTCTCGAATA
GCTCAAGCCG
TCACGTTGGC
CATACTGGTC
TTGAGACAAT
TTTGAACAGC
AGGTAGGACG
TAAAGGTCCT
AGAGGTAATA
TACCCAGGAT
AGCATTCCCT
ATATTCAGAG
TCATATGAAA
AGAAGTACAC
GTCCGAACAG
CTGCCGCTCT
GATACCCACT
CCAGTTGTAT
CTCCTTTTTT
TGGTAAGGCT
TCGGCTGGTG
AGCTCAATTA
CGATGGGTGA
CGGATGCCTT
ACTAACAATC
CCAAAGATGT
GTAAACATCA
GCAAGTAGCC
TCTAGTTATA
GCTTAATCAG
TGGTGCGCAA
CTATGGTACG
TTTCATTGGG
GCTAGCTGCC
GTCCTCGTAC
ACCAACGCTG
CATCAACCCT
ATAAAAGAGT
GTTGAGTGTC
GACGTAGGGA
TATGCAGATA
TGTGAAGAAG
CGAGGCGACC
GAAAGAAGTC
ATAGTCAAGC
ATCAGTGGGG
AAACCTGTGG
CCACTGCAAC
CCCTACTGCA
TGGTCTCCAA
CGTTTCGACG
AGTCTCGATT
TATTCGGAAA
TTATGTTCCT
CGCTCTGTCA
GAAGTCAGAT
GTGTCATTTT
TTTACACGAC
CTATGTGATT
GGTAAGCCGT
GTGCCTGCCG
CAATGGATCA
AGCGGGCAAC
GGTAAACGCT
GTCGCGAGGG
CGGCGCTCGA
GTCGCGACAA
TCACCGGTCA
AAGGAGCATA
GACACCTTTC
GTTCGATACC
GACCCCATGG
CTCGTGCGCG
TAATCCGTTC
GTGGGTTTGT
TATGCCATTC
TGCACCGGTA
GAGTTCCCAC
GGACGCGAAA
TAATTAGACC
ACAAGGTCAA
GTCCGCATAA
ATTCGAACGC
CGGTTGGACC
GAGGGCGTCG
AGGGCGTCGA
CTAGAAATCT
1 change: 1 addition & 0 deletions data/job-control/results03.dat
@@ -0,0 +1 @@
Data