hpcugent · EwDa291 · Aug 8, 2024 · Aug 8, 2024 · Aug 8, 2024 · Aug 9, 2024
diff --git a/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_1.txt b/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_1.txt
@@ -0,0 +1,61 @@
+Beyond the basics
+Now that you've seen some of the more basic commands, let's take a look
+at some of the deeper concepts and commands.
+Input/output
+To redirect output to files, you can use the redirection operators: >,
+>>, &>, and <.
+First, it's important to make a distinction between two different output
+channels:
+1.  stdout: standard output channel, for regular output
+2.  stderr: standard error channel, for errors and warnings
+Redirecting stdout
+> writes the (stdout) output of a command to a file and overwrites
+whatever was in the file before.
+$ echo hello > somefile
+$ cat somefile
+hello
+$ echo hello2 > somefile
+$ cat somefile
+hello2
+>> appends the (stdout) output of a command to a file; it does not
+clobber whatever was in the file before:
+$ echo hello > somefile
+$ cat somefile 
+hello
+$ echo hello2 >> somefile
+$ cat somefile
+hello
+hello2
+Reading from stdin
+< reads a file from standard input (piped or typed input). So you
+would use this to simulate typing into a terminal. < somefile.txt is
+largely equivalent to cat somefile.txt | .
+One common use might be to take the results of a long-running
+command and store the results in a file, so you don't have to repeat it
+while you refine your command line. For example, if you have a large
+directory structure you might save a list of all the files you're
+interested in and then reading in the file list when you are done:
+$ find . -name .txt > files
+$ xargs grep banana < files
+Redirecting stderr
+To redirect the stderr output (warnings, messages), you can use 2>,
+just like >
+$ ls one.txt nosuchfile.txt 2> errors.txt
+one.txt
+$ cat errors.txt
+ls: nosuchfile.txt: No such file or directory
+Combining stdout and stderr
+To combine both output channels (stdout and stderr) and redirect
+them to a single file, you can use &>
+$ ls one.txt nosuchfile.txt &> ls.out
+$ cat ls.out
+ls: nosuchfile.txt: No such file or directory
+one.txt
+Command piping
+Part of the power of the command line is to string multiple commands
+together to create useful results. The core of these is the pipe: |.
+For example, to see the number of files in a directory, we can pipe the
+(stdout) output of ls to wc (word count, but can also be used to
+count the number of lines with the -l flag).
+$ ls | wc -l
+    42
diff --git a/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_1_metadata.json b/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_1_metadata.json
@@ -0,0 +1,12 @@
+{
+    "main_title": "beyond_the_basics",
+    "subtitle": "Command-piping",
+    "source_file": "../../mkdocs/docs/HPC/linux-tutorial/beyond_the_basics.md",
+    "title_depth": 2,
+    "directory": "beyond_the_basics",
+    "parent_title": "",
+    "previous_title": null,
+    "next_title": "beyond_the_basics_paragraph_2",
+    "OS": "generic",
+    "reference_link": "https://docs.hpc.ugent.be/linux-tutorial/beyond_the_basics/#command-piping"
+}
diff --git a/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_2.txt b/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_2.txt
@@ -0,0 +1,48 @@
+A common pattern is to pipe the output of a command to less so you
+can examine or search the output:
+$ find . | less
+Or to look through your command history:
+$ history | less
+You can put multiple pipes in the same line. For example, which cp
+commands have we run?
+$ history | grep cp | less
+Shell expansion
+The shell will expand certain things, including:
+1.   wildcard: for example ls ttxt will list all files starting
+    with 't' and ending in 'txt'
+2.  tab completion: hit the <tab> key to make the shell complete your
+    command line; works for completing file names, command names, etc.
+3.  $... or ${...}: environment variables will be replaced with
+    their value; example: echo "I am $USER" or echo "I am ${USER}"
+4.  square brackets can be used to list a number of options for a
+    particular characters; 
+example: ls *.[oe][0-9].
+ This will list all
+    files starting with whatever characters (*), then a dot (.),
+    then either an 'o' or an 'e' ([oe]), then a character from '0' to
+    '9' (so any digit) ([0-9]). So this filename will match:
+    anything.o5, but this one won't: anything.o52.
+Process information
+ps and pstree
+ps lists processes running. By default, it will only show you the
+processes running in the local shell. To see all of your processes
+running on the system, use:
+$ ps -fu $USER
+To see all the processes:
+$ ps -elf
+To see all the processes in a forest view, use:
+$ ps auxf
+The last two will spit out a lot of data, so get in the habit of piping
+it to less.
+pstree is another way to dump a tree/forest view. It looks better than
+ps auxf but it has much less information so its value is limited.
+pgrep will find all the processes where the name matches the pattern
+and print the process IDs (PID). This is used in piping the processes
+together as we will see in the next section.
+kill
+ps isn't very useful unless you can manipulate the processes. We do
+this using the kill command. Kill will send a message
+(SIGINT) to
+the process to ask it to stop.
+$ kill 1234
+$ kill $(pgrep misbehaving_process)
diff --git a/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_2_metadata.json b/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_2_metadata.json
@@ -0,0 +1,15 @@
+{
+    "main_title": "beyond_the_basics",
+    "subtitle": "`kill`",
+    "source_file": "../../mkdocs/docs/HPC/linux-tutorial/beyond_the_basics.md",
+    "title_depth": 3,
+    "directory": "beyond_the_basics",
+    "links": {
+        "0": "https://en.wikipedia.org/wiki/Unix_signal#POSIX_signals"
+    },
+    "parent_title": "",
+    "previous_title": "beyond_the_basics_paragraph_1",
+    "next_title": "beyond_the_basics_paragraph_3",
+    "OS": "generic",
+    "reference_link": "https://docs.hpc.ugent.be/linux-tutorial/beyond_the_basics/#kill"
+}
diff --git a/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_3.txt b/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_3.txt
@@ -0,0 +1,48 @@
+Usually, this ends the process, giving it the opportunity to flush data
+to files, etc. However, if the process ignored your signal, you can send
+it a different message
+(SIGKILL)
+which the OS will use to unceremoniously terminate the process:
+$ kill -9 1234
+top
+top is a tool to see the current status of the system. You've probably
+used something similar in Task Manager on Windows or Activity Monitor in
+macOS. top will update every second and has a few interesting
+commands.
+To see only your processes, type u and your username after starting
+top, (you can also do this with 
+top -u $USER
+). The default is to
+sort the display by %CPU. To change the sort order, use < and >
+like arrow keys.
+There are a lot of configuration options in top, but if you're
+interested in seeing a nicer view, you can run htop instead. Be aware
+that it's not installed everywhere, while top is.
+To exit top, use q (for 'quit').
+For more information, see [Brendan Gregg's excellent site dedicated to
+performance analysis](http://brendangregg.com).
+ulimit
+ulimit is a utility to get or set user limits on the machine. For
+example, you may be limited to a certain number of processes. To see all
+the limits that have been set, use:
+$ ulimit -a
+Counting: wc
+To count the number of lines, words, and characters (or bytes) in a file, use wc (word count):
+$ wc example.txt
+      90     468     3189   example.txt
+The output indicates that the file named example.txt contains 90
+lines, 468 words, and 3189 characters/bytes.
+To only count the number of lines, use wc -l:
+$ wc -l example.txt
+      90    example.txt
+Searching file contents: grep
+grep is an important command. It was originally an abbreviation for
+"globally search a regular expression and print" but it's entered the
+common computing lexicon and people use 'grep' to mean searching for
+anything. To use grep, you give a pattern and a list of files.
+$ grep banana fruit.txt
+$ grep banana fruit_bowl1.txt fruit_bowl2.txt
+$ grep banana fruit*txt
+grep also lets you search for [Regular
+Expressions](https://en.wikipedia.org/wiki/Regular_expression), but
+these are not in scope for this introductory text.
diff --git a/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_3_metadata.json b/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_3_metadata.json
@@ -0,0 +1,15 @@
+{
+    "main_title": "beyond_the_basics",
+    "subtitle": "Searching-file-contents-`grep`",
+    "source_file": "../../mkdocs/docs/HPC/linux-tutorial/beyond_the_basics.md",
+    "title_depth": 2,
+    "directory": "beyond_the_basics",
+    "links": {
+        "0": "https://en.wikipedia.org/wiki/Unix_signal#POSIX_signals"
+    },
+    "parent_title": "",
+    "previous_title": "beyond_the_basics_paragraph_2",
+    "next_title": "beyond_the_basics_paragraph_4",
+    "OS": "generic",
+    "reference_link": "https://docs.hpc.ugent.be/linux-tutorial/beyond_the_basics/#searching-file-contents-grep"
+}
diff --git a/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_4.txt b/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_4.txt
@@ -0,0 +1,39 @@
+cut
+cut is used to pull fields out of files or pipes streams. It's a
+useful glue when you mix it with grep because grep can find the
+lines where a string occurs and cut can pull out a particular field.
+For example, to pull the first column (-f 1, the first field) from (an
+unquoted) CSV (comma-separated values, so -d ',': delimited by ,)
+file, you can use the following:
+$ cut -f 1 -d ',' mydata.csv
+sed
+sed is the stream editor. It is used to replace text in a file or
+piped stream. In this way, it works like grep, but instead of just
+searching, it can also edit files. This is like "Search and Replace" in
+a text editor. sed has a lot of features, but almost everyone uses the
+extremely basic version of string replacement:
+$ sed 's/oldtext/newtext/g' myfile.txt
+By default, sed will just print the results. If you want to edit the
+file inplace, use -i, but be very careful that the results will be
+what you want before you go around destroying your data!
+awk
+awk is a basic language that builds on sed to do much more advanced
+stream editing. Going in depth is far out of scope of this tutorial, but
+there are two examples that are worth knowing.
+First, cut is very limited in pulling fields apart based on
+whitespace. For example, if you have padded fields then
+cut -f 4 -d ' ' will almost certainly give you a headache as there
+might be an uncertain number of spaces between each field. awk does
+better whitespace splitting. So, pulling out the fourth field in a
+whitespace delimited file is as follows:
+$ awk '{print $4}' mydata.dat
+You can use -F ':' to change the delimiter (F for field separator).
+The next example is used to sum numbers from a field:
+$ awk -F ',' '{sum += $1} END {print sum}' mydata.csv
+Basic Shell Scripting
+The basic premise of a script is to execute automate the execution of
+multiple commands. If you find yourself repeating the same commands over
+and over again, you should consider writing one script to do the same. A
+script is nothing special, it is just a text file like any other. Any
+commands you put in there will be executed from the top to bottom.
+However, there are some rules you need to abide by.
diff --git a/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_4_metadata.json b/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_4_metadata.json
@@ -0,0 +1,12 @@
+{
+    "main_title": "beyond_the_basics",
+    "subtitle": "Basic-Shell-Scripting",
+    "source_file": "../../mkdocs/docs/HPC/linux-tutorial/beyond_the_basics.md",
+    "title_depth": 2,
+    "directory": "beyond_the_basics",
+    "parent_title": "",
+    "previous_title": "beyond_the_basics_paragraph_3",
+    "next_title": "beyond_the_basics_paragraph_5",
+    "OS": "generic",
+    "reference_link": "https://docs.hpc.ugent.be/linux-tutorial/beyond_the_basics/#basic-shell-scripting"
+}
diff --git a/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_5.txt b/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_5.txt
@@ -0,0 +1,59 @@
+Here is a very detailed guide should you
+need more information.
+Shebang
+The first line of the script is the so-called shebang (# is sometimes
+called hash and ! is sometimes called bang). This line tells the shell
+which command should execute the script. In most cases, this will
+simply be the shell itself. The line itself looks a bit weird, but you
+can copy-paste this line as you need not worry about it further. It is
+however very important this is the very first line of the script! These
+are all valid shebangs, but you should only use one of them:
+#!/bin/sh
+#!/bin/bash
+#!/usr/bin/env bash
+Conditionals
+Sometimes you only want certain commands to be executed when a certain
+condition is met. For example, only move files to a directory if that
+directory exists. The syntax:
+if [ -d directory ] && [ -f file ]
+then 
+  mv file directory 
+fi
+Or you only want to do something if a file exists:
+if [ -f filename ] 
+then 
+  echo "it exists" 
+fi
+Or only if a certain variable is bigger than one:
+if [ $AMOUNT -gt 1 ]
+then
+  echo "More than one"
+  # more commands
+fi
+Several pitfalls exist with this syntax. You need spaces surrounding the
+brackets, the then needs to be at the beginning of a line. It is best to just
+copy this example and modify it.
+In the initial example, we used -d to test if a directory existed.
+There are several more checks.
+Another useful example, is to test if a variable contains a value (so it's
+not empty):
+if [ -z $PBS_ARRAYID ]
+then
+  echo "Not an array job, quitting."
+  exit 1
+fi
+the -z will check if the length of the variable's value is greater than zero.
+Loops
+Are you copy-pasting commands? Are you doing the same thing with just different
+options? You most likely can simplify your script by using a loop.
+Let's look at a simple example:
+for i in 1 2 3
+do
+  echo $i
+done
+Subcommands
+Subcommands are used all the time in shell scripts. What they
+do is storing the output of a command in a variable. So this can later
+be used in a conditional or a loop for example.
+CURRENTDIR=`pwd`  # using backticks
+CURRENTDIR=$(pwd)  # recommended (easier to type)
diff --git a/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_5_metadata.json b/...rocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_5_metadata.json
@@ -0,0 +1,16 @@
+{
+    "main_title": "beyond_the_basics",
+    "subtitle": "Subcommands",
+    "source_file": "../../mkdocs/docs/HPC/linux-tutorial/beyond_the_basics.md",
+    "title_depth": 3,
+    "directory": "beyond_the_basics",
+    "links": {
+        "0": "http://www.tldp.org/LDP/Bash-Beginners-Guide/html/",
+        "1": "http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_07_01.html"
+    },
+    "parent_title": "",
+    "previous_title": "beyond_the_basics_paragraph_4",
+    "next_title": "beyond_the_basics_paragraph_6",
+    "OS": "generic",
+    "reference_link": "https://docs.hpc.ugent.be/linux-tutorial/beyond_the_basics/#subcommands"
+}
diff --git a/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_6.txt b/...atbot_preprocessor/parsed_mds/generic/beyond_the_basics/beyond_the_basics_paragraph_6.txt
@@ -0,0 +1,53 @@
+In the above example you can see the 2 different methods of using a
+subcommand. pwd will output the current working directory, and its
+output will be stored in the CURRENTDIR variable. The recommended way to
+use subcommands is with the $() syntax.
+Errors
+Sometimes some things go wrong and a command or script you ran causes an
+error. How do you properly deal with these situations?
+Firstly a useful thing to know for debugging and testing is that you can
+run any command like this:
+command 2>&1 output.log   # one single output file, both output and errors
+If you add 2>&1 output.log at the end of any command, it will combine
+stdout and stderr, outputting it into a single file named
+output.log.
+If you want regular and error output separated you can use:
+command > output.log 2> output.err  # errors in a separate file
+this will write regular output to output.log and error output to
+output.err.
+You can then look for the errors with less or search for specific text
+with grep.
+In scripts, you can use:
+set -e
+This will tell the shell to stop executing any subsequent commands when
+a single command in the script fails. This is most convenient as most
+likely this causes the rest of the script to fail as well.
+Advanced error checking
+Sometimes you want to control all the error checking yourself, this is
+also possible. Everytime you run a command, a special variable $? is
+used to denote successful completion of the command. A value other than
+zero signifies something went wrong. So an example use case:
+command_with_possible_error
+exit_code=$?  # capture exit code of last command
+if [ $exit_code -ne 0 ]
+then
+  echo "something went wrong"
+fi
+.bashrc login script
+[//]: # (sec:bashrc-login-script)
+If you have certain commands executed every time you log in (which
+includes every time a job starts), you can add them to your
+$HOME/.bashrc file. This file is a shell script that gets executed
+every time you log in.
+Examples include:
+-   modifying your $PS1 (to tweak your shell prompt)
+-   printing information about the current/jobs environment (echoing
+    environment variables, etc.)
+-   selecting a specific cluster to run on with
+    module swap cluster/...
+Some recommendations:
+-   Avoid using module load statements in your $HOME/.bashrc file
+-   Don't directly edit your .bashrc file: if there's an error in your
+    .bashrc file, you might not be able to log in again. To
+    prevent that, use another file to test your changes, then copy them
+    over when you tested the script.