* Updated intro and warmup section
* Updated workflow code and final files
* Tweaked the directory names for Config and Modules + relocated solutions
(Renamed both `scripts` and `intermediates` to `solutions` for more clarity)
docs/hello_nextflow/01_orientation.md
+4 -4 (4 additions, 4 deletions)
@@ -20,7 +20,7 @@ This directory contains all the code files, test data and accessory files you wi
20
20
tree . -L 2
21
21
```
22
22
23
-
You should see the following output:
23
+
You should see the following output:**TODO: UPDATE**
24
24
25
25
```console title="Directory contents"
26
26
/workspace/gitpod/hello-nextflow
@@ -35,7 +35,7 @@ You should see the following output:
35
35
├── hello-nf-test.nf
36
36
├── hello-world.nf
37
37
├── nextflow.config
38
-
└── scripts
38
+
└── solutions
39
39
├── hello-config-1.config
40
40
├── hello-config-2.config
41
41
├── hello-config-3.config
@@ -66,7 +66,7 @@ You should see the following output:
66
66
```
67
67
68
68
**The `data` directory** contains the input data we'll use in Part 3: Hello Genomics, which uses an example from genomics to demonstrate how to build a simple analysis pipeline.
69
-
The data are described in detail in that section of the course.
69
+
The dataset is described in detail in that section of the course.
70
70
71
71
**The file `nextflow.config`** is a configuration file that sets minimal environment properties.
72
72
@@ -77,4 +77,4 @@ In its initial state, it is NOT a functional workflow script.
77
77
78
78
**The remaining `.nf` files** are functional workflow scripts that serve as starting points for the corresponding parts of the course.
79
79
80
-
**The `scripts` directory** contains the completed workflow scripts that result from each step of the course. They are intended to be used as a reference to check your work and troubleshoot any issues. The name and number in the filename correspond to the step of the relevant part of the course. For example, the file `hello-world-4.nf` is the expected result of completing steps 1 through 4 of Part 1: Hello World.
80
+
**The `solutions` directory** contains the completed workflow scripts and other files that you will generate in each part of the course. They are intended to be used as a reference to check your work and troubleshoot any issues. The name and number in the filename correspond to the step of the relevant part of the course. For example, the file `hello-world-4.nf` is the expected result of completing steps 1 through 4 of Part 1: Hello World.
docs/hello_nextflow/06_hello_config.md
+24 -26 (24 additions, 26 deletions)
@@ -11,23 +11,22 @@ So far we've been working with a very loose structure, with just one workflow co
11
11
However, we're now moving into the phase of this training series that is more focused on code development and maintenance practices.
12
12
13
13
As part of that, we're going to adopt a formal project structure.
14
-
We're going to work inside a dedicated project directory called `projectC` (C for configuration), and we've renamed the workflow file `main.nf` to match the recommended Nextflow convention.
14
+
We're going to work inside a dedicated project directory called `hello-config`, and we've renamed the workflow file `main.nf` to match the recommended Nextflow convention.
15
15
16
-
### 0.1. Explore the `projectC` directory
16
+
### 0.1. Explore the `hello-config` directory
17
17
18
-
We want to launch the workflow from inside the `projectC` directory, so let's move into it now.
18
+
We want to launch the workflow from inside the `hello-config` directory, so let's move into it now.
19
19
20
20
```bash
21
-
cd projectC
21
+
cd hello-config
22
22
```
23
23
24
24
Let's take a look at the contents.
25
25
You can use the file explorer or the terminal; here we're using the output of `tree` to display the top-level directory contents.
26
26
27
27
```console title="Directory contents"
28
-
projectC
28
+
hello-config
29
29
├── demo-params.json
30
-
├── intermediates
31
30
├── main.nf
32
31
└── nextflow.config
33
32
```
@@ -56,14 +55,12 @@ projectC
56
55
- **`demo-params.json`** is a parameter file intended for supplying parameter values to a workflow.
57
56
We will use it in section 5 of this tutorial.
58
57
59
-
- **`intermediates/`** is a directory containing the intermediate forms of the workflow and configuration files for each section of this tutorial.
60
-
61
58
The one thing that's missing is a way to point to the original data without making a copy of it or updating the file paths wherever they're specified.
62
59
The simplest solution is to link to the data location.
63
60
64
61
### 0.2. Create a symbolic link to the data
65
62
66
-
Run this command from inside the `projectC` directory:
63
+
Run this command from inside the `hello-config` directory:
67
64
68
65
```bash
69
66
ln -s ../data data
@@ -72,10 +69,10 @@ ln -s ../data data
72
69
This creates a symbolic link called `data` pointing to the data directory, which means we don't have to change anything about how the file paths are set up.
73
70
74
71
```console title="Directory contents"
75
-
projectC
72
+
hello-config
76
73
├── data -> ../data
77
74
├── demo-params.json
78
-
├── intermediates
75
+
├── solutions
79
76
├── main.nf
80
77
└── nextflow.config
81
78
```
@@ -105,7 +102,7 @@ executor > local (7)
105
102
[ee/2c7855] GATK_JOINTGENOTYPING [100%] 1 of 1 ✔
106
103
```
107
104
108
-
There will now be a `work` directory and a `results_genomics` directory inside your current `projectC` directory.
105
+
There will now be a `work` directory and a `results_genomics` directory inside your `hello-config` directory.
109
106
110
107
### Takeaway
111
108
@@ -145,7 +142,7 @@ Let's see what happens if we run that.
145
142
146
143
### 1.2. Run the workflow without Docker
147
144
148
-
We are now launching the `main.nf` workflow from inside the `projectC` directory.
145
+
We are now launching the `main.nf` workflow from inside the `hello-config` directory.
149
146
150
147
```bash
151
148
nextflow run main.nf
@@ -156,7 +153,7 @@ As expected, the run fails with an error message that looks like this:
@@ -319,7 +316,7 @@ This will take a bit longer than usual the first time, and you might see the con
319
316
[- ] SAMTOOLS_INDEX -
320
317
[- ] GATK_HAPLOTYPECALLER -
321
318
[- ] GATK_JOINTGENOTYPING -
322
-
Creating env using conda: bioconda::samtools=1.20 [cache /workspace/gitpod/hello-nextflow/projectC/work/conda/env-6684ea23d69ceb1742019ff36904f612]
319
+
Creating env using conda: bioconda::samtools=1.20 [cache /workspace/gitpod/hello-nextflow/hello-config/work/conda/env-6684ea23d69ceb1742019ff36904f612]
323
320
```
324
321
325
322
That's because Nextflow has to retrieve the Conda packages and create the environment, which takes a bit of work behind the scenes. The good news is that you don't need to deal with any of it yourself!
@@ -401,7 +398,7 @@ Let's try running the workflow with Conda.
401
398
nextflow run main.nf -profile conda_on
402
399
```
403
400
404
-
It works!
401
+
It works! Convenient, isn't it?
405
402
406
403
```
407
404
N E X T F L O W ~ version 24.02.0-edge
@@ -491,7 +488,7 @@ nextflow
491
488
ERROR ~ Error executing process > 'SAMTOOLS_INDEX (3)'
492
489
493
490
Caused by:
494
-
java.io.IOException: Cannot run program "sbatch" (in directory "/workspace/gitpod/hello-nextflow/projectC/work/eb/2962ce167b3025a41ece6ce6d7efc2"): error=2, No such file or directory
491
+
java.io.IOException: Cannot run program "sbatch" (in directory "/workspace/gitpod/hello-nextflow/hello-config/work/eb/2962ce167b3025a41ece6ce6d7efc2"): error=2, No such file or directory
495
492
496
493
Command executed:
497
494
@@ -500,7 +497,7 @@ Command executed:
500
497
501
498
However, it did produce what we are looking for: the `.command.run` file that Nextflow tried to submit to Slurm via the `sbatch` command.
502
499
503
-
Let's take a look inside. **TODO: UPDATE NEXTFLOW VERSION SO WE CAN HAVE THIS SWEET OUTPUT**
500
+
Let's take a look inside. <!--**TODO: UPDATE NEXTFLOW VERSION SO WE CAN HAVE THIS SWEET OUTPUT**-->
The report is an HTML file, which you can download and open in your browser.
723
+
726
724
Take a few minutes to look through the report and see if you can identify some opportunities for adjusting resources.
727
725
Make sure to click on the tabs that show the utilization results as a percentage of what was allocated.
728
726
There is some [documentation](https://www.nextflow.io/docs/latest/reports.html) describing all the available features.
729
727
730
-
**TODO: insert images**
728
+
<!--TODO: insert images-->
731
729
732
730
One observation is that the `GATK_JOINTGENOTYPING` process seems to be very hungry for CPU, which makes sense since it performs a lot of complex calculations.
733
731
So we could try boosting that and see if it cuts down on runtime.
734
732
735
733
However, we seem to have overshot the mark with the memory allocations; all processes are only using a fraction of what we're giving them.
736
734
We should dial that back down and save some resources.
737
735
738
-
### 4.3. Adjust resource allocations for a specific process
736
+
### 4.4. Adjust resource allocations for a specific process
739
737
740
738
We can specify resource allocations for a given process using the `withName` directive.
741
739
The syntax looks like this when it's by itself in a process block:
@@ -765,7 +763,7 @@ process {
765
763
With that specified, the default settings will apply to all processes **except** the `GATK_JOINTGENOTYPING` process, which is a special snowflake that gets a lot more CPU.
766
764
Hopefully that should have an effect.
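
As a rough illustration of the configuration described above, here is a minimal sketch of a `process` block that sets defaults for every process and then overrides the CPU count for `GATK_JOINTGENOTYPING` alone. The 8 CPUs come from the text later in this section; the default `cpus` and `memory` values are assumptions chosen only for the example, not the values used in the course files.

```groovy
process {
    // Defaults applied to every process.
    // NOTE: these two values are illustrative assumptions.
    cpus = 2
    memory = 2.GB

    // Process-specific override: only GATK_JOINTGENOTYPING gets more CPU.
    withName: 'GATK_JOINTGENOTYPING' {
        cpus = 8
    }
}
```

Settings inside a `withName` selector take precedence over the generic settings in the same `process` block, so the other processes keep the defaults.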
767
765
768
-
### 4.4. Run again with the modified configuration
766
+
### 4.5. Run again with the modified configuration
769
767
770
768
Let's run the workflow again with the modified configuration and with the reporting flag turned on, but notice we're giving the report a different name so we can differentiate them.
771
769
@@ -778,7 +776,7 @@ Once again, you probably won't notice a substantial difference in runtime, becau
778
776
However, the second report shows that our resource utilization is more balanced now, and the runtime of the `GATK_JOINTGENOTYPING` process has been cut in half.
779
777
We probably didn't need to go all the way to 8 CPUs, but since there's only one call to that process, it's not a huge drain.
780
778
781
-
**TODO: screenshots?**
779
+
<!--**TODO: screenshots?**-->
782
780
783
781
As you can see, this approach is useful when your processes have different resource requirements. It empowers you to right-size the resource allocations you set up for each process based on actual data, not guesswork.
784
782
@@ -792,7 +790,7 @@ As you can see, this approach is useful when your processes have different resou
792
790
793
791
That being said, there may be some constraints on what you can (or must) allocate depending on what computing executor and compute infrastructure you're using. For example, your cluster may require you to stay within certain limits that don't apply when you're running elsewhere.
794
792
795
-
### 4.5. Add resource limits to an HPC profile
793
+
### 4.6. Add resource limits to an HPC profile
796
794
797
795
You can use the `resourceLimits` directive to set the relevant limitations. The syntax looks like this when it's by itself in a process block:
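
The block that follows in the source file falls outside this hunk. As a minimal sketch of the general shape, assuming limit values that are purely illustrative (your cluster's real limits will differ), a `resourceLimits` setting in a `process` block might look like this:

```groovy
process {
    // Upper bounds on what any single task may request.
    // NOTE: all three values are hypothetical examples.
    resourceLimits = [
        cpus: 16,
        memory: 64.GB,
        time: 24.h
    ]
}
```

With a setting like this, a task that requests more than a limit is capped at the limit rather than submitted with a request the scheduler would reject.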
798
796
@@ -982,7 +980,7 @@ executor > local (7)
982
980
983
981
However, you may be thinking, well, did we really override the configuration? How would we know, since those were the same files?
984
982
985
-
### 5.6. Remove or generalize default values from `nextflow.config`
983
+
### 5.5. Remove or generalize default values from `nextflow.config`
986
984
987
985
Let's strip out all the file paths from the `params` block in `nextflow.config`, replacing them with `null`, and replace the `cohort_name` value with something more generic.
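
A hedged sketch of what the resulting `params` block could look like is shown here. Only `cohort_name` is named in the text; the other parameter names and the generic cohort value are hypothetical placeholders, so substitute whatever your own `nextflow.config` declares.

```groovy
params {
    // File paths replaced with null so that nothing is assumed by default.
    // NOTE: `reads_bam` and `reference` are hypothetical parameter names.
    reads_bam = null
    reference = null

    // Generic base name for the final output (value is an assumption).
    cohort_name = "my_cohort"
}
```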
988
986
@@ -1029,7 +1027,7 @@ This is great because, with the parameter file in hand, we'll now be able to pro
1029
1027
1030
1028
That being said, it was nice to be able to demo the workflow without having to keep track of filenames and such. Let's see if we can use a profile to replicate that behavior.
1031
1029
1032
-
### 5.7. Create a demo profile
1030
+
### 5.6. Create a demo profile
1033
1031
1034
1032
Yes we can! We just need to retrieve the default parameter declarations as they were written in the original workflow (with the `params.*` syntax) and copy them into a new profile that we'll call `demo`.
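
To make the idea concrete, here is a minimal sketch of a `demo` profile that sets parameters with the `params.*` syntax. The parameter names, file paths and cohort value are assumptions made for illustration and should be replaced with the declarations from your original workflow.

```groovy
profiles {
    demo {
        // NOTE: parameter names and values below are hypothetical.
        // projectDir resolves to the directory containing the main script.
        params.reads_bam   = "${projectDir}/data/sample_bams.txt"
        params.reference   = "${projectDir}/data/ref/ref.fasta"
        params.cohort_name = "demo_cohort"
    }
}
```

A profile defined this way is enabled at launch with `-profile demo`, so the demo inputs apply only when explicitly requested.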
1035
1033
@@ -1100,7 +1098,7 @@ profiles {
1100
1098
1101
1099
As long as we distribute the data bundle with the workflow code, this will enable anyone to quickly try out the workflow without having to supply their own inputs or point to the parameter file.