Merged
Changes from 26 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -23,6 +23,8 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#96](https://github.com/nf-core/seqinspector/pull/96) Added missing citations to citation tool
- [#103](https://github.com/nf-core/seqinspector/pull/103) Configure full-tests
- [#110](https://github.com/nf-core/seqinspector/pull/110) Update input schema to accept either tar file or directory as rundir, and fastq messages and patterns.
- [#127](https://github.com/nf-core/seqinspector/pull/127) Added alignment tools - bwamem2 - index and mem
- [#128](https://github.com/nf-core/seqinspector/pull/128) Added Picard tools - Collect Multiple Metrics to collect QC metrics

### `Fixed`

10 changes: 10 additions & 0 deletions CITATIONS.md
@@ -28,6 +28,16 @@

- [Seqtk](https://github.com/lh3/seqtk)

- [BWAMEM2](https://ieeexplore.ieee.org/abstract/document/8820962)

> Vasimuddin Md, Misra S, Li H, Aluru S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE; 2019:314-324. doi:10.1109/IPDPS.2019.00041

- [SAMTOOLS](https://doi.org/10.1093/gigascience/giab008)

> Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10(2):giab008. doi:10.1093/gigascience/giab008

- [Picard Tools](https://broadinstitute.github.io/picard/)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
6 changes: 5 additions & 1 deletion README.md
@@ -33,7 +33,11 @@

1. Subsample reads ([`Seqtk`](https://github.com/lh3/seqtk))
2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
3. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
3. Align reads to reference ([`Bwamem2`](https://github.com/bwa-mem2/bwa-mem2))
4. Index aligned BAM files ([`SAMtools`](http://github.com/samtools))
5. Create FASTA index ([`SAMtools`](http://github.com/samtools))
6. Collect multiple QC metrics ([`Picard CollectMultipleMetrics`](https://broadinstitute.github.io/picard/picard-metric-definitions.html))
7. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

## Usage

8 changes: 4 additions & 4 deletions conf/modules.config
@@ -18,11 +18,11 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: SEQTK_SAMPLE {
withName: 'SEQTK_SAMPLE' {
ext.args = '-s100'
}

withName: FASTQC {
withName: 'FASTQC' {
ext.args = '--quiet'
}

@@ -36,21 +36,21 @@ process {
}

withName: 'BWAMEM2_INDEX' {
ext.args = ''
publishDir = [
path: { "${params.outdir}/bwamem2_index" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'BWAMEM2_MEM' {
ext.args = ''
publishDir = [
path: { "${params.outdir}/bwamem2_mem" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'PICARD_COLLECTMULTIPLEMETRICS' {
ext.args = ''
publishDir = [
72 changes: 72 additions & 0 deletions conf/seqinspector_v1.0_stage_resources.config
@@ -0,0 +1,72 @@
process {


// TODO nf-core: Check the defaults for all processes
cpus = { 1 * task.attempt }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }

errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
maxRetries = 2
maxErrors = '-1'


// Subsampling reads
withName: 'SEQTK_SAMPLE' {
cpus = 6
memory = { 4.GB * task.attempt }
time = { 4.h * task.attempt }
}

// Quality control
withName: 'FASTQC' {
cpus = 2
memory = { 36.GB * task.attempt }
time = { 8.h * task.attempt }
}
withName: 'SEQFU_STATS' {
cpus = 6
memory = { 4.GB * task.attempt }
time = { 4.h * task.attempt }
}
withName: 'FASTQSCREEN_FASTQSCREEN' {
cpus = 6
memory = { 36.GB * task.attempt }
time = { 8.h * task.attempt }
}

// Reference genome processing
withName: 'BWAMEM2_INDEX' {
cpus = 12
memory = { 72.GB * task.attempt }
time = { 16.h * task.attempt }
}
withName: 'BWAMEM2_MEM' {
cpus = 6
memory = { 35.GB * task.attempt }
time = { 16.h * task.attempt }
}
withName: 'SAMTOOLS_INDEX' {
cpus = 6
memory = { 4.GB * task.attempt }
time = { 4.h * task.attempt }
}
withName: 'SAMTOOLS_FAIDX' {
cpus = 6
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
}
// Picard metrics
withName: 'PICARD_COLLECTMULTIPLEMETRICS' {
cpus = 4
memory = { 8.GB * task.attempt }
time = { 4.h * task.attempt }
}

// MultiQC aggregation
withName: 'MULTIQC_GLOBAL|MULTIQC_PER_TAG' {
cpus = 4
memory = { 4.GB * task.attempt }
time = { 4.h * task.attempt }
}
}
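
The retry behaviour configured at the top of this file can be traced with a small stand-alone sketch. It is plain Groovy rather than Nextflow, so memory is modelled as whole gigabytes (the `72.GB` unit syntax only exists inside Nextflow), and the helper names `strategyFor` and `memoryGbFor` are hypothetical, introduced only for illustration.

```groovy
// Exit statuses that lead to a retry: 130..145 plus 104,
// mirroring `task.exitStatus in ((130..145) + 104)` from the config.
def retryCodes = (130..145) + 104

// Hypothetical helper: the errorStrategy the config closure would pick.
def strategyFor = { int exitStatus ->
    exitStatus in retryCodes ? 'retry' : 'finish'
}

// Hypothetical helper: memory request in GB for a given attempt,
// mirroring `memory = { 72.GB * task.attempt }` (units simplified to ints).
def memoryGbFor = { int baseGb, int attempt -> baseGb * attempt }

// With maxRetries = 2 a task is attempted at most three times.
def attempts = 1..3
def bwamem2IndexGb = attempts.collect { memoryGbFor(72, it) }

println strategyFor(137)   // an exit status inside 130..145
println strategyFor(1)     // any other failure
println bwamem2IndexGb     // memory requested on attempts 1, 2 and 3
```

Because every request is multiplied by `task.attempt`, the base values chosen per process also set how large the final retry can get, which is worth keeping in mind when picking them.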
10 changes: 8 additions & 2 deletions conf/test.config
@@ -10,6 +10,14 @@
----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
]
}

params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
@@ -23,5 +31,3 @@ params {
// Genome references
genome = 'R64-1-1'
}


56 changes: 31 additions & 25 deletions main.nf
Member

So for me, bwa and other indexes should be fetched using getGenomeAttribute at this level, from the igenomes.config file, around L17.
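
A minimal sketch of that suggestion, written as plain Groovy so it runs outside Nextflow. The `params` map, the placeholder paths and the `bwa` key are assumptions for illustration; in the pipeline these values would come from `nextflow.config` and `conf/igenomes.config`, and whether the configured index is usable by bwa-mem2 is not confirmed here.

```groovy
// Stand-in for Nextflow's `params` (assumption: placeholder paths and keys).
// Declared without `def` so the script-level method below can see it.
params = [
    genome : 'R64-1-1',
    genomes: [
        'R64-1-1': [
            fasta: '/ref/R64-1-1/genome.fa',
            bwa  : '/ref/R64-1-1/BWAIndex/',
        ],
    ],
]

// Same lookup logic as the helper added in this PR: return the attribute
// for the selected genome, or null when the genome or attribute is missing.
def getGenomeAttribute(attribute) {
    if (params.genomes && params.genome && params.genomes.containsKey(params.genome)) {
        if (params.genomes[params.genome].containsKey(attribute)) {
            return params.genomes[params.genome][attribute]
        }
    }
    return null
}

// Resolve all references in one place, next to the existing fasta lookup.
params.fasta     = getGenomeAttribute('fasta')
params.bwa_index = getGenomeAttribute('bwa')

println "fasta: ${params.fasta}"
println "bwa index: ${params.bwa_index}"
```

Resolving the index here means a missing attribute simply yields `null`, so the workflow can decide whether to build the index itself.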

@@ -7,29 +7,24 @@
Website: https://nf-co.re/seqinspector
Slack : https://nfcore.slack.com/channels/seqinspector
----------------------------------------------------------------------------------------
*/

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS
GENOME PARAMETER VALUES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

include { SEQINSPECTOR } from './workflows/seqinspector'
include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_seqinspector_pipeline'
include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_seqinspector_pipeline'
include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_seqinspector_pipeline'
params.fasta = getGenomeAttribute('fasta')

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GENOME PARAMETER VALUES
IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

// TODO nf-core: Remove this line if you don't need a FASTA file
// This is an example of how to use getGenomeAttribute() to fetch parameters
// from igenomes.config using `--genome`
// params.fasta = getGenomeAttribute('fasta')
include { SEQINSPECTOR } from './workflows/seqinspector'
include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_seqinspector_pipeline'
include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_seqinspector_pipeline'

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -41,7 +36,6 @@ include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_seqi
// WORKFLOW: Run main analysis pipeline depending on type of input
//
workflow NFCORE_SEQINSPECTOR {

take:
samplesheet // channel: samplesheet read in from --input

@@ -51,13 +45,14 @@ workflow NFCORE_SEQINSPECTOR {
// WORKFLOW: Run pipeline
//

SEQINSPECTOR (
samplesheet
SEQINSPECTOR(
samplesheet,
params.fasta,
)
emit:
global_report = SEQINSPECTOR.out.global_report // channel: /path/to/multiqc_report.html
grouped_reports = SEQINSPECTOR.out.grouped_reports // channel: /path/to/multiqc_report.html

emit:
global_report = SEQINSPECTOR.out.global_report // channel: /path/to/multiqc_report.html
grouped_reports = SEQINSPECTOR.out.grouped_reports // channel: /path/to/multiqc_report.html
}
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -67,32 +62,30 @@ workflow NFCORE_SEQINSPECTOR {

workflow {

main:


//
// SUBWORKFLOW: Run initialisation tasks
//

PIPELINE_INITIALISATION (
PIPELINE_INITIALISATION(
params.version,
params.validate_params,
params.monochrome_logs,
args,
params.outdir,
params.input
params.input,
)

//
// WORKFLOW: Run main workflow
//
NFCORE_SEQINSPECTOR (
PIPELINE_INITIALISATION.out.samplesheet,
NFCORE_SEQINSPECTOR(
PIPELINE_INITIALISATION.out.samplesheet
)
//
// SUBWORKFLOW: Run completion tasks
//
PIPELINE_COMPLETION (
PIPELINE_COMPLETION(
params.email,
params.email_on_fail,
params.plaintext_email,
@@ -105,6 +98,19 @@ workflow {

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
THE END
FUNCTIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

//
// Get attribute from genome config file e.g. fasta
//

def getGenomeAttribute(attribute) {
if (params.genomes && params.genome && params.genomes.containsKey(params.genome)) {
if (params.genomes[params.genome].containsKey(attribute)) {
return params.genomes[params.genome][attribute]
}
}
return null
}
15 changes: 0 additions & 15 deletions modules.json
@@ -42,21 +42,6 @@
"git_sha": "7b50cb7be890e4b28cffb82e438cc6a8d7805d3f",
"installed_by": ["modules"]
},
"picard/collectmultiplemetrics": {
"branch": "master",
"git_sha": "df124e87c74d8b40285199f8cc20151f5aa57255",
"installed_by": ["modules"]
},
"samtools/faidx": {
"branch": "master",
"git_sha": "c8be52dba1166c678e74cda9c3a3c221635c8bb1",
"installed_by": ["modules"]
},
"samtools/index": {
"branch": "master",
"git_sha": "c8be52dba1166c678e74cda9c3a3c221635c8bb1",
"installed_by": ["modules"]
},
"seqfu/stats": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
18 changes: 0 additions & 18 deletions modules/nf-core/bwamem2/mem/bwamem2-mem.diff

This file was deleted.

2 changes: 1 addition & 1 deletion nextflow.config
@@ -16,10 +16,10 @@ params {
skip_tools = ''
// References
genome = null
fasta = null
igenomes_base = 's3://ngi-igenomes/igenomes/'
igenomes_ignore = false
sort_bam = true
bwa_index = null
Member

that should not be needed


// Fastqscreen options
fastq_screen_references = "${projectDir}/assets/example_fastq_screen_references.csv"
14 changes: 10 additions & 4 deletions nextflow_schema.json
@@ -51,7 +51,7 @@
"skip_tools": {
"type": "string",
"description": "Comma-separated string of tools to skip",
"pattern": "^((fastqc|fastqscreen|seqfu_stats|seqtk_sample|bwamem2_index|bwamem2_mem|picard_collectmultiplemetrics|samtools_faidx|samtools_index)?,?)*(?<!,)$"
"pattern": "^((fastqc|fastqscreen|seqfu_stats|seqtk_sample|bwamem2_index|bwamem2_mem)?,?)*(?<!,)$"
}
}
},
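
To see what the `skip_tools` pattern accepts, it can be exercised on its own. This is a sketch assuming Java regex semantics and using the longer of the two patterns shown in this hunk; `isValidSkipTools` is a hypothetical helper name.

```groovy
// The longer of the two `skip_tools` patterns from the hunk above.
def skipToolsPattern = '^((fastqc|fastqscreen|seqfu_stats|seqtk_sample|bwamem2_index|bwamem2_mem|picard_collectmultiplemetrics|samtools_faidx|samtools_index)?,?)*(?<!,)$'

// Hypothetical helper: `==~` is true only when the whole string matches.
def isValidSkipTools = { String value -> value ==~ skipToolsPattern }

// A comma-separated list of known tools, a trailing comma, an unknown tool.
['fastqc,bwamem2_mem', 'fastqc,', 'star'].each { candidate ->
    println "${candidate.inspect()} -> ${isValidSkipTools(candidate)}"
}
```

The negative lookbehind `(?<!,)` is what rejects a trailing comma, and any tool name missing from the alternation is rejected outright, so the list has to be kept in step with the tools the pipeline can actually skip.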
@@ -94,7 +94,7 @@
},
"fastq_screen_references": {
"type": "string",
"default": "${projectDir}/assets/example_fastq_screen_references.csv",
"default": "/Users/agrimabhatt/seqinspector/assets/example_fastq_screen_references.csv",
"fa_icon": "fas fa-search",
"description": "A .csv of reference genomes to be mapped against by FastQ Screen"
}
@@ -214,7 +214,8 @@
"modules_testdata_base_path": {
"type": "string",
"description": "Base path / URL for data used in the modules",
"hidden": true
"hidden": true,
"default": "s3://ngi-igenomes/testdata/nf-core/modules/"
},
"multiqc_config": {
"type": "string",
@@ -270,5 +271,10 @@
{
"$ref": "#/$defs/generic_options"
}
]
],
"properties": {
"bwa_index": {
"type": "string"
}
}
Member

that should be in the reference_genome_options part

}