Description of the bug
The getTrimGaloreReadsAfterFiltering function in the FASTQ_FASTQC_UMITOOLS_TRIMGALORE subworkflow incorrectly parses TrimGalore log files, leading to misleading read count reporting. See issue nf-core/modules#8576.
Problem: The regex pattern expects exactly one standard space after the colon when extracting the filtered read count, but TrimGalore logs can contain tabs or multiple or non-standard spaces at that position. When the match fails, no error is raised and the function returns the total number of reads processed instead of the number of reads that passed filtering. The incorrect count is then shown in the MultiQC Cutadapt section.
Impact on Pipeline:
- Incorrect read count reporting – Users may see misleading numbers for reads retained after trimming
- Silent failure – No error thrown, making the issue hard to detect
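The silent failure can be illustrated with a minimal Python sketch. The real function is written in Groovy; the log wording, both patterns, and the helper name reads_after_filtering below are assumptions made for illustration and are not copied from the subworkflow. The sketch also assumes the result is computed as total reads minus removed reads, so a pattern that never matches leaves the removed count at its default of zero.

```python
import re

# Illustrative patterns only; the actual Groovy patterns may differ.
TOTAL = re.compile(r"(\d+) sequences processed in total")
# Requires exactly one literal space after the colon.
STRICT_REMOVED = re.compile(r"shorter than the length cutoff[^:]+: (\d+)")


def reads_after_filtering(log_text):
    """Return total reads minus removed reads (hypothetical helper)."""
    total = 0
    removed = 0  # stays 0 if the removed-reads pattern never matches
    for line in log_text.splitlines():
        m_total = TOTAL.search(line)
        if m_total:
            total = int(m_total.group(1))
        m_removed = STRICT_REMOVED.search(line)
        if m_removed:
            removed = int(m_removed.group(1))
    return total - removed


# Hypothetical log excerpts: identical except for the separator after the colon.
tab_log = (
    "1000 sequences processed in total\n"
    "Sequences removed because they became shorter than "
    "the length cutoff of 20 bp:\t150 (15.0%)\n"
)
space_log = tab_log.replace("\t", " ")

print(reads_after_filtering(space_log))  # 850: the pattern matched
print(reads_after_filtering(tab_log))    # 1000: no match, no error, total returned
```

Under these assumptions, nothing distinguishes "no reads were removed" from "the pattern did not match", which is why the wrong number reaches the report unnoticed.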
Solution
Updated the regex pattern from requiring exactly one space to allowing any amount/type of whitespace, directly in the FASTQ_FASTQC_UMITOOLS_TRIMGALORE subworkflow. See PR nf-core/modules#8577.
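The effect of the change can be sketched in Python by comparing a single-space pattern with a whitespace-tolerant one. Both patterns, the sample line, and the helper extract_removed are hypothetical stand-ins for illustration; the exact expression merged in the PR is not reproduced here.

```python
import re

# Hypothetical stand-ins for the old and new behaviour.
STRICT = re.compile(r"length cutoff[^:]+: (\d+)")      # exactly one space
LENIENT = re.compile(r"length cutoff[^:]+:\s*(\d+)")   # any amount/type of whitespace


def extract_removed(line, pattern):
    """Return the removed-read count as an int, or None if the pattern fails."""
    match = pattern.search(line)
    return int(match.group(1)) if match else None


# Same hypothetical log line with different separators after the colon.
prefix = "Sequences removed because they became shorter than the length cutoff of 20 bp:"
separators = {"one space": " ", "tab": "\t", "three spaces": "   ", "space+tab": " \t"}

for name, sep in separators.items():
    line = f"{prefix}{sep}150 (15.0%)"
    print(name, extract_removed(line, STRICT), extract_removed(line, LENIENT))
```

In this sketch the strict pattern yields a count only for the single-space line and None for the others, while the lenient pattern yields 150 in every case.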
Command used and terminal output
nextflow run nf-core/rnaseq -r 3.18.0 \
--input 'samplesheet.csv' \
--fasta $REFDIR/GRCh38.primary_assembly.genome.fa \
--gtf $REFDIR/gencode.v44.primary_assembly.annotation.gtf \
--gencode \
--remove_ribo_rna \
--with_umi \
--umitools_bc_pattern NNNNN \
--extra_trimgalore_args "--clip_R1 3 -a A{11}" \
--extra_salmon_quant_args="--noLengthCorrection" \
-resume \
-profile singularity,crick
Relevant files
h0_C_2_trimmed.fastq.gz_trimming_report.txt
System information
Nextflow v24.10.2
HPC
slurm
Singularity v3.11.3