
getTrimGaloreReadsAfterFiltering function fails to extract filtered read count from TrimGalore log #1557

@iraiosub


Description of the bug

The getTrimGaloreReadsAfterFiltering function in the FASTQ_FASTQC_UMITOOLS_TRIMGALORE subworkflow incorrectly parses TrimGalore log files, so the reported read counts are misleading. See issue nf-core/modules#8576.

Problem: When extracting the number of filtered reads, the regex pattern expected exactly one standard space after the colon, but TrimGalore logs can contain tabs or multiple or non-standard spaces at that position. In those cases the regex fails to match without raising any error, and the function returns the total number of reads processed instead of the number of reads that passed filtering. The wrong count is what appears in the MultiQC Cutadapt section.
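A quick way to see this failure mode is to run a strict single-space pattern against a few spacing variants of the same log line. The following is a minimal Python sketch; the pattern and the sample line wording are assumptions modeled on the description above, not the actual Groovy code or verbatim TrimGalore output.

```python
import re

# Strict pattern modeled on the issue's description: exactly one ASCII
# space between the colon and the number (an assumption; the real
# Groovy pattern may differ in detail).
STRICT = re.compile(r"length cutoff[^:]+: ([\d.]+)")

# Hypothetical variants of a TrimGalore "removed reads" line.
PREFIX = "Sequences removed because they became shorter than the length cutoff of 20 bp:"
variants = {
    "single space": PREFIX + " 1234 (0.1%)",
    "tab": PREFIX + "\t1234 (0.1%)",
    "two spaces": PREFIX + "  1234 (0.1%)",
    "no-break space": PREFIX + "\u00a01234 (0.1%)",
}

for label, line in variants.items():
    m = STRICT.search(line)
    # A failed match returns None; nothing is raised, so the caller
    # silently falls back to its default value.
    print(f"{label:>15}: {m.group(1) if m else 'NO MATCH'}")
```

Only the single-space variant matches; the tab, double-space, and no-break-space lines all yield no match and raise no error.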

Impact on Pipeline:

  • Incorrect read count reporting – Users may see misleading numbers for reads retained after trimming
  • Silent failure – No error thrown, making the issue hard to detect

Solution

Updated the regex pattern in the FASTQ_FASTQC_UMITOOLS_TRIMGALORE subworkflow so that it accepts any amount and type of whitespace after the colon, rather than exactly one space. See PR nf-core/modules#8577.
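The idea behind the fix can be sketched in Python as a helper that subtracts the length-filtered reads from the total, using `\s+` after the colon. The helper name `reads_after_filtering`, both regexes, and the log excerpt are hypothetical; this is not the code merged in nf-core/modules#8577. Note that `removed` defaults to 0, which is how a non-matching pattern turns into the total-reads value described in this issue.

```python
import re

# Relaxed pattern: one or more whitespace characters after the colon.
# This mirrors the idea of the fix; the exact merged regex may differ.
REMOVED_RE = re.compile(r"shorter than the length cutoff[^:]+:\s+([\d.]+)")
TOTAL_RE = re.compile(r"([\d.]+)\s+sequences processed in total")


def reads_after_filtering(log_text):
    """Hypothetical helper: total reads minus reads removed by the length filter.

    `removed` defaults to 0, so if REMOVED_RE never matches, the total
    is returned unchanged instead of an error being raised.
    """
    total = 0.0
    removed = 0.0
    for line in log_text.splitlines():
        m_total = TOTAL_RE.search(line)
        if m_total:
            total = float(m_total.group(1))
        m_removed = REMOVED_RE.search(line)
        if m_removed:
            removed = float(m_removed.group(1))
    return total - removed


# Illustrative log excerpt (assumed wording, tab after the colon).
log = (
    "1000000 sequences processed in total\n"
    "Sequences removed because they became shorter than the length cutoff of 20 bp:\t1234 (0.1%)\n"
)
print(reads_after_filtering(log))  # 998766.0
```

One caveat if the same idea is ported to Groovy: Groovy uses `java.util.regex`, where `\s` matches only ASCII whitespace (`[ \t\n\x0B\f\r]`) unless the `UNICODE_CHARACTER_CLASS` flag (embedded as `(?U)`) is enabled. Python's `\s` on `str` patterns also matches Unicode spaces such as U+00A0, so this sketch is more permissive than a plain Groovy `\s+` would be for non-standard spaces.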

Command used and terminal output

nextflow run nf-core/rnaseq -r 3.18.0 \
--input 'samplesheet.csv' \
--fasta $REFDIR/GRCh38.primary_assembly.genome.fa \
--gtf $REFDIR/gencode.v44.primary_assembly.annotation.gtf \
--gencode \
--remove_ribo_rna \
--with_umi \
--umitools_bc_pattern NNNNN \
--extra_trimgalore_args "--clip_R1 3 -a A{11}" \
--extra_salmon_quant_args="--noLengthCorrection" \
-resume \
-profile singularity,crick

Relevant files

h0_C_2_trimmed.fastq.gz_trimming_report.txt

System information

Nextflow v24.10.2
HPC
slurm
Singularity v3.11.3

Labels: bug (Something isn't working)
