A Nextflow pipeline for processing metagenomics data, implementing the curatedMetagenomics workflow.
This pipeline processes raw sequencing data through multiple steps:
- FASTQ extraction with fasterq-dump
- Quality control with KneadData
- Taxonomic profiling with MetaPhlAn
- Functional profiling with HUMAnN (optional)
Basic usage:
nextflow run main.nf --metadata_tsv samples.tsv
With specific parameters:
nextflow run main.nf --metadata_tsv samples.tsv --skip_humann --publish_dir results
| Parameter | Description | Default |
|---|---|---|
| metadata_tsv | Path to TSV file with sample metadata | samples.tsv |
| publish_dir | Directory to publish results | results |
| store_dir | Directory to store reference databases | databases |
| cmgd_version | Curated Metagenomic Data version | 4 |

| Parameter | Description | Default |
|---|---|---|
| skip_humann | Skip HUMAnN functional profiling | false |

| Parameter | Description | Default |
|---|---|---|
| metaphlan_index | MetaPhlAn index to use | latest |

| Parameter | Description | Default |
|---|---|---|
| chocophlan | ChocoPhlAn database version | full |
| uniref | UniRef database version | uniref90_diamond |
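
As an alternative to passing each option on the command line, the parameters above can be collected in a JSON file and handed to Nextflow with its `-params-file` option. The following is a minimal sketch, not part of the pipeline: the helper names `build_params` and `write_params_file` are hypothetical, and the value types (for example `cmgd_version` as a string and `skip_humann` as a boolean) are assumptions based on the defaults listed in the tables.

```python
import json

# Defaults copied from the parameter tables; the value types are assumptions.
DEFAULT_PARAMS = {
    "metadata_tsv": "samples.tsv",
    "publish_dir": "results",
    "store_dir": "databases",
    "cmgd_version": "4",
    "skip_humann": False,
    "metaphlan_index": "latest",
    "chocophlan": "full",
    "uniref": "uniref90_diamond",
}


def build_params(**overrides):
    """Return the documented defaults with overrides applied.

    Unknown parameter names are rejected so that typos fail early.
    """
    unknown = sorted(set(overrides) - set(DEFAULT_PARAMS))
    if unknown:
        raise ValueError(f"Unknown parameter(s): {', '.join(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}


def write_params_file(path, **overrides):
    """Write the merged parameters to a JSON file and return them."""
    params = build_params(**overrides)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle, indent=2)
    return params


if __name__ == "__main__":
    write_params_file("params.json", skip_humann=True)
```

The resulting file could then be used as `nextflow run main.nf -params-file params.json`.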
The metadata_tsv file should be a tab-separated values file with at least the following columns:
- sample_id: Unique sample identifier
- NCBI_accession: SRA accession number(s), separated by semicolons for multiple files
Example:
sample_id NCBI_accession
sample1 SRR1234567
sample2 SRR2345678;SRR2345679
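
Before launching a run, it can help to check the metadata file against the format described above. The sketch below is an illustration only and does not reflect how the pipeline itself parses the file; `read_metadata` is a hypothetical helper, and rejecting duplicate sample IDs and empty accession fields is an assumption about what counts as valid input.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "NCBI_accession")


def read_metadata(handle):
    """Map each sample_id to its list of SRA accessions."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")

    samples = {}
    for row in reader:
        sample_id = (row["sample_id"] or "").strip()
        if not sample_id:
            raise ValueError("Row with empty sample_id")
        if sample_id in samples:
            raise ValueError(f"Duplicate sample_id: {sample_id}")
        # Multiple accessions for one sample are separated by semicolons.
        raw = row["NCBI_accession"] or ""
        accessions = [a.strip() for a in raw.split(";") if a.strip()]
        if not accessions:
            raise ValueError(f"No accession listed for sample {sample_id}")
        samples[sample_id] = accessions
    return samples


if __name__ == "__main__":
    example = (
        "sample_id\tNCBI_accession\n"
        "sample1\tSRR1234567\n"
        "sample2\tSRR2345678;SRR2345679\n"
    )
    print(read_metadata(io.StringIO(example)))
    # {'sample1': ['SRR1234567'], 'sample2': ['SRR2345678', 'SRR2345679']}
```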
Results will be organized by sample in the publish_dir directory:
results/
├── sample1/
│ ├── fasterq_dump/
│ ├── kneaddata/
│ ├── metaphlan_lists/
│ ├── metaphlan_markers/
│ ├── strainphlan_markers/
│ └── humann/
├── sample2/
│ └── ...
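
After a run, the layout above can be checked with a short script. This is a rough sketch rather than a tool shipped with the pipeline: `missing_outputs` is a hypothetical helper, the subdirectory names are taken from the tree shown here, and the idea that the `humann` directory is absent when `--skip_humann` is set is an assumption.

```python
from pathlib import Path

# Per-sample subdirectories as shown in the results tree.
STEP_DIRS = (
    "fasterq_dump",
    "kneaddata",
    "metaphlan_lists",
    "metaphlan_markers",
    "strainphlan_markers",
    "humann",
)


def missing_outputs(publish_dir, sample_ids, skip_humann=False):
    """Return {sample_id: [missing subdirectories]} for incomplete samples."""
    # Assumption: no humann directory is expected when HUMAnN is skipped.
    expected = [d for d in STEP_DIRS if not (skip_humann and d == "humann")]
    report = {}
    for sample_id in sample_ids:
        sample_dir = Path(publish_dir) / sample_id
        absent = [d for d in expected if not (sample_dir / d).is_dir()]
        if absent:
            report[sample_id] = absent
    return report


if __name__ == "__main__":
    for sample, absent in missing_outputs("results", ["sample1", "sample2"]).items():
        print(f"{sample}: missing {', '.join(absent)}")
```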
The pipeline comes with several execution profiles:
- local: For local execution
- google: For execution on Google Cloud Batch
- anvil: For execution on AnVIL
- alpine: For execution on Alpine HPC
- unitn: For execution on UNITN PBS Pro
Example:
nextflow run main.nf -profile google --metadata_tsv samples.tsv
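
When runs are launched from a script, the command line can be assembled programmatically and the profile name checked against the list above before anything starts. The following is a minimal sketch under stated assumptions: `nextflow_command` is a hypothetical helper, and it only covers the options that appear in this README's examples.

```python
import shlex

# Profile names as listed in this README.
PROFILES = ("local", "google", "anvil", "alpine", "unitn")


def nextflow_command(metadata_tsv, profile=None, skip_humann=False, publish_dir=None):
    """Build the argument list for a pipeline run."""
    if profile is not None and profile not in PROFILES:
        raise ValueError(f"Unknown profile {profile!r}; expected one of {PROFILES}")
    cmd = ["nextflow", "run", "main.nf"]
    if profile is not None:
        cmd += ["-profile", profile]
    cmd += ["--metadata_tsv", metadata_tsv]
    if skip_humann:
        cmd.append("--skip_humann")
    if publish_dir is not None:
        cmd += ["--publish_dir", publish_dir]
    return cmd


if __name__ == "__main__":
    print(shlex.join(nextflow_command("samples.tsv", profile="google")))
    # nextflow run main.nf -profile google --metadata_tsv samples.tsv
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.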
This pipeline requires:
- Nextflow 22.10.0 or later
- Container support (Docker, Singularity, etc.)