Curated Metagenomics NextFlow Pipeline

A Nextflow pipeline for processing metagenomics data, implementing the curatedMetagenomics workflow.

Overview

This pipeline processes raw sequencing data in four steps, run in order for each sample:

FASTQ extraction with fasterq-dump
Quality control with KneadData
Taxonomic profiling with MetaPhlAn
Functional profiling with HUMAnN (optional)

Usage

Basic usage:

nextflow run main.nf --metadata_tsv samples.tsv

With specific parameters:

nextflow run main.nf --metadata_tsv samples.tsv --skip_humann --publish_dir results

Parameters

General Pipeline Parameters

Parameter	Description	Default
`metadata_tsv`	Path to TSV file with sample metadata	`samples.tsv`
`publish_dir`	Directory to publish results	`results`
`store_dir`	Directory to store reference databases	`databases`
`cmgd_version`	Curated Metagenomic Data version	`4`

Process Control Parameters

Parameter	Description	Default
`skip_humann`	Skip HUMAnN functional profiling	`false`

MetaPhlAn Parameters

Parameter	Description	Default
`metaphlan_index`	MetaPhlAn index to use	`latest`

HUMAnN Parameters

Parameter	Description	Default
`chocophlan`	ChocoPhlAn database version	`full`
`uniref`	UniRef database version	`uniref90_diamond`
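
When several parameters are overridden at once, it can be easier to keep them in a file than on the command line. The sketch below relies on Nextflow's general `-params-file` option, which reads parameters from a JSON or YAML file. The keys are the parameter names from the tables above; the file name `params.example.json` is an assumption made for illustration, and the pipeline's own documentation does not state that this is the recommended way to configure it.

```shell
#!/bin/sh
set -eu

# Hypothetical file name; the keys are parameter names from the tables above.
params_file="params.example.json"

cat > "$params_file" <<'EOF'
{
  "metadata_tsv": "samples.tsv",
  "publish_dir": "results",
  "store_dir": "databases",
  "skip_humann": true
}
EOF

# Print the command instead of running it, so the sketch works
# on a machine where Nextflow is not installed.
echo "nextflow run main.nf -params-file $params_file"
```

Values given explicitly on the command line (for example `--publish_dir other_results`) take precedence over those in the parameters file.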

Input Format

The file passed to `--metadata_tsv` must be a tab-separated values (TSV) file with at least the following columns:

sample_id: Unique sample identifier
NCBI_accession: One or more SRA accessions; separate multiple accessions with semicolons

Example:

sample_id	NCBI_accession
sample1	SRR1234567
sample2	SRR2345678;SRR2345679
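
A quick check of the samplesheet before launching can catch a missing column or spaces used in place of tabs. The following is a minimal sketch, not part of the pipeline: the file name `samples.example.tsv` and the helper functions `check_samplesheet` and `list_accessions` are hypothetical. It writes the example above with real tab characters, verifies the two required columns, and expands the semicolon-separated accessions to one line each.

```shell
#!/bin/sh
set -eu

# Write the example samplesheet with real tab characters (printf expands \t).
samplesheet="samples.example.tsv"   # hypothetical file name
printf 'sample_id\tNCBI_accession\n'      >  "$samplesheet"
printf 'sample1\tSRR1234567\n'            >> "$samplesheet"
printf 'sample2\tSRR2345678;SRR2345679\n' >> "$samplesheet"

# check_samplesheet FILE: return non-zero if a required column
# is missing from the tab-separated header line.
check_samplesheet() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) seen[$i] = 1
      if (!("sample_id" in seen))      { print "missing column: sample_id" > "/dev/stderr"; bad = 1 }
      if (!("NCBI_accession" in seen)) { print "missing column: NCBI_accession" > "/dev/stderr"; bad = 1 }
      exit bad
    }' "$1"
}

# list_accessions FILE: print one "sample_id<TAB>accession" pair per line,
# splitting the semicolon-separated NCBI_accession field.
list_accessions() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      n = split($(col["NCBI_accession"]), acc, ";")
      for (j = 1; j <= n; j++) print $(col["sample_id"]) "\t" acc[j]
    }' "$1"
}

check_samplesheet "$samplesheet"
list_accessions "$samplesheet"
```

For the example file this prints three lines: one for `sample1` and two for `sample2`.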

Output

Results are organized by sample under the directory given by `publish_dir` (default `results`):

results/
├── sample1/
│   ├── fasterq_dump/
│   ├── kneaddata/
│   ├── metaphlan_lists/
│   ├── metaphlan_markers/
│   ├── strainphlan_markers/
│   └── humann/
├── sample2/
│   └── ...
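
After a run, the layout above can be used to see which steps produced output for a given sample. The sketch below is an illustration only: `report_outputs` is a hypothetical helper, and the subdirectory names are copied from the tree above. It runs against a throwaway directory so that it works without real results.

```shell
#!/bin/sh
set -eu

# report_outputs PUBLISH_DIR SAMPLE_ID: print each expected subdirectory
# for the sample and whether it exists.
report_outputs() {
  publish_dir="$1"
  sample_id="$2"
  for step in fasterq_dump kneaddata metaphlan_lists \
              metaphlan_markers strainphlan_markers humann; do
    if [ -d "$publish_dir/$sample_id/$step" ]; then
      printf '%s\t%s\tpresent\n' "$sample_id" "$step"
    else
      printf '%s\t%s\tmissing\n' "$sample_id" "$step"
    fi
  done
}

# Demo on a temporary directory standing in for a real publish_dir.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/sample1/kneaddata" "$demo_dir/sample1/humann"
report_outputs "$demo_dir" sample1
```

A missing `humann` directory is expected when the pipeline was run with `--skip_humann`.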

Profiles

The pipeline comes with several execution profiles:

local: For local execution
google: For execution on Google Cloud Batch
anvil: For execution on AnVIL
alpine: For execution on Alpine HPC
unitn: For execution on UNITN PBS Pro

Example:

nextflow run main.nf -profile google --metadata_tsv samples.tsv
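
A launch script can guard against a mistyped profile name by checking it against the list above before calling Nextflow. This is a minimal sketch with a hypothetical helper, `build_command`; it only prints the command it would run, and the accepted names are the five profiles listed in this section.

```shell
#!/bin/sh
set -eu

# build_command PROFILE METADATA_TSV: print the nextflow command for a
# documented profile, or return non-zero for an unknown profile name.
build_command() {
  profile="$1"
  metadata="$2"
  case "$profile" in
    local|google|anvil|alpine|unitn) ;;
    *)
      echo "unknown profile: $profile" >&2
      return 1
      ;;
  esac
  echo "nextflow run main.nf -profile $profile --metadata_tsv $metadata"
}

build_command google samples.tsv
```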

Dependencies

This pipeline requires:

Nextflow 22.10.0 or later
Container support (Docker, Singularity, etc.)

