Merge pull request #287 from d4straub/dev

d4straub · web-flow · commit aa27156a9dbb · 2021-06-28T10:36:39.000+02:00
Update docs & prevent temporary files of ancom from being published
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # ![nf-core/ampliseq](docs/images/nf-core-ampliseq_logo.png)
 
-**16S rRNA amplicon sequencing analysis workflow using QIIME2**.
+**Amplicon sequencing analysis workflow using DADA2 and QIIME2**.
 
 [![DOI](https://zenodo.org/badge/150448201.svg)](https://zenodo.org/badge/latestdoi/150448201)
 [![Cite Publication](https://img.shields.io/badge/Cite%20Us!-Cite%20Publication-important)](https://doi.org/10.3389/fmicb.2020.550420)
@@ -20,7 +20,7 @@
 
 ## Introduction
 
-**nfcore/ampliseq** is a bioinformatics analysis pipeline used for amplicon sequencing data, supporting 16S, ITS and 18S data. Supported is paired-end Illumina or single-end Illumina, PacBio and IonTorrent data.
+**nf-core/ampliseq** is a bioinformatics analysis pipeline for amplicon sequencing data. It supports denoising of any amplicon and, currently, taxonomic assignment of 16S, ITS and 18S amplicons. Paired-end Illumina data and single-end Illumina, PacBio and IonTorrent data are supported. By default, the pipeline analyses 16S rRNA gene amplicons sequenced paired-end with Illumina.
 
 The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.
 
@@ -41,6 +41,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 4. Start running your own analysis!
 
     ```bash
+    #16S rRNA gene amplicon analysis of Illumina paired-end data
     nextflow run nf-core/ampliseq -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input "data" --FW_primer "GTGYCAGCMGCCGCGGTAA" --RV_primer "GGACTACNVGGGTWTCTAAT" --metadata "data/Metadata.tsv"
     ```
 
diff --git a/assets/email_template.html b/assets/email_template.html
@@ -4,7 +4,7 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge">
   <meta name="viewport" content="width=device-width, initial-scale=1">
 
-  <meta name="description" content="nf-core/ampliseq: 16S rRNA amplicon sequencing analysis workflow using QIIME2">
+  <meta name="description" content="nf-core/ampliseq: Amplicon sequencing analysis workflow using DADA2 and QIIME2">
   <title>nf-core/ampliseq Pipeline Report</title>
 </head>
 <body>
diff --git a/docs/output.md b/docs/output.md
@@ -24,8 +24,9 @@ and processes data using the following steps:
       * [Relative abundance tables](#relative-abundance-tables) - Exported relative abundance tables
       * [Barplot](#barplot) - Interactive barplot
       * [Alpha diversity rarefaction curves](#alpha-diversity-rarefaction-curves) - Rarefaction curves for quality control
-      * [Alpha diversity indices](#alpha-diversity-indices) - Diversity within samples
-      * [Beta diversity indices](#beta-diversity-indices) - Diversity between samples (e.g. PCoA plots)
+      * [Diversity analysis](#diversity-analysis) - High level overview with different diversity indices
+        * [Alpha diversity indices](#alpha-diversity-indices) - Diversity within samples
+        * [Beta diversity indices](#beta-diversity-indices) - Diversity between samples (e.g. PCoA plots)
       * [ANCOM](#ancom) - Differential abundance analysis
     * [Read count report](#Read-count-report) - Report of read counts during various steps of the pipeline
     * [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
@@ -76,7 +77,7 @@ DADA2 computes an error model on the sequencing reads (forward and reverse indep
 
 DADA2 reduces sequence errors and dereplicates sequences by quality filtering, denoising, read pair merging (for paired end Illumina reads only) and PCR chimera removal.
 
-Additionally, DADA2 taxonomically classifies the ASVs using pre-trained databases.
+Additionally, DADA2 taxonomically classifies the ASVs using a choice of supplied databases (specified with `--dada_ref_taxonomy`).
 
 **Output files:**
 
@@ -88,7 +89,7 @@ Additionally, DADA2 taxonomically classifies the ASVs using pre-trained database
   * `DADA2_stats.tsv`: Tracking read numbers through DADA2 processing steps, for each sample.
   * `DADA2_table.rds`: DADA2 ASV table as R object.
   * `DADA2_tables.tsv`: DADA2 ASV table.
-* `dada2/args/`: Directory containing all parameters for DADA2 steps.
+* `dada2/args/`: Directory containing files with all parameters for DADA2 steps.
 * `dada2/log/`: Directory containing log files for DADA2 steps.
 * `dada2/QC/`
   * `*.err.convergence.txt`: Convergence values for DADA2's dada command; these should decrease by several orders of magnitude and approach 0.
@@ -111,11 +112,11 @@ Optionally, the ITS region can be extracted from each ASV sequence using ITSx, a
 
 **Quantitative Insights Into Microbial Ecology 2** ([QIIME2](https://qiime2.org/)) is a next-generation microbiome bioinformatics platform and the successor of the widely used [QIIME1](https://www.nature.com/articles/nmeth.f.303).
 
-ASV sequences and counts as produced before with DADA2 are imported into QIIME2 and further analysed. First, ASVs are taxonomically classified, than filtered (`--exclude_taxa`, `--min_frequency`, `--min_samples`), and abundance tables exported. Following, diversity indices are calculated and testing for differential abundant features between sample groups is performed.
+ASV sequences, counts, and taxonomic classifications produced by DADA2 are imported into QIIME2 and analysed further. Optionally, ASVs can also be taxonomically classified with QIIME2 against a database chosen with `--qiime_ref_taxonomy` (but DADA2 taxonomic classification takes precedence). Next, ASVs are filtered (`--exclude_taxa`, `--min_frequency`, `--min_samples`) and abundance tables are exported. Finally, diversity indices are calculated and sample groups are tested for differentially abundant features.
 
 #### Taxonomic classification
 
-ASV abundance and sequences inferred in DADA2 are informative but routinely taxonomic classifications such as family or genus annotation is desireable.
+Taxonomic classifications from QIIME2 are typically similar to those from DADA2, and both options are available. When both DADA2 and QIIME2 classification are performed, the DADA2 classification takes precedence over the QIIME2 classification for all downstream analysis.
 
 **Output files:**
 
@@ -160,7 +161,7 @@ Absolute abundance tables produced by the previous steps contain count data, but
   * `rel-table-6.tsv`: Tab-separated relative abundance table at genus level.
   * `rel-table-7.tsv`: Tab-separated relative abundance table at species level.
   * `rel-table-ASV.tsv`: Tab-separated relative abundance table for all ASVs.
-  * `qiime2_ASV_table.tsv`: Tab-separated table for all ASVs with taxonomic classification, sequence and relative abundance.
+  * `qiime2_ASV_table.tsv`: Tab-separated table for all ASVs with taxonomic classification, sequence and relative abundance. *NOTE: This file is based on QIIME2 taxonomic classifications, contrary to all other files that are based on DADA2 classification, if available.*
 
 #### Barplot
 
@@ -180,24 +181,31 @@ Produces rarefaction plots for several alpha diversity indices, and is primarily
 * `qiime2/alpha-rarefaction/`
   * `index.html`: Interactive alpha rarefaction curve for taxa abundance per sample that can be viewed in your web browser.
 
-#### Alpha diversity indices
+#### Diversity analysis
 
-Alpha diversity measures the species diversity within samples. Diversity calculations are based on sub-sampled data rarefied to the minimum read count of all samples. This step calculates alpha diversity using various methods and performs pairwise comparisons of groups of samples. It is based on a phylogenetic tree of all ASV sequences.
+Diversity measures summarize important sample features (alpha diversity) or differences between samples (beta diversity). To do so, sample data is first rarefied to the minimum number of counts per sample. Also, a phylogenetic tree of all ASVs is computed to provide phylogenetic information.
 
 **Output files:**
 
+* `qiime2/diversity/`
+  * `Use the sampling depth of * for rarefaction.txt`: File that reports the rarefaction depth in the file name and file content.
 * `qiime2/phylogenetic_tree/`
   * `tree.nwk`: Phylogenetic tree in newick format.
   * `rooted-tree.qza`: Phylogenetic tree in QIIME2 format.
-* `qiime2/diversity/`
-  * `*.txt`: File that describes the rarefaction depth (file name and file contant).
+
+##### Alpha diversity indices
+
+Alpha diversity measures the species diversity within samples. Diversity calculations are based on sub-sampled data rarefied to the minimum read count of all samples. This step calculates alpha diversity using various methods and performs pairwise comparisons of groups of samples. It is based on a phylogenetic tree of all ASV sequences.
+
+**Output files:**
+
 * `qiime2/diversity/alpha_diversity/`
   * `evenness_vector/index.html`: Pielou’s Evenness.
   * `faith_pd_vector/index.html`: Faith’s Phylogenetic Diversity (qualitative, phylogenetic).
   * `observed_otus_vector/index.html`: Observed OTUs (qualitative).
   * `shannon_vector/index.html`: Shannon’s diversity index (quantitative).
 
-#### Beta diversity indices
+##### Beta diversity indices
 
 Beta diversity measures the species community differences between samples. Diversity calculations are based on sub-sampled data rarefied to the minimum read count of all samples. This step calculates beta diversity distances using various methods and performs pairwise comparisons of groups of samples. Additionally, principal coordinates analysis (PCoA) plots are produced that can be visualized with [Emperor](https://biocore.github.io/emperor/build/html/index.html) in your default browser without the need for installation. These calculations are based on a phylogenetic tree of all ASV sequences.
 
@@ -210,27 +218,22 @@ Beta diversity measures the species community differences between samples. Diver
 
 **Output files:**
 
-* `qiime2/phylogenetic_tree/`
-  * `tree.nwk`: Phylogenetic tree in newick format.
-  * `rooted-tree.qza`: Phylogenetic tree in QIIME2 format.
-* `qiime2/diversity/`
-  * `*.txt`: File that describes the rarefaction depth (file name and file contant).
 * `qiime2/diversity/beta_diversity/`
-  * `<method>_distance_matrix-<treatment>/index.html`
-  * `<method>_pcoa_results-PCoA/index.html`
+  * `<method>_distance_matrix-<treatment>/index.html`: Box plots and significance analysis (PERMANOVA).
+  * `<method>_pcoa_results-PCoA/index.html`: Interactive PCoA plot.
     * method: bray_curtis, jaccard, unweighted_unifrac, weighted_unifrac
     * treatment: depends on your metadata sheet or what metadata categories you have specified
 
 #### ANCOM
 
 Analysis of Composition of Microbiomes ([ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277)) is applied to identify features that are differentially abundant across sample groups. A key assumption made by ANCOM is that few taxa (less than about 25%) will be differentially abundant between groups otherwise the method will be inaccurate.
 
-ANCOM is applied to each suitable or specified metadata column for 6 taxonomic levels.
+ANCOM is applied to each suitable or specified metadata column for 5 taxonomic levels (2-6).
 
 **Output files:**
 
 * `qiime2/ancom/`
-  * `Category-<treatment>-<taxonomic level>/index.html`
+  * `Category-<treatment>-<taxonomic level>/index.html`: Statistical results and interactive Volcano plot.
     * treatment: depends on your metadata sheet or what metadata categories you have specified
     * taxonomic level: level-2 (phylum), level-3 (class), level-4 (order), level-5 (family), level-6 (genus), ASV
 
@@ -240,7 +243,7 @@ This report includes information on how many reads per sample passed each pipeli
 
 **Output files:**
 
-* `overall_summary.tsv`
+* `overall_summary.tsv`: Tab-separated file with count summary.
 
 ## Pipeline information
 
diff --git a/docs/usage.md b/docs/usage.md
@@ -30,7 +30,7 @@ results         # Finished results (configurable, see below)
 # Other nextflow hidden files, eg. history of pipeline runs and old logs.
 ```
 
-See the [nf-core/ampliseq website documentation](https://nf-co.re/ampliseq/usage#usage) for more information about pipeline specific parameters.
+See the [nf-core/ampliseq website documentation](https://nf-co.re/ampliseq/parameters) for more information about pipeline specific parameters.
 
 ### Updating the pipeline
 
diff --git a/modules/local/qiime2_ancom_tax.nf b/modules/local/qiime2_ancom_tax.nf
@@ -37,9 +37,9 @@ process QIIME2_ANCOM_TAX {
 
     # Extract summarised table and output a file with the number of taxa
     qiime tools export --input-path lvl${taxlevel}-${table} --output-path exported/
-    biom convert -i exported/feature-table.biom -o ancom/lvl${taxlevel}-${table}.feature-table.tsv --to-tsv
+    biom convert -i exported/feature-table.biom -o ${table.baseName}-level-${taxlevel}.feature-table.tsv --to-tsv
 
-    if [ \$(grep -v '^#' -c ancom/lvl${taxlevel}-${table}.feature-table.tsv) -lt 2 ]; then
+    if [ \$(grep -v '^#' -c ${table.baseName}-level-${taxlevel}.feature-table.tsv) -lt 2 ]; then
         echo ${taxlevel} > ancom/\"WARNING Summing your data at taxonomic level ${taxlevel} produced less than two rows (taxa), ANCOM can't proceed -- did you specify a bad reference taxonomy?\".txt
     else
         qiime composition add-pseudocount \
diff --git a/nextflow.config b/nextflow.config
@@ -78,7 +78,7 @@ params {
     singularity_pull_docker_container = false
     validate_params   = true
     show_hidden_params = false
-    schema_ignore_params = 'dada_ref_databases,qiime_ref_databases,modules'
+    schema_ignore_params = 'dada_ref_databases,qiime_ref_databases,modules,igenomes_base'
 
     // Defaults only, expecting to be overwritten
     max_memory        = 128.GB
@@ -199,7 +199,7 @@ manifest {
     name = 'nf-core/ampliseq'
     author = 'Daniel Straub, Alexander Peltzer'
     homePage = 'https://github.com/nf-core/ampliseq'
-    description = '16S rRNA amplicon sequencing analysis workflow using QIIME2'
+    description = 'Amplicon sequencing analysis workflow using DADA2 and QIIME2'
     mainScript = 'main.nf'
     nextflowVersion = '!>=21.04.0'
     version = '2.0.0dev'
diff --git a/nextflow_schema.json b/nextflow_schema.json