Home
Welcome to the ggpicrust2 wiki!
Wiki Documentation for https://github.com/cafferychen777/ggpicrust2
Generated on: 2025-05-03 22:02:53
- Introduction and Installation
- Workflow and Core Functions
- Differential Abundance Analysis (DAA)
- Visualization Functions
- Gene Set Enrichment Analysis (GSEA)
- FAQ and Troubleshooting
Related files: README.md, DESCRIPTION
Related topics: Workflow and Core Functions, Differential Abundance Analysis (DAA)
ggpicrust2 is an R package designed to enhance the analysis and visualization of functional predictions generated by PICRUSt2. It provides tools for differential abundance analysis (DAA), gene set enrichment analysis (GSEA), and advanced visualization of predicted functional profiles in microbial communities.
The package aims to facilitate biological interpretation of microbial community functions by providing statistical rigor and intuitive visualizations. It supports KEGG, MetaCyc, and GO pathways, offering flexibility in functional annotation and analysis.
ggpicrust2 provides the following key functionalities:
- Differential Abundance Analysis (DAA): Identifies statistically significant differences in functional profiles between different experimental groups using methods like LinDA.
- Gene Set Enrichment Analysis (GSEA): Determines whether specific sets of genes or pathways are enriched in a dataset.
- Pathway Annotation: Annotates predicted functions and pathways, facilitating biological interpretation.
- Data Visualization: Generates publication-quality figures, including heatmaps, bar plots, dot plots, and network diagrams.
- Reference Database Support: Supports KEGG, MetaCyc, and GO databases.
ggpicrust2 can be installed from CRAN or directly from GitHub. Installation from CRAN is recommended for most users.
Prerequisites:
- R (>= 4.0)
- BiocManager
Installation from CRAN:
- Install BiocManager: install.packages("BiocManager")
- Install ggpicrust2: install.packages("ggpicrust2")
- Load the package: library(ggpicrust2)
Installation from GitHub:
- Install devtools: install.packages("devtools")
- Install ggpicrust2 from GitHub: devtools::install_github("cafferychen777/ggpicrust2")
A basic workflow using ggpicrust2 involves loading PICRUSt2 output, performing statistical analysis, annotating pathways, and generating visualizations.
# Load necessary libraries
library(ggpicrust2)
# Example data bundled with the package (replace with your actual data)
data("ko_abundance")
data("metadata")
# Convert KO abundances to KEGG pathway abundances
pathway_abundance <- ko2kegg_abundance(data = ko_abundance)
# Perform DAA
daa_result <- pathway_daa(
  abundance = pathway_abundance,
  metadata = metadata,
  group = "Environment",
  daa_method = "LinDA"
)
# Inspect DAA results
head(daa_result)
This example demonstrates how to perform differential abundance analysis using the pathway_daa function and inspect the results.
ggpicrust2 integrates with the standard PICRUSt2 workflow, taking predicted functional abundances as input and providing tools for statistical analysis and visualization. The architecture is modular, allowing users to perform specific steps or complete end-to-end analyses.
graph TD
A[PICRUSt2 Output] --> B{ggpicrust2 Functions};
B --> C[DAA];
B --> D[GSEA];
C --> E[DAA Results];
D --> F[GSEA Results];
E --> G[DAA Visualization];
F --> H[GSEA Visualization];
G --> I[Plots];
H --> I[Plots];
This diagram illustrates the flow of data and the relationships between different components within the ggpicrust2 package.
Related files: R/ggpicrust2.R, R/ko2kegg_abundance.R, R/pathway_daa.R
Related topics: Introduction and Installation, Differential Abundance Analysis (DAA), Visualization Functions
This page describes the core functions and workflow of the ggpicrust2 package, focusing on the conversion of KO abundances to KEGG pathway abundances and the subsequent differential abundance analysis (DAA). The key files are R/ggpicrust2.R, R/ko2kegg_abundance.R, and R/pathway_daa.R.
The primary workflow involves converting predicted KO abundances to KEGG pathway abundances, followed by identifying differentially abundant pathways between sample groups. The ggpicrust2 function often acts as a wrapper to streamline this process. With the addition of GSEA functionality in version 2.1.0, the workflow can also include Gene Set Enrichment Analysis.
graph TD
A[Input: Predicted KO Abundance Table] --> B{ko2kegg_abundance};
B --> C[Output: KEGG Pathway Abundance Table];
C --> D{pathway_daa};
D --> E[Output: Differential Pathway Abundance Results];
C --> F{pathway_gsea};
F --> G[Output: GSEA Results];
A --> H{ggpicrust2};
H --> E;
H --> G;
C --> H;
R/ko2kegg_abundance.R
- Purpose: Converts predicted KO abundances to KEGG pathway abundances using reference data that maps KOs to KEGG pathways.
- Key Function: ko2kegg_abundance() takes a KO abundance table (samples as columns, KOs as rows) and calculates KEGG pathway abundances for each sample.
- Role in Workflow: Translates gene-level predictions (KOs) into pathway-level predictions (KEGG pathways).
- Usage Example:
# Assuming 'ko_abundance_df' is a data frame with KO abundances
kegg_pathway_abundance <- ko2kegg_abundance(data = ko_abundance_df)
head(kegg_pathway_abundance)
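The idea behind the conversion step can be sketched in base R: for each pathway, add up the abundances of its member KOs within each sample. This is only an illustration under stated assumptions. The toy table, the `ko_map` mapping, and the plain summation are all made up for the example, and the package's own reference data and aggregation rule may differ.

```r
# Toy KO abundance table: rows are KOs, columns are samples (values are made up)
ko_abundance <- matrix(
  c(10, 0, 5,
     2, 8, 1,
     4, 4, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("K00001", "K00002", "K00003"), c("S1", "S2", "S3"))
)

# Hypothetical KO-to-pathway map; a KO may belong to several pathways
ko_map <- data.frame(
  ko      = c("K00001", "K00002", "K00002", "K00003"),
  pathway = c("ko00010", "ko00010", "ko00020", "ko00020")
)

# For each pathway, sum the abundances of its member KOs within each sample
pathways <- unique(as.character(ko_map$pathway))
pathway_abundance <- t(sapply(pathways, function(p) {
  members <- as.character(ko_map$ko[ko_map$pathway == p])
  colSums(ko_abundance[members, , drop = FALSE])
}))

pathway_abundance
```

The result keeps the orientation described above: pathways in rows, samples in columns.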
R/pathway_daa.R
- Purpose: Performs Differential Abundance Analysis (DAA) on KEGG pathway abundances to identify pathways with significant abundance differences between sample groups.
- Key Function: pathway_daa() takes the KEGG pathway abundance table and sample metadata (including group information) as input. It supports various DAA methods (e.g., ALDEx2, LinDA, DESeq2, edgeR).
- Role in Workflow: Identifies statistically significant changes in functional potential between conditions.
- Usage Example:
# Assuming 'kegg_pathway_abundance' and 'metadata_df' are data frames
daa_results <- pathway_daa(
  abundance = kegg_pathway_abundance,
  metadata = metadata_df,
  group = "group",
  daa_method = "LinDA"
)
head(daa_results)
R/ggpicrust2.R
- Purpose: Contains the main wrapper functions, ggpicrust2() and ggpicrust2_extended(), which integrate KO to pathway conversion, pathway DAA, and optionally GSEA into a single function call.
- Key Function: ggpicrust2() or ggpicrust2_extended() provides a high-level interface, taking KO or pathway abundances, metadata, and parameters for conversion (if starting from KOs) and DAA/GSEA.
- Role in Workflow: Simplifies the analysis process by combining the core steps into a single command.
- Usage Example:
# Assuming 'ko_abundance_df' and 'metadata_df' are data frames
full_workflow_results <- ggpicrust2(
  data = ko_abundance_df,
  metadata = metadata_df,
  group = "group",
  pathway = "KO",        # Type of input abundance table
  daa_method = "LinDA",  # DAA method
  ko_to_kegg = TRUE      # Convert KO abundances to KEGG pathways first
)
head(full_workflow_results)
Since version 2.1.0, ggpicrust2 includes GSEA functionality, expanding the workflow to identify pathways that are significantly enriched in a given condition, even if individual pathways don't show significant differential abundance. This is facilitated by functions like pathway_gsea(), visualize_gsea(), and compare_gsea_daa(). The ggpicrust2_extended() function integrates GSEA with the existing DAA workflow.
The primary inputs are:
- Abundance Table: A data frame or matrix where rows represent features (KO IDs or KEGG Pathway IDs) and columns represent samples. Values are typically counts or relative abundances.
- Metadata Table: A data frame where rows represent samples (matching column names in the abundance table) and columns contain sample information, including the grouping variable for DAA/GSEA.
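Because the two inputs are linked only through sample IDs, it can help to check that link before running any analysis. The base R sketch below assumes a metadata column named `sample` holding the IDs; that column name and the toy values are illustrative, not something the package requires.

```r
# Toy inputs (made up): features in rows, samples in columns
abundance <- data.frame(
  S1 = c(5, 0), S2 = c(3, 7), S3 = c(1, 2),
  row.names = c("ko00010", "ko00020")
)
metadata <- data.frame(
  sample = c("S3", "S1", "S2"),
  group  = c("B", "A", "A")
)

# Every sample must appear in both tables
missing_in_metadata  <- setdiff(colnames(abundance), metadata$sample)
missing_in_abundance <- setdiff(metadata$sample, colnames(abundance))
stopifnot(length(missing_in_metadata) == 0, length(missing_in_abundance) == 0)

# Reorder metadata rows so they line up with the abundance columns
metadata <- metadata[match(colnames(abundance), metadata$sample), ]
metadata$sample
```

After the reordering step, row i of the metadata describes column i of the abundance table.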
- Install the package:
# Install ggpicrust2 from CRAN
install.packages("ggpicrust2")
# Or install the latest version from GitHub
# install.packages("devtools")
devtools::install_github("cafferychen777/ggpicrust2")
- Load the package:
library(ggpicrust2)
- Prepare input data: Ensure your KO abundance and metadata are correctly formatted as data frames.
- Run the analysis: Use the ggpicrust2() or ggpicrust2_extended() function with appropriate parameters. Refer to the function documentation for details on available options and input formats.
Recent versions have included significant updates to the reference databases used for KO to KEGG mapping. Version 2.1.4, for example, increased the EC reference data by 163% and the KO reference data by 15.4%, leading to more comprehensive and accurate pathway annotations. Version 2.2.2 fixed column name compatibility issues in the reference data files.
The ggpicrust2 package provides a comprehensive workflow for analyzing functional potential from microbial community data. By converting KO abundances to KEGG pathway abundances and performing DAA or GSEA, researchers can gain insights into the functional differences between microbial communities in different conditions. The wrapper function ggpicrust2 streamlines this process, making it accessible to a wide range of users.
Related files: R/pathway_daa.R, R/compare_daa_results.R
Related topics: Workflow and Core Functions, Visualization Functions
Differential Abundance Analysis (DAA) is a key feature in ggpicrust2 for identifying statistically significant differences in pathway abundances between different groups of samples. This analysis uses functions defined primarily in R/pathway_daa.R and R/compare_daa_results.R.
DAA aims to determine which pathways or functional categories exhibit significant changes in abundance across different experimental conditions or sample groups. This helps researchers understand the functional impact of these conditions on the microbial community.
- pathway_daa.R: Contains the core function pathway_daa() for performing differential abundance tests on pathway data.
- compare_daa_results.R: Provides the compare_daa_results() function to compare and consolidate results from multiple DAA tests, particularly useful for multi-group comparisons.
The pathway_daa() function takes a pathway abundance table, sample metadata, and a grouping variable as input. It applies a statistical method (e.g., LinDA) to test for differential abundance of each pathway across the defined groups.
Key Features:
- Performs differential abundance testing using statistical methods.
- Accommodates various experimental designs.
- Generates comprehensive output, including p-values, adjusted p-values, and effect sizes.
Code Example:
# Load pathway abundance and metadata
# pathway_abundance_df: Pathway abundance data frame
# metadata_df: Metadata data frame
# Perform DAA
daa_results <- pathway_daa(
  abundance = pathway_abundance_df,
  metadata = metadata_df,
  group = "Treatment",
  daa_method = "LinDA",
  reference = "Control"
)
# Print the first few rows of the results
print(head(daa_results))
Explanation: This example demonstrates how to use pathway_daa() to compare pathway abundances between the "Control" and "Drug" groups, using the "Treatment" column in the metadata for grouping, "Control" as the reference level, and LinDA for the statistical test.
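To make the output columns concrete, the sketch below runs one test per pathway and then corrects the p-values for multiple testing. A plain Welch t-test stands in for LinDA here, which fits a different model, so treat this as an illustration of the per-pathway testing and Benjamini-Hochberg adjustment pattern only. The data and column names are made up.

```r
# Toy pathway abundances (made up): 3 pathways x 6 samples
abundance <- rbind(
  ko00010 = c(10, 11, 12, 20, 21, 22),
  ko00020 = c( 5,  6,  7,  5,  6,  7),
  ko00030 = c( 8,  9, 10,  9, 10, 11)
)
group <- factor(c("Control", "Control", "Control", "Drug", "Drug", "Drug"))

# One Welch t-test per pathway (a stand-in; LinDA fits a different model)
p_values <- apply(abundance, 1, function(x) t.test(x ~ group)$p.value)

results <- data.frame(
  feature  = rownames(abundance),
  p_values = p_values,
  # Benjamini-Hochberg correction across all tested pathways
  p_adjust = p.adjust(p_values, method = "BH")
)
results[order(results$p_adjust), ]
```

Filtering on the adjusted column rather than the raw one controls the false discovery rate across all pathways tested.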
The compare_daa_results() function is designed to compare and combine the results of multiple pathway_daa() analyses, especially when dealing with more than two groups.
Key Features:
- Combines results from multiple pairwise comparisons.
- Facilitates identification of pathways that are consistently differentially abundant across multiple comparisons.
- Allows for filtering and merging of results based on specific criteria.
Code Example:
# Assuming daa_result_A_vs_B and daa_result_A_vs_C are the results from pathway_daa()
# Combine the results
combined_results <- compare_daa_results(
daa_results_list = list(
"A_vs_B" = daa_result_A_vs_B,
"A_vs_C" = daa_result_A_vs_C
)
)
# Print the first few rows of the combined results
print(head(combined_results))
Explanation: This example shows how to use compare_daa_results() to combine the results from two separate DAA comparisons: "A vs B" and "A vs C".
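The notion of a pathway being consistently differentially abundant across comparisons can be illustrated with a base R join. This is a hand-rolled sketch, not the internals of compare_daa_results(); the data frames, the `feature` and `p_adjust` column names, and the 0.05 threshold are assumptions made for the example.

```r
# Made-up results from two pairwise comparisons
daa_A_vs_B <- data.frame(feature  = c("ko00010", "ko00020", "ko00030"),
                         p_adjust = c(0.01, 0.20, 0.03))
daa_A_vs_C <- data.frame(feature  = c("ko00010", "ko00020", "ko00030"),
                         p_adjust = c(0.04, 0.02, 0.50))

# Join on the feature ID; suffixes record which comparison a column came from
combined <- merge(daa_A_vs_B, daa_A_vs_C, by = "feature",
                  suffixes = c("_A_vs_B", "_A_vs_C"))

# Flag pathways that pass the threshold in every comparison
combined$consistent <- combined$p_adjust_A_vs_B < 0.05 &
  combined$p_adjust_A_vs_C < 0.05
combined
```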
DAA is typically performed after predicting pathway abundances using PICRUSt2 and processing them within ggpicrust2. The results from DAA can be used for downstream visualization and further analysis.
graph TD
A[Input Data] --> B(PICRUSt2 Prediction);
B --> C[Pathway Abundance Table];
C --> D{pathway_daa};
D --> E[DAA Results];
E --> F{compare_daa_results};
F --> G[Combined Results];
E --> H[Visualization];
G --> H;
E --> I[Further Analysis];
G --> I;
To use DAA, you need:
- A pathway abundance table (generated from PICRUSt2 and processed by ggpicrust2).
- A metadata table that maps sample IDs to experimental groups or conditions.
- The ggpicrust2 package installed.
Steps:
- Load the necessary data (pathway abundance table and metadata).
- Run pathway_daa() to perform the differential abundance analysis.
- (Optional) If you have multiple groups or comparisons, use compare_daa_results() to combine the results.
- Visualize and interpret the results.
Example Setup:
# Install ggpicrust2 (if not already installed)
# install.packages("ggpicrust2")
# Load the ggpicrust2 library
library(ggpicrust2)
# Load your data
# pathway_abundance_df <- read.table("pathway_abundance.txt", header = TRUE, sep = "\t")
# metadata_df <- read.table("metadata.txt", header = TRUE, sep = "\t")
Remember to consult the function documentation (?pathway_daa, ?compare_daa_results) for more details on available parameters and options.
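When reading a tab-delimited abundance file, two read.table() options are worth knowing: row.names = 1 moves the first column into the row names, and check.names = FALSE stops R from rewriting sample IDs that are not syntactic names. The sketch below writes a tiny made-up file to a temporary path so it can run on its own; whether a given ggpicrust2 function wants IDs as row names or as a first column should be checked in its help page.

```r
# Write a tiny tab-delimited table to a temporary file (made-up values)
path <- tempfile(fileext = ".txt")
writeLines(c("pathway\tS1\tS2", "ko00010\t5\t3", "ko00020\t0\t7"), path)

# row.names = 1 turns the first column into row names;
# check.names = FALSE keeps sample IDs exactly as written in the file
pathway_abundance_df <- read.table(path, header = TRUE, sep = "\t",
                                   row.names = 1, check.names = FALSE)
str(pathway_abundance_df)
```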
Related files: R/pathway_errorbar.R, R/pathway_heatmap.R, R/pathway_pca.R
Related topics: Workflow and Core Functions, Differential Abundance Analysis (DAA)
This page details the visualization functions pathway_errorbar, pathway_heatmap, and pathway_pca available in the ggpicrust2 package, which are used to visualize pathway abundance data.
pathway_errorbar
- Purpose: Visualizes the abundance of a single pathway across different groups using error bars.
- Functionality: Displays the mean or median abundance of a specified pathway for each group, along with error bars representing the standard deviation or standard error. Useful for comparing the abundance of a specific pathway across different experimental conditions or groups.
- Usage: Requires a pathway abundance data frame, metadata with group information, and the name of the pathway to visualize.
- Code Example:
# Load necessary libraries and data
library(ggpicrust2)
library(ggplot2)
# Assuming pathway_abundance and metadata are already loaded
# Example usage of pathway_errorbar
pathway_errorbar(
pathway_abundance = pathway_abundance,
metadata = metadata,
group_col = "group",
pathway_name = "ko00010",
error_bar_type = "se"
) +
labs(title = "Glycolysis Pathway Abundance")
- Explanation:
  - pathway_abundance: A data frame where rows represent pathways and columns represent samples.
  - metadata: A data frame containing sample metadata, including a column specifying the group to which each sample belongs.
  - group_col: The name of the column in the metadata data frame that specifies the group for each sample.
  - pathway_name: The name of the pathway to visualize.
  - error_bar_type: Specifies whether to use standard error ("se") or standard deviation ("sd") for the error bars.
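The quantities behind such an error bar plot are simple to compute by hand. The base R sketch below derives the per-group mean and standard error (standard deviation divided by the square root of the group size) for one pathway; the values and variable names are made up for illustration.

```r
# Made-up abundances of one pathway across six samples
abundance_values <- c(10, 12, 14, 20, 22, 27)
group <- c("A", "A", "A", "B", "B", "B")

# Per-group mean, standard deviation, and standard error (sd / sqrt(n))
group_mean <- tapply(abundance_values, group, mean)
group_sd   <- tapply(abundance_values, group, sd)
group_n    <- tapply(abundance_values, group, length)
group_se   <- group_sd / sqrt(group_n)

summary_df <- data.frame(
  group = names(group_mean),
  mean  = as.numeric(group_mean),
  lower = as.numeric(group_mean - group_se),  # bottom of an "se" error bar
  upper = as.numeric(group_mean + group_se)   # top of an "se" error bar
)
summary_df
```

Swapping `group_se` for `group_sd` in the last two columns gives the wider standard deviation bars.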
pathway_heatmap
- Purpose: Creates a heatmap to visualize the abundance of multiple pathways across samples or groups.
- Functionality: Displays pathway abundance data as a heatmap, allowing for the identification of patterns and relationships between pathways and samples. Supports aggregating samples by group to show representative abundance values (e.g., mean abundance per group) using the aggregate_by_group parameter (added in version 2.2.1).
- Usage: Requires pathway abundance data and optionally metadata for grouping and annotation.
- Code Example:
# Load necessary libraries and data
library(ggpicrust2)
library(ggplot2)
# Assuming pathway_abundance and metadata are already loaded
# Example usage of pathway_heatmap
pathway_heatmap(
pathway_abundance = pathway_abundance,
metadata = metadata,
group_col = "group",
top_n_pathways = 20,
aggregate_by_group = TRUE,
aggregate_fun = "mean"
) +
labs(title = "Pathway Abundance Heatmap")
- Explanation:
  - pathway_abundance: A data frame where rows represent pathways and columns represent samples.
  - metadata: A data frame containing sample metadata, including a column specifying the group to which each sample belongs.
  - group_col: The name of the column in the metadata data frame that specifies the group for each sample.
  - top_n_pathways: The number of top pathways to display in the heatmap (ranked by variance).
  - aggregate_by_group: A boolean indicating whether to aggregate samples by group.
  - aggregate_fun: The function to use for aggregation (e.g., "mean", "median").
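Two of the ideas above, ranking pathways by variance and collapsing samples to one value per group, can be sketched in base R as follows. This only illustrates the data preparation under made-up values; it is not how the package builds its heatmap.

```r
# Made-up pathway abundances: 3 pathways x 4 samples
abundance <- rbind(
  ko00010 = c(1, 3, 10, 12),
  ko00020 = c(5, 5,  5,  5),
  ko00030 = c(2, 4,  3,  5)
)
colnames(abundance) <- c("S1", "S2", "S3", "S4")
group <- c("A", "A", "B", "B")

# Keep the two pathways with the highest variance across samples
pathway_var <- apply(abundance, 1, var)
top <- names(sort(pathway_var, decreasing = TRUE))[1:2]

# Average the samples of each group so there is one column per group
group_means <- sapply(split(seq_along(group), group), function(idx) {
  rowMeans(abundance[top, idx, drop = FALSE])
})
group_means
```

The constant pathway drops out at the variance step, which is the usual reason for ranking by variance before plotting.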
pathway_pca
- Purpose: Performs Principal Component Analysis (PCA) on pathway abundance data and visualizes the results.
- Functionality: Reduces the dimensionality of pathway abundance data and visualizes sample clustering based on pathway profiles. Useful for exploring the major sources of variation in the data and identifying relationships between samples.
- Usage: Requires pathway abundance data and metadata for coloring or shaping points based on group or other variables.
- Code Example:
# Load necessary libraries and data
library(ggpicrust2)
library(ggplot2)
# Assuming pathway_abundance and metadata are already loaded
# Example usage of pathway_pca
pathway_pca(
pathway_abundance = pathway_abundance,
metadata = metadata,
color_by = "group",
label_samples = FALSE
) +
labs(title = "Pathway PCA Plot")
- Explanation:
  - pathway_abundance: A data frame where rows represent pathways and columns represent samples.
  - metadata: A data frame containing sample metadata, including a column to color points.
  - color_by: The name of the column in the metadata data frame to use for coloring points.
  - label_samples: A boolean indicating whether to label points with sample names.
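For readers who want to see the underlying computation, the sketch below runs a PCA with base R's prcomp() on a made-up abundance matrix. The key detail is the transpose: prcomp() treats rows as observations, while the abundance table stores samples in columns. Any preprocessing the package applies before its own PCA is not reproduced here.

```r
# Made-up pathway abundances: pathways in rows, samples in columns
set.seed(1)
abundance <- matrix(rpois(5 * 6, lambda = 20), nrow = 5,
                    dimnames = list(paste0("pathway", 1:5), paste0("S", 1:6)))
metadata <- data.frame(sample = colnames(abundance),
                       group  = rep(c("A", "B"), each = 3))

# prcomp() expects observations in rows, so transpose to samples x pathways
pca <- prcomp(t(abundance), center = TRUE)

# Share of total variance captured by each principal component
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two components, joined with group labels
scores <- data.frame(pca$x[, 1:2], group = metadata$group)
scores
```

The `scores` data frame has one row per sample and is the kind of table a scatter plot colored by group would be drawn from.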
These visualization functions are a crucial final step in the ggpicrust2 workflow. They transform processed pathway abundance data into interpretable visualizations.
graph TD
A[Raw Sequencing Data] --> B(PICRUSt2 Prediction);
B --> C[Functional Abundance Data];
C --> D(Pathway Annotation);
D --> E[Pathway Abundance Table];
E --> F{Visualization Functions};
F --> G[Errorbar Plot];
F --> H[Heatmap];
F --> I[PCA Plot];
style A fill:#f9f,stroke:#333,stroke-width:2px
style B fill:#ccf,stroke:#333,stroke-width:2px
style C fill:#ccf,stroke:#333,stroke-width:2px
style D fill:#ccf,stroke:#333,stroke-width:2px
style E fill:#ccf,stroke:#333,stroke-width:2px
style F fill:#ffc,stroke:#333,stroke-width:2px
style G fill:#fff,stroke:#333,stroke-width:2px
style H fill:#fff,stroke:#333,stroke-width:2px
style I fill:#fff,stroke:#333,stroke-width:2px
- Install ggpicrust2:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ggpicrust2")
- Prepare Input Data: Ensure your pathway abundance data is in a data frame format with pathways as rows and samples as columns. Your metadata should be a data frame with samples as rows and sample information as columns.
- Run PICRUSt2 and pathway annotation: Follow the ggpicrust2 documentation to generate the pathway abundance data from your raw sequencing data using PICRUSt2 and the pathway_annotation function.
- Use Visualization Functions: Load the ggpicrust2 library and use the pathway_errorbar, pathway_heatmap, and pathway_pca functions as shown in the examples above. Adjust parameters as needed to customize the visualizations.
Related files: R/pathway_gsea.R, R/visualize_gsea.R, R/compare_gsea_daa.R, R/gsea_pathway_annotation.R
Related topics: Workflow and Core Functions
Gene Set Enrichment Analysis (GSEA) is a method for determining whether an a priori defined set of genes (in this case, a pathway) is statistically enriched in a ranked list of genes. In ggpicrust2, GSEA is used to identify pathways that are significantly enriched or depleted based on differential abundance analysis (DAA) results. The functions pathway_gsea(), visualize_gsea(), compare_gsea_daa(), and gsea_pathway_annotation() facilitate this analysis. This functionality was introduced in ggpicrust2 version 2.1.0.
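The core of the method is a running sum over the ranked list. The base R sketch below computes an unweighted enrichment score for one made-up gene set: the sum steps up at each set member and down at each non-member, and the score is the value farthest from zero. Standard GSEA additionally weights the steps by the ranking statistic and assesses significance by permutation, neither of which is shown; the statistic values and set membership here are hypothetical.

```r
# Made-up ranking statistic per feature (e.g. a log2 fold change)
stat <- c(K1 = 2.5, K2 = 1.8, K3 = 1.1, K4 = 0.2, K5 = -0.4, K6 = -1.9)
gene_set <- c("K1", "K2", "K4")   # hypothetical pathway membership

# Rank features from most increased to most decreased
ranked <- names(sort(stat, decreasing = TRUE))
in_set <- ranked %in% gene_set

# Step up at each member, step down at each non-member
step <- ifelse(in_set, 1 / sum(in_set), -1 / sum(!in_set))
running_sum <- cumsum(step)

# Enrichment score: the running-sum value farthest from zero
enrichment_score <- running_sum[which.max(abs(running_sum))]
enrichment_score
```

A positive score means the set members cluster near the top of the ranking; the running sum always returns to zero at the end of the list.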
The GSEA module in ggpicrust2 enables users to:
- Perform GSEA on predicted functional profiles (KEGG, MetaCyc, GO).
- Visualize GSEA results using enrichment plots, dotplots, barplots, network plots, and heatmaps.
- Compare GSEA results with traditional DAA results on pathway abundances.
- Annotate GSEA results with pathway descriptions.
- Purpose: Implements the GSEA algorithm. It takes DAA results and a feature-to-pathway mapping to determine pathway enrichment.
- Function: pathway_gsea()
- Key Parameters:
  - daa_results: DAA results (e.g., from DESeq2, edgeR, LinDA). Must contain feature IDs, log2 fold changes, and p-values/adjusted p-values.
  - pathway_map: Feature-to-pathway mapping (e.g., PICRUSt2 output).
  - pathway_database: Pathway database ('KEGG', 'MetaCyc', 'GO').
  - pvalue_threshold: P-value threshold for significance.
  - log2fc_threshold: Log2 fold change threshold.
- Output: Data frame with GSEA results, including enrichment scores, p-values, and adjusted p-values.
# Example
# Assuming 'daa_results_df' and 'pathway_map_df' are your data frames
library(ggpicrust2)
gsea_results <- pathway_gsea(
daa_results = daa_results_df,
pathway_map = pathway_map_df,
pathway_database = "KEGG",
pvalue_threshold = 0.05,
log2fc_threshold = 1
)
head(gsea_results)
- Purpose: Adds descriptive annotations to pathway IDs in GSEA results.
- Function: gsea_pathway_annotation()
- Key Parameters:
  - gsea_results: Output from pathway_gsea().
  - pathway_database: Pathway database ('KEGG', 'MetaCyc', 'GO').
  - annotation_file: Optional custom annotation file.
- Output: Input gsea_results data frame with added pathway name/description columns.
# Example
gsea_results_annotated <- gsea_pathway_annotation(
gsea_results = gsea_results,
pathway_database = "KEGG"
)
head(gsea_results_annotated)
- Purpose: Generates plots to visualize GSEA results.
- Function: visualize_gsea()
- Key Parameters:
  - gsea_results_annotated: Annotated GSEA results.
  - plot_type: Plot type ('enrichment_plot', 'dotplot', 'barplot', 'network_plot', 'heatmap').
  - pathway_database: Pathway database ('KEGG', 'MetaCyc', 'GO').
  - top_n: Number of top pathways to display.
  - ...: Additional plot-specific parameters.
- Output: A ggplot object or a list of plots.
# Example
dotplot <- visualize_gsea(
gsea_results_annotated = gsea_results_annotated,
plot_type = "dotplot",
pathway_database = "KEGG",
top_n = 15
)
print(dotplot)
- Purpose: Compares pathways identified by GSEA with those from DAA of pathway abundances.
- Function: compare_gsea_daa()
- Key Parameters:
  - gsea_results_annotated: Annotated GSEA results.
  - pathway_daa_results: DAA results on pathway abundances.
  - pathway_database: Pathway database ('KEGG', 'MetaCyc', 'GO').
  - pvalue_threshold: Significance threshold.
- Output: Data frame summarizing the comparison.
# Example
comparison_results <- compare_gsea_daa(
gsea_results_annotated = gsea_results_annotated,
pathway_daa_results = pathway_daa_results_df,
pathway_database = "KEGG",
pvalue_threshold = 0.05
)
head(comparison_results)
The GSEA functions are designed to integrate with the broader ggpicrust2 workflow, particularly following differential abundance analysis. The ggpicrust2_extended() function offers an integrated pipeline including GSEA.
- Perform DAA on microbial features (ASVs).
- Use pathway_gsea() with DAA results and feature-to-pathway mapping.
- Use gsea_pathway_annotation() to add pathway descriptions.
- Use visualize_gsea() to generate plots.
- (Optional) Perform DAA on pathway abundances and use compare_gsea_daa() to compare results.
graph TD
A[DAA Results] --> B(pathway_gsea)
C[Pathway Map] --> B
B --> D[GSEA Results]
D --> E(gsea_pathway_annotation)
E --> F[Annotated Results]
F --> G(visualize_gsea)
G --> H[GSEA Plots]
F --> I(compare_gsea_daa)
J[Pathway DAA] --> I
I --> K[Comparison Results]
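The optional comparison at the end of this flow boils down to a set overlap between the pathways each analysis calls significant. A minimal base R sketch, using made-up pathway IDs rather than real output from either function:

```r
# Made-up significant pathway IDs from each analysis
gsea_significant <- c("ko00010", "ko00020", "ko00030")
daa_significant  <- c("ko00020", "ko00030", "ko00040")

# Partition the pathways into shared and method-specific sets
comparison <- list(
  both      = intersect(gsea_significant, daa_significant),
  gsea_only = setdiff(gsea_significant, daa_significant),
  daa_only  = setdiff(daa_significant, gsea_significant)
)

# Number of pathways in each category
lengths(comparison)
```

Pathways in `gsea_only` are the case mentioned earlier: enriched as a set even though they do not pass the threshold in the per-pathway DAA.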
Ensure ggpicrust2 (version 2.1.0 or later) is installed. The functions require data frames formatted according to PICRUSt2 outputs and DAA packages (e.g., DESeq2, edgeR, LinDA).
# Installation
# devtools::install_github("cafferychen777/ggpicrust2")
# Load library
library(ggpicrust2)
# Help pages
?pathway_gsea
?visualize_gsea
?compare_gsea_daa
?gsea_pathway_annotation
---
<a id='page-6'></a>
## FAQ and Troubleshooting
### Related Files
- `README.md`
### Related Pages
Related topics: [Introduction and Installation](#page-1), [Workflow and Core Functions](#page-2), [Differential Abundance Analysis (DAA)](#page-3), [Visualization Functions](#page-4), [Gene Set Enrichment Analysis (GSEA)](#page-5)
---