Skip to content

A pipeline to analyse Thermal Protein Profiling (TPP) proteomics experiments. Import tools for multiple proteomics software analyses including DDA, DIA, TMT and LFQ. Visualisation tools allow plotting melting curves at any point in the pipeline. Output data is easily accessed, displayed and exported in a tidy format.

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

jackrogan/tidyTPP

Repository files navigation

tidyTPP

Codecov test coverage R-CMD-check

Introduction

tidyTPP provides an analysis pipeline for Thermal Protein Profiling (TPP) proteomics data, with functions for reading in data from protein quantification software, normalising data, analysis, and hit-finding.

The goal of tidyTPP is to provide a range of functions, each of which requires data in a straightforward data frame or tibble and returns long-format data in a tibble that are readily understood and easily used for further analysis. The style of data is inspired by the tidyverse, and tidyTPP makes use of several tidyverse packages - in particular tibble and ggplot2.

Currently supports:

Importing:

  • Importing from Thermo Proteome Discoverer

  • Importing from Spectronaut

  • Importing with custom settings

Normalisation and Analysis:

  • Single- and multi-threaded fitting of temperature-dependent protein melting curves (TPP-TR)

  • Significance scores by melting point difference ($\Delta T_m$)1

  • Significance scores by non-pararametric analysis of response curves (NPARC)2

Hit-finding

  • Identifying hits by $\Delta T_m$ p-value

  • Identifying hits by NPARC F-score and p-value

Plotting

  • Plotting TPP-TR melting-curves with ggplot2

Installation

You can install the development version of tidyTPP from GitHub with:

# install.packages("pak")
pak::pak("jackrogan/tidyTPP")

Example: Pre-built pipeline

A version of the entire pipeline, with defaults to match common uses, is available with apply_TPP_pipeline. This executes the following sequence:

import_TPP() |>
  normalise_TPP() |>
  analyse_TPP() |>
  export_TPP() |>
  get_TPP_hits()

The function uses the following defaults in addition to the default behaviour of each included function:

  • Import supported proteomics quantification data formats

  • Calculate NPARC and $\Delta T_m$ significance scores

  • Report all hits with:

    • Adjusted NPARC-derived p-value < 0.05

    • $\Delta T_m$ in the same direction

    • $\Delta T_m$ vs. control > $\Delta T_m$ between controls

    • Full results table and identified hits are exported as .xlsx files with input file names modified with “_Results.xlsx” and “_Hits.xlsx” respectively, in the same location as the initial file input.

Usage:

# Example pipeline call
apply_TPP_pipeline(
  datafile = "path_to_data_input.csv",
  config = "path_to_config_file.csv",
  path = "[optional]_path_to_shared_dir",
  import_format = "format_name",
  to_plot = FALSE,
  max_cores = 4)

The character argument import_format defines which of the inbuilt import functions to use to read in data files. Currently:

  • “Spectronaut”, “SN” - imports from peptide tables exported from Spectronaut DIA quantification reports.

  • “ProteomeDiscoverer”, “PD” - imports from protein tables exported from Thermo Proteome Discoverer DDA quantification results.

  • Default “Spectronaut”.

The (boolean) argument to_plot defines whether to show automated plots across all methods; if TRUE, normalisation, NPARC score distribution, and hit melting-point curves will all be plotted.

  • Default FALSE.

The (integer) argument max_cores is passed to curve-fitting methods and defines maximum parallel cores to use for these operations.

  • Default 4.
# Pipeline example: 4-protein test data
library(tidyTPP)

four_prot_report <-
    system.file("extdata", "4_protein_peptide_report.csv", package = "tidyTPP")
experiment_config <-
    system.file("extdata", "4_protein_config.csv", package = "tidyTPP")

# Apply pipeline
TPP_hits <- 
  apply_TPP_pipeline(datafile = four_prot_report,
                    config = experiment_config,
                    import_format = "spectronaut",
                    to_plot = TRUE,
                    max_cores = 4)

Result:

TPP_hits
#> # A tibble: 1 × 10
#>   Protein_ID Condition Comparison         F_scaled p_adj_NPARC max_adj_pvalue
#>   <chr>      <chr>     <chr>                 <dbl>       <dbl>          <dbl>
#> 1 Protein_A  Treated   Treated_vs_Control     4.52     0.00179          0.604
#> # ℹ 4 more variables: mean_melt_point <dbl>, mean_control_melt_point <dbl>,
#> #   mean_diff_melt_point <dbl>, mean_control_diff_melt_point <dbl>

Example: Function detail

This example will walk through the main functions using 4-protein example data

Import

import_ functions read two files - quantification results output in a program-specific format, and a configuration file, defining conditions, replicates and temperatures, in the format shown.

# 4-protein example data
four_prot_report <-
    system.file("extdata", "4_protein_peptide_report.csv", package = "tidyTPP")
experiment_config <-
    system.file("extdata", "4_protein_config.csv", package = "tidyTPP")
experiment_dir <- system.file("extdata", package = "tidyTPP")
  
# Config file contents
read.csv(experiment_config)[1:10,]
#>    Experiment Condition Replicate Temp
#> 1           1   Control         1   37
#> 2           2   Control         1   41
#> 3           3   Control         1   44
#> 4           4   Control         1   47
#> 5           5   Control         1   50
#> 6           6   Control         1   53
#> 7           7   Control         1   56
#> 8           8   Control         1   59
#> 9           9   Control         1   63
#> 10         10   Control         1   67

# Import using spectronaut import format
four_prot_quan_data <- 
  import_spectronaut(datafile = "4_protein_peptide_report.csv",
                     config = "4_protein_config.csv",
                     path = experiment_dir)
#> 
#> --------------------
#> TPP Data Import
#> --------------------
#> Read in:
#> 4_protein_peptide_report.csv 
#> 4_protein_config.csv 
#> --------------------
#> Pivoting to long table...
#> Transforming experiment names...
#> Matching to experiment config data...
#> Finding relative quantity values...
#> Data imported.
#> Found 4 proteins.
#> --------------------

# Resulting tibble
four_prot_quan_data
#> # A tibble: 160 × 8
#>    Protein_ID Pep_N Match_N Condition Replicate  Temp rel_quantity raw_quantity
#>    <chr>      <dbl>   <dbl> <chr>     <chr>     <int>        <dbl>        <dbl>
#>  1 Protein_A     36      62 Control   01           37      1           7913466.
#>  2 Protein_A     36      62 Control   01           41      0.915       7238843 
#>  3 Protein_A     36      62 Control   01           44      0.895       7083090 
#>  4 Protein_A     36      62 Control   01           47      0.689       5454540.
#>  5 Protein_A     36      62 Control   01           50      0.416       3291901 
#>  6 Protein_A     36      62 Control   01           53      0.195       1542812.
#>  7 Protein_A     36      62 Control   01           56      0.0458       362589.
#>  8 Protein_A     36      62 Control   01           59      0.0168       132594.
#>  9 Protein_A     36      62 Control   01           63      0.0258       204328.
#> 10 Protein_A     36      62 Control   01           67      0.00986       78023.
#> # ℹ 150 more rows

Data at this stage can be plotted to directly observe relative but not yet normalised protein curves.

plot_melt(four_prot_quan_data)

Normalisation

normalise_TPP transforms relative TPP-TR relative intensity data, normalising against fitted median melting curves, as described by Savitsky et al. 2014.1

# Normalise four-protein data, with visualisation
four_prot_normalised <- 
  normalise_TPP(TPP_tbl = four_prot_quan_data,
                to_plot = TRUE)
#> --------------------
#> TPP Normalisation
#> --------------------
#> Quality Criteria:
#>     col lower upper
#> 1 Pep_N     2   Inf
#> 
#> jointP contains 4 Proteins.
#> 
#> normP criteria:
#>   Temp lower upper
#> 1   56   0.4   0.6
#> 2   63  -Inf   0.3
#> 3   67  -Inf   0.2
#> 
#> normP contains 2 Proteins.
#> --------------------
#> Fit melting curve to normP medians:
#> 
#> Fitting 4 protein curves...
#> Estimated total process time: 0.92 s
#> |==        | 1 of 4|=====     | 2 of 4|=======   | 3 of 4|==========| 4 of 4
#> 4 of 4 fitted successfully.
#> 
#> Total elapsed time: 0.66 s
#> 
#> Best fitted normP median curve:
#>   Condition Replicate  R_sq
#> 1   Control        02 0.998
#> --------------------
#> 4 proteins normalised
#> --------------------

Resulting in the normalised data:

# Plotted melting data points
plot_melt(four_prot_normalised)
#> # A tibble: 160 × 9
#>    Condition Replicate  Temp Protein_ID Pep_N Match_N rel_quantity raw_quantity
#>    <chr>     <chr>     <int> <chr>      <dbl>   <dbl>        <dbl>        <dbl>
#>  1 Control   01           37 Protein_A     36      62        1.000     7913466.
#>  2 Control   01           37 Protein_B     62     133        1.000   371610272 
#>  3 Control   01           37 Protein_C     23      49        1.000    94161240 
#>  4 Control   01           37 Protein_D      5       7        1.000     4286970.
#>  5 Control   01           41 Protein_D      5       7        1.13      4460470.
#>  6 Control   01           41 Protein_A     36      62        0.992     7238843 
#>  7 Control   01           41 Protein_B     62     133        1.01    345026016 
#>  8 Control   01           41 Protein_C     23      49        0.871    75621648 
#>  9 Control   01           44 Protein_D      5       7        1.14      4613956.
#> 10 Control   01           44 Protein_C     23      49        0.861    76666176 
#> # ℹ 150 more rows
#> # ℹ 1 more variable: norm_coefficient <dbl>

Analysis

analyse_TPP fits sigmoidal melting curves are fitted to each unique combination of protein, condition and replicate, and features of the curve (including $T_m$) calculated. Significance statistics are optionally calculated from the data:

  • FDR-adjusted p-values are calculated for melting point differences from control conditions ($\Delta T_m$) for each replicate1

  • Scaled F-scores and FDR-adjusted p-values are calculated from NPARC analysis for each protein2

Default behaviour is to apply both analyses:

# Analyse four-protein data
four_prot_analysed <- 
  analyse_TPP(TPP_tbl = four_prot_normalised,
              control_name = "Control",
              p_value_methods = c("melting_point", "NPARC"))

# Analysed data and fitted curves
plot_melt(four_prot_analysed)

All statistics are appended as new measurements to the tibble:

# Full table
four_prot_analysed
#> # A tibble: 160 × 31
#>    Protein_ID Condition Replicate  Temp Pep_N Match_N rel_quantity raw_quantity
#>    <chr>      <chr>     <chr>     <int> <dbl>   <dbl>        <dbl>        <dbl>
#>  1 Protein_A  Control   01           37    36      62       1.000      7913466.
#>  2 Protein_A  Control   01           56    36      62       0.0580      362589.
#>  3 Protein_A  Control   01           47    36      62       0.718      5454540.
#>  4 Protein_A  Control   01           53    36      62       0.221      1542812.
#>  5 Protein_A  Control   01           41    36      62       0.992      7238843 
#>  6 Protein_A  Control   01           50    36      62       0.493      3291901 
#>  7 Protein_A  Control   01           63    36      62       0.0231      204328.
#>  8 Protein_A  Control   01           44    36      62       0.947      7083090 
#>  9 Protein_A  Control   01           67    36      62       0.0111       78023.
#> 10 Protein_A  Control   01           59    36      62       0.0167      132594.
#> # ℹ 150 more rows
#> # ℹ 23 more variables: norm_coefficient <dbl>, F_scaled <dbl>,
#> #   p_adj_NPARC <dbl>, a <dbl>, b <dbl>, plateau <dbl>, melt_point <dbl>,
#> #   infl_point <dbl>, slope <dbl>, R_sq <dbl>, RSS <dbl>, sigma <dbl>,
#> #   n_coeffs <int>, n_obs <int>, log_lik <dbl>, AICc <dbl>, Comparison <chr>,
#> #   diff_melt_point <dbl>, min_comparison_slope <dbl>, min_R_sq <dbl>,
#> #   min_slope <dbl>, max_control_plateau <dbl>, adj_pvalue <dbl>

# Statistics only
one_prot_stats <- 
  four_prot_analysed |>
  dplyr::filter(Protein_ID == "Protein_A", Condition != "Control") |>
  dplyr::select(Protein_ID, 
                Condition, 
                Replicate, 
                F_scaled, 
                p_adj_NPARC,
                melt_point,
                diff_melt_point,
                adj_pvalue
                ) |>
  dplyr::distinct()

# 1. Melting point
one_prot_stats |>
  dplyr::select(Protein_ID, 
                Condition, 
                Replicate,
                melt_point,
                diff_melt_point,
                adj_pvalue
                )
#> # A tibble: 2 × 6
#>   Protein_ID Condition Replicate melt_point diff_melt_point adj_pvalue
#>   <chr>      <chr>     <chr>          <dbl>           <dbl>      <dbl>
#> 1 Protein_A  Treated   01              51.3            1.63      0.604
#> 2 Protein_A  Treated   02              50.5            1.36      0.604

# 2. NPARC
one_prot_stats |>
  dplyr::select(Protein_ID, 
                Condition,
                F_scaled, 
                p_adj_NPARC,
                ) |>
  dplyr::distinct()
#> # A tibble: 1 × 4
#>   Protein_ID Condition F_scaled p_adj_NPARC
#>   <chr>      <chr>        <dbl>       <dbl>
#> 1 Protein_A  Treated       4.52     0.00179

Hit Identification

get_TPP_hits filters and summarises analysed data. Hits, by default, have an NPARC-derived, FDR-adjusted p-value below 0.05,2 $\Delta T_m$ in the same direction across replicates, and $\Delta T_m$ for comparisons against controls greater than between controls.1

# Get hits from analysed four-protein data
four_prot_hits <- 
  get_TPP_hits(TPP_data = four_prot_analysed,
               hit_criteria = "default_hit_criteria",
               to_plot = TRUE,
               annotate = "melt_point")
#> --------------------
#> TPP Hit Identification
#> --------------------
#> Hit Criteria:
#>  NPARC_pvalue_threshold DTm_same_sign DTm_gt_Dcontrol
#>                    0.05          TRUE            TRUE
#> --------------------
#> Exporting hit data...
#> --------------------
#> TPP Export
#> --------------------
#> Export format: xlsx 
#> Saving TPP_hits.xlsx ...
#> Saved.
#> --------------------
#> Plotting hit melting curves...
#> 1 hits found.
#> --------------------

four_prot_hits
#> # A tibble: 1 × 10
#>   Protein_ID Condition Comparison         F_scaled p_adj_NPARC max_adj_pvalue
#>   <chr>      <chr>     <chr>                 <dbl>       <dbl>          <dbl>
#> 1 Protein_A  Treated   Treated_vs_Control     4.52     0.00179          0.604
#> # ℹ 4 more variables: mean_melt_point <dbl>, mean_control_melt_point <dbl>,
#> #   mean_diff_melt_point <dbl>, mean_control_diff_melt_point <dbl>

Export

get_TPP_hits exported hit data by default, and full data can also be exported as an excel .xlsx spreadsheet with export_TPP. Alternatives are delimited text files (.csv, .tsv) or R data.

# Export as .xlsx
export_TPP(TPP_data = four_prot_analysed,
           file_name = "TPP_results.xlsx",
           format = "xlsx")

References

1. Savitski, M. M. et al. Tracking cancer drugs in living cells by thermal profiling of the proteome. Science 346, (2014).

2. Childs, D. et al. Nonparametric analysis of thermal proteome profiles reveals novel drug-binding proteins*. Molecular & Cellular Proteomics 18, 2506–2515 (2019).

About

A pipeline to analyse Thermal Protein Profiling (TPP) proteomics experiments. Import tools for multiple proteomics software analyses including DDA, DIA, TMT and LFQ. Visualisation tools allow plotting melting curves at any point in the pipeline. Output data is easily accessed, displayed and exported in a tidy format.

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages