|
5 | 5 | [](https://badge.fury.io/py/quantms-rescoring)
|
6 | 6 | [](https://opensource.org/licenses/Apache-2.0)
|
7 | 7 |
|
8 |
| -quantms-rescoring is a Python tool for rescoring peptide-spectrum matches (PSMs) in idXML files. It is part of the quantms ecosystem package and leverages the MS²Rescore framework to improve identification confidence in proteomics data analysis. |
| 8 | +quantms-rescoring is a Python tool that aims to add features to peptide-spectrum matches (PSMs) in idXML files using multiple tools including SAGE features, quantms spectrum features, MS2PIP and DeepLC. It is part of the quantms ecosystem package and leverages the MS²Rescore framework to improve identification confidence in proteomics data analysis. |
9 | 9 |
|
10 |
| -## Features |
| 10 | +### Core Components |
11 | 11 |
|
12 |
| -- Enhanced Rescoring: Utilizes advanced rescoring engines like Percolator to refine PSM scores. |
13 |
| -- Flexible Feature Generators: Supports feature extraction using tools like MS²PIP, DeepLC, and custom generators. |
14 |
| -- Metadata Retention: Preserves essential metadata from the input idXML files. |
15 |
| -- Error Handling: Skips invalid PSMs and logs issues for transparent processing. |
16 |
| -- Seamless Integration: Built to integrate into proteomics workflows. |
| 12 | +- **Annotator Engine**: Integrates [MS2PIP](https://github.com/compomics/ms2pip) and [DeepLC](https://github.com/compomics/DeepLC) models to improve peptide-spectrum match (PSM) confidence. |
| 13 | +- **Feature Generation**: Extracts signal-to-noise ratios, spectrum metrics, and extra SAGE features and adds them to each PSM for downstream rescoring with Percolator.
| 14 | +- **OpenMS Integration**: Processes idXML and mzML files with custom validation methods. |
17 | 15 |
|
18 |
| -## Installation |
| 16 | +### CLI Tools |
19 | 17 |
|
20 |
| -To use quantms-rescoring, ensure the following dependencies are installed: |
| 18 | +```sh |
| 19 | + quantms-rescoring msrescore2feature --help |
| 20 | +``` |
| 21 | +Annotates PSMs with prediction-based features from MS2PIP and DeepLC.
21 | 22 |
|
22 |
| -- Python 3.8+ |
23 |
| -- click |
24 |
| -- pyopenms |
25 |
| -- ms2rescore |
26 |
| -- psm_utils |
| 23 | +```sh |
| 24 | + quantms-rescoring add_sage_feature --help |
| 25 | +``` |
| 26 | +Incorporates additional features from SAGE into idXML files. |
| 27 | + |
| 28 | +```sh |
| 29 | + quantms-rescoring spectrum2feature --help |
| 30 | +``` |
| 31 | +Adds spectrum features such as signal-to-noise ratio to each PSM in the idXML file.
| 32 | + |
| 33 | +### Technical Implementation Details |
| 34 | + |
| 35 | +#### Model Selection and Optimization |
| 36 | + |
| 37 | +- **MS2PIP Model Selection**: |
| 38 | + - Automatically evaluates the quality of the MS2PIP model selected by the user. If the correlation between predicted and experimental spectra is lower than a given threshold, the tool tries to find the best model to use (`annotator.py`).
| 39 | +- **DeepLC Model Selection**: |
| 40 | + - Automatically selects the best DeepLC model for each run based on retention time calibration and prediction accuracy. Unlike ms2rescore, the tool takes the best model from MS2PIP and benchmarks it against the same model retrained with transfer learning (`annotator.py`). The better-performing model is then used to predict the retention times of PSMs.
| 41 | + |
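The benchmarking idea behind the DeepLC model selection can be sketched as follows. This is a minimal illustration, not the code in `annotator.py`: it assumes prediction accuracy is measured as the mean absolute error between predicted and observed retention times, and the model names and values are hypothetical.

```python
from statistics import mean


def mean_abs_error(predicted, observed):
    """Mean absolute difference between predicted and observed retention times."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def select_best_model(predictions_by_model, observed):
    """Return (model_name, error) for the model with the lowest error."""
    errors = {
        name: mean_abs_error(predicted, observed)
        for name, predicted in predictions_by_model.items()
    }
    best = min(errors, key=errors.get)
    return best, errors[best]


# Hypothetical retention times (minutes) for four PSMs in one run.
observed = [10.0, 20.0, 30.0, 40.0]
predictions_by_model = {
    "base_model": [12.0, 23.0, 28.0, 44.0],
    "transfer_learned": [10.5, 19.0, 31.0, 40.5],
}

best_name, best_error = select_best_model(predictions_by_model, observed)
print(best_name, best_error)
```

The same pattern would apply to the MS2PIP check if the error were replaced by a spectrum correlation and compared against a threshold.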
| 42 | +#### Feature Engineering Pipeline |
| 43 | + |
| 44 | +- **Retention Time Analysis**: |
| 45 | + - Calibrates DeepLC models per run to account for chromatographic variations. |
| 46 | + - Calculates delta RT (predicted vs. observed) as a discriminative feature |
| 47 | + - Normalizes RT differences for cross-run comparability |
| 48 | + |
| 49 | +- **Spectral Feature Extraction**: |
| 50 | + - Computes signal-to-noise ratio using maximum intensity relative to background noise |
| 51 | + - Calculates spectral entropy to quantify peak distribution uniformity |
| 52 | + - Analyzes TIC (Total Ion Current) distribution across peaks for quality assessment |
| 53 | + - Determines weighted standard deviation of m/z values for spectral complexity estimation |
| 54 | +- **Feature Selection**: The `only_features` parameter selects which features are added to the idXML file. For example: `--only_features "DeepLC:RtDiff,DeepLC:PredictedRetentionTimeBest,Ms2pip:DotProd"`.
| 55 | + |
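The four spectral features listed above can be illustrated with a short, dependency-free sketch. The exact definitions used by quantms-rescoring are not given here, so the choices below are assumptions: background noise is estimated as the median intensity, entropy is the Shannon entropy of TIC-normalized intensities, and the m/z standard deviation is weighted by intensity. The function name `spectrum_features` is hypothetical.

```python
import math
from statistics import median

TOP_N = 10  # number of most intense peaks used for the TIC fraction


def spectrum_features(mz, intensity):
    """Compute illustrative quality features for one spectrum."""
    if not mz or len(mz) != len(intensity):
        raise ValueError("mz and intensity must be non-empty and of equal length")
    tic = sum(intensity)  # total ion current
    if tic <= 0:
        raise ValueError("spectrum has no signal")

    # Assumption: background noise is approximated by the median intensity.
    noise = median(intensity)
    snr = max(intensity) / noise if noise > 0 else 0.0

    # Shannon entropy of the normalized intensity distribution.
    probs = [i / tic for i in intensity if i > 0]
    entropy = -sum(p * math.log(p) for p in probs)

    # Fraction of the TIC carried by the TOP_N most intense peaks.
    top = sorted(intensity, reverse=True)[:TOP_N]
    fraction_tic_top = sum(top) / tic

    # Intensity-weighted standard deviation of m/z values.
    mean_mz = sum(m * i for m, i in zip(mz, intensity)) / tic
    variance = sum(i * (m - mean_mz) ** 2 for m, i in zip(mz, intensity)) / tic

    return {
        "snr": snr,
        "spectral_entropy": entropy,
        "fraction_tic_top_10": fraction_tic_top,
        "weighted_std_mz": math.sqrt(variance),
    }


features = spectrum_features(
    mz=[100.0, 200.0, 300.0, 400.0],
    intensity=[10.0, 10.0, 10.0, 10.0],
)
print(features)
```

A flat spectrum like the one in the example gives the maximum entropy for its peak count and a signal-to-noise ratio of 1, which is why low entropy and high SNR tend to indicate cleaner spectra.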
| 56 | +##### Features |
| 57 | + |
| 58 | +<details> |
| 59 | +<summary>MS2PIP Feature Mapping Table</summary> |
| 60 | + |
| 61 | +| MS2Rescore MS2PIP Feature | quantms-rescoring Name |
| 62 | +|--------------------------------|-----------------------------------| |
| 63 | +| spec_pearson | MS2PIP:SpecPearson | |
| 64 | +| cos_norm | MS2PIP:SpecCosineNorm | |
| 65 | +| spec_pearson_norm | MS2PIP:SpecPearsonNorm | |
| 66 | +| dotprod | MS2PIP:DotProd | |
| 67 | +| ionb_pearson_norm | MS2PIP:IonBPearsonNorm | |
| 68 | +| iony_pearson_norm | MS2PIP:IonYPearsonNorm | |
| 69 | +| spec_mse_norm | MS2PIP:SpecMseNorm | |
| 70 | +| ionb_mse_norm | MS2PIP:IonBMseNorm | |
| 71 | +| iony_mse_norm | MS2PIP:IonYMseNorm | |
| 72 | +| min_abs_diff_norm | MS2PIP:MinAbsDiffNorm | |
| 73 | +| max_abs_diff_norm | MS2PIP:MaxAbsDiffNorm | |
| 74 | +| abs_diff_Q1_norm | MS2PIP:AbsDiffQ1Norm | |
| 75 | +| abs_diff_Q2_norm | MS2PIP:AbsDiffQ2Norm | |
| 76 | +| abs_diff_Q3_norm | MS2PIP:AbsDiffQ3Norm | |
| 77 | +| mean_abs_diff_norm | MS2PIP:MeanAbsDiffNorm | |
| 78 | +| std_abs_diff_norm | MS2PIP:StdAbsDiffNorm | |
| 79 | +| ionb_min_abs_diff_norm | MS2PIP:IonBMinAbsDiffNorm | |
| 80 | +| ionb_max_abs_diff_norm | MS2PIP:IonBMaxAbsDiffNorm | |
| 81 | +| ionb_abs_diff_Q1_norm | MS2PIP:IonBAbsDiffQ1Norm | |
| 82 | +| ionb_abs_diff_Q2_norm | MS2PIP:IonBAbsDiffQ2Norm | |
| 83 | +| ionb_abs_diff_Q3_norm | MS2PIP:IonBAbsDiffQ3Norm | |
| 84 | +| ionb_mean_abs_diff_norm | MS2PIP:IonBMeanAbsDiffNorm | |
| 85 | +| ionb_std_abs_diff_norm | MS2PIP:IonBStdAbsDiffNorm | |
| 86 | +| iony_min_abs_diff_norm | MS2PIP:IonYMinAbsDiffNorm | |
| 87 | +| iony_max_abs_diff_norm | MS2PIP:IonYMaxAbsDiffNorm | |
| 88 | +| iony_abs_diff_Q1_norm | MS2PIP:IonYAbsDiffQ1Norm | |
| 89 | +| iony_abs_diff_Q2_norm | MS2PIP:IonYAbsDiffQ2Norm | |
| 90 | +| iony_abs_diff_Q3_norm | MS2PIP:IonYAbsDiffQ3Norm | |
| 91 | +| iony_mean_abs_diff_norm | MS2PIP:IonYMeanAbsDiffNorm | |
| 92 | +| iony_std_abs_diff_norm | MS2PIP:IonYStdAbsDiffNorm | |
| 93 | +| dotprod_norm | MS2PIP:DotProdNorm | |
| 94 | +| dotprod_ionb_norm | MS2PIP:DotProdIonBNorm | |
| 95 | +| dotprod_iony_norm | MS2PIP:DotProdIonYNorm | |
| 96 | +| cos_ionb_norm | MS2PIP:CosIonBNorm | |
| 97 | +| cos_iony_norm | MS2PIP:CosIonYNorm | |
| 98 | +| ionb_pearson | MS2PIP:IonBPearson | |
| 99 | +| iony_pearson | MS2PIP:IonYPearson | |
| 100 | +| spec_spearman | MS2PIP:SpecSpearman | |
| 101 | +| ionb_spearman | MS2PIP:IonBSpearman | |
| 102 | +| iony_spearman | MS2PIP:IonYSpearman | |
| 103 | +| spec_mse | MS2PIP:SpecMse | |
| 104 | +| ionb_mse | MS2PIP:IonBMse | |
| 105 | +| iony_mse | MS2PIP:IonYMse | |
| 106 | +| min_abs_diff_iontype | MS2PIP:MinAbsDiffIonType | |
| 107 | +| max_abs_diff_iontype | MS2PIP:MaxAbsDiffIonType | |
| 108 | +| min_abs_diff | MS2PIP:MinAbsDiff | |
| 109 | +| max_abs_diff | MS2PIP:MaxAbsDiff | |
| 110 | +| abs_diff_Q1 | MS2PIP:AbsDiffQ1 | |
| 111 | +| abs_diff_Q2 | MS2PIP:AbsDiffQ2 | |
| 112 | +| abs_diff_Q3 | MS2PIP:AbsDiffQ3 | |
| 113 | +| mean_abs_diff | MS2PIP:MeanAbsDiff | |
| 114 | +| std_abs_diff | MS2PIP:StdAbsDiff | |
| 115 | +| ionb_min_abs_diff | MS2PIP:IonBMinAbsDiff | |
| 116 | +| ionb_max_abs_diff | MS2PIP:IonBMaxAbsDiff | |
| 117 | +| ionb_abs_diff_Q1 | MS2PIP:IonBAbsDiffQ1 | |
| 118 | +| ionb_abs_diff_Q2 | MS2PIP:IonBAbsDiffQ2 | |
| 119 | +| ionb_abs_diff_Q3 | MS2PIP:IonBAbsDiffQ3 | |
| 120 | +| ionb_mean_abs_diff | MS2PIP:IonBMeanAbsDiff | |
| 121 | +| ionb_std_abs_diff | MS2PIP:IonBStdAbsDiff | |
| 122 | +| iony_min_abs_diff | MS2PIP:IonYMinAbsDiff | |
| 123 | +| iony_max_abs_diff | MS2PIP:IonYMaxAbsDiff | |
| 124 | +| iony_abs_diff_Q1 | MS2PIP:IonYAbsDiffQ1 | |
| 125 | +| iony_abs_diff_Q2 | MS2PIP:IonYAbsDiffQ2 | |
| 126 | +| iony_abs_diff_Q3 | MS2PIP:IonYAbsDiffQ3 | |
| 127 | +| iony_mean_abs_diff | MS2PIP:IonYMeanAbsDiff | |
| 128 | +| iony_std_abs_diff | MS2PIP:IonYStdAbsDiff | |
| 129 | +| dotprod_ionb | MS2PIP:DotProdIonB | |
| 130 | +| dotprod_iony | MS2PIP:DotProdIonY | |
| 131 | +| cos_ionb | MS2PIP:CosIonB | |
| 132 | +| cos_iony | MS2PIP:CosIonY | |
| 133 | + |
| 134 | +</details> |
| 135 | + |
| 136 | +<details> |
| 137 | +<summary>DeepLC Feature Mapping Table</summary> |
| 138 | + |
| 139 | +| MS2Rescore DeepLC Feature | quantms-rescoring Name |
| 140 | +|-------------------------------|-----------------------------------| |
| 141 | +| observed_retention_time | DeepLC:ObservedRetentionTime | |
| 142 | +| predicted_retention_time | DeepLC:PredictedRetentionTime | |
| 143 | +| rt_diff | DeepLC:RtDiff | |
| 144 | +| observed_retention_time_best | DeepLC:ObservedRetentionTimeBest | |
| 145 | +| predicted_retention_time_best | DeepLC:PredictedRetentionTimeBest | |
| 146 | +| rt_diff_best | DeepLC:RtDiffBest | |
| 147 | + |
| 148 | +</details> |
| 149 | + |
| 150 | +<details> |
| 151 | +<summary>Spectrum Feature Mapping Table</summary> |
| 152 | + |
| 153 | +| Spectrum Feature | quantms-rescoring Name | |
| 154 | +|---------------------|-----------------------------------| |
| 155 | +| snr | Quantms:Snr | |
| 156 | +| spectral_entropy | Quantms:SpectralEntropy | |
| 157 | +| fraction_tic_top_10 | Quantms:FracTICinTop10Peaks | |
| 158 | +| weighted_std_mz | Quantms:WeightedStdMz | |
| 159 | + |
| 160 | +</details> |
| 161 | + |
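As a rough illustration of how the mapping tables and the `only_features` option fit together, the sketch below renames a few raw features and keeps only the requested ones. The dictionary holds a small subset of the tables above; the function `rename_and_select` is hypothetical, and the case-insensitive matching is an assumption made only because the `--only_features` example and the tables differ in capitalization (`Ms2pip` vs. `MS2PIP`).

```python
# Subset of the mapping tables above: raw feature name -> quantms-rescoring name.
FEATURE_NAMES = {
    "spec_pearson": "MS2PIP:SpecPearson",
    "dotprod": "MS2PIP:DotProd",
    "rt_diff": "DeepLC:RtDiff",
    "predicted_retention_time_best": "DeepLC:PredictedRetentionTimeBest",
    "snr": "Quantms:Snr",
}


def rename_and_select(raw_features, only_features=None):
    """Rename known features and optionally keep only the requested names."""
    renamed = {
        FEATURE_NAMES[name]: value
        for name, value in raw_features.items()
        if name in FEATURE_NAMES  # unknown features are dropped in this sketch
    }
    if not only_features:
        return renamed
    # Assumption: names are compared case-insensitively.
    wanted = {name.strip().lower() for name in only_features.split(",") if name.strip()}
    return {name: value for name, value in renamed.items() if name.lower() in wanted}


raw = {"spec_pearson": 0.91, "dotprod": 0.42, "rt_diff": 1.5, "snr": 12.0, "unknown": 1}
selected = rename_and_select(raw, "DeepLC:RtDiff,Ms2pip:DotProd")
print(selected)
```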
| 162 | +#### Data Processing of idXML Files |
| 163 | + |
| 164 | +- **Parallel Processing**: Implements multiprocessing capabilities for handling large datasets efficiently |
| 165 | +- **OpenMS Compatibility Layer**: Custom helper classes gather statistics such as the number of PSMs per MS level and dissociation method.
| 166 | +- **Feature Validation**: Converts all features from MS2PIP, DeepLC, and quantms into OpenMS features with well-established names (`constants.py`).
| 167 | +- **PSM Filtering and Validation**: |
| 168 | + - Filters out PSMs with **missing spectra information** or **empty peaks**.
| 169 | + - Stops the analysis if the input file contains more than one MS level or dissociation method; **only MS2-level spectra are supported**.
| 170 | +- **Output / Input files**: |
| 171 | + - Accepts only the OpenMS formats idXML and mzML as input and exports idXML with the annotated features.
| 172 | + |
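The filtering and validation rules above can be sketched with plain data classes. This is an assumption-laden illustration rather than the tool's OpenMS-based implementation: `Spectrum`, `PSM`, `validate_run`, and `filter_psms` are hypothetical names, and real runs would read these values from the mzML and idXML files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Spectrum:
    ms_level: int
    dissociation: str
    peaks: List[float] = field(default_factory=list)


@dataclass
class PSM:
    peptide: str
    spectrum: Optional[Spectrum] = None


def validate_run(spectra):
    """Raise if the run mixes MS levels or dissociation methods."""
    ms_levels = {s.ms_level for s in spectra}
    methods = {s.dissociation for s in spectra}
    if ms_levels - {2}:
        raise ValueError(f"only MS2 spectra are supported, found levels {sorted(ms_levels)}")
    if len(methods) > 1:
        raise ValueError(f"more than one dissociation method: {sorted(methods)}")


def filter_psms(psms):
    """Split PSMs into those with usable spectra and those to skip."""
    kept, skipped = [], []
    for psm in psms:
        if psm.spectrum is None or not psm.spectrum.peaks:
            skipped.append(psm)  # missing spectrum information or empty peaks
        else:
            kept.append(psm)
    return kept, skipped


psms = [
    PSM("PEPTIDEK", Spectrum(2, "HCD", [101.1, 202.2])),
    PSM("EMPTYPEAKSR", Spectrum(2, "HCD", [])),
    PSM("NOSPECTRUMK", None),
]
validate_run([p.spectrum for p in psms if p.spectrum is not None])
kept, skipped = filter_psms(psms)
print(len(kept), len(skipped))
```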
| 173 | +### Installation |
| 174 | + |
| 175 | +Install quantms-rescoring using one of the following methods: |
| 176 | + |
| 177 | +**Using `pip`** |
| 178 | + |
| 179 | +```sh |
| 180 | +❯ pip install quantms-rescoring |
| 181 | +``` |
| 182 | + |
| 183 | +**Using `conda`** |
| 184 | + |
| 185 | +```sh |
| 186 | +❯ conda install -c bioconda quantms-rescoring |
| 187 | +``` |
| 188 | + |
| 189 | +**Build from source:** |
| 190 | + |
| 191 | +1. Clone the quantms-rescoring repository: |
| 192 | + |
| 193 | + ```sh |
| 194 | + ❯ git clone https://github.com/bigbio/quantms-rescoring |
| 195 | + ``` |
| 196 | + |
| 197 | +2. Navigate to the project directory: |
| 198 | + |
| 199 | + ```sh |
| 200 | + ❯ cd quantms-rescoring |
| 201 | + ``` |
| 202 | + |
| 203 | +3. Install the project dependencies: |
| 204 | + |
| 205 | + - Using `pip`: |
| 206 | + |
| 207 | + ```sh |
| 208 | + ❯ pip install -r requirements.txt |
| 209 | + ``` |
| 210 | + |
| 211 | + - Using `conda`: |
| 212 | + |
| 213 | + ```sh |
| 214 | + ❯ conda env create -f environment.yml |
| 215 | + ``` |
| 216 | + |
| 217 | +4. Install the package using `poetry`: |
| 218 | + |
| 219 | + ```sh |
| 220 | + ❯ poetry install |
| 221 | + ``` |
| 222 | + |
| 223 | +### TODO |
| 224 | + |
| 225 | +- [ ] Add support for multiple files (combined idXML and mzML)
27 | 226 |
|
28 | 227 | ### Issues and Contributions
|
29 | 228 |
|
|