README.md: 42 additions & 5 deletions
@@ -7,13 +7,13 @@
 
 quantms-rescoring is a Python tool that adds features to peptide-spectrum matches (PSMs) in idXML files from multiple sources, including SAGE features, quantms spectrum features, MS2PIP, and DeepLC. It is part of the quantms ecosystem and leverages the MS²Rescore framework to improve identification confidence in proteomics data analysis.
 
-###Core Components
+## Core Components
 
 - **Annotator Engine**: Integrates [MS2PIP](https://github.com/compomics/ms2pip) and [DeepLC](https://github.com/compomics/DeepLC) models to improve peptide-spectrum match (PSM) confidence.
 - **Feature Generation**: Extracts signal-to-noise ratios, spectrum metrics, and SAGE extra features, and adds them to each PSM for downstream processing with Percolator.
 - **OpenMS Integration**: Processes idXML and mzML files with custom validation methods.
 
-###CLI Tools
+## CLI Tools
 
 ```sh
 quantms-rescoring msrescore2feature --help
@@ -30,7 +30,45 @@ Incorporates additional features from SAGE into idXML files.
 ```
 Adds additional spectrum features, such as signal-to-noise, to each PSM in the idXML.
 
-### Technical Implementation Details
+## Advanced Algorithms and Improvements
+
+quantms-rescoring builds on MS2PIP, DeepLC, and MS2Rescore with the following enhancements:
+
+### MS2PIP Integration Enhancements
+
+- **Intelligent Model Selection**: Automatically evaluates and selects the best MS2PIP model for each dataset based on fragmentation type and correlation quality. If the user-selected model performs poorly, the tool searches for a better alternative.
+- **Adaptive MS2 Tolerance**: Dynamically adjusts the MS2 tolerance based on dataset characteristics, analyzing both reported and predicted tolerances to find a suitable setting.
+- **Correlation Validation**: Checks that the selected model achieves sufficient correlation with experimental spectra, preventing the use of inappropriate models.
+- **Enhanced Spectrum Processing**: Uses OpenMS for spectrum file reading instead of ms2rescore_rs, providing better compatibility with a wider range of mzML files and formats.
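
As a rough illustration of correlation-based model selection and validation, the sketch below scores each candidate model by the median Pearson correlation between its predicted fragment intensities and the observed ones, and rejects every model if the best score falls below a cutoff. The function names (`pearson`, `select_model`), the input layout, and the 0.6 cutoff are assumptions made for this example; they are not taken from the quantms-rescoring code base or the MS2PIP API.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_model(predictions, observed, min_correlation=0.6):
    """Pick the model with the highest median per-PSM correlation.

    predictions: {model_name: [predicted intensities for each PSM]}
    observed:    [observed intensities for each PSM], same order as above
    Returns (model_name, score), or (None, score) when even the best
    model stays below `min_correlation` (a hypothetical cutoff).
    """
    scores = {
        name: median(pearson(pred, obs) for pred, obs in zip(per_psm, observed))
        for name, per_psm in predictions.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_correlation:
        return None, scores[best]
    return best, scores[best]
```

Returning `None` instead of silently falling back keeps the "no suitable model" case explicit, so the caller can decide whether to skip the MS2PIP features or stop.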
+
+### DeepLC Innovations
+
+- **Model Optimization**: Automatically benchmarks pretrained vs. retrained DeepLC models for each dataset, selecting the one with the lowest Mean Absolute Error (MAE) for retention time prediction.
+- **Per-Run Calibration**: Calibrates DeepLC models for each run to account for chromatographic variations between experiments, improving prediction accuracy.
+- **Best Peptide Retention Time**: Tracks the best retention time prediction for each peptide across multiple PSMs, providing more reliable retention time features.
+- **Transfer Learning**: Uses transfer learning to adapt models to specific experimental conditions, improving prediction accuracy for challenging datasets.
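
The MAE-based choice between pretrained and retrained models, and the per-peptide "best" retention time, can be sketched as follows. This is a minimal example under stated assumptions: the helper names (`mean_absolute_error`, `pick_lowest_mae`, `best_rt_error_per_peptide`) and the plain-list inputs are hypothetical, and "best" is interpreted here as the smallest absolute error between observed and predicted retention time. None of this calls DeepLC itself.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute difference between observed and predicted retention times."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pick_lowest_mae(observed_rt, predictions_by_model):
    """Return (best_model_name, {model_name: mae}) for the given predictions.

    predictions_by_model: {model_name: [predicted RT for each PSM]}
    """
    maes = {
        name: mean_absolute_error(observed_rt, predicted)
        for name, predicted in predictions_by_model.items()
    }
    best = min(maes, key=maes.get)
    return best, maes


def best_rt_error_per_peptide(psms):
    """Keep the smallest absolute RT error seen for each peptide.

    psms: iterable of (peptide, observed_rt, predicted_rt) tuples.
    """
    best = {}
    for peptide, observed, predicted in psms:
        error = abs(observed - predicted)
        if peptide not in best or error < best[peptide]:
            best[peptide] = error
    return best
```

For example, `pick_lowest_mae(observed, {"pretrained": a, "retrained": b})` returns the name with the lower MAE together with both scores, which makes the decision easy to log per run.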
+
+### Spectrum Feature Analysis
+
+Unlike traditional rescoring approaches, quantms-rescoring incorporates additional spectrum quality metrics:
+
+- **Signal-to-Noise Ratio (SNR)**: Calculates the ratio of maximum intensity to background noise, providing a robust measure of spectrum quality.
+- **Spectral Entropy**: Quantifies the uniformity of the peak intensity distribution, helping to distinguish high-quality from low-quality spectra.
+- **TIC Distribution Analysis**: Analyzes the distribution of Total Ion Current across peaks, identifying spectra with concentrated signal in top peaks.
+- **Weighted m/z Standard Deviation**: Estimates spectral complexity by calculating the intensity-weighted standard deviation of m/z values.
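
The four metrics above can be written down compactly. The sketch below is one possible formulation, not the tool's actual implementation: the function name `spectrum_metrics`, the use of the median intensity as the noise estimate, the natural logarithm for the entropy, and the default `top_n=10` are all assumptions made for illustration.

```python
import math
from statistics import median


def spectrum_metrics(mz, intensity, top_n=10):
    """Compute simple quality metrics for one spectrum.

    mz, intensity: equal-length lists of peak m/z values and intensities.
    """
    tic = sum(intensity)  # total ion current
    if not intensity or tic <= 0:
        raise ValueError("spectrum has no signal")

    # SNR: maximum intensity over a noise estimate (assumed: median intensity).
    noise = median(intensity)
    snr = max(intensity) / noise if noise > 0 else 0.0

    # Spectral entropy: Shannon entropy of the TIC-normalized intensities.
    entropy = -sum((i / tic) * math.log(i / tic) for i in intensity if i > 0)

    # TIC distribution: fraction of the TIC carried by the top_n peaks.
    top_fraction = sum(sorted(intensity, reverse=True)[:top_n]) / tic

    # Intensity-weighted standard deviation of m/z.
    mean_mz = sum(m * i for m, i in zip(mz, intensity)) / tic
    variance = sum(i * (m - mean_mz) ** 2 for m, i in zip(mz, intensity)) / tic

    return {
        "snr": snr,
        "spectral_entropy": entropy,
        "tic_top_fraction": top_fraction,
        "weighted_mz_std": math.sqrt(variance),
    }
```

A spectrum with a few dominant peaks gives a high `tic_top_fraction` and a low entropy, while a flat, noise-like spectrum gives the opposite pattern.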
+
+### SAGE Feature Integration
+
+- **Seamless Integration**: Incorporates additional features from the SAGE search engine into the rescoring pipeline.
+- **Feature Validation**: Ensures all features are properly validated and formatted for compatibility with OpenMS and downstream tools.
+
+### Advantages Over Existing Tools
+
+- **Compared to MS2PIP**: Adds automatic model selection, validation, and tolerance optimization, eliminating the need for manual parameter tuning.
+- **Compared to DeepLC**: Provides automatic model selection between pretrained and retrained models, with per-run calibration for improved accuracy.
+- **Compared to MS2Rescore**: Offers a more comprehensive feature set including spectrum quality metrics, better integration with OpenMS, and improved handling of different fragmentation methods and MS levels.
+
+## Technical Implementation Details
 
 #### Model Selection and Optimization
 
@@ -227,5 +265,4 @@ Install quantms-rescoring using one of the following methods:
 
 ### Issues and Contributions
 
-For any issues or contributions, please open an issue in the [GitHub repository](https://github.com/bigbio/quantms/issues) (we use the quantms repo to track all issues) or a PR in the [GitHub repository](https://github.com/bigbio/quantms-rescoring/pulls).
-
+For any issues or contributions, please open an issue in the [GitHub repository](https://github.com/bigbio/quantms/issues) (we use the quantms repo to track all issues) or a PR in the [GitHub repository](https://github.com/bigbio/quantms-rescoring/pulls).