Commit c33c796

Merge pull request #18 from bigbio/dev
major patch for ms2pip to use pyopenms instead of ms2rescore-rs
2 parents 3d87c3e + a89cad3 commit c33c796

18 files changed (+969, -227 lines changed)

.github/workflows/python-app.yml

Lines changed: 3 additions & 3 deletions
@@ -41,8 +41,8 @@ jobs:
  env:
  CUDA_VISIBLE_DEVICES: "-1"
  run: |
- poetry run pytest
- - name: Download test deeplc_models
+ poetry run pytest -vv
+ - name: Download test files
  env:
  CUDA_VISIBLE_DEVICES: "-1"
  run: |
@@ -57,4 +57,4 @@ jobs:
  uses: codecov/codecov-action@v5
  with:
  token: ${{ secrets.CODECOV_TOKEN }}
- slug: bigbio/quantms-rescoring
+ slug: bigbio/quantms-rescoring

.github/workflows/python-package.yml

Lines changed: 2 additions & 2 deletions
@@ -43,7 +43,7 @@ jobs:
  env:
  CUDA_VISIBLE_DEVICES: "-1"
  run: |
- pytest --cov-branch --cov-report=xml
+ pytest -vv --cov-branch --cov-report=xml
  - name: Upload coverage reports to Codecov
  uses: codecov/codecov-action@v5
  with:
@@ -53,4 +53,4 @@ jobs:
  env:
  CUDA_VISIBLE_DEVICES: "-1"
  run: |
- rescoring --help
+ rescoring --help

README.md

Lines changed: 42 additions & 5 deletions
@@ -7,13 +7,13 @@

  quantms-rescoring is a Python tool that aims to add features to peptide-spectrum matches (PSMs) in idXML files using multiple tools including SAGE features, quantms spectrum features, MS2PIP and DeepLC. It is part of the quantms ecosystem package and leverages the MS²Rescore framework to improve identification confidence in proteomics data analysis.

- ### Core Components
+ ## Core Components

  - **Annotator Engine**: Integrates [MS2PIP](https://github.com/compomics/ms2pip) and [DeepLC](https://github.com/compomics/DeepLC) models to improve peptide-spectrum match (PSM) confidence.
  - **Feature Generation**: Extracts signal-to-noise ratios, spectrum metrics, SAGE extra features and add them to each PSM for posterior downstream with Percolator.
  - **OpenMS Integration**: Processes idXML and mzML files with custom validation methods.

- ### CLI Tools
+ ## CLI Tools

  ```sh
  quantms-rescoring msrescore2feature --help
@@ -30,7 +30,45 @@ Incorporates additional features from SAGE into idXML files.
  ```
  Add additional spectrum feature like signal-to-noise to each PSM in the idXML.

- ### Technical Implementation Details
+ ## Advanced Algorithms and Improvements
+
+ quantms-rescoring significantly enhances the capabilities of MS2PIP, DeepLC, and MS2Rescore through several innovative approaches:
+
+ ### MS2PIP Integration Enhancements
+
+ - **Intelligent Model Selection**: Automatically evaluates and selects the optimal MS2PIP model for each dataset based on fragmentation type and correlation quality. If the user-selected model performs poorly, the system will intelligently search for a better alternative.
+ - **Adaptive MS2 Tolerance**: Dynamically adjusts MS2 tolerance based on the dataset characteristics, analyzing both reported and predicted tolerances to find the optimal setting.
+ - **Correlation Validation**: Implements a robust validation system that ensures the selected model achieves sufficient correlation with experimental spectra, preventing the use of inappropriate models.
+ - **Enhanced Spectrum Processing**: Uses OpenMS for spectrum file reading instead of ms2rescore_rs, providing better compatibility with a wider range of mzML files and formats.
+
+ ### DeepLC Innovations
+
+ - **Model Optimization**: Automatically benchmarks pretrained vs. retrained DeepLC models for each dataset, selecting the one with the lowest Mean Absolute Error (MAE) for retention time prediction.
+ - **Per-Run Calibration**: Calibrates DeepLC models for each run to account for chromatographic variations between experiments, improving prediction accuracy.
+ - **Best Peptide Retention Time**: Tracks the best retention time prediction for each peptide across multiple PSMs, providing more reliable retention time features.
+ - **Transfer Learning**: Leverages transfer learning to adapt models to specific experimental conditions, improving prediction accuracy for challenging datasets.
+
+ ### Spectrum Feature Analysis
+
+ Unlike traditional rescoring approaches, quantms-rescoring incorporates advanced spectrum quality metrics:
+
+ - **Signal-to-Noise Ratio (SNR)**: Calculates the ratio of maximum intensity to background noise, providing a robust measure of spectrum quality.
+ - **Spectral Entropy**: Quantifies the uniformity of peak distribution, helping to distinguish between high and low-quality spectra.
+ - **TIC Distribution Analysis**: Analyzes the distribution of Total Ion Current across peaks, identifying spectra with concentrated signal in top peaks.
+ - **Weighted m/z Standard Deviation**: Estimates spectral complexity by calculating the intensity-weighted standard deviation of m/z values.
+
+ ### SAGE Feature Integration
+
+ - **Seamless Integration**: Incorporates additional features from SAGE (Spectrum Agnostic Generation of Embeddings) into the rescoring pipeline.
+ - **Feature Validation**: Ensures all features are properly validated and formatted for compatibility with OpenMS and downstream tools.
+
+ ### Advantages Over Existing Tools
+
+ - **Compared to MS2PIP**: Adds automatic model selection, validation, and tolerance optimization, eliminating the need for manual parameter tuning.
+ - **Compared to DeepLC**: Provides automatic model selection between pretrained and retrained models, with per-run calibration for improved accuracy.
+ - **Compared to MS2Rescore**: Offers a more comprehensive feature set including spectrum quality metrics, better integration with OpenMS, and improved handling of different fragmentation methods and MS levels.
+
+ ## Technical Implementation Details

  #### Model Selection and Optimization

@@ -227,5 +265,4 @@ Install quantms-rescoring using one of the following methods:

  ### Issues and Contributions

- For any issues or contributions, please open an issue in the [GitHub repository](https://github.com/bigbio/quantms/issues) - we use the quantms repo to control all issues—or PR in the [GitHub repository](https://github.com/bigbio/quantms-rescoring/pulls).
-
+ For any issues or contributions, please open an issue in the [GitHub repository](https://github.com/bigbio/quantms/issues) - we use the quantms repo to control all issues—or PR in the [GitHub repository](https://github.com/bigbio/quantms-rescoring/pulls).
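
Note on the README's "Spectrum Feature Analysis" list: the four metrics it names can be sketched in a few lines of plain Python. This is an illustration only, not the code in this commit. The function name `spectrum_quality_metrics`, the `top_n` parameter, the use of the median intensity as the background-noise estimate, and the natural-log entropy are all assumptions made for the example; the README only states that SNR is the ratio of maximum intensity to background noise and that the m/z spread is intensity-weighted.

```python
import math
from statistics import median


def spectrum_quality_metrics(mz, intensity, top_n=10):
    """Return illustrative quality metrics for one centroided spectrum.

    Hypothetical helper: the names and exact definitions are assumptions,
    not the quantms-rescoring implementation.
    """
    if len(mz) != len(intensity) or len(mz) == 0:
        raise ValueError("mz and intensity must be non-empty and equal in length")
    tic = sum(intensity)  # total ion current
    if tic <= 0:
        raise ValueError("total ion current must be positive")

    # Assumption: background noise is approximated by the median intensity.
    noise = median(intensity)
    snr = max(intensity) / noise if noise > 0 else float("inf")

    # Shannon entropy (natural log) of the TIC-normalised intensities.
    probs = [i / tic for i in intensity if i > 0]
    entropy = -sum(p * math.log(p) for p in probs)

    # Fraction of the TIC carried by the top_n most intense peaks.
    top_fraction = sum(sorted(intensity, reverse=True)[:top_n]) / tic

    # Intensity-weighted mean and standard deviation of m/z.
    mean_mz = sum(m * i for m, i in zip(mz, intensity)) / tic
    var_mz = sum(i * (m - mean_mz) ** 2 for m, i in zip(mz, intensity)) / tic

    return {
        "snr": snr,
        "spectral_entropy": entropy,
        "tic_top_fraction": top_fraction,
        "weighted_mz_std": math.sqrt(var_mz),
    }
```

Under these definitions, a spectrum whose signal sits in a few peaks gives a high `tic_top_fraction` and a low `spectral_entropy`. A flat spectrum gives the opposite.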

environment.yml

Lines changed: 0 additions & 1 deletion
@@ -18,4 +18,3 @@ dependencies:
  - protobuf
  - pytest
  - ms2pip>=4.0
- - ms2rescore-rs
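
Note on the dependency removal: with `ms2rescore-rs` dropped, spectrum reading goes through pyopenms. A minimal sketch of reading MS2 peaks from an mzML file with pyopenms could look like the following. The generator name `read_ms2_peaks` and its return shape are hypothetical and chosen for illustration; the reader this commit adds to the package is not shown on this page and may differ.

```python
def read_ms2_peaks(mzml_path, ms_level=2):
    """Yield (native_id, rt_seconds, mz_array, intensity_array) per spectrum.

    Hypothetical helper for illustration; not the quantms-rescoring reader.
    """
    # Imported lazily so this module can be imported without pyopenms installed.
    from pyopenms import MSExperiment, MzMLFile

    exp = MSExperiment()
    MzMLFile().load(str(mzml_path), exp)  # loads the whole file into memory

    for spec in exp.getSpectra():
        if spec.getMSLevel() != ms_level:
            continue  # skip MS1 (or other) scans
        mz, intensity = spec.get_peaks()  # two NumPy arrays of equal length
        yield spec.getNativeID(), spec.getRT(), mz, intensity
```

Usage would be along the lines of `for native_id, rt, mz, inten in read_ms2_peaks("run.mzML"): ...`, where `run.mzML` is a placeholder path.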

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ name = "quantms-rescoring"
  description = "quantms-rescoring: Python scripts and helpers for the quantMS workflow"
  readme = "README.md"
  license = "MIT"
- version = "0.0.5"
+ version = "0.0.6"
  authors = [
  "Yasset Perez-Riverol <ypriverol@gmail.com>",
  "Dai Chengxin <chengxin2024@126.com>",

quantmsrescore/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
- __version__ = "0.0.5"
+ __version__ = "0.0.6"
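
Note on the version bump: the version string lives in both `pyproject.toml` and `quantmsrescore/__init__.py`, and both are bumped to 0.0.6 here. A small check like the one below could confirm that the two stay in sync. The function names are hypothetical. The regular expressions assume the simple `version = "..."` and `__version__ = "..."` forms shown in this diff, and the first one would also match a `version` key in any other TOML table.

```python
import re


def pyproject_version(pyproject_text):
    """Return the first `version = "..."` value found at the start of a line."""
    match = re.search(r'(?m)^version\s*=\s*"([^"]+)"', pyproject_text)
    return match.group(1) if match else None


def module_version(init_text):
    """Return the `__version__ = "..."` value from a module's source text."""
    match = re.search(r'(?m)^__version__\s*=\s*"([^"]+)"', init_text)
    return match.group(1) if match else None


def versions_match(pyproject_text, init_text):
    """True only when both versions are present and identical."""
    project = pyproject_version(pyproject_text)
    return project is not None and project == module_version(init_text)
```

In a test suite this would be called with the contents of the two files read from disk.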
