Commit ec3370f

Merge pull request #2 from bigbio/dev
Dev
2 parents 0b0258c + 7e708d4 commit ec3370f

24 files changed: +551,477 -517 lines changed

.github/workflows/python-app.yml

Lines changed: 13 additions & 2 deletions
@@ -38,12 +38,23 @@ jobs:
         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
     - name: Test with pytest
+      env:
+        CUDA_VISIBLE_DEVICES: "-1"
       run: |
         poetry run pytest
-    - name: Download test data
+    - name: Download test deeplc_models
+      env:
+        CUDA_VISIBLE_DEVICES: "-1"
       run: |
         wget https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-ci-github/quantms-utils/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML
         wget https://ftp.pride.ebi.ac.uk/pub/databases/pride/resources/proteomes/quantms-ci-github/quantms-utils/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01_comet.idXML
     - name: Test percolator ms2rescore
+      env:
+        CUDA_VISIBLE_DEVICES: "-1"
       run: |
-        rescoring ms2rescore --psm_file TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01_comet.idXML --spectrum_path TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML --processes 2 --ms2pip_model HCD2021 --feature_generators 'ms2pip,deeplc' --id_decoy_pattern ^rev --test_fdr 0.05
+        rescoring msrescore2feature --idxml TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01_comet.idXML --mzml TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML --processes 2 --ms2pip_model HCD2021 --feature_generators 'ms2pip,deeplc' --id_decoy_pattern ^rev
+    - name: Upload coverage reports to Codecov
+      uses: codecov/codecov-action@v5
+      with:
+        token: ${{ secrets.CODECOV_TOKEN }}
+        slug: bigbio/quantms-rescoring

.github/workflows/python-package.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,8 +39,12 @@ jobs:
3939
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
4040
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
4141
- name: Test with pytest
42+
env:
43+
CUDA_VISIBLE_DEVICES: "-1"
4244
run: |
4345
pytest
4446
- name: Test commandline tool
47+
env:
48+
CUDA_VISIBLE_DEVICES: "-1"
4549
run: |
4650
rescoring --help

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -160,3 +160,5 @@ cython_debug/
160160
# and can be added to the global gitignore or merged into this file. For a more nuclear
161161
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
162162
.idea/
163+
164+
.qodo

README.md

Lines changed: 213 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -5,25 +5,224 @@
55
[![PyPI version](https://badge.fury.io/py/quantms-rescoring.svg)](https://badge.fury.io/py/quantms-rescoring)
66
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
77

8-
quantms-rescoring is a Python tool for rescoring peptide-spectrum matches (PSMs) in idXML files. It is part of the quantms ecosystem package and leverages the MS²Rescore framework to improve identification confidence in proteomics data analysis.
8+
quantms-rescoring is a Python tool that aims to add features to peptide-spectrum matches (PSMs) in idXML files using multiple tools including SAGE features, quantms spectrum features, MS2PIP and DeepLC. It is part of the quantms ecosystem package and leverages the MS²Rescore framework to improve identification confidence in proteomics data analysis.
99

10-
## Features
10+
### Core Components
1111

12-
- Enhanced Rescoring: Utilizes advanced rescoring engines like Percolator to refine PSM scores.
13-
- Flexible Feature Generators: Supports feature extraction using tools like MS²PIP, DeepLC, and custom generators.
14-
- Metadata Retention: Preserves essential metadata from the input idXML files.
15-
- Error Handling: Skips invalid PSMs and logs issues for transparent processing.
16-
- Seamless Integration: Built to integrate into proteomics workflows.
12+
- **Annotator Engine**: Integrates [MS2PIP](https://github.com/compomics/ms2pip) and [DeepLC](https://github.com/compomics/DeepLC) models to improve peptide-spectrum match (PSM) confidence.
13+
- **Feature Generation**: Extracts signal-to-noise ratios, spectrum metrics, SAGE extra features and add them to each PSM for posterior downstream with Percolator.
14+
- **OpenMS Integration**: Processes idXML and mzML files with custom validation methods.
1715

18-
## Installation
16+
### CLI Tools
1917

20-
To use quantms-rescoring, ensure the following dependencies are installed:
18+
```sh
19+
quantms-rescoring msrescore2feature --help
20+
```
21+
Annotates PSMs with prediction-based features from MS2PIP and DeepLC
2122

22-
- Python 3.8+
23-
- click
24-
- pyopenms
25-
- ms2rescore
26-
- psm_utils
23+
```sh
24+
quantms-rescoring add_sage_feature --help
25+
```
26+
Incorporates additional features from SAGE into idXML files.
27+
28+
```sh
29+
quantms-rescoring spectrum2feature --help
30+
```
31+
Add additional spectrum feature like signal-to-noise to each PSM in the idXML.
32+
33+
### Technical Implementation Details
34+
35+
#### Model Selection and Optimization
36+
37+
- **MS2PIP Model Selection**:
38+
- Automatically evaluate the quality of the MS2PIP model selected by the user. If the correlation between predicted and experiemtanl spectra is lower than a given threshold, we will try to find the best model to use (`annotator.py`)
39+
- **DeepLC Model Selection**:
40+
- Automatically select the best DeepLC model for each run based on the retention time calibration and prediction accuracy. Different to ms2rescore, the tool will try to use the best model from MS2PIP and benchmark it with the same model by using transfer learning (`annotator.py`). The best model is selected to be used to predict the retention time of PSMs.
41+
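The MS2PIP model check described in this README hunk can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in `annotator.py`: the function names `pearson` and `choose_model`, the use of the median as the summary statistic, the `0.6` threshold and the `TMT` candidate label are all hypothetical and chosen for illustration.

```python
import math
from statistics import median


def pearson(observed, predicted):
    """Pearson correlation of two intensity vectors; 0.0 when undefined."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        return 0.0
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0
    return cov / math.sqrt(var_o * var_p)


def choose_model(requested, correlations_by_model, threshold=0.6):
    """Keep the requested model if its median correlation reaches the
    (assumed) threshold; otherwise fall back to the best-scoring candidate."""
    medians = {name: median(vals) for name, vals in correlations_by_model.items()}
    if medians[requested] >= threshold:
        return requested
    return max(medians, key=medians.get)


# Illustrative per-PSM correlations for two candidate models.
scores = {"HCD2021": [0.35, 0.42, 0.40], "TMT": [0.81, 0.77, 0.85]}
print(choose_model("HCD2021", scores))  # -> TMT
```

Keeping the user's choice whenever it clears the threshold means the fallback search only runs when the requested model clearly fits the data poorly.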
+#### Feature Engineering Pipeline
+
+- **Retention Time Analysis**:
+  - Calibrates DeepLC models per run to account for chromatographic variations.
+  - Calculates delta RT (predicted vs. observed) as a discriminative feature.
+  - Normalizes RT differences for cross-run comparability.
+
+- **Spectral Feature Extraction**:
+  - Computes signal-to-noise ratio using maximum intensity relative to background noise.
+  - Calculates spectral entropy to quantify peak distribution uniformity.
+  - Analyzes TIC (Total Ion Current) distribution across peaks for quality assessment.
+  - Determines weighted standard deviation of m/z values for spectral complexity estimation.
+- **Feature Selection**: The `only_features` parameter selects which features are added to the idXML file. For example: `--only_features "DeepLC:RtDiff,DeepLC:PredictedRetentionTimeBest,Ms2pip:DotProd"`.
+
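The four spectrum-level features listed above could be computed along these lines. The README does not give the exact formulas, so this sketch makes its own assumptions: background noise is taken as the median peak intensity, entropy is the Shannon entropy of TIC-normalized intensities, and the m/z spread is weighted by intensity. The function name `spectrum_features` is hypothetical; the output keys reuse the `Quantms:` names from the Spectrum Feature Mapping Table in this README.

```python
import math
from statistics import median


def spectrum_features(mz, intensity):
    """Return the four spectrum features for one peak list (illustrative)."""
    if not mz or len(mz) != len(intensity):
        raise ValueError("mz and intensity must be non-empty and of equal length")
    tic = sum(intensity)
    if tic <= 0:
        raise ValueError("spectrum has no signal")

    # Signal-to-noise: highest peak relative to an assumed median noise level.
    noise = median(intensity)
    snr = max(intensity) / noise if noise > 0 else 0.0

    # Shannon entropy of the intensity distribution (natural log).
    probs = [i / tic for i in intensity if i > 0]
    entropy = -sum(p * math.log(p) for p in probs)

    # Fraction of the total ion current carried by the 10 most intense peaks.
    top10 = sum(sorted(intensity, reverse=True)[:10])

    # Intensity-weighted standard deviation of m/z.
    mean_mz = sum(m * i for m, i in zip(mz, intensity)) / tic
    var_mz = sum(i * (m - mean_mz) ** 2 for m, i in zip(mz, intensity)) / tic

    return {
        "Quantms:Snr": snr,
        "Quantms:SpectralEntropy": entropy,
        "Quantms:FracTICinTop10Peaks": top10 / tic,
        "Quantms:WeightedStdMz": math.sqrt(var_mz),
    }


features = spectrum_features([100.0, 200.0, 300.0, 400.0], [10.0, 10.0, 10.0, 10.0])
print(features["Quantms:FracTICinTop10Peaks"])  # -> 1.0
```

A perfectly flat spectrum like the one in the usage line has maximal entropy for its peak count and a signal-to-noise ratio of 1, which is why low-entropy, high-SNR spectra tend to be the more informative ones.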
+##### Features
+
+<details>
+<summary>MS2PIP Feature Mapping Table</summary>
+
+| MS2Rescore MS2PIP Feature | quantms-rescoring Name |
+|--------------------------------|-----------------------------------|
+| spec_pearson | MS2PIP:SpecPearson |
+| cos_norm | MS2PIP:SpecCosineNorm |
+| spec_pearson_norm | MS2PIP:SpecPearsonNorm |
+| dotprod | MS2PIP:DotProd |
+| ionb_pearson_norm | MS2PIP:IonBPearsonNorm |
+| iony_pearson_norm | MS2PIP:IonYPearsonNorm |
+| spec_mse_norm | MS2PIP:SpecMseNorm |
+| ionb_mse_norm | MS2PIP:IonBMseNorm |
+| iony_mse_norm | MS2PIP:IonYMseNorm |
+| min_abs_diff_norm | MS2PIP:MinAbsDiffNorm |
+| max_abs_diff_norm | MS2PIP:MaxAbsDiffNorm |
+| abs_diff_Q1_norm | MS2PIP:AbsDiffQ1Norm |
+| abs_diff_Q2_norm | MS2PIP:AbsDiffQ2Norm |
+| abs_diff_Q3_norm | MS2PIP:AbsDiffQ3Norm |
+| mean_abs_diff_norm | MS2PIP:MeanAbsDiffNorm |
+| std_abs_diff_norm | MS2PIP:StdAbsDiffNorm |
+| ionb_min_abs_diff_norm | MS2PIP:IonBMinAbsDiffNorm |
+| ionb_max_abs_diff_norm | MS2PIP:IonBMaxAbsDiffNorm |
+| ionb_abs_diff_Q1_norm | MS2PIP:IonBAbsDiffQ1Norm |
+| ionb_abs_diff_Q2_norm | MS2PIP:IonBAbsDiffQ2Norm |
+| ionb_abs_diff_Q3_norm | MS2PIP:IonBAbsDiffQ3Norm |
+| ionb_mean_abs_diff_norm | MS2PIP:IonBMeanAbsDiffNorm |
+| ionb_std_abs_diff_norm | MS2PIP:IonBStdAbsDiffNorm |
+| iony_min_abs_diff_norm | MS2PIP:IonYMinAbsDiffNorm |
+| iony_max_abs_diff_norm | MS2PIP:IonYMaxAbsDiffNorm |
+| iony_abs_diff_Q1_norm | MS2PIP:IonYAbsDiffQ1Norm |
+| iony_abs_diff_Q2_norm | MS2PIP:IonYAbsDiffQ2Norm |
+| iony_abs_diff_Q3_norm | MS2PIP:IonYAbsDiffQ3Norm |
+| iony_mean_abs_diff_norm | MS2PIP:IonYMeanAbsDiffNorm |
+| iony_std_abs_diff_norm | MS2PIP:IonYStdAbsDiffNorm |
+| dotprod_norm | MS2PIP:DotProdNorm |
+| dotprod_ionb_norm | MS2PIP:DotProdIonBNorm |
+| dotprod_iony_norm | MS2PIP:DotProdIonYNorm |
+| cos_ionb_norm | MS2PIP:CosIonBNorm |
+| cos_iony_norm | MS2PIP:CosIonYNorm |
+| ionb_pearson | MS2PIP:IonBPearson |
+| iony_pearson | MS2PIP:IonYPearson |
+| spec_spearman | MS2PIP:SpecSpearman |
+| ionb_spearman | MS2PIP:IonBSpearman |
+| iony_spearman | MS2PIP:IonYSpearman |
+| spec_mse | MS2PIP:SpecMse |
+| ionb_mse | MS2PIP:IonBMse |
+| iony_mse | MS2PIP:IonYMse |
+| min_abs_diff_iontype | MS2PIP:MinAbsDiffIonType |
+| max_abs_diff_iontype | MS2PIP:MaxAbsDiffIonType |
+| min_abs_diff | MS2PIP:MinAbsDiff |
+| max_abs_diff | MS2PIP:MaxAbsDiff |
+| abs_diff_Q1 | MS2PIP:AbsDiffQ1 |
+| abs_diff_Q2 | MS2PIP:AbsDiffQ2 |
+| abs_diff_Q3 | MS2PIP:AbsDiffQ3 |
+| mean_abs_diff | MS2PIP:MeanAbsDiff |
+| std_abs_diff | MS2PIP:StdAbsDiff |
+| ionb_min_abs_diff | MS2PIP:IonBMinAbsDiff |
+| ionb_max_abs_diff | MS2PIP:IonBMaxAbsDiff |
+| ionb_abs_diff_Q1 | MS2PIP:IonBAbsDiffQ1 |
+| ionb_abs_diff_Q2 | MS2PIP:IonBAbsDiffQ2 |
+| ionb_abs_diff_Q3 | MS2PIP:IonBAbsDiffQ3 |
+| ionb_mean_abs_diff | MS2PIP:IonBMeanAbsDiff |
+| ionb_std_abs_diff | MS2PIP:IonBStdAbsDiff |
+| iony_min_abs_diff | MS2PIP:IonYMinAbsDiff |
+| iony_max_abs_diff | MS2PIP:IonYMaxAbsDiff |
+| iony_abs_diff_Q1 | MS2PIP:IonYAbsDiffQ1 |
+| iony_abs_diff_Q2 | MS2PIP:IonYAbsDiffQ2 |
+| iony_abs_diff_Q3 | MS2PIP:IonYAbsDiffQ3 |
+| iony_mean_abs_diff | MS2PIP:IonYMeanAbsDiff |
+| iony_std_abs_diff | MS2PIP:IonYStdAbsDiff |
+| dotprod_ionb | MS2PIP:DotProdIonB |
+| dotprod_iony | MS2PIP:DotProdIonY |
+| cos_ionb | MS2PIP:CosIonB |
+| cos_iony | MS2PIP:CosIonY |
+
+</details>
+
+<details>
+<summary>DeepLC Feature Mapping Table</summary>
+
+| MS2Rescore DeepLC Feature | quantms-rescoring Name |
+|-------------------------------|-----------------------------------|
+| observed_retention_time | DeepLC:ObservedRetentionTime |
+| predicted_retention_time | DeepLC:PredictedRetentionTime |
+| rt_diff | DeepLC:RtDiff |
+| observed_retention_time_best | DeepLC:ObservedRetentionTimeBest |
+| predicted_retention_time_best | DeepLC:PredictedRetentionTimeBest |
+| rt_diff_best | DeepLC:RtDiffBest |
+
+</details>
+
+<details>
+<summary>Spectrum Feature Mapping Table</summary>
+
+| Spectrum Feature | quantms-rescoring Name |
+|---------------------|-----------------------------------|
+| snr | Quantms:Snr |
+| spectral_entropy | Quantms:SpectralEntropy |
+| fraction_tic_top_10 | Quantms:FracTICinTop10Peaks |
+| weighted_std_mz | Quantms:WeightedStdMz |
+
+</details>
+
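The mapping tables together with the `only_features` option suggest a rename-then-filter step. The sketch below is a guess at the shape of that step, not the contents of `constants.py`: the dictionary holds only a handful of rows copied from the tables, and `rename_features` is a hypothetical helper. Because this README writes the prefix both as `MS2PIP:` (tables) and `Ms2pip:` (the `--only_features` example), the filter here compares names case-insensitively, which is an assumption.

```python
# A few (source name -> idXML name) pairs copied from the mapping tables.
FEATURE_NAMES = {
    "spec_pearson": "MS2PIP:SpecPearson",
    "dotprod": "MS2PIP:DotProd",
    "rt_diff": "DeepLC:RtDiff",
    "predicted_retention_time_best": "DeepLC:PredictedRetentionTimeBest",
    "snr": "Quantms:Snr",
}


def rename_features(raw, only_features=None):
    """Rename known features and optionally keep only the requested ones.

    `only_features` is a comma-separated string, as in the CLI example.
    Unknown source names are dropped rather than passed through.
    """
    renamed = {FEATURE_NAMES[k]: v for k, v in raw.items() if k in FEATURE_NAMES}
    if only_features is None:
        return renamed
    wanted = {name.strip().lower() for name in only_features.split(",")}
    return {k: v for k, v in renamed.items() if k.lower() in wanted}


raw = {"dotprod": 0.9, "rt_diff": 1.5, "snr": 12.0, "unknown": 3}
print(rename_features(raw, "DeepLC:RtDiff,Ms2pip:DotProd"))
# -> {'MS2PIP:DotProd': 0.9, 'DeepLC:RtDiff': 1.5}
```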
+#### Data Processing of idXML Files
+
+- **Parallel Processing**: Implements multiprocessing capabilities for handling large datasets efficiently.
+- **OpenMS Compatibility Layer**: Custom helper classes that gather statistics such as the number of PSMs per MS level and dissociation method.
+- **Feature Validation**: Converts all features from MS2PIP, DeepLC, and quantms into OpenMS features with well-established names (`constants.py`).
+- **PSM Filtering and Validation**:
+  - Filters out PSMs with **missing spectra information** or **empty peaks**.
+  - Stops the analysis if the input file contains more than one MS level or dissociation method; **only MS2-level spectra are supported**.
+- **Output / Input files**:
+  - Only the OpenMS formats are supported: idXML and mzML as input, and idXML with the annotated features as output.
+
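The filtering and validation rules above can be illustrated without OpenMS. The sketch below uses plain dataclasses as stand-ins for the real idXML and mzML objects; `Spectrum`, `Psm` and `validate_and_filter` are hypothetical names, and raising `ValueError` is an assumption about how the analysis is stopped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Spectrum:
    ms_level: int
    dissociation: str
    peaks: List[float] = field(default_factory=list)


@dataclass
class Psm:
    peptide: str
    spectrum: Optional[Spectrum] = None  # None models missing spectrum info


def validate_and_filter(psms):
    """Return PSMs with usable spectra; fail on mixed or non-MS2 input."""
    spectra = [p.spectrum for p in psms if p.spectrum is not None]
    levels = {s.ms_level for s in spectra}
    methods = {s.dissociation for s in spectra}
    # Only a single dissociation method at MS2 level is accepted.
    if levels - {2} or len(methods) > 1:
        raise ValueError(
            f"unsupported input: MS levels {sorted(levels)}, "
            f"dissociation methods {sorted(methods)}"
        )
    # Drop PSMs whose spectrum is missing or has no peaks.
    return [p for p in psms if p.spectrum is not None and p.spectrum.peaks]


psms = [
    Psm("PEPTIDEK", Spectrum(2, "HCD", [101.1, 202.2])),
    Psm("EMPTYK", Spectrum(2, "HCD", [])),
    Psm("MISSINGK", None),
]
print([p.peptide for p in validate_and_filter(psms)])  # -> ['PEPTIDEK']
```

Checking the run-level constraints before filtering individual PSMs means an unsupported file fails fast instead of producing a partially annotated output.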
+### Installation
+
+Install quantms-rescoring using one of the following methods:
+
+**Using `pip`**
+
+```sh
+❯ pip install quantms-rescoring
+```
+
+**Using `conda`**
+
+```sh
+❯ conda install -c bioconda quantms-rescoring
+```
+
+**Build from source:**
+
+1. Clone the quantms-rescoring repository:
+
+```sh
+❯ git clone https://github.com/bigbio/quantms-rescoring
+```
+
+2. Navigate to the project directory:
+
+```sh
+cd quantms-rescoring
+```
+
+3. Install the project dependencies:
+
+- Using `pip`:
+
+```sh
+❯ pip install -r requirements.txt
+```
+
+- Using `conda`:
+
+```sh
+❯ conda env create -f environment.yml
+```
+
+4. Install the package using `poetry`:
+
+```sh
+❯ poetry install
+```
+
+### TODO
+
+- [ ] Add support for multiple files (combined idXML and mzML)

 ### Issues and Contributions

environment.yml

Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,17 @@ channels:
55
- conda-forge
66
- nodefaults
77
dependencies:
8+
- python >=3.9,<3.12
89
- click
9-
- pyopenms>=2.4.0
10-
- pandas
11-
- numpy
12-
- ms2rescore=3.0.3
13-
- deepLC=2.2.38
14-
- psm-utils=0.8.3
15-
- scipy=1.13.1
10+
- pyopenms>=3.0
11+
- pandas >=1
12+
- numpy>=1.25
13+
- ms2rescore=3.1.4
14+
- deepLC>=3.0
15+
- psm-utils
16+
- scipy
1617
- pygam
17-
- protobuf=3.19.6
18+
- protobuf
19+
- pytest
20+
- ms2pip>=4.0
21+
- ms2rescore-rs

pyproject.toml

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ name = "quantms-rescoring"
33
description = "quantms-rescoring: Python scripts and helpers for the quantMS workflow"
44
readme = "README.md"
55
license = "MIT"
6-
version = "0.0.4"
6+
version = "0.0.5"
77
authors = [
88
"Yasset Perez-Riverol <ypriverol@gmail.com>",
99
"Dai Chengxin <chengxin2024@126.com>",
@@ -29,17 +29,18 @@ packages = [
2929
]
3030

3131
[tool.poetry.dependencies]
32-
python = ">=3.8,<3.12"
32+
python = ">=3.9,<3.12"
3333
click = "*"
34-
pyopenms = "*"
35-
ms2rescore = "3.0.3"
34+
pyopenms = ">=3.0"
35+
ms2rescore = "3.1.4"
3636
pandas = "*"
37-
numpy = "*"
38-
psm-utils = "0.8.3"
39-
deepLC = "2.2.38"
40-
scipy = "1.13.1"
37+
numpy = ">=1.25"
38+
psm-utils = "*"
39+
deepLC = ">=3.0"
40+
scipy = "*"
4141
pygam = "*"
42-
protobuf= "3.19.6"
42+
protobuf= "*"
43+
ms2pip = ">=4.0"
4344

4445
[tool.poetry.urls]
4546
GitHub = "https://github.com/bigbio/quantms-rescoring"
@@ -59,4 +60,4 @@ target-version = ["py39"]
5960

6061
[build-system]
6162
requires = ["poetry-core>=1.2.0"]
62-
build-backend = "poetry.core.masonry.api"
63+
build-backend = "poetry.core.masonry.api"

quantmsrescore/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
__version__ = "0.0.4"
1+
__version__ = "0.0.5"
