Commit 4f8435d

Merge pull request #37 from hgb-bin-proteomics/develop
Post Process v1.2.4
2 parents 4878642 + d8f834c commit 4f8435d

6 files changed: +143 -25 lines changed

.github/workflows/python-app.yml

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ jobs:
     - name: Copy scripts and data to "/tests"
       run: |
         cp create_spectral_library.py tests
+        cp post_process.py tests
         cp config.py tests
         cp data/test_filter.xlsx .
         cp data/test_reverse_mods.xlsx .

POSTPROCESSING.md

Lines changed: 11 additions & 0 deletions
@@ -87,6 +87,17 @@ The following additional columns are annotated:
 - `PP.SequenceCoverageAlpha`: Sequence coverage of the alpha peptide covered by all ions (range: 0-1)
 - `PP.SequenceCoverageBeta`: Sequence coverage of the beta peptide covered by all ions (range: 0-1)
 - `PP.SequenceCoverageFull`: Sequence coverage of the full crosslink covered by all ions (range: 0-1)
+- `PP.UniScoreAlpha`: The [UniScore](https://doi.org/10.1016/j.mcpro.2025.101010) of the alpha peptide
+- `PP.UniScoreBeta`: The [UniScore](https://doi.org/10.1016/j.mcpro.2025.101010) of the beta peptide
+- `PP.UniScoreFull`: The [UniScore](https://doi.org/10.1016/j.mcpro.2025.101010) of the crosslink, which is the minimum UniScore
+- `PP.PepLenAlpha`: The length of the alpha peptide (number of amino acids)
+- `PP.PepLenBeta`: The length of the beta peptide (number of amino acids)
+- `PP.NumberCrosslinkFragmentsAlpha`: The number of fragment ions that contain a crosslink modification for the alpha peptide
+- `PP.NumberCrosslinkFragmentsBeta`: The number of fragment ions that contain a crosslink modification for the beta peptide
+- `PP.NumberCrosslinkFragmentsFull`: The number of fragment ions that contain a crosslink modification for the full crosslink
+- `PP.NormalizedCrosslinkFragmentsAlpha`: `PP.NumberCrosslinkFragmentsAlpha` but normalized by total ion count
+- `PP.NormalizedCrosslinkFragmentsBeta`: `PP.NumberCrosslinkFragmentsBeta` but normalized by total ion count
+- `PP.NormalizedCrosslinkFragmentsFull`: `PP.NumberCrosslinkFragmentsFull` but normalized by the sum of total ion counts
 - `PP.PseudoScanNumber`: An iterative number that acts as an ID to create pseudo CSMs
 - `PP.Crosslinker`: Name of the crosslinker
 - `PP.CrosslinkerMass`: Delta mass of the crosslinker
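
The crosslink-level ("Full") columns listed above are combinations of the alpha and beta values: a minimum for the UniScore, a sum for the fragment count, and a ratio of sums for the normalized count. A minimal sketch in plain Python, assuming the per-peptide values are already available. The helper name `combine_crosslink_columns` and the guard against a zero ion count are illustrative assumptions and not part of the documented script; the `PP.TotalIonsA` and `PP.TotalIonsB` column names are the ones `post_process.py` divides by in this commit.

```python
def combine_crosslink_columns(row: dict) -> dict:
    """Derive crosslink-level ("Full") values from the alpha/beta values.

    Hypothetical helper for illustration; `row` maps column names to numbers.
    """
    total_ions = row["PP.TotalIonsA"] + row["PP.TotalIonsB"]
    n_full = (row["PP.NumberCrosslinkFragmentsAlpha"]
              + row["PP.NumberCrosslinkFragmentsBeta"])
    return {
        # the crosslink UniScore is the weaker of the two peptide scores
        "PP.UniScoreFull": min(row["PP.UniScoreAlpha"], row["PP.UniScoreBeta"]),
        # crosslink-containing fragments are summed over both peptides
        "PP.NumberCrosslinkFragmentsFull": n_full,
        # assumption: report 0.0 instead of dividing by zero when no ions matched
        "PP.NormalizedCrosslinkFragmentsFull": n_full / total_ions if total_ions else 0.0,
    }


example_row = {
    "PP.UniScoreAlpha": 12, "PP.UniScoreBeta": 7,
    "PP.NumberCrosslinkFragmentsAlpha": 3, "PP.NumberCrosslinkFragmentsBeta": 1,
    "PP.TotalIonsA": 10, "PP.TotalIonsB": 6,
}
full = combine_crosslink_columns(example_row)
print(full)
```

Note that the normalized value for the full crosslink is a ratio of sums (4 / 16 here), not the mean of the two per-peptide ratios.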

README.md

Lines changed: 0 additions & 23 deletions
@@ -29,29 +29,6 @@ Starting with version [1.4.4](https://github.com/hgb-bin-proteomics/MSAnnika_Spe
 [xiSearch](https://www.rappsilberlab.org/software/xisearch/) with [xiFDR](https://www.rappsilberlab.org/software/xifdr/). Simply use the validated CSMs file from
 xiFDR (e.g. usually ending with extension `CSM_xiFDR*.*.*.csv` where `*` denotes the xiFDR version) as input for the `CSMS_FILE` parameter in the `config.py` file!

-## GUI
-
-![Screenshot](gui/screenshot.png)
-
-> [!Important]
-> **The GUI currently only is supported up to version [1.1.6](https://github.com/hgb-bin-proteomics/MSAnnika_Spectral_Library_exporter/releases/tag/v1.1.6)!**
->
-
-Alternatively to the commandline-based python script, a GUI is also available via [Docker](https://www.docker.com/):
-- After [installing Docker](https://docs.docker.com/engine/install/) [[Quick Guide here](https://github.com/michabirklbauer/PIA/blob/master/DOCKER.md)] run the following command:
-```
-docker run -p 8501:8501 michabirklbauer/spectrallibraryexporter
-```
-- Navigate to `localhost:8501` in your browser. You should see the MS Annika Spectral Library exporter GUI!
-
-If you don't have/want to install Docker you can also run the GUI natively using the following commands:
-- Open a terminal inside `MSAnnika_Spectral_Library_exporter`.
-- Enter `cp gui/streamlit_app.py .`.
-- Enter `cp gui/streamlit_util.py .`.
-- Enter `pip install streamlit`.
-- Enter `streamlit run streamlit_app.py --server.maxUploadSize 5000`.
-- Navigate to `localhost:8501` in your browser. You should see the MS Annika Spectral Library exporter GUI!
-
 ## Exporting MS Annika results to Microsoft Excel

 The script uses a Micrsoft Excel files as input, for that MS Annika results need to be exported from Proteome Discoverer. It is recommended to first filter results according to your needs, e.g. filter for high-confidence CSMs and filter out decoy CSMs as depicted below.

create_spectral_library.py

Lines changed: 10 additions & 0 deletions
@@ -1,4 +1,14 @@
 #!/usr/bin/env python3
+#
+# /// script
+# requires-python = ">=3.7"
+# dependencies = [
+#   "pandas",
+#   "openpyxl",
+#   "tqdm",
+#   "pyteomics",
+# ]
+# ///

 # MS ANNIKA SPECTRAL LIBRARY EXPORTER
 # 2023 (c) Micha Johannes Birklbauer
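
The comment block added above follows the inline script metadata convention (PEP 723): a `# /// script` line opens the block, a `# ///` line closes it, and the lines in between are TOML with a leading `# ` stripped. A simplified sketch of how a tool might pull that TOML out of a script; the function name `read_script_metadata` is hypothetical, and the parser only handles the first `script` block rather than every edge case of the specification.

```python
def read_script_metadata(source: str) -> str:
    """Return the TOML text of the first `# /// script` block, or "" if absent.

    Simplified, hypothetical reader for illustration only.
    """
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
    except ValueError:
        return ""  # no metadata block in this script
    content = []
    for line in lines[start + 1:]:
        if line == "# ///":
            return "\n".join(content)  # closing marker reached
        if not line.startswith("#"):
            break  # block was never closed, treat as missing
        # drop the comment prefix ("# " or a bare "#")
        content.append(line[2:] if line.startswith("# ") else line[1:])
    return ""


header = '''#!/usr/bin/env python3
#
# /// script
# requires-python = ">=3.7"
# dependencies = [
#   "pandas",
#   "openpyxl",
#   "tqdm",
#   "pyteomics",
# ]
# ///

# MS ANNIKA SPECTRAL LIBRARY EXPORTER
'''
metadata = read_script_metadata(header)
print(metadata)
```

On Python 3.11 or newer, the returned text can be handed to `tomllib.loads` to obtain the `requires-python` and `dependencies` values as Python objects.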

post_process.py

Lines changed: 105 additions & 2 deletions
@@ -1,4 +1,5 @@
 #!/usr/bin/env python3
+#
 # /// script
 # requires-python = ">=3.7"
 # dependencies = [
@@ -14,8 +15,8 @@


 # version tracking
-__version = "1.2.2"
-__date = "2025-07-29"
+__version = "1.2.6"
+__date = "2025-08-04"

 # PARAMETERS

@@ -48,6 +49,37 @@ def get_mz_key(mz: float) -> float:
 def get_fragment_key(mz: float) -> str:
     return f"{round(mz, 4):.4f}"

+def get_kmers(unique_seq_positions: set) -> list:
+    sorted_pos = sorted(unique_seq_positions)
+    kmers = list()
+    current_kmer = 1
+    for i, pos in enumerate(sorted_pos):
+        if i + 1 < len(unique_seq_positions):
+            if sorted_pos[i + 1] == pos + 1:
+                current_kmer += 1
+            else:
+                if current_kmer > 1:
+                    kmers.append(current_kmer)
+                current_kmer = 1
+        else:
+            if current_kmer > 1:
+                kmers.append(current_kmer)
+    return kmers
+
+def get_bool_from_value(value) -> bool:
+    if isinstance(value, bool):
+        return value
+    elif isinstance(value, int):
+        if value in [0, 1]:
+            return bool(value)
+        else:
+            raise ValueError(f"Cannot parse bool value from the given input {value}.")
+    elif isinstance(value, str):
+        return "t" in value.lower()
+    else:
+        raise ValueError(f"Cannot parse bool value from the given input {value}.")
+    return False
+
 def get_key_spec_lib(row: pd.Series) -> str:
     # ModifiedPeptide
     # DAKQRIVDK_NGVKM[Oxidation]C[Carbamidomethyl]PR
@@ -111,6 +143,7 @@ def generate_fragment_index(spectronaut: pd.DataFrame, index: dict) -> Dict[str,
             fragment_annotation[key] = {"matched_number_ions_a": 0,
                                         "matched_number_ions_b": 0,
                                         "fragments": list(),
+                                        "fragments_rows": list(),
                                         "ion_types": set()}
         # current fragment ion from spectronaut row
         ion = float(row[SPECTRONAUT_FRAGMENT_MZ_COLUMN_NAME])
@@ -136,6 +169,7 @@ def generate_fragment_index(spectronaut: pd.DataFrame, index: dict) -> Dict[str,
                     if fragment_key not in fragment_annotation[key]["fragments"]:
                         fragment_annotation[key]["matched_number_ions_a"] += 1
                         fragment_annotation[key]["fragments"].append(fragment_key)
+                        fragment_annotation[key]["fragments_rows"].append(current_ion)
                         fragment_annotation[key]["ion_types"].add(
                             f"{current_ion['FragmentType']};{current_ion['FragmentNumber']};0"
                         )
@@ -145,6 +179,7 @@ def generate_fragment_index(spectronaut: pd.DataFrame, index: dict) -> Dict[str,
                     if fragment_key not in fragment_annotation[key]["fragments"]:
                         fragment_annotation[key]["matched_number_ions_b"] += 1
                         fragment_annotation[key]["fragments"].append(fragment_key)
+                        fragment_annotation[key]["fragments_rows"].append(current_ion)
                         fragment_annotation[key]["ion_types"].add(
                             f"{current_ion['FragmentType']};{current_ion['FragmentNumber']};1"
                         )
@@ -485,6 +520,74 @@ def annotate_SequenceCoverage(row: pd.Series, fragment_annotation: dict, alpha:
     tqdm.pandas(desc = "Annotating sequence coverage for full crosslink...")
     spectronaut["PP.SequenceCoverageFull"] = spectronaut.progress_apply(lambda row: (float(row["PP.SequenceCoverageAlpha"]) + float(row["PP.SequenceCoverageBeta"])) / 2.0, axis = 1)

+    def annotate_UniScore(row: pd.Series, fragment_annotation: dict, alpha: bool) -> float:
+        key = get_key_spectronaut(row)
+        ion_types = fragment_annotation[key]["ion_types"]
+        peptide = str(row["PP.PeptideA"]).strip() if alpha else str(row["PP.PeptideB"]).strip()
+        pep_id_lookup = 0 if alpha else 1
+        nr_of_matched_ions = 0
+        unique_seq_positions = set()
+        for ion in ion_types:
+            pep_id = int(ion.split(";")[2])
+            ion_type = str(ion.split(";")[0]).strip()
+            ion_number = int(ion.split(";")[1])
+            if len(ion_type) != 1:
+                raise RuntimeError(f"Could not parse ion type from ion {ion}!")
+            if pep_id == pep_id_lookup:
+                if ion_type in ["a", "b", "c"]:
+                    unique_seq_positions.add(ion_number)
+                    nr_of_matched_ions += 1
+                elif ion_type in ["x", "y", "z"]:
+                    unique_seq_positions.add(len(peptide) + 1 - ion_number)
+                    nr_of_matched_ions += 1
+                else:
+                    raise RuntimeError(f"Found not-suppored ion type: {ion_type}")
+        kmers = get_kmers(unique_seq_positions)
+        return nr_of_matched_ions + sum(kmers)
+
+    tqdm.pandas(desc = "Annotating UniScore for alpha peptide...")
+    spectronaut["PP.UniScoreAlpha"] = spectronaut.progress_apply(lambda row: annotate_UniScore(row, fragment_annotation, True), axis = 1)
+
+    tqdm.pandas(desc = "Annotating UniScore for beta peptide...")
+    spectronaut["PP.UniScoreBeta"] = spectronaut.progress_apply(lambda row: annotate_UniScore(row, fragment_annotation, False), axis = 1)
+
+    tqdm.pandas(desc = "Annotating UniScore for full crosslinks...")
+    spectronaut["PP.UniScoreFull"] = spectronaut.progress_apply(lambda row: min(float(row["PP.UniScoreAlpha"]), float(row["PP.UniScoreBeta"])), axis = 1)
+
+    tqdm.pandas(desc = "Annotating peptide length for alpha peptide...")
+    spectronaut["PP.PepLenAlpha"] = spectronaut.progress_apply(lambda row: len(str(row["PP.PeptideA"]).strip()), axis = 1)
+
+    tqdm.pandas(desc = "Annotating peptide length for beta peptide...")
+    spectronaut["PP.PepLenBeta"] = spectronaut.progress_apply(lambda row: len(str(row["PP.PeptideB"]).strip()), axis = 1)
+
+    def annotate_CrosslinkFragments(row: pd.Series, fragment_annotation: dict, alpha: bool) -> int:
+        key = get_key_spectronaut(row)
+        ions_as_full_spec_lib_rows = fragment_annotation[key]["fragments_rows"]
+        pep_id_lookup = 0 if alpha else 1
+        nr_of_crosslink_fragments = 0
+        for ion in ions_as_full_spec_lib_rows:
+            if ion["FragmentPepId"] == pep_id_lookup and get_bool_from_value(ion["CLContainingFragment"]):
+                nr_of_crosslink_fragments += 1
+        return nr_of_crosslink_fragments
+
+    tqdm.pandas(desc = "Annotating number of crosslink fragments for alpha peptide...")
+    spectronaut["PP.NumberCrosslinkFragmentsAlpha"] = spectronaut.progress_apply(lambda row: annotate_CrosslinkFragments(row, fragment_annotation, True), axis = 1)
+
+    tqdm.pandas(desc = "Annotating number of crosslink fragments for beta peptide...")
+    spectronaut["PP.NumberCrosslinkFragmentsBeta"] = spectronaut.progress_apply(lambda row: annotate_CrosslinkFragments(row, fragment_annotation, False), axis = 1)
+
+    tqdm.pandas(desc = "Annotating number of crosslink fragments for full crosslinks...")
+    spectronaut["PP.NumberCrosslinkFragmentsFull"] = spectronaut.progress_apply(lambda row: row["PP.NumberCrosslinkFragmentsAlpha"] + row["PP.NumberCrosslinkFragmentsBeta"], axis = 1)
+
+    tqdm.pandas(desc = "Annotating number of crosslink fragments (normalized) for alpha peptide...")
+    spectronaut["PP.NormalizedCrosslinkFragmentsAlpha"] = spectronaut.progress_apply(lambda row: row["PP.NumberCrosslinkFragmentsAlpha"] / row["PP.TotalIonsA"], axis = 1)
+
+    tqdm.pandas(desc = "Annotating number of crosslink fragments (normalized) for beta peptide...")
+    spectronaut["PP.NormalizedCrosslinkFragmentsBeta"] = spectronaut.progress_apply(lambda row: row["PP.NumberCrosslinkFragmentsBeta"] / row["PP.TotalIonsB"], axis = 1)
+
+    tqdm.pandas(desc = "Annotating number of crosslink fragments (normalized) for full crosslinks...")
+    spectronaut["PP.NormalizedCrosslinkFragmentsFull"] = spectronaut.progress_apply(lambda row: (row["PP.NumberCrosslinkFragmentsAlpha"] + row["PP.NumberCrosslinkFragmentsBeta"]) / (row["PP.TotalIonsA"] + row["PP.TotalIonsB"]), axis = 1)
+
     spectronaut["PP.PseudoScanNumber"] = pd.Series(range(spectronaut.shape[0]))
     spectronaut["PP.Crosslinker"] = pd.Series([CROSSLINKER for i in range(spectronaut.shape[0])])
     spectronaut["PP.CrosslinkerMass"] = pd.Series([CROSSLINKER_MASS for i in range(spectronaut.shape[0])])
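
The score added in `annotate_UniScore` is the number of matched ions of one peptide plus the summed lengths of all runs of consecutive covered sequence positions. N-terminal ions (a, b, c) map to their own ion number, while C-terminal ions (x, y, z) map to `len(peptide) + 1 - ion_number`. A standalone sketch of that calculation without pandas; the names `count_runs` and `uniscore_like` and the example peptides are made up for illustration, and the ion strings use the same `type;number;peptide_id` layout as the `ion_types` entries in the diff.

```python
def count_runs(positions: set) -> list:
    """Lengths of all runs of consecutive integers that are at least 2 long."""
    ordered = sorted(positions)
    runs, current = [], 1
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt == prev + 1:
            current += 1
        else:
            if current > 1:
                runs.append(current)
            current = 1
    if current > 1:
        runs.append(current)
    return runs


def uniscore_like(peptide: str, ion_types: set, pep_id: int) -> int:
    """Matched ions plus summed run lengths for one peptide (illustrative)."""
    matched = 0
    positions = set()
    for ion in ion_types:
        ion_type, ion_number, ion_pep_id = ion.split(";")
        if int(ion_pep_id) != pep_id:
            continue  # ion belongs to the other peptide
        if ion_type in ("a", "b", "c"):
            positions.add(int(ion_number))
        elif ion_type in ("x", "y", "z"):
            # C-terminal ions are counted from the other end of the peptide
            positions.add(len(peptide) + 1 - int(ion_number))
        else:
            raise ValueError(f"Unsupported ion type: {ion_type}")
        matched += 1
    return matched + sum(count_runs(positions))


ions = {"b;2;0", "b;3;0", "y;3;0", "y;2;0", "b;5;1"}
score_alpha = uniscore_like("PEPTIDEK", ions, 0)
score_beta = uniscore_like("ACDEFGK", ions, 1)
print(score_alpha, score_beta)
```

For the alpha peptide the four matched ions cover positions 2, 3, 6 and 7, which form two runs of length 2, so the score is 4 + 4 = 8. The beta peptide has a single matched ion and no run, giving 1.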

tests/tests.py

Lines changed: 16 additions & 0 deletions
@@ -352,3 +352,19 @@ def test12_spectral_library_exporter():
             assert float(row["Combined Score"]) == pytest.approx(8.71)
             checked += 1
     assert checked == 2
+
+# check kmers calculation
+def test13_test_kmers():
+
+    from post_process import get_kmers
+
+    unique_seq_positions = {1,2,3,7,8,11,10,15,16,17,18}
+    assert get_kmers(unique_seq_positions) == [3,2,2,4]
+    unique_seq_positions = {1,3,5}
+    assert get_kmers(unique_seq_positions) == []
+    unique_seq_positions = {0,1}
+    assert get_kmers(unique_seq_positions) == [2]
+    unique_seq_positions = {0,1,3,7,9}
+    assert get_kmers(unique_seq_positions) == [2]
+    unique_seq_positions = {0,1,3,7,8,9,15}
+    assert get_kmers(unique_seq_positions) == [2,3]
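
The expectations in `test13_test_kmers` can be cross-checked with an independent implementation. The sketch below uses `itertools.groupby` and relies on the fact that, in a sorted list, consecutive integers share the same difference between value and list index. The function name `kmers_via_groupby` is hypothetical; it is an alternative technique for comparison, not the code under test.

```python
from itertools import groupby


def kmers_via_groupby(positions: set) -> list:
    """Run lengths (>= 2) of consecutive integers, via value-minus-index grouping."""
    ordered = sorted(positions)
    runs = []
    for _, group in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        length = sum(1 for _ in group)
        if length > 1:  # isolated positions do not form a k-mer
            runs.append(length)
    return runs


# same inputs and expected outputs as in test13_test_kmers
cases = [
    ({1, 2, 3, 7, 8, 11, 10, 15, 16, 17, 18}, [3, 2, 2, 4]),
    ({1, 3, 5}, []),
    ({0, 1}, [2]),
    ({0, 1, 3, 7, 9}, [2]),
    ({0, 1, 3, 7, 8, 9, 15}, [2, 3]),
]
results = [kmers_via_groupby(positions) for positions, _ in cases]
print(results)
```

Agreement between two differently structured implementations on the same cases gives some extra confidence that the expected lists in the test are right.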
