This repository was archived by the owner on Jun 25, 2025. It is now read-only.

Commit 608247f

Merge pull request #29 from hgb-bin-proteomics/develop
add xiFdrExporter
2 parents 93cc846 + 71b1cc7 commit 608247f

9 files changed, with 191 additions and 12 deletions

README.md

Lines changed: 70 additions & 12 deletions
@@ -20,38 +20,60 @@ FASTA headers need to follow the UniProtKB standard formatting (as described [*h
 
 All of the scripts use Microsoft Excel files as input, for that MS Annika results need to be exported from Proteome Discoverer. It is recommended to first filter results according to your needs, e.g. filter for high-confidence crosslinks and filter out decoy crosslinks as depicted below.
 
-![PDFilter](filter.png)
+### Exporting Crosslinks
+
+![PDFilterCrosslinks](img/crosslinks_filtered.png)
+
+**Figure 1:** Crosslinks filtered for 1% estimated FDR and without decoys.
 
 Results can then be exported by selecting `File > Export > To Microsoft Excel… > Level 1: Crosslinks > Export` in Proteome Discoverer.
 
+### Exporting CSMs
+
+![PDFilterCSMsUnvalidated](img/csms_unfiltered.png)
+
+**Figure 2:** All (unvalidated) CSMs.
+
+![PDFilterCSMsValidated](img/csms_filtered.png)
+
+**Figure 3:** CSMs filtered for 1% estimated FDR and without decoys.
+
+Results can then be exported by selecting `File > Export > To Microsoft Excel… > Level 1: CSMs > Export` in Proteome Discoverer.
+
 ## Quick start
 
-- **Exporting to xiNET**
+- **Exporting to [xiNET](https://crosslinkviewer.org/)**
   Files needed:
-  - result.xlsx - MS Annika result file(s) exported to .xlsx
+  - result.xlsx - MS Annika crosslink result file(s) exported to .xlsx
   - seq.fasta - FASTA file containing sequences of the crosslinked proteins
   ```
   python xiNetExporter_msannika.py result.xlsx -fasta seq.fasta
   ```
-- **Exporting to xiVIEW**
+- **Exporting to [xiVIEW](https://xiview.org/xiNET_website/index.php)**
   Files needed:
-  - result.xlsx - MS Annika result file(s) exported to .xlsx
+  - result.xlsx - MS Annika crosslink result file(s) exported to .xlsx
   - seq.fasta - FASTA file containing sequences of the crosslinked proteins
   ```
   python xiViewExporter_msannika.py result.xlsx -fasta seq.fasta
   ```
-- **Exporting to pyXlinkViewer (pyMOL)**
+- **Exporting to [xiFDR](https://github.com/Rappsilber-Laboratory/xiFDR)**
+  Files needed:
+  - result.xlsx - MS Annika CSM result file (unvalidated) exported to .xlsx
+  ```
+  python xiFdrExporter_msannika.py result.xlsx
+  ```
+- **Exporting to [pyXlinkViewer (pyMOL)](https://github.com/BobSchiffrin/PyXlinkViewer)**
   Files needed:
-  - result.xlsx - MS Annika result file(s) exported to .xlsx
+  - result.xlsx - MS Annika crosslink result file(s) exported to .xlsx
   - structure.pdb - 3D structure of the protein (complex) that crosslinks should be mapped to, alternatively you can also just provide the 4-letter code from the [PDB](https://www.rcsb.org/) and the script will fetch the structure from internet
   ```
   python pyXlinkViewerExporter_msannika.py result.xlsx -pdb structure.pdb
   ```
-- **Exporting to XLMS-Tools**
+- **Exporting to [XLMS-Tools](https://gitlab.com/topf-lab/xlms-tools)**
   XLMS-Tools uses the same file format as pyXlinkViewer, therefore the same exporter can be used!
-- **Exporting to XMAS (ChimeraX)**
+- **Exporting to [XMAS (ChimeraX)](https://github.com/ScheltemaLab/ChimeraX_bundle)**
   Visualization of MS Annika results works out of the box with .xlsx files exported from Proteome Discoverer.
-- **Exporting to PAE Viewer**
+- **Exporting to [PAE Viewer](http://www.subtiwiki.uni-goettingen.de/v4/paeViewerDemo)**
   Files needed:
   - pyXlinkViewer_export.csv - Crosslinks exported from pyXlinkViewer as .csv
   ```
@@ -142,9 +164,45 @@ Or using the Windows binary:
 xiViewExporter_msannika.exe "202001216_nsp8_trypsin_XL_REP1.xlsx" "202001216_nsp8_trypsin_XL_REP2.xlsx" "202001216_nsp8_trypsin_XL_REP3.xlsx" --fasta SARS-COV-2.fasta -o test --ignore P0DTC1 P0DTD1 P0DTC2
 ```
 
+## Export to [xiFDR](https://github.com/Rappsilber-Laboratory/xiFDR)
+
+```
+EXPORTER DESCRIPTION:
+A script to export MS Annika CSM results (.xlsx) to a xiFDR input file (.csv).
+CSMs should be unfiltered, therefore include decoys and not be validated for any
+FDR.
+Warning: This exporter currently only reports one/the first protein for
+ambiguous peptides that are found in more than one protein!
+USAGE:
+xiFdrExporter_msannika.py f [f]
+                            [-o OUTPUT]
+                            [-h]
+                            [--version]
+positional arguments:
+  f                     Crosslink-Spectrum-Matches (CSMs) exported from
+                        MS Annika in Microsoft Excel (.xlsx) format.
+optional arguments:
+  -o OUTPUT, --output OUTPUT
+                        Prefix of the output file.
+  -h, --help            show this help message and exit
+  --version             show program's version number and exit
+```
+
+Example usage:
+
+```
+python xiFdrExporter_msannika.py XLpeplib_Beveridge_QEx-HFX_DSS_R1.xlsx
+```
+
+Or using the Windows binary:
+
+```
+xiFdrExporter_msannika.exe XLpeplib_Beveridge_QEx-HFX_DSS_R1.xlsx
+```
+
 ## Export to [PyXlinkViewer for pyMOL](https://github.com/BobSchiffrin/PyXlinkViewer)
 
-A schematic workflow of the implementation can be seen in [*this figure*](workflow_pyMOLexporter.png).
+A schematic workflow of the implementation can be seen in [*this figure*](img/workflow_pyMOLexporter.png).
 
 ```
 EXPORTER DESCRIPTION:
@@ -217,7 +275,7 @@ Visualization of crosslinks with [XMAS](https://github.com/ScheltemaLab/ChimeraX
 
 Evaluating predicted structures (e.g. structures created with AlphaFold2) using cross-linking data can easily be done using [PAE Viewer](http://www.subtiwiki.uni-goettingen.de/v4/paeViewerDemo). Exporting MS Annika results to the input format of PAE Viewer requires first exporting to pyXlinkViewer (pyMOL) and then exporting crosslinks from pyXlinkViewer to CSV, as shown in the pyMOL screenshot below:
 
-![pyMOLExportScreenshot](pyXlinkViewer_XL_export.png)
+![pyMOLExportScreenshot](img/pyXlinkViewer_XL_export.png)
 
 The exporter takes the following arguments:
 ```
34.2 MB
Binary file not shown.
109 KB
Binary file not shown.
File renamed without changes.

img/csms_filtered.png

32.5 KB

img/csms_unfiltered.png

13.8 KB
File renamed without changes.
File renamed without changes.

xiFdrExporter_msannika.py

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
+#!/usr/bin/env python3
+
+# Exporter of MS Annika CSM Results to xiFDR input format
+# 2024 (c) Micha Johannes Birklbauer
+# https://github.com/michabirklbauer/
+# micha.birklbauer@gmail.com
+
+import argparse
+import pandas as pd
+
+__version = "1.0.1"
+__date = "20240505"
+
+"""
+DESCRIPTION:
+A script to export MS Annika CSM results (.xlsx) to a xiFDR input file (.csv).
+CSMs should be unfiltered, therefore include decoys and not be validated for any
+FDR.
+Warning: This exporter currently only reports one/the first protein for
+ambiguous peptides that are found in more than one protein!
+USAGE:
+xiFdrExporter_msannika.py f [f]
+                            [-o OUTPUT]
+                            [-h]
+                            [--version]
+positional arguments:
+  f                     Crosslink-Spectrum-Matches (CSMs) exported from
+                        MS Annika in Microsoft Excel (.xlsx) format.
+optional arguments:
+  -o OUTPUT, --output OUTPUT
+                        Prefix of the output file.
+  -h, --help            show this help message and exit
+  --version             show program's version number and exit
+"""
+
+# Exporter class with constructor that takes one MS Annika CSM result file as
+# input. CSMs should not be in any way filtered and exported to Microsoft Excel
+# .xlsx format from Proteome Discoverer.
+class MSAnnika_Exporter:
+
+    def __init__(self, input_file: str):
+        self.input_file = input_file
+
+    # static method to generate pandas dataframe of xiFDR export without class
+    # instance. Takes the file name of the CSM file as input.
+    @staticmethod
+    def generate_df(input_file: str) -> pd.DataFrame:
+
+        print("Warning: This exporter currently only reports one/the first protein for ambiguous peptides that are found in more than one protein!")
+
+        df = pd.read_excel(input_file)
+        df.rename(columns = {"Spectrum File": "run",
+                             "First Scan": "scan",
+                             "Sequence A": "peptide1",
+                             "Sequence B": "peptide2",
+                             "Crosslinker Position A": "peptide link 1",
+                             "Crosslinker Position B": "peptide link 2",
+                             "Charge": "precursor charge",
+                             "Combined Score": "score",
+                             "Score Alpha": "peptide1 score",
+                             "Score Beta": "peptide2 score",
+                             "Accession A": "accession1",
+                             "Accession B": "accession2",
+                             "A in protein": "peptide position 1",
+                             "B in protein": "peptide position 2"},
+                  inplace = True,
+                  errors = "raise")
+        # remove the following two lines if I find out how to denote ambiguous peptides in xiFDR (e.g. peptides that link to more than one protein)
+        df["accession1"] = df["accession1"].apply(lambda x: str(x).split(";")[0])
+        df["accession2"] = df["accession2"].apply(lambda x: str(x).split(";")[0])
+        df["is decoy 1"] = df["Alpha T/D"].apply(lambda x: "false" if "t" in str(x).lower() else "true")
+        df["is decoy 2"] = df["Beta T/D"].apply(lambda x: "false" if "t" in str(x).lower() else "true")
+        # same issue again - this would be used if xiFDR allows more than one protein per peptide
+        #df["peptide position 1"] = df["peptide position 1"].apply(lambda x: ";".join([str(int(y) + 1) for y in str(x).split(";")]))
+        #df["peptide position 2"] = df["peptide position 2"].apply(lambda x: ";".join([str(int(y) + 1) for y in str(x).split(";")]))
+        # remove the following two lines if I figure the above out; str() guards against cells read as numbers
+        df["peptide position 1"] = df["peptide position 1"].apply(lambda x: int(str(x).split(";")[0]) + 1)
+        df["peptide position 2"] = df["peptide position 2"].apply(lambda x: int(str(x).split(";")[0]) + 1)
+
+        return df
+
+    # instance-method wrapper around the static generate_df
+    def __generate_csv_df(self) -> pd.DataFrame:
+        return self.generate_df(self.input_file)
+
+    # export function, takes one argument "output_file" which sets the prefix
+    # of generated output file
+    def export(self, output_file: str = None) -> pd.DataFrame:
+        csv = self.__generate_csv_df()
+
+        if output_file is None:
+            output_file = ".".join(self.input_file.split(".")[:-1])
+
+        csv.to_csv(output_file + "_xiFDR.csv", index = False)
+
+        return csv
+
+# initialize exporter and export xiFDR csv file
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument(metavar = "f",
+                        dest = "file",
+                        help = "Name/Path of the MS Annika CSM result file (in .xlsx format) to process.",
+                        type = str,
+                        nargs = 1)
+    parser.add_argument("-o", "--output",
+                        dest = "output",
+                        default = None,
+                        help = "Prefix of the output file.",
+                        type = str)
+    parser.add_argument("--version",
+                        action = "version",
+                        version = __version)
+    args = parser.parse_args()
+
+    exporter = MSAnnika_Exporter(args.file[0])
+
+    exporter.export(args.output)
+
+if __name__ == "__main__":
+    main()
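
The per-row logic of `generate_df` can be tried without an Excel file. The following is a minimal sketch, not the exporter itself: it uses plain dictionaries instead of pandas so that it runs without dependencies, and it covers only a subset of the renamed columns. The helper names `first_entry`, `is_decoy`, `convert_row` and `output_name` and the sample row are hypothetical; the column names and the `_xiFDR.csv` suffix are taken from the script above.

```python
# Subset of the MS Annika -> xiFDR column names used in generate_df above.
COLUMN_MAP = {
    "Sequence A": "peptide1",
    "Sequence B": "peptide2",
    "Accession A": "accession1",
    "Accession B": "accession2",
    "A in protein": "peptide position 1",
    "B in protein": "peptide position 2",
}


def first_entry(value) -> str:
    """Return the first item of a ';'-separated cell."""
    return str(value).split(";")[0]


def is_decoy(flag) -> str:
    """Mirror the script's check: a flag containing 't' counts as a target."""
    return "false" if "t" in str(flag).lower() else "true"


def convert_row(row: dict) -> dict:
    """Hypothetical helper applying the exporter's per-row logic to one CSM."""
    out = {COLUMN_MAP.get(key, key): value for key, value in row.items()}
    for i, side in ((1, "Alpha"), (2, "Beta")):
        # Only the first protein of an ambiguous peptide is kept.
        out[f"accession{i}"] = first_entry(out[f"accession{i}"])
        # First listed position plus one, as in the script.
        out[f"peptide position {i}"] = int(first_entry(out[f"peptide position {i}"])) + 1
        out[f"is decoy {i}"] = is_decoy(out[f"{side} T/D"])
    return out


def output_name(input_file: str, prefix: str = None) -> str:
    """Build the output file name the way export() does."""
    if prefix is None:
        prefix = ".".join(input_file.split(".")[:-1])
    return prefix + "_xiFDR.csv"


# Made-up CSM row for illustration, not real MS Annika output.
sample = {
    "Sequence A": "PEPTKIDE",
    "Sequence B": "KAAR",
    "Accession A": "P1;P2",
    "Accession B": "P3",
    "A in protein": "10;42",
    "B in protein": 7,
    "Alpha T/D": "T",
    "Beta T/D": "D",
}

converted = convert_row(sample)
print(converted["accession1"], converted["peptide position 1"], converted["is decoy 2"])
print(output_name("XLpeplib_Beveridge_QEx-HFX_DSS_R1.xlsx"))
```

In this sketch the ambiguous accession `P1;P2` is reduced to `P1`, which is the behaviour the exporter's warning message describes.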
