|
1 | 1 | Getting Started
|
2 | 2 | ===============
|
3 | 3 |
|
| 4 | +**peptacular** is an extremely lightweight package with only one dependency: ``regex``. |
| 5 | + |
| 6 | +It contains functions for parsing and working with ProForma 2.0-compliant peptide and protein sequences. |
| 7 | + |
| 8 | +Installation |
| 9 | +------------ |
| 10 | + |
| 11 | +**From pip**: |
| 12 | + |
| 13 | +.. code-block:: bash |
| 14 | +
|
| 15 | + pip install peptacular |
| 16 | +
|
| 17 | +**From source**: |
| 18 | + |
| 19 | +.. code-block:: bash |
| 20 | +
|
| 21 | + git clone https://github.com/pgarrett-scripps/peptacular.git |
| 22 | + cd peptacular |
| 23 | + pip install . |
| 24 | +
|
| 25 | +
|
4 | 26 | Basic Usage
|
5 |
| ----------- |
| 27 | +----------- |
| 28 | + |
| 29 | +All modules and functions in **peptacular** are available under the ``peptacular`` namespace, but it is recommended to import the package as follows: |
| 30 | + |
| 31 | +.. code-block:: python |
| 32 | +
|
| 33 | + import peptacular as pt |
| 34 | +
|
| 35 | +**ProForma** sequence parsing in **peptacular** is lazy: only the notation needed for the requested |
| 36 | +operation is validated during parsing. Modifications are not validated until they are actually needed, |
| 37 | +such as when calculating mass or composition. |
| 38 | + |
| 39 | +**For example:** |
| 40 | + |
| 41 | +.. code-block:: python |
| 42 | +
|
| 43 | + pt.pop_mods('PEP[INVALID]TIDE') # Successfully runs |
| 44 | + pt.mass('PEP[INVALID]TIDE') # Raises an error due to the invalid modification |
| 45 | +
|
| 46 | +
|
| 47 | +Key Features |
| 48 | +------------ |
| 49 | + |
| 50 | +**peptacular** is fully compliant with **ProForma 2.0** and includes functions for: |
| 51 | + |
| 52 | +- **Digestion:** Perform single, multi, or sequential digestion. |
| 53 | +- **Fragmentation:** Generate internal, terminal, immonium, and neutral loss fragments. |
| 54 | +- **Mass and Composition:** Calculate mass, m/z, or elemental composition of peptides. |
| 55 | +- **Modifications:** Apply or remove static and variable modifications. |
| 56 | +- **Parsing and Serializing:** Handle ProForma 2.0-compliant sequence parsing and serialization. |
| 57 | +- **Isotopic Distributions:** Simulate isotopic patterns. |
| 58 | +- **Scoring:** Compare theoretical fragments against experimental spectra. |
| 59 | + |
| 60 | +See the **Examples** section for more detailed use cases. |
| 61 | + |
| 62 | + |
| 63 | +Sequence Handling |
| 64 | +----------------- |
| 65 | + |
| 66 | +All functions in ``pt.sequence`` accept peptide sequences as strings but internally convert them to |
| 67 | +**ProformaAnnotation** objects. After processing, they are converted back to strings. |
| 68 | + |
| 69 | +When applying multiple sequence operations on the same peptide, it is more efficient to first convert the |
| 70 | +sequence to a **ProformaAnnotation** and use its methods directly. |
| 71 | + |
| 72 | +**Example: Converting between ProformaAnnotation and str:** |
| 73 | + |
| 74 | +.. code-block:: python |
| 75 | +
|
| 76 | + import peptacular as pt |
| 77 | +
|
| 78 | + annot = pt.parse('PEPTIDE') |
| 79 | + seq = pt.serialize(annot) # or annot.serialize() |
| 80 | + assert 'PEPTIDE' == seq |
| 81 | +
|
| 82 | +
|
| 83 | +- ``pt.parse`` returns either a ``ProformaAnnotation`` or a ``MultiProformaAnnotation`` object. |
| 84 | +- ``ProformaAnnotation`` is used for single, linear sequences (the most common use case). |
| 85 | +- ``MultiProformaAnnotation`` handles crosslinked or multiple sequences. |
| 86 | + |
| 87 | +**Crosslinked and Multiple Sequences:** |
| 88 | + |
| 89 | +- **Crosslinked**: ``{sequence1}\\{sequence2}`` |
| 90 | +- **Disconnected**: ``{sequence1}+{sequence2}`` |
| 91 | + |
| 92 | +``MultiProformaAnnotation`` contains a list of individual ``ProformaAnnotation`` objects along with their |
| 93 | +connection type. |
| 94 | + |
| 95 | + |
| 96 | +ProForma Notation |
| 97 | +----------------- |
| 98 | + |
| 99 | +**ProForma 2.0** was introduced by the **Proteomics Standards Initiative (PSI)** to standardize the representation of peptide sequences, including modifications. |
| 100 | + |
| 101 | +- 📄 **Reference Paper:** `ProForma 2.0 Specification <https://pubs.acs.org/doi/10.1021/acs.jproteome.1c00771>`_ |
| 102 | +- 📚 **Latest Specification:** `ProForma 2.0 GitHub <https://github.com/HUPO-PSI/ProForma/tree/master/SpecDocument>`_ |
| 103 | + |
| 104 | + |
| 105 | +**Basic Syntax Overview** |
| 106 | + |
| 107 | +- **N-terminal:** ``[+100]-PEPTIDE`` |
| 108 | +- **C-terminal:** ``PEPTIDE-[+100]`` |
| 109 | +- **Internal:** ``PEPT[+100]IDE`` |
| 110 | +- **Global:** ``<[+100]@C>PEPTIDE`` or ``<[+100]@C,P>PEPTIDE`` |
| 111 | +- **Isotope:** ``<13C>PEPTIDE`` or ``<15N><13C>PEPTIDE`` |
| 112 | +- **Labile:** ``{+100}PEPTIDE`` |
| 113 | + |
| 114 | +Global, isotope, and labile modifications are written before the N-terminal modification, or before the first residue if no N-terminal modification is present. |
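To make the positional rules above concrete, here is a minimal, standard-library-only sketch that splits a simple ProForma string into its N-terminal, internal, and C-terminal modifications. The helper ``split_basic_proforma`` is hypothetical and is not part of **peptacular**; it ignores global, isotope, and labile modifications, and it assumes internal modifications are indexed by the 0-based position of the residue they follow.

```python
import re


def split_basic_proforma(seq: str):
    """Illustrative only (not peptacular's parser).

    Returns (residues, nterm_mod, internal_mods, cterm_mod) for strings that
    use only bracketed N-terminal, C-terminal, and internal modifications.
    """
    nterm = cterm = None

    # N-terminal modification: a leading "[...]-" prefix.
    m = re.match(r"\[([^\]]*)\]-", seq)
    if m:
        nterm, seq = m.group(1), seq[m.end():]

    # C-terminal modification: a trailing "-[...]" suffix.
    m = re.search(r"-\[([^\]]*)\]$", seq)
    if m:
        cterm, seq = m.group(1), seq[:m.start()]

    residues = []
    internal = {}  # 0-based residue index -> list of modification strings
    for token in re.finditer(r"([A-Z])((?:\[[^\]]*\])*)", seq):
        residues.append(token.group(1))
        mods = re.findall(r"\[([^\]]*)\]", token.group(2))
        if mods:
            internal[len(residues) - 1] = mods

    return "".join(residues), nterm, internal, cterm


result = split_basic_proforma("[-40]-PEPT[+50]IDE-[+200]")
```

Here ``result`` holds the bare residue string, the N-terminal modification, a dictionary of internal modifications keyed by residue index, and the C-terminal modification. A real parser also has to handle the global, isotope, and labile prefixes, which this sketch deliberately leaves out.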
| 115 | + |
| 116 | +**Combined Example:** |
| 117 | + |
| 118 | +.. code-block:: python |
| 119 | +
|
| 120 | + pt.parse('<[+20]@C><13C>{+75}[-40]-PEPT[+50]IDE-[+200]') |
| 121 | +
|
| 122 | + # Returns (shown as comments so the block runs as written): |
| 123 | + # ProFormaAnnotation( |
| 124 | + #     sequence='PEPTIDE', |
| 125 | + #     isotope_mods=[Mod('13C', 1)], |
| 126 | + #     static_mods=[Mod('[+20]@C', 1)], |
| 127 | + #     labile_mods=[Mod(75, 1)], |
| 128 | + #     nterm_mods=[Mod(-40, 1)], |
| 129 | + #     cterm_mods=[Mod(200, 1)], |
| 130 | + #     internal_mods={3: [Mod(50, 1)]} |
| 131 | + # ) |
| 132 | +
|
| 133 | +**Specifying ProForma Modifications** |
| 134 | + |
| 135 | +The ``Mod`` object contains: |
| | + |
| 136 | +- The **modification string** |
| 137 | +- The **number of times** it is applied |
| 138 | + |
| 139 | +You can apply **multiple modifications** at the same position by adding them sequentially: |
| | + |
| 140 | +- ``[+100][+30]`` → Two separate modifications |
| 141 | +- ``[+100]^2`` → The same modification applied twice |
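The two forms above map naturally onto (modification string, count) pairs. The following sketch shows one way to read them with the standard library; ``parse_mod_run`` is a hypothetical helper for illustration and does not reflect how **peptacular** builds its ``Mod`` objects internally.

```python
import re


def parse_mod_run(run: str):
    """Illustrative only: turn '[+100][+30]' or '[+100]^2' into (mod, count) pairs."""
    pairs = []
    # Each match is one bracketed modification with an optional "^N" multiplier.
    for body, count in re.findall(r"\[([^\]]*)\](?:\^(\d+))?", run):
        pairs.append((body, int(count) if count else 1))
    return pairs


separate = parse_mod_run("[+100][+30]")  # two modifications, each applied once
repeated = parse_mod_run("[+100]^2")     # one modification, applied twice
```

A modification without an explicit ``^N`` suffix is given a count of 1, which matches the second argument shown in the ``Mod(...)`` entries of the combined example.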
| 142 | + |
| 143 | +**Modification Types** |
| 144 | + |
| 145 | +- **Mass-based:** ``[+100]``, ``[100]``, ``[-100]`` |
| 146 | +- **Chemical formula:** ``[Formula:C12H20O2]`` |
| 147 | +- **UNIMOD:** ``[Oxidation]``, ``[UNIMOD:21]``, ``[U:21]`` |
| 148 | +- **PSI-MOD:** ``[L-methionine sulfoxide]``, ``[MOD:00046]``, ``[M:00046]`` |
| 149 | +- **RESID:** ``[R:L-methionine (R)-sulfoxide]``, ``[RESID:AA0037]`` |
| 150 | +- **GNO:** ``[GNO:G02815KT]`` |
6 | 151 |
|
7 |
| -TODO |
| 152 | +Although the prefixes for UNIMOD and PSI-MOD (``U:`` and ``M:``) are not required, using them is still recommended. |
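One reason the prefixes help is that they make the modification type unambiguous from the string alone. The sketch below classifies a modification string using only the prefixes listed above; ``classify_mod`` and its category labels are hypothetical, not a **peptacular** function, and a bare name such as ``Oxidation`` is reported as ``"name"`` because its ontology cannot be determined without a lookup.

```python
import re

# Prefixes taken from the list above; the category labels are illustrative.
_PREFIXES = {
    "Formula": "formula",
    "UNIMOD": "unimod",
    "U": "unimod",
    "MOD": "psi-mod",
    "M": "psi-mod",
    "RESID": "resid",
    "R": "resid",
    "GNO": "gno",
}


def classify_mod(mod: str) -> str:
    """Illustrative only: guess the type of a modification from its text."""
    # Mass-based: an optionally signed number, e.g. "+100", "100", "-100".
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", mod):
        return "mass"
    prefix, sep, _ = mod.partition(":")
    if sep and prefix in _PREFIXES:
        return _PREFIXES[prefix]
    # No recognised prefix: a bare name whose ontology is ambiguous.
    return "name"


kinds = [classify_mod(m) for m in ("+100", "Formula:C12H20O2", "U:21", "Oxidation")]
```

With an explicit prefix the type is resolved by a dictionary lookup, whereas an unprefixed name would have to be searched for in each ontology in turn.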