Biopandas PDB format cannot handle atomic charges #151

Open
@gate-tec

Describe the bug

Currently, reading or writing PDB data that contains atomic charges in the PDB v3.3 format is not possible.

The PDB format specifies the notation for charge explicitly via:
Columns 79 - 80 indicate any charge on the atom, e.g., 2+, 1-. In most cases, these are blank.
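The sketch below reads that field from a raw record. It assumes a fixed-width ATOM/HETATM line. `parse_pdb_charge` is a hypothetical helper, not part of biopandas. It also shows how 1-based columns 79-80 map to the 0-based range [78, 80] used in the dictionaries discussed below.

```python
from typing import Optional


def parse_pdb_charge(record: str) -> Optional[int]:
    """Hypothetical helper: read columns 79-80 of an ATOM/HETATM record.

    PDB columns are 1-based and inclusive, while Python slices are 0-based and
    half-open, so columns 79-80 correspond to record[78:80].
    """
    field = record[78:80].strip()
    if not field:
        return None  # blank charge field, the common case
    digits, sign = field[:-1], field[-1]
    if sign not in ("+", "-") or not digits.isdigit():
        raise ValueError(f"unexpected charge field: {field!r}")
    return int(digits) if sign == "+" else -int(digits)


# Synthetic records: only the charge columns matter for this example.
charged = "ATOM  ".ljust(78) + "1-"
neutral = "ATOM  ".ljust(78) + "  "
print(parse_pdb_charge(charged), parse_pdb_charge(neutral))  # -1 None
```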

Since the charge column is typed as float, and (a) these columns are usually blank and (b) strings like 2+ cannot be parsed as floats, the type conversion fails and the entire charge column is filled with NaN values.
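Here is a minimal, pure-Python illustration of that failure mode. It assumes that the reader replaces unparseable values with NaN, as pandas' `pd.to_numeric(..., errors="coerce")` would. That is an assumption about biopandas' internals, not something confirmed here. `coerce_float` is a hypothetical helper.

```python
import math


def coerce_float(field: str) -> float:
    """Hypothetical helper: parse a fixed-width field as float, NaN on failure."""
    try:
        return float(field)
    except ValueError:
        # Blank fields ("  ") and PDB-style charges ("2+", "1-") both end up here.
        return math.nan


# Columns 79-80 as they might look in a model: mostly blank, a few charged atoms.
charge_fields = ["  ", "1+", "  ", "1-", "2+"]

as_float = [coerce_float(f) for f in charge_fields]
as_str = [f.strip() for f in charge_fields]

print(sum(not math.isnan(v) for v in as_float))  # 0: every charge is lost
print(sum(bool(s) for s in as_str))  # 3: string typing keeps them
```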

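Writing charge values also doesn't give the expected results, since the formatter is specified as +2.1f (the same goes for ANISOU entries). In my opinion this doesn't match the PDB format at all, even for float values: a charge of 2.0 is written as +2.0, which puts the sign before the digit and needs four characters for a two-column field.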
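The contrast below is a short sketch. `current_style` applies the printf-style `%+2.1f` formatter described above. `pdb_style` is a hypothetical helper that shows what a float-aware formatter would have to produce for columns 79-80. Leaving neutral charges blank is an assumption based on "In most cases, these are blank".

```python
import math


def current_style(value: float) -> str:
    # printf-style "%+2.1f", the formatter described above: sign first, one decimal.
    return "%+2.1f" % value


def pdb_style(value: float) -> str:
    """Hypothetical helper: format a numeric charge as PDB 'digit + sign'."""
    if math.isnan(value) or round(value) == 0:
        return "  "  # neutral or unknown: leave columns 79-80 blank
    magnitude = abs(int(round(value)))
    if magnitude > 9:
        raise ValueError("columns 79-80 only hold a single digit and a sign")
    return f"{magnitude}{'+' if value > 0 else '-'}"


for v in (2.0, -1.0):
    print(repr(current_style(v)), repr(pdb_style(v)))
# '+2.0' '2+'
# '-1.0' '1-'
```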

Steps/Code to Reproduce

The following is a minimal working example (MWE) for reading a PDB file with charged atoms. The format mismatch when writing charge values should be clear from the explanation above, so I'll omit an example for it (but I can provide one later if needed).

```python
from biopandas.pdb import PandasPdb

atom_df = PandasPdb().fetch_pdb("2mjz").get_model(1).df["ATOM"]

print(len(atom_df.loc[atom_df["charge"].notnull(), "charge"]))
```
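To cross-check the DataFrame result independently of the parser, charged ATOM records can be counted directly from the raw file. The sketch below is a hypothetical helper, `count_charged_atoms`. It assumes standard MODEL records with the serial number in columns 11-14 and counts only ATOM records, as the MWE does. The demo uses synthetic lines rather than the real 2MJZ file.

```python
def count_charged_atoms(pdb_lines, model=1):
    """Hypothetical helper: count ATOM records with a non-blank charge in one model."""
    current_model = 1  # files without MODEL records are treated as a single model 1
    count = 0
    for line in pdb_lines:
        if line.startswith("MODEL"):
            current_model = int(line[10:14])  # model serial, columns 11-14
        elif line.startswith("ATOM  ") and current_model == model:
            if line[78:80].strip():  # columns 79-80: charge
                count += 1
    return count


# Synthetic two-model file; with a real file, pass open(path).read().splitlines().
lines = [
    "MODEL        1",
    "ATOM  ".ljust(78) + "1+",
    "ATOM  ".ljust(78) + "  ",
    "ENDMDL",
    "MODEL        2",
    "ATOM  ".ljust(78) + "1-",
    "ENDMDL",
]
print(count_charged_atoms(lines, model=1))  # 1
```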

Expected Results

Detection of charged atoms in PDB data (the first model of 2MJZ should have 350 charged atoms).

Actual Results

The output is 0 (since only NaN values are present).

Proposed Fix

I'd suggest changing the definition in pdb_atomdict and pdb_anisoudict to type charges as str and changing the string formatter accordingly.

A setup that seems to be working for me is:

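```python
import re  # used by the "strf" lambda below

{
    "id": "charge",
    "line": [78, 80],  # columns 79-80 as a 0-based, end-exclusive slice
    "type": str,
    "strf": lambda x: (
        str(int(re.sub(r"[+-]", "", x)))[-1] + ("-" if "-" in x else "+") if len(x.strip()) > 0 else ""
    ),
}
```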

The string formatter can probably be improved, but this was the safest option I could come up with.

Versions

biopandas 0.5.1
Linux-5.4.0-91-generic-x86_64-with-glibc2.31
Python 3.10.15 (main, Oct  3 2024, 07:27:34) [GCC 11.2.0]
NumPy 1.23.5
