SMILES / InChI(Key)+identifier inconsistencies in RMassBank-generated records #331

@schymane

Hi @meowcat @meier-rene (CC @anjuraj15 and @PaulThiessen)

We had a bizarre case of existing (3-year-old) ENTACT records failing validation when we updated only unrelated (textual) information. It turns out the SMILES contained stereochemistry information, but the InChI, InChIKey and all related identifiers didn't, so the records failed @meier-rene's updated validation suite.

Here are the SMILES in question:

ClC1=CC=C(CN2CCS\C2=N/C#N)C=N1
CN(C)C1=CC=C(C=C1)\N=N\C1=C(C=CC=C1)C(O)=O
OC(=O)C1=CC(=CC=C1O)\N=N\C1=CC=C(C=C1)S(=O)(=O)NC1=NC=CC=C1
CCCOC\C(=N/C1=C(C=C(Cl)C=C1)C(F)(F)F)N1C=CN=C1
NC1=CC=C(C=C1)\N=N\C1=CC=CC=C1

It turns out that they standardize to the non-stereochemistry form in the PubChem standardizer, and presumably also in Cactvs, which may explain how everything after the InChIKey ended up in the "stereochemistry-neutral" form. The only way we could get these records to pass validation was to switch to the non-stereo SMILES, rather than updating all the InChI and identifier fields. See the example record before and after the change (the "after" file has _ES at the end of its name) and the log.

Not sure if we have to build a check into RMassBank to catch this. @meowcat, have you ever seen any cases like this? @meier-rene, are there any other existing records that have this issue?
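For illustration only, here is a rough sketch of what such a check could look like. It is written in Python rather than R and is not RMassBank code; the function names are hypothetical. It is a pure string heuristic: it assumes that `/`, `\` or `@` in a SMILES means stereochemistry was specified, and that a `/b` or `/t` layer in the InChI means the same on that side. A real check would probably go through a cheminformatics toolkit instead.

```python
# Heuristic sketch (hypothetical helper names, not part of RMassBank):
# flag records whose SMILES carries stereo markers that the InChI lacks,
# or the other way round. No chemistry toolkit is used, only string checks.

def smiles_stereo(smiles: str) -> dict:
    """Report which kinds of stereo markers appear in a SMILES string."""
    return {
        # '/' and '\' are the directional bond symbols around double bonds
        "double_bond": "/" in smiles or "\\" in smiles,
        # '@' / '@@' mark tetrahedral centres inside bracket atoms
        "tetrahedral": "@" in smiles,
    }

def inchi_stereo(inchi: str) -> dict:
    """Report which stereo layers appear in an InChI string."""
    # Skip the version prefix ("InChI=1S") and the formula layer.
    layers = inchi.split("/")[2:]
    prefixes = {layer[0] for layer in layers if layer}
    return {
        "double_bond": "b" in prefixes,   # /b layer: double-bond stereo
        "tetrahedral": "t" in prefixes,   # /t layer: tetrahedral stereo
    }

def stereo_mismatch(smiles: str, inchi: str) -> list:
    """Return the stereo kinds on which the SMILES and InChI disagree."""
    s, i = smiles_stereo(smiles), inchi_stereo(inchi)
    return [kind for kind in s if s[kind] != i[kind]]

if __name__ == "__main__":
    # But-2-ene is used as a stand-in here; it is not one of the affected records.
    flat_inchi = "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3"
    print(stereo_mismatch("C/C=C/C", flat_inchi))
    # One of the SMILES from this issue (raw string because of the backslashes):
    print(smiles_stereo(r"NC1=CC=C(C=C1)\N=N\C1=CC=CC=C1"))
```

Because it only looks at characters, this would also flag SMILES where a directional bond is written but is chemically meaningless, so it is best treated as a warning for manual review rather than a hard validation failure.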

log.txt
MSBNK-LCSB-LU005205.txt
MSBNK-LCSB-LU005205_ES.txt