Hi @meowcat @meier-rene (CC @anjuraj15 and @PaulThiessen)
We had a bizarre case of existing (3-year-old) ENTACT records failing validation when we updated only unrelated (textual) information. It turns out the SMILES contained stereochemistry information, but the InChI, InChIKey and all related identifiers didn't, so the records failed @meier-rene's updated validation suite.
Here are the SMILES in question:
ClC1=CC=C(CN2CCS\C2=N/C#N)C=N1
CN(C)C1=CC=C(C=C1)\N=N\C1=C(C=CC=C1)C(O)=O
OC(=O)C1=CC(=CC=C1O)\N=N\C1=CC=C(C=C1)S(=O)(=O)NC1=NC=CC=C1
CCCOC\C(=N/C1=C(C=C(Cl)C=C1)C(F)(F)F)N1C=CN=C1
NC1=CC=C(C=C1)\N=N\C1=CC=CC=C1
It turns out that they standardize to the non-stereo form in the PubChem standardizer, and presumably also in Cactvs, which may explain how everything after the InChIKey ended up as the "stereochemistry-neutral" form. The only way we could get these records to pass validation, short of updating all InChI and identifier fields, was to switch to the non-stereo SMILES. See the example before and after the change (the "after" version has _ES at the end) and the log.
Not sure if we have to build a check into RMassBank to catch this. @meowcat, have you ever seen any cases like this? @meier-rene, are there any other existing records that have this issue?
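For discussion, here is a rough sketch of what such a check might look like. It is plain Python rather than R, uses no cheminformatics toolkit, and is only a string-level heuristic: it assumes that `/`, `\` or `@` in a SMILES signals stereo, and that a `/b` or `/t` layer in an InChI does the same. The function names are made up for illustration and are not part of RMassBank or the validator. The but-2-ene pair is a textbook stand-in, not one of the records above.

```python
def smiles_has_stereo(smiles: str) -> bool:
    """True if the SMILES carries bond-direction (/ or \\) or chirality (@) markers."""
    return any(ch in smiles for ch in "/\\@")


def inchi_has_stereo(inchi: str) -> bool:
    """True if the InChI has a double-bond (/b) or tetrahedral (/t) stereo layer."""
    # Skip the version prefix ("InChI=1S") and the formula layer.
    layers = inchi.split("/")[2:]
    return any(layer[:1] in ("b", "t") for layer in layers)


def stereo_mismatch(smiles: str, inchi: str) -> bool:
    """True if exactly one of SMILES / InChI appears to encode stereochemistry."""
    return smiles_has_stereo(smiles) != inchi_has_stereo(inchi)


# The five SMILES quoted in this issue (raw strings because of the backslashes).
ISSUE_SMILES = [
    r"ClC1=CC=C(CN2CCS\C2=N/C#N)C=N1",
    r"CN(C)C1=CC=C(C=C1)\N=N\C1=C(C=CC=C1)C(O)=O",
    r"OC(=O)C1=CC(=CC=C1O)\N=N\C1=CC=C(C=C1)S(=O)(=O)NC1=NC=CC=C1",
    r"CCCOC\C(=N/C1=C(C=C(Cl)C=C1)C(F)(F)F)N1C=CN=C1",
    r"NC1=CC=C(C=C1)\N=N\C1=CC=CC=C1",
]

# Illustrative stand-in: (E)-but-2-ene SMILES against InChI with and without /b.
BUTENE_SMILES = "C/C=C/C"
BUTENE_INCHI_FLAT = "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3"
BUTENE_INCHI_E = "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+"

if __name__ == "__main__":
    for smi in ISSUE_SMILES:
        print(smiles_has_stereo(smi), smi)
    print(stereo_mismatch(BUTENE_SMILES, BUTENE_INCHI_FLAT))  # stereo SMILES, flat InChI
    print(stereo_mismatch(BUTENE_SMILES, BUTENE_INCHI_E))     # both carry stereo
```

A string check like this would only flag candidates for manual review. It cannot tell whether a toolkit or standardizer would actually keep the stereo, which is exactly what happened with the records above, so a real check would presumably need to round-trip through the same standardizer used to generate the identifiers.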