
Add additional sanity checks against external resources #84

@benmwebb


As @brindakv mentioned, python-ihm currently does various sanity checks to ensure the generated mmCIF is self-consistent, e.g. checking the model sequence against that in the struct_ref table. However, we could do additional sanity checks that validate external resources, perhaps as part of the make-mmcif.py script or another script used in the deposition pipeline. (I would be reluctant to run these every time a file is generated: they would make multiple network connections, referenced files might not exist at modeling time, and many issues would be warnings rather than errors or would need manual intervention.) For example, we could:

  • Query UniProt and check to make sure that the struct_ref sequence matches (complication: may need to check multiple versions of the UniProt sequence since it does change).
  • Ping any DOI referenced in the file to make sure it exists.
  • Download any referenced external archive files and make sure that any files referenced inside those archives exist.
  • Look up any accessions (e.g. SASBDB, EMDB) to make sure that a) they exist, b) they have been released and c) they match the model (e.g. by checking model fit or checking that both model and data reference the same UniProt sequence).
  • Look up any PMIDs and make sure the citation matches.
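
A rough sketch of how the first two checks might look as a standalone script, returning warnings rather than raising errors. Everything here is an assumption for illustration, not existing python-ihm API: the function names are hypothetical, the UniProt URL assumes the public REST endpoint `rest.uniprot.org/uniprotkb/<accession>.fasta`, and the DOI check assumes the `doi.org/api/handles/<doi>` proxy endpoint, which reports `responseCode` 1 for a registered DOI. The sequence comparison is deliberately simplified: it only looks at the current UniProt sequence and ignores older sequence versions and struct_ref alignment ranges.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def fetch_text(url, timeout=30):
    """Return the body of `url` as text, or None if the server says 404."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def parse_fasta_sequence(fasta_text):
    """Return the sequence of the first record in a FASTA string."""
    seq = []
    seen_header = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
        elif line.strip():
            seq.append(line.strip())
    return "".join(seq)


def check_uniprot_sequence(accession, expected, fetch=fetch_text):
    """Compare `expected` with the current UniProt sequence.

    Returns a list of warning strings (empty if the sequences match).
    The URL below is an assumed endpoint; `fetch` is injectable so the
    check can be exercised without network access.
    """
    url = ("https://rest.uniprot.org/uniprotkb/%s.fasta"
           % urllib.parse.quote(accession))
    text = fetch(url)
    if text is None:
        return ["UniProt accession %s not found" % accession]
    actual = parse_fasta_sequence(text)
    if actual != expected:
        return ["Sequence for %s differs from current UniProt entry "
                "(%d vs %d residues)" % (accession, len(expected),
                                         len(actual))]
    return []


def check_doi(doi, fetch=fetch_text):
    """Check that a DOI is registered, using the assumed handle endpoint.

    Returns a list of warning strings (empty if the DOI resolves).
    """
    url = "https://doi.org/api/handles/" + urllib.parse.quote(doi, safe="/")
    text = fetch(url)
    if text is None or json.loads(text).get("responseCode") != 1:
        return ["DOI %s could not be resolved" % doi]
    return []
```

Passing `fetch` as a parameter keeps the network access in one place, so the checks can be tested offline and a deposition script can collect all warnings before deciding which need manual attention.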

Labels: enhancement (New feature or request)
