Add supramolecular test 1: LNCI16 #44

joehart2001 · 2025-09-17T20:48:41Z

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

I've confirmed the contribution guidelines.

Summary

Evaluates models on the LNCI16 dataset, which contains interaction energies for 16 large non-covalent complexes, including proteins, DNA, and supramolecular assemblies.

(Dataset reference: https://www.thieme-connect.com/products/ejournals/abstract/10.1055/s-0042-1753141)

Linked issue

Resolves #21

Progress

Calculations
Analysis
Application
Documentation

Testing

MACE-MP-0b3

New decorators/callbacks

No


ElliottKasoar · 2025-09-18T12:19:46Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+        except Exception as e:
+            logging.warning(f"Failed to read charge from {filepath}: {e}")
+            return 0.0


We probably don't want to catch all Exceptions in a blanket way, but also, do we expect this to fail sometimes, and is that actually ok?

If some of the input files don't have the information we need, we should handle that separately, either when choosing our dataset or as a specific processing step, I think.

More generally, we need to think about logging. For warnings, I've used a proper warning in the add-example branch, but we definitely do want to set up logging more seriously, and think about what/how we log.
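A minimal sketch of the narrower handling discussed above. It assumes the charge is stored as a single integer in a plain-text file next to each structure; the `read_charge` name and the file layout are hypothetical, not taken from the PR. A missing file is treated as a neutral system, while a file that exists but cannot be parsed raises instead of silently falling back to zero.

```python
from pathlib import Path


def read_charge(filepath: Path) -> float:
    """Return the total charge stored in ``filepath`` (hypothetical helper).

    Assumes the file holds the integer charge as its first token.
    """
    if not filepath.exists():
        # Assumption: no charge file means the system is neutral.
        return 0.0
    try:
        return float(int(filepath.read_text().split()[0]))
    except (IndexError, ValueError) as err:
        # Only the parsing errors we expect are caught (empty file, non-integer
        # token), and they are re-raised with context rather than replaced by 0.
        # Other errors such as OSError propagate unchanged.
        raise ValueError(f"Could not parse a charge from {filepath}") from err
```

Bad input files then surface as errors during the calculation, which pushes the fix back to dataset selection or a dedicated preprocessing step.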

ElliottKasoar · 2025-09-18T12:21:09Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+            print(f"MAE for {self.model_name} on LNCI16: {mae:.2f} kcal/mol")
+
+
+def build_project(repro: bool = False) -> None:


Probably not something for this PR, but I wonder if we should make this a utility function for building mlipx nodes, and just pass the benchmark as an extra parameter? Everything else is so general.
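A framework-agnostic sketch of the shape that suggestion could take: everything model-generic lives in one helper, and the benchmark is passed in as a parameter. `build_benchmark_nodes`, the `model`/`model_name` keywords and the plain-dict return value are assumptions for illustration; the mlipx project setup and the `repro` step from `build_project` are deliberately left out rather than guessed.

```python
from collections.abc import Callable, Mapping
from typing import Any


def build_benchmark_nodes(
    benchmark: Callable[..., Any],
    models: Mapping[str, Any],
    **benchmark_kwargs: Any,
) -> dict[str, Any]:
    """Instantiate one ``benchmark`` node per model (hypothetical utility).

    ``benchmark`` is any callable (e.g. a node class) that accepts ``model``
    and ``model_name`` keyword arguments; extra keyword arguments are
    forwarded unchanged to every node.
    """
    return {
        name: benchmark(model=model, model_name=name, **benchmark_kwargs)
        for name, model in models.items()
    }
```

Each benchmark module would then only define its node class and call the shared helper with it.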

ElliottKasoar · 2025-09-18T12:23:18Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+                # Create fresh atoms object to avoid array contamination issues
+                fresh_atoms = Atoms(
+                    symbols=atoms.get_chemical_symbols(),
+                    positions=atoms.positions.copy(),
+                    cell=atoms.cell.copy() if atoms.cell is not None else None,
+                    pbc=atoms.pbc.copy() if atoms.pbc is not None else False,
+                )
+                fresh_atoms.info.update(atoms.info)


Why do we need to do this? If we do need a copy, can we not do Atoms.copy()?

ElliottKasoar · 2025-09-18T12:25:53Z

docs/source/user_guide/benchmarks/supramolecular.rst

+
+Input structures:
+
+* J. Gorges, B. Bädorf, A. Hansen, and S. Grimme, ‘LNCI16 - Efficient Computation of the Interaction Energies of Very Large Non-covalently Bound Complexes’, Synlett, vol. 34, no. 10, pp. 1135–1146, Jun. 2023, doi: 10.1055/s-0042-1753141.


Is there a link to the actual data that can be downloaded as well?

ElliottKasoar · 2025-09-18T12:26:31Z

mlip_testing/analysis/supramolecular/LNCI16/analyse_LNCI16.py

+from mlip_testing.calcs.models.models import MODELS
+
+CALC_PATH = (
+    Path(__file__).parent.parent.parent.parent


With the updated add-example branch, this can be simplified

ElliottKasoar · 2025-09-18T12:34:16Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+OUT_PATH = Path(__file__).parent / "outputs"
+
+# Constants
+KCAL_TO_EV = 0.04336414


That doesn't look right?

In general, we probably should use https://gitlab.com/ase/ase/blob/master/ase/units.py where possible, but eV is massively different to kcal.
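For context, 0.04336414 agrees to six significant figures with the conversion from kcal/mol (energy per mole of complexes) to eV (energy per complex), not from kcal to eV, so the constant's name may be what is misleading. A standard-library sketch deriving the factor from exact SI constants is below; with ASE, `ase.units.kcal / ase.units.mol` should give the same factor. The constant and function names are hypothetical.

```python
# Exact SI values (2019 SI redefinition / CODATA 2018)
KCAL_IN_J = 4184.0           # thermochemical kilocalorie, in J
AVOGADRO = 6.02214076e23     # particles per mol
EV_IN_J = 1.602176634e-19    # J per eV

# 1 kcal/mol expressed as energy per particle, in eV (about 0.0433641)
KCAL_PER_MOL_TO_EV = KCAL_IN_J / (AVOGADRO * EV_IN_J)


def kcal_per_mol_to_ev(energy: float) -> float:
    """Convert an energy in kcal/mol to eV per particle."""
    return energy * KCAL_PER_MOL_TO_EV


def ev_to_kcal_per_mol(energy: float) -> float:
    """Convert an energy in eV per particle to kcal/mol."""
    return energy / KCAL_PER_MOL_TO_EV
```

Naming the constant after both units ("per mol") makes it harder to misread as a plain kcal to eV factor.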

ElliottKasoar · 2025-09-18T12:35:25Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+        if not filepath.exists():
+            return 0.0


I assume if the structure is uncharged we don't have this, so this is a safe thing to do?

ElliottKasoar · 2025-09-18T12:42:19Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+            mae = results_df["error_kcal"].abs().mean()
+            mae_data = {"MAE_kcal": float(mae)}


I assume these parts aren't very expensive, but do we actually use this anywhere (or, more generally, the calculated errors and MAE), given that we recalculate it later alongside the scatter plot generation etc. for the analysis?

ElliottKasoar · 2025-09-18T12:42:43Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+            with open(write_dir / "mae_results.json", "w") as f:
+                json.dump(mae_data, f, indent=2)
+
+            print(f"MAE for {self.model_name} on LNCI16: {mae:.2f} kcal/mol")


Potentially something to be logged rather than printed?
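A minimal sketch of replacing the `print` with a module-level logger, assuming the standard-library `logging` module is what the project settles on (the logging setup is still an open question above). `report_mae` and its signature are hypothetical; the JSON file name and key mirror the diff.

```python
import json
import logging
from collections.abc import Iterable
from pathlib import Path

logger = logging.getLogger(__name__)


def report_mae(model_name: str, errors_kcal: Iterable[float], write_dir: Path) -> float:
    """Compute, save and log the MAE of per-system errors (hypothetical helper)."""
    abs_errors = [abs(e) for e in errors_kcal]
    if not abs_errors:
        raise ValueError("No errors to average")
    mae = sum(abs_errors) / len(abs_errors)
    with open(write_dir / "mae_results.json", "w", encoding="utf8") as f:
        json.dump({"MAE_kcal": mae}, f, indent=2)
    # %-style arguments: the message is only formatted if the record is emitted
    logger.info("MAE for %s on LNCI16: %.2f kcal/mol", model_name, mae)
    return mae
```

Whether the message appears then depends on the logging configuration chosen by the caller, rather than always going to stdout.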

ElliottKasoar · 2025-09-18T12:46:40Z

mlip_testing/calcs/supramolecular/LNCI16/calc_LNCI16.py

+        write_dir.mkdir(parents=True, exist_ok=True)
+
+        # Save individual complex atoms files for each system
+        if complex_atoms:


If there are no complex_atoms, it wouldn't iterate over anything, would it?
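Looping over an empty sequence is a no-op in Python, so an `if complex_atoms:` guard around a plain `for` loop should be redundant unless something else depends on it. A small sketch, with plain XYZ strings standing in for ASE `Atoms` objects and a hypothetical `save_complexes` helper:

```python
from collections.abc import Sequence
from pathlib import Path


def save_complexes(complex_xyz: Sequence[str], write_dir: Path) -> list[Path]:
    """Write one file per complex (hypothetical helper; strings stand in for Atoms)."""
    write_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # No `if complex_xyz:` guard needed: iterating over an empty sequence does nothing.
    for i, xyz in enumerate(complex_xyz):
        path = write_dir / f"complex_{i}.xyz"
        path.write_text(xyz)
        written.append(path)
    return written
```

If an empty list would indicate a bug upstream, raising an error explicitly may be clearer than either the guard or the silent no-op.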

ElliottKasoar · 2025-09-19T15:27:01Z

Superseded by #47

ElliottKasoar added 30 commits September 16, 2025 16:53

Add OC157 test

cf20edd

Fix docstrings

d41ee61

Fix scatter for no hoverdata

efbc214

Add app utils

48a9498

Save app components in app dir

5d80af1

Rename file

b43f1e2

Remove deprecated package

b9a53f9

Write structures for app in order

6905ef8

Add ids for loaded components

560daf9

Add app for OC157 benchmark

ca3128a

Remove mlipx-required files from gitignore

d7256e4

Add run file for OC157 calculation

561df72

Refactor OC157 analysis

0fad1fb

Add docstrings for OC157 calc helper functions

fe6a624

Ignore numpydoc issues for models

ca26c4d

Correct units for OC157 test

bb7c4a9

Save tooltips for table

7f2cbaa

Calculate score and rank for all tables

9b5ed8c

Update OC157 app and generalise assets

a702628

Add app utility functions

b62cba4

Add function to run all tabs

d3ac374

Update .gitignore

8d7de36

Refactor app definition

5963654

Simplify setting tab callback

d7df5b0

Simplify callbacks and add tab description

1d2e58e

Build tabs and summary table

dab157e

Build table and ID from benchmark name

42b315f

Fix initial weights

f881698

Generalise weight builder

c22cb50

Allow weights to be specified when calculating score

d8658f8

ElliottKasoar and others added 13 commits September 16, 2025 17:49

Remove unused dvc files

235c658

Re-add dvc.yaml to gitignore

d1545e7

Document tests (#36)

e49b08c

Update dependencies for Python 3.13

e6b8692

Read data files in app on function call

374cfc1

Catch missing missing files when building app

5f54a6c

Raise error if no tests found

0434750

rm supramolecular files to start fresh

657f975

LNCI16 benchmark

dc47504

Merge updated docs into add-supra branch

607e2e6

Merge remote-tracking branch 'origin/add-example' into add-supra

6650607

merge to get latest docs

add docs for LNCI16

340014b

remove charged and neural MAE to make it a simpler example for now

a482e7f

joehart2001 added this to the v0.0.1 milestone Sep 17, 2025

joehart2001 requested a review from ElliottKasoar September 17, 2025 20:48

joehart2001 assigned ElliottKasoar Sep 17, 2025

joehart2001 added the new benchmark Proposals and suggestions for new benchmarks label Sep 17, 2025

joehart2001 changed the base branch from main to add-example September 17, 2025 20:51

joehart2001 added 2 commits September 17, 2025 21:57

update docs

9045ef9

add models back

f93c5a4

ElliottKasoar reviewed Sep 18, 2025

alinelena added the example benchmark addition label Sep 18, 2025

ElliottKasoar force-pushed the add-example branch 2 times, most recently from e396c41 to c808eef September 18, 2025 16:48

Base automatically changed from add-example to main September 19, 2025 10:34

ElliottKasoar mentioned this pull request Sep 19, 2025

New supramolecular benchmark: LNCI16 calc, analysis, app and docs #47

Open

5 tasks

ElliottKasoar closed this Sep 19, 2025

ElliottKasoar deleted the add-supra branch September 19, 2025 15:28

ElliottKasoar removed the example benchmark addition label Sep 30, 2025
