Skip to content

Conversation

joehart2001
Copy link
Collaborator

@joehart2001 joehart2001 commented Sep 17, 2025

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

Summary

Evaluates models on the LNCI16 dataset, which contains interaction energies for 16 large non-covalent complexes, including proteins, DNA, and supramolecular assemblies.

(Dataset reference: https://www.thieme-connect.com/products/ejournals/abstract/10.1055/s-0042-1753141)

Linked issue


Resolves #21

Progress

  • Calculations
  • Analysis
  • Application
  • Documentation

Testing

MACE-MP-0b3

New decorators/callbacks

No

@joehart2001 joehart2001 added this to the v0.0.1 milestone Sep 17, 2025
@joehart2001 joehart2001 added the new benchmark Proposals and suggestions for new benchmarks label Sep 17, 2025
@joehart2001 joehart2001 changed the base branch from main to add-example September 17, 2025 20:51
Comment on lines +123 to +125
except Exception as e:
logging.warning(f"Failed to read charge from {filepath}: {e}")
return 0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't want to catch all Exceptions in a blanket way, but also, do we expect this to fail sometimes, and is that actually ok?

If some of the input files don't have the info we need, we should handle that separately in choosing our dataset, or as a specify processing step, I think.

More generally, we need to think about logging. For warnings, I've used a proper warning in the add-example branch, but we definitely do want to set up logging more seriously, and think about what/how we log.

print(f"MAE for {self.model_name} on LNCI16: {mae:.2f} kcal/mol")


def build_project(repro: bool = False) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not something for this PR, but I wonder if we should make this a utility function for building mlimx nodes, and just pass the benchmark as an extra parameter? Everything else is so general.

Comment on lines +324 to +331
# Create fresh atoms object to avoid array contamination issues
fresh_atoms = Atoms(
symbols=atoms.get_chemical_symbols(),
positions=atoms.positions.copy(),
cell=atoms.cell.copy() if atoms.cell is not None else None,
pbc=atoms.pbc.copy() if atoms.pbc is not None else False,
)
fresh_atoms.info.update(atoms.info)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do this? If we do need a copy, can we not do Atoms.copy()?


Input structures:

* J. Gorges, B. Bädorf, A. Hansen, and S. Grimme, ‘LNCI16 - Efficient Computation of the Interaction Energies of Very Large Non-covalently Bound Complexes’, Synlett, vol. 34, no. 10, pp. 1135–1146, Jun. 2023, doi: 10.1055/s-0042-1753141.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a link to the actual data that can be downloaded as well?

from mlip_testing.calcs.models.models import MODELS

CALC_PATH = (
Path(__file__).parent.parent.parent.parent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the updated add-example branch, this can be simplified

OUT_PATH = Path(__file__).parent / "outputs"

# Constants
KCAL_TO_EV = 0.04336414
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't look right?

In general, we probably should use https://gitlab.com/ase/ase/blob/master/ase/units.py where possible, but eV is massively different to kcal.

Comment on lines +114 to +115
if not filepath.exists():
return 0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume if the structure is uncharged we don't have this, so this is a safe thing to do?

Comment on lines +342 to +343
mae = results_df["error_kcal"].abs().mean()
mae_data = {"MAE_kcal": float(mae)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these parts aren't very expensive, but do we not actually use this anywhere (and generally calculating the error and/or MAE), since we recalculate it later alongside the scatter plot generation etc. for analysis?

with open(write_dir / "mae_results.json", "w") as f:
json.dump(mae_data, f, indent=2)

print(f"MAE for {self.model_name} on LNCI16: {mae:.2f} kcal/mol")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially something to be logged rather than printed?

write_dir.mkdir(parents=True, exist_ok=True)

# Save individual complex atoms files for each system
if complex_atoms:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no complex_atoms, it wouldn't integrate over anything, would it?

@ElliottKasoar ElliottKasoar force-pushed the add-example branch 2 times, most recently from e396c41 to c808eef Compare September 18, 2025 16:48
Base automatically changed from add-example to main September 19, 2025 10:34
@ElliottKasoar
Copy link
Collaborator

Superseded by #47

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new benchmark Proposals and suggestions for new benchmarks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants