-
Notifications
You must be signed in to change notification settings - Fork 2
Add supramolecular test 1: LNCI16 #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
merge to get latest docs
except Exception as e: | ||
logging.warning(f"Failed to read charge from {filepath}: {e}") | ||
return 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't want to catch all Exceptions in a blanket way, but also, do we expect this to fail sometimes, and is that actually ok?
If some of the input files don't have the info we need, we should handle that separately in choosing our dataset, or as a specify processing step, I think.
More generally, we need to think about logging. For warnings, I've used a proper warning in the add-example branch, but we definitely do want to set up logging more seriously, and think about what/how we log.
print(f"MAE for {self.model_name} on LNCI16: {mae:.2f} kcal/mol") | ||
|
||
|
||
def build_project(repro: bool = False) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not something for this PR, but I wonder if we should make this a utility function for building mlimx nodes, and just pass the benchmark as an extra parameter? Everything else is so general.
# Create fresh atoms object to avoid array contamination issues | ||
fresh_atoms = Atoms( | ||
symbols=atoms.get_chemical_symbols(), | ||
positions=atoms.positions.copy(), | ||
cell=atoms.cell.copy() if atoms.cell is not None else None, | ||
pbc=atoms.pbc.copy() if atoms.pbc is not None else False, | ||
) | ||
fresh_atoms.info.update(atoms.info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do this? If we do need a copy, can we not do Atoms.copy()
?
|
||
Input structures: | ||
|
||
* J. Gorges, B. Bädorf, A. Hansen, and S. Grimme, ‘LNCI16 - Efficient Computation of the Interaction Energies of Very Large Non-covalently Bound Complexes’, Synlett, vol. 34, no. 10, pp. 1135–1146, Jun. 2023, doi: 10.1055/s-0042-1753141. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a link to the actual data that can be downloaded as well?
from mlip_testing.calcs.models.models import MODELS | ||
|
||
CALC_PATH = ( | ||
Path(__file__).parent.parent.parent.parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the updated add-example branch, this can be simplified
OUT_PATH = Path(__file__).parent / "outputs" | ||
|
||
# Constants | ||
KCAL_TO_EV = 0.04336414 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't look right?
In general, we probably should use https://gitlab.com/ase/ase/blob/master/ase/units.py where possible, but eV is massively different to kcal.
if not filepath.exists(): | ||
return 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume if the structure is uncharged we don't have this, so this is a safe thing to do?
mae = results_df["error_kcal"].abs().mean() | ||
mae_data = {"MAE_kcal": float(mae)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these parts aren't very expensive, but do we not actually use this anywhere (and generally calculating the error and/or MAE), since we recalculate it later alongside the scatter plot generation etc. for analysis?
with open(write_dir / "mae_results.json", "w") as f: | ||
json.dump(mae_data, f, indent=2) | ||
|
||
print(f"MAE for {self.model_name} on LNCI16: {mae:.2f} kcal/mol") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially something to be logged rather than printed?
write_dir.mkdir(parents=True, exist_ok=True) | ||
|
||
# Save individual complex atoms files for each system | ||
if complex_atoms: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no complex_atoms
, it wouldn't integrate over anything, would it?
e396c41
to
c808eef
Compare
Superseded by #47 |
Pre-review checklist for PR author
PR author must check the checkboxes below when creating the PR.
Summary
Evaluates models on the LNCI16 dataset, which contains interaction energies for 16 large non-covalent complexes, including proteins, DNA, and supramolecular assemblies.
(Dataset reference: https://www.thieme-connect.com/products/ejournals/abstract/10.1055/s-0042-1753141)
Linked issue
Resolves #21
Progress
Testing
MACE-MP-0b3New decorators/callbacks
No