feat(c++,pt-expt): add .pt2 (AOTInductor) C/C++ inference with DPA1/DPA2/DPA3 support #5298
wanghan-iapcm wants to merge 21 commits into deepmodeling:master
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d3fd09c68
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository settings.
📝 Walkthrough

This PR adds .pt2 (AOTInductor) support to the PyTorch exportable backend, migrates several descriptor and utility indexing paths to array-API helpers (xp_take_first_n / xp_take_along_axis), implements a C++ DeepPotPTExpt backend and wiring, extends serialization for .pt2 packaging, and adds comprehensive C++ and Python test/model-generation artifacts.

Changes
Sequence Diagram(s)

sequenceDiagram
participant User
participant DeepEval
participant Serializer
participant AOTInductor
participant CompiledModel
User->>DeepEval: load("model.pt2")
DeepEval->>Serializer: identify extension(".pt2")
Serializer-->>DeepEval: choose PT2 loader
DeepEval->>AOTInductor: load_runner_from_zip(model.pt2)
AOTInductor-->>DeepEval: runner_instance + metadata/output_keys
User->>DeepEval: eval(coords, types, box)
DeepEval->>DeepEval: prepare tensors (coords, atype, nlist, mapping, fparam, aparam)
DeepEval->>AOTInductor: runner.__call__(inputs)
AOTInductor->>CompiledModel: execute_compiled_forward
CompiledModel-->>AOTInductor: output_tensors
AOTInductor-->>DeepEval: output_tensors
DeepEval->>DeepEval: extract_outputs(output_keys)
DeepEval-->>User: energies, forces, virial
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 12
🧹 Nitpick comments (6)
source/tests/pt_expt/infer/test_deep_eval.py (1)
490-791: Keep the .pt2 suite aligned with the .pte coverage.

This copy has already drifted from TestDeepEvalEner: test_dynamic_shapes() and the ASE neighbor-list cases exist only in the .pte class. Those are exactly the paths most likely to regress for the exported backend, so please either factor the shared checks into a base/mixin or add .pt2 equivalents here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/pt_expt/infer/test_deep_eval.py` around lines 490 - 791, The .pt2 test class TestDeepEvalEnerPt2 is missing parity with the .pte suite: add equivalents of test_dynamic_shapes and the ASE neighbor-list tests (those that exercise variable frame/atom counts and ASE neighbor-list paths) or factor shared checks into a base TestDeepEvalEnerMixin used by both TestDeepEvalEnerPt2 and TestDeepEvalEner so both run the same coverage; locate the suite in TestDeepEvalEnerPt2 and either (a) implement two new tests named test_dynamic_shapes and the ASE neighbor-list tests that call DeepPot.eval (and the model forward reference) with dynamic-shaped inputs and ASE neighbor-list-configured inputs, or (b) extract the common checks into a mixin class (e.g., DeepEvalSharedTests) and have both TestDeepEvalEnerPt2 and TestDeepEvalEner inherit it so DeepPot.eval, model.forward, and ASE neighbor-list code paths are exercised for .pt2 as well.

source/tests/infer/gen_fparam_aparam.py (1)
127-139: Remove unused nloc variable and prefix unused unpacked variables.

Static analysis flags these issues:
- Line 127: v is unpacked but never used
- Line 136: nloc = 6 is assigned but never used (F841 error - will fail ruff check)
- Line 167: ae_pth, av_pth are unused

♻️ Proposed fix

- e, f, v, ae, av = dp.eval(
+ e, f, _v, ae, av = dp.eval(
      coord,
      box,
      atype,
      fparam=fparam_val,
      aparam=aparam_val,
      atomic=True,
  )
- nloc = 6
  atom_energy = ae[0, :, 0]
  force = f[0]
  atom_virial = av[0]

And for line 167:

- e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
+ e_pth, f_pth, _, _ae_pth, _av_pth = dp_pth.eval(

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_fparam_aparam.py` around lines 127 - 139, Unpack unused value from dp.eval by prefixing it with an underscore (e.g., change "e, f, v, ae, av = dp.eval(...)" to "e, f, _v, ae, av = dp.eval(...)"), remove the unused assignment "nloc = 6", and either remove or prefix unused path variables "ae_pth" and "av_pth" with an underscore (e.g., "_ae_pth", "_av_pth") so static analysis (ruff) no longer reports F841; then run the test/lint suite to confirm no unused-variable errors remain.

source/tests/infer/gen_dpa2.py (1)
171-196: Prefix unused unpacked variables with underscore to pass ruff check.

Per coding guidelines, ruff check . must pass before CI. These unpacked variables are flagged as unused:
- Line 171: v1
- Line 176: v_np
- Line 183: v_pth, ae_pth, av_pth
- Line 190: f_pth_np, ae_pth_np, av_pth_np

♻️ Proposed fix

- e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+ e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
  print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
  print_cpp_values("PBC reference values", ae1, f1, av1)

  # ---- 5. Run inference for NoPbc test ----
- e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+ e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
  print(f"\n// NoPbc total energy: {e_np[0, 0]:.18e}")  # noqa: T201
  print_cpp_values("NoPbc reference values", ae_np, f_np, av_np)

  # ---- 6. Verify .pth gives same results ----
  if os.path.exists(pth_path):
      dp_pth = DeepPot(pth_path)
-     e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(
+     e_pth, f_pth, _v_pth, _ae_pth, _av_pth = dp_pth.eval(
          coord, box, atype, atomic=True
      )
      print(f"\n// .pth PBC total energy: {e_pth[0, 0]:.18e}")  # noqa: T201
      print(f"// .pth vs .pt2 energy diff: {abs(e1[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
      print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f1 - f_pth)):.2e}")  # noqa: T201
-     e_pth_np, f_pth_np, _, ae_pth_np, av_pth_np = dp_pth.eval(
+     e_pth_np, _f_pth_np, _, _ae_pth_np, _av_pth_np = dp_pth.eval(
          coord, None, atype, atomic=True
      )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_dpa2.py` around lines 171 - 196, The review flags unused unpacked variables from calls to DeepPot.eval (e.g., dp.eval and dp_pth.eval) that break ruff; fix by prefixing those unused names with an underscore in the unpacking (e.g., change v1 to _v1, v_np to _v_np, v_pth/ae_pth/av_pth to _v_pth/_ae_pth/_av_pth, and f_pth_np/ae_pth_np/av_pth_np to _f_pth_np/_ae_pth_np/_av_pth_np) so the dp.eval and dp_pth.eval calls keep the same outputs but satisfy the linter.

source/tests/infer/gen_dpa1.py (1)
143-163: Prefix unused unpacked variables with underscore to pass ruff check.

Same issue as gen_dpa2.py - these unpacked variables are flagged as unused:
- Line 143: v1
- Line 148: v_np
- Line 154: v_pth, ae_pth, av_pth
- Line 159: f_pth_np, ae_pth_np, av_pth_np

♻️ Proposed fix

- e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+ e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
  print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
  print_cpp_values("PBC reference values", ae1, f1, av1)

  # ---- 5. Run inference for NoPbc test ----
- e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+ e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
  print(f"\n// NoPbc total energy: {e_np[0, 0]:.18e}")  # noqa: T201
  print_cpp_values("NoPbc reference values", ae_np, f_np, av_np)

  # ---- 6. Verify .pth gives same results ----
  dp_pth = DeepPot(pth_path)
- e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(coord, box, atype, atomic=True)
+ e_pth, f_pth, _v_pth, _ae_pth, _av_pth = dp_pth.eval(coord, box, atype, atomic=True)
  print(f"\n// .pth PBC total energy: {e_pth[0, 0]:.18e}")  # noqa: T201
  print(f"// .pth vs .pt2 energy diff: {abs(e1[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
  print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f1 - f_pth)):.2e}")  # noqa: T201
- e_pth_np, f_pth_np, _, ae_pth_np, av_pth_np = dp_pth.eval(
+ e_pth_np, _f_pth_np, _, _ae_pth_np, _av_pth_np = dp_pth.eval(
      coord, None, atype, atomic=True
  )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_dpa1.py` around lines 143 - 163, The unpacked return values from DeepPot.eval contain unused variables that should be prefixed with an underscore to satisfy ruff; update the unpackings in gen_dpa1.py where DeepPot.eval is called: change v1 -> _v1 for the first call (e1, f1, _v1, ae1, av1), v_np -> _v_np for the NoPbc call (e_np, f_np, _v_np, ae_np, av_np), v_pth -> _v_pth and also prefix ae_pth, av_pth with underscores if unused (e_pth, f_pth, _v_pth, _ae_pth, _av_pth) for the .pth PBC call, and similarly for the .pth NoPbc call change f_pth_np -> _f_pth_np, ae_pth_np -> _ae_pth_np, av_pth_np -> _av_pth_np when they are not used; keep all other names unchanged and run ruff to verify.

deepmd/pt_expt/infer/deep_eval.py (1)
702-710: Consider handling None exported_module for .pt2 models.

The get_model() method returns self.exported_module, which is None for .pt2 models. This could be confusing for callers. Consider adding documentation or a check:

  def get_model(self) -> torch.nn.Module:
      """Get the exported model module.

      Returns
      -------
      torch.nn.Module
-         The exported model module.
+         The exported model module. Returns None for .pt2 (AOTInductor) models
+         as they use a different runner interface.
      """
      return self.exported_module

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt_expt/infer/deep_eval.py` around lines 702 - 710, get_model currently returns self.exported_module which is None for .pt2 models; update get_model to explicitly handle that case by checking if self.exported_module is None and either raise a clear exception (e.g., RuntimeError with message mentioning .pt2 models and how to obtain a usable model) or return a documented alternative, and update the get_model docstring to state the behavior for .pt2 models; reference the get_model method and the exported_module attribute when making the change.

deepmd/pt_expt/utils/serialization.py (1)
355-362: Note that torch._inductor.aoti_compile_and_package is an internal PyTorch API.

While this function is available in PyTorch 2.6+ (and the project requires 2.10.0), internal APIs prefixed with _ may change between versions without notice. Consider adding a version comment documenting this dependency or monitor PyTorch release notes for API changes in future updates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/pt_expt/utils/serialization.py` around lines 355 - 362, The code calls the internal API torch._inductor.aoti_compile_and_package (used on the local variables exported and model_file after _trace_and_export) which may break across PyTorch versions; add a clear inline comment next to the aoti_compile_and_package call documenting the required PyTorch minimum (e.g., 2.6+) and that this is an internal API subject to change, and add a simple runtime guard that checks torch.__version__ and emits a warning or raises a descriptive error if the installed torch is outside the supported range so maintainers are alerted when upgrading PyTorch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/dpmodel/descriptor/dpa1.py`:
- Line 1060: Remove the now-unused assignment to nall (the line "nall =
atype_ext.shape[1]") in dpa1.py since the gather refactor made nall unused;
locate the assignment near the atype_ext usage in the descriptor code (search
for "atype_ext" or "nall" in dpa1.py) and delete that line, then run ruff check
. and ruff format . to ensure no F841/formatting errors remain.
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 151-169: The code uses the Beta API
torch._inductor.aoti_load_package inside _load_pt2 to load .pt2 AOTInductor
packages; update _load_pt2 to (1) guard the import/use of aoti_load_package with
a clear fallback and informative error (catch
ImportError/AttributeError/RuntimeError when importing or calling
aoti_load_package and raise/log a message that the API is beta and may change),
(2) perform a runtime compatibility check (e.g., verify torch._inductor exists
and torch.__version__ is within a supported range) before calling
aoti_load_package, and (3) preserve and expose metadata-based constraints
(self.metadata, self._output_keys, self.rcut, self.type_map) and add an explicit
runtime validation step before inference that checks input tensors match
export-time size, dtype, and stride so callers get an actionable error if they
mismatch rather than a mysterious break inside _pt2_runner or exported_module.
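The explicit runtime validation in (3) can be sketched as a small helper. The metadata layout here (input name mapped to an expected shape/dtype pair) and the function name validate_inputs are assumptions for illustration, not the actual deepmd interface:

```python
import numpy as np


def validate_inputs(export_metadata, **arrays):
    """Compare runtime arrays against export-time shapes and dtypes.

    `export_metadata` maps input names to (shape, dtype) pairs captured
    when the .pt2 package was built (hypothetical layout).
    """
    for name, arr in arrays.items():
        if name not in export_metadata:
            raise ValueError(f"unexpected input '{name}'")
        exp_shape, exp_dtype = export_metadata[name]
        if tuple(arr.shape) != tuple(exp_shape):
            raise ValueError(
                f"input '{name}': runtime shape {tuple(arr.shape)} does not "
                f"match export-time shape {tuple(exp_shape)}"
            )
        if arr.dtype != np.dtype(exp_dtype):
            raise ValueError(
                f"input '{name}': runtime dtype {arr.dtype} does not match "
                f"export-time dtype {exp_dtype}"
            )
```

Calling this before the runner turns a mysterious AOTInductor failure into an actionable error message.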
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 641-647: Replace the assert in DeepPotPTExpt::extract_outputs with
a runtime check that throws a descriptive exception: verify flat_outputs.size()
== output_keys.size(), and if not, throw a std::runtime_error (or similar)
including both sizes and context (e.g., "DeepPotPTExpt::extract_outputs:
output_keys size X != flat_outputs size Y") before the loop that assigns
output_map[output_keys[i]] = flat_outputs[i]; this prevents out-of-bounds access
in release builds.
- Around line 235-399: read_zip_entry currently assumes specific member names
and uncompressed (stored) entries, ignores the file_content parameter, and uses
assert for output arity which is unsafe in release; update read_zip_entry to (1)
honor a provided file_content parameter by returning it when non-empty or using
it as the source instead of always reading zip_path, (2) validate and reject
compressed entries by reading the compression method from the central
directory/local header and throw deepmd::deepmd_exception if compression != 0
(or implement decompression), (3) avoid hardcoding names by making
caller-supplied expected entry names configurable/fall back to prefixes like
"extra/" and attempt alternate names, and (4) replace the
assert(flat_outputs.size() == output_keys.size()) with a runtime check that
throws deepmd::deepmd_exception containing both sizes and context (use symbols
flat_outputs and output_keys to locate the check). Ensure ZIP64 and bounds
checks are tightened to avoid reading out-of-range indexes when parsing headers.
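The real check lives in hand-rolled C++ ZIP parsing, but the required behavior — honor an ordered list of candidate entry names and reject compressed members — can be sketched with Python's stdlib zipfile (the function name and signature are illustrative):

```python
import io
import zipfile


def read_stored_entry(zip_bytes, names):
    """Return the first matching entry's bytes, rejecting compressed members.

    `names` is an ordered list of candidate entry names (exact names first,
    then "extra/"-prefixed fallbacks), mirroring the configurable lookup
    the review asks for.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        infos = {info.filename: info for info in zf.infolist()}
        for name in names:
            info = infos.get(name)
            if info is None:
                continue
            if info.compress_type != zipfile.ZIP_STORED:
                raise RuntimeError(
                    f"entry '{name}' is compressed (method {info.compress_type}); "
                    "only stored entries are supported"
                )
            return zf.read(name)
    raise KeyError(f"none of {names} found in archive")
```

The same two checks (compression method from the central directory, candidate-name fallback) are what the C++ read_zip_entry would need before trusting an offset.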
In `@source/api_cc/tests/test_deeppot_dpa_ptexpt.cc`:
- Around line 376-383: The loop that fills coord_vir only copies the first nvir
scalar values, leaving most of the 3*nvir coordinate components zero; update the
copy so you copy all 3 coordinates per virtual atom (e.g., for each ii set
coord_vir[ii*3 + 0..2] = coord[ii*3 + 0..2] or use std::copy_n(&coord[ii*3], 3,
&coord_vir[ii*3])) so coord_vir contains the full (x,y,z) triples before
inserting into coord; keep atype_vir as is and then insert coord_vir at the
beginning as before.
In `@source/api_cc/tests/test_deeppot_dpa2_pt.cc`:
- Around line 276-296: The test TestInferDeepPotDpa2PtNoPbc currently uses a
hard-coded nlist_data with nghost == 0 so it never exercises InputNlist.mapping
or the ghost/virtual-atom remap; replace or add a test case that builds a
neighbor list with ghosts/mapped indices (e.g. call the existing _build_nlist()
helper or construct an InputNlist with non-zero nghost and a mapping array)
instead of the local-only nlist_data/convert_nlist path, and then call
dp.compute(ener, force, virial, coord, atype, box, 0, inlist, 0) to verify the
mapped-ghost behavior is used.
- Around line 15-17: The EPSILON macro is currently too loose for float tests
(1e-1) and can hide large regressions; update the test to tighten tolerances by
either (a) reducing the float branch of EPSILON to a stricter value (e.g. 1e-4
or similar) or (b) replace the single EPSILON with per-quantity tolerances (e.g.
ENERGY_EPSILON, FORCE_EPSILON, VIRIAL_EPSILON) and use those where comparisons
occur; ensure these new macros/variables are chosen based on VALUETYPE (float vs
double) so comparisons in the test code that reference EPSILON are replaced with
the appropriate more-specific tolerance.
In `@source/api_cc/tests/test_deeppot_dpa3_ptexpt.cc`:
- Around line 387-391: The loop that fills coord_vir copies only nvir scalars
instead of nvir*3, leaving most coordinates zero; update the copy to initialize
all 3 components per virtual atom (use coord_vir and coord) by copying nvir*3
elements (or iterate per atom and copy 3 components each) so the injected
virtual-atom geometry matches the intended coordinates; reference the variables
coord_vir, coord and nvir when making this change.
In `@source/api_cc/tests/test_deeppot_ptexpt.cc`:
- Around line 358-365: The virtual-atom coordinate block only copies one scalar
per atom, leaving other components zero; change the population of coord_vir so
you copy all 3 coordinates per virtual atom (use coord_vir[3*ii + k] =
coord[3*ii + k] for k = 0..2) before inserting into coord, so coord_vir, coord,
nvir and atype_vir are consistent and the full 3D positions are preserved.
In `@source/tests/infer/gen_dpa3.py`:
- Around line 175-200: The current block only prints differences between the
.pth and .pt2 runs; replace/add prints with real parity assertions: after
creating dp_pth and calling dp_pth.eval, assert that e_pth ≈ e1, f_pth ≈ f1,
v_pth ≈ v1, ae_pth ≈ ae1 and av_pth ≈ av1 (and similarly assert e_pth_np ≈ e_np,
f_pth_np ≈ f_np, ae_pth_np ≈ ae_np, av_pth_np ≈ av_np) using a numeric tolerance
(e.g. numpy.testing.assert_allclose or assert np.allclose(..., atol=...,
rtol=...)); reference the DeepPot class and the dp_pth.eval calls and the tensor
names (e1, f1, v1, ae1, av1, e_pth, f_pth, v_pth, ae_pth, av_pth, e_np, f_np,
e_pth_np, f_pth_np, ae_pth_np, av_pth_np) so CI fails if exports diverge and to
eliminate the unused-binding warnings.
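A minimal sketch of the parity assertions the prompt asks for, assuming numpy is available; the helper name and tolerances are illustrative and should be tuned per quantity and precision:

```python
import numpy as np


def assert_parity(reference, candidate, atol=1e-10, rtol=1e-8):
    """Fail loudly when two export paths diverge.

    `reference` and `candidate` are dicts of arrays keyed by quantity name
    (e.g. "e", "f", "v", "ae", "av").
    """
    for key, ref in reference.items():
        np.testing.assert_allclose(
            candidate[key],
            ref,
            atol=atol,
            rtol=rtol,
            err_msg=f"'{key}' differs between .pth and .pt2 outputs",
        )
```

A call like assert_parity({"e": e1, "f": f1, "v": v1}, {"e": e_pth, "f": f_pth, "v": v_pth}) would then make CI fail on divergence instead of printing diffs that nobody reads.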
- Around line 22-43: The try/except in the torch custom-op loader is swallowing
all errors; change it to only catch expected exceptions (e.g., ImportError,
AttributeError, OSError) around the import and load logic, and when the fallback
glob search for libdeepmd_op_pt.so fails or load_library raises an error, emit a
warning (use warnings.warn or processLogger.warning) that includes the exception
details and context (referencing torch.ops.deepmd and the border_op check and
the torch.ops.load_library call), rather than silently passing; ensure the
warning is produced when the fallback lookup fails so real export/load issues
surface.
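The narrowed exception handling could look like the sketch below. To keep it torch-free, `load_library` is injected as a parameter standing in for torch.ops.load_library; the glob pattern mirrors the fallback search for libdeepmd_op_pt.so:

```python
import glob
import warnings


def load_custom_ops(load_library, search_glob="**/libdeepmd_op_pt.so"):
    """Load the custom-op library, surfacing fallback failures instead of hiding them."""
    candidates = sorted(glob.glob(search_glob, recursive=True))
    if not candidates:
        warnings.warn(
            f"libdeepmd_op_pt.so not found via {search_glob!r}; "
            "torch.ops.deepmd.border_op will be unavailable"
        )
        return False
    try:
        load_library(candidates[0])
    except (ImportError, AttributeError, OSError) as exc:
        # Only the expected failure modes are caught; anything else propagates.
        warnings.warn(f"failed to load {candidates[0]}: {exc}")
        return False
    return True
```

The key differences from a bare try/except-pass: only expected exception types are caught, and both failure paths emit a warning with context.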
---
Nitpick comments:
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 702-710: get_model currently returns self.exported_module which is
None for .pt2 models; update get_model to explicitly handle that case by
checking if self.exported_module is None and either raise a clear exception
(e.g., RuntimeError with message mentioning .pt2 models and how to obtain a
usable model) or return a documented alternative, and update the get_model
docstring to state the behavior for .pt2 models; reference the get_model method
and the exported_module attribute when making the change.
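The explicit-raise variant can be sketched with a minimal stand-in class; only the exported_module attribute is borrowed from the review, the rest is illustrative:

```python
class DeepEvalSketch:
    """Minimal stand-in illustrating a guarded get_model()."""

    def __init__(self, exported_module=None):
        self.exported_module = exported_module

    def get_model(self):
        """Return the exported module.

        Raises RuntimeError for .pt2 (AOTInductor) models, which carry no
        exported torch.nn.Module and are driven through the runner instead.
        """
        if self.exported_module is None:
            raise RuntimeError(
                "get_model() is unavailable for .pt2 (AOTInductor) models; "
                "use the AOTInductor runner interface instead"
            )
        return self.exported_module
```

Raising is preferable to silently returning None because the caller gets the explanation at the call site rather than a downstream AttributeError.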
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 355-362: The code calls the internal API
torch._inductor.aoti_compile_and_package (used on the local variables exported
and model_file after _trace_and_export) which may break across PyTorch versions;
add a clear inline comment next to the aoti_compile_and_package call documenting
the required PyTorch minimum (e.g., 2.6+) and that this is an internal API
subject to change, and add a simple runtime guard that checks torch.__version__
and emits a warning or raises a descriptive error if the installed torch is
outside the supported range so maintainers are alerted when upgrading PyTorch.
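The runtime guard can be as simple as comparing version tuples. The helper below is a torch-free sketch — the 2.6 floor comes from the review above, and the function name is made up; in the real code the argument would be torch.__version__:

```python
import warnings

# First release shipping aoti_compile_and_package, per the review (internal API).
MIN_TORCH = (2, 6)


def torch_version_ok(version_string):
    """Return True when `version_string` meets MIN_TORCH; warn otherwise."""
    # Strip local/build suffixes: "2.10.0+cu121" -> "2.10.0".
    release = version_string.split("+")[0]
    parts = []
    for piece in release.split(".")[:2]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    if tuple(parts) < MIN_TORCH:
        warnings.warn(
            f"torch {version_string} predates the internal "
            "torch._inductor.aoti_compile_and_package API (needs >= 2.6)"
        )
        return False
    return True
```

Tuple comparison avoids the classic string-comparison bug where "2.10" sorts before "2.6".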
In `@source/tests/infer/gen_dpa1.py`:
- Around line 143-163: The unpacked return values from DeepPot.eval contain
unused variables that should be prefixed with an underscore to satisfy ruff;
update the unpackings in gen_dpa1.py where DeepPot.eval is called: change v1 ->
_v1 for the first call (e1, f1, _v1, ae1, av1), v_np -> _v_np for the NoPbc call
(e_np, f_np, _v_np, ae_np, av_np), v_pth -> _v_pth and also prefix ae_pth,
av_pth with underscores if unused (e_pth, f_pth, _v_pth, _ae_pth, _av_pth) for
the .pth PBC call, and similarly for the .pth NoPbc call change f_pth_np ->
_f_pth_np, ae_pth_np -> _ae_pth_np, av_pth_np -> _av_pth_np when they are not
used; keep all other names unchanged and run ruff to verify.
In `@source/tests/infer/gen_dpa2.py`:
- Around line 171-196: The review flags unused unpacked variables from calls to
DeepPot.eval (e.g., dp.eval and dp_pth.eval) that break ruff; fix by prefixing
those unused names with an underscore in the unpacking (e.g., change v1 to _v1,
v_np to _v_np, v_pth/ae_pth/av_pth to _v_pth/_ae_pth/_av_pth, and
f_pth_np/ae_pth_np/av_pth_np to _f_pth_np/_ae_pth_np/_av_pth_np) so the dp.eval
and dp_pth.eval calls keep the same outputs but satisfy the linter.
In `@source/tests/infer/gen_fparam_aparam.py`:
- Around line 127-139: Unpack unused value from dp.eval by prefixing it with an
underscore (e.g., change "e, f, v, ae, av = dp.eval(...)" to "e, f, _v, ae, av =
dp.eval(...)" ), remove the unused assignment "nloc = 6", and either remove or
prefix unused path variables "ae_pth" and "av_pth" with an underscore (e.g.,
"_ae_pth", "_av_pth") so static analysis (ruff) no longer reports F841; then run
the test/lint suite to confirm no unused-variable errors remain.
In `@source/tests/pt_expt/infer/test_deep_eval.py`:
- Around line 490-791: The .pt2 test class TestDeepEvalEnerPt2 is missing parity
with the .pte suite: add equivalents of test_dynamic_shapes and the ASE
neighbor-list tests (those that exercise variable frame/atom counts and ASE
neighbor-list paths) or factor shared checks into a base TestDeepEvalEnerMixin
used by both TestDeepEvalEnerPt2 and TestDeepEvalEner so both run the same
coverage; locate the suite in TestDeepEvalEnerPt2 and either (a) implement two
new tests named test_dynamic_shapes and the ASE neighbor-list tests that call
DeepPot.eval (and the model forward reference) with dynamic-shaped inputs and
ASE neighbor-list-configured inputs, or (b) extract the common checks into a
mixin class (e.g., DeepEvalSharedTests) and have both TestDeepEvalEnerPt2 and
TestDeepEvalEner inherit it so DeepPot.eval, model.forward, and ASE
neighbor-list code paths are exercised for .pt2 as well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ec83f497-49cf-4865-907f-5e1894443f3d
📒 Files selected for processing (42)
deepmd/backend/pt_expt.py
deepmd/dpmodel/descriptor/dpa1.py
deepmd/dpmodel/descriptor/dpa2.py
deepmd/dpmodel/descriptor/dpa3.py
deepmd/dpmodel/descriptor/repflows.py
deepmd/dpmodel/descriptor/repformers.py
deepmd/dpmodel/model/make_model.py
deepmd/dpmodel/utils/exclude_mask.py
deepmd/dpmodel/utils/nlist.py
deepmd/pt_expt/infer/deep_eval.py
deepmd/pt_expt/utils/serialization.py
source/api_c/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
source/api_c/tests/test_deeppot_a_ptexpt.cc
source/api_cc/include/DeepPotPTExpt.h
source/api_cc/include/common.h
source/api_cc/src/DeepPot.cc
source/api_cc/src/DeepPotJAX.cc
source/api_cc/src/DeepPotPD.cc
source/api_cc/src/DeepPotPT.cc
source/api_cc/src/DeepPotPTExpt.cc
source/api_cc/src/DeepSpinPT.cc
source/api_cc/src/common.cc
source/api_cc/tests/test_deeppot_a_fparam_aparam_nframes_ptexpt.cc
source/api_cc/tests/test_deeppot_a_fparam_aparam_pt.cc
source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
source/api_cc/tests/test_deeppot_dpa1_pt.cc
source/api_cc/tests/test_deeppot_dpa2_pt.cc
source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc
source/api_cc/tests/test_deeppot_dpa3_pt.cc
source/api_cc/tests/test_deeppot_dpa3_ptexpt.cc
source/api_cc/tests/test_deeppot_dpa_ptexpt.cc
source/api_cc/tests/test_deeppot_ptexpt.cc
source/api_cc/tests/test_utils.h
source/tests/infer/convert-models.sh
source/tests/infer/deeppot_dpa.pth
source/tests/infer/deeppot_sea.pth
source/tests/infer/fparam_aparam.pth
source/tests/infer/gen_dpa1.py
source/tests/infer/gen_dpa2.py
source/tests/infer/gen_dpa3.py
source/tests/infer/gen_fparam_aparam.py
source/tests/pt_expt/infer/test_deep_eval.py
TYPED_TEST(TestInferDeepPotDpa2PtNoPbc, cpu_lmp_nlist) {
  using VALUETYPE = TypeParam;
  std::vector<VALUETYPE>& coord = this->coord;
  std::vector<int>& atype = this->atype;
  std::vector<VALUETYPE>& box = this->box;
  std::vector<VALUETYPE>& expected_f = this->expected_f;
  int& natoms = this->natoms;
  double& expected_tot_e = this->expected_tot_e;
  std::vector<VALUETYPE>& expected_tot_v = this->expected_tot_v;
  deepmd::DeepPot& dp = this->dp;
  double ener;
  std::vector<VALUETYPE> force, virial;

  std::vector<std::vector<int> > nlist_data = {
      {1, 2, 3, 4, 5}, {0, 2, 3, 4, 5}, {0, 1, 3, 4, 5},
      {0, 1, 2, 4, 5}, {0, 1, 2, 3, 5}, {0, 1, 2, 3, 4}};
  std::vector<int> ilist(natoms), numneigh(natoms);
  std::vector<int*> firstneigh(natoms);
  deepmd::InputNlist inlist(natoms, &ilist[0], &numneigh[0], &firstneigh[0]);
  convert_nlist(inlist, nlist_data);
  dp.compute(ener, force, virial, coord, atype, box, 0, inlist, 0);
This nlist case never exercises the mapping/ghost path.
The hard-coded neighbor list is local-only with nghost == 0, so it does not cover InputNlist.mapping or the ghost/virtual-atom remap this PR fixes in the C++ backends. Please add at least one DPA2 .pth nlist case built through _build_nlist() or another mapped-ghost configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/api_cc/tests/test_deeppot_dpa2_pt.cc` around lines 276 - 296, The test
TestInferDeepPotDpa2PtNoPbc currently uses a hard-coded nlist_data with nghost
== 0 so it never exercises InputNlist.mapping or the ghost/virtual-atom remap;
replace or add a test case that builds a neighbor list with ghosts/mapped
indices (e.g. call the existing _build_nlist() helper or construct an InputNlist
with non-zero nghost and a mapping array) instead of the local-only
nlist_data/convert_nlist path, and then call dp.compute(ener, force, virial,
coord, atype, box, 0, inlist, 0) to verify the mapped-ghost behavior is used.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@source/tests/infer/gen_fparam_aparam.py`:
- Around line 86-93: The export fallback is not guarded: if
pt_deserialize_to_file(pth_path, ...) raises, later code still opens
"fparam_aparam.pth" and may read stale data; modify the flow so that you only
proceed with the parity check/load when the export actually succeeded—e.g., set
a boolean (export_succeeded) or check that pth_path was created by
pt_deserialize_to_file before any subsequent open/read or parity comparison;
update references around pth_path, pt_deserialize_to_file, and the later
file-open/parity-check block so the latter is skipped when export_succeeded is
false (or the file is absent/newly written).
- Around line 132-139: The test currently calls dp.eval(...) binding e, f, v,
ae, av but only compares energy and force diffs; add assertions and diff
reporting for virial and atomic outputs: compare v to the saved reference
(v_pth) and ae/av to ae_pth/av_pth respectively, compute and print max/relative
differences like done for e and f, and fail the test on mismatch. Locate the
dp.eval call and follow the existing pattern used for energy/force diffs to add
checks for v, ae, and av so virial and per-atom values cannot drift silently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9f653401-c78d-4aa6-9776-38de7659a1a9
📒 Files selected for processing (3)
source/tests/infer/convert-models.sh
source/tests/infer/gen_dpa1.py
source/tests/infer/gen_fparam_aparam.py
🚧 Files skipped from review as they are similar to previous changes (1)
- source/tests/infer/convert-models.sh
pth_path = os.path.join(base_dir, "fparam_aparam.pth")
print(f"Exporting to {pth_path} ...")  # noqa: T201
try:
    pt_deserialize_to_file(pth_path, copy.deepcopy(data))
except RuntimeError as e:
    # Custom ops (e.g. tabulate_fusion_se_t_tebd) may not be available
    # in all build environments; .pth generation is not critical.
    print(f"WARNING: .pth export failed ({e}), skipping.")  # noqa: T201
Guard the .pth parity check on a successful export.
If pt_deserialize_to_file() fails on Line 89, Line 171 still opens fparam_aparam.pth. That turns the fallback into a crash, and if an old artifact is still on disk it can compare against stale weights instead.
Suggested fix
pth_path = os.path.join(base_dir, "fparam_aparam.pth")
+ pth_exported = False
print(f"Exporting to {pth_path} ...") # noqa: T201
try:
pt_deserialize_to_file(pth_path, copy.deepcopy(data))
+ pth_exported = True
except RuntimeError as e:
# Custom ops (e.g. tabulate_fusion_se_t_tebd) may not be available
# in all build environments; .pth generation is not critical.
print(f"WARNING: .pth export failed ({e}), skipping.") # noqa: T201
@@
- dp_pth = DeepPot(pth_path)
- e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
- coord,
- box,
- atype,
- fparam=fparam_val,
- aparam=aparam_val,
- atomic=True,
- )
- print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}") # noqa: T201
- print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}") # noqa: T201
+ if pth_exported:
+ dp_pth = DeepPot(pth_path)
+ e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
+ coord,
+ box,
+ atype,
+ fparam=fparam_val,
+ aparam=aparam_val,
+ atomic=True,
+ )
+ print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}") # noqa: T201
+ print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}") # noqa: T201
+ else:
+ print("\n// Skipped .pth vs .pt2 verification because .pth export failed.")  # noqa: T201

Also applies to: 170-179
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/tests/infer/gen_fparam_aparam.py` around lines 86 - 93, The export
fallback is not guarded: if pt_deserialize_to_file(pth_path, ...) raises, later
code still opens "fparam_aparam.pth" and may read stale data; modify the flow so
that you only proceed with the parity check/load when the export actually
succeeded—e.g., set a boolean (export_succeeded) or check that pth_path was
created by pt_deserialize_to_file before any subsequent open/read or parity
comparison; update references around pth_path, pt_deserialize_to_file, and the
later file-open/parity-check block so the latter is skipped when
export_succeeded is false (or the file is absent/newly written).
e, f, v, ae, av = dp.eval(
    coord,
    box,
    atype,
    fparam=fparam_val,
    aparam=aparam_val,
    atomic=True,
)
Finish the parity check for virial and atomic outputs.
Line 132 binds v, and Line 172 binds ae_pth/av_pth, but the script only reports energy and force diffs. That means virial or atomic-output drift can slip through silently even though those values are part of the generated references.
Suggested fix
- e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
+ e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(
coord,
box,
atype,
fparam=fparam_val,
aparam=aparam_val,
atomic=True,
)
print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}") # noqa: T201
print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}") # noqa: T201
+ print(f"// .pth vs .pt2 virial max diff: {np.max(np.abs(v - v_pth)):.2e}") # noqa: T201
+ print(f"// .pth vs .pt2 atomic energy max diff: {np.max(np.abs(ae - ae_pth)):.2e}") # noqa: T201
+ print(f"// .pth vs .pt2 atomic virial max diff: {np.max(np.abs(av - av_pth)):.2e}")  # noqa: T201

As per coding guidelines, **/*.py: Always run ruff check . and ruff format . before committing changes or CI will fail.
Also applies to: 172-181
🧰 Tools
🪛 GitHub Check: CodeQL
[notice] 136-136: Unused local variable
Variable nloc is not used.
🪛 Ruff (0.15.4)
[warning] 132-132: Unpacked variable v is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/tests/infer/gen_fparam_aparam.py` around lines 132 - 139, The test
currently calls dp.eval(...) binding e, f, v, ae, av but only compares energy
and force diffs; add assertions and diff reporting for virial and atomic
outputs: compare v to the saved reference (v_pth) and ae/av to ae_pth/av_pth
respectively, compute and print max/relative differences like done for e and f,
and fail the test on mismatch. Locate the dp.eval call and follow the existing
pattern used for energy/force diffs to add checks for v, ae, and av so virial
and per-atom values cannot drift silently.
for (int kk = 0; kk < nframes; ++kk) {
  std::copy(in.begin() + kk * nall * ndim,
            in.begin() + kk * nall * ndim + nloc * ndim,
            out.begin() + kk * nloc * ndim);

Check failure — Code scanning / CodeQL: Multiplication result converted to larger type (High)

out.resize(static_cast<size_t>(nframes) * nloc * ndim);
for (int kk = 0; kk < nframes; ++kk) {
  std::copy(in.begin() + kk * nall * ndim,
            in.begin() + kk * nall * ndim + nloc * ndim,

Check failure — Code scanning / CodeQL: Multiplication result converted to larger type (High)

out.resize(static_cast<size_t>(nframes) * nloc * ndim);
for (int kk = 0; kk < nframes; ++kk) {
  std::copy(in.begin() + kk * nall * ndim,
            in.begin() + kk * nall * ndim + nloc * ndim,

Check failure — Code scanning / CodeQL: Multiplication result converted to larger type (High)

const int nframes = 1) {
  out.resize(static_cast<size_t>(nframes) * nloc * ndim);
  for (int kk = 0; kk < nframes; ++kk) {
    std::copy(in.begin() + kk * nall * ndim,

Check failure — Code scanning / CodeQL: Multiplication result converted to larger type (High)
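The four alerts all point at the same pattern: `kk * nall * ndim` is evaluated in 32-bit `int` arithmetic and only afterwards widened to the iterator's 64-bit difference type, so the product can overflow before the conversion. A minimal standalone sketch of the loop with the fix CodeQL expects (widening one operand before the multiply); the function name and buffer contents are made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the first nloc*ndim entries of each frame out of a flat
// (nframes, nall, ndim) buffer. Casting to std::ptrdiff_t before the
// multiplication keeps the whole offset computation in 64 bits, which is
// what the "multiplication result converted to larger type" alert asks for.
void copy_local_atoms(const std::vector<double>& in,
                      std::vector<double>& out,
                      int nall, int nloc, int ndim, int nframes) {
  out.resize(static_cast<std::size_t>(nframes) * nloc * ndim);
  for (int kk = 0; kk < nframes; ++kk) {
    const std::ptrdiff_t src = static_cast<std::ptrdiff_t>(kk) * nall * ndim;
    const std::ptrdiff_t dst = static_cast<std::ptrdiff_t>(kk) * nloc * ndim;
    std::copy(in.begin() + src,
              in.begin() + src + static_cast<std::ptrdiff_t>(nloc) * ndim,
              out.begin() + dst);
  }
}
```

With `int` inputs this behaves identically to the flagged code for small sizes; the cast only changes the type in which the offset is computed.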
🧹 Nitpick comments (1)
source/tests/pt_expt/infer/test_deep_eval.py (1)
517-523: Refactor to save and restore the default device dynamically to eliminate fragile coupling.

The hardcoded "cuda:9999999" couples this test to the specific value set in source/tests/pt/__init__.py. If that module's configuration changes, this restoration would become incorrect.

♻️ Proposed refactor

-    # Temporarily clear default device to avoid poisoning AOTInductor
-    # compilation (tests/pt/__init__.py sets it to "cuda:9999999").
-    torch.set_default_device(None)
-    try:
-        deserialize_to_file(cls.tmpfile.name, cls.model_data)
-    finally:
-        torch.set_default_device("cuda:9999999")
+    # Temporarily clear default device to avoid poisoning AOTInductor
+    # compilation.
+    prev_device = torch.get_default_device()
+    torch.set_default_device(None)
+    try:
+        deserialize_to_file(cls.tmpfile.name, cls.model_data)
+    finally:
+        torch.set_default_device(prev_device)

The project requires PyTorch 2.10.0, and torch.get_default_device() has been available since PyTorch 2.4, so this refactor is compatible with the project's dependencies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/pt_expt/infer/test_deep_eval.py` around lines 517 - 523, Replace the hardcoded restoration of the default device with a dynamic save/restore: before calling torch.set_default_device(None) capture the current device using torch.get_default_device(), then call deserialize_to_file(cls.tmpfile.name, cls.model_data), and finally restore the saved device via torch.set_default_device(saved_device); ensure you reference the same symbols (torch.get_default_device, torch.set_default_device, deserialize_to_file, cls.tmpfile, cls.model_data) so the test no longer depends on the magic "cuda:9999999" value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@source/tests/pt_expt/infer/test_deep_eval.py`:
- Around line 517-523: Replace the hardcoded restoration of the default device
with a dynamic save/restore: before calling torch.set_default_device(None)
capture the current device using torch.get_default_device(), then call
deserialize_to_file(cls.tmpfile.name, cls.model_data), and finally restore the
saved device via torch.set_default_device(saved_device); ensure you reference
the same symbols (torch.get_default_device, torch.set_default_device,
deserialize_to_file, cls.tmpfile, cls.model_data) so the test no longer depends
on the magic "cuda:9999999" value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 735f1651-98fd-4b60-9de7-4d06506018a5
📒 Files selected for processing (1)
source/tests/pt_expt/infer/test_deep_eval.py
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

@@ Coverage Diff @@
## master #5298 +/- ##
==========================================
- Coverage 82.37% 82.22% -0.15%
==========================================
Files 780 793 +13
Lines 78220 81098 +2878
Branches 3677 4003 +326
==========================================
+ Hits 64433 66686 +2253
- Misses 12614 13203 +589
- Partials     1173    1209     +36

☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
♻️ Duplicate comments (1)
source/tests/infer/gen_fparam_aparam.py (1)
176-176: ⚠️ Potential issue | 🟡 Minor

Remove unused variable nloc.

This assignment is never used and will fail ruff check with an F841 error.

Proposed fix

-    nloc = 6
     atom_energy = ae[0, :, 0]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_fparam_aparam.py` at line 176, Remove the unused local variable assignment "nloc = 6" from source/tests/infer/gen_fparam_aparam.py (the unused symbol is "nloc"); either delete the line entirely or replace it with a meaningful use of nloc where intended so the variable is referenced (e.g., used in a calculation or assertion), ensuring the F841 ruff error is resolved.
🧹 Nitpick comments (6)
source/tests/infer/gen_dpa2.py (2)
206-211: Prefix unused unpacked variables with underscore to satisfy Ruff.

Variables v1 and v_np are unpacked but never used.

Suggested fix

- e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+ e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
  ...
- e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+ e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_dpa2.py` around lines 206 - 211, The tuple unpackings from dp.eval include unused variables v1 and v_np; update those unpack lines to prefix the unused names with an underscore (e.g., _v1 and _v_np) so Ruff won't complain. Specifically modify the unpack assignments where dp.eval is called (the variables e1, f1, v1, ae1, av1 and e_np, f_np, v_np, ae_np, av_np) to rename the unused v1 and v_np to _v1 and _v_np.
51-52: Same bare except Exception: pass pattern as other gen_*.py files.

Narrow the exception scope and emit a warning when the fallback lookup fails. This will fail ruff check (BLE001/S110).

Suggested fix (apply to all gen_*.py files)

-except Exception:
-    pass
+except (ImportError, AttributeError, OSError, RuntimeError) as exc:
+    import warnings
+    warnings.warn(f"Failed to load deepmd custom op library: {exc}")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_dpa2.py` around lines 51 - 52, The bare "except Exception: pass" in gen_dpa2.py should be replaced with a narrowed exception catch (e.g., KeyError, LookupError or the specific exception thrown by the fallback lookup) and emit a warning when the fallback lookup fails; update the except to "except <SpecificException> as e" and call warnings.warn(...) or logger.warning(...) with context (include the exception message and that the fallback lookup failed) so the failure is visible and BLE001/S110 is resolved.source/tests/infer/gen_dpa3.py (1)
184-189: Prefix unused unpacked variables with underscore to satisfy Ruff.

Variables v1 and v_np are unpacked but never used. As per coding guidelines, ruff check will fail on RUF059 warnings.

Suggested fix

- e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+ e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
  print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
  print_cpp_values("PBC reference values", ae1, f1, av1)
  # ---- 5. Run inference for NoPbc test ----
- e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+ e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_dpa3.py` around lines 184 - 189, The unpacked outputs v1 and v_np from dp.eval are unused and trigger Ruff RUF059; rename them to _v1 and _v_np (or simply _ ) where dp.eval is called (lines assigning e1, f1, v1, ae1, av1 and e_np, f_np, v_np, ae_np, av_np) so the unused tuple elements are prefixed with an underscore and the rest of the variables keep their current names; make the change at both calls to dp.eval to satisfy the linter.source/tests/infer/gen_fparam_aparam.py (1)
167-167: Prefix unused v with underscore.

Variable v is unpacked but never used. This will trigger Ruff RUF059.

Suggested fix

- e, f, v, ae, av = dp.eval(
+ e, f, _v, ae, av = dp.eval(

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_fparam_aparam.py` at line 167, The unpacking from dp.eval currently binds an unused variable v which triggers RUF059; update the tuple unpack to prefix that variable with an underscore (e.g., change "e, f, v, ae, av = dp.eval(...)" to use "_v" instead) so linters treat it as intentionally unused; locate the dp.eval call and rename only the middle unpacked variable to _v (leave e, f, ae, av intact).source/tests/infer/gen_sea.py (1)
47-48: Bare except Exception: pass will fail Ruff checks.

Same pattern as other gen_*.py files. Narrow the exception scope and emit a warning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_sea.py` around lines 47 - 48, Replace the bare "except Exception: pass" in source/tests/infer/gen_sea.py with a narrowed exception handler that catches the specific errors that can occur (e.g., OSError, IOError, ValueError) and emit a warning using the warnings module; import warnings if missing and change the block to "except (OSError, IOError, ValueError) as e: warnings.warn(f'gen_sea generation skipped: {e}', stacklevel=2)". This targets the specific except block in gen_sea.py and avoids a silent swallow while satisfying Ruff checks.source/tests/infer/gen_dpa1.py (1)
51-52: Bare except Exception: pass will fail Ruff checks.

Same pattern as other gen_*.py files. Narrow the exception scope.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/infer/gen_dpa1.py` around lines 51 - 52, Replace the bare "except Exception: pass" in gen_dpa1.py with a narrow, explicit exception handler for the exceptions you expect (e.g., ValueError, KeyError) or at minimum "except Exception as e:" and handle or log e; do not swallow all exceptions—re-raise unexpected ones—so the try/except around the problematic block is explicit and Ruff-compliant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@source/tests/infer/gen_dpa1.py`:
- Around line 192-203: The script instantiates DeepPot(pth_path) without
guarding against a missing .pth file; add a check for the existence/validity of
pth_path before calling DeepPot to avoid a crash when export failed (the same
pattern used in gen_dpa2.py/gen_dpa3.py). If the file is missing or invalid,
skip the .pth parity checks (the block that computes e_pth, f_pth, v_pth,
ae_pth, av_pth and the subsequent NoPbc eval) or log an informative error and
exit gracefully; reference the DeepPot class and the pth_path variable and the
evaluation results (e_pth, f_pth, e_pth_np, f_pth_np) when implementing the
guard.
---
Duplicate comments:
In `@source/tests/infer/gen_fparam_aparam.py`:
- Line 176: Remove the unused local variable assignment "nloc = 6" from
source/tests/infer/gen_fparam_aparam.py (the unused symbol is "nloc"); either
delete the line entirely or replace it with a meaningful use of nloc where
intended so the variable is referenced (e.g., used in a calculation or
assertion), ensuring the F841 ruff error is resolved.
---
Nitpick comments:
In `@source/tests/infer/gen_dpa1.py`:
- Around line 51-52: Replace the bare "except Exception: pass" in gen_dpa1.py
with a narrow, explicit exception handler for the exceptions you expect (e.g.,
ValueError, KeyError) or at minimum "except Exception as e:" and handle or log
e; do not swallow all exceptions—re-raise unexpected ones—so the try/except
around the problematic block is explicit and Ruff-compliant.
In `@source/tests/infer/gen_dpa2.py`:
- Around line 206-211: The tuple unpackings from dp.eval include unused
variables v1 and v_np; update those unpack lines to prefix the unused names with
an underscore (e.g., _v1 and _v_np) so Ruff won't complain. Specifically modify
the unpack assignments where dp.eval is called (the variables e1, f1, v1, ae1,
av1 and e_np, f_np, v_np, ae_np, av_np) to rename the unused v1 and v_np to _v1
and _v_np.
- Around line 51-52: The bare "except Exception: pass" in gen_dpa2.py should be
replaced with a narrowed exception catch (e.g., KeyError, LookupError or the
specific exception thrown by the fallback lookup) and emit a warning when the
fallback lookup fails; update the except to "except <SpecificException> as e"
and call warnings.warn(...) or logger.warning(...) with context (include the
exception message and that the fallback lookup failed) so the failure is visible
and BLE001/S110 is resolved.
In `@source/tests/infer/gen_dpa3.py`:
- Around line 184-189: The unpacked outputs v1 and v_np from dp.eval are unused
and trigger Ruff RUF059; rename them to _v1 and _v_np (or simply _ ) where
dp.eval is called (lines assigning e1, f1, v1, ae1, av1 and e_np, f_np, v_np,
ae_np, av_np) so the unused tuple elements are prefixed with an underscore and
the rest of the variables keep their current names; make the change at both
calls to dp.eval to satisfy the linter.
In `@source/tests/infer/gen_fparam_aparam.py`:
- Line 167: The unpacking from dp.eval currently binds an unused variable v
which triggers RUF059; update the tuple unpack to prefix that variable with an
underscore (e.g., change "e, f, v, ae, av = dp.eval(...)" to use "_v" instead)
so linters treat it as intentionally unused; locate the dp.eval call and rename
only the middle unpacked variable to _v (leave e, f, ae, av intact).
In `@source/tests/infer/gen_sea.py`:
- Around line 47-48: Replace the bare "except Exception: pass" in
source/tests/infer/gen_sea.py with a narrowed exception handler that catches the
specific errors that can occur (e.g., OSError, IOError, ValueError) and emit a
warning using the warnings module; import warnings if missing and change the
block to "except (OSError, IOError, ValueError) as e: warnings.warn(f'gen_sea
generation skipped: {e}', stacklevel=2)". This targets the specific except block
in gen_sea.py and avoids a silent swallow while satisfying Ruff checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3d4976b0-43fa-4dd1-b2fb-b552ce7e485e
📒 Files selected for processing (6)
source/install/test_cc_local.sh
source/tests/infer/gen_dpa1.py
source/tests/infer/gen_dpa2.py
source/tests/infer/gen_dpa3.py
source/tests/infer/gen_fparam_aparam.py
source/tests/infer/gen_sea.py
Actionable comments posted: 2
♻️ Duplicate comments (2)
source/api_cc/src/DeepPotPTExpt.cc (2)
653-659: ⚠️ Potential issue | 🟠 Major

Make the output-count check survive release builds.

assert() disappears under -DNDEBUG, so a bad output_keys.json / model-output mismatch turns Line 658 into unchecked indexing. Throw a descriptive runtime exception before the loop instead.
Verify each finding against the current code and only fix it if needed. In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 653 - 659, Replace the fragile assert in DeepPotPTExpt::extract_outputs with an explicit size check that throws a descriptive std::runtime_error when flat_outputs.size() != output_keys.size(); before the loop, validate the counts and raise an error that includes both flat_outputs.size() and output_keys.size() and contextual text referencing output_map/output_keys so the mismatch is visible in release builds.
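A standalone sketch of the suggested replacement: a size check that throws instead of asserting, so it still fires in release builds. The names flat_outputs and output_keys follow the comment; the element types here are simplified placeholders, not the real tensor types:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Explicit count check that survives -DNDEBUG, unlike assert(). The thrown
// message names both sizes so a metadata/model mismatch is diagnosable from
// the exception text alone.
void check_output_count(const std::vector<double>& flat_outputs,
                        const std::vector<std::string>& output_keys) {
  if (flat_outputs.size() != output_keys.size()) {
    throw std::runtime_error(
        "DeepPotPTExpt: model returned " +
        std::to_string(flat_outputs.size()) +
        " outputs but output_keys.json lists " +
        std::to_string(output_keys.size()) +
        "; the .pt2 package and its metadata are inconsistent");
  }
}
```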
549-611: ⚠️ Potential issue | 🟠 Major

Honor file_content or reject it explicitly.

DeepPotPTExpt::init() still reads metadata and constructs the loader from model only. If a caller supplies the package bytes through file_content, this path silently ignores them and loads a different source instead. Either wire the in-memory payload through both steps or fail fast when file_content is non-empty.
Verify each finding against the current code and only fix it if needed. In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 549 - 611, DeepPotPTExpt::init currently ignores the file_content buffer; either consume it or reject it. Modify init so when file_content is non-empty you call a zip-reader that accepts an in-memory buffer (instead of read_zip_entry(path,...)) to extract "extra/model_def_script.json" and "extra/output_keys.json" and then construct the torch::inductor::AOTIModelPackageLoader from the in-memory package (or use an overload that accepts bytes) when creating loader; if your AOTIModelPackageLoader cannot be created from bytes, add an explicit check that throws/logs and returns when file_content is non-empty to fail fast. Ensure references: DeepPotPTExpt::init, file_content, read_zip_entry (replace or add read_zip_entry_from_buffer), parse_json, and loader are updated accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 92-126: The parser currently calls get() until it finds a closing
quote (in parse_string_raw) and consumes expected delimiters/tokens elsewhere
without EOF checks; update parse_string_raw to check peek() or an is_eof()
condition before each get() and escape-char read and throw a deepmd_exception if
EOF or an unexpected character is encountered, and apply the same pattern to
every place that unconditionally calls get() for closing delimiters/tokens (the
other token-consuming sections noted in the review), i.e., validate presence of
expected characters before consuming them and throw deepmd_exception on
malformed or truncated JSON instead of reading past the buffer.
- Around line 1095-1120: Before slicing per-frame in the loop, validate that the
batched arrays have exact expected sizes: compute natoms (from atype if
appropriate) and then check coord.size() == nframes * natoms * 3; if box is
non-empty ensure box.size() == nframes * 9; compute dim_fparam and dim_aparam
(expected per-frame widths) and ensure fparam is empty or fparam.size() ==
nframes * dim_fparam and aparam is empty or aparam.size() == nframes *
dim_aparam; only then compute dap = dim_aparam and dfp = dim_fparam (or derive
them as static_cast<int>(aparam.size())/nframes after verifying exact
divisibility) and proceed into the ff loop; on any mismatch return/throw a clear
error referencing nframes, natoms, dim_fparam/dim_aparam,
coord/fparam/aparam/box to avoid silent truncation during per-frame slicing.
---
Duplicate comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 653-659: Replace the fragile assert in
DeepPotPTExpt::extract_outputs with an explicit size check that throws a
descriptive std::runtime_error when flat_outputs.size() != output_keys.size();
before the loop, validate the counts and raise an error that includes both
flat_outputs.size() and output_keys.size() and contextual text referencing
output_map/output_keys so the mismatch is visible in release builds.
- Around line 549-611: DeepPotPTExpt::init currently ignores the file_content
buffer; either consume it or reject it. Modify init so when file_content is
non-empty you call a zip-reader that accepts an in-memory buffer (instead of
read_zip_entry(path,...)) to extract "extra/model_def_script.json" and
"extra/output_keys.json" and then construct the
torch::inductor::AOTIModelPackageLoader from the in-memory package (or use an
overload that accepts bytes) when creating loader; if your
AOTIModelPackageLoader cannot be created from bytes, add an explicit check that
throws/logs and returns when file_content is non-empty to fail fast. Ensure
references: DeepPotPTExpt::init, file_content, read_zip_entry (replace or add
read_zip_entry_from_buffer), parse_json, and loader are updated accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 58df0b9f-a796-41ac-8a02-6a7bf826814b
📒 Files selected for processing (1)
source/api_cc/src/DeepPotPTExpt.cc
source/api_cc/src/DeepPotPTExpt.cc
Outdated
int natoms = atype.size();
int dap = aparam.empty() ? 0 : static_cast<int>(aparam.size()) / nframes;
int dfp = fparam.empty() ? 0 : static_cast<int>(fparam.size()) / nframes;
ener.clear();
force.clear();
virial.clear();
if (atomic) {
  atom_energy.clear();
  atom_virial.clear();
}
for (int ff = 0; ff < nframes; ++ff) {
  std::vector<VALUETYPE> frame_coord(coord.begin() + ff * natoms * 3,
                                     coord.begin() + (ff + 1) * natoms * 3);
  std::vector<VALUETYPE> frame_box;
  if (!box.empty()) {
    frame_box.assign(box.begin() + ff * 9, box.begin() + (ff + 1) * 9);
  }
  std::vector<VALUETYPE> frame_fparam;
  if (!fparam.empty()) {
    frame_fparam.assign(fparam.begin() + ff * dfp,
                        fparam.begin() + (ff + 1) * dfp);
  }
  std::vector<VALUETYPE> frame_aparam;
  if (!aparam.empty()) {
    frame_aparam.assign(aparam.begin() + ff * dap,
                        aparam.begin() + (ff + 1) * dap);
Validate batched input sizes before slicing per frame.
natoms, dfp, and dap are derived with integer division in both loops. If atype, box, fparam, or aparam does not match the expected per-frame layout exactly, the tail is silently truncated and each frame is sliced with the wrong shape. Please reject non-exact multiples up front and validate them against nframes, dim_fparam, and dim_aparam before entering the loop.
Also applies to: 1230-1257
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 1095 - 1120, Before slicing
per-frame in the loop, validate that the batched arrays have exact expected
sizes: compute natoms (from atype if appropriate) and then check coord.size() ==
nframes * natoms * 3; if box is non-empty ensure box.size() == nframes * 9;
compute dim_fparam and dim_aparam (expected per-frame widths) and ensure fparam
is empty or fparam.size() == nframes * dim_fparam and aparam is empty or
aparam.size() == nframes * dim_aparam; only then compute dap = dim_aparam and
dfp = dim_fparam (or derive them as static_cast<int>(aparam.size())/nframes
after verifying exact divisibility) and proceed into the ff loop; on any
mismatch return/throw a clear error referencing nframes, natoms,
dim_fparam/dim_aparam, coord/fparam/aparam/box to avoid silent truncation during
per-frame slicing.
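The validation this comment asks for can be sketched as a standalone helper. The size formulas follow the review text (aparam is checked against nframes * dim_aparam as written there), and the signature is hypothetical rather than the actual DeepPotPTExpt interface:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reject batched arrays whose sizes are not exact multiples of the
// per-frame layout before any per-frame slicing happens. In the real code
// dim_fparam / dim_aparam would come from the model metadata.
void validate_batch(std::size_t coord_size, std::size_t box_size,
                    std::size_t fparam_size, std::size_t aparam_size,
                    int nframes, int natoms, int dim_fparam, int dim_aparam) {
  const std::size_t nf = static_cast<std::size_t>(nframes);
  if (coord_size != nf * natoms * 3) {
    throw std::invalid_argument("coord size does not match nframes*natoms*3");
  }
  if (box_size != 0 && box_size != nf * 9) {
    throw std::invalid_argument("box size does not match nframes*9");
  }
  if (fparam_size != 0 && fparam_size != nf * dim_fparam) {
    throw std::invalid_argument("fparam size does not match nframes*dim_fparam");
  }
  if (aparam_size != 0 && aparam_size != nf * dim_aparam) {
    throw std::invalid_argument("aparam size does not match nframes*dim_aparam");
  }
}
```

With this check in front of the loop, the integer divisions for dfp and dap can no longer silently truncate a trailing remainder.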
- add support for serialization to .pt2
- deep eval with .pt2
- C/C++ inference api with .pt2
- add ut for python inference
- add ut for c++/c inference: se_e2_a, fparam/aparam, multi-frames, dpa1
Replace flat (nf*nall,) indexing in dpa1.py and exclude_mask.py with xp_take_along_axis and xp_take_first_n. The flat indexing creates Ne(nall, nloc) assertions during torch.export tracing that fail when nall == nloc (NoPbc). The new gather-based approach avoids these shape constraints.

Unify PT/pt_expt test models so .pth and .pt2 are generated from the same dpmodel weights (type_one_side=True) via gen scripts. Remove deeppot_sea.pth, deeppot_dpa.pth, and fparam_aparam.pth from git tracking — they are now regenerated in CI via convert-models.sh.

Changes:
- dpa1.py, exclude_mask.py: xp_take_along_axis replaces flat indexing
- gen_dpa1.py: generates deeppot_dpa1.pth + .pt2 from dpmodel
- gen_fparam_aparam.py: generates fparam_aparam.pth + .pt2 from dpmodel
- convert-models.sh: regenerate .pth from yaml and run gen scripts in CI
- test_deeppot_dpa_ptexpt.cc: add NoPbc test fixture
- test_deeppot_dpa1_pt.cc: new DPA1 .pth test (PBC + NoPbc)
- test_deeppot_a_fparam_aparam_pt.cc: update reference values
…ckends

Fix a latent bug in all 5 C++ backends (DeepPotPT, DeepPotPTExpt, DeepPotJAX, DeepSpinPT, DeepPotPD) where the ghost-to-local mapping was incorrectly indexed through fwd_map when virtual atoms are present. The old code `mapping[ii] = lmp_list.mapping[fwd_map[ii]]` treats the post-filter index as an original index, causing undefined behavior when fwd_map contains -1 entries for filtered atoms. The fix uses bkw_map to correctly translate: new_idx -> orig_idx -> mapped_orig -> new_mapped.

Also fix a use-after-free in DeepPotPTExpt.cc where torch::from_blob referenced a local vector's data after the vector went out of scope.

Additionally fix dpmodel code (dpa2.py, repformers.py, make_model.py, nlist.py) to use xp_take_first_n and xp_expand_dims instead of slice indexing that creates shape constraints incompatible with torch.export.

Add DPA2 C++ test files for both .pth and .pt2 backends with PBC, NoPbc, and type_sel test cases. Reduce DPA2 test model size for machine-precision agreement between jit and export paths.
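The corrected translation chain from this commit message (new_idx -> orig_idx -> mapped_orig -> new_mapped) can be illustrated with toy arrays. All values below are invented; orig_mapping stands in for lmp_list.mapping, fwd_map maps original indices to post-filter indices (-1 for filtered atoms), and bkw_map maps post-filter indices back to original ones:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compose the three lookups the commit describes: for each post-filter
// index ii, find its original index (bkw_map), look up which original atom
// it maps to (orig_mapping), then translate that owner atom into the
// post-filter index space (fwd_map). The old buggy form indexed
// orig_mapping with fwd_map[ii], mixing up the two index spaces.
std::vector<int> translate_mapping(const std::vector<int>& orig_mapping,
                                   const std::vector<int>& fwd_map,
                                   const std::vector<int>& bkw_map) {
  std::vector<int> mapping(bkw_map.size());
  for (std::size_t ii = 0; ii < bkw_map.size(); ++ii) {
    mapping[ii] = fwd_map[orig_mapping[bkw_map[ii]]];
  }
  return mapping;
}
```

In the toy test below, original atom 1 is a filtered virtual atom (fwd_map entry -1) and original atom 2 is a ghost of atom 0; the composition never touches the -1 entry, which is exactly the failure mode of the old indexing.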
Fix torch.export-incompatible slicing in dpmodel DPA3/repflows code ([:, :nloc] → xp_take_first_n, reshape with -1 → expand_dims), add gen_dpa3.py model generation script, and create C++ test files for both .pth and .pt2 backends with PBC and NoPbc fixtures.
Restore deeppot_sea.pth and deeppot_dpa.pth to git tracking — CI cannot regenerate them because tabulate_fusion_se_t_tebd custom op is not available in the Python-only CI environment. Remove the dp convert-backend .pth lines from convert-models.sh. Add try/except around .pth export in gen_dpa1.py and gen_fparam_aparam.py so the scripts don't fail when custom ops are unavailable — only .pt2 generation is required.
make_fx tracing through torch.autograd.grad fails on CUDA with "opt_ready_stream && opt_parent_stream INTERNAL ASSERT FAILED". Fix by running make_fx on CPU (the graph is device-agnostic), then moving sample inputs to the target device before torch.export.export and AOTInductor compilation so the compiled package targets the correct device (GPU when available).
make_fx with _allow_non_fake_inputs=True on CUDA triggers autograd stream assertion (opt_ready_stream && opt_parent_stream). Fix by: 1. Tracing via make_fx on CPU (decomposes autograd.grad into aten ops) 2. Extracting output keys from CPU-traced module 3. Moving traced module + inputs to target device (CUDA if available) 4. Running torch.export.export on target device (the traced graph has no autograd.grad calls, so no CUDA stream issue) This ensures AOTInductor compiles for the correct device.
The -fsanitize=leak flag produces sanitized .so files that fail to dlopen in Python processes lacking the LSAN runtime. Preload liblsan.so via LD_PRELOAD when CXXFLAGS contains the leak sanitizer. Also use os.path.realpath in gen scripts for robust path resolution and print diagnostic on library loading failure instead of silently passing.
Replace manual traced-module-to-device approach with the official torch.export.passes.move_to_device_pass API. This properly rewrites device='cpu' constants baked into the FX graph by make_fx, avoiding FakeTensor device propagation errors on CUDA (pytorch/pytorch#121761).
- C++ JSON/ZIP parser: add EOF checks, bounds validation, stod exception handling, int64_t loop var to prevent unsigned underflow
- C++ from_blob: add .clone() to all calls to prevent use-after-free
- C++ translate_error: catch std::exception instead of std::runtime_error
- Python deep_eval/serialization: validate ZIP entries before reading
- gen_*.py: move custom op loading after deepmd imports to avoid double registration; add inductor compiler fallbacks for CI; remove unused subprocess imports
LSAN runtime inherited by g++ subprocesses causes false leak reports and non-zero exit codes, making torch._inductor think no C++ compiler works. Clear LD_PRELOAD after custom ops are loaded since it's no longer needed.
Force-pushed 85e09c7 to cd90c31 (Compare)
- Guard EOCD scan against short/corrupt .pt2 files (P1)
- Parse ZIP64 fields using full 64-bit widths (P2)
- Match ZIP entry names by suffix for archives with dir prefixes
- Reject unsupported in-memory file_content in DeepPotPTExpt::init
- Replace debug-only assert with runtime exception for output count
- Fix virtual atom coordinate copy: nvir -> nvir*3 in 3 test files
- Tighten DPA2 float EPSILON from 1e-1 to 1e-4, add TODO for ghost
- Narrow exception handling in gen_dpa3.py custom op loading
- Add parity assertions (not just prints) in gen_dpa3.py/.py exports
- Guard .pth parity check on successful export in gen_fparam_aparam.py
- Extend parity check to virial and atomic outputs
- Update deep_eval.py comment on private torch API
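The EOCD guard from the first bullet can be sketched in isolation. A valid ZIP end-of-central-directory record is at least 22 bytes and starts with the signature PK\x05\x06 (0x06054b50 little-endian), so a buffer shorter than that must be rejected before the backward scan; the signed 64-bit loop variable mirrors the unsigned-underflow fix from these commits. This is an illustration, not the code in DeepPotPTExpt.cc:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimum size of an EOCD record with an empty comment field.
constexpr std::size_t kEocdMinSize = 22;

// Returns the offset of the EOCD signature, or -1 if the buffer is too
// short or contains no record. Using a signed loop variable avoids the
// unsigned wrap-around that a size_t counter would hit at offset 0.
std::int64_t find_eocd(const std::vector<unsigned char>& buf) {
  if (buf.size() < kEocdMinSize) {
    return -1;  // guard: file too short to hold an EOCD record
  }
  // Scan backwards from the last position where a full record could start.
  for (std::int64_t off = static_cast<std::int64_t>(buf.size() - kEocdMinSize);
       off >= 0; --off) {
    if (buf[off] == 0x50 && buf[off + 1] == 0x4b &&
        buf[off + 2] == 0x05 && buf[off + 3] == 0x06) {
      return off;
    }
  }
  return -1;
}
```

A truncated or corrupt .pt2 then yields a clean "not found" result instead of an out-of-bounds read.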
Add has_default_fparam support to DeepPotPTExpt: - Embed has_default_fparam in model_def_script.json metadata during freeze (_collect_metadata) - Read it in C++ init from metadata, with fallback to false for older .pt2 files - Add C++ tests for both has_default_fparam==false and ==true - Generate fparam_aparam_default.pt2 test model with default_fparam set
- Guard EOCD scan against short/corrupt .pt2 files (P1)
- Parse ZIP64 fields using full 64-bit widths (P2)
- Match ZIP entry names by suffix for archives with dir prefixes
- Reject unsupported in-memory file_content in DeepPotPTExpt::init
- Replace debug-only assert with runtime exception for output count
- Fix virtual atom coordinate copy: nvir -> nvir*3 in 3 test files
- Tighten DPA2 float EPSILON from 1e-1 to 1e-4, add TODO for ghost
- Narrow exception handling in gen_dpa3.py custom op loading
- Add parity assertions (not just prints) in gen_dpa3.py/.py exports
- Guard .pth parity check on successful export in gen_fparam_aparam.py
- Extend parity check to virial and atomic outputs
- Update deep_eval.py comment on private torch API
The gen_*.py scripts import torch, which has internal pybind/JIT allocations that LSAN reports as leaks. These are not our leaks. Previously, LD_PRELOAD=liblsan.so was set for the gen scripts to handle dlopen of the sanitizer-built custom op .so, but this causes LSAN to track all Python/torch allocations and fail on exit. Fix: run gen scripts without LD_PRELOAD. The gen scripts don't need LSAN — only the C++ test binary (ctest) does, and ctest already inherits LSAN via the -fsanitize=leak compile flag. Also revert the DPA2 cpu_lmp_nlist test (needs further investigation).
- Remove unused variable nloc in gen_fparam_aparam.py
- Cast int multiplications to size_t before use as iterator offsets in DeepPotPTExpt multi-frame compute to prevent overflow
The gen scripts need LD_PRELOAD=liblsan.so to dlopen the custom op .so built with -fsanitize=leak. But LSAN leak detection must be disabled (detect_leaks=0) to avoid false reports from torch/paddle internal allocations.
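A sketch of the environment used to launch the gen scripts (the helper name and liblsan.so path are hypothetical):

```python
import os

def gen_script_env(lsan_path: str) -> dict:
    # Preload the LSAN runtime so dlopen of the -fsanitize=leak custom op .so
    # resolves its symbols, but disable leak detection to avoid false reports
    # from torch/paddle internal allocations.
    env = dict(os.environ)
    env["LD_PRELOAD"] = lsan_path
    env["LSAN_OPTIONS"] = "detect_leaks=0"
    return env
```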
Add C/C++ inference support for the `.pt2` (torch.export / AOTInductor) backend, covering all major descriptor types: SE_E2_A, DPA1, DPA2, and DPA3.

**C/C++ inference backend (`DeepPotPTExpt`)**

- New `DeepPotPTExpt` backend that loads `.pt2` models via `torch::inductor::AOTIModelContainerRunnerCpu`
- Backend selection keyed on the `.pt2` file extension
**dpmodel fixes for torch.export compatibility**

- Replace `[:, :nloc]` slicing with `xp_take_first_n()` in DPA1, DPA2, DPA3, and repflows/repformers — the original slicing creates `Ne(nall, nloc)` shape constraints that fail when `nall == nloc` (the NoPbc case)
- Replace `(nf*nall,)` indexing in `dpa1.py` and `exclude_mask.py` with `xp_take_along_axis`
- Replace `xp.reshape(mapping, (nframes, -1, 1))` with `xp.expand_dims` in repflows/repformers — the `-1` resolves to `nall` during tracing
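The `xp_take_first_n` replacement can be illustrated with a pure-Python sketch of its semantics (nested lists stand in for arrays; the real helper operates on array-API tensors): gathering indices `[0, n)` along an axis yields the same values as slicing, but a tracer then sees an index gather rather than a static shape constraint.

```python
def take_first_n(x, n, axis=0):
    # Gather the first n entries along `axis` via explicit indices instead of
    # slicing; for axis=1 this is semantically equal to x[:, :n].
    if axis == 0:
        return [x[i] for i in range(n)]
    return [take_first_n(row, n, axis - 1) for row in x]
```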
**pt_expt serialization**

- `.pt2` export via `torch.export.export` → `aot_compile` → package as zip
- Load via `torch._inductor.aoti_load_package`
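The final packaging step can be sketched with stdlib `zipfile` (entry names here are illustrative, not the actual archive layout; the compiled shared object comes from the preceding `aot_compile` step):

```python
import io
import json
import zipfile

def package_pt2(so_bytes: bytes, metadata: dict) -> bytes:
    # Bundle the AOT-compiled shared object and the JSON metadata into a
    # single .pt2 zip archive; entry names are hypothetical.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("model.so", so_bytes)
        z.writestr("model_def_script.json", json.dumps(metadata))
    return buf.getvalue()
```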
**Bug fix in all C++ backends**

- `mapping[ii] = lmp_list.mapping[fwd_map[ii]]` used post-filter indices as original indices; fixed to `mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii]]]`
- Fixed a dangling reference in `DeepPotPTExpt.cc` where `torch::from_blob` referenced a local vector after it went out of scope
**Test infrastructure**

- Model generation scripts (`gen_dpa1.py`, `gen_dpa2.py`, `gen_dpa3.py`, `gen_fparam_aparam.py`) that build from a dpmodel config → serialize → export to both `.pth` and `.pt2` with identical weights
- No committed `.pth` files; regenerate in CI via `convert-models.sh`
- C++ tests cover `.pth` and `.pt2`, PBC + NoPbc, double + float
- Python tests (`test_deep_eval.py`)
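The corrected mapping composition from the C++ bug fix above can be sketched in Python (lists stand in for the C++ arrays; `-1` marks filtered-out atoms, and the function name is hypothetical):

```python
def rebuild_mapping(lmp_mapping, fwd_map, bkw_map):
    # bkw_map[ii] : post-filter index -> original index
    # lmp_mapping : original ghost index -> original owner index
    # fwd_map[k]  : original index -> post-filter index
    # The bug used post-filter indices to index lmp_mapping directly; the fix
    # round-trips through the original index space before mapping back.
    return [fwd_map[lmp_mapping[bkw_map[ii]]] for ii in range(len(bkw_map))]
```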