
feat(c++,pt-expt): add .pt2 (AOTInductor) C/C++ inference with DPA1/DPA2/DPA3 support #5298

Open
wanghan-iapcm wants to merge 21 commits into deepmodeling:master from wanghan-iapcm:feat-pt-expt-c

Conversation

@wanghan-iapcm
Collaborator

@wanghan-iapcm wanghan-iapcm commented Mar 9, 2026

Add C/C++ inference support for the .pt2 (torch.export / AOTInductor) backend, covering all major descriptor types: SE_E2_A, DPA1, DPA2, and DPA3.

C/C++ inference backend (DeepPotPTExpt)

  • New DeepPotPTExpt backend that loads .pt2 models via torch::inductor::AOTIModelContainerRunnerCpu
  • Supports PBC, NoPbc, fparam/aparam, multi-frame batching, atomic energy/virial, LAMMPS neighbor list (with ghost atoms, 2rc padding, type selection)
  • Registered alongside existing PT/TF/JAX/PD backends via the .pt2 file extension

dpmodel fixes for torch.export compatibility

  • Replace [:, :nloc] slicing with xp_take_first_n() in DPA1, DPA2, DPA3, and repflows/repformers — the original slicing creates Ne(nall, nloc) shape constraints that fail when nall == nloc (NoPbc case)
  • Replace flat (nf*nall,) indexing in dpa1.py and exclude_mask.py with xp_take_along_axis
  • Replace xp.reshape(mapping, (nframes, -1, 1)) with xp.expand_dims in repflows/repformers — the -1 resolves to nall during tracing
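The slicing-to-gather pattern behind these fixes can be sketched in plain Python. This is an illustrative stand-in, not the project's code: the real xp_take_first_n / xp_take_along_axis helpers operate on array-API arrays, and the take_first_n / take_along_axis functions below are hypothetical list-based analogues.

```python
def take_first_n(rows, n):
    # Gather via an explicit index list instead of slicing rows[:, :n].
    # A tracer records this as "gather these n indices" rather than
    # emitting a symbolic constraint tying nall to nloc, which is what
    # breaks the degenerate nall == nloc (NoPbc) case under torch.export.
    idx = list(range(n))
    return [[row[i] for i in idx] for row in rows]


def take_along_axis(rows, indices):
    # Per-row gather, replacing flat (nf * nall,) fancy indexing with an
    # index array that has the same batch structure as the data.
    return [[row[i] for i in frame] for row, frame in zip(rows, indices)]


ext = [[10, 11, 12, 13], [20, 21, 22, 23]]    # (nframes=2, nall=4)
local = take_first_n(ext, 3)                  # keep the nloc=3 local atoms
neigh = take_along_axis(ext, [[3, 0], [1, 1]])
```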

pt_expt serialization

  • .pt2 export via torch.export.export → AOTInductor compile → package as zip
  • Python inference via torch._inductor.aoti_load_package

Bug fix in all C++ backends

  • Fix ghost-to-local mapping when virtual atoms are present — the old code mapping[ii] = lmp_list.mapping[fwd_map[ii]] used post-filter indices as original indices; fixed to mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii]]]
  • Fix use-after-free in DeepPotPTExpt.cc where torch::from_blob referenced a local vector after it went out of scope
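The ghost-to-local mapping fix can be illustrated with a toy example in Python (the actual fix lives in the C++ backends; the names mirror the C++ variables, and the index values below are invented for illustration):

```python
# Four atoms in the original (pre-filter) indexing. Atom 1 is a virtual
# atom dropped by type selection; atom 3 is a ghost copy of atom 0.
lmp_mapping = [0, 1, 2, 0]  # original index -> original index of owner
bkw_map = [0, 2, 3]         # post-filter index -> original index
fwd_map = [0, -1, 1, 2]     # original index -> post-filter index (-1 = dropped)

nfiltered = len(bkw_map)

# Old (buggy): fwd_map[ii] is a post-filter index, but it is fed into
# lmp_mapping, which expects an original index.
buggy = [lmp_mapping[fwd_map[ii]] for ii in range(nfiltered)]

# Fixed: translate back to the original indexing, look up the owner there,
# then translate the owner into the post-filter indexing.
fixed = [fwd_map[lmp_mapping[bkw_map[ii]]] for ii in range(nfiltered)]

# fixed == [0, 1, 0]: the ghost (post-filter index 2) correctly maps to
# its owner, atom 0, whereas the buggy composition maps it to 1.
```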

Test infrastructure

  • Model generation scripts (gen_dpa1.py, gen_dpa2.py, gen_dpa3.py, gen_fparam_aparam.py) that build from dpmodel config → serialize → export to both .pth and .pt2 with identical weights
  • Remove pre-committed .pth files; regenerate in CI via convert-models.sh
  • C++ tests for all descriptor types: SE_E2_A, DPA1, DPA2, DPA3 (both .pth and .pt2, PBC + NoPbc, double + float)
  • Python unit tests for pt_expt inference (test_deep_eval.py)

Summary by CodeRabbit

  • New Features

    • Added support for PyTorch exportable (.pt2) models and runtime detection, enabling AOTInductor-based inference across interfaces.
  • Bug Fixes

    • Improved neighbor/embedding extraction and broadcasting to increase backend export compatibility and robustness.
  • Tests

    • Added extensive C++ and Python test suites and reference-generation scripts to validate .pt2 inference paths and cross-format consistency.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d3fd09c68

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR adds .pt2 (AOTInductor) support to the PyTorch exportable backend, migrates several descriptor and utility indexing paths to array‑API helpers (xp_take_first_n / xp_take_along_axis), implements a C++ DeepPotPTExpt backend and wiring, extends serialization for .pt2 packaging, and adds comprehensive C++ and Python test/model-generation artifacts.

Changes

  • Backend config & detection (deepmd/backend/pt_expt.py, source/api_cc/include/common.h, source/api_cc/src/common.cc, source/api_cc/src/DeepPot.cc): added the .pt2 suffix and DPBackend::PyTorchExportable; detect .pt2 as PyTorchExportable and route backend construction accordingly.
  • PTExpt inference & serialization, Python (deepmd/pt_expt/infer/deep_eval.py, deepmd/pt_expt/utils/serialization.py): added .pt2 loaders/serializers, the _load_pt2/_load_pte split, AOTInductor runner handling, tracing/export helpers, and packaging of metadata/output_keys into .pt2 ZIPs.
  • C++ PTExpt implementation & headers (source/api_cc/include/DeepPotPTExpt.h, source/api_cc/src/DeepPotPTExpt.cc, source/api_cc/src/DeepPot.cc): new DeepPotPTExpt class and implementation (model loading, run_model, extract_outputs, compute templates); DeepPot.cc wired to construct DeepPotPTExpt when appropriate.
  • Descriptor & array-API refactor (deepmd/dpmodel/descriptor/dpa1.py, .../dpa2.py, .../dpa3.py, .../repflows.py, .../repformers.py): replaced direct [:nloc] slicing with xp_take_first_n, switched mapping broadcasting to xp.expand_dims + xp.tile, and switched neighbor gathering to xp_take_along_axis for export/backend safety.
  • Model & utils array-API updates (deepmd/dpmodel/model/make_model.py, deepmd/dpmodel/utils/exclude_mask.py, deepmd/dpmodel/utils/nlist.py): imported/used xp_take_first_n and xp_take_along_axis; reworked neighbor-type gathering, exclude-mask assembly, and first-n coordinate extraction to avoid flat indexing and backend-export constraints.
  • Neighbor-list mapping fixes, C++ (source/api_cc/src/DeepPotJAX.cc, .../DeepPotPD.cc, .../DeepPotPT.cc, .../DeepSpinPT.cc): changed mapping construction order to index via backward map then forward map (fwd_map[lmp_list.mapping[bkw_map[ii]]]) to correct per-atom remapping.
  • Folding util & enum (source/api_cc/include/common.h, source/api_cc/src/common.cc, source/api_cc/tests/test_utils.h): added the template fold_back utility; extended the DPBackend enum; changed test helper access levels.
  • C++ test suites, pt_expt & PT (source/api_c/tests/*, source/api_cc/tests/test_deeppot_*.cc, source/api_cc/tests/*ptexpt.cc): added many GoogleTest suites validating .pt2 backends across DPA1/DPA2/DPA3 and fparam/aparam, PBC/NoPBC, multi-frame, and LAMMPS-style nlist paths.
  • Python test/model-gen scripts & tests (source/tests/infer/gen_*.py, source/tests/pt_expt/infer/test_deep_eval.py, source/tests/infer/gen_sea.py): new scripts to generate .pt2/.pth models and reference outputs; new pytest for DeepEval .pt2 vs .pte parity and round-trips.
  • Build/test harness step (source/install/test_cc_local.sh): added invocation of the model-generation scripts to produce .pt2/.pth assets for C++ tests during the post-build local test step.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DeepEval
    participant Serializer
    participant AOTInductor
    participant CompiledModel

    User->>DeepEval: load("model.pt2")
    DeepEval->>Serializer: identify extension(".pt2")
    Serializer-->>DeepEval: choose PT2 loader
    DeepEval->>AOTInductor: load_runner_from_zip(model.pt2)
    AOTInductor-->>DeepEval: runner_instance + metadata/output_keys

    User->>DeepEval: eval(coords, types, box)
    DeepEval->>DeepEval: prepare tensors (coords, atype, nlist, mapping, fparam, aparam)
    DeepEval->>AOTInductor: runner.__call__(inputs)
    AOTInductor->>CompiledModel: execute_compiled_forward
    CompiledModel-->>AOTInductor: output_tensors
    AOTInductor-->>DeepEval: output_tensors
    DeepEval->>DeepEval: extract_outputs(output_keys)
    DeepEval-->>User: energies, forces, virial

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

LAMMPS

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 16.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the pull request title clearly and specifically summarizes the main change: adding .pt2 (AOTInductor) C/C++ inference support with DPA1/DPA2/DPA3 model variants.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (6)
source/tests/pt_expt/infer/test_deep_eval.py (1)

490-791: Keep the .pt2 suite aligned with the .pte coverage.

This copy has already drifted from TestDeepEvalEner: test_dynamic_shapes() and the ASE neighbor-list cases exist only in the .pte class. Those are exactly the paths most likely to regress for the exported backend, so please either factor the shared checks into a base/mixin or add .pt2 equivalents here.

source/tests/infer/gen_fparam_aparam.py (1)

127-139: Remove unused nloc variable and prefix unused unpacked variables.

Static analysis flags these issues:

  • Line 127: v is unpacked but never used
  • Line 136: nloc = 6 is assigned but never used (F841 error - will fail ruff check)
  • Line 167: ae_pth, av_pth are unused
♻️ Proposed fix
-    e, f, v, ae, av = dp.eval(
+    e, f, _v, ae, av = dp.eval(
         coord,
         box,
         atype,
         fparam=fparam_val,
         aparam=aparam_val,
         atomic=True,
     )

-    nloc = 6
     atom_energy = ae[0, :, 0]
     force = f[0]
     atom_virial = av[0]

And for line 167:

-    e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
+    e_pth, f_pth, _, _ae_pth, _av_pth = dp_pth.eval(
source/tests/infer/gen_dpa2.py (1)

171-196: Prefix unused unpacked variables with underscore to pass ruff check.

Per coding guidelines, ruff check . must pass before CI. These unpacked variables are flagged as unused:

  • Line 171: v1
  • Line 176: v_np
  • Line 183: v_pth, ae_pth, av_pth
  • Line 190: f_pth_np, ae_pth_np, av_pth_np
♻️ Proposed fix
-    e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+    e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
     print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
     print_cpp_values("PBC reference values", ae1, f1, av1)

     # ---- 5. Run inference for NoPbc test ----
-    e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+    e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
     print(f"\n// NoPbc total energy: {e_np[0, 0]:.18e}")  # noqa: T201
     print_cpp_values("NoPbc reference values", ae_np, f_np, av_np)

     # ---- 6. Verify .pth gives same results ----
     if os.path.exists(pth_path):
         dp_pth = DeepPot(pth_path)
-        e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(
+        e_pth, f_pth, _v_pth, _ae_pth, _av_pth = dp_pth.eval(
             coord, box, atype, atomic=True
         )
         print(f"\n// .pth PBC total energy: {e_pth[0, 0]:.18e}")  # noqa: T201
         print(f"// .pth vs .pt2 energy diff: {abs(e1[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
         print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f1 - f_pth)):.2e}")  # noqa: T201

-        e_pth_np, f_pth_np, _, ae_pth_np, av_pth_np = dp_pth.eval(
+        e_pth_np, _f_pth_np, _, _ae_pth_np, _av_pth_np = dp_pth.eval(
             coord, None, atype, atomic=True
         )
source/tests/infer/gen_dpa1.py (1)

143-163: Prefix unused unpacked variables with underscore to pass ruff check.

Same issue as gen_dpa2.py - these unpacked variables are flagged as unused:

  • Line 143: v1
  • Line 148: v_np
  • Line 154: v_pth, ae_pth, av_pth
  • Line 159: f_pth_np, ae_pth_np, av_pth_np
♻️ Proposed fix
-    e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+    e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
     print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
     print_cpp_values("PBC reference values", ae1, f1, av1)

     # ---- 5. Run inference for NoPbc test ----
-    e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+    e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
     print(f"\n// NoPbc total energy: {e_np[0, 0]:.18e}")  # noqa: T201
     print_cpp_values("NoPbc reference values", ae_np, f_np, av_np)

     # ---- 6. Verify .pth gives same results ----
     dp_pth = DeepPot(pth_path)
-    e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(coord, box, atype, atomic=True)
+    e_pth, f_pth, _v_pth, _ae_pth, _av_pth = dp_pth.eval(coord, box, atype, atomic=True)
     print(f"\n// .pth PBC total energy: {e_pth[0, 0]:.18e}")  # noqa: T201
     print(f"// .pth vs .pt2 energy diff: {abs(e1[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
     print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f1 - f_pth)):.2e}")  # noqa: T201

-    e_pth_np, f_pth_np, _, ae_pth_np, av_pth_np = dp_pth.eval(
+    e_pth_np, _f_pth_np, _, _ae_pth_np, _av_pth_np = dp_pth.eval(
         coord, None, atype, atomic=True
     )
deepmd/pt_expt/infer/deep_eval.py (1)

702-710: Consider handling None exported_module for .pt2 models.

The get_model() method returns self.exported_module, which is None for .pt2 models. This could be confusing for callers.

Consider adding documentation or a check
     def get_model(self) -> torch.nn.Module:
         """Get the exported model module.
 
         Returns
         -------
         torch.nn.Module
-            The exported model module.
+            The exported model module. Returns None for .pt2 (AOTInductor) models
+            as they use a different runner interface.
         """
         return self.exported_module
deepmd/pt_expt/utils/serialization.py (1)

355-362: Note that torch._inductor.aoti_compile_and_package is an internal PyTorch API.

While this function is available in PyTorch 2.6+ (and the project requires 2.10.0), internal APIs prefixed with _ may change between versions without notice. Consider adding a version comment documenting this dependency or monitor PyTorch release notes for API changes in future updates.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/dpmodel/descriptor/dpa1.py`:
- Line 1060: Remove the now-unused assignment to nall (the line "nall =
atype_ext.shape[1]") in dpa1.py since the gather refactor made nall unused;
locate the assignment near the atype_ext usage in the descriptor code (search
for "atype_ext" or "nall" in dpa1.py) and delete that line, then run ruff check
. and ruff format . to ensure no F841/formatting errors remain.

In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 151-169: The code uses the Beta API
torch._inductor.aoti_load_package inside _load_pt2 to load .pt2 AOTInductor
packages; update _load_pt2 to (1) guard the import/use of aoti_load_package with
a clear fallback and informative error (catch
ImportError/AttributeError/RuntimeError when importing or calling
aoti_load_package and raise/log a message that the API is beta and may change),
(2) perform a runtime compatibility check (e.g., verify torch._inductor exists
and torch.__version__ is within a supported range) before calling
aoti_load_package, and (3) preserve and expose metadata-based constraints
(self.metadata, self._output_keys, self.rcut, self.type_map) and add an explicit
runtime validation step before inference that checks input tensors match
export-time size, dtype, and stride so callers get an actionable error if they
mismatch rather than a mysterious break inside _pt2_runner or exported_module.

In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 641-647: Replace the assert in DeepPotPTExpt::extract_outputs with
a runtime check that throws a descriptive exception: verify flat_outputs.size()
== output_keys.size(), and if not, throw a std::runtime_error (or similar)
including both sizes and context (e.g., "DeepPotPTExpt::extract_outputs:
output_keys size X != flat_outputs size Y") before the loop that assigns
output_map[output_keys[i]] = flat_outputs[i]; this prevents out-of-bounds access
in release builds.
- Around line 235-399: read_zip_entry currently assumes specific member names
and uncompressed (stored) entries, ignores the file_content parameter, and uses
assert for output arity which is unsafe in release; update read_zip_entry to (1)
honor a provided file_content parameter by returning it when non-empty or using
it as the source instead of always reading zip_path, (2) validate and reject
compressed entries by reading the compression method from the central
directory/local header and throw deepmd::deepmd_exception if compression != 0
(or implement decompression), (3) avoid hardcoding names by making
caller-supplied expected entry names configurable/fall back to prefixes like
"extra/" and attempt alternate names, and (4) replace the
assert(flat_outputs.size() == output_keys.size()) with a runtime check that
throws deepmd::deepmd_exception containing both sizes and context (use symbols
flat_outputs and output_keys to locate the check). Ensure ZIP64 and bounds
checks are tightened to avoid reading out-of-range indexes when parsing headers.

In `@source/api_cc/tests/test_deeppot_dpa_ptexpt.cc`:
- Around line 376-383: The loop that fills coord_vir only copies the first nvir
scalar values, leaving most of the 3*nvir coordinate components zero; update the
copy so you copy all 3 coordinates per virtual atom (e.g., for each ii set
coord_vir[ii*3 + 0..2] = coord[ii*3 + 0..2] or use std::copy_n(&coord[ii*3], 3,
&coord_vir[ii*3])) so coord_vir contains the full (x,y,z) triples before
inserting into coord; keep atype_vir as is and then insert coord_vir at the
beginning as before.

In `@source/api_cc/tests/test_deeppot_dpa2_pt.cc`:
- Around line 276-296: The test TestInferDeepPotDpa2PtNoPbc currently uses a
hard-coded nlist_data with nghost == 0 so it never exercises InputNlist.mapping
or the ghost/virtual-atom remap; replace or add a test case that builds a
neighbor list with ghosts/mapped indices (e.g. call the existing _build_nlist()
helper or construct an InputNlist with non-zero nghost and a mapping array)
instead of the local-only nlist_data/convert_nlist path, and then call
dp.compute(ener, force, virial, coord, atype, box, 0, inlist, 0) to verify the
mapped-ghost behavior is used.
- Around line 15-17: The EPSILON macro is currently too loose for float tests
(1e-1) and can hide large regressions; update the test to tighten tolerances by
either (a) reducing the float branch of EPSILON to a stricter value (e.g. 1e-4
or similar) or (b) replace the single EPSILON with per-quantity tolerances (e.g.
ENERGY_EPSILON, FORCE_EPSILON, VIRIAL_EPSILON) and use those where comparisons
occur; ensure these new macros/variables are chosen based on VALUETYPE (float vs
double) so comparisons in the test code that reference EPSILON are replaced with
the appropriate more-specific tolerance.

In `@source/api_cc/tests/test_deeppot_dpa3_ptexpt.cc`:
- Around line 387-391: The loop that fills coord_vir copies only nvir scalars
instead of nvir*3, leaving most coordinates zero; update the copy to initialize
all 3 components per virtual atom (use coord_vir and coord) by copying nvir*3
elements (or iterate per atom and copy 3 components each) so the injected
virtual-atom geometry matches the intended coordinates; reference the variables
coord_vir, coord and nvir when making this change.

In `@source/api_cc/tests/test_deeppot_ptexpt.cc`:
- Around line 358-365: The virtual-atom coordinate block only copies one scalar
per atom, leaving other components zero; change the population of coord_vir so
you copy all 3 coordinates per virtual atom (use coord_vir[3*ii + k] =
coord[3*ii + k] for k = 0..2) before inserting into coord, so coord_vir, coord,
nvir and atype_vir are consistent and the full 3D positions are preserved.

In `@source/tests/infer/gen_dpa3.py`:
- Around line 175-200: The current block only prints differences between the
.pth and .pt2 runs; replace/add prints with real parity assertions: after
creating dp_pth and calling dp_pth.eval, assert that e_pth ≈ e1, f_pth ≈ f1,
v_pth ≈ v1, ae_pth ≈ ae1 and av_pth ≈ av1 (and similarly assert e_pth_np ≈ e_np,
f_pth_np ≈ f_np, ae_pth_np ≈ ae_np, av_pth_np ≈ av_np) using a numeric tolerance
(e.g. numpy.testing.assert_allclose or assert np.allclose(..., atol=...,
rtol=...)); reference the DeepPot class and the dp_pth.eval calls and the tensor
names (e1, f1, v1, ae1, av1, e_pth, f_pth, v_pth, ae_pth, av_pth, e_np, f_np,
e_pth_np, f_pth_np, ae_pth_np, av_pth_np) so CI fails if exports diverge and to
eliminate the unused-binding warnings.
- Around line 22-43: The try/except in the torch custom-op loader is swallowing
all errors; change it to only catch expected exceptions (e.g., ImportError,
AttributeError, OSError) around the import and load logic, and when the fallback
glob search for libdeepmd_op_pt.so fails or load_library raises an error, emit a
warning (use warnings.warn or processLogger.warning) that includes the exception
details and context (referencing torch.ops.deepmd and the border_op check and
the torch.ops.load_library call), rather than silently passing; ensure the
warning is produced when the fallback lookup fails so real export/load issues
surface.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ec83f497-49cf-4865-907f-5e1894443f3d

📥 Commits

Reviewing files that changed from the base of the PR and between 24e54bf and 3d3fd09.

📒 Files selected for processing (42)
  • deepmd/backend/pt_expt.py
  • deepmd/dpmodel/descriptor/dpa1.py
  • deepmd/dpmodel/descriptor/dpa2.py
  • deepmd/dpmodel/descriptor/dpa3.py
  • deepmd/dpmodel/descriptor/repflows.py
  • deepmd/dpmodel/descriptor/repformers.py
  • deepmd/dpmodel/model/make_model.py
  • deepmd/dpmodel/utils/exclude_mask.py
  • deepmd/dpmodel/utils/nlist.py
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/utils/serialization.py
  • source/api_c/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
  • source/api_c/tests/test_deeppot_a_ptexpt.cc
  • source/api_cc/include/DeepPotPTExpt.h
  • source/api_cc/include/common.h
  • source/api_cc/src/DeepPot.cc
  • source/api_cc/src/DeepPotJAX.cc
  • source/api_cc/src/DeepPotPD.cc
  • source/api_cc/src/DeepPotPT.cc
  • source/api_cc/src/DeepPotPTExpt.cc
  • source/api_cc/src/DeepSpinPT.cc
  • source/api_cc/src/common.cc
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_nframes_ptexpt.cc
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_pt.cc
  • source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa1_pt.cc
  • source/api_cc/tests/test_deeppot_dpa2_pt.cc
  • source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa3_pt.cc
  • source/api_cc/tests/test_deeppot_dpa3_ptexpt.cc
  • source/api_cc/tests/test_deeppot_dpa_ptexpt.cc
  • source/api_cc/tests/test_deeppot_ptexpt.cc
  • source/api_cc/tests/test_utils.h
  • source/tests/infer/convert-models.sh
  • source/tests/infer/deeppot_dpa.pth
  • source/tests/infer/deeppot_sea.pth
  • source/tests/infer/fparam_aparam.pth
  • source/tests/infer/gen_dpa1.py
  • source/tests/infer/gen_dpa2.py
  • source/tests/infer/gen_dpa3.py
  • source/tests/infer/gen_fparam_aparam.py
  • source/tests/pt_expt/infer/test_deep_eval.py

Comment on lines +276 to +296
TYPED_TEST(TestInferDeepPotDpa2PtNoPbc, cpu_lmp_nlist) {
  using VALUETYPE = TypeParam;
  std::vector<VALUETYPE>& coord = this->coord;
  std::vector<int>& atype = this->atype;
  std::vector<VALUETYPE>& box = this->box;
  std::vector<VALUETYPE>& expected_f = this->expected_f;
  int& natoms = this->natoms;
  double& expected_tot_e = this->expected_tot_e;
  std::vector<VALUETYPE>& expected_tot_v = this->expected_tot_v;
  deepmd::DeepPot& dp = this->dp;
  double ener;
  std::vector<VALUETYPE> force, virial;

  std::vector<std::vector<int> > nlist_data = {
      {1, 2, 3, 4, 5}, {0, 2, 3, 4, 5}, {0, 1, 3, 4, 5},
      {0, 1, 2, 4, 5}, {0, 1, 2, 3, 5}, {0, 1, 2, 3, 4}};
  std::vector<int> ilist(natoms), numneigh(natoms);
  std::vector<int*> firstneigh(natoms);
  deepmd::InputNlist inlist(natoms, &ilist[0], &numneigh[0], &firstneigh[0]);
  convert_nlist(inlist, nlist_data);
  dp.compute(ener, force, virial, coord, atype, box, 0, inlist, 0);
⚠️ Potential issue | 🟠 Major

This nlist case never exercises the mapping/ghost path.

The hard-coded neighbor list is local-only with nghost == 0, so it does not cover InputNlist.mapping or the ghost/virtual-atom remap this PR fixes in the C++ backends. Please add at least one DPA2 .pth nlist case built through _build_nlist() or another mapped-ghost configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/tests/test_deeppot_dpa2_pt.cc` around lines 276 - 296, The test
TestInferDeepPotDpa2PtNoPbc currently uses a hard-coded nlist_data with nghost
== 0 so it never exercises InputNlist.mapping or the ghost/virtual-atom remap;
replace or add a test case that builds a neighbor list with ghosts/mapped
indices (e.g. call the existing _build_nlist() helper or construct an InputNlist
with non-zero nghost and a mapping array) instead of the local-only
nlist_data/convert_nlist path, and then call dp.compute(ener, force, virial,
coord, atype, box, 0, inlist, 0) to verify the mapped-ghost behavior is used.
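
The composed-map fix described in the PR summary (mapping[ii] = fwd_map[lmp_list.mapping[bkw_map[ii]]]) is easiest to see on toy data. The sketch below models it in Python; the map contents and atom counts are invented for illustration and do not come from the deepmd API.

```python
# Toy model of the ghost-to-local remap fixed in this PR. The names
# (fwd_map, bkw_map, lmp_mapping) follow the PR description; the data
# below is hypothetical.

def remap_ghosts(lmp_mapping, fwd_map, bkw_map):
    """For each kept (post-filter) atom ii, find the kept index of the
    local atom that owns it, going through original indices:
      bkw_map:     kept index -> original index
      lmp_mapping: original   -> original index of the owning local atom
      fwd_map:     original   -> kept index (-1 if filtered out)
    """
    return [fwd_map[lmp_mapping[bkw_map[ii]]] for ii in range(len(bkw_map))]

# 5 original atoms: 0, 1, 2 local; 3, 4 are ghosts owned by 1 and 0.
lmp_mapping = [0, 1, 2, 1, 0]
# Suppose atom 2 is filtered out (e.g. a virtual type), so the kept
# atoms are originals [0, 1, 3, 4].
bkw_map = [0, 1, 3, 4]        # kept -> original
fwd_map = [0, 1, -1, 2, 3]    # original -> kept

# Buggy version from the PR description: treats kept indices as
# original indices.
buggy = [lmp_mapping[fwd_map[ii]] for ii in range(len(bkw_map))]
fixed = remap_ghosts(lmp_mapping, fwd_map, bkw_map)
print(buggy)  # wrong once the filter shifts the numbering
print(fixed)
```

In this toy case the buggy form points kept ghost atoms at the wrong owners (it even produces an owner index that is itself a ghost), while the composed form recovers the correct kept-local owners.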

@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/infer/gen_fparam_aparam.py`:
- Around line 86-93: The export fallback is not guarded: if
pt_deserialize_to_file(pth_path, ...) raises, later code still opens
"fparam_aparam.pth" and may read stale data; modify the flow so that you only
proceed with the parity check/load when the export actually succeeded—e.g., set
a boolean (export_succeeded) or check that pth_path was created by
pt_deserialize_to_file before any subsequent open/read or parity comparison;
update references around pth_path, pt_deserialize_to_file, and the later
file-open/parity-check block so the latter is skipped when export_succeeded is
false (or the file is absent/newly written).
- Around line 132-139: The test currently calls dp.eval(...) binding e, f, v,
ae, av but only compares energy and force diffs; add assertions and diff
reporting for virial and atomic outputs: compare v to the saved reference
(v_pth) and ae/av to ae_pth/av_pth respectively, compute and print max/relative
differences like done for e and f, and fail the test on mismatch. Locate the
dp.eval call and follow the existing pattern used for energy/force diffs to add
checks for v, ae, and av so virial and per-atom values cannot drift silently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9f653401-c78d-4aa6-9776-38de7659a1a9

📥 Commits

Reviewing files that changed from the base of the PR and between 3d3fd09 and acabcfd.

📒 Files selected for processing (3)
  • source/tests/infer/convert-models.sh
  • source/tests/infer/gen_dpa1.py
  • source/tests/infer/gen_fparam_aparam.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/infer/convert-models.sh

Comment on lines +86 to +93
pth_path = os.path.join(base_dir, "fparam_aparam.pth")
print(f"Exporting to {pth_path} ...")  # noqa: T201
try:
    pt_deserialize_to_file(pth_path, copy.deepcopy(data))
except RuntimeError as e:
    # Custom ops (e.g. tabulate_fusion_se_t_tebd) may not be available
    # in all build environments; .pth generation is not critical.
    print(f"WARNING: .pth export failed ({e}), skipping.")  # noqa: T201
⚠️ Potential issue | 🟠 Major

Guard the .pth parity check on a successful export.

If pt_deserialize_to_file() fails on Line 89, Line 171 still opens fparam_aparam.pth. That turns the fallback into a crash, and if an old artifact is still on disk it can compare against stale weights instead.

Suggested fix
     pth_path = os.path.join(base_dir, "fparam_aparam.pth")
+    pth_exported = False
     print(f"Exporting to {pth_path} ...")  # noqa: T201
     try:
         pt_deserialize_to_file(pth_path, copy.deepcopy(data))
+        pth_exported = True
     except RuntimeError as e:
         # Custom ops (e.g. tabulate_fusion_se_t_tebd) may not be available
         # in all build environments; .pth generation is not critical.
         print(f"WARNING: .pth export failed ({e}), skipping.")  # noqa: T201
@@
-    dp_pth = DeepPot(pth_path)
-    e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
-        coord,
-        box,
-        atype,
-        fparam=fparam_val,
-        aparam=aparam_val,
-        atomic=True,
-    )
-    print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
-    print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}")  # noqa: T201
+    if pth_exported:
+        dp_pth = DeepPot(pth_path)
+        e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
+            coord,
+            box,
+            atype,
+            fparam=fparam_val,
+            aparam=aparam_val,
+            atomic=True,
+        )
+        print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
+        print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}")  # noqa: T201
+    else:
+        print("\n// Skipped .pth vs .pt2 verification because .pth export failed.")  # noqa: T201

Also applies to: 170-179

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_fparam_aparam.py` around lines 86 - 93, The export
fallback is not guarded: if pt_deserialize_to_file(pth_path, ...) raises, later
code still opens "fparam_aparam.pth" and may read stale data; modify the flow so
that you only proceed with the parity check/load when the export actually
succeeded—e.g., set a boolean (export_succeeded) or check that pth_path was
created by pt_deserialize_to_file before any subsequent open/read or parity
comparison; update references around pth_path, pt_deserialize_to_file, and the
later file-open/parity-check block so the latter is skipped when
export_succeeded is false (or the file is absent/newly written).

Comment on lines +132 to +139
e, f, v, ae, av = dp.eval(
    coord,
    box,
    atype,
    fparam=fparam_val,
    aparam=aparam_val,
    atomic=True,
)
⚠️ Potential issue | 🟡 Minor

Finish the parity check for virial and atomic outputs.

Line 132 binds v, and Line 172 binds ae_pth/av_pth, but the script only reports energy and force diffs. That means virial or atomic-output drift can slip through silently even though those values are part of the generated references.

Suggested fix
-        e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
+        e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(
             coord,
             box,
             atype,
             fparam=fparam_val,
             aparam=aparam_val,
             atomic=True,
         )
         print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
         print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}")  # noqa: T201
+        print(f"// .pth vs .pt2 virial max diff: {np.max(np.abs(v - v_pth)):.2e}")  # noqa: T201
+        print(f"// .pth vs .pt2 atomic energy max diff: {np.max(np.abs(ae - ae_pth)):.2e}")  # noqa: T201
+        print(f"// .pth vs .pt2 atomic virial max diff: {np.max(np.abs(av - av_pth)):.2e}")  # noqa: T201

As per the coding guidelines for **/*.py: always run ruff check . and ruff format . before committing, or CI will fail.

Also applies to: 172-181

🧰 Tools
🪛 GitHub Check: CodeQL

[notice] 136-136: Unused local variable
Variable nloc is not used.

🪛 Ruff (0.15.4)

[warning] 132-132: Unpacked variable v is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_fparam_aparam.py` around lines 132 - 139, The test
currently calls dp.eval(...) binding e, f, v, ae, av but only compares energy
and force diffs; add assertions and diff reporting for virial and atomic
outputs: compare v to the saved reference (v_pth) and ae/av to ae_pth/av_pth
respectively, compute and print max/relative differences like done for e and f,
and fail the test on mismatch. Locate the dp.eval call and follow the existing
pattern used for energy/force diffs to add checks for v, ae, and av so virial
and per-atom values cannot drift silently.

out.resize(static_cast<size_t>(nframes) * nloc * ndim);
for (int kk = 0; kk < nframes; ++kk) {
  std::copy(in.begin() + kk * nall * ndim,
            in.begin() + kk * nall * ndim + nloc * ndim,
            out.begin() + kk * nloc * ndim);

Check failure (4 occurrences)

Code scanning / CodeQL

Multiplication result converted to larger type (High)

Multiplication result may overflow 'int' before it is converted to 'difference_type'. CodeQL flags each of the kk * nall * ndim and kk * nloc * ndim iterator offsets in the std::copy calls above; the out.resize(static_cast<size_t>(nframes) * nloc * ndim) cast widens only the resize argument, not the per-frame offsets.
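
The finding is that kk * nall * ndim is evaluated in int and only afterwards widened to the iterator's difference_type, so the product can wrap first. The usual fix is to widen the first operand, e.g. static_cast<std::ptrdiff_t>(kk) * nall * ndim. The Python sketch below models 32-bit signed multiplication with ctypes to show the wraparound; the frame and atom counts are hypothetical.

```python
import ctypes

def c_int32_mul(a, b):
    """Model C's 32-bit signed int multiplication (wraps on overflow)."""
    return ctypes.c_int32(ctypes.c_int32(a).value * b).value

# Hypothetical large-system sizes: frame index 999, 800k atoms, 3 dims.
kk, nall, ndim = 999, 800_000, 3

# int * int * int, evaluated left to right in 32 bits, as CodeQL warns:
offset_int = c_int32_mul(c_int32_mul(kk, nall), ndim)
# Widened before multiplying (the static_cast<ptrdiff_t> fix):
offset_wide = kk * nall * ndim

print(offset_int)   # wrapped to a negative offset
print(offset_wide)  # the intended offset
```

Here 999 * 800_000 still fits in int32, but multiplying by 3 exceeds 2**31 - 1 and wraps negative, which is exactly the unchecked-overflow path the four alerts point at.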
@coderabbitai bot left a comment


🧹 Nitpick comments (1)
source/tests/pt_expt/infer/test_deep_eval.py (1)

517-523: Refactor to save and restore the default device dynamically to eliminate fragile coupling.

The hardcoded "cuda:9999999" couples this test to the specific value set in source/tests/pt/__init__.py. If that module's configuration changes, this restoration would become incorrect.

♻️ Proposed refactor
-        # Temporarily clear default device to avoid poisoning AOTInductor
-        # compilation (tests/pt/__init__.py sets it to "cuda:9999999").
-        torch.set_default_device(None)
-        try:
-            deserialize_to_file(cls.tmpfile.name, cls.model_data)
-        finally:
-            torch.set_default_device("cuda:9999999")
+        # Temporarily clear default device to avoid poisoning AOTInductor
+        # compilation.
+        prev_device = torch.get_default_device()
+        torch.set_default_device(None)
+        try:
+            deserialize_to_file(cls.tmpfile.name, cls.model_data)
+        finally:
+            torch.set_default_device(prev_device)

The project requires PyTorch 2.10.0, and torch.get_default_device() has been available since PyTorch 2.4, so this refactor is compatible with the project's dependencies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/infer/test_deep_eval.py` around lines 517 - 523, Replace
the hardcoded restoration of the default device with a dynamic save/restore:
before calling torch.set_default_device(None) capture the current device using
torch.get_default_device(), then call deserialize_to_file(cls.tmpfile.name,
cls.model_data), and finally restore the saved device via
torch.set_default_device(saved_device); ensure you reference the same symbols
(torch.get_default_device, torch.set_default_device, deserialize_to_file,
cls.tmpfile, cls.model_data) so the test no longer depends on the magic
"cuda:9999999" value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@source/tests/pt_expt/infer/test_deep_eval.py`:
- Around line 517-523: Replace the hardcoded restoration of the default device
with a dynamic save/restore: before calling torch.set_default_device(None)
capture the current device using torch.get_default_device(), then call
deserialize_to_file(cls.tmpfile.name, cls.model_data), and finally restore the
saved device via torch.set_default_device(saved_device); ensure you reference
the same symbols (torch.get_default_device, torch.set_default_device,
deserialize_to_file, cls.tmpfile, cls.model_data) so the test no longer depends
on the magic "cuda:9999999" value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 735f1651-98fd-4b60-9de7-4d06506018a5

📥 Commits

Reviewing files that changed from the base of the PR and between acabcfd and af4241b.

📒 Files selected for processing (1)
  • source/tests/pt_expt/infer/test_deep_eval.py

codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 78.51268% with 627 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.22%. Comparing base (b2923e8) to head (1aded45).

Files with missing lines | Patch % | Lines
source/api_cc/src/DeepPotPTExpt.cc | 69.79% | 189 Missing and 33 partials ⚠️
source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc | 83.17% | 53 Missing ⚠️
source/api_cc/tests/test_deeppot_dpa3_ptexpt.cc | 83.17% | 53 Missing ⚠️
source/api_cc/tests/test_deeppot_dpa_ptexpt.cc | 82.84% | 53 Missing ⚠️
source/api_cc/tests/test_deeppot_ptexpt.cc | 83.33% | 48 Missing ⚠️
...pi_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc | 76.76% | 46 Missing ⚠️
source/api_c/tests/test_deeppot_a_ptexpt.cc | 74.83% | 38 Missing and 1 partial ⚠️
source/api_cc/tests/test_deeppot_dpa1_pt.cc | 78.76% | 24 Missing ⚠️
source/api_cc/tests/test_deeppot_dpa2_pt.cc | 78.76% | 24 Missing ⚠️
source/api_cc/tests/test_deeppot_dpa3_pt.cc | 78.76% | 24 Missing ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5298      +/-   ##
==========================================
- Coverage   82.37%   82.22%   -0.15%     
==========================================
  Files         780      793      +13     
  Lines       78220    81098    +2878     
  Branches     3677     4003     +326     
==========================================
+ Hits        64433    66686    +2253     
- Misses      12614    13203     +589     
- Partials     1173     1209      +36     

@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
source/tests/infer/gen_fparam_aparam.py (1)

176-176: ⚠️ Potential issue | 🟡 Minor

Remove unused variable nloc.

This assignment is never used and will fail ruff check with F841 error.

Proposed fix
-    nloc = 6
     atom_energy = ae[0, :, 0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_fparam_aparam.py` at line 176, Remove the unused local
variable assignment "nloc = 6" from source/tests/infer/gen_fparam_aparam.py (the
unused symbol is "nloc"); either delete the line entirely or replace it with a
meaningful use of nloc where intended so the variable is referenced (e.g., used
in a calculation or assertion), ensuring the F841 ruff error is resolved.
🧹 Nitpick comments (6)
source/tests/infer/gen_dpa2.py (2)

206-211: Prefix unused unpacked variables with underscore to satisfy Ruff.

Variables v1 and v_np are unpacked but never used.

Suggested fix
-    e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+    e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
     ...
-    e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+    e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_dpa2.py` around lines 206 - 211, The tuple unpackings
from dp.eval include unused variables v1 and v_np; update those unpack lines to
prefix the unused names with an underscore (e.g., _v1 and _v_np) so Ruff won't
complain. Specifically modify the unpack assignments where dp.eval is called
(the variables e1, f1, v1, ae1, av1 and e_np, f_np, v_np, ae_np, av_np) to
rename the unused v1 and v_np to _v1 and _v_np.

51-52: Same bare except Exception: pass pattern as other gen_*.py files.

Narrow the exception scope and emit a warning when the fallback lookup fails. This will fail ruff check (BLE001/S110).

Suggested fix (apply to all gen_*.py files)
-except Exception:
-    pass
+except (ImportError, AttributeError, OSError, RuntimeError) as exc:
+    import warnings
+    warnings.warn(f"Failed to load deepmd custom op library: {exc}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_dpa2.py` around lines 51 - 52, The bare "except
Exception: pass" in gen_dpa2.py should be replaced with a narrowed exception
catch (e.g., KeyError, LookupError or the specific exception thrown by the
fallback lookup) and emit a warning when the fallback lookup fails; update the
except to "except <SpecificException> as e" and call warnings.warn(...) or
logger.warning(...) with context (include the exception message and that the
fallback lookup failed) so the failure is visible and BLE001/S110 is resolved.
source/tests/infer/gen_dpa3.py (1)

184-189: Prefix unused unpacked variables with underscore to satisfy Ruff.

Variables v1 and v_np are unpacked but never used. As per coding guidelines, ruff check will fail on RUF059 warnings.

Suggested fix
-    e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+    e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
     print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
     print_cpp_values("PBC reference values", ae1, f1, av1)

     # ---- 5. Run inference for NoPbc test ----
-    e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+    e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_dpa3.py` around lines 184 - 189, The unpacked outputs
v1 and v_np from dp.eval are unused and trigger Ruff RUF059; rename them to _v1
and _v_np (or simply _ ) where dp.eval is called (lines assigning e1, f1, v1,
ae1, av1 and e_np, f_np, v_np, ae_np, av_np) so the unused tuple elements are
prefixed with an underscore and the rest of the variables keep their current
names; make the change at both calls to dp.eval to satisfy the linter.
source/tests/infer/gen_fparam_aparam.py (1)

167-167: Prefix unused v with underscore.

Variable v is unpacked but never used. This will trigger Ruff RUF059.

Suggested fix
-    e, f, v, ae, av = dp.eval(
+    e, f, _v, ae, av = dp.eval(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_fparam_aparam.py` at line 167, The unpacking from
dp.eval currently binds an unused variable v which triggers RUF059; update the
tuple unpack to prefix that variable with an underscore (e.g., change "e, f, v,
ae, av = dp.eval(...)" to use "_v" instead) so linters treat it as intentionally
unused; locate the dp.eval call and rename only the middle unpacked variable to
_v (leave e, f, ae, av intact).
source/tests/infer/gen_sea.py (1)

47-48: Bare except Exception: pass will fail Ruff checks.

Same pattern as other gen_*.py files. Narrow the exception scope and emit a warning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_sea.py` around lines 47 - 48, Replace the bare "except
Exception: pass" in source/tests/infer/gen_sea.py with a narrowed exception
handler that catches the specific errors that can occur (e.g., OSError, IOError,
ValueError) and emit a warning using the warnings module; import warnings if
missing and change the block to "except (OSError, IOError, ValueError) as e:
warnings.warn(f'gen_sea generation skipped: {e}', stacklevel=2)". This targets
the specific except block in gen_sea.py and avoids a silent swallow while
satisfying Ruff checks.
source/tests/infer/gen_dpa1.py (1)

51-52: Bare except Exception: pass will fail Ruff checks.

Same pattern as other gen_*.py files. Narrow the exception scope.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/infer/gen_dpa1.py` around lines 51 - 52, Replace the bare
"except Exception: pass" in gen_dpa1.py with a narrow, explicit exception
handler for the exceptions you expect (e.g., ValueError, KeyError) or at minimum
"except Exception as e:" and handle or log e; do not swallow all
exceptions—re-raise unexpected ones—so the try/except around the problematic
block is explicit and Ruff-compliant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/infer/gen_dpa1.py`:
- Around line 192-203: The script instantiates DeepPot(pth_path) without
guarding against a missing .pth file; add a check for the existence/validity of
pth_path before calling DeepPot to avoid a crash when export failed (the same
pattern used in gen_dpa2.py/gen_dpa3.py). If the file is missing or invalid,
skip the .pth parity checks (the block that computes e_pth, f_pth, v_pth,
ae_pth, av_pth and the subsequent NoPbc eval) or log an informative error and
exit gracefully; reference the DeepPot class and the pth_path variable and the
evaluation results (e_pth, f_pth, e_pth_np, f_pth_np) when implementing the
guard.

---

Duplicate comments:
In `@source/tests/infer/gen_fparam_aparam.py`:
- Line 176: Remove the unused local variable assignment "nloc = 6" from
source/tests/infer/gen_fparam_aparam.py (the unused symbol is "nloc"); either
delete the line entirely or replace it with a meaningful use of nloc where
intended so the variable is referenced (e.g., used in a calculation or
assertion), ensuring the F841 ruff error is resolved.

---

Nitpick comments:
In `@source/tests/infer/gen_dpa1.py`:
- Around line 51-52: Replace the bare "except Exception: pass" in gen_dpa1.py
with a narrow, explicit exception handler for the exceptions you expect (e.g.,
ValueError, KeyError) or at minimum "except Exception as e:" and handle or log
e; do not swallow all exceptions—re-raise unexpected ones—so the try/except
around the problematic block is explicit and Ruff-compliant.

In `@source/tests/infer/gen_dpa2.py`:
- Around line 206-211: The tuple unpackings from dp.eval include unused
variables v1 and v_np; update those unpack lines to prefix the unused names with
an underscore (e.g., _v1 and _v_np) so Ruff won't complain. Specifically modify
the unpack assignments where dp.eval is called (the variables e1, f1, v1, ae1,
av1 and e_np, f_np, v_np, ae_np, av_np) to rename the unused v1 and v_np to _v1
and _v_np.
- Around line 51-52: The bare "except Exception: pass" in gen_dpa2.py should be
replaced with a narrowed exception catch (e.g., KeyError, LookupError or the
specific exception thrown by the fallback lookup) and emit a warning when the
fallback lookup fails; update the except to "except <SpecificException> as e"
and call warnings.warn(...) or logger.warning(...) with context (include the
exception message and that the fallback lookup failed) so the failure is visible
and BLE001/S110 is resolved.

In `@source/tests/infer/gen_dpa3.py`:
- Around line 184-189: The unpacked outputs v1 and v_np from dp.eval are unused
and trigger Ruff RUF059; rename them to _v1 and _v_np (or simply _ ) where
dp.eval is called (lines assigning e1, f1, v1, ae1, av1 and e_np, f_np, v_np,
ae_np, av_np) so the unused tuple elements are prefixed with an underscore and
the rest of the variables keep their current names; make the change at both
calls to dp.eval to satisfy the linter.

In `@source/tests/infer/gen_fparam_aparam.py`:
- Line 167: The unpacking from dp.eval currently binds an unused variable v
which triggers RUF059; update the tuple unpack to prefix that variable with an
underscore (e.g., change "e, f, v, ae, av = dp.eval(...)" to use "_v" instead)
so linters treat it as intentionally unused; locate the dp.eval call and rename
only the middle unpacked variable to _v (leave e, f, ae, av intact).

In `@source/tests/infer/gen_sea.py`:
- Around line 47-48: Replace the bare "except Exception: pass" in
source/tests/infer/gen_sea.py with a narrowed exception handler that catches the
specific errors that can occur (e.g., OSError, IOError, ValueError) and emit a
warning using the warnings module; import warnings if missing and change the
block to "except (OSError, IOError, ValueError) as e: warnings.warn(f'gen_sea
generation skipped: {e}', stacklevel=2)". This targets the specific except block
in gen_sea.py and avoids a silent swallow while satisfying Ruff checks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3d4976b0-43fa-4dd1-b2fb-b552ce7e485e

📥 Commits

Reviewing files that changed from the base of the PR and between af4241b and 87daa33.

📒 Files selected for processing (6)
  • source/install/test_cc_local.sh
  • source/tests/infer/gen_dpa1.py
  • source/tests/infer/gen_dpa2.py
  • source/tests/infer/gen_dpa3.py
  • source/tests/infer/gen_fparam_aparam.py
  • source/tests/infer/gen_sea.py

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
source/api_cc/src/DeepPotPTExpt.cc (2)

653-659: ⚠️ Potential issue | 🟠 Major

Make the output-count check survive release builds.

assert() disappears under -DNDEBUG, so a bad output_keys.json / model-output mismatch turns Line 658 into unchecked indexing. Throw a descriptive runtime exception before the loop instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 653 - 659, Replace the
fragile assert in DeepPotPTExpt::extract_outputs with an explicit size check
that throws a descriptive std::runtime_error when flat_outputs.size() !=
output_keys.size(); before the loop, validate the counts and raise an error that
includes both flat_outputs.size() and output_keys.size() and contextual text
referencing output_map/output_keys so the mismatch is visible in release builds.
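
Python's assert has the same failure mode (stripped under python -O) as C's assert() under -DNDEBUG, so the pattern transfers directly. A sketch of the explicit check, with names borrowed from the review comment but otherwise hypothetical:

```python
def extract_outputs(flat_outputs, output_keys):
    """Pair model outputs with their declared keys, failing loudly on
    mismatch.

    A bare `assert len(flat_outputs) == len(output_keys)` would vanish
    under `python -O`, just as C's assert() vanishes under -DNDEBUG;
    an explicit raise survives optimized builds. This function is an
    illustrative sketch, not the actual DeepPotPTExpt code.
    """
    if len(flat_outputs) != len(output_keys):
        raise RuntimeError(
            f"model returned {len(flat_outputs)} outputs but "
            f"output_keys.json lists {len(output_keys)} keys"
        )
    return dict(zip(output_keys, flat_outputs))

print(extract_outputs([1.0, 2.0], ["energy", "force"]))
```

The error message carries both counts, so a stale output_keys.json is diagnosable from a release-build log instead of surfacing as out-of-bounds indexing.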

549-611: ⚠️ Potential issue | 🟠 Major

Honor file_content or reject it explicitly.

DeepPotPTExpt::init() still reads metadata and constructs the loader from model only. If a caller supplies the package bytes through file_content, this path silently ignores them and loads a different source instead. Either wire the in-memory payload through both steps or fail fast when file_content is non-empty.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 549 - 611,
DeepPotPTExpt::init currently ignores the file_content buffer; either consume it
or reject it. Modify init so when file_content is non-empty you call a
zip-reader that accepts an in-memory buffer (instead of
read_zip_entry(path,...)) to extract "extra/model_def_script.json" and
"extra/output_keys.json" and then construct the
torch::inductor::AOTIModelPackageLoader from the in-memory package (or use an
overload that accepts bytes) when creating loader; if your
AOTIModelPackageLoader cannot be created from bytes, add an explicit check that
throws/logs and returns when file_content is non-empty to fail fast. Ensure
references: DeepPotPTExpt::init, file_content, read_zip_entry (replace or add
read_zip_entry_from_buffer), parse_json, and loader are updated accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 92-126: The parser currently calls get() until it finds a closing
quote (in parse_string_raw) and consumes expected delimiters/tokens elsewhere
without EOF checks; update parse_string_raw to check peek() or an is_eof()
condition before each get() and escape-char read and throw a deepmd_exception if
EOF or an unexpected character is encountered, and apply the same pattern to
every place that unconditionally calls get() for closing delimiters/tokens (the
other token-consuming sections noted in the review), i.e., validate presence of
expected characters before consuming them and throw deepmd_exception on
malformed or truncated JSON instead of reading past the buffer.
- Around line 1095-1120: Before slicing per-frame in the loop, validate that the
batched arrays have exact expected sizes: compute natoms (from atype if
appropriate) and then check coord.size() == nframes * natoms * 3; if box is
non-empty ensure box.size() == nframes * 9; compute dim_fparam and dim_aparam
(expected per-frame widths) and ensure fparam is empty or fparam.size() ==
nframes * dim_fparam and aparam is empty or aparam.size() == nframes *
dim_aparam; only then compute dap = dim_aparam and dfp = dim_fparam (or derive
them as static_cast<int>(aparam.size())/nframes after verifying exact
divisibility) and proceed into the ff loop; on any mismatch return/throw a clear
error referencing nframes, natoms, dim_fparam/dim_aparam,
coord/fparam/aparam/box to avoid silent truncation during per-frame slicing.

---

Duplicate comments:
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 653-659: Replace the fragile assert in
DeepPotPTExpt::extract_outputs with an explicit size check that throws a
descriptive std::runtime_error when flat_outputs.size() != output_keys.size();
before the loop, validate the counts and raise an error that includes both
flat_outputs.size() and output_keys.size() and contextual text referencing
output_map/output_keys so the mismatch is visible in release builds.
- Around line 549-611: DeepPotPTExpt::init currently ignores the file_content
buffer; either consume it or reject it. Modify init so when file_content is
non-empty you call a zip-reader that accepts an in-memory buffer (instead of
read_zip_entry(path,...)) to extract "extra/model_def_script.json" and
"extra/output_keys.json" and then construct the
torch::inductor::AOTIModelPackageLoader from the in-memory package (or use an
overload that accepts bytes) when creating loader; if your
AOTIModelPackageLoader cannot be created from bytes, add an explicit check that
throws/logs and returns when file_content is non-empty to fail fast. Ensure
references: DeepPotPTExpt::init, file_content, read_zip_entry (replace or add
read_zip_entry_from_buffer), parse_json, and loader are updated accordingly.
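The assert-to-exception change for the output count can be sketched as follows; `check_output_count` is a hypothetical helper, not the actual `extract_outputs` signature:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: an explicit size check that survives release builds,
// unlike assert(), and names both counts in the error message.
void check_output_count(size_t n_outputs,
                        const std::vector<std::string>& output_keys) {
  if (n_outputs != output_keys.size()) {
    std::ostringstream msg;
    msg << "DeepPotPTExpt::extract_outputs: model returned " << n_outputs
        << " tensors but output_keys lists " << output_keys.size()
        << "; the .pt2 package metadata does not match the compiled graph";
    throw std::runtime_error(msg.str());
  }
}
```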

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 58df0b9f-a796-41ac-8a02-6a7bf826814b

📥 Commits

Reviewing files that changed from the base of the PR and between 87daa33 and a3ed6d6.

📒 Files selected for processing (1)
  • source/api_cc/src/DeepPotPTExpt.cc

Comment on lines +1095 to +1120
int natoms = atype.size();
int dap = aparam.empty() ? 0 : static_cast<int>(aparam.size()) / nframes;
int dfp = fparam.empty() ? 0 : static_cast<int>(fparam.size()) / nframes;
ener.clear();
force.clear();
virial.clear();
if (atomic) {
  atom_energy.clear();
  atom_virial.clear();
}
for (int ff = 0; ff < nframes; ++ff) {
  std::vector<VALUETYPE> frame_coord(coord.begin() + ff * natoms * 3,
                                     coord.begin() + (ff + 1) * natoms * 3);
  std::vector<VALUETYPE> frame_box;
  if (!box.empty()) {
    frame_box.assign(box.begin() + ff * 9, box.begin() + (ff + 1) * 9);
  }
  std::vector<VALUETYPE> frame_fparam;
  if (!fparam.empty()) {
    frame_fparam.assign(fparam.begin() + ff * dfp,
                        fparam.begin() + (ff + 1) * dfp);
  }
  std::vector<VALUETYPE> frame_aparam;
  if (!aparam.empty()) {
    frame_aparam.assign(aparam.begin() + ff * dap,
                        aparam.begin() + (ff + 1) * dap);

⚠️ Potential issue | 🟠 Major

Validate batched input sizes before slicing per frame.

natoms, dfp, and dap are derived with integer division in both loops. If atype, box, fparam, or aparam does not match the expected per-frame layout exactly, the tail is silently truncated and each frame is sliced with the wrong shape. Please reject non-exact multiples up front and validate them against nframes, dim_fparam, and dim_aparam before entering the loop.

Also applies to: 1230-1257
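The up-front validation could look roughly like this; `validate_batched_sizes` is a hypothetical free function that takes sizes directly rather than the real `coord`/`fparam`/`aparam` vectors:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: reject non-exact multiples before any per-frame
// slicing, so a layout mismatch fails loudly instead of truncating silently.
void validate_batched_sizes(int nframes, int natoms,
                            size_t coord_size, size_t box_size,
                            size_t fparam_size, size_t aparam_size,
                            int dim_fparam, int dim_aparam) {
  auto fail = [&](const std::string& what) {
    throw std::invalid_argument(
        "batched input mismatch (" + what + ") for nframes=" +
        std::to_string(nframes) + ", natoms=" + std::to_string(natoms));
  };
  if (coord_size != static_cast<size_t>(nframes) * natoms * 3) fail("coord");
  if (box_size != 0 && box_size != static_cast<size_t>(nframes) * 9) {
    fail("box");
  }
  if (fparam_size != 0 &&
      fparam_size != static_cast<size_t>(nframes) * dim_fparam) {
    fail("fparam");
  }
  if (aparam_size != 0 &&
      aparam_size != static_cast<size_t>(nframes) * dim_aparam) {
    fail("aparam");
  }
}
```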

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/api_cc/src/DeepPotPTExpt.cc` around lines 1095 - 1120: before slicing
per-frame in the loop, validate that the batched arrays have exact expected
sizes: compute natoms (from atype if appropriate) and then check coord.size() ==
nframes * natoms * 3; if box is non-empty ensure box.size() == nframes * 9;
compute dim_fparam and dim_aparam (expected per-frame widths) and ensure fparam
is empty or fparam.size() == nframes * dim_fparam and aparam is empty or
aparam.size() == nframes * dim_aparam; only then compute dap = dim_aparam and
dfp = dim_fparam (or derive them as static_cast<int>(aparam.size())/nframes
after verifying exact divisibility) and proceed into the ff loop; on any
mismatch return/throw a clear error referencing nframes, natoms,
dim_fparam/dim_aparam, coord/fparam/aparam/box to avoid silent truncation during
per-frame slicing.

deepcopy,
)

import deepmd.pt_expt.utils.env as _env

Check notice

Code scanning / CodeQL

Module is imported with 'import' and 'import from' Note

Module 'deepmd.pt_expt.utils.env' is imported with both 'import' and 'import from'.
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 9, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 9, 2026
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 10, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 10, 2026
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 10, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 10, 2026
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 10, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 10, 2026
@njzjz njzjz linked an issue Mar 15, 2026 that may be closed by this pull request
Han Wang added 2 commits March 21, 2026 21:28
- add support for serialization to .pt2
- deep eval with .pt2
- C/C++ inference api with .pt2
- add ut for python inference
- add ut for c++/c inference: se_e2_a, fparam/aparam, multi-frames, dpa1
Replace flat (nf*nall,) indexing in dpa1.py and exclude_mask.py with
xp_take_along_axis and xp_take_first_n. The flat indexing creates
Ne(nall, nloc) assertions during torch.export tracing that fail when
nall == nloc (NoPbc). The new gather-based approach avoids these
shape constraints.

Unify PT/pt_expt test models so .pth and .pt2 are generated from the
same dpmodel weights (type_one_side=True) via gen scripts. Remove
deeppot_sea.pth, deeppot_dpa.pth, and fparam_aparam.pth from git
tracking — they are now regenerated in CI via convert-models.sh.

Changes:
- dpa1.py, exclude_mask.py: xp_take_along_axis replaces flat indexing
- gen_dpa1.py: generates deeppot_dpa1.pth + .pt2 from dpmodel
- gen_fparam_aparam.py: generates fparam_aparam.pth + .pt2 from dpmodel
- convert-models.sh: regenerate .pth from yaml and run gen scripts in CI
- test_deeppot_dpa_ptexpt.cc: add NoPbc test fixture
- test_deeppot_dpa1_pt.cc: new DPA1 .pth test (PBC + NoPbc)
- test_deeppot_a_fparam_aparam_pt.cc: update reference values
Han Wang added 14 commits March 21, 2026 21:28
…ckends

Fix a latent bug in all 5 C++ backends (DeepPotPT, DeepPotPTExpt,
DeepPotJAX, DeepSpinPT, DeepPotPD) where the ghost-to-local mapping
was incorrectly indexed through fwd_map when virtual atoms are present.
The old code `mapping[ii] = lmp_list.mapping[fwd_map[ii]]` treats the
post-filter index as an original index, causing undefined behavior when
fwd_map contains -1 entries for filtered atoms. The fix uses bkw_map
to correctly translate: new_idx -> orig_idx -> mapped_orig -> new_mapped.

Also fix a use-after-free in DeepPotPTExpt.cc where torch::from_blob
referenced a local vector's data after the vector went out of scope.

Additionally fix dpmodel code (dpa2.py, repformers.py, make_model.py,
nlist.py) to use xp_take_first_n and xp_expand_dims instead of slice
indexing that creates shape constraints incompatible with torch.export.

Add DPA2 C++ test files for both .pth and .pt2 backends with PBC,
NoPbc, and type_sel test cases. Reduce DPA2 test model size for
machine-precision agreement between jit and export paths.
Fix torch.export-incompatible slicing in dpmodel DPA3/repflows code
([:, :nloc] → xp_take_first_n, reshape with -1 → expand_dims), add
gen_dpa3.py model generation script, and create C++ test files for
both .pth and .pt2 backends with PBC and NoPbc fixtures.
Restore deeppot_sea.pth and deeppot_dpa.pth to git tracking — CI
cannot regenerate them because tabulate_fusion_se_t_tebd custom op
is not available in the Python-only CI environment. Remove the
dp convert-backend .pth lines from convert-models.sh.

Add try/except around .pth export in gen_dpa1.py and
gen_fparam_aparam.py so the scripts don't fail when custom ops
are unavailable — only .pt2 generation is required.
make_fx tracing through torch.autograd.grad fails on CUDA with
"opt_ready_stream && opt_parent_stream INTERNAL ASSERT FAILED".
Fix by running make_fx on CPU (the graph is device-agnostic), then
moving sample inputs to the target device before torch.export.export
and AOTInductor compilation so the compiled package targets the
correct device (GPU when available).
make_fx with _allow_non_fake_inputs=True on CUDA triggers autograd
stream assertion (opt_ready_stream && opt_parent_stream). Fix by:
1. Tracing via make_fx on CPU (decomposes autograd.grad into aten ops)
2. Extracting output keys from CPU-traced module
3. Moving traced module + inputs to target device (CUDA if available)
4. Running torch.export.export on target device (the traced graph has
   no autograd.grad calls, so no CUDA stream issue)
This ensures AOTInductor compiles for the correct device.
The -fsanitize=leak flag produces sanitized .so files that fail to
dlopen in Python processes lacking the LSAN runtime.  Preload liblsan.so
via LD_PRELOAD when CXXFLAGS contains the leak sanitizer.

Also use os.path.realpath in gen scripts for robust path resolution and
print diagnostic on library loading failure instead of silently passing.
Replace manual traced-module-to-device approach with the official
torch.export.passes.move_to_device_pass API. This properly rewrites
device='cpu' constants baked into the FX graph by make_fx, avoiding
FakeTensor device propagation errors on CUDA (pytorch/pytorch#121761).
- C++ JSON/ZIP parser: add EOF checks, bounds validation, stod
  exception handling, int64_t loop var to prevent unsigned underflow
- C++ from_blob: add .clone() to all calls to prevent use-after-free
- C++ translate_error: catch std::exception instead of std::runtime_error
- Python deep_eval/serialization: validate ZIP entries before reading
- gen_*.py: move custom op loading after deepmd imports to avoid double
  registration; add inductor compiler fallbacks for CI; remove unused
  subprocess imports
LSAN runtime inherited by g++ subprocesses causes false leak reports
and non-zero exit codes, making torch._inductor think no C++ compiler
works. Clear LD_PRELOAD after custom ops are loaded since it's no
longer needed.
wanghan-iapcm pushed a commit to wanghan-iapcm/deepmd-kit that referenced this pull request Mar 21, 2026
- Guard EOCD scan against short/corrupt .pt2 files (P1)
- Parse ZIP64 fields using full 64-bit widths (P2)
- Match ZIP entry names by suffix for archives with dir prefixes
- Reject unsupported in-memory file_content in DeepPotPTExpt::init
- Replace debug-only assert with runtime exception for output count
- Fix virtual atom coordinate copy: nvir -> nvir*3 in 3 test files
- Tighten DPA2 float EPSILON from 1e-1 to 1e-4, add TODO for ghost
- Narrow exception handling in gen_dpa3.py custom op loading
- Add parity assertions (not just prints) in gen_dpa3.py/.py exports
- Guard .pth parity check on successful export in gen_fparam_aparam.py
- Extend parity check to virial and atomic outputs
- Update deep_eval.py comment on private torch API
Han Wang added 3 commits March 21, 2026 22:51
Add has_default_fparam support to DeepPotPTExpt:
- Embed has_default_fparam in model_def_script.json metadata during
  freeze (_collect_metadata)
- Read it in C++ init from metadata, with fallback to false for older
  .pt2 files
- Add C++ tests for both has_default_fparam==false and ==true
- Generate fparam_aparam_default.pt2 test model with default_fparam set
- Guard EOCD scan against short/corrupt .pt2 files (P1)
- Parse ZIP64 fields using full 64-bit widths (P2)
- Match ZIP entry names by suffix for archives with dir prefixes
- Reject unsupported in-memory file_content in DeepPotPTExpt::init
- Replace debug-only assert with runtime exception for output count
- Fix virtual atom coordinate copy: nvir -> nvir*3 in 3 test files
- Tighten DPA2 float EPSILON from 1e-1 to 1e-4, add TODO for ghost
- Narrow exception handling in gen_dpa3.py custom op loading
- Add parity assertions (not just prints) in gen_dpa3.py/.py exports
- Guard .pth parity check on successful export in gen_fparam_aparam.py
- Extend parity check to virial and atomic outputs
- Update deep_eval.py comment on private torch API
The gen_*.py scripts import torch, which has internal pybind/JIT
allocations that LSAN reports as leaks.  These are not our leaks.
Previously, LD_PRELOAD=liblsan.so was set for the gen scripts to
handle dlopen of the sanitizer-built custom op .so, but this causes
LSAN to track all Python/torch allocations and fail on exit.

Fix: run gen scripts without LD_PRELOAD.  The gen scripts don't
need LSAN — only the C++ test binary (ctest) does, and ctest
already inherits LSAN via the -fsanitize=leak compile flag.

Also revert the DPA2 cpu_lmp_nlist test (needs further investigation).
Han Wang added 2 commits March 21, 2026 23:04
- Remove unused variable nloc in gen_fparam_aparam.py
- Cast int multiplications to size_t before use as iterator offsets
  in DeepPotPTExpt multi-frame compute to prevent overflow
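The size_t cast mentioned above guards products like `ff * natoms * 3` from int overflow before they are used as iterator offsets; a minimal sketch (`frame_offset` is illustrative, not the actual code):

```cpp
#include <cstddef>

// Hypothetical helper: widen to size_t *before* multiplying, so large
// frame-index * atom-count products do not overflow 32-bit int arithmetic.
std::size_t frame_offset(int ff, int natoms) {
  return static_cast<std::size_t>(ff) * static_cast<std::size_t>(natoms) * 3;
}
```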
The gen scripts need LD_PRELOAD=liblsan.so to dlopen the custom op
.so built with -fsanitize=leak.  But LSAN leak detection must be
disabled (detect_leaks=0) to avoid false reports from torch/paddle
internal allocations.
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 21, 2026
@github-actions github-actions bot removed the Test CUDA Trigger test CUDA workflow label Mar 21, 2026

Projects

None yet

Development

Successfully merging this pull request may close these issues.

C++ Inference Support (PyTorch Exportable + AOTI)

1 participant