@Schaeff Schaeff commented Oct 28, 2025

Released under this tag

Golovanov399 and others added 30 commits September 1, 2025 10:32
)

Resolves INT-4950.

Some CUDA 13.0 optimizer choices made the kernel
`fri_reduced_opening_tracegen` use 80 registers (previously 58), so the
total number of registers per block exceeded the limit set by the
device.

[Benchmark
difference](https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/17476936661)
on CUDA 12.9 is negligible. Benchmarks to determine the savings from
switching to CUDA 13.0 will be run later.
…g#2121)

- pass `pc`, `instret` and
`instret_end`/`max_execution_cost`/`segment_check_insns` by value in
execution handlers to get them to be passed in registers
- add `likely`, `unlikely` hints for suspension/termination in `tco`
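
A minimal sketch of the two ideas in this change, under stated assumptions: the `#[cold]` helper is a common stable-Rust way to emulate `likely`/`unlikely` and is not necessarily how the hints in this PR are implemented, and `run` is a hypothetical loop, not an OpenVM handler. It takes `pc`, `instret` and `instret_end` by value so the compiler is free to keep them in registers.

```rust
// Hypothetical sketch; not the OpenVM implementation.

/// Calling a `#[cold]` function marks the branch that reaches it as unlikely.
#[cold]
#[inline(never)]
fn cold_path() {}

/// Hint that `b` is usually false.
#[inline(always)]
pub fn unlikely(b: bool) -> bool {
    if b {
        cold_path();
    }
    b
}

/// Hint that `b` is usually true.
#[inline(always)]
pub fn likely(b: bool) -> bool {
    if !b {
        cold_path();
    }
    b
}

/// `pc`, `instret` and `instret_end` are passed by value (no `&mut`),
/// so they do not have to live in memory between iterations.
pub fn run(mut pc: u32, mut instret: u64, instret_end: u64) -> (u32, u64) {
    // Suspension is the rare case, so the exit check is marked unlikely.
    while !unlikely(instret >= instret_end) {
        pc = pc.wrapping_add(4);
        instret += 1;
    }
    (pc, instret)
}
```

Calling `run(0, 0, 3)` steps three times and hands the updated `pc` and `instret` back by value.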

[benchmark
comparison](https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/17513217695#summary-49747838308)

Towards INT-4921
The git cache was saving the stark-backend build made with CUDA 12.9,
which doesn't work on the new runner images with CUDA 13.0.

closes INT-4948
)

- update `create_tco_handler` macro to `create_handler`, which now
automatically sets `exit_code` based on whether the execute impl returns
`Result::Err`. It acts as a simple wrapper for execute impls that don't
return a `Result`
- only do exit checks for executors that can exit in tco mode, i.e. for
execute impls that return `Result`
- feature gate all non-tco functions with `#[cfg(not(feature = "tco"))]`
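
The wrapper behaviour described above can be sketched roughly as follows. This is only an illustration: `SketchState`, `ExecuteOutcome`, `run_handler` and the exit code value `1` are hypothetical names and choices, and the real `create_handler` is a macro whose expansion is not shown in this PR description.

```rust
// Hypothetical sketch; the real `create_handler` macro may differ.

/// Stand-in for the VM state touched by a handler.
pub struct SketchState {
    pub pc: u32,
    pub exit_code: Option<u32>,
}

/// Lets one wrapper accept execute impls that return `()` or a `Result`.
pub trait ExecuteOutcome {
    fn failed(&self) -> bool;
}

/// Execute impls that return nothing can never request an exit.
impl ExecuteOutcome for () {
    fn failed(&self) -> bool {
        false
    }
}

/// Execute impls that return a `Result` exit on `Err`.
impl<E> ExecuteOutcome for Result<(), E> {
    fn failed(&self) -> bool {
        self.is_err()
    }
}

/// Runs the execute impl and sets `exit_code` only if it returned `Err`.
pub fn run_handler<R: ExecuteOutcome>(
    state: &mut SketchState,
    execute: impl FnOnce(&mut SketchState) -> R,
) {
    if execute(state).failed() {
        // The value 1 is an arbitrary placeholder for this sketch.
        state.exit_code = Some(1);
    }
}
```

For the `()` impl, `failed` is a constant `false`, so the compiler can remove the exit check entirely, which matches the intent of only checking executors that can exit.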

[benchmark
comparison](https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/17590682650#summary-49971529404)

Towards INT-4921
…T refactor (openvm-org#2127)

Updates `openvm-cuda-backend` with:
- Performance update for better GPU memory usage:
openvm-org/stark-backend#114
- Refactor to avoid some sporadic issues with NTT params initialization:
openvm-org/stark-backend#123

---------

Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
…m-org#2137)

- add `ProgramAir` height as a constant trace height in metered
execution

This doesn't seem to affect the
[benchmark](https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/17747443362),
which is good.
…g#2108)

Co-authored-by: stephenh-axiom-xyz <stephenh@intrinsictech.xyz>
Replace the optimistic execution/segmentation with a checkpointing
approach: checkpoint the last `trace_height`/`instret` values that are
below the thresholds and use those values for the segments. This should
make segmentation more predictable for downstream usage, since the
segments should satisfy the thresholds. The only caveat is when
segmentation happens with no checkpoint to fall back to, i.e. when we
overshoot the threshold before the first segmentation check.

This requires storing some extra state and results in a higher segment
count than before for the same thresholds. It also makes execution
slightly slower, since we now do some extra work.
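
The checkpointing idea can be illustrated with a small sketch. It rests on simplifying assumptions that are not taken from the PR: a single height threshold, and an input that lists, for each segmentation check, the `instret` at the check and the trace height added since the previous check. The function name `segment_ends` is hypothetical.

```rust
// Hypothetical sketch; not the OpenVM segmentation code.

/// `checks` holds `(instret_at_check, height_added_since_previous_check)`.
/// Returns the `instret` values at which segments end. The final segment,
/// which ends at program termination, is not listed.
pub fn segment_ends(checks: &[(u64, u64)], max_height: u64) -> Vec<u64> {
    let mut ends = Vec::new();
    let mut height = 0u64; // trace height of the current segment
    let mut checkpoint: Option<u64> = None; // last check still under the threshold

    for &(instret, delta) in checks {
        height += delta;
        if height > max_height {
            if let Some(cp) = checkpoint.take() {
                // Fall back to the checkpoint: the segment ends there, and the
                // work done since then belongs to the next segment.
                ends.push(cp);
                height = delta;
            }
            if height > max_height {
                // No checkpoint to fall back to: this segment overshoots.
                ends.push(instret);
                height = 0;
                continue;
            }
        }
        checkpoint = Some(instret);
    }
    ends
}
```

With checks at `instret` 100, 200 and 300 that each add 40 rows and a threshold of 100, the third check would push the height to 120, so the segment is cut at the checkpoint taken at 200. The second branch is the caveat mentioned above: a check that already overshoots with no earlier checkpoint produces a segment above the threshold.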

[benchmark
run](https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/17750607194)
with 0.7B max cells
[benchmark
run](https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/17771662984)
with 1.2B max cells
- log total cells, total interactions and max trace height for each
segment
…#2140)

- E2 execution supports suspension every segment.
[Reth-benchmark](https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/17782856127)
shows no performance difference.
- `VmExecState`/`VmState` supports clone. A benchmark shows that cloning a
VM state that only uses address space 2 takes 0.3~0.6 ms.

closes INT-5066
closes INT-5067
### What
- updates install instructions link

| Before | After |
|--------|--------|
| <img width="1293" height="775" alt="image"
src="https://github.com/user-attachments/assets/f30738c1-94a9-49b1-b0ce-1afc19739970"
/> | <img width="1293" height="775" alt="image"
src="https://github.com/user-attachments/assets/308f5473-e931-4745-a436-d53c019a4223"
/> |
…penvm-org#2149)

Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
The `two_modular_limbs_list` constant declared by `moduli_init!` and
used later in the complex setup macro was not guaranteed to have a large
enough alignment. This caused an issue in some use cases.

This change aligns the public constant `two_modular_limbs_list` to the
max block size of a modulus among all moduli declared to ensure no
memory alignment issues arise.
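
One way to force the alignment of a constant in Rust is a `#[repr(align(N))]` wrapper, sketched below. The wrapper name `Aligned`, the value 64 and the 96-byte payload are assumptions for illustration only; the code that `moduli_init!` actually generates is not shown in this PR description.

```rust
// Hypothetical sketch; the code generated by `moduli_init!` may differ.

/// Wrapper whose only job is to raise the alignment of its contents.
/// 64 stands in for "max block size among all declared moduli".
#[repr(C, align(64))]
pub struct Aligned<T>(pub T);

/// Stand-in for the limb data of two moduli; contents are placeholders.
pub static TWO_MODULAR_LIMBS_LIST: Aligned<[u8; 96]> = Aligned([0u8; 96]);

/// Returns true if `ptr` is a multiple of `align`.
pub fn is_aligned(ptr: *const u8, align: usize) -> bool {
    (ptr as usize) % align == 0
}
```

Because the payload is the first field of a `repr(C)` struct, its address equals the address of the wrapper, so the byte array itself starts on a 64-byte boundary.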

Fixes INT-5135

---------

Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
jonathanpwang and others added 23 commits September 28, 2025 23:58
For guest programs with MSRV 1.87+, it is convenient for `cargo openvm
build` to use a newer Rust toolchain. We chose `2025-08-02` since that
is when the latest stable release, Rust 1.90.0, branched from master:
https://releases.rs/docs/1.90.0/

Note that users can always override the toolchain version with env var
`OPENVM_RUST_TOOLCHAIN`.

workflow run:
https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/18085897389/job/51456939479
`InterpretedInstance::execute_from_state` for pure execution was using
`num_insns` instead of `instret_end` in the `ExecutionCtx` constructor.
This was not caught because we currently only ever execute from
`instret=0`.
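
A sketch of why the bug was invisible when starting from `instret=0`, assuming that `instret_end` means the starting `instret` plus `num_insns` (the PR text does not spell this out). `SketchCtx` and both helper functions are hypothetical stand-ins, not the real `ExecutionCtx` API.

```rust
// Hypothetical sketch; not the real `ExecutionCtx`.

/// Stand-in context that stores the instruction count at which to stop.
pub struct SketchCtx {
    pub instret_end: u64,
}

/// Buggy variant: treats the instruction budget as an absolute end point.
pub fn ctx_buggy(_instret: u64, num_insns: u64) -> SketchCtx {
    SketchCtx { instret_end: num_insns }
}

/// Fixed variant: the end point is relative to where execution resumes.
pub fn ctx_fixed(instret: u64, num_insns: u64) -> SketchCtx {
    SketchCtx { instret_end: instret.saturating_add(num_insns) }
}
```

Both variants agree when `instret` is 0 and only diverge when execution resumes from a later state.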
Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
Currently the invalid memory access is never used, but it's still better
to address it so `compute-analyzer` does not complain.
Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
Co-authored-by: Ayush Shukla <ayush@axiom.xyz>
It is not required after a small refactor.
Co-authored-by: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com>
Renamed `openvm-examples` since it has more than one example.
In an edge case where the program panics right after we start
`build_async` on the memory merkle subtrees, the drop order could be
such that the `initial_memory` buffers on the default stream are dropped
first, while the `build_async` kernels are still running and using those
buffers. This leads to a dead loop. I fixed it by forcing the subtrees
to be dropped first (which should sync their special streams) before
dropping `initial_memory`.
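
In Rust, struct fields are dropped in declaration order, which is one way to guarantee such an ordering. The sketch below only demonstrates that language rule with hypothetical types (`Tracked`, `MerkleState`); it is not the actual fix, which may use an explicit `Drop` impl instead.

```rust
// Hypothetical sketch; demonstrates drop order only, no GPU code.
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name in a shared log when dropped.
pub struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Fields drop in declaration order, so `subtrees` goes first.
#[allow(dead_code)]
pub struct MerkleState {
    subtrees: Tracked,       // would sync its streams on drop
    initial_memory: Tracked, // buffers still in use until subtrees are done
}

/// Builds a state, drops it, and returns the observed drop order.
pub fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let state = MerkleState {
        subtrees: Tracked { name: "subtrees", log: log.clone() },
        initial_memory: Tracked { name: "initial_memory", log: log.clone() },
    };
    drop(state);
    let order = log.borrow().clone();
    order
}
```

The same order applies during unwinding after a panic, which is the situation described above.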

compare:
https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/18733111153
When building autoprecompile chips on GPU in `powdr-labs/powdr`, we need
access to `histogram.cuh` in `openvm/circuit-primitives` to modify
counts of periphery chips. This requires exporting the include dir of
`openvm/circuit-primitives` to downstream crates (`powdr-openvm`). We
believe this (and potentially the same for other crates) to be a general
client extension use case.

The way we export `openvm/circuit-primitives` to
`powdr-labs/powdr-openvm` is EXACTLY the same as how
`stark-backend/cuda-common` is exported to `openvm`:
-
https://github.com/openvm-org/stark-backend/blob/main/crates/cuda-common/build.rs#L18-L19
-
https://github.com/openvm-org/stark-backend/blob/main/crates/cuda-common/Cargo.toml#L7
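
As a rough illustration of this kind of export, Cargo lets a crate with a `links` key in its `Cargo.toml` print `cargo:KEY=VALUE` from `build.rs`, and dependents then see it as the `DEP_<LINKS>_<KEY>` environment variable in their own build scripts. The sketch below assumes that mechanism is the one used here; the `cuda/include` path, the `links` name and all function names are hypothetical.

```rust
// Hypothetical sketch of build-script helpers; paths and names are assumptions.
use std::path::Path;

/// Line the exporting crate's build script would print to stdout.
/// Only takes effect if that crate sets a `links` key in its Cargo.toml.
pub fn include_metadata_line(crate_dir: &Path) -> String {
    // `cuda/include` is a placeholder for wherever headers such as
    // `histogram.cuh` live.
    format!(
        "cargo:include={}",
        crate_dir.join("cuda").join("include").display()
    )
}

/// Name of the variable a dependent build script reads. Cargo uppercases
/// the `links` value; mapping `-` to `_` is believed to match Cargo.
pub fn dependent_env_var(links: &str) -> String {
    format!("DEP_{}_INCLUDE", links.to_uppercase().replace('-', "_"))
}

/// What a downstream build script would do: look up the exported directory.
pub fn exported_include_dir(links: &str) -> Option<String> {
    std::env::var(dependent_env_var(links)).ok()
}
```

A downstream `build.rs` would pass the returned directory to its CUDA compiler invocation as an include path.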

---------

Co-authored-by: Schaeff <thibaut@powdrlabs.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- [x] Update workspace version
- [x] Update changelog
merge openvm main as of 11/16 (7e94889) over tag `v1.4.1-powdr`