@ironcev ironcev commented Aug 26, 2025

Description

This PR rewrites the compilation of storage accesses made via the storage keyword, e.g., storage.field. The new compilation is optimized for gas usage and bytecode size. Some performance comparisons are shown below.

When accessing storage, the old compilation repeatedly constructed the same StorageKey as a local variable at each storage access site. For example, for this contract:

storage {
    scalar: u64 = 0,
}

impl Contract {
    #[storage(read)]
    fn poke_storage() {
        poke(storage.scalar);
    }
}

the resulting IR was:

pub entry fn poke_storage() -> (), !4 {
    local { b256, u64, b256 } __anon_0
    local b256 __const = const b256 0x9e0e87bef2e44d9771eb12cfc81e34e4dd6caad55385354757a8898a2c808b61
    local b256 __const0 = const b256 0x9e0e87bef2e44d9771eb12cfc81e34e4dd6caad55385354757a8898a2c808b61
    local { b256, u64, b256 } __tmp_arg

    entry():

    // BEGIN OF `storage.scalar` ACCESS

    v0 = get_local ptr b256, __const
    v1 = get_local ptr b256, __const0
    v6 = get_local ptr { b256, u64, b256 }, __anon_0, !5
    v7 = const u64 0
    v8 = get_elem_ptr v6, ptr b256, v7
    mem_copy_val v8, v0
    v9 = const u64 1
    v10 = get_elem_ptr v6, ptr u64, v9
    v11 = const u64 0
    store v11 to v10, !5
    v12 = const u64 2
    v13 = get_elem_ptr v6, ptr b256, v12
    mem_copy_val v13, v1

    // END OF `storage.scalar` ACCESS

    v14 = get_local ptr { b256, u64, b256 }, __tmp_arg
    mem_copy_val v14, v6
    v15 = call poke_2(v14)
    v16 = const unit ()
    ret () v16
}

The overall cost of each access was:

  • a local on the stack for each access, __anon_0 in this case.
  • two mem_copy_vals of the storage slot and field id into that local.
  • storing the slot offset into that local.

The bytecode size cost per access site was significant but constant: six ASM instructions.

load $r0 data_NonConfigurable_0  ; get local constant
load $r1 data_NonConfigurable_0  ; get local constant
mcpi $$locbase $r0 i32           ; copy memory
sw   $$locbase $zero i4          ; store word
addi $r0 $$locbase i40           ; get offset to aggregate element
mcpi $r0 $r1 i32                 ; copy memory

The gas cost was especially problematic because a single access site could appear inside a loop, where the six instructions were executed again on every iteration.

The new implementation stores whole StorageKeys in the data section, similar to global constants and configurables. Each access site is then compiled to a single pointer access. The new IR for the above example becomes:

contract {
    storage_key storage.scalar = 0x9e0e87bef2e44d9771eb12cfc81e34e4dd6caad55385354757a8898a2c808b61

    pub entry fn poke_storage() -> (), !4 {
        local { b256, u64, b256 } __tmp_arg

        entry():

        // BEGIN OF `storage.scalar` ACCESS

        v0 = get_storage_key __ptr { b256, u64, b256 }, storage.scalar, !5

        // END OF `storage.scalar` ACCESS

        v1 = get_local __ptr { b256, u64, b256 }, __tmp_arg
        mem_copy_val v1, v0
        v2 = call poke_2(v1)
        v9 = const unit ()
        ret () v9
    }
}

And in ASM:

addr $r0 data_NonConfigurable_0     ; get storage.scalar's address in data section
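
To make the data-section entry concrete, here is a minimal sketch that packs a StorageKey-shaped value into bytes. It assumes that the { b256, u64, b256 } type from the IR holds the slot, the offset, and the field id in that order, and that the offset is stored as a big-endian word. The names STORAGE_KEY_LAYOUT and pack_storage_key are illustrative and do not come from the compiler.

```python
import struct

# Assumed layout, mirroring the IR type { b256, u64, b256 }:
# 32-byte slot, 8-byte big-endian offset, 32-byte field id.
STORAGE_KEY_LAYOUT = struct.Struct(">32sQ32s")


def pack_storage_key(slot: bytes, offset: int, field_id: bytes) -> bytes:
    """Pack a StorageKey-shaped value into a single data-section entry."""
    if len(slot) != 32 or len(field_id) != 32:
        raise ValueError("slot and field_id must be exactly 32 bytes")
    return STORAGE_KEY_LAYOUT.pack(slot, offset, field_id)


if __name__ == "__main__":
    # The slot of storage.scalar from the example; here it equals the field id.
    key = bytes.fromhex(
        "9e0e87bef2e44d9771eb12cfc81e34e4dd6caad55385354757a8898a2c808b61"
    )
    entry = pack_storage_key(key, 0, key)
    print(len(entry))
```

Under these assumptions the offset sits at byte 32 and the field id at byte 40, which appears to match the sw ... i4 and addi ... i40 instructions in the old ASM if the sw immediate is counted in 8-byte words.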

Performance Comparisons

The gas savings will, of course, depend on how often each storage access is actually executed at run time.
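
As a rough illustration of that dependence, the sketch below counts how many instructions are no longer executed for one access site. The per-access counts (six before, one after) are taken from the ASM listings above; the function name is illustrative. It counts instructions only and does not model gas, since individual opcodes have different gas costs.

```python
# Instruction counts per access site, taken from the ASM listings above.
OLD_INSTRUCTIONS_PER_ACCESS = 6  # load, load, mcpi, sw, addi, mcpi
NEW_INSTRUCTIONS_PER_ACCESS = 1  # addr


def instructions_saved(executions: int) -> int:
    """Instructions no longer executed when one access site runs `executions` times."""
    if executions < 0:
        raise ValueError("executions must be non-negative")
    per_access = OLD_INSTRUCTIONS_PER_ACCESS - NEW_INSTRUCTIONS_PER_ACCESS
    return per_access * executions


if __name__ == "__main__":
    # An access site inside a loop is executed once per iteration.
    for executions in (1, 10, 1000):
        print(executions, instructions_saved(executions))
```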

As for bytecode size, the optimization deliberately increases the size of the data section, but only when a StorageKey's slot is equal to its field id. Instead of storing only 32 bytes for a single b256 address, we now store 32 + 32 + 8 = 72 bytes for the whole StorageKey, even though the slot address is the same as the field id. That is a 40-byte bytecode increase per storage field.

However, each storage access now needs 5 fewer opcodes, which saves 5 x 4 = 20 bytes per access site. In other words, a storage field accessed at two sites already breaks even (40 bytes added, 2 x 20 = 40 bytes saved), and every additional access site makes the bytecode smaller.

This is what we expect in real-world programs: storage fields accessed in more than two places, resulting in a bytecode size decrease.
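
The break-even reasoning can be checked with a small calculation. This is a sketch using only the figures stated above (72 vs. 32 bytes in the data section, 5 opcodes of 4 bytes saved per access site); the names are illustrative, and the model ignores alignment, padding, and any other optimizations that affect real binaries.

```python
DATA_SECTION_GROWTH_BYTES = 72 - 32  # whole StorageKey vs. a single b256
OPCODE_SIZE_BYTES = 4
OPCODES_REMOVED_PER_ACCESS = 5


def net_bytecode_change(access_sites: int) -> int:
    """Net size change in bytes for one storage field; negative means smaller."""
    if access_sites < 0:
        raise ValueError("access_sites must be non-negative")
    savings = OPCODES_REMOVED_PER_ACCESS * OPCODE_SIZE_BYTES * access_sites
    return DATA_SECTION_GROWTH_BYTES - savings


if __name__ == "__main__":
    for sites in (1, 2, 3):
        print(sites, net_bytecode_change(sites))
```

In this model a field with one access site grows the bytecode by 20 bytes, two sites break even, and three sites save 20 bytes.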

Bytecode size of should_pass tests

| Test | Before (bytes) | After (bytes) | Reduction |
|------|----------------|---------------|-----------|
| empty_fields_in_storage_struct | 22528 | 21768 | 3.37% |
| language/fallback_only | 2008 | 1944 | 3.19% |
| language/generics_in_contract | 2448 | 2400 | 1.96% |
| static_analysis/cei_pattern_violation_more_complex_logic | 15392 | 15064 | 2.13% |
| static_analysis/cei_pattern_violation_storage_map_and_vec | 6424 | 6224 | 3.11% |
| static_analysis/cei_pattern_violation_storage_struct_read | 2784 | 2760 | 0.86% |
| static_analysis/cei_pattern_violation_storage_var_read | 2952 | 2928 | 0.81% |
| static_analysis/cei_pattern_violation_storage_var_update | 2664 | 2648 | 0.60% |
| stdlib/storage_vec_insert | 4864 | 4816 | 0.99% |
| storage_slot_key_calculation | 4848 | 4856 | -0.17% |
| supertraits_for_abis_ownable | 3576 | 3536 | 1.12% |
| test_contracts/basic_storage | 32272 | 31720 | 1.71% |
| test_contracts/increment_contract | 3168 | 2984 | 5.81% |
| test_contracts/storage_access_contract | 28248 | 26720 | 5.41% |
| test_contracts/storage_enum_contract | 16128 | 13736 | 14.83% |
| test_contracts/storage_namespace | 31248 | 30648 | 1.92% |

The slight bytecode size increase in the storage_slot_key_calculation test comes from having exactly one storage.field access for every storage field.
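
For reference, the percentages in the table are consistent with the relative size reduction computed from the before and after sizes, where a negative value means the bytecode grew. A minimal sketch, with an illustrative function name:

```python
def size_reduction_percent(before: int, after: int) -> float:
    """Relative size reduction in percent, rounded to two decimals.

    Positive values mean the bytecode shrank; negative values mean it grew.
    """
    if before <= 0:
        raise ValueError("before must be positive")
    return round((before - after) / before * 100, 2)


if __name__ == "__main__":
    # Two rows from the table above: one that shrank and the one that grew.
    print(size_reduction_percent(22528, 21768))
    print(size_reduction_percent(4848, 4856))
```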

Bytecode size and gas usage of Blackjack project

| Test | Gas before | Gas after | Gas saved |
|------|------------|-----------|-----------|
| loss_game_test | 49497 | 49181 | 316 |
| simple_game_test | 54123 | 53798 | 325 |

The bytecode size decreased by 560 bytes, from 23320 to 22760 bytes.

Gas usage of storage_vec_iter_tests

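| Test | Gas before | Gas after | Gas saved | Reduction |
|------|------------|-----------|-----------|-----------|
| storage_vec_field_for_loop_iteration | 163832 | 162916 | 916 | 0.56% |
| storage_vec_field_nested_for_loop_iteration | 1395342 | 1392038 | 3304 | 0.24% |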

Checklist

  • [ ] I have linked to any relevant issues.
  • [x] I have commented my code, particularly in hard-to-understand areas.
  • [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • [ ] If my change requires substantial documentation changes, I have requested support from the DevRel team.
  • [x] I have added tests that prove my fix is effective or that my feature works.
  • [ ] I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • [x] I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • [x] I have requested a review from the relevant team or maintainers.

@ironcev ironcev self-assigned this Aug 26, 2025
@ironcev ironcev added the compiler and compiler: ir labels Aug 26, 2025

codspeed-hq bot commented Aug 26, 2025

CodSpeed Performance Report

Merging #7357 will not alter performance

Comparing ironcev/optimize-storage-element-access (0f1e885) with master (d49cf8e)

Summary

✅ 25 untouched benchmarks

@ironcev ironcev marked this pull request as ready for review August 27, 2025 18:12
@ironcev ironcev requested review from a team as code owners August 27, 2025 18:12
@ironcev ironcev enabled auto-merge (squash) August 27, 2025 18:13
vaivaswatha previously approved these changes Aug 28, 2025
@ironcev ironcev merged commit 84e575a into master Aug 29, 2025
41 checks passed
@ironcev ironcev deleted the ironcev/optimize-storage-element-access branch August 29, 2025 01:20
Elaela22soL pushed a commit to Elaela22soL/sway that referenced this pull request Sep 26, 2025