Optimize storage access compilation #7357
Merged
CodSpeed Performance Report: Merging #7357 will not alter performance.
vaivaswatha previously approved these changes on Aug 28, 2025
IGI-111 approved these changes on Aug 28, 2025
JoshuaBatty approved these changes on Aug 29, 2025
Elaela22soL pushed a commit to Elaela22soL/sway that referenced this pull request on Sep 26, 2025
## Description

This PR rewrites the compilation of storage accesses via the `storage` keyword, e.g. `storage.field`. The new compilation is optimized for gas usage and bytecode size. Some performance comparisons are shown below.

When accessing storage, the old compilation repeatedly constructed the same `StorageKey` as a local variable at each storage access site. E.g., for this example:

```sway
storage {
    scalar: u64 = 0,
}

impl Contract {
    #[storage(read)]
    fn poke_storage() {
        poke(storage.scalar);
    }
}
```

the resulting IR was:

```
pub entry fn poke_storage() -> (), !4 {
    local { b256, u64, b256 } __anon_0
    local b256 __const = const b256 0x9e0e87bef2e44d9771eb12cfc81e34e4dd6caad55385354757a8898a2c808b61
    local b256 __const0 = const b256 0x9e0e87bef2e44d9771eb12cfc81e34e4dd6caad55385354757a8898a2c808b61
    local { b256, u64, b256 } __tmp_arg

    entry():
    // BEGIN OF `storage.scalar` ACCESS
    v0 = get_local ptr b256, __const
    v1 = get_local ptr b256, __const0
    v6 = get_local ptr { b256, u64, b256 }, __anon_0, !5
    v7 = const u64 0
    v8 = get_elem_ptr v6, ptr b256, v7
    mem_copy_val v8, v0
    v9 = const u64 1
    v10 = get_elem_ptr v6, ptr u64, v9
    v11 = const u64 0
    store v11 to v10, !5
    v12 = const u64 2
    v13 = get_elem_ptr v6, ptr b256, v12
    mem_copy_val v13, v1
    // END OF `storage.scalar` ACCESS
    v14 = get_local ptr { b256, u64, b256 }, __tmp_arg
    mem_copy_val v14, v6
    v15 = call poke_2(v14)
    v9 = const unit ()
    ret () v9
}
```

The overall cost of each access was:
- a local on the stack for each access, `__anon_0` in this case.
- two `mem_copy_val`s of the storage slot and field id into that local.
- storing the slot offset into that local.


The bytecode size cost per access site was significant but constant, six ASM instructions:

```
load $r0 data_NonConfigurable_0        ; get local constant
load $r1 data_NonConfigurable_0        ; get local constant
mcpi $$locbase $r0 i32                 ; copy memory
sw   $$locbase $zero i4                ; store word
addi $r0 $$locbase i40                 ; get offset to aggregate element
mcpi $r0 $r1 i32                       ; copy memory
```

The gas cost was especially problematic, because a single access site could appear in a loop, where the six instructions were executed on every iteration.

The new implementation stores whole `StorageKey`s in the data section, similar to global constants and configurables. Each access site is then compiled to a single pointer access. The new IR for the above example becomes:

```
contract {
    storage_key storage.scalar = 0x9e0e87bef2e44d9771eb12cfc81e34e4dd6caad55385354757a8898a2c808b61

    pub entry fn poke_storage() -> (), !4 {
        local { b256, u64, b256 } __tmp_arg

        entry():
        // BEGIN OF `storage.scalar` ACCESS
        v0 = get_storage_key __ptr { b256, u64, b256 }, storage.scalar, !5
        // END OF `storage.scalar` ACCESS
        v1 = get_local __ptr { b256, u64, b256 }, __tmp_arg
        mem_copy_val v1, v0
        v2 = call poke_2(v1)
        v9 = const unit ()
        ret () v9
    }
}
```

And in ASM:

```
addr $r0 data_NonConfigurable_0        ; get storage.scalar's address in data section
```

## Performance Comparisons

The gas savings per storage access will, of course, depend on the number of actual calls.

As for bytecode size, the optimization deliberately increases the size of the data section, but only when a `StorageKey`'s slot is equal to its field id. Instead of storing only 32 bytes for a single `b256` address, we are now using 32 + 32 + 8 = 72 bytes to store the whole `StorageKey`, even if the slot address is the same as the field id. That is a 40-byte bytecode increase per such storage field.

However, each storage access now needs 5 fewer opcodes, which saves 5 x 4 = 20 bytes at every storage field access site.
In other words, if a storage field is accessed more than once anywhere in the code, there is no size increase, and from the third access site on the bytecode size decreases.

This is what we expect in real-world programs: storage fields being accessed in more than two places, resulting in a bytecode size decrease.

### Bytecode size of `should_pass` tests

| Test | Before | After | Decrease |
|------|--------|-------|----------|
| empty_fields_in_storage_struct | 22528 | 21768 | 3.37% |
| language/fallback_only | 2008 | 1944 | 3.19% |
| language/generics_in_contract | 2448 | 2400 | 1.96% |
| static_analysis/cei_pattern_violation_more_complex_logic | 15392 | 15064 | 2.13% |
| static_analysis/cei_pattern_violation_storage_map_and_vec | 6424 | 6224 | 3.11% |
| static_analysis/cei_pattern_violation_storage_struct_read | 2784 | 2760 | 0.86% |
| static_analysis/cei_pattern_violation_storage_var_read | 2952 | 2928 | 0.81% |
| static_analysis/cei_pattern_violation_storage_var_update | 2664 | 2648 | 0.60% |
| stdlib/storage_vec_insert | 4864 | 4816 | 0.99% |
| storage_slot_key_calculation | 4848 | 4856 | -0.17% |
| supertraits_for_abis_ownable | 3576 | 3536 | 1.12% |
| test_contracts/basic_storage | 32272 | 31720 | 1.71% |
| test_contracts/increment_contract | 3168 | 2984 | 5.81% |
| test_contracts/storage_access_contract | 28248 | 26720 | 5.41% |
| test_contracts/storage_enum_contract | 16128 | 13736 | 14.83% |
| test_contracts/storage_namespace | 31248 | 30648 | 1.92% |

The slight bytecode size increase in the `storage_slot_key_calculation` test comes from having exactly one `storage.field` access for every storage field.

### Bytecode size and gas usage of Blackjack project

| Test | Gas before | Gas after | Difference (gas) |
| ---- | ---------- | --------- | ---------------- |
| loss_game_test | 49497 | 49181 | 316 |
| simple_game_test | 54123 | 53798 | 325 |

The bytecode size decreased by 560 bytes, from 23320 to 22760 bytes.
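
The size trade-off described under Performance Comparisons (40 extra data-section bytes per affected field, 20 bytes saved per access site) can be written down as a small calculation. The following is only an illustrative sketch, not code from the PR: the constant and function names are hypothetical, and it assumes the worst case in which the field's slot equals its field id, so the full 40-byte increase applies.

```rust
/// Extra data-section bytes per storage field whose slot equals its field id:
/// a whole 72-byte `StorageKey` (32 + 32 + 8) instead of a single 32-byte `b256`.
/// (Hypothetical name; the figure is taken from the PR description.)
const DATA_SECTION_INCREASE_BYTES: i64 = (32 + 32 + 8) - 32;

/// Bytes saved at each access site: 5 fewer instructions at 4 bytes each.
/// (Hypothetical name; the figure is taken from the PR description.)
const SAVINGS_PER_ACCESS_SITE_BYTES: i64 = 5 * 4;

/// Net bytecode size change for one storage field that is accessed at
/// `access_sites` places in the code. Positive means the bytecode grows,
/// negative means it shrinks.
fn net_size_change_bytes(access_sites: i64) -> i64 {
    DATA_SECTION_INCREASE_BYTES - SAVINGS_PER_ACCESS_SITE_BYTES * access_sites
}
```

Under these assumptions, a field accessed at one site costs 20 bytes net, two sites break even, and each further site saves another 20 bytes. Fields whose slot differs from the field id do not incur the 40-byte increase, so for them every access site is a pure saving.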

### Gas usage of `storage_vec_iter_tests`

| Test | Before | After | Difference (gas) | Decrease |
|------|--------|-------|------------------|----------|
| storage_vec_field_for_loop_iteration | 163832 | 162916 | 916 | 0.56% |
| storage_vec_field_nested_for_loop_iteration | 1395342 | 1392038 | 3304 | 0.24% |

## Checklist

- [ ] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have [requested support from the DevRel team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [x] I have added tests that prove my fix is effective or that my feature works.
- [ ] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant.
- [x] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [x] I have requested a review from the relevant team or maintainers.
Labels:
- `compiler: ir`: IRgen and sway-ir including optimization passes
- `compiler`: General compiler. Should eventually become more specific as the issue is triaged