Correct hart_stack_size assertion #296


Merged 2 commits on Jun 19, 2025

Conversation

@janderholm (Contributor) commented Jun 11, 2025

Solves #295.

Corrected the assert so that the stack can be divided exactly among the number of harts. Also set the default hart stack size to this value.

@janderholm janderholm requested a review from a team as a code owner June 11, 2025 17:56
@romancardenas (Contributor) left a comment

LGTM. Please, add the change in the CHANGELOG.md

let hart_stack_size = if env::var_os("CARGO_FEATURE_SINGLE_HART").is_some() {
    "SIZEOF(.stack)"
} else {
    "2K"
};
Contributor

Would something like this make sense?

Suggested change
"2K"
"SIZEOF(.stack) / (_max_hart_id + 1)"
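Numerically, the suggested expression behaves like a plain integer division. A sketch of that arithmetic in Rust (the real computation happens in the linker, which also truncates):

```rust
// Models the suggested linker expression:
// _hart_stack_size = SIZEOF(.stack) / (_max_hart_id + 1)
fn hart_stack_size(stack_size: usize, max_hart_id: usize) -> usize {
    stack_size / (max_hart_id + 1)
}

fn main() {
    // Single-hart targets (_max_hart_id = 0): the whole stack, as before.
    assert_eq!(hart_stack_size(4096, 0), 4096);
    // Four harts (_max_hart_id = 3): an even quarter each.
    assert_eq!(hart_stack_size(4096, 3), 1024);
}
```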

Contributor

If this is possible, in single-core targets, the result of SIZEOF(.stack) / (_max_hart_id + 1) would be SIZEOF(.stack)

Contributor

Agreed, the environment variable check can probably be skipped.

Contributor Author

I think I agree too. I have a question though since I don't have much experience with multi core devices.

If we default to splitting the stack equal parts among the cores (4 in case of 4 cores), do we need to add some more checks?

I see there's a heap variable with a default size of 0. If someone has currently set the heap to something non-zero, that could cause the stack of their last core to become smaller than the others.

Is this something that needs to be taken into account?

Contributor

If I remember correctly, the .stack section is isolated from the .heap section. By default, the size of .heap is 0 and all the remaining space is left to the stack. If you set a size for the heap, the size of the stack decreases and is evenly distributed among cores.

But we must double-check this!
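Under that layout, the per-hart share works out as a simple subtraction and division. A hedged sketch, using the figures from the map output quoted in the next comment (the actual arithmetic lives in the linker script):

```rust
// RAM left for .heap + .stack is split: the heap takes its fixed size,
// and whatever remains is the stack, divided evenly among the harts.
fn per_hart_stack(ram_left: usize, heap_size: usize, harts: usize) -> usize {
    (ram_left - heap_size) / harts
}

fn main() {
    // 4088 bytes available, single hart: heap 0 leaves it all to the stack;
    // heap 1024 shrinks the stack to 3064, matching the map output.
    assert_eq!(per_hart_stack(4088, 0, 1), 4088);
    assert_eq!(per_hart_stack(4088, 1024, 1), 3064);
}
```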

@janderholm (Contributor Author) commented Jun 16, 2025

Ahh, yes I see, because .heap does . += _heap_size;

With heap set to 0:
.heap 0 0x1000008
.stack 4088 0x1000008

With heap set to 1024:
.heap 1024 0x1000008
.stack 3064 0x1000408

I checked with nm <elf> | grep _hart_stack_size and it looks like it works even with sizes not perfectly divisible (they get rounded down). But I'm unsure about the alignment. For example, with 3 harts I get a stack size of 1362.

It looks like these instructions in _start might be there to fix that, though?

cfg_global_asm!(
    "la t1, _stack_start",
    #[cfg(not(feature = "single-hart"))]
    "sub t1, t1, t0",
    "andi sp, t1, -16 // align stack to 16-bytes
    add s0, sp, zero",
);

This level of assembler and bit fiddling is not my strong suit.
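For what it's worth, the effect of those _start instructions can be emulated in plain Rust. A sketch, assuming t1 starts at _stack_start and t0 holds hartid * _hart_stack_size as in the surrounding riscv-rt code:

```rust
// Emulates: sub t1, t1, t0  /  andi sp, t1, -16
fn hart_sp(stack_start: usize, hart_id: usize, hart_stack_size: usize) -> usize {
    let top = stack_start - hart_id * hart_stack_size; // sub t1, t1, t0
    top & !0xF // andi sp, t1, -16: round down to a 16-byte boundary
}

fn main() {
    // A 4088-byte stack split 3 ways gives 1362 bytes per hart,
    // which is not a multiple of 16 -- yet every sp still ends up aligned.
    let stack_start = 0x0100_0008 + 4088;
    for hart in 0..3 {
        assert_eq!(hart_sp(stack_start, hart, 1362) % 16, 0);
    }
}
```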

@rmsyn (Contributor) commented Jun 17, 2025

But I'm unsure about the alignment. For example with 3 harts I get a stack size of 1362.

We can probably include an assert to ensure that the resulting stack size(s) satisfy alignment requirements.

Alternatively, we could modify the SIZEOF(.stack) / (_max_hart_id + 1) to always produce the nearest multiple of 16.

For example, I'm not sure if linker scripts can do modulo operations, but we could do the equivalent of:

raw_size = SIZEOF(.stack) / (_max_hart_id + 1)
stack_rem = raw_size % 16 // get the remainder of our alignment size
_hart_stack_size = raw_size - stack_rem // correct for any misalignment

We of course would want to round down to the nearest stack alignment multiple to avoid blowing the stack.

It looks like these instructions in _start might be there to fix that that though?

Yes, but as I understand it, the stack offsets will also need to be properly aligned. So, having the assert to check at the linking stage will still be useful.
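The round-down step in that pseudocode is a standard trick. In Rust it would look like this (a sketch of the idea, not code from the PR):

```rust
// raw_size - (raw_size % 16), i.e. round down to a multiple of 16.
fn align_down_16(raw_size: usize) -> usize {
    raw_size & !0xF // bitwise form of subtracting the remainder
}

fn main() {
    assert_eq!(align_down_16(1362), 1360); // the 3-hart case discussed above
    assert_eq!(align_down_16(1024), 1024); // already-aligned sizes unchanged
    assert_eq!(align_down_16(15), 0); // rounds down, never up
}
```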

Contributor

The assembly code already aligns the stack to 16 bytes. The stack is divided evenly between harts and, for each hart, if the starting point of its stack is not 16-byte aligned, a small offset is left.

So, I don't know if forcing the alignment at linker level is worth it

Contributor

So, I don't know if forcing the alignment at linker level is worth it

Right, I'm not sure it's worth it either. Just wanted to provide some options for how we could do it, if we wanted to.

rmsyn previously approved these changes Jun 13, 2025

@romancardenas (Contributor)
@janderholm, do you want to try the new approach of dividing the stack equally? And remember to include your changes in CHANGELOG.md.

For example:

4 cores, each with 1KB hart_stack_size, should be able to fit a stack
with a size of exactly 4KB, i.e. 4 = (3 + 1) * 1
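That commit-message arithmetic amounts to the following check. A Rust model of the linker-script assertion (names are illustrative, not the actual linker symbols):

```rust
// Models the fixed linker assert: the .stack section must hold one
// hart_stack_size slice for each of the (_max_hart_id + 1) harts.
fn stack_fits(stack_size: usize, max_hart_id: usize, hart_stack_size: usize) -> bool {
    stack_size >= (max_hart_id + 1) * hart_stack_size
}

fn main() {
    // 4 cores (max_hart_id = 3) at 1KB each fit a 4KB stack exactly.
    assert!(stack_fits(4 * 1024, 3, 1024));
    // One byte less and the assert should fire.
    assert!(!stack_fits(4 * 1024 - 1, 3, 1024));
}
```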
@janderholm (Contributor Author)
I have updated the PR according to the comments. As far as I can tell the numbers look correct, but I don't have any multi-hart RISC-V devices to test with.

@romancardenas romancardenas added this pull request to the merge queue Jun 19, 2025
Merged via the queue into rust-embedded:master with commit 4235ddb Jun 19, 2025
138 checks passed

3 participants