No-std support? #42
Would it be feasible to add optional no-std/no-alloc support to this crate? Or would it be easier to write a separate implementation entirely for my use case in embedded? And is this something you would even be interested in?

Looking at your dependencies, crossbeam-utils already supports disabling the std feature, though I don't know if that would remove any API you depend on.

My use case is in embedded: reading high-volume sensor data into buffers with DMA, then having another thread process the data. If I don't manage to keep up with the processing, I would rather drop the oldest data than the newest, and it seems this crate would solve that issue. Everything will have to be statically allocated, as I don't have alloc, and I need control over which linker section the allocations go into (which can be done by putting `#[link_section = ".section.name"]` on the static). I understand if this is out of scope for your crate, or if it would be too disruptive to the way it is architected.
---

There has been a quick discussion on this topic before. Both the language and my skills have improved since then, however, and I think these days this could be done, given an explanation of how your allocation constraints work. For your particular use case, though, I wonder if a ring buffer wouldn't be a better fit.
---

Ring buffers are a fair point. Though if you want to avoid copying data into or out of the ring buffer (doing DMA and processing directly on the data in the buffer), it seems difficult to me to discard old data if the consumer ends up lagging. The opposite, applying back-pressure, is trivial.

The other aspect (which I didn't think to mention) is that I need to process the data in fixed chunks, since I'm doing an FFT on the sound data that I'm reading in. And the third aspect that I should perhaps have clarified is that I don't have much RAM, so I need to keep the number of buffers down.

At that point, what is the difference between triple buffering and a ring buffer with three buffers that prefers discarding old values? I think I could do this with three statically allocated buffers in memory that supports DMA (that's right, only some of the RAM I have available actually works for DMA) and a pair of atomic pointer-sized values. That would be the low-level unsafe approach, though; I need to sit down and figure such an implementation out to know for sure. The ideal API for me would be to let me provide my own buffers, with the crate just handling the tricky transfer and juggling of the three buffers.
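To sketch what I have in mind (untested; names and bit layout invented, and it turns out a single atomic byte may even be enough):

```rust
use core::sync::atomic::{AtomicU8, Ordering};

// Bits 0-1: index (0..=2) of the "back" buffer currently up for grabs.
// Bit 2: set when the back buffer holds data the reader has not seen yet.
const DIRTY: u8 = 0b100;
static BACK: AtomicU8 = AtomicU8::new(0);

/// Writer: after filling buffer `mine` (via DMA), publish it as the new back
/// buffer and take over the previous back buffer as the next write target.
fn writer_publish(mine: u8) -> u8 {
    BACK.swap(mine | DIRTY, Ordering::AcqRel) & 0b11
}

/// Reader: if fresh data was published, trade our current buffer for it.
/// Swapping in `mine` (which has bit 2 clear) also clears the dirty flag.
fn reader_update(mine: u8) -> Option<u8> {
    if BACK.load(Ordering::Relaxed) & DIRTY != 0 {
        Some(BACK.swap(mine, Ordering::AcqRel) & 0b11)
    } else {
        None
    }
}
```

The hard part is not this handoff, but proving that the buffer accesses around it are sound, which is exactly what I'd rather have a crate do for me.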
---

Thanks for clarifying! If you want to access the data in place, then triple-buffer's synchronization protocol does indeed sound like a better starting point to me, because it gives you exclusive access to the current read buffer, which means that you can access it using normal memory accesses.

That's actually the main way in which triple-buffer's synchronization protocol differs from that of nonblocking ring buffers that continuously overwrite old contents without waiting for the reader, like my rt-history crate does. In those ring buffers, the writer thread may concurrently overwrite the ring elements that you are in the process of reading. This will result in UB if you access the ring buffer storage using normal memory reads and writes. To prevent this UB, you must access the ring buffer data in place using special memory operations. For simple inter-thread communication, Relaxed atomic reads/writes combined with Acquire/Release fences are enough, but DMA makes life more complicated. I think that from the Rust memory model's perspective you would need at least a combination of volatile accesses and Acquire/Release fences to handle it correctly, and then on top of that the CPU µarch will likely require you to take extra steps to make sure that you get fresh data from RAM and not a weird mixture of fresh data and stale CPU cache lines...

All this to say, you likely don't want to patch your FFT routine so that it takes all these precautions :) Which is why I would never advise anyone to use this sort of ring buffer outside of scenarios where it is possible to simply copy data in and out, treating the ring buffer's inner storage as an encapsulated black box.
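To make "special memory operations" concrete, here is a rough, untested sketch of what the reader-side access could look like, with `slot` and `len` coming from a hypothetical index protocol:

```rust
use core::ptr;
use core::sync::atomic::{fence, Ordering};

/// Copy `len` elements out of a ring slot that the writer may be racing on.
///
/// SAFETY: `slot` must point to at least `len` valid, initialized elements.
unsafe fn read_slot(slot: *const u32, len: usize, out: &mut [u32]) {
    // Pairs with a Release fence on the writer side: everything the writer
    // stored before publishing the slot index is visible past this point.
    fence(Ordering::Acquire);
    for (i, dst) in out.iter_mut().take(len).enumerate() {
        // Volatile reads keep the compiler from assuming the data is stable
        // while the writer may be overwriting it concurrently.
        *dst = ptr::read_volatile(slot.add(i));
    }
}
```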
Now, since we agree that extending triple-buffer is the more promising route, here is what I have in mind. Obviously, I would prefer not to expose implementation details like the shared state's internals if I can avoid it. So do you think an encapsulated API design along these lines could work for you?

```rust
#[link_section = ".whatever.you.want"]
// Shared state is built using encapsulated methods; contents are not directly exposed
static SHARED: triple_buffer::SharedState<T> = SharedState::new(/* ... */);

// API option 1: Building the input/output is unsafe; by doing so you assert
// that you have only one input interface and one output interface pointing to
// this shared state.
let mut input = unsafe { Input::from_shared_state_unchecked(&SHARED) };

// API option 2: I can make the input/output building safe by tracking which
// has already been built inside of the shared state. I have enough unused
// bits remaining in the shared state's bitfield to do so, no extra storage
// required.
let mut input = Input::from_shared_state(&SHARED);
```

Since sane apps should not be building triple buffer input/output interfaces in a tight loop, I think API option 2 probably makes more sense, following the general Rust principle of keeping APIs safe when there's not a strong argument against it.
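For the record, the tracking in option 2 could be as simple as one atomic bit-set on the existing bitfield; a hypothetical sketch:

```rust
use core::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical unused bit in the shared state's bitfield.
const INPUT_BUILT: u8 = 0b1000;

/// Returns true only for the first caller, so a second attempt to build an
/// Input from the same shared state can panic or return an error.
fn try_claim_input(bitfield: &AtomicU8) -> bool {
    bitfield.fetch_or(INPUT_BUILT, Ordering::AcqRel) & INPUT_BUILT == 0
}
```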
---

Yes, I would build the triple buffer structure once at startup, probably using https://lib.rs/crates/static_cell if the constructing function isn't const. So API option 2 should be fine.

As a side note: while cache lines are typically 64 bytes on anything remotely modern and desktop-ish, that is not so on embedded; I have seen both 16 and 32 bytes. I don't know if you can auto-detect that or not. Since I will be using power-of-2 buffers anyway, it shouldn't really affect me.

I don't actually know what exactly I need to do for DMA here, because the frameworks we use with Rust on embedded handle it for us. I use the rather excellent Embassy framework: it is an async runtime, and it makes life so much easier (not what people usually say about async Rust, I know). Things like interrupt handling, power states, and DMA completion notifications all get handled optimally internally, and what the user sees is just awaitable futures. Embedded Rust and async work really well together.

So I will have one async task (let's call it A) waiting on DMA completing, starting the next DMA, and sending the old buffer off to the other task (B); see the sketch below. As such, I don't think special synchronisation would actually be needed in triple-buffer here: that should already happen between Embassy and task A. Between A and B it should just work like normal (for cross-core communication, Embassy runs a separate single-threaded runtime on each core). Task A will share a core with various other tasks (control over WiFi, output LED matrix control, ...), while task B will essentially get a dedicated core.
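A minimal sketch of that task split, assuming the encapsulated `input_buffer_mut`/`publish` API proposed above; `dma_read` and `run_fft` are made-up stand-ins for the HAL future and my processing step:

```rust
use triple_buffer::{Input, Output};

/// Task A: drive the DMA and publish each completed buffer.
#[embassy_executor::task]
async fn acquire(mut input: Input<Buffer>) {
    loop {
        dma_read(input.input_buffer_mut()).await; // hypothetical HAL call
        input.publish();
    }
}

/// Task B (on the other core): process the freshest published buffer in place.
#[embassy_executor::task]
async fn process(mut output: Output<Buffer>) {
    loop {
        // In a real app this would await a signal/ticker instead of spinning;
        // triple-buffer itself provides no async notification.
        let buffer: &Buffer = output.read();
        run_fft(buffer); // hypothetical processing step
    }
}
```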
---

Indeed, implementing something like crossbeam's CachePadded portably is hard. Generally speaking, a CPU's precise cache layout may only be known at runtime (if it is known at all). It can even vary over time with SMP scheduling schemes that allow tasks to migrate across heterogeneous cores on big.LITTLE-ish architectures. Because the layout of a Rust type must be fully determined at compile time, the best a library can do is pad to a compile-time upper bound on the cache line size of the target architecture. Since, alas, we can't have runtime-determined type layouts, that compromise will have to do; see the sketch below.

Regarding DMA, can you point me to the API documentation of one of the HALs that you are using?
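Concretely, the usual compile-time workaround looks something like this sketch (crossbeam-utils' CachePadded does a more thorough per-target version of the same idea, including smaller values for some embedded architectures):

```rust
/// Pad and align a value to a pessimistic, compile-time cache line size.
/// 128 on x86_64 accounts for the adjacent-line prefetcher; every other
/// target gets a generic 64-byte guess in this simplified sketch.
#[cfg_attr(target_arch = "x86_64", repr(align(128)))]
#[cfg_attr(not(target_arch = "x86_64"), repr(align(64)))]
struct CacheAligned<T>(T);
```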
---

Creating DMA buffers and doing DMA is unfortunately a microcontroller-specific thing. While the Rust community has done a heroic job of creating standardised traits for things like pins, the I2C bus, etc. (which is way better than in C/C++, where porting to a new microcontroller might mean rewriting everything from scratch), there is still a bunch of things that are not abstracted away, since the capabilities vary so much. DMA is one of those.

For example, the RP2040 (the chip found on the Raspberry Pi Pico) only requires an ordinary buffer in RAM, nothing special as far as I can tell. The chip I'm using (ESP32) has a complicated enough setup that they provide helper macros for it; looking through the multiple layers of macros, I eventually get down to the raw descriptor and buffer definitions.

The actual API to use these DMA buffers would be what is found in I2sRx in my case (since it is the I2S bus I'm doing DMA on). Looking at it, I'm not sure that API is actually sound, since it does seem to accept buffers not created with the above macros. Hm, I will have to investigate this. This is why I prefer to bring my own buffers.

However, there is a saying that everything in computer science can be solved by adding a layer of indirection! I could just use triple_buffer to transfer a small handle pointing at the real buffer, rather than the buffer itself.
---

Overall, I can think of three possible degrees of "bring my own buffers" on the ergonomics vs flexibility tradeoff axis.

For maximal flexibility at the expense of rather poor ergonomics, I could extract the index management / synchronization protocol of triple-buffer into its own interface, leaving the buffers entirely up to you. This option is not ideal from my perspective as a library author, since the resulting API cannot be made fully safe. In this scheme, the API would look like this:

```rust
use triple_buffer::{IndexingSharedState, IndexingInput};

// Set up buffers. I don't know what you are doing here, and I do not need to know.
static BUFFERS: [Buffer; 3] = [/* ... allocate three buffers somehow ... */];

// Set up shared state
static SHARED: IndexingSharedState = IndexingSharedState::new();

// Set up input interface
let mut input = IndexingInput::from_shared(&SHARED);

// Query current index of input buffer
let input_idx_1: usize = input.input_buffer_idx();
let input_buffer = &BUFFERS[input_idx_1];
/* TODO: Fill up input_buffer. This will require unsafe. */

// Publish current input buffer after filling it up
input.publish();
// SAFETY: input_idx_1 and input_buffer must not be used after this point

// Query index of new input buffer and move on
let input_idx_2: usize = input.input_buffer_idx();
assert_ne!(input_idx_1, input_idx_2);
let input_buffer = &BUFFERS[input_idx_2];
```

For optimal ergonomics at the expense of flexibility, we have the option that I discussed so far, where I fully manage the buffer allocation on my side, and the only knob I give you is a way to put directives like `#[link_section]` on the shared state. If I understand your last message correctly, this option might work on both RP2040 and ESP32 (via the I2sRx API for the latter), but we're not sure if that's true of all hardware. In this scheme, the API would look like this:

```rust
use triple_buffer::{SharedState, Input};

// Set up shared state with internal buffers
#[link_section = ".dma.black.magic"]
static SHARED: SharedState<Buffer> =
    SharedState::new(/* ... initial T value that will be cloned 3 times ... */);

// Set up input interface
let mut input = Input::from_shared(&SHARED);

// Access current input buffer
let input_buffer: &mut Buffer = input.input_buffer_mut();
/* TODO: Fill up input_buffer. No unsafe required. */

// Publish current input buffer after filling it up
input.publish();
// SAFETY: Borrow checker automatically ensures that input_buffer cannot be used

// Query new input buffer and move on
let input_buffer: &mut Buffer = input.input_buffer_mut();
```

And as a middle ground between ergonomics and flexibility, we can have a scenario where you allocate the buffers yourself, but the shared state stores small descriptors pointing to them, built via a callback. In this scheme, the API would look like this:

```rust
use triple_buffer::{SharedState, Input};

// Set up shared state with internal buffers
// NOTE: No link_section required here, we are not allocating buffers inline
static SHARED: SharedState<BufferDescriptor> =
    SharedState::with_buffer_builder(|| /* ... allocate one buffer, return a descriptor to it ... */);

// Set up input interface
let mut input = Input::from_shared(&SHARED);

// Access current input buffer
let input_buffer: &mut BufferDescriptor = input.input_buffer_mut();
/* TODO: Fill up input_buffer. May or may not require unsafe depending on how
   BufferDescriptor works; probably hardware-specific. */

// Publish current input buffer after filling it up
input.publish();
// SAFETY: Borrow checker automatically ensures that input_buffer cannot be used

// Query new input buffer and move on
let input_buffer: &mut BufferDescriptor = input.input_buffer_mut();
```

As you can see, this third option is almost identical to the previous one from the perspective of the triple-buffer API; the differences are in who allocates the actual buffers and what the shared state stores.

Do you agree that these look like the three main API designs that I could provide on the ergonomics vs flexibility axis?
---

I'm on my phone right now (it is almost midnight here), so excuse the crude explanation. What I meant was that I could probably use triple_buffer as it is today (changed to work with no-std), but bring my own buffer handle type:

```rust
struct BufferHandle {
    /// Pointer or mutable reference to my actual buffer.
    data: *mut Buffer,
    /// How much data is actually in the buffer, as not every read fills it completely.
    size: u16,
}

// Raw pointers aren't Send, so I might need this if I have to use them.
unsafe impl Send for BufferHandle {}
```

At that point, the requirements on the buffers themselves are not something that triple_buffer needs to care about. The BufferHandles themselves do need to be allocated somewhere, though, of course. So it would be nice to do that in a static, or, if alloc is used, to have an opt-in feature for the nightly allocator API so that I can specify where they get allocated from.
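For the static allocation part, I imagine something like this, assuming the `SharedState`/`from_shared` API sketched earlier (`BufferHandle::empty()` is a made-up placeholder constructor):

```rust
use static_cell::StaticCell;
use triple_buffer::{Input, SharedState};

// The shared state lives in a static; StaticCell hands out a unique
// &'static mut on its first (and only) init call, with no alloc involved.
static SHARED: StaticCell<SharedState<BufferHandle>> = StaticCell::new();

fn init_input() -> Input<BufferHandle> {
    let shared = SHARED.init(SharedState::new(BufferHandle::empty()));
    Input::from_shared(shared)
}
```

And if `SharedState::new` ends up being const, the StaticCell indirection disappears entirely.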
---

Okay, I think I can get this working in a way that you can use, by making the various structs from triple-buffer generic over the kind of shared state storage (inline vs heap-allocated).
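Roughly along these lines; an untested sketch where every name is a placeholder rather than the final API:

```rust
use core::marker::PhantomData;

/// How the shared state is stored and reached.
pub trait StorageRef<T>: Clone {
    fn shared(&self) -> &SharedState<T>;
}

/// No-std/no-alloc flavor: the user owns a static, we hold a reference.
impl<T> StorageRef<T> for &'static SharedState<T> {
    fn shared(&self) -> &SharedState<T> {
        self
    }
}

/// Default flavor on alloc-enabled targets: today's Arc-based storage.
#[cfg(feature = "alloc")]
impl<T> StorageRef<T> for alloc::sync::Arc<SharedState<T>> {
    fn shared(&self) -> &SharedState<T> {
        self
    }
}

/// Input (and likewise Output) then become generic over the storage kind.
pub struct Input<T, S: StorageRef<T>> {
    storage: S,
    _marker: PhantomData<T>,
}
```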
Need to find the time to actually implement this, of course, but if you can have a quick look and tell me whether that looks sensible on your side, that's much appreciated already.
---

Oh, and a bonus question: are you fine with me bundling this feature into the major release that I need to do at some point this year in order to move forward with issue #30? I'm not a Rust semver hazard guru, but I think I remember that adding a generic parameter to a struct is not semver-safe in Rust even if the parameter is defaulted, due to the fact that default type parameters are a suboptimally implemented language feature whose defaults are not applied in every context (type inference being the classic gap).
---

I think that would work, but I'm having a hard time picturing the API in my head. I could test a pre-release version of this, though, to make sure.

Neither am I, but have you tried cargo-semver-checks? It can catch a lot of semver issues for you.
---

Alright, I have pushed a first prototype in a branch.
---

That was quick! Unfortunately, I won't be able to try it right away. I should have some time to test it this weekend or the weekend after.