Hi,
After inspecting the source and the disassembly, I found that ib_write_lat’s BlueFlame path on AArch64 writes a 64B WQE to the device with mmio_memcpy_x64(), which is implemented with the vst4q_u64() intrinsic and appears as an ST4 instruction in the disassembly.
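For reference, the copy boils down to something like the sketch below. It is a simplified standalone stand-in, not the actual rdma-core code: `copy_64b` is a hypothetical name, and the non-AArch64 branch is only a portable fallback so the snippet builds on other targets. On AArch64 the vld4q_u64()/vst4q_u64() pair de-interleaves and then re-interleaves eight 64-bit words, so the net effect is a straight 64-byte copy whose store side is the ST4.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical stand-in for the 64-byte BlueFlame copy.
 * On AArch64: vld4q_u64() loads eight 64-bit words into four two-lane
 * registers (de-interleaved) and vst4q_u64() writes them back
 * re-interleaved, so dest ends up byte-identical to src. The store is
 * one ST4, which is NOT a single 64-byte atomic write. */
static inline void copy_64b(void *dest, const void *src)
{
#if defined(__aarch64__)
    vst4q_u64((uint64_t *)dest, vld4q_u64((const uint64_t *)src));
#else
    /* Portable fallback for non-AArch64 builds: eight 64-bit stores. */
    volatile uint64_t *d = (volatile uint64_t *)dest;
    const uint64_t *s = (const uint64_t *)src;
    for (size_t i = 0; i < 8; i++)
        d[i] = s[i];
#endif
}
```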
On ARMv8-A, ST4 does not provide a 64-byte atomic MMIO write. The store can be split into smaller beats, and those beats may not arrive at the device in strictly sequential order.
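To illustrate why arrival order matters, here is a purely hypothetical device model (the names `naive_window`, `naive_beat` and `naive_post` are made up, and nothing here describes real ConnectX behaviour). It assumes a device that snapshots the 64-byte window as soon as the last qword lands; if the beats are reordered, that snapshot contains stale data.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model only: a 64-byte window that a naive device
 * "latches" as soon as the last qword (offset 56) is written. */
struct naive_window {
    uint64_t regs[8];     /* current contents of the MMIO window  */
    uint64_t latched[8];  /* snapshot the device would execute    */
    int      fired;       /* set once the snapshot has been taken */
};

/* Deliver one 8-byte beat; latch the window when qword 7 arrives. */
static void naive_beat(struct naive_window *w, size_t qword, uint64_t val)
{
    w->regs[qword] = val;
    if (qword == 7 && !w->fired) {
        memcpy(w->latched, w->regs, sizeof w->latched);
        w->fired = 1;
    }
}

/* Model a 64-byte store split into eight beats that reach the device
 * in the given order (a permutation of 0..7). Returns 1 if the latched
 * snapshot equals the intended WQE, 0 if it caught stale data. */
static int naive_post(struct naive_window *w, const uint64_t wqe[8],
                      const size_t order[8])
{
    w->fired = 0;
    for (size_t i = 0; i < 8; i++)
        naive_beat(w, order[i], wqe[order[i]]);
    return w->fired && memcmp(w->latched, wqe, sizeof w->latched) == 0;
}
```

With in-order delivery the snapshot is correct; if qword 7 overtakes qword 0, the device would execute a WQE whose first qword is whatever the window held before.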
Questions:
Is using ST4 to post a 64B BlueFlame WQE on AArch64 correct by design? Does the BlueFlame hardware explicitly tolerate non-atomic, potentially out-of-order DWORD writes and reassemble the WQE based on DS, or does the code assume some guarantee from the CPU or the root complex (RC)?
If the device is designed to tolerate this, could you confirm the exact contract the code relies on (and, if possible, point to the PRM section that states it)?
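For concreteness, this is the kind of device-side tolerance I mean by "reassemble based on DS". It is a hypothetical sketch, not a claim about how the hardware works: the helper names are invented, and the DS location (low 6 bits of byte 7, counted in 16-byte units, mirroring the qpn_ds field of rdma-core's struct mlx5_wqe_ctrl_seg) is an assumption. The model buffers 8-byte beats in any order and reports the WQE complete only once every qword covered by DS has arrived.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reassembly model. Assumed layout: DS is the low 6 bits
 * of byte 7 of the chunk and counts 16-byte units. */
struct ds_window {
    uint8_t buf[64];   /* reassembly buffer                  */
    uint8_t present;   /* bit i set once qword i has arrived */
};

static void ds_reset(struct ds_window *w)
{
    memset(w, 0, sizeof *w);
}

/* Record one 8-byte beat; the raw bytes are copied verbatim. */
static void ds_beat(struct ds_window *w, size_t qword, const uint8_t bytes[8])
{
    memcpy(&w->buf[qword * 8], bytes, 8);
    w->present |= (uint8_t)(1u << qword);
}

/* Return 1 when the WQE described by DS is fully present in the buffer. */
static int ds_complete(const struct ds_window *w)
{
    if (!(w->present & 1u))          /* qword 0 carries DS: wait for it */
        return 0;
    unsigned ds = w->buf[7] & 0x3f;  /* 16-byte units (assumed)         */
    if (ds == 0)
        return 0;
    unsigned qwords = ds * 2;
    if (qwords > 8)                  /* model a single 64-byte chunk    */
        qwords = 8;
    uint8_t need = (uint8_t)((1u << qwords) - 1u);
    return (w->present & need) == need;
}
```

Whether the real BlueFlame logic does anything like this, or instead relies on the write arriving as one ordered burst, is exactly what I am asking.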
Thanks.