Questions about zil+zio layer #17232
-
Hello,

When computing the ZIL block checksum ( using I understand there are pipeline stalls to ensure that child nodes complete before their parents (e.g., VDEV I/O start/done/assess). However, it is unclear whether the checksum computation respects this ordering (that is, whether it also has pipeline stalls).

The reason I am asking is that I am building a more robust ZIL chain where each ZIL block depends on the checksum of the previous block. I am implementing that using a structure for global state in the
Replies: 3 comments 2 replies
-
Hello, this is a friendly reminder :)
-
Hi Dimitra,

IIRC the ZIL code itself does not serialize its ZIOs; it issues all of them as soon as it can. But it relies heavily on the general ZIO pipeline rules. For example, in the case of an indirect ZIL write, the LWB ZIO must not start its processing until all the data blocks it references are compressed, encrypted, checksummed and allocated, otherwise it won't have the block pointer to include in the LWB. This wait is implemented by

When we are talking about two consecutive LWBs, formally they should not depend on each other for data, so it would be great if they were checksummed in parallel, not only issued to the disks in parallel. But I suspect they follow the same rules as any other children and parents, which means the following (parent) LWB ZIO will wait at

But I am not happy about this observation, and rather than cement it by introducing the chaining you mention, I'd honestly prefer the dependency to be disabled somehow so that the processing can be parallelized. Otherwise ZIL write throughput is limited to the checksumming throughput of a single CPU, which may be a problem in some configurations. Before my work around 2.2 we were limited by the memory copy throughput of a single CPU (which is sometimes not brilliant), because LWB population was done under the ZIL lock. I fixed the locking back then, so we no longer have that lock contention. But looking at this now, I suspect this ZIO serialization might be the next throughput limit for synchronous writes, unless I am wrong somehow in all of the above.
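To make the serialization concern concrete, here is a minimal sketch of a checksum chain in which each block's checksum is seeded with the previous block's checksum. It is not OpenZFS code: FNV-1a stands in for whatever checksum the ZIL actually uses, and every `toy_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 64-bit FNV-1a constants; FNV-1a is only a stand-in checksum. */
#define	TOY_FNV_OFFSET	0xcbf29ce484222325ULL
#define	TOY_FNV_PRIME	0x100000001b3ULL

/* Fold len bytes of buf into the running hash h. */
static uint64_t
toy_fnv1a(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= TOY_FNV_PRIME;
	}
	return (h);
}

/* Chained checksum of one block: previous checksum first, then payload. */
static uint64_t
toy_chain_checksum(uint64_t prev, const void *data, size_t len)
{
	uint64_t h = toy_fnv1a(TOY_FNV_OFFSET, &prev, sizeof (prev));

	return (toy_fnv1a(h, data, len));
}

/*
 * Checksum a whole chain (hypothetical helper). out[i] needs out[i - 1],
 * so the iterations of this loop cannot run in parallel.
 */
static void
toy_chain_all(const char *const *blocks, const size_t *lens, size_t n,
    uint64_t *out)
{
	uint64_t prev = 0;	/* assumed seed for the first block */

	assert(blocks != NULL && lens != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		out[i] = toy_chain_checksum(prev, blocks[i], lens[i]);
		prev = out[i];
	}
}
```

Because `out[i]` takes `out[i - 1]` as input, the loop in `toy_chain_all` cannot be split across CPUs. An explicit chain of this kind therefore makes the single-CPU checksumming limit described above a permanent property of the format, whereas today it would only be a side effect of the parent/child wait.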
-
Hi,

Thanks very much for the reply. I have some minor questions (I am using the version with tag zfs-2.2.4).

(1) Can you clarify what is the difference between:

(2) I clearly see in the ZIL codebase that when the

Copying the explanatory comment from the code in the file:

Is my understanding here correct in your opinion?
As you might guess, the first is used for dedup writes. It is similar to a normal write up to the checksum stage, but it may then skip the physical write, replacing it with a DDT update.
The order in which they are listed in the macros does not matter. The stages are always executed according to their order in `enum zio_stage`.
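A small sketch of why the listing order in a pipeline macro is irrelevant: the macro is just a bitmask, and the executor walks the bits from lowest to highest. The stage names and values below are hypothetical and only loosely modeled on the idea of `enum zio_stage`; the real enum has many more stages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage bits; NOT the real OpenZFS stage list or values. */
enum toy_stage {
	TOY_STAGE_OPEN		= 1 << 0,
	TOY_STAGE_COMPRESS	= 1 << 1,
	TOY_STAGE_CHECKSUM	= 1 << 2,
	TOY_STAGE_VDEV_IO	= 1 << 3,
	TOY_STAGE_DONE		= 1 << 4,
	TOY_STAGE_LAST		= TOY_STAGE_DONE
};

/* The same set of stages, listed in two different textual orders. */
#define	TOY_PIPELINE_A					\
	(TOY_STAGE_OPEN | TOY_STAGE_CHECKSUM |		\
	TOY_STAGE_VDEV_IO | TOY_STAGE_DONE)
#define	TOY_PIPELINE_B					\
	(TOY_STAGE_DONE | TOY_STAGE_VDEV_IO |		\
	TOY_STAGE_OPEN | TOY_STAGE_CHECKSUM)

/*
 * Record the stages of a pipeline mask in execution order by walking the
 * bits from lowest to highest. Returns the number of stages written to out.
 */
static size_t
toy_pipeline_order(uint32_t pipeline, uint32_t *out, size_t cap)
{
	size_t n = 0;

	for (uint32_t stage = 1; stage <= TOY_STAGE_LAST; stage <<= 1) {
		if ((pipeline & stage) == 0)
			continue;	/* stage not part of this pipeline */
		assert(n < cap);
		out[n++] = stage;
	}
	return (n);
}
```

Calling `toy_pipeline_order()` on `TOY_PIPELINE_A` and on `TOY_PIPELINE_B` yields the same sequence (open, checksum, vdev I/O, done), since both macros expand to the same integer.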