
Provide an efficient way to decompress a sequence of chunks compressed with ZstdCompressionChunker #259

@jbosboom

My program wants to compress some large cached strings and decompress them later. I have no particular requirements on the form of the compressed data, so I used ZstdCompressionChunker to do the compression, which avoids repeated reallocation of the output buffer. I would like to process the decompressed data in chunks to reduce peak memory usage. However, there is no obvious, efficient way to decompress chunks to chunks.
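
For concreteness, the compression side looks roughly like this (a sketch only; parts stands in for my real cached strings):

import zstandard as zstd

parts = [b'AB' * 1000, b'CD' * 1000]  # stand-in for my real cached strings
cctx = zstd.ZstdCompressor()
chunker = cctx.chunker(chunk_size=32768)  # ZstdCompressionChunker
chunks = []
for part in parts:
    chunks.extend(chunker.compress(part))
chunks.extend(chunker.finish())

Here is what I tried or considered for decompressing those chunks back into chunks: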

  • The ZstdCompressionChunker round-trip tests all concatenate the chunks with bytes.join for one-shot decompression. (Fine, they're tests.)

  • I tried chain.from_iterable(dctx.read_to_iter(c) for c in chunks). This doesn't work because each read_to_iter iterator expects to process a full stream. (I expected it to hold state in the ZstdDecompressor it was obtained from.)

  • ZstdDecompressionObj's documentation says it isn't efficient (see the sketch just after this list):

    Because calls to decompress() may need to perform multiple memory (re)allocations, this streaming decompression API isn’t as efficient as other APIs.

  • read_to_iter's documentation says

    read_to_iter() accepts an object with a read(size) method that will return compressed bytes or an object conforming to the buffer protocol.

    so I wrote a class with a read method that returns memoryviews over the chunks (to avoid copying slices). The documentation is grammatically ambiguous about whether read(size) may return any buffer-protocol object; it turns out that read_to_iter segfaults (!) when read() returns a buffer-protocol object that is not exactly bytes (reduced test case below).
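
For reference, this is roughly how decompressobj() would be driven chunk-to-chunk (a sketch only; chunks and process() are the stand-ins from above); it does yield output incrementally, but it is the API whose documentation carries the efficiency caveat quoted above:

import zstandard as zstd

dctx = zstd.ZstdDecompressor()
dobj = dctx.decompressobj()
for chunk in chunks:  # the compressed chunks from the compression sketch
    out = dobj.decompress(chunk)  # may be b'' until enough input has arrived
    if out:
        process(out)  # process() stands in for my real consumer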

My feature request is to provide an efficient way to decompress a sequence of chunks compressed with ZstdCompressionChunker (or to document an existing method as the efficient way, if there is one).
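
For what it's worth, the closest existing route I can see is to wrap the chunk sequence in a minimal file-like object and hand it to stream_reader(); a rough, unverified sketch follows (ChunkSource is a name I made up; chunks and process() are the stand-ins from above). I don't know whether this counts as "the efficient way", which is part of why I'm filing this.

import zstandard as zstd

class ChunkSource:
    """Minimal file-like wrapper that hands out the stored compressed chunks."""
    def __init__(self, chunks):
        self._chunks = iter(chunks)
    def read(self, size):
        # Return one whole chunk per call (the reader appears to tolerate reads
        # longer than size, though I have not audited that); b'' signals end of stream.
        return next(self._chunks, b'')
    def close(self):
        pass  # present so the reader can close its source cleanly

dctx = zstd.ZstdDecompressor()
with dctx.stream_reader(ChunkSource(chunks)) as reader:
    while True:
        out = reader.read(65536)  # decompressed data in bounded pieces
        if not out:
            break
        process(out)  # process() stands in for my real consumer

And here is the reduced test case for the read_to_iter segfault: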


import zstandard as zstd
b = b'AB' * 1000
d = zstd.compress(b)
assert zstd.decompress(memoryview(d)) == b # passes
class Whatever:
    def __init__(self, data):
        self.data = data
    def read(self, size):
        assert len(self.data) <= size
        return memoryview(self.data)
dctx = zstd.ZstdDecompressor()
assert b''.join(dctx.read_to_iter(Whatever(d))) == b # segfault

Segfaults using Arch Linux's python 3.13.2-1 and python-zstandard 0.23.0-2.
