-
Notifications
You must be signed in to change notification settings - Fork 19
feat(decoder): Add support for loading Zstandard-compressed text files (resolves #272). #368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 7 commits
e2fbeee
ef95eda
2ddcdae
50aa871
680fb22
629b5e5
bf412c8
6b669d8
6ef328f
928f5f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Henry8192 marked this conversation as resolved.
Show resolved
Hide resolved
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| import { | ||
| decompress, | ||
| init, | ||
| } from "@bokuweb/zstd-wasm"; | ||
|
|
||
| import PlainTextDecoder from "../PlainTextDecoder"; | ||
|
|
||
|
|
||
| class ZstdDecoder extends PlainTextDecoder { | ||
| private constructor (logArray: Uint8Array) { | ||
| super(logArray); | ||
| } | ||
|
|
||
Henry8192 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| static override async create (dataArray: Uint8Array) { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add missing DecoderOptions parameter. The method signature is missing the Apply this diff to add the missing parameter: - static override async create (dataArray: Uint8Array) {
+ static override async create (dataArray: Uint8Array, _decoderOptions: DecoderOptions) {You'll also need to import the +import {DecoderOptions} from "../../../typings/decoders";
+
import PlainTextDecoder from "../PlainTextDecoder";🤖 Prompt for AI Agents |
||
| await init(); | ||
| const logArrayBuffer = decompress(dataArray); | ||
Henry8192 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| return new ZstdDecoder(logArrayBuffer); | ||
| } | ||
| } | ||
|
|
||
| export default ZstdDecoder; | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -2,6 +2,7 @@ import ClpIrDecoder from "../services/decoders/ClpIrDecoder"; | |||||
| import {CLP_IR_STREAM_TYPE} from "../services/decoders/ClpIrDecoder/utils"; | ||||||
| import JsonlDecoder from "../services/decoders/JsonlDecoder"; | ||||||
| import PlainTextDecoder from "../services/decoders/PlainTextDecoder"; | ||||||
| import ZstdDecoder from "../services/decoders/ZstdDecoder"; | ||||||
| import {Decoder} from "./decoders"; | ||||||
|
|
||||||
|
|
||||||
|
|
@@ -21,7 +22,8 @@ interface FileTypeInfo { | |||||
| * Represents a file type with its identifying properties and decoder. | ||||||
| */ | ||||||
| interface FileTypeDef { | ||||||
| DecoderFactory: typeof ClpIrDecoder | typeof JsonlDecoder | typeof PlainTextDecoder; | ||||||
| DecoderFactory: typeof ClpIrDecoder | typeof JsonlDecoder | | ||||||
| typeof ZstdDecoder | typeof PlainTextDecoder; | ||||||
|
Comment on lines
+25
to
+26
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Avoid ever-growing unions for DecoderFactory; use a structural static type. Every new decoder requires editing this union. Define a structural type for the static side (create()) and reference that here to improve maintainability. Apply this diff within the selected lines: - DecoderFactory: typeof ClpIrDecoder | typeof JsonlDecoder |
- typeof ZstdDecoder | typeof PlainTextDecoder;
+ DecoderFactory: DecoderStatic;Add this type alias near the top of the file (e.g., above interface FileTypeDef): type DecoderStatic = {
// Matches the static factory shape used by decodeUtils
create: (dataArray: Uint8Array) => Promise<Decoder>;
};🤖 Prompt for AI Agents |
||||||
|
|
||||||
| checkIsStructured: (decoder: Decoder) => FileTypeInfo["isStructured"]; | ||||||
| extensions: FileTypeInfo["extension"][]; | ||||||
|
|
@@ -46,6 +48,13 @@ const FILE_TYPE_DEFINITIONS: FileTypeDef[] = [ | |||||
| name: "JSON Lines", | ||||||
| signature: ["{".charCodeAt(0)], | ||||||
| }, | ||||||
| { | ||||||
| DecoderFactory: ZstdDecoder, | ||||||
| checkIsStructured: () => false, | ||||||
| extensions: [".zst", ".zstd"], | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's a good mindset to brainstorm different file extensions, though in the IANA registry only
Suggested change
|
||||||
| name: "Zstandard", | ||||||
| signature: [0x28, 0xb5, 0x2f, 0xfd], | ||||||
| }, | ||||||
|
Comment on lines
+51
to
+57
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧹 Nitpick (assertive) DRY the Zstd magic bytes into a shared constant to prevent drift. Both CLP IR and Zstandard rely on the same magic bytes. Centralise them to avoid subtle mismatches later. Apply within the selected lines: - signature: [0x28, 0xb5, 0x2f, 0xfd],
+ signature: ZSTD_MAGIC,Add this near the top of the file (close to other typings) and use it for the CLP IR entry as well: const ZSTD_MAGIC = [0x28, 0xb5, 0x2f, 0xfd] as const;🤖 Prompt for AI Agents💡 Verification agent 🧩 Analysis chainGuard against decompression bombs; confirm streaming/lazy WASM init. Zstandard content can expand massively. Ensure ZstdDecoder enforces a sane max-out size or uses streaming, and lazily imports the WASM so non-Zstd users don’t pay the startup cost. Run: If limits/streaming aren’t present, I can draft a guard that rejects when the projected decompressed size exceeds a threshold and documents the UX. 🏁 Script executed: #!/bin/bash
# ZstdDecoder should import the WASM lazily (dynamic import) inside create().
rg -nP -C3 '@bokuweb/zstd-wasm' src/services/decoders/ZstdDecoder
rg -nP -C3 'import\(\s*[\'"]@bokuweb/zstd-wasm[\'"]\s*\)' src/services/decoders/ZstdDecoder
# Check for size limits / streaming usage.
rg -nP -C3 '(max|limit|cap).*bytes' src/services/decoders/ZstdDecoder
rg -nP -C3 'decompress(Stream|ing)?' src/services/decoders/ZstdDecoder
rg -nP -C3 '(AbortSignal|controller|cancel|abort)' src/services/decoders/ZstdDecoder
# Confirm bundler config excludes the WASM from pre-bundling (as mentioned in the PR).
rg -nP -C3 'optimizeDeps\s*:\s*\{[^}]*exclude[^}]*@bokuweb/zstd-wasm' vite.config.tsLength of output: 751 This will confirm whether the decoder still uses a static import, lacks size guards, and if the Vite config properly excludes the Zstd WASM from eager bundling. 🏁 Script executed: #!/bin/bash
# Show static imports from @bokuweb/zstd-wasm in ZstdDecoder
rg -n "@bokuweb/zstd-wasm" -C3 src/services/decoders/ZstdDecoder/index.ts || true
# Check for any dynamic imports (should be none if static import present)
rg -n "import(" -C3 src/services/decoders/ZstdDecoder/index.ts || true
# Inspect the create() method to see where WASM is initialized
rg -n "create" -C3 src/services/decoders/ZstdDecoder/index.ts || true
# Look for any cap/limit/max guards in the decompression logic
rg -nE "(limit|max|cap)" -C3 src/services/decoders/ZstdDecoder/index.ts || true
# Search for streaming or chunked decompress calls
rg -n "stream" -C3 src/services/decoders/ZstdDecoder/index.ts || true
# Check for AbortSignal/controller usage in decoder (to allow cancellation)
rg -nE "(AbortSignal|controller|cancel|abort)" -C3 src/services/decoders/ZstdDecoder/index.ts || true
# Locate vite config file(s) to verify exclude of @bokuweb/zstd-wasm
rg -l "optimizeDeps" -g "vite.config.*" || echo "No vite.config.* found"
rg -n "optimizeDeps" -C3 vite.config.ts vite.config.js || true
rg -n "exclude" -C3 vite.config.ts vite.config.js || trueLength of output: 2080 ZstdDecoder eagerly loads WASM and lacks decompression safeguards Our verification shows the decoder still statically imports
Recommendations to address both performance and safety concerns:
These changes are critical to prevent decompression-bomb attacks and optimize startup performance for users who never decode Zstandard payloads. 🤖 Prompt for AI Agents |
||||||
| { | ||||||
| DecoderFactory: PlainTextDecoder, | ||||||
| checkIsStructured: () => false, | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems some dependencies are getting updated / removed unexpectedly. can we