@drbh drbh commented Oct 13, 2025

This PR adds an experimental parallel-chunk example binary, feature-gated behind the parallel-chunking feature flag.

The changes include:

  • adding chunk_file_parallel to Chunker
  • multithreaded chunking logic in parallel_chunking.rs
  • a feature flag gating the new logic and its dependencies
  • a small CLI binary to test the feature
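
For readers wondering how content-defined chunking can be spread across threads while keeping boundaries identical, here is a toy sketch. It is not the code in parallel_chunking.rs: the gear-style hash, the constants, and every function name below (gear, roll, candidates, select, chunk_sequential, chunk_parallel) are assumptions made for illustration only. The idea shown is that a 64-bit shift-and-add rolling hash depends only on the last 64 bytes, so each thread can find candidate cut points in its own segment after a 64-byte warm-up, and a cheap sequential pass then applies the minimum and maximum chunk sizes.

```rust
use std::thread;

// All constants are toy values chosen for this sketch, not the project's settings.
const WINDOW: usize = 64; // a u64 shift-and-add hash forgets bytes older than 64
const MASK: u64 = !0u64 << 52; // top 12 bits zero => a candidate roughly every 4 KiB
const MIN: usize = 1024; // minimum chunk size; must be >= WINDOW
const MAX: usize = 16 * 1024; // maximum chunk size

/// Stand-in for a gear table lookup: maps a byte to a pseudo-random u64.
fn gear(b: u8) -> u64 {
    let x = (b as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    (x ^ (x >> 29)).wrapping_mul(0xBF58_476D_1CE4_E5B9)
}

/// One step of the rolling hash.
fn roll(h: u64, b: u8) -> u64 {
    (h << 1).wrapping_add(gear(b))
}

/// Single-threaded reference: returns the end offset of every chunk.
fn chunk_sequential(data: &[u8]) -> Vec<usize> {
    let (mut cuts, mut start, mut h) = (Vec::new(), 0usize, 0u64);
    for (i, &b) in data.iter().enumerate() {
        h = roll(h, b);
        let len = i + 1 - start;
        if (len >= MIN && h & MASK == 0) || len >= MAX {
            cuts.push(i + 1);
            start = i + 1;
            h = 0; // hash restarts after every cut
        }
    }
    if start < data.len() {
        cuts.push(data.len()); // trailing partial chunk
    }
    cuts
}

/// Candidate cut offsets in data[from..to], found independently of other segments.
fn candidates(data: &[u8], from: usize, to: usize) -> Vec<usize> {
    let warm = from.saturating_sub(WINDOW); // refill the hash window before `from`
    let (mut h, mut out) = (0u64, Vec::new());
    for i in warm..to {
        h = roll(h, data[i]);
        if i >= from && h & MASK == 0 {
            out.push(i + 1);
        }
    }
    out
}

/// Sequential pass over the sorted candidates, applying the MIN and MAX rules.
fn select(cands: &[usize], total: usize) -> Vec<usize> {
    let (mut cuts, mut start, mut k) = (Vec::new(), 0usize, 0usize);
    while start < total {
        while k < cands.len() && cands[k] < start + MIN {
            k += 1; // too close to the previous cut
        }
        let cut = match cands.get(k) {
            Some(&c) if c <= start + MAX => c,
            _ => (start + MAX).min(total),
        };
        cuts.push(cut);
        start = cut;
    }
    cuts
}

/// Scans segments on scoped threads, then selects the final cuts sequentially.
fn chunk_parallel(data: &[u8], threads: usize) -> Vec<usize> {
    let seg = data.len().div_ceil(threads.max(1)).max(1);
    let cands: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = (0..data.len())
            .step_by(seg)
            .map(|from| s.spawn(move || candidates(data, from, (from + seg).min(data.len()))))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    select(&cands, data.len())
}
```

Under these assumptions, chunk_parallel(&data, n) returns the same offsets as chunk_sequential(&data) because any position at least MIN bytes past a cut has a full hash window regardless of where hashing started. Per-chunk hashing, which is omitted here, can also run in parallel once the offsets are known.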

Run

build the examples

cargo build --release --features parallel-chunking --example parallel-chunk
cargo build --release --features parallel-chunking --example chunk

chunk a file

target/release/examples/parallel-chunk --input FILE.ABC

the output should be identical to the original chunk binary output

target/release/examples/chunk --input FILE.ABC

Benching

Below is a small benchmark script that uses hyperfine to compare chunk and parallel-chunk runtimes on a generated input file (1 GB by default).

#!/bin/bash

# if user provides a number argument, use it as the size of the file to create
if [ $# -eq 1 ]; then
    N=$1
else
    N=1
fi

# add a N GB file in /tmp if it doesn't exist
if [ ! -f "/tmp/random_${N}.0gb.bin" ]; then
    echo "Creating a ${N}.0gb random file in /tmp..."
    dd if=/dev/urandom of="/tmp/random_${N}.0gb.bin" bs=1M count=$((N * 1024))
fi

file="/tmp/random_${N}.0gb.bin"

echo "Building examples..."
cargo build --release --features parallel-chunking --example parallel-chunk
cargo build --release --features parallel-chunking --example chunk

echo "" 
echo "Reference version... (last 10 lines)"
target/release/examples/chunk --input "$file" | tail -n 10

echo "" 
echo "Parallel version... (last 10 lines)"
target/release/examples/parallel-chunk --input "$file" | tail -n 10

echo "" 
echo "Running reference benchmarks..."
hyperfine "target/release/examples/chunk --input $file"

echo "" 
echo "Running parallel benchmarks..."
hyperfine "target/release/examples/parallel-chunk --input $file"

run the benchmark

./bench.sh 1 # the argument is the size of the random file in GB

Building examples...
   Compiling deduplication v0.14.5 (/Users/drbh/Projects/xet-core-tmp/deduplication)
   Compiling cas_object v0.1.0 (/Users/drbh/Projects/xet-core-tmp/cas_object)
   Compiling cas_client v0.14.5 (/Users/drbh/Projects/xet-core-tmp/cas_client)
   Compiling hub_client v0.1.0 (/Users/drbh/Projects/xet-core-tmp/hub_client)
   Compiling data v0.14.5 (/Users/drbh/Projects/xet-core-tmp/data)
    Finished `release` profile [optimized + debuginfo] target(s) in 10.88s
   Compiling data v0.14.5 (/Users/drbh/Projects/xet-core-tmp/data)
    Finished `release` profile [optimized + debuginfo] target(s) in 5.47s

Reference version... (last 10 lines)
fd27a0d46a8d835b52ab6bba351b8f977ac6a31ea26f2e56f558a4def0cf41c0 40657
290e4a68705cbb8ee95274f027bd17356438c4ab94c41b58ee72d22c0d6afb88 93564
5ae15e4077c349647cdc0f9f2272b91fdf6ab50f6ef67763b3c165ec925385e5 55783
8a5e75b8c173d1b466a2a66ba54853c317f8913a544c5d5ab547bff1dec5960f 59859
0b899aad6c918cd5a709c5bd5994c036f37509e8345ac503c2d059e766e06924 131072
4d0c6e8d974da6f53f959a55fbc735a484c09f6ce3dde5410ef505a590d3bbef 111800
02dc025bdb7afd7e0b32c545f1abd70015c728508c3aa6ec3a7cf08a6c52a250 89915
a602df3c6e333fdbe6434158945aa05ec82139bed8741336b128322423005112 29275
234ce19a584bb9996f3a909adada4dff8de8b30a296b8cbb4b31099af09501b3 36264
5090799ec0d6b1dba9a69048e6af617371a1305d98bb12c83659e86dcad1dce0 59004

Parallel version... (last 10 lines)
fd27a0d46a8d835b52ab6bba351b8f977ac6a31ea26f2e56f558a4def0cf41c0 40657
290e4a68705cbb8ee95274f027bd17356438c4ab94c41b58ee72d22c0d6afb88 93564
5ae15e4077c349647cdc0f9f2272b91fdf6ab50f6ef67763b3c165ec925385e5 55783
8a5e75b8c173d1b466a2a66ba54853c317f8913a544c5d5ab547bff1dec5960f 59859
0b899aad6c918cd5a709c5bd5994c036f37509e8345ac503c2d059e766e06924 131072
4d0c6e8d974da6f53f959a55fbc735a484c09f6ce3dde5410ef505a590d3bbef 111800
02dc025bdb7afd7e0b32c545f1abd70015c728508c3aa6ec3a7cf08a6c52a250 89915
a602df3c6e333fdbe6434158945aa05ec82139bed8741336b128322423005112 29275
234ce19a584bb9996f3a909adada4dff8de8b30a296b8cbb4b31099af09501b3 36264
5090799ec0d6b1dba9a69048e6af617371a1305d98bb12c83659e86dcad1dce0 59004

Running reference benchmarks...
Benchmark 1: target/release/examples/chunk --input /tmp/random_1.0gb.bin
  Time (mean ± σ):      1.169 s ±  0.017 s    [User: 1.015 s, System: 0.151 s]
  Range (min … max):    1.149 s …  1.205 s    10 runs


Running parallel benchmarks...
Benchmark 1: target/release/examples/parallel-chunk --input /tmp/random_1.0gb.bin
  Time (mean ± σ):     217.4 ms ±  10.4 ms    [User: 1317.2 ms, System: 264.6 ms]
  Range (min … max):   201.7 ms … 234.3 ms    14 runs

Benchmarks run on a MacBook M3 Max.

As the benchmarks above show, user time (~compute time) is of the same order in both cases (1.015 s vs. 1.317 s), but wall-clock time drops from 1.169 s to 217.4 ms (more than 5x faster) when the work is distributed across multiple threads.
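
To make the arithmetic behind that comparison explicit, here is a small sketch that derives the speedup and the average number of busy cores from the hyperfine figures reported above. The function names speedup and avg_busy_cores are hypothetical helpers, not part of this PR.

```rust
/// Wall-clock speedup of the parallel run over the reference run.
fn speedup(reference_wall_s: f64, parallel_wall_s: f64) -> f64 {
    reference_wall_s / parallel_wall_s
}

/// Average number of busy cores: CPU time (user + system) divided by wall-clock time.
fn avg_busy_cores(user_s: f64, system_s: f64, wall_s: f64) -> f64 {
    (user_s + system_s) / wall_s
}
```

With the numbers from the runs above, speedup(1.169, 0.2174) is about 5.4, avg_busy_cores(1.015, 0.151, 1.169) is just under 1.0 for the reference binary, and avg_busy_cores(1.3172, 0.2646, 0.2174) is about 7.3 for the parallel binary.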

PS: in local testing, the chunk boundaries and hash values matched the reference exactly in all cases. More tests to ensure correctness would increase confidence.
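
As one possible starting point for such tests, the sketch below compares two chunk listings captured from the binaries. It assumes the output format is one "<hash> <length>" pair per line, as in the sample output above; parse_chunks, first_mismatch, and total_len are hypothetical helper names, not functions from this PR.

```rust
/// Parses "<hex hash> <length>" lines into (hash, length) pairs.
/// Returns None if any non-empty line is malformed.
fn parse_chunks(output: &str) -> Option<Vec<(String, u64)>> {
    output
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| {
            let mut parts = l.split_whitespace();
            let hash = parts.next()?.to_string();
            let len: u64 = parts.next()?.parse().ok()?;
            Some((hash, len))
        })
        .collect()
}

/// Index of the first chunk where the two listings differ, or None if they are identical.
fn first_mismatch(a: &[(String, u64)], b: &[(String, u64)]) -> Option<usize> {
    a.iter()
        .zip(b)
        .position(|(x, y)| x != y)
        .or_else(|| (a.len() != b.len()).then(|| a.len().min(b.len())))
}

/// Sum of all chunk lengths; for a full listing this should equal the input file size.
fn total_len(chunks: &[(String, u64)]) -> u64 {
    chunks.iter().map(|(_, len)| len).sum()
}
```

A test could run both binaries on the same input, pass their stdout through parse_chunks, assert that first_mismatch returns None, and check total_len against the size of the input file.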

Opening this PR as a draft since it adds a new file, dependencies, and a feature flag, which may need to be refactored or changed. Looking forward to feedback; please let me know what needs to be updated to complete this PR.
