The file I'm working with is about 200 MB. I've already made this work by reading everything into memory, but I want to use streams and add some custom logic to the parsing step to speed it up.

Doing this with streams is easy enough in Node:
```javascript
import fs from 'fs'
import zlib from 'zlib'
import papaparse from 'papaparse'

const filePath = '/path/to/file.csv.gz'
const data = []

const parser = papaparse.parse(papaparse.NODE_STREAM_INPUT)
parser.on('data', (chunk) => {
  data.push(chunk)
})
parser.on('end', () => {
  console.log(data)
})

fs.createReadStream(filePath)
  .pipe(zlib.createGunzip())
  .pipe(parser)
```
To do the same in the browser (as far as I can tell), you need to turn an `<input>` file into a stream, push it through a `DecompressionStream('gzip')`, and then pipe that into a stream-capable papaparse parser. So I have this:
```javascript
const fileInput = document.getElementById('selectFileBtn');

async function parseGzippedCsv(file) {
  // Wrapped in a Promise so the 'end' callback can resolve with the data,
  // but it fails before it ever gets here.
  return new Promise((resolve) => {
    const data = [];
    const parser = papaparse.parse(papaparse.NODE_STREAM_INPUT);
    parser.on('data', (chunk) => {
      data.push(chunk);
    });
    parser.on('end', () => {
      resolve(data);
    });
    file.stream()
      .pipeTo(new DecompressionStream('gzip').writable)
      .pipeTo(parser);
  });
}

fileInput.addEventListener('change', async function (e) {
  const file = e.target.files[0];
  const data = await parseGzippedCsv(file);
});
```
The error is `TypeError: Cannot read properties of null (reading 'stream')` on the line that creates the papaparse parser, so maybe I can't use `papaparse.NODE_STREAM_INPUT` in the browser...?
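As a sanity check, the decompression half does work on its own if the streams are chained with `pipeThrough` instead of two `pipeTo` calls (`pipeTo` returns a Promise, not a stream, so chaining another `pipeTo` off it can't work anyway). A minimal sketch using only the standard streams API, no papaparse (the `gunzipToText` name is mine; this also runs in Node 18+, where `DecompressionStream` is global):

```javascript
// Sketch: decompress a gzipped ReadableStream using only standard web streams.
// pipeThrough connects a TransformStream (here DecompressionStream) and returns
// its readable side, so further chaining is possible; pipeTo returns a Promise.
async function gunzipToText(readable) {
  const decompressed = readable.pipeThrough(new DecompressionStream('gzip'));
  // Response is a convenient built-in way to collect a ReadableStream as text.
  return await new Response(decompressed).text();
}
```

So the remaining question is only how to feed the decompressed stream into the parser.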
I've also tried to do something similar with `csv-parse`, without success. I'm a bit surprised nobody else wants to do this in the browser 🤔
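For context, the kind of custom per-row logic I want to slot in can be sketched with a plain `TransformStream` standing in for the parser (the `lineSplitter` name is made up, and the splitting is naive; a real CSV parser also has to handle quoted fields and embedded newlines):

```javascript
// Sketch: a TransformStream that turns decoded text chunks into CSV rows.
// Naive split on '\n' and ','; quoting rules are deliberately ignored here.
function lineSplitter() {
  let buffer = '';
  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      for (const line of lines) {
        if (line.length > 0) controller.enqueue(line.split(','));
      }
    },
    flush(controller) {
      if (buffer.length > 0) controller.enqueue(buffer.split(','));
    },
  });
}

// Intended browser usage:
// file.stream()
//   .pipeThrough(new DecompressionStream('gzip'))
//   .pipeThrough(new TextDecoderStream())
//   .pipeThrough(lineSplitter())
```

That pipeline works, which makes it look like the missing piece is purely on the parser side.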