Parse a large uploaded gzipped CSV file in the browser #1074

The file size I'm using is about 200MB. I've already made this work by dumping everything into memory, but I want to use streams and put some custom logic into the parsing step to speed it up.

Doing this with streams is easy enough in Node:

```
import fs from 'fs'
import zlib from 'zlib'
import papaparse from 'papaparse'

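// Named filePath to match the createReadStream() call below
const filePath = "/path/to/file.csv.gz";
const data = [];

// With NODE_STREAM_INPUT, parse() returns a stream that can be piped into
const parser = papaparse.parse(papaparse.NODE_STREAM_INPUT);

parser.on('data', (chunk) => {
  data.push(chunk);
});

parser.on('end', () => {
  console.log(data);
});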

fs.createReadStream(filePath)
  .pipe(zlib.createGunzip())
  .pipe(parser);
```

To do the same in the browser, as far as I can tell, you need to turn the file from an `<input>` into a stream, push it through a `DecompressionStream('gzip').writable`, and then push that into a stream-capable `papaparse` parser. So I have this:

```
const fileInput = document.getElementById('selectFileBtn');

async function parseGzippedCsv(file) {
  const parser = papaparse.parse(papaparse.NODE_STREAM_INPUT)
  parser.on('data', (chunk) => {
      data.push(chunk);
      }
  );

  parser.on('end', () => {
      // I actually need to wrap in a Promise to make this work, but it fails before it gets here
      resolve(data);
    }
  );
  
  file.stream()        
    .pipeTo(new DecompressionStream('gzip').writable)
    .pipeTo(parser);
}

fileInput.addEventListener('change', async function(e) {
  const file = e.target.files[0];
  const data = await parseGzippedCsv(file);
});
```

The error is: `TypeError: Cannot read properties of null (reading 'stream')` on the line that creates the `papaparse` parser, so maybe I can't use `papaparse.NODE_STREAM_INPUT` in the browser...? 

I've also tried to do something similar with `csv-parse` without success. I'm a bit surprised nobody else wants to do this in the browser 🤔  
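
For comparison, here is a minimal sketch of the decompress-and-read half using only web-standard stream APIs, with no CSV library involved. Two things differ from the attempt above: `pipeTo()` returns a Promise rather than a stream, so it cannot be chained, whereas `pipeThrough()` returns the readable side of the transform; and `pipeThrough()` takes the `DecompressionStream` itself rather than its `.writable`. The names `streamGzippedLines` and `onLine` are hypothetical, and the sketch assumes UTF-8 text with no newlines inside quoted fields, so it is a starting point for custom parsing logic and not a full CSV parser. Whether `papaparse.NODE_STREAM_INPUT` can be made to work in the browser is left open here.

```javascript
// Sketch only: stream a gzipped text File/Blob line by line.
// `streamGzippedLines` and `onLine` are hypothetical names, not part of any library.
async function streamGzippedLines(file, onLine) {
  const reader = file
    .stream()                                      // ReadableStream of compressed bytes
    .pipeThrough(new DecompressionStream('gzip'))  // pipeThrough returns the readable side
    .pipeThrough(new TextDecoderStream())          // bytes -> text chunks (UTF-8 by default)
    .getReader();

  let buffered = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffered += value;
    const lines = buffered.split('\n');
    buffered = lines.pop();                        // last piece may be an incomplete line
    for (const line of lines) {
      onLine(line.replace(/\r$/, ''));             // tolerate CRLF line endings
    }
  }
  // Emit whatever is left if the file does not end with a newline
  if (buffered !== '') onLine(buffered.replace(/\r$/, ''));
}
```

In the browser this could be called from the `change` handler as `await streamGzippedLines(e.target.files[0], (line) => rows.push(line.split(',')))`, where `rows` is an array declared by the caller and the plain `split(',')` again assumes no quoted commas. Only one decoded chunk plus a partial line is buffered at a time, so memory use is driven by what `onLine` chooses to keep.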
