diff --git a/README.md b/README.md index 9fcd4c6f..bafce32b 100644 --- a/README.md +++ b/README.md @@ -36,6 +36,7 @@ This module provides the following parsers: * [Raw body parser](#bodyparserrawoptions) * [Text body parser](#bodyparsertextoptions) * [URL-encoded form body parser](#bodyparserurlencodedoptions) + * [Generic body parser](#bodyparsergenericoptions) Other body parsers you might be interested in: @@ -295,6 +296,84 @@ form. Defaults to `false`. The `depth` option is used to configure the maximum depth of the `qs` library when `extended` is `true`. This allows you to limit the amount of keys that are parsed and can be useful to prevent certain types of abuse. Defaults to `32`. It is recommended to keep this value as low as possible. +### bodyParser.generic([options]) + +Returns middleware that enables you to create custom body parsers while inheriting all the common functionality from body-parser. This is the recommended approach for building custom parsers for content types not covered by the built-in parsers. + +The generic parser handles all the common concerns like: +- Content-type matching +- Charset detection and handling +- Body size limits and validation +- Decompression (gzip, brotli, deflate) +- Error standardization +- Buffer handling + +You only need to provide the type to match and the parsing logic specific to your content type. + +A new `body` object containing the parsed data is populated on the `request` object after the middleware (i.e. `req.body`). The structure of this object depends on what your custom parse function returns. + +See [Custom parser examples](#custom-parser-examples) for examples of creating your own parsers based on the generic parser. + +#### Options + +The `generic` function takes an `options` object that may contain any of the following keys: + +##### parse + +**Required.** The `parse` function that converts the raw request body buffer into a JavaScript object. 
This function receives two arguments: + +```js +function parse (buffer, charset) { + // Convert Buffer to parsed object + return parsedObject +} +``` + +- `buffer`: A Buffer containing the raw request body +- `charset`: The charset detected from the `Content-Type` header, or `defaultCharset` when the header specifies none +- Return value becomes `req.body` +- **IMPORTANT**: This function MUST be synchronous and return the parsed result directly + - It cannot be an `async` function + - It cannot return a Promise + +Your parse function will be called even for empty bodies (with a zero-length buffer), but not for requests that carry no body at all (like typical GET requests). + +For empty bodies, consider following these conventions: +- For JSON-like parsers: Return `{}` (empty object) +- For text/raw-like parsers: Return the empty buffer/string as-is + +##### type + +**Required.** The `type` option is used to determine what media type the middleware will parse. This option can be: + +- A string mime type (like `application/xml`) +- An extension name (like `xml`) +- A mime type with a wildcard (like `*/xml`) +- An array of any of the above +- A function that takes a request and returns a boolean + +If not a function, the `type` option is passed directly to the [type-is](https://www.npmjs.org/package/type-is#readme) library. If a function, it will be called as `fn(req)` and the request is parsed if it returns a truthy value. + +##### defaultCharset + +Specify the default character set for the content if the charset is not specified in the `Content-Type` header of the request. Defaults to `utf-8`. + +##### charset + +If specified, the charset of the request must match this option. The request charset is lowercased before the comparison, so specify this option in lowercase (for example `utf-8`). If the request charset doesn't match, a 415 Unsupported Media Type error is returned. + +##### inflate + +When set to `true`, then deflated (compressed) bodies will be inflated; when `false`, deflated bodies are rejected. Defaults to `true`. + +##### limit + +Controls the maximum request body size. 
If this is a number, then the value specifies the number of bytes; if it is a string, the value is passed to the [bytes](https://www.npmjs.com/package/bytes) library for parsing. Defaults to `'100kb'`. + +##### verify + +The `verify` option, if supplied, is called as `verify(req, res, buf, encoding)`, where `buf` is a `Buffer` of the raw request body and `encoding` is the encoding of the request. The parsing can be aborted by throwing an error. + ## Errors The middlewares provided by this module create errors using the @@ -474,6 +553,143 @@ app.use(bodyParser.raw({ type: 'application/vnd.custom-type' })) app.use(bodyParser.text({ type: 'text/html' })) ``` +### Custom parser examples + +These examples demonstrate how to create parsers for custom content types while leveraging the generic parser for common HTTP concerns. + +#### XML Parser Example + +Create a custom middleware for parsing XML requests: + +```js +const express = require('express') +const bodyParser = require('body-parser') +const xmljs = require('xml-js') + +const app = express() + +// Create XML parser middleware +const xmlParser = bodyParser.generic({ + // Accept application/xml, text/xml, and any media type with a +xml suffix + type: ['application/xml', 'text/xml', '+xml'], + + // Set limits to prevent abuse + limit: '500kb', + + parse: function (buf, charset) { + // Handle empty body case + if (buf.length === 0) return {} + + try { + const result = xmljs.xml2js(buf.toString(charset), { + compact: true, + trim: true, + nativeType: true + }) + return result + } catch (err) { + const error = new Error(`Invalid XML: ${err.message}`) + error.status = 400 + throw error + } + } +}) + +// Use parser in routes +app.post('/api/xml', xmlParser, function (req, res) { + res.json(req.body) +}) +``` + + +#### Creating Your Own Parser Module + +You can also create your own reusable parser module similar to the built-in parsers: + +```js +const bodyParser = require('body-parser') + +// Create a factory function for CSV parsing middleware 
+function csvParser (options) { + const opts = options || {} + + const delimiter = opts.delimiter || ',' + const hasHeaders = opts.hasHeaders !== false + + return bodyParser.generic({ + type: opts.type || ['text/csv', 'application/csv'], + limit: opts.limit, + inflate: opts.inflate, + verify: opts.verify, + + parse: function (buf, charset) { + // Handle empty body + if (buf.length === 0) return { rows: [] } + + try { + const csvText = buf.toString(charset) + const lines = csvText.split(/\r?\n/).filter(line => line.trim()) + + if (lines.length === 0) return { rows: [] } + + if (hasHeaders) { + const headers = lines[0].split(delimiter) + const rows = lines.slice(1).map(line => { + const values = line.split(delimiter) + const row = {} + + headers.forEach((header, i) => { + row[header] = values[i] + }) + + return row + }) + + return { headers, rows } + } else { + const rows = lines.map(line => line.split(delimiter)) + return { rows } + } + } catch (err) { + const error = new Error(`CSV parse error: ${err.message}`) + error.status = 400 + throw error + } + } + }) +} + +module.exports = csvParser +``` + +Using the custom parser module: + +```js +const express = require('express') +const csvParser = require('./csv-parser') + +const app = express() + +// Use with default options +app.post('/api/csv', csvParser(), function (req, res) { + res.json({ + rowCount: req.body.rows.length, + data: req.body + }) +}) + +// Or with custom options +app.post('/api/customcsv', csvParser({ + limit: '250kb', + delimiter: ';', + hasHeaders: true +}), function (req, res) { + res.json(req.body) +}) +``` + +This pattern makes it easy to create reusable, configurable custom parsers that follow the same interface as the built-in parsers. 
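The parsing logic itself can be kept as a plain synchronous function and exercised without any HTTP machinery. The following is a minimal sketch, assuming a hypothetical line-based `key=value` format; the name `parseKeyValue` and the format are illustrative and not part of body-parser. It follows the conventions described for the `parse` option: it takes `(buf, charset)`, returns `{}` for a zero-length buffer, and throws synchronously on malformed input.

```js
// Hypothetical parse function for a line-based `key=value` format.
// The function name and the format are assumptions made for illustration.
function parseKeyValue (buf, charset) {
  // Convention for JSON-like parsers: an empty body yields an empty object
  if (buf.length === 0) return {}

  const result = {}
  const lines = buf.toString(charset).split(/\r?\n/)

  for (const line of lines) {
    // Ignore blank lines, including the one after a trailing newline
    if (line.trim() === '') continue

    const idx = line.indexOf('=')
    if (idx === -1) {
      // Throw synchronously; no Promise or async function is involved
      const error = new Error('Invalid line: ' + line)
      error.status = 400
      throw error
    }

    // Everything before the first "=" is the key, the rest is the value
    result[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }

  return result
}
```

A function like this could then be supplied as the `parse` option, for example `bodyParser.generic({ type: 'text/plain', parse: parseKeyValue })`. Because it has no dependency on the request object, it can be unit tested by passing a `Buffer` directly.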
+ ## License [MIT](LICENSE) @@ -488,4 +704,4 @@ app.use(bodyParser.text({ type: 'text/html' })) [npm-url]: https://npmjs.org/package/body-parser [npm-version-image]: https://badgen.net/npm/v/body-parser [ossf-scorecard-badge]: https://api.scorecard.dev/projects/github.com/expressjs/body-parser/badge -[ossf-scorecard-visualizer]: https://ossf.github.io/scorecard-visualizer/#/projects/github.com/expressjs/body-parser \ No newline at end of file +[ossf-scorecard-visualizer]: https://ossf.github.io/scorecard-visualizer/#/projects/github.com/expressjs/body-parser diff --git a/index.js b/index.js index d722d0b2..b08b51b7 100644 --- a/index.js +++ b/index.js @@ -13,6 +13,7 @@ * @property {function} raw * @property {function} text * @property {function} urlencoded + * @property {function} generic */ /** @@ -66,6 +67,17 @@ Object.defineProperty(exports, 'urlencoded', { get: () => require('./lib/types/urlencoded') }) +/** + * Generic parser for custom body formats. + * @public + */ + +Object.defineProperty(exports, 'generic', { + configurable: true, + enumerable: true, + get: () => require('./lib/generic') +}) + /** * Create a middleware to parse json and urlencoded bodies. * diff --git a/lib/generic.js b/lib/generic.js new file mode 100644 index 00000000..9d1a390b --- /dev/null +++ b/lib/generic.js @@ -0,0 +1,153 @@ +/*! + * body-parser + * Copyright(c) 2014 Jonathan Ong + * Copyright(c) 2014-2015 Douglas Christopher Wilson + * MIT Licensed + */ + +'use strict' + +/** + * Module dependencies. + * @private + */ + +const contentType = require('content-type') +const createError = require('http-errors') +const debug = require('debug')('body-parser:generic') +const { isFinished } = require('on-finished') +const read = require('./read') +const typeis = require('type-is') +const utils = require('./utils') + +/** + * Module exports. + * @public + */ + +module.exports = generic + +/** + * Create a middleware to parse request bodies. 
+ * + * @param {Object} [options] + * @param {Function} [options.parse] Function to parse body (required). This function: + * - Receives (buffer, charset) as arguments + * - Must be synchronous (cannot be async or return a Promise) + * - Will be called for requests with empty bodies (zero-length buffer) + * - Will NOT be called for requests with no body at all (e.g., typical GET requests) + * - Return value becomes req.body + * @param {String|Function} [options.type] Request content-type to match (required) + * @param {String|Number} [options.limit] Maximum request body size + * @param {Boolean} [options.inflate] Enable handling compressed bodies + * @param {Function} [options.verify] Verify body content + * @param {String} [options.defaultCharset] Default charset when not specified + * @param {String} [options.charset] Expected charset (will respond with 415 if not matched) + * @return {Function} middleware + * @public + */ + +function generic (options) { + // === STEP 0: VALIDATE OPTIONS === + const opts = options || {} + + if (typeof opts.parse !== 'function') { + throw new TypeError('option parse must be a function') + } + + // For generic parser, type is a required option + if (opts.type === undefined) { + throw new TypeError('option type must be specified for generic parser') + } + + // === CONFIGURE PARSER OPTIONS === + const defaultCharset = opts.defaultCharset || 'utf-8' + + // Use the common options normalization function + const { inflate, limit, verify, shouldParse } = utils.normalizeOptions(opts, opts.type) + + debug('creating parser with options %j', { + limit, + inflate, + defaultCharset + }) + + return function genericParser (req, res, next) { + // === STEP 1: REQUEST EVALUATION === + if (isFinished(req)) { + debug('request already finished') + next() + return + } + + // Initialize body property if not exists + if (!('body' in req)) { + debug('initializing body property') + req.body = undefined + } + + // Skip empty bodies + if 
(!typeis.hasBody(req)) { + debug('skip empty body') + next() + return + } + + // === STEP 2: CONTENT TYPE MATCHING === + debug('content-type %j', req.headers['content-type']) + + if (!shouldParse(req)) { + debug('skip parsing: content-type mismatch') + next() + return + } + + // === STEP 3: CHARSET DETECTION === + let charset + try { + const ct = contentType.parse(req) + charset = (ct.parameters.charset || defaultCharset).toLowerCase() + debug('charset %s', charset) + } catch (err) { + debug('charset error: %s', err.message) + charset = defaultCharset + } + + // Check if charset is supported + if (opts.charset !== undefined && opts.charset !== charset) { + debug('unsupported charset %s (expecting %s)', charset, opts.charset) + next(createError(415, 'unsupported charset "' + charset.toUpperCase() + '"', { + charset: charset, + type: 'charset.unsupported' + })) + return + } + + // === STEP 4 & 5: BODY READING AND PARSING === + // The read function handles the actual body reading + // and passes the result to our parse function + read(req, res, next, function parseBody (buf) { + debug('parse %d byte body', buf.length) + + try { + // Call the parse function + const result = opts.parse(buf, charset) + debug('parsed as %o', result) + return result + } catch (err) { + debug('parse error: %s', err.message) + + throw createError(400, err.message, { + body: buf.toString().substring(0, 100), + charset, + type: 'entity.parse.failed' + }) + } + }, debug, { + encoding: charset, + inflate, + limit, + verify + }) + } +} diff --git a/test/generic.js b/test/generic.js new file mode 100644 index 00000000..1d092050 --- /dev/null +++ b/test/generic.js @@ -0,0 +1,443 @@ +'use strict' + +const assert = require('node:assert') +const http = require('node:http') +const request = require('supertest') + +const { generic } = require('..') + +const PARSERS = { + // Reverses the input string + reverse: (buf, charset) => buf.toString(charset).split('').reverse().join(''), + + json: (buf, charset) 
=> JSON.parse(buf.toString(charset)) +} + +// Tracks if a function was called +function trackCall (fn) { + let called = false + const wrapped = function (...args) { + called = true + return fn?.(...args) + } + wrapped.called = () => called + return wrapped +} + +describe('generic()', function () { + it('should reject without parse function', function () { + assert.throws(function () { + generic() + }, /option parse must be a function/) + }) + + it('should reject without type option', function () { + assert.throws(function () { + generic({ parse: function () {} }) + }, /option type must be specified for generic parser/) + }) + + describe('core functionality', function () { + it('should provide request body to parse function and use result as req.body', function (done) { + const testResult = { parsed: true, id: Date.now() } + const testBody = 'hello parser' + + const parseFn = trackCall(function (body, charset) { + const content = body.toString(charset) + assert.strictEqual(content, testBody, 'should receive request body content') + return testResult + }) + + const server = createServer(parseFn) + + request(server) + .post('/') + .set('Content-Type', 'text/plain') + .send(testBody) + .expect(200) + .end(function (err, res) { + if (err) return done(err) + assert(parseFn.called(), 'parse function should be called') + const body = JSON.parse(res.text) + assert.deepStrictEqual(body, testResult, 'parse result should become req.body') + done() + }) + }) + + describe('request body handling', function () { + it('should call parse function with empty buffer for Content-Length: 0', function (done) { + const parseFn = trackCall(function (buf, _charset) { + assert.strictEqual(buf.length, 0, 'buffer should be empty') + // Return empty object like JSON/URL-encoded parsers do + return { empty: true } + }) + + const server = createServer(parseFn) + + request(server) + .post('/') // Using POST with empty body + .set('Content-Type', 'text/plain') + .set('Content-Length', '0') + 
.expect(200, '{"empty":true}') + .end(function (err) { + if (err) return done(err) + assert(parseFn.called(), 'parse function should be called for empty body') + done() + }) + }) + + it('should call parse function with empty buffer for chunked encoding', function (done) { + const parseFn = trackCall(function (buf, _charset) { + assert.strictEqual(buf.length, 0, 'buffer should be empty') + // Return empty object like JSON/URL-encoded parsers do + return { empty: true } + }) + + const server = createServer(parseFn) + + request(server) + .post('/') // Using POST with empty body + .set('Content-Type', 'text/plain') + .set('Transfer-Encoding', 'chunked') + .expect(200, '{"empty":true}') + .end(function (err) { + if (err) return done(err) + assert(parseFn.called(), 'parse function should be called for empty body') + done() + }) + }) + + it('should NOT call parse function for requests with no body concept', function (done) { + const parseFn = trackCall(function (buf, _charset) { + return { called: true } + }) + + const server = createServer(parseFn) + + request(server) + .get('/') // GET with no body concept + .expect(200, 'undefined') + .end(function (err) { + if (err) return done(err) + assert.strictEqual(parseFn.called(), false, 'parse function should not be called for no-body requests') + done() + }) + }) + }) + }) + + describe('error handling', function () { + it('should return 400 for parsing errors', function (done) { + const server1 = createServer(function (_buf) { + throw new Error('parse error') + }) + + const server2 = createServer({ + type: 'application/json' + }, PARSERS.json) + + request(server1) + .post('/') + .set('Content-Type', 'text/plain') + .send('hello') + .expect(400, '[entity.parse.failed] parse error') + .end(function (err) { + if (err) return done(err) + + request(server2) + .post('/') + .set('Content-Type', 'application/json') + .send('{"broken": "json') + .expect(400) + .expect(function (res) { + assert(res.text.includes('entity.parse.failed')) 
+ }) + .end(done) + }) + }) + + it('should 413 when body too large', function (done) { + const server = createServer({ limit: '1kb' }, function (buf, charset) { + return buf.toString(charset) + }) + + const largeText = new Array(1024 * 10 + 1).join('x') + + request(server) + .post('/') + .set('Content-Type', 'text/plain') + .send(largeText) + .expect(413, '[entity.too.large] request entity too large', done) + }) + }) + + describe('content-type matching', function () { + it('should match exact content type', function (done) { + const server = createServer({ + type: 'text/markdown' + }, _buf => 'markdown matched') + + request(server) + .post('/') + .set('Content-Type', 'text/markdown') + .send('# heading') + .expect(200, '"markdown matched"', done) + }) + + it('should match custom media type', function (done) { + const server = createServer({ + type: 'application/vnd.custom+plain' + }, function (buf, charset) { + return buf.toString(charset) + }) + + request(server) + .post('/') + .set('Content-Type', 'application/vnd.custom+plain') + .send('custom format') + .expect(200, '"custom format"', done) + }) + + it('should not parse when content-type does not match', function (done) { + const server = createServer({ + type: 'text/markdown' + }, _buf => 'should not be called') + + request(server) + .post('/') + .set('Content-Type', 'text/html') + .send('