Commit 09ea310

Merge pull request #10 from spyoungtech/more_benches
add new benchmark
2 parents e86372a + 650dd4d commit 09ea310

6 files changed

+227
-189
lines changed

Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -31,6 +31,10 @@ regex = "1"
3131
name = "bench_compare"
3232
harness = false
3333

34+
[[bench]]
35+
name = "bench_deserialize"
36+
harness = false
37+
3438
[[example]]
3539
crate-type = ["bin"]
3640
path = "examples/json5-doublequote-fixer/src/main.rs"

README.md

Lines changed: 110 additions & 168 deletions
@@ -33,195 +33,137 @@ fn main() {
3333
{
3434
name: 'Hello',
3535
count: 42,
36-
maybe: NaN
36+
maybe: null
3737
}
3838
"#;
3939

4040
let parsed = from_str::<MyData>(source).unwrap();
41-
let expected = MyData {name: "Hello".to_string(), count: 42, maybe: Some(NaN)}
42-
assert_eq!(parsed, expected)
41+
let expected = MyData {name: "Hello".to_string(), count: 42, maybe: None};
42+
assert_eq!(parsed, expected);
4343
}
4444
```
45-
## Examples
46-
47-
See the `examples/` directory for examples of programs that utilize round-tripping features.
48-
49-
- `examples/json5-doublequote-fixer` gives an example of tokenization-based round-tripping edits
50-
- `examples/json5-trailing-comma-formatter` gives an example of model-based round-tripping edits
51-
52-
## Benchmarking
53-
54-
Benchmarks are available in the `benches/` directory. Test data is in the `data/` directory. A couple of benchmarks use
55-
big files that are not committed to this repo. So run `./data/setupdata.sh` to download the required data files
56-
so that you don't skip the big benchmarks. The benchmarks compare `json_five` (this crate) to
57-
[serde_json](https://github.com/serde-rs/json) and [json5-rs](https://github.com/callum-oakley/json5-rs).
58-
59-
Notwithstanding the general caveats of benchmarks, in initial testing, `json_five` outperforms `json5-rs`.
60-
In typical scenarios: 3-4x performance, it seems. At time of writing (pre- v0) no performance optimizations have been done. I
61-
expect performance to improve, if at least marginally, in the future.
62-
63-
These benchmarks were run on Windows on an i9-10900K. This table won't be updated unless significant changes happen.
6445

65-
| test | json_five | serde_json | json5 |
66-
|--------------------|---------------|---------------|---------------|
67-
| big (25MB) | 580.31 ms | 150.39 ms | 3.0861 s |
68-
| medium-ascii (5MB) | 199.88 ms | 59.008 ms | 706.94 ms |
69-
| empty | 228.62 ns | 38.786 ns | 708.00 ns |
70-
| arrays | 578.24 ns | 100.95 ns | 1.3228 µs |
71-
| objects | 922.91 ns | 205.75 ns | 2.0748 µs |
72-
| nested-array | 22.990 µs | 5.0483 µs | 29.356 µs |
73-
| nested-objects | 50.659 µs | 14.755 µs | 132.75 µs |
74-
| string | 421.17 ns | 91.051 ns | 3.5691 µs |
75-
| number | 238.75 ns | 36.179 ns | 779.13 ns |
76-
77-
78-
79-
# Round-trip model
80-
81-
The `rt` module contains the round-trip parser. This is intended to be ergonomic for round-trip use cases, although
82-
it is still very possible to use the default parser (which is more performance-oriented) for certain round-trip use cases.
83-
The round-trip AST model produced by the round-trip parser includes additional `context` fields that describe the whitespace, comments,
84-
and (where applicable) trailing commas on each production. Moreover, unlike the default parser, the AST consists
85-
entirely of owned types, allowing for simplified in-place editing.
86-
87-
88-
The `context` field holds a single field struct that contains the field `wsc` (meaning 'white space and comments')
89-
which holds a tuple of `String`s that represent the contextual whitespace and comments. The last element in
90-
the `wsc` tuple in the `context` of `JSONArrayValue` and `JSONKeyValuePair` objects is an `Option<String>` -- which
91-
is used as a marker to indicate an optional trailing comma and any whitespace that may follow that optional comma.
92-
93-
The `context` field is always an `Option`.
94-
95-
Contexts are associated with the following structs (which correspond to the JSON5 productions) and their context layout:
96-
97-
## `rt::parser::JSONText`
98-
99-
Represents the top-level Text production of a JSON5 document. It consists solely of a single (required) value.
100-
It may have whitespace/comments before or after the value. The `value` field contains any `JSONValue` and the `context`
101-
field contains the context struct containing the `wsc` field, a two-length tuple that describes the whitespace before and after the value.
102-
In other words: `{ wsc.0 } value { wsc.1 }`
46+
Serializing also works in the usual way. The re-exported `to_string` function comes from the `ser` module and works
47+
how you'd expect with default formatting.
10348

10449
```rust
105-
use json_five::rt::parser::from_str;
106-
use json_five::rt::parser::JSONValue;
107-
108-
let doc = from_str(" 'foo'\n").unwrap();
109-
let context = doc.context.unwrap();
110-
111-
assert_eq!(&context.wsc.0, " ");
112-
assert_eq!(doc.value, JSONValue::SingleQuotedString("foo".to_string()));
113-
assert_eq!(&context.wsc.1, "\n");
50+
use serde::Serialize;
51+
use json_five::to_string;
52+
#[derive(Serialize)]
53+
struct Test {
54+
int: u32,
55+
seq: Vec<&'static str>,
56+
}
57+
let test = Test {
58+
int: 1,
59+
seq: vec!["a", "b"],
60+
};
61+
let expected = r#"{"int": 1, "seq": ["a", "b"]}"#;
62+
assert_eq!(to_string(&test).unwrap(), expected);
11463
```
11564

65+
You may also use `to_string_formatted` with a `FormatConfiguration` to control the output format, including
66+
indentation, trailing commas, and key/item separators.
11667

117-
## `rt::parser::JSONValue::JSONObject`
118-
119-
Member of the `rt::parser::JSONValue` enum representing [JSON5 objects](https://spec.json5.org/#objects).
120-
121-
There are two fields: `key_value_pairs`, which is a `Vec` of `JSONKeyValuePair`s, and `context` whose `wsc` is
122-
a one-length tuple containing the whitespace/comments that occur after the opening brace. In non-empty objects,
123-
the whitespace that precedes the closing brace is part of the last item in the `key_value_pairs` Vec.
124-
In other words: `LBRACE { wsc.0 } [ key_value_pairs ] RBRACE`
125-
and: `.context.wsc: (String,)`
126-
127-
### `rt::parser::KeyValuePair`
128-
129-
The `KeyValuePair` struct represents the ['JSON5Member' production](https://spec.json5.org/#prod-JSON5Member).
130-
It has three fields: `key`, `value`, and `context`. The `key` is a `JSONValue`, in practice limited to `JSONValue::Identifier`,
131-
`JSONValue::DoubleQuotedString` or a `JSONValue::SingleQuotedString`. The `value` is any `JSONValue`.
132-
133-
Its context describes whitespace/comments that are between the key
134-
and `:`, between the `:` and the value, after the value, and (optionally) a trailing comma and whitespace trailing the
135-
comma.
136-
In other words, roughly: `key { wsc.0 } COLON { wsc.1 } value { wsc.2 } [ COMMA { wsc.3 } [ next_key_value_pair ] ]`
137-
and: `.context.wsc: (String, String, String, Option<String>)`
138-
139-
When `context.wsc.3` is `Some()`, it indicates the presence of a trailing comma (not included in the string) and
140-
whitespace that follows the comma. This item MUST be `Some()` when it is not the last member in the object.
141-
142-
## `rt::parser::JSONValue::JSONArray`
143-
144-
Member of the `rt::parser::JSONValue` enum representing [JSON5 arrays](https://spec.json5.org/#arrays).
145-
146-
There are two fields on this struct: `values`, which is of type `Vec<JSONArrayValue>`, and `context` which holds
147-
a one-length tuple containing the whitespace/comments that occur after the opening bracket. In non-empty arrays,
148-
the whitespace that precedes the closing bracket is part of the last item in the `values` Vec.
149-
In other words: `LBRACKET { wsc.0 } [ values ] RBRACKET`
150-
and: `.context.wsc: (String,)`
151-
152-
153-
### `rt::parser::JSONArrayValue`
154-
155-
The `JSONArrayValue` struct represents a single member of a JSON5 Array. It has two fields: `value`, which is any
156-
`JSONValue`, and `context` which contains the contextual whitespace/comments around the member. The `context`'s `wsc`
157-
field is a two-length tuple for the whitespace that may occur after the value and (optionally) after the comma following the value.
158-
In other words, roughly: `value { wsc.0 } [ COMMA { wsc.1 } [ next_value ]]`
159-
and: `.context.wsc: (String, Option<String>)`
160-
161-
When `context.wsc.1` is `Some()` it indicates the presence of the comma (not included in the string) and any whitespace
162-
following the comma is contained in the string. This item MUST be `Some()` when it is not the last member of the array.
163-
164-
## Other `rt::parser::JSONValue`s
165-
68+
```rust
69+
use serde::Serialize;
70+
use json_five::{to_string_formatted, FormatConfiguration, TrailingComma};
71+
#[derive(Serialize)]
72+
struct Test {
73+
int: u32,
74+
seq: Vec<&'static str>,
75+
}
76+
let test = Test {
77+
int: 1,
78+
seq: vec!["a", "b"],
79+
};
80+
81+
let config = FormatConfiguration::with_indent(4, TrailingComma::ALL);
82+
let formatted_doc = to_string_formatted(&test, config).unwrap();
83+
84+
let expected = r#"{
85+
"int": 1,
86+
"seq": [
87+
"a",
88+
"b",
89+
],
90+
}"#;
91+
92+
assert_eq!(formatted_doc, expected);
93+
```
16694

95+
## Examples
16796

168-
- `JSONValue::Integer(String)`
169-
- `JSONValue::Float(String)`
170-
- `JSONValue::Exponent(String)`
171-
- `JSONValue::Null`
172-
- `JSONValue::Infinity`
173-
- `JSONValue::NaN`
174-
- `JSONValue::Hexadecimal(String)`
175-
- `JSONValue::Bool(bool)`
176-
- `JSONValue::DoubleQuotedString(String)`
177-
- `JSONValue::SingleQuotedString(String)`
178-
- `JSONValue::Unary { operator: UnaryOperator, value: Box<JSONValue> }`
179-
- `JSONValue::Identifier(String)` (for object keys only!).
97+
See the `examples/` directory for examples of programs that utilize round-tripping features.
18098

181-
Where these enum members have `String`s, they represent the object as it was tokenized without any modifications (that
182-
is, for example, without any escape sequences un-escaped). The single- and double-quoted `String`s do not include the surrounding
183-
quote characters. These members alone have no `context`.
99+
- `examples/json5-doublequote-fixer` gives an example of tokenization-based round-tripping edits
100+
- `examples/json5-trailing-comma-formatter` gives an example of model-based round-tripping edits
184101

185-
# round-trip tokenizer
186102

187-
The `rt::tokenizer` module contains some useful tools for round-tripping tokens. The `Token`s produced by the
188-
rt tokenizer are owned types containing the lexeme from the source. There are two key functions in the tokenizer module:
103+
# Benchmarking
189104

190-
- `rt::tokenize::source_to_tokens`
191-
- `rt::tokenize::tokens_to_source`
105+
Benchmarks are available in the `benches/` directory. Test data is in the `data/` directory. A couple of benchmarks use
106+
big files that are not committed to this repo, so run `./data/setupdata.sh` to download the required data files
107+
so that you don't skip the big benchmarks. The benchmarks compare `json_five` (this crate) to
108+
[serde_json](https://github.com/serde-rs/json) and [json5-rs](https://github.com/callum-oakley/json5-rs).
192109

193-
Each `Token` generated from `source_to_tokens` also contains some contextual information, such as line/col numbers, offsets, etc.
194-
This contextual information is not required for `tokens_to_source` -- that is: you can create new tokens and insert them
195-
into your tokens array and process those tokens back to JSON5 source without issue.
110+
Notwithstanding the general caveats of benchmarks, in initial testing, `json_five` definitively outperforms `json5-rs`.
111+
In typical scenarios, `json_five` has been observed to be 3-4x faster, and up to 20x faster in some synthetic tests on extremely large data.
112+
At the time of writing (pre-v0), no performance optimizations have been done. I expect performance to improve,
113+
if at least marginally, in the future.
114+
115+
These benchmarks were run on Windows on an i9-10900K with rustc 1.83.0 (90b35a623 2024-11-26). This table won't be updated unless significant changes happen.
116+
117+
118+
| test | json_five | json5 | serde_json |
119+
|----------------------------|-----------|-----------|------------|
120+
| big (25MB) | 580.31 ms | 3.0861 s | 150.39 ms |
121+
| medium-ascii (5MB) | 199.88 ms | 706.94 ms | 59.008 ms |
122+
| empty | 228.62 ns | 708.00 ns | 38.786 ns |
123+
| arrays | 578.24 ns | 1.3228 µs | 100.95 ns |
124+
| objects | 922.91 ns | 2.0748 µs | 205.75 ns |
125+
| nested-array | 22.990 µs | 29.356 µs | 5.0483 µs |
126+
| nested-objects | 50.659 µs | 132.75 µs | 14.755 µs |
127+
| string | 421.17 ns | 3.5691 µs | 91.051 ns |
128+
| number | 238.75 ns | 779.13 ns | 36.179 ns |
129+
| deserialize (size 10) | 6.9898µs | 58.398µs | 886.33ns |
132+
| deserialize (size 100) | 66.005µs | 830.79µs | 9.9705µs |
135+
| deserialize (size 1000) | 599.39µs | 8.4952ms | 69.110µs |
138+
| deserialize (size 10000) | 5.9841ms | 82.591ms | 734.40µs |
141+
| deserialize (size 100000) | 66.841ms | 955.37ms | 11.638ms |
144+
| deserialize (size 1000000) | 674.13ms | 9.5758s | 119.03ms |
147+
| serialize (size 10) | 2.3496µs | 48.915µs | 891.85ns |
150+
| serialize (size 100) | 19.602µs | 458.98µs | 6.7109µs |
153+
| serialize (size 1000) | 194.19µs | 4.6035ms | 62.667µs |
156+
| serialize (size 10000) | 2.2104ms | 47.253ms | 761.10µs |
159+
| serialize (size 100000) | 24.418ms | 502.35ms | 11.410ms |
164+
| serialize (size 1000000) | 245.26ms | 4.6211s | 115.84ms |
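
The relative speedups quoted in the text can be recomputed from the table. The following is a minimal sketch using only the standard library. The helper names `to_nanos` and `speedup` are hypothetical, chosen for illustration; they are not part of `json_five` or of the benchmark harness. The unit spellings (`ns`, `µs`, `ms`, `s`) are assumed from the table above.

```rust
/// Convert a measurement such as "3.0861 s" or "58.398µs" into nanoseconds.
/// Returns None if the text cannot be parsed.
/// (Hypothetical helper, not part of json_five.)
fn to_nanos(text: &str) -> Option<f64> {
    let text = text.trim();
    // Split at the first character that is not part of the number.
    let split = text.find(|c: char| !(c.is_ascii_digit() || c == '.'))?;
    let (number, unit) = text.split_at(split);
    let value: f64 = number.parse().ok()?;
    // Accept both the micro sign and the Greek letter mu, plus plain "us".
    let scale = match unit.trim() {
        "ns" => 1.0,
        "µs" | "μs" | "us" => 1e3,
        "ms" => 1e6,
        "s" => 1e9,
        _ => return None,
    };
    Some(value * scale)
}

/// Ratio of `other` to `baseline`: how many times longer `other` took.
/// (Hypothetical helper, not part of json_five.)
fn speedup(baseline: &str, other: &str) -> Option<f64> {
    Some(to_nanos(other)? / to_nanos(baseline)?)
}

fn main() {
    // (test name, json_five, json5) copied from a few rows of the table.
    let rows = [
        ("big (25MB)", "580.31 ms", "3.0861 s"),
        ("nested-array", "22.990 µs", "29.356 µs"),
        ("string", "421.17 ns", "3.5691 µs"),
        ("deserialize (size 1000000)", "674.13ms", "9.5758s"),
        ("serialize (size 1000000)", "245.26ms", "4.6211s"),
    ];
    for (name, json_five, json5) in rows {
        match speedup(json_five, json5) {
            Some(ratio) => println!("{name}: json_five is {ratio:.1}x faster than json5"),
            None => println!("{name}: could not parse measurement"),
        }
    }
}
```

The ratio varies considerably by workload, so a single headline multiplier should be read alongside the per-row figures.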
196165

197-
The `tok_type` attribute leverages the same `json_five::tokenize::TokType` types. Those are:
198166

199-
- `LeftBrace`
200-
- `RightBrace`
201-
- `LeftBracket`
202-
- `RightBracket`
203-
- `Comma`
204-
- `Colon`
205-
- `Name` (Identifiers)
206-
- `SingleQuotedString`
207-
- `DoubleQuotedString`
208-
- `BlockComment`
209-
- `LineComment` note: the lexeme includes the singular trailing newline, if present (e.g., not a comment just before EOF with no newline at end of file)
210-
- `Whitespace`
211-
- `True`
212-
- `False`
213-
- `Null`
214-
- `Integer`
215-
- `Float`
216-
- `Infinity`
217-
- `Nan`
218-
- `Exponent`
219-
- `Hexadecimal`
220-
- `Plus`
221-
- `Minus`
222-
- `EOF`
223-
224-
Note: string tokens will include surrounding quotes.
225167

226168

227169
# Notes
