@@ -33,195 +33,137 @@ fn main() {
33
33
{
34
34
name: 'Hello',
35
35
count: 42,
36
- maybe: NaN
36
+ maybe: null
37
37
}
38
38
"#;
39
39
40
40
let parsed = from_str::<MyData>(source).unwrap();
- let expected = MyData { name: "Hello".to_string(), count: 42, maybe: Some(NaN) }
- assert_eq!(parsed, expected)
+ let expected = MyData { name: "Hello".to_string(), count: 42, maybe: None };
+ assert_eq!(parsed, expected);
43
43
}
44
44
```
45
- ## Examples
46
-
47
- See the ` examples/ ` directory for examples of programs that utilize round-tripping features.
48
-
49
- - ` examples/json5-doublequote-fixer ` gives an example of tokenization-based round-tripping edits
50
- - ` examples/json5-trailing-comma-formatter ` gives an example of model-based round-tripping edits
51
-
52
- ## Benchmarking
53
-
54
- Benchmarks are available in the ` benches/ ` directory. Test data is in the ` data/ ` directory. A couple of benchmarks use
55
- big files that are not committed to this repo. So run ` ./data/setupdata.sh ` to download the required data files
56
- so that you don't skip the big benchmarks. The benchmarks compare ` json_five ` (this crate) to
57
- [ serde_json] ( https://github.com/serde-rs/json ) and [ json5-rs] ( https://github.com/callum-oakley/json5-rs ) .
58
-
59
- Notwithstanding the general caveats of benchmarks, in initial testing, ` json_five ` outperforms ` json5-rs ` .
60
- In typical scenarios: 3-4x performance, it seems. At time of writing (pre- v0) no performance optimizations have been done. I
61
- expect performance to improve, if at least marginally, in the future.
62
-
63
- These benchmarks were run on Windows on an i9-10900K. This table won't be updated unless significant changes happen.
64
45
65
- | test | json_five | serde_json | json5 |
66
- | --------------------| ---------------| ---------------| ---------------|
67
- | big (25MB) | 580.31 ms | 150.39 ms | 3.0861 s |
68
- | medium-ascii (5MB) | 199.88 ms | 59.008 ms | 706.94 ms |
69
- | empty | 228.62 ns | 38.786 ns | 708.00 ns |
70
- | arrays | 578.24 ns | 100.95 ns | 1.3228 µs |
71
- | objects | 922.91 ns | 205.75 ns | 2.0748 µs |
72
- | nested-array | 22.990 µs | 5.0483 µs | 29.356 µs |
73
- | nested-objects | 50.659 µs | 14.755 µs | 132.75 µs |
74
- | string | 421.17 ns | 91.051 ns | 3.5691 µs |
75
- | number | 238.75 ns | 36.179 ns | 779.13 ns |
76
-
77
-
78
-
79
- # Round-trip model
80
-
81
- The ` rt ` module contains the round-trip parser. This is intended to be ergonomic for round-trip use cases, although
82
- it is still very possible to use the default parser (which is more performance-oriented) for certain round-trip use cases.
83
- The round-trip AST model produced by the round-trip parser includes additional ` context ` fields that describe the whitespace, comments,
84
- and (where applicable) trailing commas on each production. Moreover, unlike the default parser, the AST consists
85
- entirely of owned types, allowing for simplified in-place editing.
86
-
87
-
88
- The ` context ` field holds a single field struct that contains the field ` wsc ` (meaning 'white space and comments')
89
- which holds a tuple of ` String ` s that represent the contextual whitespace and comments. The last element in
90
- the ` wsc ` tuple in the ` context ` of ` JSONArrayValue ` and ` JSONKeyValuePair ` objects is an ` Option<String> ` -- which
91
- is used as a marker to indicate an optional trailing comma and any whitespace that may follow that optional comma.
92
-
93
- The ` context ` field is always an ` Option ` .
94
-
95
- Contexts are associated with the following structs (which correspond to the JSON5 productions) and their context layout:
96
-
97
- ## ` rt::parser::JSONText `
98
-
99
- Represents the top-level Text production of a JSON5 document. It consists solely of a single (required) value.
100
- It may have whitespace/comments before or after the value. The ` value ` field contains any ` JSONValue ` and the ` context `
101
- field contains the context struct containing the ` wsc ` field, a two-length tuple that describes the whitespace before and after the value.
102
- In other words: ` { wsc.0 } value { wsc.1 } `
46
+ Serialization works the same way. The re-exported `to_string` function comes from the `ser` module and
+ serializes any `Serialize` value using the default formatting.
103
48
104
49
``` rust
105
- use json_five::rt::parser::from_str;
- use json_five::rt::parser::JSONValue;
-
- let doc = from_str("'foo'\n").unwrap();
- let context = doc.context.unwrap();
-
- assert_eq!(&context.wsc.0, "");
- assert_eq!(doc.value, JSONValue::SingleQuotedString("foo".to_string()));
- assert_eq!(&context.wsc.1, "\n");
50
+ use serde::Serialize;
+ use json_five::to_string;
+ #[derive(Serialize)]
+ struct Test {
+     int: u32,
+     seq: Vec<&'static str>,
+ }
+ let test = Test {
+     int: 1,
+     seq: vec!["a", "b"],
+ };
+ // Default formatting: a single line with a space after each ':' and ','.
+ let expected = r#"{"int": 1, "seq": ["a", "b"]}"#;
+ assert_eq!(to_string(&test).unwrap(), expected);
114
63
```
115
64
65
+ You may also use `to_string_formatted` with a `FormatConfiguration` to control the output format, including
+ indentation, trailing commas, and key/item separators.
116
67
117
- ## ` rt::parser::JSONValue::JSONObject `
118
-
119
- Member of the ` rt::parser::JSONValue ` enum representing [ JSON5 objects] ( https://spec.json5.org/#objects ) .
120
-
121
- There are two fields: ` key_value_pairs ` , which is a ` Vec ` of ` JSONKeyValuePair ` s, and ` context ` whose ` wsc ` is
122
- a one-length tuple containing the whitespace/comments that occur after the opening brace. In non-empty objects,
123
- the whitespace that precedes the closing brace is part of the last item in the ` key_value_pairs ` Vec.
124
- In other words: ` LBRACE { wsc.0 } [ key_value_pairs ] RBRACE `
125
- and: ` .context.wsc: (String,) `
126
-
127
- ### ` rt::parser::KeyValuePair `
128
-
129
- The ` KeyValuePair ` struct represents the [ 'JSON5Member' production] ( https://spec.json5.org/#prod-JSON5Member ) .
130
- It has three fields: ` key ` , ` value ` , and ` context ` . The ` key ` is a ` JSONValue ` , in practice limited to ` JSONValue::Identifier ` ,
131
- ` JSONValue::DoubleQuotedString ` or a ` JSONValue::SingleQuotedString ` . The ` value ` is any ` JSONValue ` .
132
-
133
- Its context describes whitespace/comments that are between the key
134
- and ` : ` , between the ` : ` and the value, after the value, and (optionally) a trailing comma and whitespace trailing the
135
- comma.
136
- In other words, roughly: ` key { wsc.0 } COLON { wsc.1 } value { wsc.2 } [ COMMA { wsc.3 } [ next_key_value_pair ] ] `
137
- and: ` .context.wsc: (String, String, String, Option<String>) `
138
-
139
- When ` context.wsc.3 ` is ` Some() ` , it indicates the presence of a trailing comma (not included in the string) and
140
- whitespace that follows the comma. This item MUST be ` Some() ` when it is not the last member in the object.
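
The layout above can be illustrated with a small self-contained sketch. The `PairSketch` struct and `render_pair` function below are hypothetical stand-ins written for this illustration, not the crate's actual types; they only mirror the described `wsc` tuple to show how a member could be written back to source.

```rust
// Hypothetical stand-in for the key/value pair layout described above.
// This is NOT the crate's type; keys and values are kept as raw lexemes.
struct PairSketch {
    key: String,
    value: String,
    // (after key, after colon, after value, after optional trailing comma)
    wsc: (String, String, String, Option<String>),
}

// Re-emit the member as: key {wsc.0} COLON {wsc.1} value {wsc.2} [COMMA {wsc.3}]
fn render_pair(pair: &PairSketch) -> String {
    let mut out = String::new();
    out.push_str(&pair.key);
    out.push_str(&pair.wsc.0);
    out.push(':');
    out.push_str(&pair.wsc.1);
    out.push_str(&pair.value);
    out.push_str(&pair.wsc.2);
    // `Some` marks the presence of a comma; the comma itself is not stored.
    if let Some(after_comma) = &pair.wsc.3 {
        out.push(',');
        out.push_str(after_comma);
    }
    out
}

fn main() {
    let pair = PairSketch {
        key: "name".to_string(),
        value: "'Hello'".to_string(),
        wsc: (String::new(), " ".to_string(), String::new(), Some("\n".to_string())),
    };
    assert_eq!(render_pair(&pair), "name: 'Hello',\n");
}
```

Because the comma is implied by `Some`, switching `wsc.3` between `Some` and `None` on the last member is enough to add or remove a trailing comma in this sketch.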
141
-
142
- ## ` rt::parser::JSONValue::JSONArray `
143
-
144
- Member of the ` rt::parser::JSONValue ` enum representing [ JSON5 arrays] ( https://spec.json5.org/#arrays ) .
145
-
146
- There are two fields on this struct: ` values ` , which is of type ` Vec<JSONArrayValue> ` , and ` context ` which holds
147
- a one-length tuple containing the whitespace/comments that occur after the opening bracket. In non-empty arrays,
148
- the whitespace that precedes the closing bracket is part of the last item in the ` values ` Vec.
149
- In other words: ` LBRACKET { wsc.0 } [ values ] RBRACKET `
150
- and: ` .context.wsc: (String,) `
151
-
152
-
153
- ### ` rt::parser::JSONArrayValue `
154
-
155
- The ` JSONArrayValue ` struct represents a single member of a JSON5 Array. It has two fields: ` value ` , which is any
156
- ` JSONValue ` , and ` context ` which contains the contextual whitespace/comments around the member. The ` context ` 's ` wsc `
157
- field is a two-length tuple for the whitespace that may occur after the value and (optionally) after the comma following the value.
158
- In other words, roughly: ` value { wsc.0 } [ COMMA { wsc.1 } [ next_value ]] `
159
- and: ` .context.wsc: (String, Option<String>) `
160
-
161
- When ` context.wsc.1 ` is ` Some() ` it indicates the presence of the comma (not included in the string) and any whitespace
162
- following the comma is contained in the string. This item MUST be ` Some() ` when it is not the last member of the array.
163
-
164
- ## Other ` rt::parser::JSONValue ` s
165
-
68
+ ``` rust
69
+ use serde::Serialize;
+ use json_five::{to_string_formatted, FormatConfiguration, TrailingComma};
+ #[derive(Serialize)]
+ struct Test {
+     int: u32,
+     seq: Vec<&'static str>,
+ }
+ let test = Test {
+     int: 1,
+     seq: vec!["a", "b"],
+ };
+
+ // Indent nested items by 4 spaces and emit trailing commas in objects and arrays.
+ let config = FormatConfiguration::with_indent(4, TrailingComma::ALL);
+ let formatted_doc = to_string_formatted(&test, config).unwrap();
+
+ let expected = r#"{
+     "int": 1,
+     "seq": [
+         "a",
+         "b",
+     ],
+ }"#;
+
+ assert_eq!(formatted_doc, expected);
93
+ ```
166
94
95
+ ## Examples
167
96
168
- - ` JSONValue::Integer(String) `
169
- - ` JSONValue::Float(String) `
170
- - ` JSONValue::Exponent(String) `
171
- - ` JSONValue::Null `
172
- - ` JSONValue::Infinity `
173
- - ` JSONValue::NaN `
174
- - ` JSONValue::Hexadecimal(String) `
175
- - ` JSONValue::Bool(bool) `
176
- - ` JSONValue::DoubleQuotedString(String) `
177
- - ` JSONValue::SingleQuotedString(String) `
178
- - ` JSONValue::Unary { operator: UnaryOperator, value: Box<JSONValue> } `
179
- - ` JSONValue::Identifier(String) ` (for object keys only!).
97
+ See the ` examples/ ` directory for examples of programs that utilize round-tripping features.
180
98
181
- Where these enum members have ` String ` s, they represent the object as it was tokenized without any modifications (that
182
- is, for example, without any escape sequences un-escaped). The single- and double-quoted ` String ` s do not include the surrounding
183
- quote characters. These members alone have no ` context ` .
99
+ - ` examples/json5-doublequote-fixer ` gives an example of tokenization-based round-tripping edits
100
+ - ` examples/json5-trailing-comma-formatter ` gives an example of model-based round-tripping edits
184
101
185
- # round-trip tokenizer
186
102
187
- The ` rt::tokenizer ` module contains some useful tools for round-tripping tokens. The ` Token ` s produced by the
188
- rt tokenizer are owned types containing the lexeme from the source. There are two key functions in the tokenizer module:
103
+ # Benchmarking
189
104
190
- - ` rt::tokenize::source_to_tokens `
191
- - ` rt::tokenize::tokens_to_source `
105
+ Benchmarks are available in the `benches/` directory, and test data is in the `data/` directory. A couple of benchmarks use
+ big files that are not committed to this repo, so run `./data/setupdata.sh` to download the required data files;
+ otherwise the big benchmarks are skipped. The benchmarks compare `json_five` (this crate) to
+ [serde_json](https://github.com/serde-rs/json) and [json5-rs](https://github.com/callum-oakley/json5-rs).
192
109
193
- Each ` Token ` generated from ` source_to_tokens ` also contains some contextual information, such as line/col numbers, offsets, etc.
194
- This contextual information is not required for ` tokens_to_source ` -- that is: you can create new tokens and insert them
195
- into your tokens array and process those tokens back to JSON5 source without issue.
110
+ Notwithstanding the general caveats of benchmarks, in initial testing `json_five` consistently outperforms `json5-rs`.
+ In typical scenarios it has been observed to be 3-4x faster, and up to 20x faster in some synthetic tests on extremely large data.
+ At the time of writing (pre-v0), no performance optimizations have been done. I expect performance to improve,
+ at least marginally, in the future.
114
+
115
+ These benchmarks were run on Windows on an i9-10900K with rustc 1.83.0 (90b35a623 2024-11-26). This table won't be updated unless significant changes happen.
116
+
117
+
118
+ | test                       | json_five | json5     | serde_json |
+ |----------------------------|-----------|-----------|------------|
+ | big (25MB)                 | 580.31 ms | 3.0861 s  | 150.39 ms  |
+ | medium-ascii (5MB)         | 199.88 ms | 706.94 ms | 59.008 ms  |
+ | empty                      | 228.62 ns | 708.00 ns | 38.786 ns  |
+ | arrays                     | 578.24 ns | 1.3228 µs | 100.95 ns  |
+ | objects                    | 922.91 ns | 2.0748 µs | 205.75 ns  |
+ | nested-array               | 22.990 µs | 29.356 µs | 5.0483 µs  |
+ | nested-objects             | 50.659 µs | 132.75 µs | 14.755 µs  |
+ | string                     | 421.17 ns | 3.5691 µs | 91.051 ns  |
+ | number                     | 238.75 ns | 779.13 ns | 36.179 ns  |
+ | deserialize (size 10)      | 6.9898 µs | 58.398 µs | 886.33 ns  |
+ | deserialize (size 100)     | 66.005 µs | 830.79 µs | 9.9705 µs  |
+ | deserialize (size 1000)    | 599.39 µs | 8.4952 ms | 69.110 µs  |
+ | deserialize (size 10000)   | 5.9841 ms | 82.591 ms | 734.40 µs  |
+ | deserialize (size 100000)  | 66.841 ms | 955.37 ms | 11.638 ms  |
+ | deserialize (size 1000000) | 674.13 ms | 9.5758 s  | 119.03 ms  |
+ | serialize (size 10)        | 2.3496 µs | 48.915 µs | 891.85 ns  |
+ | serialize (size 100)       | 19.602 µs | 458.98 µs | 6.7109 µs  |
+ | serialize (size 1000)      | 194.19 µs | 4.6035 ms | 62.667 µs  |
+ | serialize (size 10000)     | 2.2104 ms | 47.253 ms | 761.10 µs  |
+ | serialize (size 100000)    | 24.418 ms | 502.35 ms | 11.410 ms  |
+ | serialize (size 1000000)   | 245.26 ms | 4.6211 s  | 115.84 ms  |
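
The speedup figures quoted in the prose can be checked directly against the table. The snippet below is a minimal sketch of that arithmetic; the `speedup` helper is a name introduced here for illustration, and the timings are copied from a few table rows and converted to nanoseconds.

```rust
/// Ratio of a baseline time to a candidate time. Values above 1.0 mean the
/// candidate is faster. Both arguments must use the same unit.
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

fn main() {
    // (test name, json_five time, json5 time), both in nanoseconds,
    // taken from the benchmark table.
    let rows = [
        ("big (25MB)", 580.31e6, 3.0861e9),
        ("medium-ascii (5MB)", 199.88e6, 706.94e6),
        ("string", 421.17, 3569.1),
        ("serialize (size 10)", 2349.6, 48915.0),
    ];
    for &(name, json_five_ns, json5_ns) in rows.iter() {
        // How many times faster json_five is than json5 on this row.
        println!("{}: {:.1}x", name, speedup(json5_ns, json_five_ns));
    }
}
```

The same helper works for the `serde_json` column by swapping which time is treated as the baseline.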
196
165
197
- The ` tok_type ` attribute leverages the same ` json_five::tokenize::TokType ` types. Those are:
198
166
199
- - ` LeftBrace `
200
- - ` RightBrace `
201
- - ` LeftBracket `
202
- - ` RightBracket `
203
- - ` Comma `
204
- - ` Colon `
205
- - ` Name ` (Identifiers)
206
- - ` SingleQuotedString `
207
- - ` DoubleQuotedString `
208
- - ` BlockComment `
209
- - ` LineComment ` note: the lexeme includes the singular trailing newline, if present (e.g., not a comment just before EOF with no newline at end of file)
210
- - ` Whitespace `
211
- - ` True `
212
- - ` False `
213
- - ` Null `
214
- - ` Integer `
215
- - ` Float `
216
- - ` Infinity `
217
- - ` Nan `
218
- - ` Exponent `
219
- - ` Hexadecimal `
220
- - ` Plus `
221
- - ` Minus `
222
- - ` EOF `
223
-
224
- Note: string tokens will include surrounding quotes.
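
To make the token-based round-trip idea concrete, here is a self-contained sketch. It does not use the crate's API: `Tok`, `TokKind`, `to_source`, and `fix_quotes` are hypothetical names invented for this illustration, and the tokens are built by hand rather than produced by a tokenizer. It assumes, as noted above, that string lexemes include their surrounding quotes.

```rust
// Hypothetical token model for illustration only; not the crate's types.
#[derive(Debug, Clone, PartialEq)]
enum TokKind {
    LeftBracket,
    RightBracket,
    Comma,
    Whitespace,
    SingleQuotedString,
    DoubleQuotedString,
}

#[derive(Debug, Clone, PartialEq)]
struct Tok {
    kind: TokKind,
    lexeme: String, // exact source text, quotes included for strings
}

fn tok(kind: TokKind, lexeme: &str) -> Tok {
    Tok { kind, lexeme: lexeme.to_string() }
}

// Concatenating lexemes reproduces the source exactly.
fn to_source(tokens: &[Tok]) -> String {
    tokens.iter().map(|t| t.lexeme.as_str()).collect()
}

// Rewrite single-quoted strings as double-quoted ones. Deliberately naive:
// strings containing '"' or '\' are left untouched instead of re-escaped.
fn fix_quotes(tokens: &mut [Tok]) {
    for t in tokens.iter_mut() {
        if t.kind == TokKind::SingleQuotedString && t.lexeme.len() >= 2 {
            let inner = t.lexeme[1..t.lexeme.len() - 1].to_string();
            if !inner.contains('"') && !inner.contains('\\') {
                t.lexeme = format!("\"{}\"", inner);
                t.kind = TokKind::DoubleQuotedString;
            }
        }
    }
}

fn main() {
    let mut tokens = vec![
        tok(TokKind::LeftBracket, "["),
        tok(TokKind::SingleQuotedString, "'a'"),
        tok(TokKind::Comma, ","),
        tok(TokKind::Whitespace, " "),
        tok(TokKind::DoubleQuotedString, "\"b\""),
        tok(TokKind::RightBracket, "]"),
    ];
    assert_eq!(to_source(&tokens), "['a', \"b\"]");
    fix_quotes(&mut tokens);
    assert_eq!(to_source(&tokens), "[\"a\", \"b\"]");
}
```

Only the string tokens change; whitespace and punctuation tokens pass through untouched, which is what makes the edit round-trip safe.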
225
167
226
168
227
169
# Notes