Restore support for improperly encoded strings by casperisfine · Pull Request #609 · ruby/json

casperisfine · 2024-10-14T12:26:01Z

Since the json gem was initially written for Ruby 1.8, before strings had encodings, it used to do its own UTF-8 validation directly on the bytes and never really considered the string's declared encoding. So passing an ASCII-8BIT string worked as long as the bytes were valid UTF-8.

We may want to deprecate this, but we should emit warnings first.

Ref: #595 (comment)

FYI: @Earlopain
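To make the distinction between a string's declared encoding and its actual bytes concrete, here is a minimal sketch in plain Ruby. The helper name `utf8_bytes?` is hypothetical and not part of the json gem; it only relies on core `String` methods.

```ruby
# Hypothetical helper (not part of the json gem): checks whether the raw
# bytes of a string form valid UTF-8, ignoring the declared encoding.
def utf8_bytes?(string)
  string.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Valid UTF-8 bytes, but labelled ASCII-8BIT (also known as BINARY).
binary = "h\u00E9llo".b
binary.encoding      # the BINARY / ASCII-8BIT encoding
utf8_bytes?(binary)  # true: the bytes are fine, only the label is off

# Bytes that are not valid UTF-8 under any label.
broken = "\x82\xAC\xEF".b
utf8_bytes?(broken)  # false
```

The first string is the kind of input the C extension historically accepted; the second is malformed whichever encoding it declares.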


casperisfine · 2024-10-14T13:26:15Z

I pushed c5a6d80 because, on reflection, it doesn't make sense to introduce what I think is a bug in the Ruby and Java versions.

Only the C version was this liberal, so it is only there that we should keep supporting this behavior for a while, and probably deprecate it.

Earlopain · 2024-10-14T13:31:09Z

Thanks for the ping. I've since found out that my issue originated from webmock, where some adapters return the mocked response in BINARY regardless of what string was originally provided, and some return the string as-is, with the encoding provided by the user.

```ruby
stub_request(:get, "https://example.com").to_return(body: "Hello World!")
response = http.get("https://example.com")
puts response.body.encoding
# Either ASCII-8BIT or UTF-8, depending on who you ask
```

I'm not sure which would be correct; I guess it is up to the adapter/library. Just some context on where I was coming from.

This actually uncovered an issue where I assumed the response would always be UTF-8. In practice that just happened to be the case, but it's not guaranteed.
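One way to avoid depending on what a given adapter returns is to normalize the body before parsing. The sketch below is only an illustration under stated assumptions: `utf8_body` is a hypothetical helper, and it assumes the body is meant to be UTF-8 and is labelled either UTF-8 or BINARY.

```ruby
require "json"

# Hypothetical helper: relabel a body as UTF-8 when its bytes allow it, so the
# parser never has to guess. Assumes the payload is supposed to be UTF-8.
def utf8_body(body)
  return body if body.encoding == Encoding::UTF_8 && body.valid_encoding?

  candidate = body.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "response body is not valid UTF-8" unless candidate.valid_encoding?

  candidate
end

# Simulates an adapter that hands the body back labelled as BINARY.
raw = %({"greeting":"h\u00E9llo"}).b
parsed = JSON.parse(utf8_body(raw))
parsed["greeting"].encoding # UTF-8
```

Doing the relabelling at the call site keeps the decision about the intended encoding with the code that knows where the bytes came from.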

casperisfine · 2024-10-14T13:32:36Z

Yep. That's why I think we should get rid of that behavior, but not in a minor release like this. And ideally we'd warn first.
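A "warn first, remove later" path could look roughly like the following. This is a hypothetical sketch, not the json gem's actual code; the method name `coerce_legacy_input` and the message wording are assumptions made for illustration.

```ruby
# Hypothetical sketch (not the json gem's implementation): keep accepting a
# BINARY string whose bytes are valid UTF-8, but warn the caller about it.
def coerce_legacy_input(string)
  return string unless string.encoding == Encoding::BINARY

  utf8 = string.dup.force_encoding(Encoding::UTF_8)
  raise EncodingError, "input is not valid UTF-8" unless utf8.valid_encoding?

  # uplevel: 1 attributes the warning to the caller's file and line.
  warn("BINARY strings holding UTF-8 bytes are deprecated; pass a UTF-8 string", uplevel: 1)
  utf8
end
```

A later major release could then turn the warning branch into an error without surprising callers who never saw a notice.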

byroot added 3 commits October 14, 2024 14:08

ractor_test.rb: ignore stderr

513ddea

When rubygems is double loaded, it fails the test. The warning shouldn't happen in the first place, but this makes the test more resilient.

Only test the wrongly encoded string behavior on the C version

c5a6d80

Both the pure-Ruby and Java versions already raise an error in such cases, which confirms that we'd rather deprecate and fix the C version. We shouldn't make the pure-Ruby or Java versions accept these broken strings.

byroot merged commit 2ad3514 into ruby:master Oct 14, 2024

byroot mentioned this pull request Oct 21, 2024

JSON.dump("\x82\xAC\xEF".b) no error with the C extension #634

Closed

eregon mentioned this pull request Oct 30, 2024

Add test for parsing broken strings and use String#encode instead of rb_str_conv_enc() in parser #665

Merged

byroot merged 3 commits into ruby:master from casperisfine:handle-wrongly-encoded-strings
