Skip to content

Restore support for improperly encoded strings #609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 14, 2024

Conversation

casperisfine
Copy link

Since the json gem was initially written for Ruby 1.8 before strings had encoding, it used to do its own UTF-8 validation direction on bytes and never really considered the string declared encoding. So passing a ASCII-8BIT string worked as long as the bytes were valid UTF-8

We may want to deprecate this, but we should emit warnings first.

Ref: #595 (comment)

FYI: @Earlopain

When rubygems is double loaded it fails the test.

The warning should happen in the first place but this
makes the test more resilient.
Since the `json` gem was initially written for Ruby 1.8
before strings had encoding, it used to do its own UTF-8
validation direction on bytes and never really considered
the string declared encoding. So passing a ASCII-8BIT string
worked as long as the bytes were valid UTF-8

We may want to deprecate this, but we should emit warnings first.
Both the pure and java version already raise an error on such case,
so this confirms that we're rather deprecate and fix the C version.

We shouldn't make the pure or java versions accept these broken
strings.
@casperisfine
Copy link
Author

I pushed c5a6d80, because after reflection is doesn't make sense to introduce what I think is a bug in the ruby and java versions.

Only the C version was this liberal, so only on this version we should make sure to keep supporting it for a while and probably deprecate it.

@byroot byroot merged commit 2ad3514 into ruby:master Oct 14, 2024
73 checks passed
@Earlopain
Copy link

Thanks for the ping. I've since found out that my issue originated from webmock where some adapters return the mocked response in BINARY, regardless of what string was originally provided, and some return the string as-is, with the encoding provded by the user.

stub_request(:get, "https://example.com").to_return(body: "Hello World!")
response = http.get("https://example.com")
puts response.body.encoding
# Either ASCII-8BIT or UTF-8, depending on who you ask

I'm not sure what would be correct, I guess it is up to the adapter/library. Just some context from where I was coming from.

This actually uncovered an issue where I assumed the response would always be UTF-8 which in real-life just happened to be so, but it's not guaranteed.

@casperisfine
Copy link
Author

Yep. That's why I think we should get rid of that behavior, but not in a minor like this. And ideally we'd warn first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants