Skip to content

Conversation

Earlopain
Copy link
Collaborator

@Earlopain Earlopain commented Apr 26, 2025

Description

Historically, the json gem that is default from ruby has been somewhat slow compared to other available implementations like the oj gem. Gems like multi_json exist to allow people to choose which backend they want to use. For example, to use oj, one has to simply add it to their gemfile and then use MultiJson.load and it will automatically use the oj gem.

Over the last few months there has been some great work to make the default json implementation in ruby more performant. In the following script, json used to be 1.5x times slower, now it is 1.4x times faster compared to oj:

Benchmark

require "bundler/inline"

gemfile do
  source 'https://rubygems.org'
  gem 'json'
  gem 'multi_json'
  gem 'oj'
  gem 'benchmark-ips'
end

# https://github.com/ruby/json/blob/3e025f76d77e323b30f6f6d2d8d06e787d497a0c/benchmark/data/twitter.json
data = File.read("twitter.json")

pp MultiJson.adapter

Benchmark.ips do |x|
  x.report("json") do
    JSON.parse(data)
  end

  x.report("multi_json") do
    MultiJson.load(data)
  end

  x.compare!
end
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
Warming up --------------------------------------
                json    56.000 i/100ms
          multi_json    42.000 i/100ms
Calculating -------------------------------------
                json    570.026 (± 5.3%) i/s    (1.75 ms/i) -      2.856k in   5.025704s
          multi_json    417.460 (± 2.4%) i/s    (2.40 ms/i) -      2.100k in   5.033273s

Comparison:
                json:      570.0 i/s
          multi_json:      417.5 i/s - 1.37x  slower

You can read more about this work on the blog of the person who is responsible for these improvements here: https://byroot.github.io/ruby/json/2024/12/15/optimizing-ruby-json-part-1.html. It is 7 parts and contains many other benchmarks and interesting information.

So, at this moment using multi_json is actually making things worse (if oj is in the gemfile) because it considers oj to be the preferable option. multi_json is in maintenance mode as per a maintainer intridea/multi_json#198 (comment) and not recommended anymore for new projects. There has also been no release in the last 5 years (but to be fair, last year there were many commits made by a different person, so not entirely a dead project).

Anyways, I do not think there is any value in shipping with support for multi_json today anymore. If for some reason people want to use a different parser, they can easily accomplish this by implementing a custom serializer. So, for opensearch-ruby 4 I want to default to just using the json gem instead.

Issues Resolved

None

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Earlopain Earlopain marked this pull request as ready for review April 26, 2025 14:41
Copy link
Collaborator

@nhtruong nhtruong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great thank you. After this is merged, Feel free to tag this then push the tag upstream to trigger the Release pipeline.

@Earlopain
Copy link
Collaborator Author

Sure, I will do that! What do you think about also requiring faraday >= 2? Currently it permits 1.x but 2.x has been out for more than 3 years ago already.

@dblock
Copy link
Member

dblock commented Apr 26, 2025

Sure, I will do that! What do you think about also requiring faraday >= 2? Currently it permits 1.x but 2.x has been out for more than 3 years ago already.

It will probably break someone, but I vote +1. I generally remove dependencies if there's a good reason, e.g. it's difficult to fix a real problem, the gem is out of support, we carry code that we want to remove, etc. Either way you'll want to do it in a major version, so now would be a good time, and update UPGRADING accordingly.

@nhtruong
Copy link
Collaborator

Sure, I will do that! What do you think about also requiring faraday >= 2? Currently it permits 1.x but 2.x has been out for more than 3 years ago already.

That will resolve a lot of dependency issues for gems that require this gem (I had trouble with Faraday version when releasing Sigv4 gem previously). I agree with @dblock , we won't have another major version bump for a long while so this is a good time to do so.

@Earlopain
Copy link
Collaborator Author

👍 I will make a PR for it tomorrow. Btw, jruby failures seem unrelated, I was getting the exact same errors before the PR on a workflow on main.

That actually reminded me that faraday 1 is still being tested because one of the workflows that failed is named test-opensearch-faraday-1 😅

@nhtruong
Copy link
Collaborator

One more reason to remove Faraday 1.
I can merge this now since your next PR will resolve the error.

@Earlopain
Copy link
Collaborator Author

Ah, two jobs actually failed. It also happens on jruby with faraday 2. I will see if I can figure out what's going on with that when I got some time. Maybe it'll just be fixed by using jruby 10 (rather recently released, currently locked to 9.4)

Historically, the `json` gem that is default from ruby has been somewhat slow compared to other available implementations like the `oj` gem. Gems like `multi_json` exist to allow people to choose which backend they want to use. For example, to use `oj`, one has to simply add it to their gemfile and then use `MultiJson.load` and it will automatically use the `oj` gem.

Over the last few months there has been some great work to make the default `json` implementation in ruby more performant. In the following script, `json` used to be 1.5x times slower, now it is 1.4x times faster compared to `oj`:

<details><summary>Benchmark</summary>
<p>

```rb
require "bundler/inline"

gemfile do
  source 'https://rubygems.org'
  gem 'json'
  gem 'multi_json'
  gem 'oj'
  gem 'benchmark-ips'
end

# https://github.com/ruby/json/blob/3e025f76d77e323b30f6f6d2d8d06e787d497a0c/benchmark/data/twitter.json
data = File.read("twitter.json")

pp MultiJson.adapter

Benchmark.ips do |x|
  x.report("json") do
    JSON.parse(data)
  end

  x.report("multi_json") do
    MultiJson.load(data)
  end

  x.compare!
end
```

```
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
Warming up --------------------------------------
                json    56.000 i/100ms
          multi_json    42.000 i/100ms
Calculating -------------------------------------
                json    570.026 (± 5.3%) i/s    (1.75 ms/i) -      2.856k in   5.025704s
          multi_json    417.460 (± 2.4%) i/s    (2.40 ms/i) -      2.100k in   5.033273s

Comparison:
                json:      570.0 i/s
          multi_json:      417.5 i/s - 1.37x  slower
```

</p>
</details>

You can read more about this work on the blog of the person who is responsible for these improvements here: https://byroot.github.io/ruby/json/2024/12/15/optimizing-ruby-json-part-1.html. It is 7 parts and contains many other benchmarks and interesting information.

So, at this moment using `multi_json` is actually making things worse (if `oj` is in the gemfile) because it considers `oj` to be the preferable option. `multi_json` is in maintenance mode as per a maintainer intridea/multi_json#198 (comment) and not recommended anymore for new projects. There has also been no release in the last 5 years (but to be fair, last year there were many commits made by a different person, so not entirely a dead project).

Anyways, I do not think there is any value in shipping with support for `multi_json` today anymore. If for some reason people want to use a different parser, they can easily accomplish this by implementing a custom serializer. So, for `opensearch-ruby` 4 I want to default to just using the `json` gem instead.

Signed-off-by: Earlopain <14981592+Earlopain@users.noreply.github.com>
@nhtruong nhtruong merged commit 4d1f4a8 into opensearch-project:main Apr 27, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants