Use IO.copy_stream when possible #383
Open
Fix: #66
Context
Ref: googleapis/google-cloud-ruby#1897
We noticed that download performance in Google Cloud Storage's Ruby library was heavily impacted by CPU usage on the host, especially for big files. After some digging, it was clear that this is because the data has to transit through `read()` and `write()` instead of leveraging `sendfile()`.
An experiment using a quick and dirty patch showed a reduction from `15s` to `5s` for a 500MB download.
The patch
To leverage `sendfile()` in Ruby, the best and simplest API is `IO.copy_stream`, as suggested in #66.
The problem is that `copy_stream` needs IO or IO-like objects to work with, and `httpclient`'s API mostly deals with blocks, so I had to adapt the API somehow.
One important thing to note is that we can only leverage `sendfile` if there are no modifications to apply to the request body, e.g. no chunking and no compression.
I'll add comments on specific parts of the patch in later comments.
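For readers unfamiliar with `IO.copy_stream`, here is a minimal sketch of what "IO-like" means in practice. It is not code from this patch or from `httpclient`: `ChunkReader` and `copy_to_tempfile` are hypothetical names used only for illustration. The sketch assumes the source object only needs to respond to `readpartial` and fill the buffer it is given. Note that with an IO-like source such as this one, Ruby falls back to a userspace read/write loop; a kernel-level copy such as `sendfile()` is only possible when both ends are backed by real file descriptors, and whether it is actually used depends on the platform.

```ruby
require "tempfile"

# Hypothetical IO-like wrapper around a list of in-memory chunks.
# IO.copy_stream accepts any source that responds to #readpartial (or #read).
class ChunkReader
  def initialize(chunks)
    @chunks = chunks.map(&:b) # unfrozen binary copies of each chunk
    @pending = String.new     # bytes of the current chunk not yet consumed
  end

  # copy_stream calls readpartial(maxlen, outbuf) and writes the contents of
  # outbuf to the destination, so the buffer itself must be filled.
  # EOFError signals the end of the stream.
  def readpartial(maxlen, outbuf = String.new)
    while @pending.empty?
      raise EOFError, "end of chunks" if @chunks.empty?
      @pending = @chunks.shift
    end
    outbuf.replace(@pending.slice!(0, maxlen))
  end
end

# Hypothetical helper: copy any IO or IO-like source into a Tempfile.
# Returns the rewound Tempfile and the number of bytes copied.
def copy_to_tempfile(src)
  out = Tempfile.new("copy_stream_demo")
  out.binmode
  copied = IO.copy_stream(src, out)
  out.rewind
  [out, copied]
end

file, copied = copy_to_tempfile(ChunkReader.new(["hello ", "world"]))
puts "copied #{copied} bytes"
file.close!
```

Passing a real `File` or socket as the source instead of `ChunkReader` uses the same call, which is what makes `IO.copy_stream` a convenient single entry point for both the fast path and the fallback.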