Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions lib/go/thrift/binary_protocol.go
Original file line number Diff line number Diff line change
Expand Up @@ -555,6 +555,15 @@ func safeReadBytes(size int32, trans io.Reader) ([]byte, error) {
return nil, nil
}

// Fast path for reads smaller than 10 MiB that only allocates exactly
// what is asked for.
const readLimit = 10 * 1024 * 1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the stated goal here is to optimize for ordinary cases (with size < readLimit), why not just do io.CopyN with bytes.Buffer when it's the edge case? that way we don't have to allocate for an 10MiB buffer upfront if the size is malformed, and if it's really a very huge payload we just pay the price for a bit more allocations (same as today).

(I would also consider making this limit configurable, but don't feel too strongly either way here)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just io.CopyN [...]?

I have no strong opinions in either direction, that works fine for me! The reason I changed it was that io.CopyN uses more memory than this approach. Looking at 40MB ask from the issue tracker, io.CopyN allocates:

    main_test.go:36: 134217366 B/op	      21 allocs/op - ask: 41943040, data: 41943040

while this approach allocates:

    main_test.go:44: 83886131 B/op	       5 allocs/op - ask: 41943040, data: 41943040

But the downside, as you said, is that we will allocate more than available when we get a malformed message.
Let me know if I should change it to use io.CopyN instead!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so there are 3 different cases:

  1. ordinary case (reasonable size, not malformed)
  2. legit very large payload (large size, not malformed)
  3. malformed payload with large size

and the choice here is between 2 and 3 (we already agreed to optimize for 1).

between 2 and 3 I would prefer to optimize for 3, based on the consequences of the opposite. if we optimize for 2 (your current approach), then whenever the code get a malformed payload they would need to allocate 10MiB up front, that can be a big risk for code running with tight resources so the consequence can be more severe (e.g. they have more risk to crash due to insufficient memory). with the CopyN approach the consequence is just more allocations for legit cases, which is slightly slower and use more memory, but if the code is intended to handle legit very large payloads then we can already assume that it has more resource and those "wasted" memory will have a smaller consequences.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'll update the PR later today, thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, PTAL

if size < readLimit {
b := make([]byte, size)
n, err := io.ReadFull(trans, b)
return b[:n], err
}

buf := new(bytes.Buffer)
_, err := io.CopyN(buf, trans, int64(size))
return buf.Bytes(), err
Expand Down
Loading