Skip to content

Fix sizing of Maps #3868

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: 2.x
Choose a base branch
from
Open

Fix sizing of Maps #3868

wants to merge 2 commits into from

Conversation

kilink
Copy link

@kilink kilink commented Aug 1, 2025

Fix several instances where Maps were sized incorrectly when the number of elements to insert is known; specifying the capacity without taking into account the load factor meant it would always result in a rehash / resize operation.

To this end, add a simple Maps utility class that handles the capacity calculation, much like the static HashMap.newHashMap helper that was added in JDK19.

Checklist

Before we can review and merge your changes, please go through the checklist below. If you're still working on some items, feel free to submit your pull request as a draft—our CI will help guide you through the remaining steps.

✅ Required checks

🧪 Tests (select one)

  • I have added or updated tests to cover my changes.
  • No additional tests are needed for this change.

📝 Changelog (select one)

  • I added a changelog entry in src/changelog/.2.x.x. (See Changelog Entry File Guide).
  • This is a trivial change and does not require a changelog entry.

Fix several instances where Maps were sized incorrectly when the number of
elements to insert is known; specifying the capacity without taking into
account the load factor meant it would always result in a rehash / resize
operation.

To this end, add a simple Maps utility class that handles the capacity
calculation, much like the static HashMap.newHashMap helper that was added in JDK19.
@vy
Copy link
Member

vy commented Aug 2, 2025 via email

@kilink
Copy link
Author

kilink commented Aug 5, 2025

Did you observe a particular performance issue?

I don't recall how I stumbled upon it or if it showed up in any performance profiles (it definitely has for other libraries). It's just a common thing I've noticed across various projects (e.g., spring and grpc-java) since the HashMap constructor is a foot gun and it's not easy to do correctly until Java 19. Since the intent here was to presize the map, and we know how many elements to insert, it just seemed to make sense to size it so we always avoid the resize / rehash operations.

Copy link

github-actions bot commented Aug 6, 2025

Job Requested goals Build Tool Version Build Outcome Build Scan®
build-macos-latest clean install 3.9.8 Build Scan PUBLISHED
build-ubuntu-latest clean install 3.9.8 Build Scan PUBLISHED
build-windows-latest clean install 3.9.8 Build Scan PUBLISHED
Generated by gradle/develocity-actions

@ppkarwasz
Copy link
Contributor

Hi @kilink,

Adding a new utility class to log4j-api and then using it in log4j-core reinforces an unnatural dependency between the two modules, and would introduce a hard incompatibility between Log4j Core 2.26.0 and Log4j API 2.25.x. The Core module should implement the API, not depend on it for internal utilities. I realize this kind of coupling already exists in some places, but I’d prefer not to increase it.

Could you instead:

  • Move Maps to o.a.l.l.util.internal to make it clear it’s not intended for external use. While o.a.l.l.util is theoretically internal, it remains exported due to past API leaks (e.g., o.a.l.l.util.Supplier).
  • Create a separate o.a.l.l.core.util.internal.Maps helper for log4j-core, so it can be used independently without introducing additional cross-module dependencies.

I agree with @vy that, without clear benchmark results, the value of this enhancement is questionable. We avoid using HashMap in performance-critical paths (such as the logging context map) because its instantiation and operation costs are high for small maps. Instead, we use o.a.l.l.util.StringMap, which offers enhanced features like mutability control (freeze()) and a foreach method (introduced for Map only in JDK 8). However, StringMap represents a significant technical debt we would like to eliminate. If you are interested in map performance, could you explore the feasibility of replacing StringMap with Map in version 3.x?

@vy
Copy link
Member

vy commented Aug 6, 2025

I'm against enhancements without convincing figures obtained from benchmarks mapping to real-world use cases. If this optimization yields a tangible benefit on the hot path, provide a reproducer please.

@kilink
Copy link
Author

kilink commented Aug 6, 2025

I personally don't think it's worthwhile to spend time writing a benchmark for what amounts to a bug fix / correctness issue. If you'd rather not accept the change, feel free to close the issue. I'll leave my rationale for fixing it however:

  • Without a benchmark, we can see that the incorrect supplied capacity will cause a resize / rehash, and result in a map that is over-sized for the number of items inserted. It's easy to avoid with a trivial calculation which takes into account the load factor.
  • It's true that not all instances of this I changed in the PR are likely to be on the hot path, but fixing it everywhere is more of a consistency thing; if the helper is there, why not, otherwise it would just be more confusing.
  • There are likely callers in the wild that at least would be hitting the toMap implementation of the ReadOnlyStringMap, e.g. likely through ThreadContextMap (I can confirm such instances internally, as well as in OSS projects such as Sentry).
  • Whoever wrote those lines of code was attempting to size the HashMaps precisely by passing in a capacity, it just so happens that the number of items to insert != the map capacity, so seems like an easy mistake to have made. I assume care was taken here though since log4j generally tries to be optimal in terms of allocations and performance.

I assume the main point of contention is the addition of a utility class, and I agree it is ugly, but it's a stopgap until (?) Log4j can use the static HashMap.newHashMap added in Java 19, I presume when the baseline JDK version log4j requires allows us to use that, we can just switch to it and remove any such shim utility class. Or another way to put it, I presume you would readily accept this change if it were simply delegating to the JDK helpers (assuming we could use it)?

@@ -206,7 +207,7 @@ public void close() {
}

private void closeMap() {
final Map<String, String> valuesToReplace = new HashMap<>(originalValues.size());
final Map<String, String> valuesToReplace = Maps.newHashMap(originalValues.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed all log4j-api and log4j-core changes, and found this to be only change that one might claim to be of importance. I prefer a PR that

  1. Replaces all HashMap#new(int) calls with a HashMap#new() (Since they're all incorrectly implemented, it is better to just stick to defaults.)
  2. Replaces only this line in CloseableThreadContext with new HashMap((int) Math.ceil(originalValues.size() / 0.75))

@ppkarwasz, WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: To triage
Development

Successfully merging this pull request may close these issues.

3 participants