[CXF-8947] - Avoid expensive regex operations in Rfc3986UriValidator#1483

Merged
gnodet merged 2 commits into apache:main from WhiteCat22:Rfc3986UriValidator_performance_improvement on Mar 11, 2026

@WhiteCat22
Contributor

…if URI.getHost() returns a host name

Signed-off-by: Adam Anderson <atanderson9383@gmail.com>
if (HttpUtils.isHttpScheme(uri.getScheme())) {
    // If URI.getHost() returns a host name, validate it and
    // skip the expensive regular expression logic.
    final String uriHost = uri.getHost();
Member

@WhiteCat22 sadly, the reason this validator exists is that Java's URI is not RFC-3986 compliant. The host is not a trustworthy source here, hence we validate it against the pattern.
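One well-known divergence, shown as a minimal sketch using only the JDK's java.net.URI (not CXF's Rfc3986UriValidator): RFC 3986's reg-name production permits "_" in a host, but java.net.URI follows the older RFC 2396 hostname grammar, so it falls back to a registry-based authority and getHost() returns null. The class name UriHostDemo is hypothetical.

```java
import java.net.URI;

// Hypothetical demo class; it uses only java.net.URI, not CXF code.
public class UriHostDemo {
    public static void main(String[] args) {
        // "_" is legal in an RFC 3986 reg-name, but not in an RFC 2396 hostname,
        // so java.net.URI parses this authority as registry-based.
        URI uri = URI.create("http://my_host/path");
        System.out.println("authority = " + uri.getAuthority()); // my_host
        System.out.println("host      = " + uri.getHost());      // null
    }
}
```

A shortcut that trusts getHost() would therefore see no host at all for an authority that RFC 3986 considers valid, which is why the validator keeps its own pattern.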

Contributor Author

That is a good point that we had not considered.

private static final Set<String> KNOWN_HTTP_VERBS_WITH_NO_RESPONSE_CONTENT =
    new HashSet<>(Arrays.asList(new String[]{"HEAD", "OPTIONS"}));

private static final Pattern HTTP_SCHEME_PATTERN = Pattern.compile("^(?i)(http|https)$");
Member

@reta commented Oct 24, 2023

This pattern is very straightforward to look up; what exactly are the gains here (vs. adding the set + comparator + lowercasing)?

Contributor Author

Hi @reta, sorry for the delay. We changed this because HashSet.contains(String) uses much less CPU than Pattern.matcher(String).matches().
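For comparison, a minimal sketch of the two scheme checks under discussion: the case-insensitive regex from the diff and a HashSet lookup on the lower-cased scheme (the "set + lowercasing" approach). The class and method names are hypothetical, not CXF's actual HttpUtils code, and no performance numbers are implied. The regex version allocates a new Matcher and runs the regex engine on every call; the set version does a hash lookup instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical class contrasting the two approaches discussed in the review.
public class SchemeCheckComparison {
    // Original approach: case-insensitive regex anchored to the whole input.
    private static final Pattern HTTP_SCHEME_PATTERN = Pattern.compile("^(?i)(http|https)$");

    // Proposed approach: a HashSet of lower-case schemes.
    private static final Set<String> HTTP_SCHEMES = new HashSet<>(Arrays.asList("http", "https"));

    static boolean isHttpSchemeRegex(String scheme) {
        // Creates a Matcher and runs the regex engine on every call.
        return scheme != null && HTTP_SCHEME_PATTERN.matcher(scheme).matches();
    }

    static boolean isHttpSchemeSet(String scheme) {
        // Locale.ROOT avoids locale-sensitive case mapping (e.g. Turkish dotless i).
        return scheme != null && HTTP_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http", "HTTPS", "Http", "ftp", "httpx"}) {
            System.out.println(s + ": regex=" + isHttpSchemeRegex(s) + ", set=" + isHttpSchemeSet(s));
        }
    }
}
```

Both methods accept and reject the same inputs; the difference is only in the work done per call, which is what the CPU argument above is about.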

Member

Fair enough; maybe we could just have two constants instead?

…isHttpScheme()

- Remove the URI.getHost() shortcut in Rfc3986UriValidator since
  Java's URI.getHost() is not RFC-3986 compliant (per reviewer feedback)
- Replace HashSet-based HTTP scheme check with simple
  String.equalsIgnoreCase() comparisons using constants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gnodet
Contributor

gnodet commented Mar 11, 2026

Hi @reta, I've pushed a commit addressing your review feedback:

  • Removed the URI.getHost() shortcut in Rfc3986UriValidator — you were right that Java's URI.getHost() is not RFC-3986 compliant, so it can't be trusted here.
  • Simplified HttpUtils.isHttpScheme() to use equalsIgnoreCase() with two string constants instead of a HashSet, as you suggested.
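A minimal sketch of the "two constants + equalsIgnoreCase()" approach described above. The class and constant names are assumptions for illustration, not the merged CXF code. String.equalsIgnoreCase() returns false for a null argument and compares lengths before comparing characters, so the check needs no Matcher, no lower-cased copy, and no set.

```java
// Hypothetical sketch; class and constant names are assumptions, not CXF's actual code.
public final class HttpSchemeCheck {
    private static final String HTTP_SCHEME = "http";
    private static final String HTTPS_SCHEME = "https";

    private HttpSchemeCheck() {
    }

    /** Returns true if the scheme is "http" or "https" (any case); false for null. */
    public static boolean isHttpScheme(String scheme) {
        // equalsIgnoreCase(null) is false, so no explicit null check is needed.
        return HTTP_SCHEME.equalsIgnoreCase(scheme) || HTTPS_SCHEME.equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(isHttpScheme("HTTPS")); // true
        System.out.println(isHttpScheme("ftp"));   // false
        System.out.println(isHttpScheme(null));    // false
    }
}
```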

@gnodet merged commit 3046dda into apache:main on Mar 11, 2026
5 of 6 checks passed
reta pushed a commit that referenced this pull request Mar 11, 2026
…1483)

* [CXF-8947] - Avoid expensive regex operations in Rfc3986UriValidator if URI.getHost() returns a host name

Signed-off-by: Adam Anderson <atanderson9383@gmail.com>

* Address review feedback: drop Rfc3986UriValidator shortcut, simplify isHttpScheme()

- Remove the URI.getHost() shortcut in Rfc3986UriValidator since
  Java's URI.getHost() is not RFC-3986 compliant (per reviewer feedback)
- Replace HashSet-based HTTP scheme check with simple
  String.equalsIgnoreCase() comparisons using constants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Adam Anderson <atanderson9383@gmail.com>
Co-authored-by: Guillaume Nodet <gnodet@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 3046dda)
reta pushed a commit that referenced this pull request Mar 11, 2026

Labels

None yet

Projects

None yet

3 participants