[CXF-8947] - Avoid expensive regex operations in Rfc3986UriValidator#1483
Conversation
…if URI.getHost() returns a host name Signed-off-by: Adam Anderson <atanderson9383@gmail.com>
| if (HttpUtils.isHttpScheme(uri.getScheme())) { | ||
| // If URI.getHost() returns a host name, validate it and | ||
| // skip the expensive regular expression logic. | ||
| final String uriHost = uri.getHost(); |
There was a problem hiding this comment.
@WhiteCat22 the reason for this validator to exists sadly is the fact that Java's URI is not RFC-3986 complaint. The host is not trustful source here hence we validate it against the pattern.
There was a problem hiding this comment.
That is a good point that we had not considered.
| private static final Set<String> KNOWN_HTTP_VERBS_WITH_NO_RESPONSE_CONTENT = | ||
| new HashSet<>(Arrays.asList(new String[]{"HEAD", "OPTIONS"})); | ||
|
|
||
| private static final Pattern HTTP_SCHEME_PATTERN = Pattern.compile("^(?i)(http|https)$"); |
There was a problem hiding this comment.
This pattern is super straightforward to look up, what are exactly the gains here (vs adding the set + comparator + lowecasing)?
There was a problem hiding this comment.
Hi @reta , Sorry for the delay. The reason we changed this was because HashSet.contains(String) uses "WAY" less CPU than Pattern.matcher(String).matches()
There was a problem hiding this comment.
Fair enough, may be we could just have two constants instead?
…isHttpScheme() - Remove the URI.getHost() shortcut in Rfc3986UriValidator since Java's URI.getHost() is not RFC-3986 compliant (per reviewer feedback) - Replace HashSet-based HTTP scheme check with simple String.equalsIgnoreCase() comparisons using constants Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Hi @reta, I've pushed a commit addressing your review feedback:
|
…1483) * [CXF-8947] - Avoid expensive regex operations in Rfc3986UriValidator if URI.getHost() returns a host name Signed-off-by: Adam Anderson <atanderson9383@gmail.com> * Address review feedback: drop Rfc3986UriValidator shortcut, simplify isHttpScheme() - Remove the URI.getHost() shortcut in Rfc3986UriValidator since Java's URI.getHost() is not RFC-3986 compliant (per reviewer feedback) - Replace HashSet-based HTTP scheme check with simple String.equalsIgnoreCase() comparisons using constants Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Signed-off-by: Adam Anderson <atanderson9383@gmail.com> Co-authored-by: Guillaume Nodet <gnodet@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> (cherry picked from commit 3046dda)
…1483) * [CXF-8947] - Avoid expensive regex operations in Rfc3986UriValidator if URI.getHost() returns a host name Signed-off-by: Adam Anderson <atanderson9383@gmail.com> * Address review feedback: drop Rfc3986UriValidator shortcut, simplify isHttpScheme() - Remove the URI.getHost() shortcut in Rfc3986UriValidator since Java's URI.getHost() is not RFC-3986 compliant (per reviewer feedback) - Replace HashSet-based HTTP scheme check with simple String.equalsIgnoreCase() comparisons using constants Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Signed-off-by: Adam Anderson <atanderson9383@gmail.com> Co-authored-by: Guillaume Nodet <gnodet@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> (cherry picked from commit 3046dda)
https://issues.apache.org/jira/browse/CXF-8947