- 
                Notifications
    
You must be signed in to change notification settings  - Fork 2.3k
 
[Push AV]: Add url validation as per RFC3986 #41502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds URL validation using the uriparser library, which is a good approach. The changes correctly add the new dependency to the build system. In the implementation, I've suggested a few improvements for correctness, efficiency, and code clarity. Specifically, I recommend making the scheme validation case-insensitive as per RFC 3986 and enhancing isValidRFC3986Uri to also check for query and fragment components. This avoids redundant processing and makes the validation logic more efficient and maintainable.
| bool hasHttpsScheme(const UriUriA & uri) | ||
| { | ||
| if (!uri.scheme.first || !uri.scheme.afterLast) | ||
| return false; | ||
| 
               | 
          ||
| std::string scheme(uri.scheme.first, uri.scheme.afterLast); | ||
| return scheme == "https"; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to RFC 3986, the scheme component is case-insensitive. The current implementation performs a case-sensitive comparison with "https". This should be changed to a case-insensitive comparison.
Additionally, creating a std::string for comparison is inefficient as it can cause heap allocation. A direct character-by-character comparison can avoid this.
You will need to include <cctype> for the tolower function.
bool hasHttpsScheme(const UriUriA & uri)
{
    if (!uri.scheme.first || !uri.scheme.afterLast)
    {
        return false;
    }
    const char * scheme_start = uri.scheme.first;
    const size_t scheme_len   = static_cast<size_t>(uri.scheme.afterLast - uri.scheme.first);
    constexpr char https_scheme[] = "https";
    constexpr size_t https_scheme_len = sizeof(https_scheme) - 1;
    if (scheme_len != https_scheme_len)
    {
        return false;
    }
    for (size_t i = 0; i < https_scheme_len; ++i)
    {
        if (tolower(scheme_start[i]) != https_scheme[i])
        {
            return false;
        }
    }
    return true;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code from Gemini above is cruddy, but the comment is correct.
Please:
- Create a new src/lib/support/StringComparison.h and associated unit test
 - Have a new method for string view case insensitive compare:
 
#include <string_view>
#include <cctype>
#include <algorithm>
//// NAMESPACE IT properly
bool CaseInsensitiveCompare(std::string_view left, std::string_view right) {
    if (left.length() != right.length()) {
        return false; // Lengths differ, so they cannot be equal case-insensitively
    }
    for (size_t i = 0; i < left.length(); ++i) {
        // Convert characters to lowercase for comparison
        if (std::tolower(static_cast<unsigned char>(left[i])) !=
            std::tolower(static_cast<unsigned char>(right[i]))) {
            return false; // Characters differ after case conversion
        }
    }
    return true; // All characters matched after case conversion
}
| bool isValidRFC3986Uri(const std::string & uriStr) | ||
| { | ||
| UriUriA uri; | ||
| const char * errorPos; | ||
| 
               | 
          ||
| int result = uriParseSingleUriA(&uri, uriStr.c_str(), &errorPos); | ||
| if (result != URI_SUCCESS) | ||
| { | ||
| return false; | ||
| } | ||
| 
               | 
          ||
| bool isHttps = hasHttpsScheme(uri); | ||
| 
               | 
          ||
| uriFreeUriMembersA(&uri); | ||
| return isHttps; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the URL validation more efficient, you can extend this function to also check for the presence of a query or fragment, since uriparser already provides this information after parsing. This avoids scanning the URL string again in ValidateUrl. The checks in ValidateUrl for '?' and '#' would then become redundant and could be removed in a follow-up change.
bool isValidRFC3986Uri(const std::string & uriStr)
{
    UriUriA uri;
    const char * errorPos;
    if (uriParseSingleUriA(&uri, uriStr.c_str(), &errorPos) != URI_SUCCESS)
    {
        return false;
    }
    bool isHttps     = hasHttpsScheme(uri);
    bool hasQuery    = (uri.query.first != nullptr);
    bool hasFragment = (uri.fragment.first != nullptr);
    uriFreeUriMembersA(&uri);
    return isHttps && !hasQuery && !hasFragment;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation has to happen for all URIs, but the query/fragment bits only need to happen for CMAF URIs. So the bot's advice here would only work if this function had a way to configure whether query/fragment are allowed.
| 
           PR #41502: Size comparison from a95f163 to c0610c6 Full report (28 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, qpg, realtek, stm32, telink)
  | 
    
| 
               | 
          ||
| // Check minimum length and https prefix | ||
| if (url.size() <= https.size() || url.substr(0, https.size()) != https) | ||
| if (!isValidRFC3986Uri(url)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
11.7.6.13. TransportOptionsStruct Type
11.7.6.13.5. URL Field
This field SHALL be a valid string in RFC 3986 format representing the upload location. The field
SHALL use the https scheme which will be validated by the underlying TLSEndpointID.
These RFC Checks are for all Transports, while the following invalid character checks are CMAF-specific in the spec.
More details at https://github.com/CHIP-Specifications/connectedhomeip-spec/issues/12442 and Full discussion at #41321 (comment).
Also, heads up - #41321 addresses other issues in this API and also call this only when ContainerType is CMAF. So would need to rebase to it once it's merged.
5344b5c    to
    d0d13e2      
    Compare
  
    d0d13e2    to
    911d797      
    Compare
  
    | 
           PR #41502: Size comparison from 43e4daf to 911d797 Full report (28 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, qpg, realtek, stm32, telink)
  | 
    
          Codecov Report❌ Patch coverage is  
 Additional details and impacted files@@            Coverage Diff             @@
##           master   #41502      +/-   ##
==========================================
+ Coverage   51.01%   51.06%   +0.04%     
==========================================
  Files        1386     1384       -2     
  Lines      100988   100900      -88     
  Branches    13081    13057      -24     
==========================================
+ Hits        51519    51523       +4     
+ Misses      49469    49377      -92     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
  | 
    
        
          
                third_party/uriparser/BUILD.gn
              
                Outdated
          
        
      | action("build_uriparser") { | ||
| script = "scripts/build_uriparser.py" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can this not be a portable BUILD.gn target? This overrides a lot of the build infra in a non portable way.
| @click.command() | ||
| @click.option("--clang", is_flag=True, default=False, help="Use clang instead of gcc") | ||
| @click.option("--output-dir", default="build", help="Output directory for uriparser build") | ||
| def main(clang, output_dir): | ||
| script_dir = os.path.dirname(os.path.abspath(__file__)) | ||
| repo_dir = os.path.join(script_dir, "..", "repo") | ||
| build_dir = os.path.join(repo_dir, output_dir) | ||
| 
               | 
          ||
| # CMake command | ||
| cmake_cmd = [ | ||
| "cmake", | ||
| "-B", build_dir, | ||
| "-DCMAKE_BUILD_TYPE=Release", | ||
| "-DBUILD_SHARED_LIBS=OFF", | ||
| "-DURIPARSER_BUILD_TESTS=OFF", | ||
| "-DURIPARSER_BUILD_DOCS=OFF" | ||
| ] | ||
| 
               | 
          ||
| if clang: | ||
| cmake_cmd += ["-DCMAKE_C_COMPILER=clang", "-DCMAKE_CXX_COMPILER=clang++"] | ||
| else: | ||
| cmake_cmd += ["-DCMAKE_C_COMPILER=gcc", "-DCMAKE_CXX_COMPILER=g++"] | ||
| 
               | 
          ||
| print("Running:", " ".join(cmake_cmd)) | ||
| subprocess.run(cmake_cmd, cwd=repo_dir, check=True) | ||
| 
               | 
          ||
| # Build | ||
| make_cmd = ["make", "-j", str(os.cpu_count())] | ||
| print("Running:", " ".join(make_cmd)) | ||
| subprocess.run(make_cmd, cwd=build_dir, check=True) | ||
| 
               | 
          ||
| print("uriparser build complete at", build_dir) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works for some platforms and likely just in CI. Why does this need to be a new dep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Consider avoiding new dependency, unless it is portably built
 - New code requires unit tests for passing/failing URI
 - Add necessary reusable case insensitive string compare
 
| 
           PR #41502: Size comparison from 43e4daf to e937127 Increases above 0.2%: 
 Full report (32 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, efr32, psoc6, qpg, realtek, stm32, telink)
  | 
    
Summary
Fixes #41483
Testing
CI