feat(scanner): Merge duplicate scan results that share a provenance #10502

maennchen · 2025-06-19T18:38:05Z

TODO

Merge refactor(scanner): Extract NONE-padding of scan result into functions #10582 first
Merge feat(model): Add + merge operators for scan-domain objects #10585 first
Rebase PR

When the SpdxDocumentFile package manager is used, the project and all contained packages often resolve to the same VCS provenance (e.g. the root of the Git repository).
Before this change ORT stored two separate ScanResults for such a provenance – one keyed to the project, one keyed to the package.

That caused two follow-on problems:

Both results appeared in the OrtResult, so evaluators saw duplicate findings for the same source tree.
Because projects and packages are handled by different rules the package result was additionally padded with a SpdxConstants.NONE finding whenever includeFilesWithoutFindings was enabled. The evaluator therefore compared real license findings from the project result with NONE from the package result and failed with a violation.

This patch

merges ScannerRun, and
performs the "pad with NONE" step only after deduplication, so every path is represented exactly once.

As a consequence the evaluator now receives one consistent set of license findings per provenance / scanner, eliminating the false mismatch.

This is the first time for me writing Kotlin. Sorry if the code is not up to the usual standards.

scanner/src/main/kotlin/Scanner.kt

codecov · 2025-06-19T19:31:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.30%. Comparing base (e2dd087) to head (d536a3e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #10502      +/-   ##
============================================
+ Coverage     57.28%   57.30%   +0.01%     
- Complexity     1644     1648       +4     
============================================
  Files           341      341              
  Lines         12722    12722              
  Branches       1206     1206              
============================================
+ Hits           7288     7290       +2     
+ Misses         4971     4969       -2     
  Partials        463      463

Flag	Coverage Δ
funTest-docker	`71.28% <ø> (ø)`
funTest-non-docker	`33.00% <ø> (-0.01%)`	⬇️
test-ubuntu-24.04	`41.77% <ø> (ø)`
test-windows-2022	`41.75% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

MarcelBochtler

Hi @maennchen,
thanks for your contribution!
Do you mind adding a test that demonstrates the issue with the duplicated scan results and that your changes fix these issues?

~~Edit: As there is currently no existing test for the scan function that use an ORT result, I won't insist on this change.~~

Edit 2: I Found these existing tests which already test the scan() function:

ort/scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt

Lines 63 to 102 in c086b3b

    
           "Scanning all packages corresponding to a single VCS" should { 
        
               val analyzerResult = createAnalyzerResult(pkg0, pkg1, pkg2, pkg3, pkg4) 
        
               val ortResult = createScanner().scan(analyzerResult, skipExcluded = false, emptyMap()) 
        
               "return the expected ORT result" { 
        
                   val expectedResult = readResource("/scanner-integration-all-pkgs-expected-ort-result.yml") 
        
                   patchActualResult(ortResult.toYaml(), patchStartAndEndTime = true) shouldBe 
        
                       patchExpectedResult(expectedResult) 
        
               } 
        
               "return the expected (merged) scan results" { 
        
                   val expectedResult = readResource("/scanner-integration-expected-scan-results.yml") 
        
                   val scanResults = ortResult.getScanResults().toSortedMap() 
        
                   patchActualResult(scanResults.toYaml(), patchStartAndEndTime = true) shouldBe 
        
                       patchExpectedResult(expectedResult) 
        
               } 
        
               "return the expected (merged) file lists" { 
        
                   val expectedResult = readResource("/scanner-integration-expected-file-lists.yml") 
        
                   val fileLists = ortResult.getFileLists().toSortedMap() 
        
                   fileLists.toYaml() shouldBe patchExpectedResult(expectedResult) 
        
               } 
        
           } 
        
           "Scanning a subset of the packages corresponding to a single VCS" should { 
        
               "return the expected ORT result" { 
        
                   val analyzerResult = createAnalyzerResult(pkg1, pkg3) 
        
                   val expectedResult = readResource("/scanner-integration-subset-pkgs-expected-ort-result.yml") 
        
                   val ortResult = createScanner().scan(analyzerResult, skipExcluded = false, emptyMap()) 
        
                   patchActualResult(ortResult.toYaml(), patchStartAndEndTime = true) shouldBe 
        
                       patchExpectedResult(expectedResult) 
        
               } 
        
           }

Can you have a look at them and extend them to test your use-case?

maennchen · 2025-06-25T17:02:57Z

@MarcelBochtler I added a test to test the deduplication. Running the same test on main will cause duplicated information.

$ ./gradlew scanner:funTest --tests "org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest"

Parallel Configuration Cache is an incubating feature.
Calculating task graph as configuration cache cannot be reused because a build logic input of type 'SemInfoVersionValueSource' has changed.
Type-safe project accessors is an incubating feature.

> Configure project :
Building ORT version 61.1.0.

> Task :scanner:funTest

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected ORT result STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected ORT result PASSED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) scan results STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) scan results PASSED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) file lists STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) file lists PASSED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning a subset of the packages corresponding to a single VCS should > return the expected ORT result STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning a subset of the packages corresponding to a single VCS should > return the expected ORT result FAILED
    io.kotest.assertions.AssertionFailedError: expected:<[Deletion at line 298]           end_line: -1
      scanners:
        Dummy::pkg1:1.0.0:
        - "Dummy"
        Dummy::pkg3:1.0.0:
        - "Dummy"
        Dummy::project:1.0.0:
        - "Dummy"
      files:
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo.git"
            revision: "a732695e03efcbd74539208af98c297ee86e49d5"
            path: ""
          resolved_revision: "a732695e03efcbd74539208af98c297ee86e49d5"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s1/pkg-s1.txt"
          sha1: "e5fb17f8f4f4ef0748bb5ba137fd0e091dd5a1f6"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo2.git"
            revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
            path: ""
          resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s2/pkg-s2.txt"
          sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: ".gitmodules"
          sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg2/pkg2.txt"
          sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
        - path: "pkg4/pkg4.txt"
          sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
    advisor: null
    evaluator: null
    resolved_configuration: {}


    [Deletion at line 386] > but was:<[Deletion at line 298]           end_line: -1
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        scanner:
          name: "Dummy"
          version: "1.0.0"
          configuration: ""
        summary:
          start_time: "1970-01-01T00:00:00Z"
          end_time: "1970-01-01T00:00:00Z"
          licenses:
          - license: "NOASSERTION"
            location:
              path: "LICENSE"
              start_line: -1
              end_line: -1
          - license: "NOASSERTION"
            location:
              path: "pkg1/pkg1.txt"
              start_line: -1
              end_line: -1
          - license: "NOASSERTION"
            location:
              path: "pkg3/pkg3.txt"
              start_line: -1
              end_line: -1
      scanners:
        Dummy::pkg1:1.0.0:
        - "Dummy"
        Dummy::pkg3:1.0.0:
        - "Dummy"
        Dummy::project:1.0.0:
        - "Dummy"
      files:
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo.git"
            revision: "a732695e03efcbd74539208af98c297ee86e49d5"
            path: ""
          resolved_revision: "a732695e03efcbd74539208af98c297ee86e49d5"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s1/pkg-s1.txt"
          sha1: "e5fb17f8f4f4ef0748bb5ba137fd0e091dd5a1f6"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo2.git"
            revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
            path: ""
          resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s2/pkg-s2.txt"
          sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: ".gitmodules"
          sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg2/pkg2.txt"
          sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
        - path: "pkg4/pkg4.txt"
          sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
    advisor: null
    evaluator: null
    resolved_configuration: {}


    [Deletion at line 356]         path: ""
          resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s2/pkg-s2.txt"
          sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: ".gitmodules"
          sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg2/pkg2.txt"
          sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
        - path: "pkg4/pkg4.txt"
          sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
    advisor: null
    evaluator: null
    resolved_configuration: {}
    >

16:50:31.855 [ForkJoinPool-1-worker-1] DEBUG org.eclipse.jgit.internal.util.ShutdownHook - Cleanup org.eclipse.jgit.util.FS$FileStoreAttributes$$Lambda/0x00007eabf0386c10@e239dec during JVM shutdown

> Task :scanner:funTest FAILED

4 tests completed, 1 failed

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':scanner:funTest'.
> There were failing tests. See the report at: file:///workspaces/ort/scanner/build/reports/tests/funTest/index.html

* Try:
> Run with --scan to get full insights.

BUILD FAILED in 53s
81 actionable tasks: 52 executed, 29 up-to-date
Configuration cache entry stored.

scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt

MarcelBochtler

Thanks for adding the test. This really helps to understand the necessity of this change.

The change looks good to me. Any other opinions @oss-review-toolkit/kotlin-devs ?

scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt

scanner/src/funTest/resources/scanner-integration-all-pkgs-expected-ort-result.yml

scanner/src/main/kotlin/Scanner.kt

scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt

maennchen · 2025-07-05T09:44:50Z

The code can be seen in action here: https://github.com/elixir-lang/elixir/actions/runs/16086802339/job/45398947306
(See the Action Artifacts for Details)

scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt

scanner/src/main/kotlin/Scanner.kt

scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt

Prep work for PR oss-review-toolkit#10502. * Add ScanResult.padNoneLicenseFindings() Inserts one synthetic LicenseFinding(SpdxConstants.NONE, UNKNOWN_LINE) for every path that lacks a license finding. * Add ScannerRun.padNoneLicenseFindings() Calls the above helper for each contained ScanResult, using the run’s FileList data to obtain the full set of paths. * Replace the in-place code in Scanner with a single call to ScannerRun.padNoneLicenseFindings() when includeFilesWithoutFindings is true. * Introduce ScannerRunTest.padNoneLicenseFindings() Verifies that a path without findings receives exactly one NONE entry. This refactor keeps behaviour unchanged while isolating the padding logic in the domain model, improving reuse and testability and keeping later PRs focused. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. * Add ScanResult.padNoneLicenseFindings(paths) Creates a copy whose summary contains a single LicenseFinding(SpdxConstants.NONE, UNKNOWN_LINE) for every path in *paths* that currently lacks a licence finding. * Add ScannerRun.padNoneLicenseFindings() Uses the run’s FileList map to obtain all paths per provenance and delegates to the ScanResult helper for every contained result. * Remove the ad-hoc padding code from Scanner.kt and replace it with a single call to scannerRun.padNoneLicenseFindings() when includeFilesWithoutFindings is enabled. * Introduce ScannerRunTest.padNoneLicenseFindings() Ensures a file without licence findings receives exactly one NONE entry. This refactor keeps behaviour unchanged while centralising the padding logic in the model layer, improving reuse and testability. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

…ovenance Prep work for upcoming PR oss-review-toolkit#10502. * `ScannerRun.init` - Add a `require` check that no two `ScanResult`s share the same provenance. - Add a similar check that no two `FileList`s share the same provenance. The offending provenance is rendered via `toYaml()` to aid debugging. * `ScannerRunTest` - Introduce two unit tests that assert an `IllegalArgumentException` is thrown when duplicates are present for scan results or file lists, respectively. These runtime checks make improper scan runs surface early and provide clear error messages, which will simplify the upcoming deduplication work. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

…ovenance Prep work for upcoming PR oss-review-toolkit#10502. * `ScannerRun.init` - Add a `require` check that no two `ScanResult`s share the same provenance and scanner. - Add a similar check that no two `FileList`s share the same provenance. The offending provenance is rendered via `toYaml()` to aid debugging. * `ScannerRunTest` - Introduce two unit tests that assert an `IllegalArgumentException` is thrown when duplicates are present for scan results or file lists, respectively. These runtime checks make improper scan runs surface early and provide clear error messages, which will simplify the upcoming deduplication work. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. * Add ScanResult.padNoneLicenseFindings(paths) Creates a copy whose summary contains a single LicenseFinding(SpdxConstants.NONE, UNKNOWN_LINE) for every path in *paths* that currently lacks a licence finding. * Add ScannerRun.padNoneLicenseFindings() Uses the run’s FileList map to obtain all paths per provenance and delegates to the ScanResult helper for every contained result. * Remove the ad-hoc padding code from Scanner.kt and replace it with a single call to scannerRun.padNoneLicenseFindings() when includeFilesWithoutFindings is enabled. * Introduce ScannerRunTest.padNoneLicenseFindings() Ensures a file without licence findings receives exactly one NONE entry. This refactor keeps behaviour unchanged while centralising the padding logic in the model layer, improving reuse and testability. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. * `ScannerRun.init` - Add a `require` check that no two `ScanResult`s share the same provenance and scanner. - Add a similar check that no two `FileList`s share the same provenance. The offending provenance is rendered via `toYaml()` to aid debugging. * `ScannerRunTest` - Introduce two unit tests that assert an `IllegalArgumentException` is thrown when duplicates are present for scan results or file lists, respectively. These runtime checks make improper scan runs surface early and provide clear error messages, which will simplify the upcoming deduplication work. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. * `FileList`, `ScanSummary`, `ScanResult` and `ScannerRun` now implement `operator fun plus(...)`, enabling declarative, type-safe merging of scan data. * Each operator validates invariants (e.g. identical provenance, identical scanner / environment / config) and throws an `IllegalArgumentException` on mismatch. * For compatible objects, collections are union-merged; time ranges are widened (`startTime = min`, `endTime = max`). * `ScannerRun.plus()` * Merges file lists and scan results by `provenance to scanner` using the new operators. * Combines issues / scanners maps by key union. * Carries forward earliest start and latest end timestamps. * **Unit tests** * Extensive coverage in `ScannerRunTest` for all merge scenarios, including negative cases (different config / environment) and positive cases for times, provenances, issues, scanners, file lists and scan results. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. * Add ScanResult.padNoneLicenseFindings(paths) Creates a copy whose summary contains a single LicenseFinding(SpdxConstants.NONE, UNKNOWN_LINE) for every path in *paths* that currently lacks a licence finding. * Add ScannerRun.padNoneLicenseFindings() Uses the run’s FileList map to obtain all paths per provenance and delegates to the ScanResult helper for every contained result. * Remove the ad-hoc padding code from Scanner.kt and replace it with a single call to scannerRun.padNoneLicenseFindings() when includeFilesWithoutFindings is enabled. * Introduce ScannerRunTest.padNoneLicenseFindings() Ensures a file without licence findings receives exactly one NONE entry. This refactor keeps behaviour unchanged while centralising the padding logic in the model layer, improving reuse and testability. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. * `FileList`, `ScanSummary`, `ScanResult` and `ScannerRun` now implement `operator fun plus(...)`, enabling declarative, type-safe merging of scan data. * Each operator validates invariants (e.g. identical provenance, identical scanner / environment / config) and throws an `IllegalArgumentException` on mismatch. * For compatible objects, collections are union-merged; time ranges are widened (`startTime = min`, `endTime = max`). * `ScannerRun.plus()` * Merges file lists and scan results by `provenance to scanner` using the new operators. * Combines issues / scanners maps by key union. * Carries forward earliest start and latest end timestamps. * **Unit tests** * Extensive coverage in `ScannerRunTest` for all merge scenarios, including negative cases (different config / environment) and positive cases for times, provenances, issues, scanners, file lists and scan results. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR #10502. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR oss-review-toolkit#10502. * `FileList`, `ScanSummary`, `ScanResult` and `ScannerRun` now implement `operator fun plus(...)`, enabling declarative, type-safe merging of scan data. * Each operator validates invariants (e.g. identical provenance, identical scanner / environment / config) and throws an `IllegalArgumentException` on mismatch. * For compatible objects, collections are union-merged; time ranges are widened (`startTime = min`, `endTime = max`). * `ScannerRun.plus()` * Merges file lists and scan results by `provenance to scanner` using the new operators. * Combines issues / scanners maps by key union. * Carries forward earliest start and latest end timestamps. * **Unit tests** * Extensive coverage in `ScannerRunTest` for all merge scenarios, including negative cases (different config / environment) and positive cases for times, provenances, issues, scanners, file lists and scan results. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Prep work for upcoming PR #10502. * `FileList`, `ScanSummary`, `ScanResult` and `ScannerRun` now implement `operator fun plus(...)`, enabling declarative, type-safe merging of scan data. * Each operator validates invariants (e.g. identical provenance, identical scanner / environment / config) and throws an `IllegalArgumentException` on mismatch. * For compatible objects, collections are union-merged; time ranges are widened (`startTime = min`, `endTime = max`). * `ScannerRun.plus()` * Merges file lists and scan results by `provenance to scanner` using the new operators. * Combines issues / scanners maps by key union. * Carries forward earliest start and latest end timestamps. * **Unit tests** * Extensive coverage in `ScannerRunTest` for all merge scenarios, including negative cases (different config / environment) and positive cases for times, provenances, issues, scanners, file lists and scan results. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt

scanner/src/main/kotlin/Scanner.kt

When the SpdxDocumentFile package manager is used, the *project* and all contained *packages* often resolve to the **same VCS provenance** (e.g. the root of the Git repository). Before this change ORT stored two separate `ScanResult`s for such a provenance – one keyed to the project, one keyed to the package. That caused two follow-on problems: * Both results appeared in the `OrtResult`, so evaluators saw **duplicate findings** for the *same* source tree. * Because projects and packages are handled by different rules the package result was additionally **padded with a `SpdxConstants.NONE` finding** whenever `includeFilesWithoutFindings` was enabled. The evaluator therefore compared *real* license findings from the project result with `NONE` from the package result and failed with a violation. This patch * merges `ScannerRun`, and * performs the "pad with NONE" step only **after** deduplication, so every path is represented exactly once. As a consequence the evaluator now receives one consistent set of license findings per provenance / scanner, eliminating the false mismatch. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

Comments addressed.

maennchen requested a review from a team as a code owner June 19, 2025 18:38

maennchen force-pushed the jm/scan_deduplication branch from 038c082 to 40190e3 Compare June 19, 2025 18:42

github-advanced-security bot found potential problems Jun 19, 2025

View reviewed changes

scanner/src/main/kotlin/Scanner.kt Fixed Show resolved Hide resolved

maennchen changed the title ~~scanner: Merge duplicate scan results that share a provenance~~ feat(scanner): Merge duplicate scan results that share a provenance Jun 19, 2025

maennchen mentioned this pull request Jun 19, 2025

Update ORT Scanner elixir-lang/elixir#14594

Draft

1 task

maennchen force-pushed the jm/scan_deduplication branch from 40190e3 to ed1df5f Compare June 19, 2025 19:37

MarcelBochtler reviewed Jun 24, 2025

View reviewed changes

maennchen force-pushed the jm/scan_deduplication branch from 4432de1 to 263be95 Compare June 25, 2025 17:06

github-advanced-security bot found potential problems Jun 25, 2025

View reviewed changes

maennchen force-pushed the jm/scan_deduplication branch from 263be95 to 276bc2f Compare June 25, 2025 18:59

MarcelBochtler previously approved these changes Jun 25, 2025

View reviewed changes

sschuberth requested changes Jun 26, 2025

View reviewed changes

fviernau requested changes Jun 26, 2025

View reviewed changes

scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved

scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved

scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved

scanner/src/main/kotlin/Scanner.kt Outdated Show resolved Hide resolved

fviernau requested changes Jun 26, 2025

View reviewed changes

scanner/src/funTest/kotlin/scanners/ScannerIntegrationFunTest.kt Outdated Show resolved Hide resolved

maennchen dismissed MarcelBochtler’s stale review via dfb5f87 July 5, 2025 09:28

maennchen force-pushed the jm/scan_deduplication branch from 276bc2f to dfb5f87 Compare July 5, 2025 09:28

sschuberth requested review from fviernau and MarcelBochtler July 5, 2025 09:47

fviernau previously requested changes Jul 7, 2025

View reviewed changes

maennchen marked this pull request as draft July 10, 2025 09:02

maennchen mentioned this pull request Jul 10, 2025

refactor(scanner): Extract NONE-padding of scan result into functions #10582

Merged

maennchen mentioned this pull request Jul 10, 2025

feat(model): Guard duplicate scan results / file lists per provenance #10584

Merged

3 tasks

maennchen force-pushed the jm/scan_deduplication branch from ed52821 to 56c12eb Compare July 10, 2025 14:51

maennchen requested review from sschuberth and fviernau July 10, 2025 18:21

maennchen added a commit to maennchen/ort that referenced this pull request Jul 11, 2025

refactor(scanner): extract NONE-padding of scan result into functions

9507abf

Prep work for upcoming PR oss-review-toolkit#10502. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

maennchen added a commit to maennchen/ort that referenced this pull request Jul 11, 2025

refactor(scanner): Extract NONE-padding of scan result into functions

f02fc54

Prep work for upcoming PR oss-review-toolkit#10502. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

maennchen force-pushed the jm/scan_deduplication branch from 56c12eb to 671db0d Compare July 11, 2025 11:28

fviernau pushed a commit that referenced this pull request Jul 11, 2025

refactor(scanner): Extract NONE-padding of scan result into functions

0fe35b1

Prep work for upcoming PR #10502. Signed-off-by: Jonatan Männchen <jonatan@maennchen.ch>

maennchen force-pushed the jm/scan_deduplication branch from 671db0d to 6a583fb Compare July 11, 2025 13:14

maennchen force-pushed the jm/scan_deduplication branch from 6a583fb to f6a6d27 Compare July 29, 2025 11:56

maennchen force-pushed the jm/scan_deduplication branch from f6a6d27 to d780617 Compare July 29, 2025 12:11

sschuberth requested changes Jul 29, 2025

View reviewed changes

maennchen force-pushed the jm/scan_deduplication branch from d780617 to d536a3e Compare July 29, 2025 14:31

sschuberth enabled auto-merge (rebase) July 29, 2025 14:47

sschuberth approved these changes Jul 29, 2025

View reviewed changes

sschuberth merged commit 927e47c into oss-review-toolkit:main Jul 29, 2025
25 checks passed

maennchen deleted the jm/scan_deduplication branch July 29, 2025 14:55

	"Scanning all packages corresponding to a single VCS" should {
	val analyzerResult = createAnalyzerResult(pkg0, pkg1, pkg2, pkg3, pkg4)
	val ortResult = createScanner().scan(analyzerResult, skipExcluded = false, emptyMap())

	"return the expected ORT result" {
	val expectedResult = readResource("/scanner-integration-all-pkgs-expected-ort-result.yml")

	patchActualResult(ortResult.toYaml(), patchStartAndEndTime = true) shouldBe
	patchExpectedResult(expectedResult)
	}

	"return the expected (merged) scan results" {
	val expectedResult = readResource("/scanner-integration-expected-scan-results.yml")

	val scanResults = ortResult.getScanResults().toSortedMap()

	patchActualResult(scanResults.toYaml(), patchStartAndEndTime = true) shouldBe
	patchExpectedResult(expectedResult)
	}

	"return the expected (merged) file lists" {
	val expectedResult = readResource("/scanner-integration-expected-file-lists.yml")

	val fileLists = ortResult.getFileLists().toSortedMap()

	fileLists.toYaml() shouldBe patchExpectedResult(expectedResult)
	}
	}

	"Scanning a subset of the packages corresponding to a single VCS" should {
	"return the expected ORT result" {
	val analyzerResult = createAnalyzerResult(pkg1, pkg3)
	val expectedResult = readResource("/scanner-integration-subset-pkgs-expected-ort-result.yml")

	val ortResult = createScanner().scan(analyzerResult, skipExcluded = false, emptyMap())

	patchActualResult(ortResult.toYaml(), patchStartAndEndTime = true) shouldBe
	patchExpectedResult(expectedResult)
	}
	}

feat(scanner): Merge duplicate scan results that share a provenance #10502

feat(scanner): Merge duplicate scan results that share a provenance #10502

Conversation

maennchen commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TODO

Uh oh!

Uh oh!

codecov bot commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

MarcelBochtler left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maennchen commented Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MarcelBochtler left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

maennchen commented Jul 5, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

maennchen commented Jun 19, 2025 •

edited

Loading

codecov bot commented Jun 19, 2025 •

edited

Loading

MarcelBochtler left a comment •

edited

Loading

maennchen commented Jun 25, 2025 •

edited

Loading