| 1 | +======================================================================== |
| 2 | +Compute summary for all detected packages. |
| 3 | +======================================================================== |
| 4 | + |
| 5 | + |
| 6 | +| **Organization:** `AboutCode <https://aboutcode.org>`_ |
| 7 | +| **Project:** `ScanCode Toolkit <https://github.com/aboutcode-org/scancode-toolkit>`_ |
| 8 | +| **Mentee:** `Swastik Sharma (swastkk) <https://github.com/swastkk>`_ |
| 9 | +| **Mentors:** `Philippe Ombredanne`_, `Ayan Sinha Mahapatra`_, `Avishrant Sharma`_, |
| 10 | + `Jonathan Yang`_, `Jay Kumar`_ |
| 11 | + |
| 12 | +Overview |
| 13 | +-------- |
| 14 | + |
| 15 | +Previously, the summary was computed only at the codebase level and included elements such as |
| 16 | +``license_clarity_score``, ``declared_holder``, and ``other_license_expressions``. |
| 17 | +This project aims to improve scan accuracy by computing a summary and a license clarity score for |
| 18 | +each package and its files, rather than only for the entire scan. This involves enhancing the package |
| 19 | +models and ensuring accurate attribute collection across all package ecosystems. |
| 20 | + |
| 21 | +Implementation |
| 22 | +-------------- |
| 23 | + |
| 24 | +- **Added a new command-line option called** ``--package-summary``: |
| 25 | + |
| 26 | + - Provides a summary for each package within a single codebase. |
| 27 | + - Computes the ``license_clarity_score`` for each package. |
| 28 | + - Populates package attributes like ``copyright``, ``holder``, |
| 29 | + ``other_license_expression``, ``notice_text``. |
| 30 | + |
| 31 | +- **The** ``--package-summary`` **option must be used with:** |
| 32 | + |
| 33 | + - ``--classify``: Helps ScanCode further classify scanned files/directories into |
| 34 | + categories like ``legal``, ``readme``, ``top-level``, ``manifest``. |
| 35 | + - ``--package`` or ``-p``: Detects various package manifests, lockfiles, and |
| 36 | + package-like data, assembles codebase-level packages and dependencies, and tags |
| 37 | + files as part of the packages. |
| 38 | + |
| 39 | +- **Benefits of the change:** |
| 40 | + |
| 41 | + - Allows users to obtain a more refined summary for each individual package in |
| 42 | + a codebase. |
| 43 | + - Improves package assembly for package ecosystems such as npm, python-whl, |
| 44 | + rust, and rubygems. The package-level summary depends heavily on package |
| 45 | + assembly, and in several scenarios key files for top-level packages were not |
| 46 | + properly tagged. To address this, a method called |
| 47 | + ``get_top_level_resources`` was implemented. It retrieves the resources |
| 48 | + for top-level packages, which helps tag the key files correctly. |
| 49 | + |
| 50 | +- **Testing:** |
| 51 | + |
| 52 | + - All changes are covered by multiple full-scan tests. |
| 53 | + - The tests validate both correct behavior and error handling. |
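
As a rough illustration of the per-package idea described in the list above, the
sketch below groups the files of a scan by their ``for_packages`` entries and
tallies holders and license expressions for each package. It is not the code
from the linked pull request: the ``summarize_packages`` name, the simplified
package identifiers, and the hand-written sample scan are assumptions made for
this example only.

.. code-block:: python

    from collections import Counter, defaultdict


    def summarize_packages(scan):
        """Return a small summary per package, keyed by package_uid.

        Hypothetical helper for illustration; not part of ScanCode.
        """
        holders = defaultdict(Counter)
        licenses = defaultdict(Counter)

        # Attribute every file to the packages it is tagged as part of.
        for resource in scan.get("files", []):
            for package_uid in resource.get("for_packages", []):
                for entry in resource.get("holders", []):
                    holders[package_uid][entry["holder"]] += 1
                expression = resource.get("detected_license_expression")
                if expression:
                    licenses[package_uid][expression] += 1

        summary = {}
        for package in scan.get("packages", []):
            uid = package["package_uid"]
            declared = package.get("declared_license_expression")
            top_holder = holders[uid].most_common(1)
            summary[uid] = {
                # Most frequently seen holder among the package's files.
                "holder": top_holder[0][0] if top_holder else None,
                # Expressions found in the files, other than the declared one.
                "other_license_expressions": sorted(
                    expr for expr in licenses[uid] if expr != declared
                ),
            }
        return summary


    # Made-up scan data with two packages and one file outside any package.
    scan = {
        "packages": [
            {"package_uid": "pkg:npm/a@1.0", "declared_license_expression": "mit"},
            {"package_uid": "pkg:pypi/b@2.0", "declared_license_expression": "apache-2.0"},
        ],
        "files": [
            {"path": "a/package.json", "for_packages": ["pkg:npm/a@1.0"],
             "holders": [{"holder": "Alice"}], "detected_license_expression": "mit"},
            {"path": "a/lib/x.js", "for_packages": ["pkg:npm/a@1.0"],
             "holders": [{"holder": "Alice"}], "detected_license_expression": "bsd-new"},
            {"path": "b/setup.py", "for_packages": ["pkg:pypi/b@2.0"],
             "holders": [{"holder": "Bob"}], "detected_license_expression": "apache-2.0"},
            {"path": "README", "for_packages": [],
             "holders": [{"holder": "Carol"}], "detected_license_expression": "gpl-2.0"},
        ],
    }

    result = summarize_packages(scan)

Files that are not tagged as part of any package, such as ``README`` in the
sample data, do not contribute to any package summary. This is why correct
key-file tagging during package assembly matters for the result.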
| 54 | + |
| 55 | +Linked Pull Requests |
| 56 | +-------------------- |
| 57 | + |
| 58 | +.. list-table:: |
| 59 | + :widths: 10 60 30 |
| 60 | + :header-rows: 1 |
| 61 | + |
| 62 | + * - Sr. no |
| 63 | + - Link |
| 64 | + - Status |
| 65 | + * - 1 |
| 66 | + - https://github.com/aboutcode-org/scancode-toolkit/pull/3792 |
| 67 | + - Open |
| 68 | + |
| 69 | +Related Issues |
| 70 | +-------------- |
| 71 | + |
| 72 | +.. list-table:: |
| 73 | + :widths: 10 60 30 |
| 74 | + :header-rows: 1 |
| 75 | + |
| 76 | + * - Sr. no |
| 77 | + - Name |
| 78 | + - Link |
| 79 | + * - 1 |
| 80 | + - Improve Package models to have license_clarity_score |
| 81 | + - `#3817 <https://github.com/aboutcode-org/scancode-toolkit/issues/3817>`_ |
| 82 | + * - 2 |
| 83 | + - Post Scan option --package-summary |
| 84 | + - `#3802 <https://github.com/aboutcode-org/scancode-toolkit/issues/3802>`_ |
| 85 | + * - 3 |
| 86 | + - Look in package-ecosystem specific key-files for referenced licenses |
| 87 | + - `#3707 <https://github.com/aboutcode-org/scancode-toolkit/issues/3707>`_ |
| 88 | + * - 4 |
| 89 | + - Compute summary and clarity for EACH package in a codebase |
| 90 | + - `#3287 <https://github.com/aboutcode-org/scancode-toolkit/issues/3287>`_ |
| 91 | + * - 5 |
| 92 | + - Provide data values in scan results to correspond with license_clarity_score elements |
| 93 | + - `#1395 <https://github.com/aboutcode-org/scancode-toolkit/issues/1395>`_ |
| 94 | + * - 6 |
| 95 | + - Populate package instance attributes from its files |
| 96 | + - `#3862 <https://github.com/aboutcode-org/scancode-toolkit/issues/3862>`_ |
| 97 | + * - 7 |
| 98 | + - Improve Ruby Package Ecosystem/Datafile Handler to tag key_files properly |
| 99 | + - `#3881 <https://github.com/aboutcode-org/scancode-toolkit/issues/3881>`_ |
| 100 | + * - 8 |
| 101 | + - Rust Members files are not detected properly |
| 102 | + - `#3895 <https://github.com/aboutcode-org/scancode-toolkit/issues/3895>`_ |
| 103 | + * - 9 |
| 104 | + - Add Tests for Package Level Summary computation |
| 105 | + - `#3889 <https://github.com/aboutcode-org/scancode-toolkit/issues/3889>`_ |
| 106 | + |
| 107 | + |
| 108 | +Post GSoC |
| 109 | +--------- |
| 110 | + |
| 111 | +I would like to get this PR merged into ScanCode Toolkit so that users can use |
| 112 | +this feature to expand their package and codebase scanning capabilities. |
| 113 | + |
| 114 | +Links |
| 115 | +----- |
| 116 | + |
| 117 | +* `Project Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#compute-summary-for-all-detected-packages>`_ |
| 118 | + |
| 119 | +* `Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/JzMlDtnM>`_ |
| 120 | + |
| 121 | +* `GSoC Proposal <https://docs.google.com/document/d/1TcGqQVzXhTkz6Pmu9UaXAr4R4q1rlT4tof7H7dsVG0o/edit?usp=sharing>`_ |
| 122 | + |
| 123 | +* `Project Board <https://github.com/orgs/aboutcode-org/projects/4>`_ |
| 124 | + |
| 125 | +* `Reference Issue <https://github.com/aboutcode-org/scancode-toolkit/issues/3287>`_ |
| 126 | + |
| 127 | +Acknowledgements |
| 128 | +---------------- |
| 129 | + |
| 130 | +I would like to thank my mentors: |
| 131 | + |
| 132 | +- `Philippe Ombredanne`_ |
| 133 | +- `Ayan Sinha Mahapatra`_ |
| 134 | +- `Avishrant Sharma`_ |
| 135 | +- `Jay Kumar`_ |
| 136 | +- `Jonathan Yang`_ |
| 137 | + |
| 138 | +The weekly status calls were a great help, and the 1:1 calls with |
| 139 | +`Ayan Sinha Mahapatra`_ and `Philippe Ombredanne`_ were especially valuable. |
| 140 | +Thank you for your time and your patience! |
| 141 | + |
| 142 | + |
| 143 | +.. _Philippe Ombredanne: https://github.com/pombredanne |
| 144 | +.. _Ayan Sinha Mahapatra: https://github.com/AyanSinhaMahapatra |
| 145 | +.. _Avishrant Sharma: https://github.com/AvishrantSsh |
| 146 | +.. _Jay Kumar: https://github.com/35C4n0r |
| 147 | +.. _Jonathan Yang: https://github.com/JonoYang |