Add 2024 GSoC report Compute Summary for all detected packages #143

Merged
merged 7 commits into from
Aug 24, 2024
10 changes: 9 additions & 1 deletion docs/source/archive/gsoc-toc.rst
@@ -6,7 +6,15 @@ GSoC -- Google Summer of Code
open source software development. GSoC is a fully online program designed to encourage university
student participation in open source software development.
It was started by Google in 2005.
More about GSoc - <https://summerofcode.withgoogle.com/about/>_
More about GSoC - `<https://summerofcode.withgoogle.com/about/>`_

GSoC 2024
---------

.. toctree::
   :maxdepth: 2

   gsoc/reports/2024/scancode_toolkit_swastkk

GSoC 2022
---------
147 changes: 147 additions & 0 deletions docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst
@@ -0,0 +1,147 @@
========================================================================
Compute summary for all detected packages.
========================================================================


| **Organization:** `AboutCode <https://aboutcode.org>`_
| **Project:** `ScanCode Toolkit <https://github.com/aboutcode-org/scancode-toolkit>`_
| **Mentee:** `Swastik Sharma (swastkk) <https://github.com/swastkk>`_
| **Mentors:** `Philippe Ombredanne`_, `Ayan Sinha Mahapatra`_, `Avishrant Sharma`_,
  `Jonathan Yang`_, `Jay Kumar`_

Overview
--------

Previously, we computed the summary at the codebase level, which included elements like the
``license_clarity_score``, ``declared_holder``, ``other_license_expressions``, and more.
This project aims to improve scanning accuracy by computing summaries and license clarity scores for
each package and its files, rather than for the entire scan. This involves enhancing package models
and ensuring accurate attribute collection across all package ecosystems.
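
As a rough illustration of the per-package idea, the sketch below groups file-level
findings from a scan result by the package each file belongs to. It is not the
implementation from the linked pull request. It assumes a JSON-like scan dictionary
in which each file entry carries ``for_packages``, ``detected_license_expression``
and ``holders`` fields. The function name ``summarize_by_package`` and the sample
data are hypothetical.

```python
from collections import defaultdict


def summarize_by_package(scan):
    """Group file-level findings under each package UID listed in `for_packages`."""
    summaries = defaultdict(
        lambda: {"file_count": 0, "license_expressions": set(), "holders": set()}
    )
    for resource in scan.get("files", []):
        # A file may belong to zero, one, or several packages.
        for package_uid in resource.get("for_packages") or []:
            entry = summaries[package_uid]
            entry["file_count"] += 1
            expression = resource.get("detected_license_expression")
            if expression:
                entry["license_expressions"].add(expression)
            for holder in resource.get("holders") or []:
                entry["holders"].add(holder["holder"])
    # Convert sets to sorted lists so the result is stable and JSON-friendly.
    return {
        uid: {
            "file_count": data["file_count"],
            "license_expressions": sorted(data["license_expressions"]),
            "holders": sorted(data["holders"]),
        }
        for uid, data in summaries.items()
    }


# Hypothetical sample data: two packages plus one file outside any package.
sample_scan = {
    "files": [
        {"path": "a/LICENSE", "for_packages": ["pkg:npm/a@1.0"],
         "detected_license_expression": "mit", "holders": [{"holder": "Alice"}]},
        {"path": "a/index.js", "for_packages": ["pkg:npm/a@1.0"],
         "detected_license_expression": None, "holders": []},
        {"path": "b/setup.py", "for_packages": ["pkg:pypi/b@2.0"],
         "detected_license_expression": "apache-2.0", "holders": [{"holder": "Bob"}]},
        {"path": "README", "for_packages": [],
         "detected_license_expression": None, "holders": []},
    ]
}

per_package = summarize_by_package(sample_scan)
```

Files that belong to no package are skipped here. A codebase-level summary would
still cover them.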

Implementation
--------------

- **Added a new command-line option called** ``--package-summary``:

  - Provides a package-level summary within a single codebase.
  - Computes the ``license_clarity_score`` for each package.
  - Populates package attributes like ``copyright``, ``holder``,
    ``other_license_expression``, ``notice_text``.

- **The** ``--package-summary`` **option must be used with:**

  - ``--classify``: Helps ScanCode further classify scanned files/directories into
    categories like ``legal``, ``readme``, ``top-level``, ``manifest``.
  - ``--package`` or ``-p``: Detects various package manifests, lockfiles, and
    package-like data, assembles codebase-level packages and dependencies, and tags
    files as part of the packages.

- **Benefits of the change:**

  - Allows users to obtain a more refined summary for each individual package in
    a codebase.
  - Improves package assembly for various package ecosystems like npm, python-whl,
    rust, rubygems, etc. The package-level summary depends heavily on package
    assembly, and in several scenarios the key files of top-level packages were
    not tagged properly. To address this, a method called
    ``get_top_level_resources`` was implemented. It retrieves the resources
    of top-level packages so that their key files are tagged correctly.

- **Testing:**

  - All changes are covered by multiple full scan tests.
  - The tests validate both correct behavior and error handling.
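
The option combination described above could be assembled as in the sketch below.
This is only an illustration: ``--package-summary`` comes from the linked pull
request and may not exist in a released ScanCode version. Adding ``--license`` and
``--copyright`` is an assumption about what a useful summary needs, and the helper
name ``build_scan_command`` and the paths are hypothetical.

```python
import shlex


def build_scan_command(input_path, output_path):
    """Return the argument list for a scan that requests per-package summaries."""
    return [
        "scancode",
        "--license",          # assumption: license detection feeds the summary
        "--copyright",        # assumption: copyright detection feeds the holders
        "--package",          # required companion option (short form: -p)
        "--classify",         # required companion option
        "--package-summary",  # option proposed in the linked pull request
        "--json-pp", output_path,
        input_path,
    ]


command = build_scan_command("samples/", "scan.json")
# Print the command as a single shell-quoted line for copy and paste.
print(shlex.join(command))
```

Building the command as a list lets it be passed directly to ``subprocess.run``
without going through a shell.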

Linked Pull Requests
--------------------

.. list-table::
   :widths: 10 60 30
   :header-rows: 1

   * - Sr. no
     - Link
     - Status
   * - 1
     - https://github.com/aboutcode-org/scancode-toolkit/pull/3792
     - Open

Related Issues
--------------

.. list-table::
   :widths: 10 60 30
   :header-rows: 1

   * - Sr. no
     - Name
     - Link
   * - 1
     - Improve Package models to have license_clarity_score
     - `#3817 <https://github.com/aboutcode-org/scancode-toolkit/issues/3817>`_
   * - 2
     - Post Scan option --package-summary
     - `#3802 <https://github.com/aboutcode-org/scancode-toolkit/issues/3802>`_
   * - 3
     - Look in package-ecosystem specific key-files for referenced licenses
     - `#3707 <https://github.com/aboutcode-org/scancode-toolkit/issues/3707>`_
   * - 4
     - Compute summary and clarity for EACH package in a codebase
     - `#3287 <https://github.com/aboutcode-org/scancode-toolkit/issues/3287>`_
   * - 5
     - Provide data values in scan results to correspond with license_clarity_score elements
     - `#1395 <https://github.com/aboutcode-org/scancode-toolkit/issues/1395>`_
   * - 6
     - Populate package instance attributes from it's files
     - `#3862 <https://github.com/aboutcode-org/scancode-toolkit/issues/3862>`_
   * - 7
     - Improve Ruby Package Ecosystem/Datafile Handler to tag key_files properly
     - `#3881 <https://github.com/aboutcode-org/scancode-toolkit/issues/3881>`_
   * - 8
     - Rust Members files are not detected properly
     - `#3895 <https://github.com/aboutcode-org/scancode-toolkit/issues/3895>`_
   * - 9
     - Add Tests for Package Level Summary computation
     - `#3889 <https://github.com/aboutcode-org/scancode-toolkit/issues/3889>`_


Post GSoC
---------

I would like to get this PR merged into ScanCode Toolkit so that users can use
this feature to expand their package and codebase scanning capabilities.

Links
-----

* `Project Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#compute-summary-for-all-detected-packages>`_

* `Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/JzMlDtnM>`_

* `GSoC Proposal <https://docs.google.com/document/d/1TcGqQVzXhTkz6Pmu9UaXAr4R4q1rlT4tof7H7dsVG0o/edit?usp=sharing>`_

* `Project Board <https://github.com/orgs/aboutcode-org/projects/4>`_

* `Reference Issue <https://github.com/aboutcode-org/scancode-toolkit/issues/3287>`_

Acknowledgements
----------------

I would like to thank my mentors:

- `Philippe Ombredanne`_
- `Ayan Sinha Mahapatra`_
- `Avishrant Sharma`_
- `Jay Kumar`_
- `Jonathan Yang`_

The weekly status calls were very helpful, and the 1:1 calls with
`Ayan Sinha Mahapatra`_ and `Philippe Ombredanne`_ were especially valuable.
Thank you for your time and your patience!


.. _Philippe Ombredanne: https://github.com/pombredanne
.. _Ayan Sinha Mahapatra: https://github.com/AyanSinhaMahapatra
.. _Avishrant Sharma: https://github.com/AvishrantSsh
.. _Jay Kumar: https://github.com/35C4n0r
.. _Jonathan Yang: https://github.com/JonoYang
10 changes: 7 additions & 3 deletions docs/source/contributing.rst
@@ -2,13 +2,17 @@
Contributing to AboutCode
#########################

We welcome you and your interest in contributing to open source software! AboutCode is always looking for enthusiastic contributors and we are happy to help with any questions or comments. Here are a few resources to get started:
We welcome you and your interest in contributing to open source software! AboutCode
is always looking for enthusiastic contributors and we are happy to help with any questions
or comments. Here are a few resources to get started:

1) Take a look through our public repos here: https://github.com/aboutcode-org/
* Find one you are interested in and check out its open **Issues**
2) If you have specific questions browse through our documentation here: https://aboutcode.readthedocs.io/en/latest/
2) If you have specific questions browse through our documentation here:
https://aboutcode.readthedocs.io/en/latest/
* Depending on the project, there may be a separate ReadTheDocs website
* Not finding what you were looking for or still have questions? Open an issue on the relevant repository or ask directly via Gitter or Slack
* Not finding what you were looking for or still have questions?
Open an issue on the relevant repository or ask directly via Gitter or Slack

You can always interact with the AboutCode community on Gitter_ and Slack_.
