Commit ca99049

Merge pull request #143 from swastkk/gsoc24-swastkk
Add 2024 GSoC report Compute Summary for all detected packages
2 parents c158429 + ee253d6 commit ca99049

3 files changed

+163
-4
lines changed

docs/source/archive/gsoc-toc.rst

Lines changed: 9 additions & 1 deletion
@@ -6,7 +6,15 @@ GSoC -- Google Summer of Code
66
open source software development. GSoC is completely online and is designed to encourage university
77
student participation in open source software development.
88
It was started by Google in 2005.
9-
More about GSoc - <https://summerofcode.withgoogle.com/about/>_
9+
More about GSoC - `<https://summerofcode.withgoogle.com/about/>`_
10+
11+
GSoC 2024
12+
---------
13+
14+
.. toctree::
15+
:maxdepth: 2
16+
17+
gsoc/reports/2024/scancode_toolkit_swastkk
1018

1119
GSoC 2022
1220
---------
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
========================================================================
2+
Compute summary for all detected packages
3+
========================================================================
4+
5+
6+
| **Organization:** `AboutCode <https://aboutcode.org>`_
7+
| **Project:** `ScanCode Toolkit <https://github.com/aboutcode-org/scancode-toolkit>`_
8+
| **Mentee:** `Swastik Sharma (swastkk) <https://github.com/swastkk>`_
9+
| **Mentors:** `Philippe Ombredanne`_, `Ayan Sinha Mahapatra`_, `Avishrant Sharma`_,
10+
`Jonathan Yang`_, `Jay Kumar`_
11+
12+
Overview
13+
--------
14+
15+
Previously, we computed the summary at the codebase level, which included elements like the
16+
``license_clarity_score``, ``declared_holder``, ``other_license_expressions``, and more.
17+
This project aims to improve scanning accuracy by computing summaries and license clarity scores for
18+
each package and its files, rather than for the entire scan. This involves enhancing package models
19+
and ensuring accurate attribute collection across all package ecosystems.
20+
21+
Implementation
22+
--------------
23+
24+
- **Added a new command-line option called** ``--package-summary``:
25+
26+
- Provides a package-level summary within a single codebase.
27+
- Computes a ``license_clarity_score`` for each package.
28+
- Populates package attributes like ``copyright``, ``holder``,
29+
``other_license_expression``, ``notice_text``.
30+
31+
- **The** ``--package-summary`` **option must be used with:**
32+
33+
- ``--classify``: Helps ScanCode further classify scanned files/directories into
34+
categories like ``legal``, ``readme``, ``top-level``, ``manifest``.
35+
- ``--package`` or ``-p``: Detects various package manifests, lockfiles, and
36+
package-like data, assembles codebase-level packages and dependencies, and tags
37+
files as part of the packages.
38+
39+
- **Benefits of the change:**
40+
41+
- Allows users to obtain a more refined summary for each individual package in
42+
a codebase.
43+
- Improves package assembly for various package ecosystems like npm, python-whl,
44+
rust, rubygems, etc. Since the package-level summary heavily depends on the
45+
package assembly, there were several scenarios where key files for top-level
46+
packages were not properly tagged. To address this, a method called
47+
``get_top_level_resources`` was implemented. This method retrieves the resources
48+
for top-level packages, which helps in correctly tagging the key files.
49+
50+
- **Testing:**
51+
52+
- All changes are tested through multiple full scan tests.
53+
- The tests validate both correct behavior and error handling.
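
As an illustration of the option requirements described above, here is a minimal Python sketch that assembles a ``scancode`` command line and checks that ``--package-summary`` is accompanied by ``--package`` (or ``-p``) and ``--classify``. The helper name ``build_scan_command``, the ``samples/`` input path, and the ``scan.json`` output path are hypothetical and exist only for this example. The check is a local convenience, not ScanCode's own validation, and ``--package-summary`` is only available once the linked pull request is merged.

```python
import shlex

# Options that this report says must accompany --package-summary.
REQUIRED_WITH_PACKAGE_SUMMARY = ("--package", "--classify")


def build_scan_command(input_path, output_path, options):
    """Return an argument list for a scan, checking --package-summary prerequisites.

    Hypothetical helper for illustration; not part of ScanCode Toolkit.
    """
    opts = list(options)
    # Treat the short form -p as equivalent to --package.
    normalized = {"--package" if opt == "-p" else opt for opt in opts}
    if "--package-summary" in normalized:
        missing = [o for o in REQUIRED_WITH_PACKAGE_SUMMARY if o not in normalized]
        if missing:
            raise ValueError("--package-summary also requires: " + ", ".join(missing))
    # --json-pp writes pretty-printed JSON results to the given file.
    return ["scancode", *opts, "--json-pp", output_path, input_path]


command = build_scan_command(
    "samples/", "scan.json", ["--package", "--classify", "--package-summary"]
)
print(shlex.join(command))
```

The resulting list can be passed to ``subprocess.run`` in an environment where a ScanCode build containing the new option is installed.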
54+
55+
Linked Pull Requests
56+
--------------------
57+
58+
.. list-table::
59+
:widths: 10 60 30
60+
:header-rows: 1
61+
62+
* - Sr. no
63+
- Link
64+
- Status
65+
* - 1
66+
- https://github.com/aboutcode-org/scancode-toolkit/pull/3792
67+
- Open
68+
69+
Related Issues
70+
--------------
71+
72+
.. list-table::
73+
:widths: 10 60 30
74+
:header-rows: 1
75+
76+
* - Sr. no
77+
- Name
78+
- Link
79+
* - 1
80+
- Improve Package models to have license_clarity_score
81+
- `#3817 <https://github.com/aboutcode-org/scancode-toolkit/issues/3817>`_
82+
* - 2
83+
- Post Scan option --package-summary
84+
- `#3802 <https://github.com/aboutcode-org/scancode-toolkit/issues/3802>`_
85+
* - 3
86+
- Look in package-ecosystem specific key-files for referenced licenses
87+
- `#3707 <https://github.com/aboutcode-org/scancode-toolkit/issues/3707>`_
88+
* - 4
89+
- Compute summary and clarity for EACH package in a codebase
90+
- `#3287 <https://github.com/aboutcode-org/scancode-toolkit/issues/3287>`_
91+
* - 5
92+
- Provide data values in scan results to correspond with license_clarity_score elements
93+
- `#1395 <https://github.com/aboutcode-org/scancode-toolkit/issues/1395>`_
94+
* - 6
95+
- Populate package instance attributes from it's files
96+
- `#3862 <https://github.com/aboutcode-org/scancode-toolkit/issues/3862>`_
97+
* - 7
98+
- Improve Ruby Package Ecosystem/Datafile Handler to tag key_files properly
99+
- `#3881 <https://github.com/aboutcode-org/scancode-toolkit/issues/3881>`_
100+
* - 8
101+
- Rust Members files are not detected properly
102+
- `#3895 <https://github.com/aboutcode-org/scancode-toolkit/issues/3895>`_
103+
* - 9
104+
- Add Tests for Package Level Summary computation
105+
- `#3889 <https://github.com/aboutcode-org/scancode-toolkit/issues/3889>`_
106+
107+
108+
Post GSoC
109+
---------
110+
111+
I would like to merge this PR into ScanCode Toolkit, hopefully allowing users to leverage
112+
this feature to expand their package/codebase scanning capabilities.
113+
114+
Links
115+
-----
116+
117+
* `Project Idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#compute-summary-for-all-detected-packages>`_
118+
119+
* `Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/JzMlDtnM>`_
120+
121+
* `GSoC Proposal <https://docs.google.com/document/d/1TcGqQVzXhTkz6Pmu9UaXAr4R4q1rlT4tof7H7dsVG0o/edit?usp=sharing>`_
122+
123+
* `Project Board <https://github.com/orgs/aboutcode-org/projects/4>`_
124+
125+
* `Reference Issue <https://github.com/aboutcode-org/scancode-toolkit/issues/3287>`_
126+
127+
Acknowledgements
128+
----------------
129+
130+
I would like to thank my mentors:
131+
132+
- `Philippe Ombredanne`_
133+
- `Ayan Sinha Mahapatra`_
134+
- `Avishrant Sharma`_
135+
- `Jay Kumar`_
136+
- `Jonathan Yang`_
137+
138+
The weekly status calls were very helpful, and the special 1:1 calls with
139+
`Ayan Sinha Mahapatra`_ and `Philippe Ombredanne`_ were so amazing.
140+
Thank you for your time and your patience!
141+
142+
143+
.. _Philippe Ombredanne: https://github.com/pombredanne
144+
.. _Ayan Sinha Mahapatra: https://github.com/AyanSinhaMahapatra
145+
.. _Avishrant Sharma: https://github.com/AvishrantSsh
146+
.. _Jay Kumar: https://github.com/35C4n0r
147+
.. _Jonathan Yang: https://github.com/JonoYang

docs/source/contributing.rst

Lines changed: 7 additions & 3 deletions
@@ -2,13 +2,17 @@
22
Contributing to AboutCode
33
#########################
44

5-
We welcome you and your interest in contributing to open source software! AboutCode is always looking for enthusiatic contributors and we are happy to help with any questions or comments. Here a few resources to get started:
5+
We welcome you and your interest in contributing to open source software! AboutCode
6+
is always looking for enthusiastic contributors and we are happy to help with any questions
7+
or comments. Here are a few resources to get started:
68

79
1) Take a look through our public repos here: https://github.com/aboutcode-org/
810
* Find one you are interested in and check out its open **Issues**
9-
2) If you have specific questions browse through our documentation here: https://aboutcode.readthedocs.io/en/latest/
11+
2) If you have specific questions, browse through our documentation here:
12+
https://aboutcode.readthedocs.io/en/latest/
1013
* Depending on the project, there may be a separate ReadTheDocs website
11-
* Not finding what you were looking for or still have questions? Open an issue on the relevant repository or ask directly via Gitter or Slack
14+
* Not finding what you were looking for or still have questions?
15+
Open an issue on the relevant repository or ask directly via Gitter or Slack
1216

1317
You can always interact with the AboutCode community on Gitter_ and Slack_.
1418
