Improve License Clarity at Top Package Level #3792

Draft: wants to merge 59 commits into base: develop
Commits
7322dc2
Add license-clarity attribute in Package model
swastkk Jun 1, 2024
3ca4308
Add Post Scan Plugin --package-summary option
swastkk Jun 7, 2024
fdca15e
Add a unit test for package summary
swastkk Jun 7, 2024
64957ec
Update Failing tests with SCANCODE_REGEN_TEST_FIXTURES
swastkk Jun 14, 2024
0c93d6d
update: Package model to have license_clarity_score
swastkk Jun 24, 2024
703d7cc
update: package models have license_clarity_score only if package-sum…
swastkk Jun 24, 2024
1667ea1
Update data_type of license_clarity_score attribute to dict and few m…
swastkk Jun 26, 2024
7838da7
Add test for package summary plugin
swastkk Jun 26, 2024
9da7f6e
Update REGEN_TESTS with latest changes
swastkk Jun 26, 2024
946864a
Update headers & other test regenerations, fix #3802
swastkk Jul 1, 2024
d2f5b30
update: Stripdown the test data for package_summary, #3817
swastkk Jul 2, 2024
e60fca8
update: Remove unnecessary data in testfiles that are irrelevant, #3817
swastkk Jul 2, 2024
9a1ee50
Fix: Resolve minor review comments #3802
swastkk Jul 4, 2024
65de49a
Update: Add dummy function to get the scoring elements for license_cl…
swastkk Jul 4, 2024
7dd309f
Update score.py to calculate license_clarity_score at package instanc…
swastkk Jul 4, 2024
6144567
Update: Made --classify as required plugin for --package-summary & up…
swastkk Jul 7, 2024
c775280
Merge branch 'develop' into improve-license-clarity
swastkk Jul 10, 2024
be4b61d
REGEN_Test: Regenerate package_summary test
swastkk Jul 10, 2024
2500f8f
Update: Create a deep copy of package attribute of codebase to have r…
swastkk Jul 12, 2024
4bce6d1
Refactor PackageSummary:process_codebase func code
swastkk Jul 14, 2024
b40abf8
Add get_field_values_from_package_resources method for package resour…
swastkk Jul 14, 2024
f76cee9
Update: Add --license & --copyright to package_summary test & test REGEN
swastkk Jul 14, 2024
818bf6f
Refactor: Combined both methods(Codebase level & Package Level) to on…
swastkk Jul 14, 2024
a63a04c
Refactor: Use single method compute_license_score for both codebase a…
swastkk Jul 14, 2024
27c2cc1
Update: Test Failure fix in test_score.py test cases
swastkk Jul 15, 2024
69a9d31
Add: Populate package attributes in reference to package-summary #3862
swastkk Jul 16, 2024
11cf15e
Update: Add other_license_expression attribute to PackageSummaryAttri…
swastkk Jul 16, 2024
fa6ecce
Update: Add package_attributes_map to ensure attributes are collected…
swastkk Jul 17, 2024
0ba0468
Update: Add Licenses to packagae_summary test-case & minor refactorin…
swastkk Jul 17, 2024
92bf3f4
Update: Add other_license_expression_spdx & populate the same #3862
swastkk Jul 17, 2024
10f4634
TEST_REGEN: Updates in expected.json wrt other_license_expression_spd…
swastkk Jul 17, 2024
d69faa1
Add: Implement get_top_level_resources to pypi whl DataFileHandler, #…
swastkk Jul 19, 2024
eab6c2e
Refactor: Use attributes_to_update list to attribute assignment & get…
swastkk Jul 23, 2024
b852acd
REGEN_Test: Regenerate scancode/test_cli.py::test_scan_cli_help test
swastkk Jul 23, 2024
3545cc8
Refactor: Trim down package_summary test unnecessary files
swastkk Jul 23, 2024
0100310
REGEN_Test: Regenerate package_summary test with updated test data
swastkk Jul 23, 2024
c1de27f
Refactor: Remove double for loop iteration for package&package_copy
swastkk Jul 30, 2024
33e72d0
Refactor: Update compute_license_score & get_field_values_from_resour…
swastkk Jul 30, 2024
b8a2c9b
REFACTOR: Unify compute_license_score func to be used in summary and …
swastkk Aug 2, 2024
67e702b
Resolve failing test due to code refactoring
swastkk Aug 2, 2024
73c6664
Trim down test package_summary data
swastkk Aug 2, 2024
7eedcca
REGEN: Regenerated package_summary test with new testing data
swastkk Aug 2, 2024
b84bec4
Add test for python whl ecosystem #3707 #3862
swastkk Aug 4, 2024
4b121e4
Merge branch 'develop' into improve-license-clarity
swastkk Aug 4, 2024
4416144
Add test for copyright & holder populated package attributes
swastkk Aug 4, 2024
8374263
Move Package Summary Plugin Tests to new test_package_summary #3889
swastkk Aug 8, 2024
150967c
Add Python whl package ecosystem test in test_package_summary #3889
swastkk Aug 8, 2024
eb4cbb8
Add package_summary test for npm package ecosystem #3889
swastkk Aug 8, 2024
c2240f8
Add PackageSummary Test for rubygems ecosystem, #3889
swastkk Aug 9, 2024
dc4a75d
Fix rubygems test (package_summary)
swastkk Aug 9, 2024
391c224
Fix up rubygems(expected json result)
swastkk Aug 9, 2024
bd78ad5
Add get_codebase_resources func to get codebase resources
swastkk Aug 15, 2024
96873dd
Refactor get_field_values_from_resources func and fixup the summary &…
swastkk Aug 15, 2024
30f80ba
Minor nits in get_top_level_resources in pypi whl
swastkk Aug 15, 2024
341fe89
Add package_summary test for rust ecosystem #3889
swastkk Aug 16, 2024
3bff56c
Fix license-clarity-score plugin compute_license_score func input
swastkk Aug 24, 2024
0274044
Send resource objects instead of dicts in the summary & package-summary
swastkk Aug 24, 2024
e5146fc
Minor linting fixed in score.py
swastkk Aug 24, 2024
a3dce3f
Trim down python-whl/rubygem test data #3889
swastkk Aug 25, 2024
1 change: 1 addition & 0 deletions setup.cfg
@@ -188,6 +188,7 @@ scancode_scan =
 # scan plugins and before the output plugins. See also plugincode.post_scan
 # module for details and doc.
 scancode_post_scan =
+    package_summary = packagedcode.plugin_package:PackageSummary
     summary = summarycode.summarizer:ScanSummary
     tallies = summarycode.tallies:Tallies
     tallies-with-details = summarycode.tallies:TalliesWithDetails
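
Note on the setup.cfg change: the added line is a standard entry point of the form name = module:attribute. The sketch below only illustrates how such a value splits into a module and a class name using the standard library; it does not import ScanCode, and how ScanCode's plugin manager actually loads the entry is not shown in this diff.

```python
from importlib.metadata import EntryPoint

# The entry-point line added to setup.cfg by this PR, rebuilt by hand
# for illustration (nothing is read from an installed distribution).
entry = EntryPoint(
    name="package_summary",
    value="packagedcode.plugin_package:PackageSummary",
    group="scancode_post_scan",
)

# The value splits into an importable module and an attribute on it.
module_name = entry.module
class_name = entry.attr

print(module_name, class_name)
```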
11 changes: 8 additions & 3 deletions src/packagedcode/models.py
@@ -1563,15 +1563,20 @@ class Package(PackageData):
         label='datasource ids',
         help='List of the datasource ids used to create this package.'
     )
-
+
+    license_clarity_score = attr.ib(default=attr.Factory(dict))
+
     def __attrs_post_init__(self, *args, **kwargs):
         if not self.purl:
             self.purl = self.set_purl()
         if not self.package_uid:
             self.package_uid = build_package_uid(self.purl)

-    def to_dict(self):
-        return super().to_dict(with_details=False)
+    def to_dict(self, package_summary=False):
+        data = super().to_dict(with_details=False)
+        if not package_summary:
+            data.pop("license_clarity_score")
+        return data

     def to_package_data(self):
         mapping = super().to_dict(with_details=True)
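
Note on the models.py change: the new to_dict() drops license_clarity_score unless package_summary is set, so default scan output keeps its previous shape. A minimal sketch of that pattern follows. PackageSketch is a hypothetical stand-in built on dataclasses, not the real attrs-based Package model, and the None default passed to pop() is a defensive variation that the PR itself does not use.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class PackageSketch:
    # Hypothetical stand-in for packagedcode.models.Package.
    name: str
    license_clarity_score: dict = field(default_factory=dict)

    def to_dict(self, package_summary=False):
        data = asdict(self)
        if not package_summary:
            # Passing a default avoids a KeyError if the key is ever absent.
            data.pop("license_clarity_score", None)
        return data


pkg = PackageSketch(name="demo")
plain = pkg.to_dict()
summarized = pkg.to_dict(package_summary=True)
```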
91 changes: 86 additions & 5 deletions src/packagedcode/plugin_package.py
@@ -13,14 +13,18 @@

 import attr
 import click
+import copy

 from commoncode.cliutils import PluggableCommandLineOption
 from commoncode.cliutils import DOC_GROUP
 from commoncode.cliutils import SCAN_GROUP
+from commoncode.cliutils import POST_SCAN_GROUP
 from commoncode.resource import Resource
 from commoncode.resource import strip_first_path_segment
 from plugincode.scan import scan_impl
+from plugincode.post_scan import post_scan_impl
 from plugincode.scan import ScanPlugin
+from plugincode.post_scan import PostScanPlugin

 from licensedcode.cache import build_spdx_license_expression
 from licensedcode.cache import get_cache
@@ -36,6 +40,8 @@
 from packagedcode.models import Package
 from packagedcode.models import PackageData
 from packagedcode.models import PackageWithResources
+from packagedcode.models import get_files_for_packages
+from summarycode.score import compute_license_score

 TRACE = os.environ.get('SCANCODE_DEBUG_PACKAGE_API', False)
 TRACE_ASSEMBLY = os.environ.get('SCANCODE_DEBUG_PACKAGE_ASSEMBLY', False)
@@ -200,7 +206,7 @@ def get_scanner(self, package=True, system_package=False, package_only=False, **
             package_only=package_only,
         )

-    def process_codebase(self, codebase, strip_root=False, package_only=False, **kwargs):
+    def process_codebase(self, codebase, strip_root=False, package_only=False, package_summary=False, **kwargs):
         """
         Populate the ``codebase`` top level ``packages`` and ``dependencies``
         with package and dependency instances, assembling parsed package data
@@ -260,7 +266,7 @@ def process_codebase(self, codebase, strip_root=False, package_only=False, **kwa
                 logger_debug(f'packagedcode: process_codebase: add_license_from_sibling_file: modified: {modified}')

         # Create codebase-level packages and dependencies
-        create_package_and_deps(codebase, strip_root=strip_root, **kwargs)
+        create_package_and_deps(codebase, package_summary, strip_root=strip_root, **kwargs)
         #raise Exception()

         if has_licenses:
@@ -272,7 +278,80 @@ def process_codebase(self, codebase, strip_root=False, package_only=False, **kwa
             if TRACE_LICENSE and modified:
                 logger_debug(f'packagedcode: process_codebase: add_referenced_license_matches_from_package: modified: {modified}')

+def get_package_resources(codebase):
+    """
+    Get resources for each package in the codebase.
+    """
+    resource_for_packages = list(get_files_for_packages(codebase))
+    package_resources = {}
+
+    for resource, package_uid in resource_for_packages:
+        if package_uid not in package_resources:
+            package_resources[package_uid] = []
+        package_resources[package_uid].append(resource)
+
+    return package_resources
+
+@post_scan_impl
+class PackageSummary(PostScanPlugin):
+    """
+    Summary at the Package Level.
+    """
+    run_order = 11
+    sort_order = 11
+
+    options = [
+        PluggableCommandLineOption(('--package-summary',),
+            is_flag=True,
+            default=False,
+            help='Summarize scans by providing License Clarity Score '
+                 'and populating other license/copyright attributes '
+                 'for package instances from their key files and other files.',
+            required_options=['classify', 'package'],
+            help_group=POST_SCAN_GROUP)
+    ]
+
+    def is_enabled(self, package_summary, **kwargs):
+        return package_summary
+
+    def process_codebase(self, codebase, package_summary, **kwargs):
+        """
+        Process the codebase.
+        """
+
+        packages = codebase.attributes.packages
+        package_resources = get_package_resources(codebase)
+        package_attributes_map = {}
+        attributes_to_update = [
+            'license_clarity_score',
+            'copyright',
+            'holder',
+            'notice_text',
+            'other_license_expression',
+            'other_license_expression_spdx'
+        ]
+
+        for package in packages:
+            package_uid = package['package_uid']
+            if package_uid in package_resources:
+                package_resource = [resource for resource in package_resources[package_uid]]
+
+                scoring_elements, package_attrs, _= compute_license_score(resources=package_resource)
+                license_clarity_score= scoring_elements.to_dict()
+                package_attributes_map[package_uid] = {
+                    'license_clarity_score': license_clarity_score,
+                    'copyright': package_attrs.copyright,
+                    'holder': package_attrs.holder,
+                    'notice_text': package_attrs.notice_text,
+                    'other_license_expression': package_attrs.other_license_expression,
+                    'other_license_expression_spdx': package_attrs.other_license_expression_spdx
+                }
+            if package_uid in package_attributes_map:
+                package_attrs = package_attributes_map[package_uid]
+                for attribute in attributes_to_update:
+                    package[attribute] = package_attrs[attribute]
+

 def add_license_from_file(resource, codebase):
     """
     Given a Resource, check if the detected package_data doesn't have license detections
@@ -352,7 +431,7 @@ def get_installed_packages(root_dir, processes=2, **kwargs):
     yield from packages_by_uid.values()


-def create_package_and_deps(codebase, package_adder=add_to_package, strip_root=False, **kwargs):
+def create_package_and_deps(codebase, package_summary=False , package_adder=add_to_package, strip_root=False, **kwargs):
     """
     Create and save top-level Package and Dependency from the parsed
     package data present in the codebase.
@@ -363,8 +442,10 @@ def create_package_and_deps(codebase, package_adder=add_to_package, strip_root=F
         strip_root=strip_root,
         **kwargs
     )
-
-    codebase.attributes.packages.extend(package.to_dict() for package in packages)
+    codebase.attributes.packages.extend(
+        package.to_dict(package_summary= package_summary)
+        for package in packages
+    )
     codebase.attributes.dependencies.extend(dep.to_dict() for dep in dependencies)


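
Note on the plugin_package.py change: PackageSummary first groups resources by package_uid, then copies the computed attributes onto each package mapping. The sketch below shows those two steps on plain data, assuming (as the loop in get_package_resources suggests) that get_files_for_packages yields (resource, package_uid) pairs. The helper names, sample paths and uids are hypothetical, and the license scoring itself is replaced by a hand-written dict.

```python
from collections import defaultdict


def group_resources_by_package(pairs):
    """Group (resource, package_uid) pairs into {package_uid: [resources]}."""
    grouped = defaultdict(list)
    for resource, package_uid in pairs:
        grouped[package_uid].append(resource)
    return dict(grouped)


def apply_package_attributes(packages, attributes_by_uid, attribute_names):
    """Copy the named attributes onto each package mapping that has an entry."""
    for package in packages:
        attrs = attributes_by_uid.get(package["package_uid"])
        if attrs is None:
            # No resources were found for this package; leave it untouched.
            continue
        for name in attribute_names:
            package[name] = attrs[name]


# Hypothetical sample data: strings stand in for Resource objects.
pairs = [
    ("a/setup.py", "uid-a"),
    ("a/LICENSE", "uid-a"),
    ("b/package.json", "uid-b"),
]
grouped = group_resources_by_package(pairs)

packages = [{"package_uid": "uid-a"}, {"package_uid": "uid-c"}]
computed = {"uid-a": {"holder": "Example Org", "copyright": "Copyright Example Org"}}
apply_package_attributes(packages, computed, ["holder", "copyright"])
```

Using defaultdict removes the explicit membership check, and the early continue keeps packages without resources unchanged, which matches the behavior of the two if blocks in the PR.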
13 changes: 11 additions & 2 deletions src/packagedcode/pypi.py
@@ -435,8 +435,17 @@ def assign_package_to_resources(cls, package, resource, codebase, package_adder)
                 )
                 if ref_resource and package_uid:
                     package_adder(package_uid, ref_resource, codebase)
-
-
+    @classmethod
+    def get_top_level_resources(cls, manifest_resource, codebase):
+        if '.dist-info' in manifest_resource.path:
+            path_segments = manifest_resource.path.split('.dist-info')
+            leading_segment = path_segments[0].strip()
+            dist_info_dir_path = f'{leading_segment}.dist-info'
+            meta_inf_resource = codebase.get_resource(dist_info_dir_path)
+            if meta_inf_resource:
+                yield meta_inf_resource
+                yield from meta_inf_resource.walk(codebase)
+
 def get_resource_for_path(path, root, codebase):
     """
     Return a resource in ``codebase`` that has a ``path`` relative to the
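
Note on the pypi.py change: get_top_level_resources derives the .dist-info directory from the manifest path by cutting the path at the first '.dist-info' marker. A minimal sketch of just that path step follows. dist_info_dir is a hypothetical helper, the sample paths are made up, and the codebase lookup and walk are left out. It uses str.partition instead of split and omits the strip() call from the PR.

```python
def dist_info_dir(path):
    """Return the path of the .dist-info directory containing ``path``, or None."""
    marker = ".dist-info"
    # partition() cuts at the first occurrence of the marker, like split()[0].
    head, found, _tail = path.partition(marker)
    if not found:
        return None
    return f"{head}{marker}"


wheel_dir = dist_info_dir("site-packages/demo-1.0.dist-info/METADATA")
not_a_wheel = dist_info_dir("demo/setup.py")
```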