Enhancement request: Automatically create PackageSets for Packages in DejaCode #276

Open
mjherzog opened this issue Mar 6, 2025 · 8 comments
Assignees
Labels
design needed Design details needed to complete the issue enhancement New feature or request PackageSet Related to AboutCode PackageSets

Comments

@mjherzog
Member

mjherzog commented Mar 6, 2025

As the volume of Packages and Package Versions is rapidly increasing, the value of a DejaCode Component as a summary record for packages increases accordingly - both as a summary for a project and as a way to group Packages for a Component.
This seems to be an opportunity for machine learning to periodically create or update Component records for Packages stored in DejaCode. It will require design work and an iterative process based on patterns for data package types.

@mjherzog mjherzog added design needed Design details needed to complete the issue enhancement New feature or request labels Mar 6, 2025
@tdruez
Contributor

tdruez commented Mar 17, 2025

Note that if you select multiple packages from the package list, and use the "Add to > Component" action, the "Add Component from Package data" feature is available.
This allows the creation of a Component from one or more Packages, combining the common data.
It lacks proper documentation though.

There's also a componentfrompackage management command available to run this Component creation on a whole Dataspace.

@mjherzog
Member Author

Philippe reminded me recently about package-sets so it may be the case that it would be more valuable to add package-sets to DejaCode than create Component records.

@mjherzog mjherzog changed the title Enhancement request: Automatically create Component Records for Packages Enhancement request: Automatically create Component or PackageSet Records for Packages Mar 17, 2025
@tdruez
Contributor

tdruez commented Mar 18, 2025

@JonoYang Could you provide some resources about package-set implementations and usages so we have a base to look at how we could bring those features into DejaCode? Thanks!

@JonoYang
Member

@tdruez

Here's the package set model: https://github.com/aboutcode-org/purldb/blob/main/packagedb/models.py#L1408

It is used at https://github.com/aboutcode-org/purldb/blob/main/packagedb/models.py#L535

This model is used to get sets of packages that are related to one another. For log4j, for example, this could include the binary package (log4j.jar), the sources package (log4j-sources.jar), and the development repo source package (an archive from GitHub). We would put those three packages into the same package set to show that they are related, even though they have different purls.

These packages have an attribute package_content, which categorizes the package into one of seven categories:

    CURATION = 1, "curation"
    PATCH = 2, "patch"
    SOURCE_REPO = 3, "source_repo"
    SOURCE_ARCHIVE = 4, "source_archive"
    BINARY = 5, "binary"
    TEST = 6, "test"
    DOC = 7, "doc"

The package_content field is used when running get_enhanced_package (https://github.com/aboutcode-org/purldb/blob/main/packagedb/api.py#L647). The idea is that for a given package, we can fill in missing fields (license, copyright, etc.) from other packages in that package's package set. Precedence follows the package_content integer value: the lower the value, the higher the precedence. For example, a binary package can be enhanced with data from packages in its package set whose package_content is below BINARY (values 1, 2, 3, and 4). Packages with values 6 and 7 would not be used.
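To make the precedence rule concrete, here is a minimal sketch in plain Python. It is not the purldb implementation: `enhance_package`, the dict-based package records, and the sample values are all hypothetical, and it assumes "below" means a strictly lower package_content value, as in the example above.

```python
from enum import IntEnum


class PackageContent(IntEnum):
    # Values copied from the list in this comment.
    CURATION = 1
    PATCH = 2
    SOURCE_REPO = 3
    SOURCE_ARCHIVE = 4
    BINARY = 5
    TEST = 6
    DOC = 7


def enhance_package(package, package_set, fields=("license", "copyright")):
    """Return a copy of `package` with empty fields filled from its package set.

    Hypothetical helper: only packages with a strictly lower package_content
    value are used, and the lowest value wins for each field.
    """
    enhanced = dict(package)
    donors = sorted(
        (p for p in package_set if p["package_content"] < package["package_content"]),
        key=lambda p: p["package_content"],
    )
    for field_name in fields:
        if enhanced.get(field_name):
            continue  # keep data the package already has
        for donor in donors:
            if donor.get(field_name):
                enhanced[field_name] = donor[field_name]
                break
    return enhanced


# Illustrative records only, not real purldb data.
binary = {"package_content": PackageContent.BINARY, "license": None, "copyright": None}
source_repo = {"package_content": PackageContent.SOURCE_REPO, "license": None, "copyright": "Example holder"}
source_archive = {"package_content": PackageContent.SOURCE_ARCHIVE, "license": "Apache-2.0", "copyright": None}
doc = {"package_content": PackageContent.DOC, "license": "MIT", "copyright": "Docs holder"}

enhanced = enhance_package(binary, [doc, source_archive, source_repo])
```

In this sketch the DOC record is ignored even though it has both fields, because its value (7) is above BINARY (5).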

package_content is set by the different collectors, which handle a purl input and look up that package's data in the upstream repo. For example, in the case of handling maven purls, the package_content field is set here: https://github.com/aboutcode-org/purldb/blob/main/minecode/collectors/maven.py#L381

We create package sets and add packages to them in merge_or_create_package (https://github.com/aboutcode-org/purldb/blob/main/minecode/model_utils.py#L410), which checks whether there is an existing package with the same type, namespace, name, and version. If there is, we add the new package to the existing package's package set. A package set is not created for a package until a second package with the same type, namespace, name, and version is about to be added.
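The lazy creation described here can be sketched with an in-memory registry. This is only an illustration under stated assumptions: `Package`, `PackageRegistry`, and the use of a UUID string as the set identifier are hypothetical stand-ins, not the Django models or the merge_or_create_package code in purldb.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    purl: str
    type: str
    namespace: str
    name: str
    version: str
    package_set: Optional[str] = None  # hypothetical set identifier


class PackageRegistry:
    """Hypothetical in-memory stand-in for the package table."""

    def __init__(self):
        self.packages = []

    @staticmethod
    def _key(package):
        return (package.type, package.namespace, package.name, package.version)

    def add(self, package):
        existing = next(
            (p for p in self.packages if self._key(p) == self._key(package)),
            None,
        )
        if existing is not None:
            # A set is only created once a second matching package shows up.
            if existing.package_set is None:
                existing.package_set = uuid.uuid4().hex
            package.package_set = existing.package_set
        self.packages.append(package)
        return package


registry = PackageRegistry()
jar = registry.add(
    Package("pkg:maven/log4j/log4j@1.2.17", "maven", "log4j", "log4j", "1.2.17")
)
# At this point jar.package_set is still None: there is nothing to relate it to.
sources = registry.add(
    Package("pkg:maven/log4j/log4j@1.2.17?classifier=sources", "maven", "log4j", "log4j", "1.2.17")
)
# Now both packages share the same package_set identifier.
```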

We have this management command that creates packages for the source repos of purls listed in an xlsx sheet: https://github.com/aboutcode-org/purldb/blob/main/packagedb/management/commands/create_source_repo_packages.py

tdruez added a commit that referenced this issue Apr 29, 2025
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Apr 29, 2025
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Apr 29, 2025
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 2, 2025
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 8, 2025
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 9, 2025
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 9, 2025
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue May 9, 2025
@tdruez
Contributor

tdruez commented May 9, 2025

First iteration on Package Set merged in #305

It adds a new "Package Set" tab to the Package details view.
This tab displays related packages grouped by their normalized ("plain") Package URL.
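
As a rough illustration of that grouping (not the code merged in #305), the sketch below assumes a "plain" Package URL is the purl with its qualifiers and subpath removed. The helper names `plain_purl` and `group_by_plain_purl` are hypothetical, and plain string splitting is used instead of a purl parsing library to keep the example self-contained.

```python
from collections import defaultdict


def plain_purl(purl):
    """Drop the '#subpath' and '?qualifiers' parts of a purl string (assumed meaning of "plain")."""
    return purl.split("#", 1)[0].split("?", 1)[0]


def group_by_plain_purl(purls):
    """Group purl strings that share the same plain purl."""
    groups = defaultdict(list)
    for purl in purls:
        groups[plain_purl(purl)].append(purl)
    return dict(groups)


groups = group_by_plain_purl(
    [
        "pkg:maven/log4j/log4j@1.2.17",
        "pkg:maven/log4j/log4j@1.2.17?classifier=sources",
        "pkg:maven/log4j/log4j@1.2.16",
    ]
)
# The two 1.2.17 entries land in one group; 1.2.16 is in a group of its own.
```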

Let's keep this ticket open for designing the next steps.

@DennisClark DennisClark added the PackageSet Related to AboutCode PackageSets label May 9, 2025
@DennisClark DennisClark changed the title Enhancement request: Automatically create Component or PackageSet Records for Packages Enhancement request: Automatically create PackageSets for Packages May 9, 2025
@DennisClark DennisClark changed the title Enhancement request: Automatically create PackageSets for Packages Enhancement request: Automatically create PackageSets for Packages in DejaCode May 9, 2025
@DennisClark
Member

@mjherzog
Member Author

mjherzog commented May 9, 2025

@DennisClark The overall design makes sense. Although there are cases where packages of different types may be related, it makes sense to limit package sets to one type per set and deal elsewhere with the relationship between/among package types.

@DennisClark
Member

@mjherzog thanks for the feedback. I still have the big task of suggesting UI improvements in the design; I have some ideas, but it will take a little while to articulate them.
