License cache not always reused #4273

Open
@MarkusObendrauf

Description

The licensedcode cache is not always reused when multiple processes run in parallel. We noticed this while stress-testing scancode: when multiple tests were started in separate processes at the same time, each process built its own cache instead of reusing the existing one. This carried a considerable performance cost and eventually led to a LockTimeout.

The root cause is in licensedcode/cache.py: after a process obtains the lock, it should check whether another process has already built the cache while it was waiting, but it does not.
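To illustrate the missing check, here is a minimal sketch of the "re-check after acquiring the lock" pattern. It is not the actual code in licensedcode/cache.py: the function `load_or_build`, its parameters, and the pickle-based cache file are hypothetical stand-ins, and a `threading.Lock` takes the place of scancode's file lock so the example runs on its own.

```python
import os
import pickle
import tempfile
import threading


def load_or_build(cache_file, lock, build):
    """Return the cached object, building it at most once (hypothetical helper)."""
    # Fast path: the cache already exists, so no lock is needed.
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)

    with lock:
        # Re-check after acquiring the lock: another worker may have built
        # the cache while this one was waiting for the lock.
        if os.path.exists(cache_file):
            with open(cache_file, "rb") as f:
                return pickle.load(f)

        value = build()
        # Write to a temporary file, then rename it into place, so that
        # fast-path readers never see a partially written cache.
        tmp_file = cache_file + ".tmp"
        with open(tmp_file, "wb") as f:
            pickle.dump(value, f)
        os.replace(tmp_file, cache_file)
        return value


# Demonstration: several workers start at once, but build() runs only once.
build_calls = []


def build():
    build_calls.append(1)
    return {"index": "built"}


cache_file = os.path.join(tempfile.mkdtemp(), "index_cache")
lock = threading.Lock()  # stand-in for a cross-process file lock
workers = [
    threading.Thread(target=load_or_build, args=(cache_file, lock, build))
    for _ in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Without the second existence check inside the `with lock:` block, every worker that missed the fast path would rebuild the cache in turn while the others wait, which matches the behavior described above.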

How To Reproduce

This was noticed when stress-testing a local test:

        # MIT_LICENSE_TEXT is a str constant defined elsewhere in the test module.
        with NamedTemporaryFile() as test_file:
            test_contents = MIT_LICENSE_TEXT.encode("utf-8")
            test_file.write(test_contents)
            test_file.seek(0)  # also flushes the buffered write to disk
            results = get_licenses(test_file.name)  # slow
            license_expression = results["detected_license_expression"]
            self.assertEqual(license_expression, "mit")

We ran this test in 100 parallel processes, which eventually produced the following traceback:

Traceback (most recent call last):
  File "scancode/api.py", line 200, in get_licenses
    for detection in detections:
  File "licensedcode/detection.py", line 1947, in detect_licenses
    index = cache.get_index()
  File "licensedcode/cache.py", line 459, in get_index
    return get_cache(
  File "licensedcode/cache.py", line 399, in get_cache
    return populate_cache(
  File "licensedcode/cache.py", line 419, in populate_cache
    _LICENSE_CACHE = LicenseCache.load_or_build(
  File "licensedcode/cache.py", line 136, in load_or_build
    with lockfile.FileLock(lock_file).locked(timeout=timeout):
  File "runtime/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "scancode/lockfile.py", line 29, in locked
    raise LockTimeout(timeout)
scancode.lockfile.LockTimeout: 360

System configuration

  • OS: Linux
  • What version of scancode-toolkit was used to generate the scan file? scancode-toolkit-mini 32.3.2
  • What installation method was used to install/run scancode? pip
