
Include usage of filter_size for importing logs, user, computer #6889


Merged: 1 commit merged into main from fix/sqlite-too-large-query-parameters on Jun 6, 2025

Conversation

@agoscinski (Contributor) commented May 21, 2025

SQLite only accepts a limited number of query parameters, so we split large queries into chunks. The parameter filter_size is already provided for this purpose, but it is not used in every code path.

Building the query for the matching nodes from the backend can surpass SQLite's limit on the number of allowed query parameters and thus cause the archive import to fail. In this PR we use filter_size, which limits the number of parameters per query, when importing logs, users, and computers.

Fixes #6876 (I think both reported failures are due only to _import_logs not using filter_size).

Since this does not cover the usage of filter_size in all queries, but already seems to fix #6876, I opened a new issue #6907 to keep track of the remaining cases.
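To make the failure mode concrete, here is a minimal self-contained sketch of the chunking idea in plain sqlite3 (not the actual AiiDA code; FILTER_SIZE and batch_iter below are simplified stand-ins for the real query_params.filter_size and helper):

import sqlite3
from itertools import islice

FILTER_SIZE = 999  # stay at or below SQLite's historical default limit

def batch_iter(iterable, size):
    # Simplified stand-in for the chunking helper: yield chunks of at most `size` items.
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (uuid TEXT PRIMARY KEY)")
uuids = [f"uuid-{i}" for i in range(5000)]
conn.executemany("INSERT INTO node VALUES (?)", [(u,) for u in uuids])

# A single IN (...) query with 5000 placeholders can exceed the variable
# limit; issuing one query per chunk keeps every statement within it.
found = []
for chunk in batch_iter(uuids, FILTER_SIZE):
    placeholders = ",".join("?" * len(chunk))
    rows = conn.execute(f"SELECT uuid FROM node WHERE uuid IN ({placeholders})", chunk)
    found.extend(row[0] for row in rows)
print(len(found))  # 5000
conn.close()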

codecov bot commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.59%. Comparing base (8dd0948) to head (6ce00ca).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6889      +/-   ##
==========================================
+ Coverage   78.58%   78.59%   +0.01%     
==========================================
  Files         564      564              
  Lines       43130    43128       -2     
==========================================
+ Hits        33890    33891       +1     
+ Misses       9240     9237       -3     

☔ View full report in Codecov by Sentry.

@giovannipizzi (Member) commented:

I suggest using a batch size of ~950 (in any case < 999); see my discussion here (it may be good to copy that discussion here as well):

https://github.com/aiidateam/disk-objectstore/blob/73f14e3953c276268b7f7395ad385e693798e247/disk_objectstore/container.py#L97-L107
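(For context: SQLite's SQLITE_MAX_VARIABLE_NUMBER limit defaults to 999 before SQLite 3.32.0 and to 32766 from 3.32.0 onwards, which is why staying below 999 is the portable choice.)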

@agoscinski (Contributor, Author) commented May 22, 2025

I suggest to put a batch of ~950 (in any case < 999), see my discussion here (good to maybe copy the discussion also here):

This was a quick-and-dirty attempt. My idea was to determine the limit beforehand and then use it. I got the following script from ChatGPT but have not verified it yet.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Probe: keep adding placeholders until SQLite rejects the statement
max_vars = 1
try:
    while True:
        placeholders = ",".join(["?"] * max_vars)
        query = f"SELECT {placeholders}"
        cur.execute(query, [None] * max_vars)
        max_vars += 1
except sqlite3.OperationalError:
    print(f"Maximum number of variables supported: {max_vars - 1}")
finally:
    conn.close()

EDIT: The script does not seem very nice. I think the documented global constant that @giovannipizzi posted is the nicer solution.

EDIT2: There is already a filter_size: int = 999 parameter in our code implementing this kind of logic. We need to check whether the two can be unified.
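As an aside, on Python 3.11+ the sqlite3 module exposes SQLite's limit API directly, so the limit can simply be read instead of probed as in the script above; a minimal sketch:

import sqlite3

conn = sqlite3.connect(":memory:")
# Connection.getlimit() is available from Python 3.11 onwards
print(conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))
conn.close()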

@agoscinski force-pushed the fix/sqlite-too-large-query-parameters branch 4 times, most recently from df574cb to 462e01d on June 5, 2025 at 14:50
@agoscinski (Contributor, Author) commented:

The usage of filter_size also needs to be implemented in _merge_node_extras and all the other _import_* functions.

@agoscinski force-pushed the fix/sqlite-too-large-query-parameters branch from 462e01d to 4105c61 on June 5, 2025 at 17:01
.append(orm.User, filters={'email': {'in': chunk}}, project=['email', 'id'])
.all(batch_size=query_params.batch_size)
)
for _, chunk in batch_iter(set(input_id_email.values()), query_params.filter_size)
@agoscinski (Contributor, Author) commented on Jun 5, 2025:

Note that I use set to get only unique values, so that the query and thus the results will not contain duplicates. I don't think this will ever happen, but I wanted to be on the safe side here. I also guess the subsequent code can deal with duplicates, but I am not sure.

for _, chunk in batch_iter(set(input_id_uuid.values()), query_params.filter_size)
]
for key, value in query_results.items()
}
@agoscinski (Contributor, Author) commented:

For importing nodes, this is just a rewrite; a list comprehension should be faster.
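The diff fragments above suggest the general pattern: deduplicate the input values, query them in chunks of filter_size, and collect the rows into a mapping in a single comprehension. A hypothetical self-contained reconstruction (batch_iter and query_chunk here are stand-ins, not the actual AiiDA helpers):

from itertools import islice

def batch_iter(iterable, size):
    # Stand-in for the chunking helper: yield (index, chunk) pairs.
    it = iter(iterable)
    index = 0
    while chunk := list(islice(it, size)):
        yield index, chunk
        index += 1

def lookup_pks(values, filter_size, query_chunk):
    # Map each unique value (e.g. an email or uuid) to its database pk.
    # query_chunk(chunk) stands for one backend query, such as the
    # QueryBuilder call in the diff, returning (value, pk) rows.
    return {
        value: pk
        for _, chunk in batch_iter(set(values), filter_size)
        for value, pk in query_chunk(chunk)
    }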

@agoscinski changed the title from "Reduce query parameter size for sqlite" to "Include usage of filter_size for importing logs, user, computer" on Jun 5, 2025
@agoscinski marked this pull request as ready for review on June 5, 2025 at 17:08
@agoscinski requested a review from eimrek on June 5, 2025 at 17:12
@eimrek (Member) left a comment:

Thanks a lot @agoscinski. I tested this on my end as well, and it works well. 👍

Building the query for the matching nodes from the backend can
surpass the sqlite limit of allowed query parameters and thus result in
failure when importing the archive. In this PR we implement the usage of
the `filter_size` which limits the number of parameters in the query for
importing logs, user and computers.
@agoscinski force-pushed the fix/sqlite-too-large-query-parameters branch from 4105c61 to 6ce00ca on June 6, 2025 at 18:11
@agoscinski merged commit ee87c79 into main on Jun 6, 2025
18 checks passed
@agoscinski deleted the fix/sqlite-too-large-query-parameters branch on June 6, 2025 at 19:24
Successfully merging this pull request may close these issues:

Archive import fails with sqlite backend (too many SQL variables)