Include usage of filter_size for importing logs, user, computer #6889
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage Diff (main → #6889):
Coverage: 78.58% → 78.59% (+0.01%)
Files: 564 (unchanged)
Lines: 43130 → 43128 (-2)
Hits: 33890 → 33891 (+1)
Misses: 9240 → 9237 (-3)
I suggest putting a batch of ~950 (in any case < 999); see my discussion here (it would be good to also copy that discussion here):
This was quick and dirty to try out. My idea was to determine the limit beforehand and use that. I got this script from ChatGPT but have not verified it yet:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Try increasing numbers of bound parameters until an error occurs
max_vars = 1
try:
    while True:
        placeholders = ",".join(["?"] * max_vars)
        query = f"SELECT {placeholders}"
        cur.execute(query, [None] * max_vars)
        max_vars += 1
except sqlite3.OperationalError:
    print(f"Maximum number of variables supported: {max_vars - 1}")
```

EDIT: The script is not really nice. I think the global variable with the doc that @giovannipizzi posted is the nicer solution. EDIT2: There is the …
Force-pushed from df574cb to 462e01d
The usage of the …
Force-pushed from 462e01d to 4105c61
```python
.append(orm.User, filters={'email': {'in': chunk}}, project=['email', 'id'])
.all(batch_size=query_params.batch_size)
)
for _, chunk in batch_iter(set(input_id_email.values()), query_params.filter_size)
```
Note that I use `set` to only get the set of unique uuids, so that the query and thus the results will not contain duplicates. I don't think this will ever happen, but I wanted to be on the safe side here. Also, I guess the subsequent code can deal with duplicates as well, but I am not sure.
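For illustration, here is a self-contained sketch of the chunking pattern (the `batch_iter` helper and the query function are simplified stand-ins, not the actual aiida-core implementation):

```python
from typing import Callable, Iterable, Iterator

FILTER_SIZE = 950  # stay safely below SQLite's default limit of 999 bound parameters


def batch_iter(items: Iterable[str], size: int) -> Iterator[tuple[int, list[str]]]:
    """Yield (length, chunk) pairs of at most `size` items, mimicking aiida's helper."""
    chunk: list[str] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield len(chunk), chunk
            chunk = []
    if chunk:
        yield len(chunk), chunk


def find_existing(
    values: set[str], query: Callable[[list[str]], list[tuple[str, int]]]
) -> dict[str, int]:
    """Run one 'IN' query per chunk and merge the results into a single mapping."""
    found: dict[str, int] = {}
    for _, chunk in batch_iter(values, FILTER_SIZE):
        # `query(chunk)` stands in for a QueryBuilder query filtering with {'in': chunk}.
        found.update(query(chunk))
    return found
```

Passing a `set` of values, as in the diff above, guarantees each identifier appears in at most one chunk, so the merged result cannot contain duplicates.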
```python
for _, chunk in batch_iter(set(input_id_uuid.values()), query_params.filter_size)
]
for key, value in query_results.items()
}
```
For importing nodes, this is just a rewrite; the list comprehension should be faster.
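As a rough illustration of the rewrite (hypothetical stand-in names, not the actual aiida-core code), the loop and the comprehension produce the same merged result:

```python
# Hypothetical stand-ins; in the real code the chunks come from batch_iter(...)
# and run_query(chunk) would be a QueryBuilder query over one chunk.
chunks = [["a", "b"], ["c", "d"], ["e"]]


def run_query(chunk):
    return [(value, value.upper()) for value in chunk]


# Loop version: repeatedly extend the result list.
results_loop = []
for chunk in chunks:
    results_loop.extend(run_query(chunk))

# Comprehension version: the same result built in a single expression.
results_comp = [row for chunk in chunks for row in run_query(chunk)]

assert results_loop == results_comp
```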
Thanks a lot @agoscinski. I tested it on my end as well, and it works well. 👍
Building the query for the matching nodes from the backend can surpass the SQLite limit of allowed query parameters and thus result in a failure when importing the archive. In this PR we implement the usage of the `filter_size` parameter, which limits the number of parameters in the query, for importing logs, users and computers.
Force-pushed from 4105c61 to 6ce00ca
SQLite only accepts a limited number of parameters, so we reduce the query into chunks. For that, the parameter `filter_size` is already provided but is not used in every use case. Building the query for the matching nodes from the backend can surpass the SQLite limit of allowed query parameters and thus result in a failure when importing the archive. In this PR we implement the usage of `filter_size`, which limits the number of parameters in the query, for importing logs, users and computers.

Fixes #6876 (I think both are only due to `_import_logs` not using `filter_size`). Since this does not fully cover the usage of `filter_size` in all queries but seems to already fix #6876, I opened a new issue #6907 to keep this in mind.
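To make the failure mode concrete, here is a small self-contained sketch using plain `sqlite3` with a hypothetical table (not the aiida-core code); note that recent SQLite builds raised the default variable limit, so the first query may only fail on older builds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(5000)],
)

emails = [f"user{i}@example.com" for i in range(5000)]

# One bound parameter per email can exceed SQLITE_MAX_VARIABLE_NUMBER
# (999 on older SQLite builds) and raise "too many SQL variables".
try:
    placeholders = ",".join("?" * len(emails))
    conn.execute(f"SELECT id, email FROM users WHERE email IN ({placeholders})", emails)
except sqlite3.OperationalError as exc:
    print(f"single query failed: {exc}")

# Chunking the values keeps every query below the limit.
FILTER_SIZE = 950
found = {}
for start in range(0, len(emails), FILTER_SIZE):
    chunk = emails[start:start + FILTER_SIZE]
    placeholders = ",".join("?" * len(chunk))
    rows = conn.execute(
        f"SELECT email, id FROM users WHERE email IN ({placeholders})", chunk
    ).fetchall()
    found.update(rows)

print(f"matched {len(found)} users")
```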