perf: Change query-exporting to use generators instead of expanding fully into memory #4729

raunakab · 2025-05-18T02:44:36Z

Description

Query history exporting currently fetches the entire history and loads it into memory. This puts significant pressure on memory and can lead to OOMs for very large datasets.

This PR updates the logic to incrementally fetch pages of data, transform them, and write them to file, instead of fetching the entire dataset at once, transforming each row, and then writing it to file.

I.e., we no longer collect / materialize the entire dataset; we process it in pages instead.
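A minimal sketch of the paged approach described above. The `fetch_page(page_num, page_size)` helper, the `iter_rows` name, and the in-memory stand-in for the database are all hypothetical and are not taken from this PR's code; the point is only that a generator holds at most one page at a time.

```python
from collections.abc import Callable, Iterator

# Hypothetical signature: fetch_page(page_num, page_size) returns one page of rows.
FetchPage = Callable[[int, int], list[dict]]


def iter_rows(fetch_page: FetchPage, page_size: int = 100) -> Iterator[dict]:
    """Yield rows page by page; only the current page is held in memory."""
    page_num = 0
    while True:
        page = fetch_page(page_num, page_size)
        if not page:
            # Nothing left to read (also covers an empty dataset).
            break
        yield from page
        if len(page) < page_size:
            # A short page means this was the last one.
            break
        page_num += 1


# In-memory stand-in for the real data source (assumption for the demo only).
_ROWS = [{"id": i} for i in range(250)]


def fake_fetch_page(page_num: int, page_size: int) -> list[dict]:
    start = page_num * page_size
    return _ROWS[start:start + page_size]
```

A caller can then consume `iter_rows(fake_fetch_page)` in a `for` loop and write each row out as it arrives, rather than building a full list first.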

This PR also parallelizes the chat-session fetching logic (given that chat-session reading does not have any cross-thread dependencies [I think?]).
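As an illustration of the parallel fetching idea, here is a sketch using the standard library's `ThreadPoolExecutor`. This is not the helper used in the PR; `fetch_in_parallel` and `fake_fetch_snapshot` are hypothetical names, and the sketch assumes each fetch is independent of the others.

```python
from collections.abc import Callable, Iterable, Iterator
from concurrent.futures import ThreadPoolExecutor


def fetch_in_parallel(
    session_ids: Iterable[int],
    fetch_one: Callable[[int], dict],
    max_workers: int = 4,
) -> Iterator[dict]:
    """Fetch one snapshot per session id on a thread pool, yielding in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map submits every task up front and yields results in
        # submission order, so call this once per page to keep memory bounded.
        yield from pool.map(fetch_one, session_ids)


def fake_fetch_snapshot(session_id: int) -> dict:
    # Stand-in for a per-session read (assumption for the demo only).
    return {"session_id": session_id, "messages": [f"message for {session_id}"]}
```

Because `Executor.map` schedules all inputs immediately, applying it to one page of session ids at a time keeps the number of in-flight snapshots proportional to the page size.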

Addresses: https://linear.app/danswer/issue/DAN-1984/improve-performance-of-query-history-exporting.

How Has This Been Tested?

This is a performance change, not a feature addition. We don't have much (if any) performance testing / benchmarking in our code. Therefore, this is not tested.

The feature still produces the same output for the same input.
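One way an "identical output" claim like this could be checked is to compare an eager export against a paged one on the same rows. The sketch below is illustrative only: the column names, `export_eager`, and `export_paged` are hypothetical and do not reflect the project's actual export code.

```python
import csv
import io

FIELDS = ["id", "query"]  # hypothetical column names


def export_eager(rows: list[dict]) -> str:
    """Write every row in a single call."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def export_paged(rows: list[dict], page_size: int) -> str:
    """Write the same rows one page at a time."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for start in range(0, len(rows), page_size):
        writer.writerows(rows[start:start + page_size])
    return buf.getvalue()


SAMPLE = [{"id": i, "query": f"question {i}"} for i in range(7)]
```

Comparing `export_paged(SAMPLE, n)` with `export_eager(SAMPLE)` for a few page sizes, including one larger than the dataset, exercises the page-boundary cases.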

vercel · 2025-05-18T02:44:42Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
internal-search	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 19, 2025 7:13pm

greptile-apps

PR Summary

This PR optimizes query history export by implementing generator-based pagination and parallel processing to reduce memory usage.

Refactored fetch_and_process_chat_session_history in /backend/ee/onyx/server/query_history/api.py to use paginated fetching with PAGE_SIZE=100 instead of loading all data at once
Added parallel processing of chat session snapshots using parallel_yield to improve performance while maintaining thread safety
Modified CSV writing in /backend/ee/onyx/background/celery/apps/heavy.py to process data incrementally as it's fetched rather than materializing entire dataset
Fixed pagination logic bug where the break condition was incorrectly checking for equal instead of less than PAGE_SIZE
Removed redundant list comprehensions and memory-intensive operations throughout the codebase
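The incremental CSV writing mentioned in the summary above can be sketched as follows. This is an assumption-laden illustration, not the code in `heavy.py`: `write_rows_incrementally`, `demo_rows`, and the column names are hypothetical.

```python
import csv
import io
from collections.abc import Iterable, Iterator
from typing import TextIO


def write_rows_incrementally(
    rows: Iterable[dict], out: TextIO, fieldnames: list[str]
) -> int:
    """Write rows to CSV as they arrive and return how many were written."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in rows:
        # Each row is written immediately; no list of all rows is ever built.
        writer.writerow(row)
        count += 1
    return count


def demo_rows(n: int) -> Iterator[dict]:
    # Generator stand-in for the paged data source (assumption for the demo only).
    for i in range(n):
        yield {"id": i, "query": f"q{i}"}
```

Passing a generator as `rows` means memory use depends on the size of a single row (or page) rather than on the size of the whole export.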

2 file(s) reviewed, 1 comment(s)

backend/ee/onyx/server/query_history/api.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

backend/ee/onyx/server/query_history/api.py

backend/ee/onyx/background/celery/apps/heavy.py

…ully into memory (onyx-dot-app#4729)

* Change query-exporting to use generators instead of expanding fully into memory
* Fix pagination logic

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add type annotation
* Add early break if list of chat_sessions is empty

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Change query-exporting to use generators instead of expanding fully into memory

b2e71e2

raunakab requested a review from a team as a code owner May 18, 2025 02:44

greptile-apps bot reviewed May 18, 2025

backend/ee/onyx/server/query_history/api.py (outdated)

Fix pagination logic

01dc05c

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

vercel bot deployed to Preview May 18, 2025 02:51 View deployment

raunakab linked an issue May 18, 2025 that may be closed by this pull request

Severe Memory Spike/OOM in export_query_history_task During Chat History Export #4708

Closed

Add type annotation

b701fde

vercel bot deployed to Preview May 18, 2025 04:42 View deployment

raunakab requested review from Weves, rkuo-danswer and evan-onyx May 18, 2025 18:49

rkuo-danswer reviewed May 19, 2025

backend/ee/onyx/server/query_history/api.py

backend/ee/onyx/background/celery/apps/heavy.py

Add early break if list of chat_sessions is empty

5b68f0d

raunakab requested a review from rkuo-danswer May 19, 2025 19:13

vercel bot deployed to Preview May 19, 2025 19:13 View deployment

rkuo-danswer approved these changes May 19, 2025

raunakab enabled auto-merge May 19, 2025 19:52

raunakab added this pull request to the merge queue May 19, 2025

Merged via the queue into main with commit fd735c9 May 19, 2025
10 of 11 checks passed

raunakab deleted the perf/query-history-export branch May 19, 2025 21:20
