
Severe Memory Spike/OOM in export_query_history_task During Chat History Export #4708

@ThomaciousD

Description


Hi,

There is a critical memory issue in the export_query_history_task defined in backend/ee/onyx/background/celery/apps/heavy.py.

Triggering the query history export (via /admin/query-history/start-export, or via the "Kickoff Export" button in the admin "Query History" section) causes the entire history of chat sessions to be loaded and processed in memory, which results in:

  • Massive memory spikes (up to 50 GB in our case)
  • Consistent Out-Of-Memory crashes of the Celery background pod
  • Re-triggering of this job hourly (likely due to Celery retries), compounding the problem if not flushed manually

The probable root cause is in the task export_query_history_task, specifically this call (a batched alternative is sketched after the list below):

fetch_and_process_chat_session_history(

  • limit=None causes every chat session in the database to be fetched.
  • fetch_and_process_chat_session_history() calls create_chat_chain() for every chat session, which the code explicitly notes is very slow and memory-heavy.
  • It builds multiple large Python data structures in memory before anything is streamed to CSV.
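
To illustrate the direction of a fix, here is a minimal sketch of a batched export loop. The helper name, its signature, and the batch size are assumptions for illustration, not Onyx's actual API; the point is that only one batch of rows is ever held in memory and the CSV is written incrementally.

```python
import csv
from typing import Iterator

BATCH_SIZE = 500  # tune to available memory

def fetch_chat_sessions_batch(offset: int, limit: int) -> list[dict]:
    """Hypothetical helper: return at most `limit` chat sessions starting at
    `offset`, already flattened into CSV-ready dicts. In Onyx this would wrap
    the existing query with LIMIT/OFFSET (or keyset pagination) instead of
    passing limit=None. Stubbed to return no rows so the sketch runs as-is."""
    return []

def iter_all_sessions() -> Iterator[dict]:
    # Yield sessions one batch at a time so at most BATCH_SIZE rows are in memory.
    offset = 0
    while True:
        batch = fetch_chat_sessions_batch(offset=offset, limit=BATCH_SIZE)
        if not batch:
            return
        yield from batch
        offset += len(batch)

def export_to_csv(path: str) -> None:
    with open(path, "w", newline="") as f:
        writer = None
        for row in iter_all_sessions():
            if writer is None:
                writer = csv.DictWriter(f, fieldnames=list(row.keys()))
                writer.writeheader()
            writer.writerow(row)  # stream row-by-row; no giant list is built
```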

We also suspect that if the task fails due to OOM, it is automatically re-triggered by Celery roughly every hour (based on logs and pod memory graphs). This creates a loop of instability and memory crashes in the background process.
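
One plausible explanation for the hourly cadence (our assumption, not confirmed against the Onyx configuration) is not a retry at all but Celery's Redis visibility timeout: with late acknowledgement, a message whose worker dies before acking is re-delivered once the visibility timeout expires, and the Redis transport's default is 3600 seconds. A minimal sketch of the mechanism:

```python
# Sketch only: app/task wiring here is illustrative, not Onyx's actual modules.
from celery import Celery

app = Celery("heavy", broker="redis://localhost:6379/0")

# 3600 s is Celery's default visibility timeout for the Redis transport, which
# matches the roughly hourly re-trigger we observe.
app.conf.broker_transport_options = {"visibility_timeout": 3600}

@app.task(bind=True, acks_late=True)
def export_query_history_task(self) -> None:
    # If the pod is OOM-killed here, the message is never acked, so the broker
    # re-delivers it after the visibility timeout and the cycle repeats.
    ...
```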

As a result, "Query History" exports are currently not feasible for large tenants.

Temporary Mitigation:
For others encountering this issue, you can stop the re-triggering by manually flushing the csv_generation queue in Redis (if Redis is your Celery broker). If the export task has already been picked up, also look for unacked entries in Redis that reference export_query_history_task and delete them (sketch below).
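
For reference, a rough sketch of that cleanup with redis-py. The key names assume Celery's default Redis broker layout (the queue is a plain list named after the queue, and unacked deliveries live in an "unacked" hash plus an "unacked_index" sorted set); adjust them to your configuration.

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# 1) Drop pending export messages that have not been picked up yet: the Celery
#    queue is a plain Redis list keyed by the queue name.
r.delete("csv_generation")

# 2) Remove delivered-but-unacked copies so they are not re-delivered after the
#    visibility timeout expires.
for field, payload in r.hgetall("unacked").items():
    if b"export_query_history_task" in payload:
        r.hdel("unacked", field)
        r.zrem("unacked_index", field)
```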
