Description
Hi,
There is a critical memory issue in the export_query_history_task defined in backend/ee/onyx/background/celery/apps/heavy.py.
Triggering the query history export (via /admin/query-history/start-export) or clicking the "Kickoff Export" button in the admin UI's "Query History" section
causes the entire history of chat sessions to be loaded and processed in memory, which results in:
- Massive memory spikes (up to 50 GB in our case)
- Consistent Out-Of-Memory crashes of the Celery background pod
- Re-triggering of this job hourly (likely due to Celery retries), compounding the problem if the queue is not flushed manually
The probable root cause is in the export_query_history_task itself, specifically the call to fetch_and_process_chat_session_history() (see the memory-bounded sketch after this list):
- limit=None causes the entire database of chat sessions to be fetched.
- fetch_and_process_chat_session_history() calls create_chat_chain() for every chat session, which is explicitly noted in the code to be very slow and memory-heavy.
- It builds multiple large Python data structures before streaming to CSV.
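A minimal sketch of what a memory-bounded export loop could look like, assuming a SQLAlchemy session. ChatSession and its columns below are illustrative stand-ins, not the actual Onyx model, and this is a sketch of the general technique rather than a proposed patch:

```python
# Hypothetical sketch of a memory-bounded export loop, not the actual Onyx
# code. ChatSession below is a stand-in for the real ORM model (column
# names are assumptions). The point is to stream rows in small batches
# with yield_per instead of materializing every session (limit=None) and
# rebuilding every chat chain before writing the CSV.
import csv
from datetime import datetime

from sqlalchemy import DateTime, Integer
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class ChatSession(Base):  # illustrative stand-in for the real model
    __tablename__ = "chat_session"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    time_created: Mapped[datetime] = mapped_column(DateTime)


PAGE_SIZE = 500  # ORM objects kept in memory at any one time


def export_query_history_in_pages(db_session: Session, out_path: str) -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["session_id", "created_at"])

        query = db_session.query(ChatSession).order_by(ChatSession.id)
        # yield_per fetches rows in chunks, so memory use stays roughly
        # constant regardless of how many chat sessions exist.
        for chat_session in query.yield_per(PAGE_SIZE):
            writer.writerow([chat_session.id, chat_session.time_created])
```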
We also suspect that:
If the task fails due to OOM, it is retried automatically by Celery every hour (based on logs and pod memory graphs).
This creates a loop of instability and repeated memory crashes in the background process.
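One plausible mechanism for the hourly cadence, assuming the settings sketched below (we have not verified Onyx's actual configuration): with task_acks_late enabled on a Redis broker, a task whose worker is OOM-killed is never acknowledged, and the broker redelivers it once the visibility timeout expires, which defaults to 3600 seconds.

```python
# Assumed configuration, shown only for illustration; the real Onyx
# settings may differ. With acks_late and a Redis broker, an OOM-killed
# worker never acks the task, so Redis redelivers it after the visibility
# timeout (default 3600 s), matching the hourly re-trigger described above.
from celery import Celery

app = Celery("heavy", broker="redis://localhost:6379/0")

app.conf.update(
    task_acks_late=True,  # tasks are acked only after successful completion
    broker_transport_options={
        # How long an unacked message stays invisible before redelivery.
        "visibility_timeout": 3600,
    },
)
```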
As a result, "Query History" exports are currently not feasible for large tenants.
Temporary Mitigation:
For others encountering this issue: if Redis is your Celery broker, you can manually flush the csv_generation queue to stop the re-triggering. If the export task has already started, look for the unacked keys in Redis that reference export_query_history_task and delete them (see the sketch below).
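A rough cleanup sketch using redis-py; "csv_generation" is the queue mentioned above, while "unacked" and "unacked_index" are Kombu's default Redis-transport key names and may differ in your deployment, so inspect the keys before deleting anything:

```python
# Hedged cleanup sketch (verify the keys in your own Redis before deleting).
# "csv_generation" is the Celery queue for this export; "unacked" and
# "unacked_index" are Kombu's default keys for in-flight messages.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Drop any pending (not yet delivered) export messages.
r.delete("csv_generation")

# Remove in-flight messages that reference the export task so they are not
# redelivered when the visibility timeout expires.
for key, payload in r.hscan_iter("unacked"):
    if b"export_query_history_task" in payload:
        r.hdel("unacked", key)
        r.zrem("unacked_index", key)
```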