feat: plumb auto sync permission attempts to celery tasks #5407

trial-danswer · 2025-09-12T18:26:31Z

Description

Partial completion of: https://linear.app/danswer/issue/DAN-2494/sync-permissions-database-and-backend-updates

Modify existing permission sync tasks to create/update attempt records. This just plumbs in the attempt tracking logic and adds some integration tests.

PR Stack:

feat: tables/migration for permission syncing attempts
feat: basic db methods to create and update permission sync attempts
feat: plumb auto sync permission attempts to celery tasks -> you are here
feat: read latest permission sync from the frontend

How Has This Been Tested?

integration tests and some manual testing.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

This PR should be backported (make sure to check that the backport attempt succeeds)
[Optional] Override Linear Check

Summary by cubic

Adds attempt-level tracking for permission syncs (documents and external groups) and wires it into Celery tasks. Improves observability with clear statuses and progress metrics, aligning with Linear DAN-2494.

New Features
- Added PermissionSyncStatus enum and two tables: doc_permission_sync_attempt and external_group_permission_sync_attempt.
- Implemented CRUD helpers to create, update progress, and mark attempts (in_progress, success, failed, completed_with_errors) with row locks and telemetry.
- Integrated with connector_permission_sync_generator_task and external group sync to create attempts, track progress (docs/errors, users/groups/memberships), and set final status on success/failure.
- Added unit tests for CRUD and integration tests to verify attempt lifecycle.
Migration
- Run Alembic migration 03d710ccf29c to add tables and indexes. No manual backfill required.

vercel · 2025-09-12T18:26:35Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
internal-search	Ready	Preview	Comment	Sep 13, 2025 0:08am

greptile-apps

Greptile Summary

This PR adds comprehensive attempt-level tracking for permission synchronization operations as part of Linear issue DAN-2494. The changes integrate new database tables and CRUD operations into existing Celery tasks to provide visibility into sync progress and status.

The implementation follows a consistent pattern across both document permission syncing and external group syncing tasks. For each sync operation, the code now:

Creates a sync attempt record at the start with create_external_group_sync_attempt() or similar functions
Marks the attempt as in-progress using mark_external_group_sync_attempt_in_progress()
Tracks progress metrics throughout execution (documents synced, errors encountered, users/groups processed)
Sets final status as SUCCESS, FAILED, or COMPLETED_WITH_ERRORS based on outcome

The changes integrate with the existing permission sync infrastructure in ee/onyx/background/celery/tasks/ without disrupting the core sync logic. The attempt tracking uses separate database sessions for record management and includes proper error formatting via format_error_for_logging(). Three comprehensive integration tests validate the full lifecycle of attempt records, including successful syncs, failure scenarios, and status transitions.

This enhancement significantly improves observability for enterprise customers who need to monitor and troubleshoot permission synchronization issues, providing detailed metrics and clear status reporting for all sync operations.

Confidence score: 3/5

This PR requires careful review due to potential database transaction and exception handling issues
Score reflects concerns about orphaned attempt records and database session management patterns
Pay close attention to the exception handling logic in external_group_syncing/tasks.py lines 467-575

_{2 files reviewed, 4 comments}

_{Edit Code Review Bot Settings | Greptile}

greptile-apps · 2025-09-13T00:52:08Z

backend/ee/onyx/background/celery/tasks/external_group_syncing/tasks.py

+    # Create attempt record at the start
+    with get_session_with_current_tenant() as db_session:
+        attempt_id = create_external_group_sync_attempt(
+            connector_credential_pair_id=cc_pair_id,
+            db_session=db_session,
+        )
+        logger.info(
+            f"Created external group sync attempt: {attempt_id} for cc_pair={cc_pair_id}"
+        )
+


logic: Creating the attempt record in a separate transaction could lead to orphaned records if the main sync fails before marking as in_progress. Consider creating and marking as in_progress in the same transaction.

greptile-apps · 2025-09-13T00:52:09Z

backend/ee/onyx/background/celery/tasks/external_group_syncing/tasks.py

+                total_groups_processed += 1
+                total_users_processed += len(external_user_group.user_emails)
+                total_group_memberships_synced += len(external_user_group.user_emails)
+


logic: Progress tracking counts each user multiple times - once in total_users_processed and again in total_group_memberships_synced. This creates confusing metrics where memberships equal users.

greptile-apps · 2025-09-13T00:52:09Z

backend/tests/integration/tests/indexing/test_initial_permission_sync.py

+        except Exception:
+            pass


style: Broad exception handling may mask specific test failures. Consider catching more specific exceptions or at least logging the exception details.

cubic-dev-ai

4 issues found across 3 files

Prompt for AI agents (all 4 issues)


Understand the root cause of the following 4 issues and fix them.


<file name="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py">

<violation number="1" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:395">
Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.</violation>

<violation number="2" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:574">
Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.

(Based on your team&#39;s feedback about avoiding logging raw exception strings that may include temporary auth tokens.)</violation>

<violation number="3" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:601">
Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.</violation>
</file>

<file name="backend/tests/integration/tests/indexing/test_initial_permission_sync.py">

<violation number="1" location="backend/tests/integration/tests/indexing/test_initial_permission_sync.py:281">
Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.</violation>
</file>

_{React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.}

cubic-dev-ai · 2025-09-13T01:03:33Z

backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py

+                    f"Marked doc permission sync attempt {attempt_id} as completed with errors"
+                )
+            else:
+                mark_doc_permission_sync_attempt_succeeded(attempt_id, db_session)


Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.

Prompt for AI agents

Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 601: <comment>Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.</comment> <file context> @@ -533,22 +556,53 @@ def fetch_all_existing_docs_ids_fn() -> list[str]: + f"Marked doc permission sync attempt {attempt_id} as completed with errors" + ) + else: + mark_doc_permission_sync_attempt_succeeded(attempt_id, db_session) + task_logger.info( + f"Marked doc permission sync attempt {attempt_id} as succeeded" </file context>

cubic-dev-ai · 2025-09-13T01:03:33Z

backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py

+                except Exception as e:
+                    docs_with_errors += 1
+                    task_logger.exception(
+                        f"Error updating permissions for doc {doc_external_access.doc_id}: {e}"


Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.

(Based on your team's feedback about avoiding logging raw exception strings that may include temporary auth tokens.)

Prompt for AI agents

Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 574: <comment>Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback. (Based on your team's feedback about avoiding logging raw exception strings that may include temporary auth tokens.)</comment> <file context> @@ -533,22 +556,53 @@ def fetch_all_existing_docs_ids_fn() -> list[str]: + except Exception as e: + docs_with_errors += 1 + task_logger.exception( + f"Error updating permissions for doc {doc_external_access.doc_id}: {e}" + ) + # Continue processing other documents </file context>

cubic-dev-ai · 2025-09-13T01:03:33Z

backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py

    doc_permission_sync_ctx.set(doc_permission_sync_ctx_dict)

+    with get_session_with_current_tenant() as db_session:
+        attempt_id = create_doc_permission_sync_attempt(


Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.

Prompt for AI agents

Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 395: <comment>Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.</comment> <file context> @@ -379,6 +391,15 @@ def connector_permission_sync_generator_task( doc_permission_sync_ctx.set(doc_permission_sync_ctx_dict) + with get_session_with_current_tenant() as db_session: + attempt_id = create_doc_permission_sync_attempt( + connector_credential_pair_id=cc_pair_id, + db_session=db_session, </file context>

cubic-dev-ai · 2025-09-13T01:03:33Z

backend/tests/integration/tests/indexing/test_initial_permission_sync.py

+            CCPairManager.wait_for_sync(
+                cc_pair=cc_pair,
+                after=index_attempt.time_created,
+                number_of_updated_docs=0,


Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.

Prompt for AI agents

Address the following comment on backend/tests/integration/tests/indexing/test_initial_permission_sync.py at line 281: <comment>Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.</comment> <file context> @@ -128,3 +132,237 @@ def test_mock_connector_initial_permission_sync( + CCPairManager.wait_for_sync( + cc_pair=cc_pair, + after=index_attempt.time_created, + number_of_updated_docs=0, + user_performing_action=admin_user, + ) </file context>

trial-danswer · 2025-09-13T01:22:05Z

backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py

+            docs_with_errors = 0
            for doc_external_access in document_external_accesses:
-                redis_connector.permissions.update_db(
-                    lock=lock,
-                    new_permissions=[doc_external_access],
-                    source_string=source_type,
-                    connector_id=cc_pair.connector.id,
-                    credential_id=cc_pair.credential.id,
-                    task_logger=task_logger,
-                )
-                tasks_generated += 1
+                try:
+                    redis_connector.permissions.update_db(
+                        lock=lock,
+                        new_permissions=[doc_external_access],
+                        source_string=source_type,
+                        connector_id=cc_pair.connector.id,
+                        credential_id=cc_pair.credential.id,
+                        task_logger=task_logger,
+                    )
+                    tasks_generated += 1
+                except Exception as e:
+                    docs_with_errors += 1
+                    task_logger.exception(
+                        f"Error updating permissions for doc {doc_external_access.doc_id}: {e}"


wrap try catch around for loop for doc permission syncing

trial-danswer changed the base branch from main to nikg/permission-sync/backend-crud September 12, 2025 18:27

trial-danswer force-pushed the nikg/permission-sync/backend-crud branch from 77e8987 to 7e65811 Compare September 12, 2025 22:07

trial-danswer force-pushed the nikg/permision-sync/add-attempts-to-perm-sync-tasks branch from a69ab93 to 2e80684 Compare September 12, 2025 22:18

vercel bot deployed to Preview September 12, 2025 22:21 View deployment

trial-danswer mentioned this pull request Sep 12, 2025

feat: read latest permission sync from the frontend #5411

Open

2 tasks

trial-danswer changed the title ~~feat: hook up auto sync permission attempts to celery tasks~~ feat: plumb auto sync permission attempts to celery tasks Sep 12, 2025

trial-danswer force-pushed the nikg/permission-sync/backend-crud branch from 7e65811 to a8b4292 Compare September 12, 2025 23:40

trial-danswer force-pushed the nikg/permision-sync/add-attempts-to-perm-sync-tasks branch from 2e80684 to 298563f Compare September 12, 2025 23:43

vercel bot deployed to Preview September 12, 2025 23:46 View deployment

hook up permission syncs to celery task

3994120

trial-danswer force-pushed the nikg/permision-sync/add-attempts-to-perm-sync-tasks branch from 298563f to 3994120 Compare September 13, 2025 00:04

vercel bot deployed to Preview September 13, 2025 00:08 View deployment

trial-danswer marked this pull request as ready for review September 13, 2025 00:50

trial-danswer requested a review from a team as a code owner September 13, 2025 00:50

greptile-apps bot reviewed Sep 13, 2025

View reviewed changes

cubic-dev-ai bot reviewed Sep 13, 2025

View reviewed changes

trial-danswer commented Sep 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: plumb auto sync permission attempts to celery tasks #5407

feat: plumb auto sync permission attempts to celery tasks #5407

Uh oh!

trial-danswer commented Sep 12, 2025 •

edited

Loading

Uh oh!

vercel bot commented Sep 12, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Sep 13, 2025

Uh oh!

greptile-apps bot Sep 13, 2025

Uh oh!

greptile-apps bot Sep 13, 2025

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading

Uh oh!

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading

Uh oh!

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading

Uh oh!

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading

Uh oh!

trial-danswer Sep 13, 2025

Uh oh!

Uh oh!

feat: plumb auto sync permission attempts to celery tasks #5407

Are you sure you want to change the base?

feat: plumb auto sync permission attempts to celery tasks #5407

Uh oh!

Conversation

trial-danswer commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

How Has This Been Tested?

Backporting (check the box to trigger backport action)

Summary by cubic

Uh oh!

vercel bot commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 3/5

Uh oh!

greptile-apps bot Sep 13, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 13, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 13, 2025

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Sep 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Sep 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Sep 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Sep 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trial-danswer Sep 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

trial-danswer commented Sep 12, 2025 •

edited

Loading

vercel bot commented Sep 12, 2025 •

edited

Loading

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading

cubic-dev-ai bot Sep 13, 2025 •

edited

Loading