Skip to content

Conversation

trial-danswer
Copy link
Collaborator

@trial-danswer trial-danswer commented Sep 12, 2025

Description

Partial completion of: https://linear.app/danswer/issue/DAN-2494/sync-permissions-database-and-backend-updates

Modify existing permission sync tasks to create/update attempt records. This just plumbs in the attempt tracking logic and adds some integration tests.

PR Stack:

  1. feat: tables/migration for permission syncing attempts
  2. feat: basic db methods to create and update permission sync attempts
  3. feat: plumb auto sync permission attempts to celery tasks -> you are here
  4. feat: read latest permission sync from the frontend

How Has This Been Tested?

integration tests and some manual testing.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Adds attempt-level tracking for permission syncs (documents and external groups) and wires it into Celery tasks. Improves observability with clear statuses and progress metrics, aligning with Linear DAN-2494.

  • New Features

    • Added PermissionSyncStatus enum and two tables: doc_permission_sync_attempt and external_group_permission_sync_attempt.
    • Implemented CRUD helpers to create, update progress, and mark attempts (in_progress, success, failed, completed_with_errors) with row locks and telemetry.
    • Integrated with connector_permission_sync_generator_task and external group sync to create attempts, track progress (docs/errors, users/groups/memberships), and set final status on success/failure.
    • Added unit tests for CRUD and integration tests to verify attempt lifecycle.
  • Migration

    • Run Alembic migration 03d710ccf29c to add tables and indexes. No manual backfill required.

Copy link

vercel bot commented Sep 12, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
internal-search Ready Ready Preview Comment Sep 13, 2025 0:08am

@trial-danswer trial-danswer changed the base branch from main to nikg/permission-sync/backend-crud September 12, 2025 18:27
@trial-danswer trial-danswer force-pushed the nikg/permission-sync/backend-crud branch from 77e8987 to 7e65811 Compare September 12, 2025 22:07
@trial-danswer trial-danswer force-pushed the nikg/permision-sync/add-attempts-to-perm-sync-tasks branch from a69ab93 to 2e80684 Compare September 12, 2025 22:18
@trial-danswer trial-danswer changed the title feat: hook up auto sync permission attempts to celery tasks feat: plumb auto sync permission attempts to celery tasks Sep 12, 2025
@trial-danswer trial-danswer force-pushed the nikg/permission-sync/backend-crud branch from 7e65811 to a8b4292 Compare September 12, 2025 23:40
@trial-danswer trial-danswer force-pushed the nikg/permision-sync/add-attempts-to-perm-sync-tasks branch from 2e80684 to 298563f Compare September 12, 2025 23:43
@trial-danswer trial-danswer force-pushed the nikg/permision-sync/add-attempts-to-perm-sync-tasks branch from 298563f to 3994120 Compare September 13, 2025 00:04
@trial-danswer trial-danswer marked this pull request as ready for review September 13, 2025 00:50
@trial-danswer trial-danswer requested a review from a team as a code owner September 13, 2025 00:50
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR adds comprehensive attempt-level tracking for permission synchronization operations as part of Linear issue DAN-2494. The changes integrate new database tables and CRUD operations into existing Celery tasks to provide visibility into sync progress and status.

The implementation follows a consistent pattern across both document permission syncing and external group syncing tasks. For each sync operation, the code now:

  1. Creates a sync attempt record at the start with create_external_group_sync_attempt() or similar functions
  2. Marks the attempt as in-progress using mark_external_group_sync_attempt_in_progress()
  3. Tracks progress metrics throughout execution (documents synced, errors encountered, users/groups processed)
  4. Sets final status as SUCCESS, FAILED, or COMPLETED_WITH_ERRORS based on outcome

The changes integrate with the existing permission sync infrastructure in ee/onyx/background/celery/tasks/ without disrupting the core sync logic. The attempt tracking uses separate database sessions for record management and includes proper error formatting via format_error_for_logging(). Three comprehensive integration tests validate the full lifecycle of attempt records, including successful syncs, failure scenarios, and status transitions.

This enhancement significantly improves observability for enterprise customers who need to monitor and troubleshoot permission synchronization issues, providing detailed metrics and clear status reporting for all sync operations.

Confidence score: 3/5

  • This PR requires careful review due to potential database transaction and exception handling issues
  • Score reflects concerns about orphaned attempt records and database session management patterns
  • Pay close attention to the exception handling logic in external_group_syncing/tasks.py lines 467-575

2 files reviewed, 4 comments

Edit Code Review Bot Settings | Greptile

Comment on lines +467 to +476
# Create attempt record at the start
with get_session_with_current_tenant() as db_session:
attempt_id = create_external_group_sync_attempt(
connector_credential_pair_id=cc_pair_id,
db_session=db_session,
)
logger.info(
f"Created external group sync attempt: {attempt_id} for cc_pair={cc_pair_id}"
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Creating the attempt record in a separate transaction could lead to orphaned records if the main sync fails before marking as in_progress. Consider creating and marking as in_progress in the same transaction.

Comment on lines +522 to +525
total_groups_processed += 1
total_users_processed += len(external_user_group.user_emails)
total_group_memberships_synced += len(external_user_group.user_emails)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Progress tracking counts each user multiple times - once in total_users_processed and again in total_group_memberships_synced. This creates confusing metrics where memberships equal users.

Comment on lines +284 to +285
except Exception:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Broad exception handling may mask specific test failures. Consider catching more specific exceptions or at least logging the exception details.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 3 files

Prompt for AI agents (all 4 issues)

Understand the root cause of the following 4 issues and fix them.


<file name="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py">

<violation number="1" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:395">
Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.</violation>

<violation number="2" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:574">
Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.

(Based on your team&#39;s feedback about avoiding logging raw exception strings that may include temporary auth tokens.)</violation>

<violation number="3" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:601">
Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.</violation>
</file>

<file name="backend/tests/integration/tests/indexing/test_initial_permission_sync.py">

<violation number="1" location="backend/tests/integration/tests/indexing/test_initial_permission_sync.py:281">
Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.</violation>
</file>

React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.

f"Marked doc permission sync attempt {attempt_id} as completed with errors"
)
else:
mark_doc_permission_sync_attempt_succeeded(attempt_id, db_session)
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.

Prompt for AI agents
Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 601:

<comment>Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.</comment>

<file context>
@@ -533,22 +556,53 @@ def fetch_all_existing_docs_ids_fn() -&gt; list[str]:
+                    f&quot;Marked doc permission sync attempt {attempt_id} as completed with errors&quot;
+                )
+            else:
+                mark_doc_permission_sync_attempt_succeeded(attempt_id, db_session)
+                task_logger.info(
+                    f&quot;Marked doc permission sync attempt {attempt_id} as succeeded&quot;
</file context>
Fix with Cubic

except Exception as e:
docs_with_errors += 1
task_logger.exception(
f"Error updating permissions for doc {doc_external_access.doc_id}: {e}"
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.

(Based on your team's feedback about avoiding logging raw exception strings that may include temporary auth tokens.)

Prompt for AI agents
Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 574:

<comment>Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.

(Based on your team&#39;s feedback about avoiding logging raw exception strings that may include temporary auth tokens.)</comment>

<file context>
@@ -533,22 +556,53 @@ def fetch_all_existing_docs_ids_fn() -&gt; list[str]:
+                except Exception as e:
+                    docs_with_errors += 1
+                    task_logger.exception(
+                        f&quot;Error updating permissions for doc {doc_external_access.doc_id}: {e}&quot;
+                    )
+                    # Continue processing other documents
</file context>
Fix with Cubic

doc_permission_sync_ctx.set(doc_permission_sync_ctx_dict)

with get_session_with_current_tenant() as db_session:
attempt_id = create_doc_permission_sync_attempt(
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.

Prompt for AI agents
Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 395:

<comment>Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.</comment>

<file context>
@@ -379,6 +391,15 @@ def connector_permission_sync_generator_task(
     doc_permission_sync_ctx.set(doc_permission_sync_ctx_dict)
 
+    with get_session_with_current_tenant() as db_session:
+        attempt_id = create_doc_permission_sync_attempt(
+            connector_credential_pair_id=cc_pair_id,
+            db_session=db_session,
</file context>
Fix with Cubic

CCPairManager.wait_for_sync(
cc_pair=cc_pair,
after=index_attempt.time_created,
number_of_updated_docs=0,
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.

Prompt for AI agents
Address the following comment on backend/tests/integration/tests/indexing/test_initial_permission_sync.py at line 281:

<comment>Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.</comment>

<file context>
@@ -128,3 +132,237 @@ def test_mock_connector_initial_permission_sync(
+            CCPairManager.wait_for_sync(
+                cc_pair=cc_pair,
+                after=index_attempt.time_created,
+                number_of_updated_docs=0,
+                user_performing_action=admin_user,
+            )
</file context>
Fix with Cubic

Comment on lines +559 to +574
docs_with_errors = 0
for doc_external_access in document_external_accesses:
redis_connector.permissions.update_db(
lock=lock,
new_permissions=[doc_external_access],
source_string=source_type,
connector_id=cc_pair.connector.id,
credential_id=cc_pair.credential.id,
task_logger=task_logger,
)
tasks_generated += 1
try:
redis_connector.permissions.update_db(
lock=lock,
new_permissions=[doc_external_access],
source_string=source_type,
connector_id=cc_pair.connector.id,
credential_id=cc_pair.credential.id,
task_logger=task_logger,
)
tasks_generated += 1
except Exception as e:
docs_with_errors += 1
task_logger.exception(
f"Error updating permissions for doc {doc_external_access.doc_id}: {e}"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap try catch around for loop for doc permission syncing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant