-
Notifications
You must be signed in to change notification settings - Fork 2.1k
feat: plumb auto sync permission attempts to celery tasks #5407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: nikg/permission-sync/backend-crud
Are you sure you want to change the base?
feat: plumb auto sync permission attempts to celery tasks #5407
Conversation
The latest updates on your projects. Learn more about Vercel for GitHub.
|
77e8987
to
7e65811
Compare
a69ab93
to
2e80684
Compare
7e65811
to
a8b4292
Compare
2e80684
to
298563f
Compare
298563f
to
3994120
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR adds comprehensive attempt-level tracking for permission synchronization operations as part of Linear issue DAN-2494. The changes integrate new database tables and CRUD operations into existing Celery tasks to provide visibility into sync progress and status.
The implementation follows a consistent pattern across both document permission syncing and external group syncing tasks. For each sync operation, the code now:
- Creates a sync attempt record at the start with
create_external_group_sync_attempt()
or similar functions - Marks the attempt as in-progress using
mark_external_group_sync_attempt_in_progress()
- Tracks progress metrics throughout execution (documents synced, errors encountered, users/groups processed)
- Sets final status as SUCCESS, FAILED, or COMPLETED_WITH_ERRORS based on outcome
The changes integrate with the existing permission sync infrastructure in ee/onyx/background/celery/tasks/
without disrupting the core sync logic. The attempt tracking uses separate database sessions for record management and includes proper error formatting via format_error_for_logging()
. Three comprehensive integration tests validate the full lifecycle of attempt records, including successful syncs, failure scenarios, and status transitions.
This enhancement significantly improves observability for enterprise customers who need to monitor and troubleshoot permission synchronization issues, providing detailed metrics and clear status reporting for all sync operations.
Confidence score: 3/5
- This PR requires careful review due to potential database transaction and exception handling issues
- Score reflects concerns about orphaned attempt records and database session management patterns
- Pay close attention to the exception handling logic in external_group_syncing/tasks.py lines 467-575
2 files reviewed, 4 comments
# Create attempt record at the start | ||
with get_session_with_current_tenant() as db_session: | ||
attempt_id = create_external_group_sync_attempt( | ||
connector_credential_pair_id=cc_pair_id, | ||
db_session=db_session, | ||
) | ||
logger.info( | ||
f"Created external group sync attempt: {attempt_id} for cc_pair={cc_pair_id}" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Creating the attempt record in a separate transaction could lead to orphaned records if the main sync fails before marking as in_progress. Consider creating and marking as in_progress in the same transaction.
total_groups_processed += 1 | ||
total_users_processed += len(external_user_group.user_emails) | ||
total_group_memberships_synced += len(external_user_group.user_emails) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Progress tracking counts each user multiple times - once in total_users_processed
and again in total_group_memberships_synced
. This creates confusing metrics where memberships equal users.
except Exception: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Broad exception handling may mask specific test failures. Consider catching more specific exceptions or at least logging the exception details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 issues found across 3 files
Prompt for AI agents (all 4 issues)
Understand the root cause of the following 4 issues and fix them.
<file name="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py">
<violation number="1" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:395">
Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.</violation>
<violation number="2" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:574">
Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.
(Based on your team's feedback about avoiding logging raw exception strings that may include temporary auth tokens.)</violation>
<violation number="3" location="backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py:601">
Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.</violation>
</file>
<file name="backend/tests/integration/tests/indexing/test_initial_permission_sync.py">
<violation number="1" location="backend/tests/integration/tests/indexing/test_initial_permission_sync.py:281">
Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai
to give feedback, ask questions, or re-run the review.
f"Marked doc permission sync attempt {attempt_id} as completed with errors" | ||
) | ||
else: | ||
mark_doc_permission_sync_attempt_succeeded(attempt_id, db_session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.
Prompt for AI agents
Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 601:
<comment>Marking the attempt as succeeded/completed-with-errors before finalizing remaining operations risks overwriting the status to FAILED if a subsequent step throws. Move the status update after all non-trivial operations (e.g., after generator_complete) or guard the exception path from overriding a terminal status.</comment>
<file context>
@@ -533,22 +556,53 @@ def fetch_all_existing_docs_ids_fn() -> list[str]:
+ f"Marked doc permission sync attempt {attempt_id} as completed with errors"
+ )
+ else:
+ mark_doc_permission_sync_attempt_succeeded(attempt_id, db_session)
+ task_logger.info(
+ f"Marked doc permission sync attempt {attempt_id} as succeeded"
</file context>
except Exception as e: | ||
docs_with_errors += 1 | ||
task_logger.exception( | ||
f"Error updating permissions for doc {doc_external_access.doc_id}: {e}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.
(Based on your team's feedback about avoiding logging raw exception strings that may include temporary auth tokens.)
Prompt for AI agents
Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 574:
<comment>Avoid logging raw exception strings in messages; they can expose sensitive data. Use a sanitized message or omit the exception text since logger.exception includes the traceback.
(Based on your team's feedback about avoiding logging raw exception strings that may include temporary auth tokens.)</comment>
<file context>
@@ -533,22 +556,53 @@ def fetch_all_existing_docs_ids_fn() -> list[str]:
+ except Exception as e:
+ docs_with_errors += 1
+ task_logger.exception(
+ f"Error updating permissions for doc {doc_external_access.doc_id}: {e}"
+ )
+ # Continue processing other documents
</file context>
doc_permission_sync_ctx.set(doc_permission_sync_ctx_dict) | ||
|
||
with get_session_with_current_tenant() as db_session: | ||
attempt_id = create_doc_permission_sync_attempt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.
Prompt for AI agents
Address the following comment on backend/ee/onyx/background/celery/tasks/doc_permission_syncing/tasks.py at line 395:
<comment>Creating the attempt before acquiring the sync lock and validating fencing causes orphan NOT_STARTED attempts when the task exits early (e.g., lock not acquired or early return paths). Create the attempt only after acquiring the lock and confirming the task will proceed, or ensure early-return/failure paths update the attempt status.</comment>
<file context>
@@ -379,6 +391,15 @@ def connector_permission_sync_generator_task(
doc_permission_sync_ctx.set(doc_permission_sync_ctx_dict)
+ with get_session_with_current_tenant() as db_session:
+ attempt_id = create_doc_permission_sync_attempt(
+ connector_credential_pair_id=cc_pair_id,
+ db_session=db_session,
</file context>
CCPairManager.wait_for_sync( | ||
cc_pair=cc_pair, | ||
after=index_attempt.time_created, | ||
number_of_updated_docs=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.
Prompt for AI agents
Address the following comment on backend/tests/integration/tests/indexing/test_initial_permission_sync.py at line 281:
<comment>Failure-path test uses the default long timeout and full sync waiting, increasing runtime and flakiness; set a short timeout to fail fast.</comment>
<file context>
@@ -128,3 +132,237 @@ def test_mock_connector_initial_permission_sync(
+ CCPairManager.wait_for_sync(
+ cc_pair=cc_pair,
+ after=index_attempt.time_created,
+ number_of_updated_docs=0,
+ user_performing_action=admin_user,
+ )
</file context>
docs_with_errors = 0 | ||
for doc_external_access in document_external_accesses: | ||
redis_connector.permissions.update_db( | ||
lock=lock, | ||
new_permissions=[doc_external_access], | ||
source_string=source_type, | ||
connector_id=cc_pair.connector.id, | ||
credential_id=cc_pair.credential.id, | ||
task_logger=task_logger, | ||
) | ||
tasks_generated += 1 | ||
try: | ||
redis_connector.permissions.update_db( | ||
lock=lock, | ||
new_permissions=[doc_external_access], | ||
source_string=source_type, | ||
connector_id=cc_pair.connector.id, | ||
credential_id=cc_pair.credential.id, | ||
task_logger=task_logger, | ||
) | ||
tasks_generated += 1 | ||
except Exception as e: | ||
docs_with_errors += 1 | ||
task_logger.exception( | ||
f"Error updating permissions for doc {doc_external_access.doc_id}: {e}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrap try catch around for loop for doc permission syncing
Description
Partial completion of: https://linear.app/danswer/issue/DAN-2494/sync-permissions-database-and-backend-updates
Modify existing permission sync tasks to create/update attempt records. This just plumbs in the attempt tracking logic and adds some integration tests.
PR Stack:
How Has This Been Tested?
integration tests and some manual testing.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Adds attempt-level tracking for permission syncs (documents and external groups) and wires it into Celery tasks. Improves observability with clear statuses and progress metrics, aligning with Linear DAN-2494.
New Features
Migration