-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat(projects): add project creation and management #5248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for GitHub.
|
beb4a21
to
712a86a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
40 issues found across 155 files
Note: This PR contains a large number of files. cubic only reviews up to 150 files per PR, so some files may not have been reviewed.
React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai
to give feedback, ask questions, or re-run the review.
for user_file_id in user_file_ids: | ||
self.app.send_task( | ||
OnyxCeleryTask.PROCESS_SINGLE_USER_FILE, | ||
kwargs={"user_file_id": user_file_id, "tenant_id": tenant_id}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing uuid.UUID in Celery kwargs will fail with default JSON serializer; convert to str before enqueueing.
Prompt for AI agents
Address the following comment on backend/onyx/background/celery/tasks/user_file_processing/tasks.py at line 80:
<comment>Passing uuid.UUID in Celery kwargs will fail with default JSON serializer; convert to str before enqueueing.</comment>
<file context>
@@ -0,0 +1,248 @@
+ for user_file_id in user_file_ids:
+ self.app.send_task(
+ OnyxCeleryTask.PROCESS_SINGLE_USER_FILE,
+ kwargs={"user_file_id": user_file_id, "tenant_id": tenant_id},
+ queue=OnyxCeleryQueues.USER_FILE_PROCESSING,
+ priority=OnyxCeleryPriority.HIGH,
</file context>
kwargs={"user_file_id": user_file_id, "tenant_id": tenant_id}, | |
kwargs={"user_file_id": str(user_file_id), "tenant_id": tenant_id}, |
"id": str(self.file_id), | ||
"type": self.file_type, | ||
"name": self.filename, | ||
"user_file_id": str(self.file_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assigning user_file_id to str(self.file_id) is incorrect; it should be the UserFile UUID or not set
Prompt for AI agents
Address the following comment on backend/onyx/file_store/models.py at line 61:
<comment>Assigning user_file_id to str(self.file_id) is incorrect; it should be the UserFile UUID or not set</comment>
<file context>
@@ -56,4 +58,5 @@ def to_file_descriptor(self) -> FileDescriptor:
"id": str(self.file_id),
"type": self.file_type,
"name": self.filename,
+ "user_file_id": str(self.file_id),
}
</file context>
connector_id=cc_pair.connector_id, | ||
credential_id=cc_pair.credential_id, | ||
), | ||
request_id=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run_indexing_pipeline now requires adapter; adding request_id without supplying adapter will raise a TypeError at runtime. Construct and pass a DocumentIndexingBatchAdapter here.
Prompt for AI agents
Address the following comment on backend/onyx/server/onyx_api/ingestion.py at line 124:
<comment>run_indexing_pipeline now requires adapter; adding request_id without supplying adapter will raise a TypeError at runtime. Construct and pass a DocumentIndexingBatchAdapter here.</comment>
<file context>
@@ -121,10 +121,7 @@ def upsert_ingestion_doc(
- connector_id=cc_pair.connector_id,
- credential_id=cc_pair.credential_id,
- ),
+ request_id=None,
)
</file context>
try: | ||
|
||
# Use our consolidated function that handles indexing properly | ||
categorized_files_result = upload_files_to_user_files_with_indexing( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing ownership validation: project_id is accepted and used to associate files without verifying the project belongs to the authenticated user, enabling cross-project linkage.
Prompt for AI agents
Address the following comment on backend/onyx/server/features/projects/api.py at line 70:
<comment>Missing ownership validation: project_id is accepted and used to associate files without verifying the project belongs to the authenticated user, enabling cross-project linkage.</comment>
<file context>
@@ -0,0 +1,347 @@
+ try:
+
+ # Use our consolidated function that handles indexing properly
+ categorized_files_result = upload_files_to_user_files_with_indexing(
+ files=files, project_id=project_id, user=user, db_session=db_session
+ )
</file context>
) | ||
project_id = chat_session_creation_request.project_id | ||
if project_id: | ||
if not check_project_ownership(project_id, user.id, db_session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing user.id without a None check can raise AttributeError for anonymous access; handle unauthenticated users before calling check_project_ownership.
Prompt for AI agents
Address the following comment on backend/onyx/server/query_and_chat/chat_backend.py at line 280:
<comment>Accessing user.id without a None check can raise AttributeError for anonymous access; handle unauthenticated users before calling check_project_ownership.</comment>
<file context>
@@ -282,6 +272,15 @@ def create_new_chat_session(
+ )
+ project_id = chat_session_creation_request.project_id
+ if project_id:
+ if not check_project_ownership(project_id, user.id, db_session):
+ raise HTTPException(
+ status_code=403, detail="User does not have access to project"
</file context>
// Show temporary uploading files immediately | ||
const tempFiles: ProjectFile[] = Array.from(files).map( | ||
(file) => ({ | ||
id: file.name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using file.name as id for temporary items risks duplicate React keys when multiple uploads share the same filename; generate a unique temp id.
Prompt for AI agents
Address the following comment on web/src/app/chat/components/projects/ProjectContextPanel.tsx at line 175:
<comment>Using file.name as id for temporary items risks duplicate React keys when multiple uploads share the same filename; generate a unique temp id.</comment>
<file context>
@@ -0,0 +1,301 @@
+ // Show temporary uploading files immediately
+ const tempFiles: ProjectFile[] = Array.from(files).map(
+ (file) => ({
+ id: file.name,
+ file_id: file.name,
+ name: file.name,
</file context>
results.non_accepted.append(upload.filename or "") | ||
else: | ||
results.acceptable.append(upload) | ||
results.acceptable_file_to_token_count[upload.filename or ""] = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an empty string as the dict key risks collisions for multiple unnamed uploads; use a unique placeholder for missing filenames.
Prompt for AI agents
Address the following comment on backend/onyx/server/features/projects/projects_file_utils.py at line 135:
<comment>Using an empty string as the dict key risks collisions for multiple unnamed uploads; use a unique placeholder for missing filenames.</comment>
<file context>
@@ -0,0 +1,178 @@
+ results.non_accepted.append(upload.filename or "")
+ else:
+ results.acceptable.append(upload)
+ results.acceptable_file_to_token_count[upload.filename or ""] = (
+ token_count
+ )
</file context>
if (!currentProjectId) { | ||
throw new Error("No project selected"); | ||
} | ||
console.log("upserting instructions", instructions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid logging potentially sensitive user content (instructions) to the console; remove this debug log.
Prompt for AI agents
Address the following comment on web/src/app/chat/projects/ProjectsContext.tsx at line 112:
<comment>Avoid logging potentially sensitive user content (instructions) to the console; remove this debug log.</comment>
<file context>
@@ -0,0 +1,419 @@
+ if (!currentProjectId) {
+ throw new Error("No project selected");
+ }
+ console.log("upserting instructions", instructions);
+ await svcUpsertProjectInstructions(currentProjectId, instructions);
+ await refreshCurrentProjectDetails();
</file context>
db_session.query(UserFile) | ||
.filter(UserFile.user_id == user.id) | ||
.order_by(UserFile.last_accessed_at.desc()) | ||
.all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unbounded query for 'recent' files can return excessive rows; add a LIMIT to avoid large memory/latency impact.
Prompt for AI agents
Address the following comment on backend/onyx/server/manage/users.py at line 906:
<comment>Unbounded query for 'recent' files can return excessive rows; add a LIMIT to avoid large memory/latency impact.</comment>
<file context>
@@ -889,3 +891,19 @@ def update_assistant_preferences_for_user_api(
+ db_session.query(UserFile)
+ .filter(UserFile.user_id == user.id)
+ .order_by(UserFile.last_accessed_at.desc())
+ .all()
+ )
+
</file context>
.all() | |
.limit(50).all() |
retry_delay = 0.5 | ||
|
||
|
||
def _acquire_user_file_locks(db_session: Session, user_file_ids: list[int]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect type annotation for user_file_ids; IDs are UUID/str in this codebase, not int. Update the hint to reflect the actual type.
Prompt for AI agents
Address the following comment on backend/onyx/indexing/adapters/user_file_indexing_adapter.py at line 33:
<comment>Incorrect type annotation for user_file_ids; IDs are UUID/str in this codebase, not int. Update the hint to reflect the actual type.</comment>
<file context>
@@ -0,0 +1,210 @@
+retry_delay = 0.5
+
+
+def _acquire_user_file_locks(db_session: Session, user_file_ids: list[int]) -> bool:
+ """Acquire locks for the specified user files."""
+ stmt = (
</file context>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
40 issues found across 155 files
Note: This PR contains a large number of files. cubic only reviews up to 150 files per PR, so some files may not have been reviewed.
React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai
to give feedback, ask questions, or re-run the review.
for user_file_id in user_file_ids: | ||
self.app.send_task( | ||
OnyxCeleryTask.PROCESS_SINGLE_USER_FILE, | ||
kwargs={"user_file_id": user_file_id, "tenant_id": tenant_id}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing uuid.UUID in Celery kwargs will fail with default JSON serializer; convert to str before enqueueing.
Prompt for AI agents
Address the following comment on backend/onyx/background/celery/tasks/user_file_processing/tasks.py at line 80:
<comment>Passing uuid.UUID in Celery kwargs will fail with default JSON serializer; convert to str before enqueueing.</comment>
<file context>
@@ -0,0 +1,248 @@
+ for user_file_id in user_file_ids:
+ self.app.send_task(
+ OnyxCeleryTask.PROCESS_SINGLE_USER_FILE,
+ kwargs={"user_file_id": user_file_id, "tenant_id": tenant_id},
+ queue=OnyxCeleryQueues.USER_FILE_PROCESSING,
+ priority=OnyxCeleryPriority.HIGH,
</file context>
kwargs={"user_file_id": user_file_id, "tenant_id": tenant_id}, | |
kwargs={"user_file_id": str(user_file_id), "tenant_id": tenant_id}, |
"id": str(self.file_id), | ||
"type": self.file_type, | ||
"name": self.filename, | ||
"user_file_id": str(self.file_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assigning user_file_id to str(self.file_id) is incorrect; it should be the UserFile UUID or not set
Prompt for AI agents
Address the following comment on backend/onyx/file_store/models.py at line 61:
<comment>Assigning user_file_id to str(self.file_id) is incorrect; it should be the UserFile UUID or not set</comment>
<file context>
@@ -56,4 +58,5 @@ def to_file_descriptor(self) -> FileDescriptor:
"id": str(self.file_id),
"type": self.file_type,
"name": self.filename,
+ "user_file_id": str(self.file_id),
}
</file context>
connector_id=cc_pair.connector_id, | ||
credential_id=cc_pair.credential_id, | ||
), | ||
request_id=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run_indexing_pipeline now requires adapter; adding request_id without supplying adapter will raise a TypeError at runtime. Construct and pass a DocumentIndexingBatchAdapter here.
Prompt for AI agents
Address the following comment on backend/onyx/server/onyx_api/ingestion.py at line 124:
<comment>run_indexing_pipeline now requires adapter; adding request_id without supplying adapter will raise a TypeError at runtime. Construct and pass a DocumentIndexingBatchAdapter here.</comment>
<file context>
@@ -121,10 +121,7 @@ def upsert_ingestion_doc(
- connector_id=cc_pair.connector_id,
- credential_id=cc_pair.credential_id,
- ),
+ request_id=None,
)
</file context>
✅ Addressed in 714a152
try: | ||
|
||
# Use our consolidated function that handles indexing properly | ||
categorized_files_result = upload_files_to_user_files_with_indexing( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing ownership validation: project_id is accepted and used to associate files without verifying the project belongs to the authenticated user, enabling cross-project linkage.
Prompt for AI agents
Address the following comment on backend/onyx/server/features/projects/api.py at line 70:
<comment>Missing ownership validation: project_id is accepted and used to associate files without verifying the project belongs to the authenticated user, enabling cross-project linkage.</comment>
<file context>
@@ -0,0 +1,347 @@
+ try:
+
+ # Use our consolidated function that handles indexing properly
+ categorized_files_result = upload_files_to_user_files_with_indexing(
+ files=files, project_id=project_id, user=user, db_session=db_session
+ )
</file context>
✅ Addressed in c78d522
) | ||
project_id = chat_session_creation_request.project_id | ||
if project_id: | ||
if not check_project_ownership(project_id, user.id, db_session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing user.id without a None check can raise AttributeError for anonymous access; handle unauthenticated users before calling check_project_ownership.
Prompt for AI agents
Address the following comment on backend/onyx/server/query_and_chat/chat_backend.py at line 280:
<comment>Accessing user.id without a None check can raise AttributeError for anonymous access; handle unauthenticated users before calling check_project_ownership.</comment>
<file context>
@@ -282,6 +272,15 @@ def create_new_chat_session(
+ )
+ project_id = chat_session_creation_request.project_id
+ if project_id:
+ if not check_project_ownership(project_id, user.id, db_session):
+ raise HTTPException(
+ status_code=403, detail="User does not have access to project"
</file context>
// Show temporary uploading files immediately | ||
const tempFiles: ProjectFile[] = Array.from(files).map( | ||
(file) => ({ | ||
id: file.name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using file.name as id for temporary items risks duplicate React keys when multiple uploads share the same filename; generate a unique temp id.
Prompt for AI agents
Address the following comment on web/src/app/chat/components/projects/ProjectContextPanel.tsx at line 175:
<comment>Using file.name as id for temporary items risks duplicate React keys when multiple uploads share the same filename; generate a unique temp id.</comment>
<file context>
@@ -0,0 +1,301 @@
+ // Show temporary uploading files immediately
+ const tempFiles: ProjectFile[] = Array.from(files).map(
+ (file) => ({
+ id: file.name,
+ file_id: file.name,
+ name: file.name,
</file context>
results.non_accepted.append(upload.filename or "") | ||
else: | ||
results.acceptable.append(upload) | ||
results.acceptable_file_to_token_count[upload.filename or ""] = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an empty string as the dict key risks collisions for multiple unnamed uploads; use a unique placeholder for missing filenames.
Prompt for AI agents
Address the following comment on backend/onyx/server/features/projects/projects_file_utils.py at line 135:
<comment>Using an empty string as the dict key risks collisions for multiple unnamed uploads; use a unique placeholder for missing filenames.</comment>
<file context>
@@ -0,0 +1,178 @@
+ results.non_accepted.append(upload.filename or "")
+ else:
+ results.acceptable.append(upload)
+ results.acceptable_file_to_token_count[upload.filename or ""] = (
+ token_count
+ )
</file context>
if (!currentProjectId) { | ||
throw new Error("No project selected"); | ||
} | ||
console.log("upserting instructions", instructions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid logging potentially sensitive user content (instructions) to the console; remove this debug log.
Prompt for AI agents
Address the following comment on web/src/app/chat/projects/ProjectsContext.tsx at line 112:
<comment>Avoid logging potentially sensitive user content (instructions) to the console; remove this debug log.</comment>
<file context>
@@ -0,0 +1,419 @@
+ if (!currentProjectId) {
+ throw new Error("No project selected");
+ }
+ console.log("upserting instructions", instructions);
+ await svcUpsertProjectInstructions(currentProjectId, instructions);
+ await refreshCurrentProjectDetails();
</file context>
db_session.query(UserFile) | ||
.filter(UserFile.user_id == user.id) | ||
.order_by(UserFile.last_accessed_at.desc()) | ||
.all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unbounded query for 'recent' files can return excessive rows; add a LIMIT to avoid large memory/latency impact.
Prompt for AI agents
Address the following comment on backend/onyx/server/manage/users.py at line 906:
<comment>Unbounded query for 'recent' files can return excessive rows; add a LIMIT to avoid large memory/latency impact.</comment>
<file context>
@@ -889,3 +891,19 @@ def update_assistant_preferences_for_user_api(
+ db_session.query(UserFile)
+ .filter(UserFile.user_id == user.id)
+ .order_by(UserFile.last_accessed_at.desc())
+ .all()
+ )
+
</file context>
.all() | |
.limit(50).all() |
retry_delay = 0.5 | ||
|
||
|
||
def _acquire_user_file_locks(db_session: Session, user_file_ids: list[int]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect type annotation for user_file_ids; IDs are UUID/str in this codebase, not int. Update the hint to reflect the actual type.
Prompt for AI agents
Address the following comment on backend/onyx/indexing/adapters/user_file_indexing_adapter.py at line 33:
<comment>Incorrect type annotation for user_file_ids; IDs are UUID/str in this codebase, not int. Update the hint to reflect the actual type.</comment>
<file context>
@@ -0,0 +1,210 @@
+retry_delay = 0.5
+
+
+def _acquire_user_file_locks(db_session: Session, user_file_ids: list[int]) -> bool:
+ """Acquire locks for the specified user files."""
+ stmt = (
</file context>
✅ Addressed in af2fc3d
56e4174
to
5023355
Compare
50 Jobs Failed: Run Integration Tests v2 / integration-tests (connector_job_tests/google, connector-google) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (connector_job_tests/jira, connector-jira) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (connector_job_tests/sharepoint, connector-sharepoint) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (connector_job_tests/slack, connector-slack) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/anonymous_user, tests-anonymous_user) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/api_key, tests-api_key) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/auth, tests-auth) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/chat, tests-chat) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/chat_retention, tests-chat_retention) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/connector, tests-connector) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/dev_apis, tests-dev_apis) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/document_set, tests-document_set) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/image_indexing, tests-image_indexing) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/index_attempt, tests-index_attempt) failed on "Pull Docker images"
Run Integration Tests v2 / integration-tests (tests/indexing, tests-indexing) failed on "Pull Docker images"
25 more jobs failed (See summary below for more details) 10 jobs failed running on non-Blacksmith runners. Summary: 2 successful workflows, 8 failed workflows
Last updated: 2025-09-23 13:49:12 UTC |
8ea509d
to
d3e20a6
Compare
717df27
to
71a523d
Compare
… related tables and foreign keys for project integration.
…ch deletions, and ensure foreign key integrity during project integration updates
…IDs to UUID scheme, ensuring document_id_migrated defaults to TRUE, and normalizing S3 object keys for plaintext records. This improves data integrity and prepares for future migrations.
…es, ensuring consistent handling of legacy data and enhancing error reporting during batch deletions. This update streamlines the migration process and reinforces data integrity
…ocument IDs, ensuring better data integrity. Adjust fetchChatData to correctly reference the available tools array, improving chat functionality.
3fbd569
to
f4bc5fd
Compare
… indexing process
Description
Draft PR to the indexing abstraction with initial worker setup.
How Has This Been Tested?
[Describe the tests you ran to verify your changes]
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Introduces an indexing adapter abstraction to separate orchestration from side effects and wires it into the pipeline. Adds a dedicated Celery worker for user file processing with configurable concurrency.
Refactors
New Features