@emerzon (Contributor) commented Jun 30, 2025

Description

PDFs currently always have their images extracted. This will make use of the "Enable Image Extraction and Analysis" workspace configuration instead.

@emerzon requested a review from a team as a code owner June 30, 2025 16:57

vercel bot commented Jun 30, 2025

@emerzon is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps bot left a comment

PR Summary

Improves PDF processing by removing the hardcoded image extraction flag, so PDFs now respect the workspace's 'Enable Image Extraction and Analysis' setting.

  • Modified backend/onyx/file_processing/extract_file_text.py to take the image extraction setting from the workspace configuration instead of forcing extract_images=True for PDFs
  • Ensures consistent image extraction behavior across document types based on workspace settings
  • Provides better resource utilization by only extracting images when explicitly enabled
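
As a minimal sketch of the behavior described above, assuming a PDF reader that exposes an extract_images flag, the Python below gates image extraction on a workspace setting instead of hardcoding it. All names here (WorkspaceSettings, read_pdf, extract_pdf, and the (text, images) page tuples) are hypothetical stand-ins for illustration, not the actual code in backend/onyx/file_processing/extract_file_text.py.

```python
from dataclasses import dataclass, field

# Hypothetical page representation: (page_text, embedded_image_bytes).
Page = tuple[str, list[bytes]]


@dataclass
class WorkspaceSettings:
    # Hypothetical stand-in for the workspace's
    # "Enable Image Extraction and Analysis" toggle.
    image_extraction_and_analysis_enabled: bool = False


@dataclass
class ExtractedPdf:
    text: str
    images: list[bytes] = field(default_factory=list)


def read_pdf(pages: list[Page], extract_images: bool) -> ExtractedPdf:
    """Hypothetical PDF reader: always returns text, images only on request."""
    text = "\n".join(page_text for page_text, _ in pages)
    images: list[bytes] = []
    if extract_images:
        # Image collection (and any downstream analysis) is skipped entirely
        # unless the caller asks for it.
        for _, page_images in pages:
            images.extend(page_images)
    return ExtractedPdf(text=text, images=images)


def extract_pdf(pages: list[Page], settings: WorkspaceSettings) -> ExtractedPdf:
    # Before the change the call was effectively read_pdf(pages, extract_images=True).
    # After it, the flag follows the workspace configuration.
    return read_pdf(
        pages, extract_images=settings.image_extraction_and_analysis_enabled
    )


pages = [("Page one", [b"fake-png-bytes"]), ("Page two", [])]
print(len(extract_pdf(pages, WorkspaceSettings(False)).images))  # 0
print(len(extract_pdf(pages, WorkspaceSettings(True)).images))   # 1
```

With the toggle off, only the text is returned and no image bytes are collected, which is where the resource saving in the last bullet comes from.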

1 file reviewed, no comments

@Weves merged commit 8272482 into onyx-dot-app:main Jul 1, 2025
4 of 12 checks passed