Commit 8272482

emerzonWeves authored and committed
Remove hardcoded image extraction flag for PDFs
PDFs currently always have their images extracted. This change uses the "Enable Image Extraction and Analysis" workspace configuration to decide instead.
1 parent f9e0619 commit 8272482

1 file changed: +2 -1 lines changed
backend/onyx/file_processing/extract_file_text.py

Lines changed: 2 additions & 1 deletion
@@ -28,6 +28,7 @@
 
 from onyx.configs.constants import FileOrigin
 from onyx.configs.constants import ONYX_METADATA_FILENAME
+from onyx.configs.llm_configs import get_image_extraction_and_analysis_enabled
 from onyx.file_processing.html_utils import parse_html_page_basic
 from onyx.file_processing.unstructured import get_unstructured_api_key
 from onyx.file_processing.unstructured import unstructured_to_text
@@ -533,7 +534,7 @@ def extract_text_and_images(
         if extension == ".pdf":
             file.seek(0)
             text_content, pdf_metadata, images = read_pdf_file(
-                file, pdf_pass, extract_images=True
+                file, pdf_pass, extract_images=get_image_extraction_and_analysis_enabled()
             )
             return ExtractionResult(
                 text_content=text_content, embedded_images=images, metadata=pdf_metadata
