Hi, I am working on PDF-to-text conversion. When I pass the ocr argument to PDFToTextConverter, I get an error, i.e.
I checked the API documentation of PDFToTextConverter and found this argument there. My Haystack version is
Does the argument exist in the latest version, or is this an error?
Answered by
anakin87
Jun 1, 2023
Hello @JaisVJ! I am on Google Colab and the following commands work for me.

! pip install farm-haystack[pdf,ocr]==1.17.0  # Select the dependency groups you need.
! apt-get install tesseract-ocr  # Tesseract is needed for OCR
# Set the environment variable that points at the Tesseract data.
# The following path works on Ubuntu. On other operating systems, the path will differ.
import os
os.environ["TESSDATA_PREFIX"]="/usr/share/tesseract-ocr/4.00/tessdata"
from haystack.nodes import PDFToTextConverter
converter = PDFToTextConverter(valid_languages=['en'], ocr='auto')
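
If the converter still fails after the setup above, it may help to confirm that Tesseract is actually reachable before creating it. The snippet below is only a rough sketch using the Python standard library. `check_ocr_setup` is a hypothetical helper name and is not part of Haystack. It assumes that the `tesseract` binary should be on `PATH` and that `TESSDATA_PREFIX` should point at an existing directory, as in the commands above.

```python
import os
import shutil


def check_ocr_setup(tessdata_dir=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration only, not a Haystack API.
    """
    problems = []
    # The tesseract binary (installed via apt-get above) should be on PATH.
    if shutil.which("tesseract") is None:
        problems.append("tesseract binary not found on PATH")
    # Use the TESSDATA_PREFIX environment variable if no directory is passed.
    tessdata_dir = tessdata_dir or os.environ.get("TESSDATA_PREFIX")
    if not tessdata_dir:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(tessdata_dir):
        problems.append(f"tessdata directory does not exist: {tessdata_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_ocr_setup():
        print("Problem:", problem)
```

Running it prints one line per problem found and nothing if both checks pass. It does not verify the Haystack version or the installed language data, so treat a clean result as a first sanity check only.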
Answer selected by
jais001
For more information, please take a look at the installation guide.