Description
Is your feature request related to a problem? Please describe.
I am working on large text-based PDF files and would like to parse them as fast as possible while keeping a high resolution (strategy="hi_res"). I am interested in extracting tables, including tables from pictures, but I would like to disable OCR for regions detected as images. The first obvious reason is speed, but the images also do not matter much for my use case.
Describe the solution you'd like
A keyword argument that enables or disables OCR would probably be the easiest thing to implement and would be a nice additional feature, especially if it can differentiate between tables and images. Another nice feature would be to skip OCR on tables in text-based regions, in order to speed up the partition process.
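To make the request concrete, here is a minimal sketch of the per-region decision I have in mind. Everything in it is hypothetical: `Region`, `should_run_ocr`, `ocr_images`, and `skip_ocr_on_text_regions` are names I made up for illustration and are not part of the library's current API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a detected layout region."""

    kind: str  # e.g. "table", "image", or "text"
    has_embedded_text: bool  # True if the PDF text layer covers this region


def should_run_ocr(
    region: Region,
    ocr_images: bool = False,  # hypothetical keyword argument
    skip_ocr_on_text_regions: bool = True,  # hypothetical keyword argument
) -> bool:
    """Decide whether OCR is needed for a single region."""
    # Text is already extractable from the PDF, so OCR would only cost time.
    if region.has_embedded_text and skip_ocr_on_text_regions:
        return False
    # Graphs and photos: only OCR them when explicitly requested.
    if region.kind == "image":
        return ocr_images
    # Anything else without a text layer (e.g. a table inside a picture).
    return True


regions = [
    Region("table", has_embedded_text=True),
    Region("table", has_embedded_text=False),
    Region("image", has_embedded_text=False),
]
decisions = [should_run_ocr(r) for r in regions]
```

With the defaults above, only the table without a text layer would be sent to OCR; passing `ocr_images=True` would restore OCR for image regions as well.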
Describe alternatives you've considered
Adding a new OCRMode.NO_OCR value.
Additional context
I am not interested in images such as graphs or photos.