Open
Labels
P2: Medium priority, add to the next sprint if no P1 available
Description
Is your feature request related to a problem? Please describe.
It would be helpful if we could access the list of failed files so we can send them to another converter, such as OCR or similar. Ideally, this new feature would work for both `PyPDFToDocument` and `PDFMinerToDocument`.
Describe the solution you'd like
Basically, when there is an exception, the failed files would be appended to a list, something like this:
try:
    pdf_reader = PdfReader(io.BytesIO(bytestream.data))
    text = self._default_convert(pdf_reader)
except Exception as e:
    logger.warning(
        "Could not read {source} and convert it to Document, skipping. {error}", source=source, error=e
    )
    failed_files.append(source)  # return this list along with `documents`
    continue
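To make the intended return shape concrete, here is a minimal, self-contained sketch of the same pattern outside of Haystack. The names `convert_with_failures`, `convert_one`, `fake_pdf_converter`, and the `failed_files` output key are hypothetical and used only for illustration. They are not part of the existing converter API, and the stand-in converter simply raises for one file to simulate a PDF with no text layer.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def convert_with_failures(
    sources: List[Any], convert_one: Callable[[Any], str]
) -> Dict[str, List[Any]]:
    """Convert each source and collect the ones that raise instead of silently dropping them."""
    documents: List[str] = []
    failed_files: List[Any] = []
    for source in sources:
        try:
            text = convert_one(source)
        except Exception as e:
            logger.warning(
                "Could not read %s and convert it to Document, skipping. %s", source, e
            )
            failed_files.append(source)  # remember the source so the caller can retry it
            continue
        documents.append(text)
    # Hypothetical output shape: failures are returned next to the converted documents.
    return {"documents": documents, "failed_files": failed_files}


def fake_pdf_converter(source: str) -> str:
    """Stand-in for a real PDF converter; fails on files that look like scans."""
    if source.endswith(".scan.pdf"):
        raise ValueError("no extractable text layer")
    return f"text of {source}"


result = convert_with_failures(["a.pdf", "b.scan.pdf", "c.pdf"], fake_pdf_converter)
```

With an output like this, the caller could pass `result["failed_files"]` to a fallback converter (OCR, for example) without parsing the warning logs.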