Releases: alephdata/ingest-file

3.18.4-rc3

06 Apr 07:28
ee4311a

3.18.4-rc3 Pre-release

What's Changed

  • Do full page OCR for PDF pages with Type3 fonts by @stchris in #449
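As a rough illustration of the decision behind this change, the sketch below checks whether a page uses any Type3 font and would therefore be routed to full-page OCR. It is not the code from #449. The helper name `has_type3_font` is hypothetical, and the input is assumed to have the shape of the tuples returned by PyMuPDF's `Page.get_fonts()`, where the third element is the font type.

```python
from typing import Iterable, Sequence


def has_type3_font(fonts: Iterable[Sequence]) -> bool:
    """Return True if any font entry is a Type3 font.

    Assumption: each entry looks like a PyMuPDF ``Page.get_fonts()`` tuple,
    (xref, ext, type, basefont, name, encoding), so the type is at index 2.
    """
    return any(len(font) > 2 and font[2] == "Type3" for font in fonts)


# Hypothetical font listings for two pages.
page_with_type3 = [
    (12, "ttf", "TrueType", "ArialMT", "F1", "WinAnsiEncoding"),
    (15, "n/a", "Type3", "Unnamed-T3", "F2", ""),
]
page_without_type3 = [
    (12, "ttf", "TrueType", "ArialMT", "F1", "WinAnsiEncoding"),
]

for fonts in (page_with_type3, page_without_type3):
    # A page flagged here would be sent to OCR as a whole instead of
    # relying on its embedded text layer.
    print("full page OCR" if has_type3_font(fonts) else "use text layer")
```

Type3 fonts define glyphs as arbitrary drawing procedures, so the text extracted from them is often unreliable, which is a plausible reason to fall back to OCR for such pages.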

Dependency upgrades

Full Changelog: 3.18.4-rc1...3.18.4-rc3

3.18.4-rc1

24 Mar 11:14
d94c44d

3.18.4-rc1 Pre-release

What's Changed

  • Use PyMuPDF instead of pikepdf + pdfminer.six for PDF ingestion (text and image extraction). #441

Dependency upgrades

Full Changelog: 3.18.2...3.18.4-rc1

3.18.3-rc2

13 Mar 10:31
2afe740

3.18.3-rc2 Pre-release

What's Changed

Dependency upgrades

Full Changelog: 3.18.2...3.18.3-rc2

3.18.2

19 Jan 10:14
79bc5d2

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Update public error message for password protected PDFs by @catileptic in #422

Dependency upgrades

New Contributors

Full Changelog: 3.18.0...3.18.2

3.18.1

18 Jan 08:23
762845f

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Handle TIFFs in PDFs by converting to PNG by @stchris in #419
  • PDF ingest: ignore unsupported image file formats
  • PDF ingest: normalize text using unicodedata.normalize
  • Change dependabot schedules to monthly by @stchris in #414
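To show what the text normalization item means in practice, here is a minimal sketch using Python's standard `unicodedata.normalize`. The wrapper `normalize_text` and the choice of the NFKC form are assumptions made for illustration. The release notes do not say which normalization form ingest-file applies.

```python
import unicodedata


def normalize_text(text: str, form: str = "NFKC") -> str:
    """Return text in a single Unicode normalization form.

    Assumption: NFKC is used here only as an example; the form actually
    used by ingest-file is not stated in the release notes.
    """
    return unicodedata.normalize(form, text)


# PDF text layers often contain ligatures and decomposed accents.
ligature = "\ufb01le"   # U+FB01 LATIN SMALL LIGATURE FI followed by "le"
decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(normalize_text(ligature))                 # file
print(normalize_text(decomposed) == "\u00e9")   # True
```

Normalizing extracted text this way means that visually identical strings compare equal, which helps when the text is later searched or indexed.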

Full Changelog: 3.18.0...3.18.1

3.18.1-rc3

17 Jan 11:13
f60f7de

3.18.1-rc3 Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • PDF ingest: ignore unsupported image file formats
  • PDF ingest: normalize text using unicodedata.normalize

Full Changelog: 3.18.0...3.18.1-rc3

3.18.1-rc2

17 Jan 08:14
f0b705d

3.18.1-rc2 Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Handle TIFFs in PDFs by converting to PNG by @stchris in #419
  • Change dependabot schedules to monthly by @stchris in #414

Full Changelog: 3.18.0...3.18.1-rc2

3.18.0

09 Jan 14:20
08f5533

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

Major PDF library change

We are deprecating pdflib and replacing it with well-maintained, performant libraries. This change enables local development on hardware with Apple Silicon CPUs and adds support for JBIG2 images in PDF files.

  • Replace pdflib with pdfminer.six (for text) & pikepdf (for images) by @stchris in #380
  • Properly link page entities to the Pages entity they belong to by @stchris in #410
  • Remove poppler by @stchris in #393
  • Better word recognition with large spaces between letters by @stchris in #402
  • Prefer small text over spaced-apart text by @stchris in #403

Integrating convert-document into ingest-file

Smaller changes

Dependency updates

Full Changelog: 3.17.1...3.18.0

3.18.0-rc4

06 Jan 13:38
f9a3a64

3.18.0-rc4 Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Properly link page entities to the Pages entity they belong to (which fixes #398) by @stchris in #410

Dependency updates

Full Changelog: 3.18.0-rc3...3.18.0-rc4

3.18.0-rc3

02 Jan 12:27
01940ed

3.18.0-rc3 Pre-release

IMPORTANT NOTE: this release was pulled. At this time 3.17.1 is the latest release.

What's Changed

  • Better word recognition with large spaces between letters (#401) by @stchris in #402

Version bumps

Full Changelog: 3.18.0-rc2...3.18.0-rc3