Version 1.7.3
What's Changed
- Table linearization improvements by @Belval in #313
  - Add `.get_text()`, `.to_html()` and `.to_markdown()` functions to `Linearizable`, which is now implemented by `Document`, `Page`, `DocumentEntity` and `EntityList`
  - Add `HTMLLinearizationConfig` and `MarkdownLinearizationConfig` as pre-configured `TextLinearizationConfig`
  - Add the following parameters to `TextLinearizationConfig`:
    - `duplicate_text_in_merged_cells` duplicates the text in merged cells to preserve row-level alignment
    - `table_flatten_headers` combines multi-row headers into a single row, duplicating the merged cells horizontally as needed
    - `table_tabulate_remove_extra_hyphens` removes extra hyphens '-' in markdown tables to reduce context length
    - `max_number_of_consecutive_spaces` defines the maximum number of contiguous whitespace characters, similar to `max_number_of_consecutive_new_lines`
  - Add
- Fixes:
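
As an illustration of the `TextLinearizationConfig` parameters described in the list above, here is a minimal plain-Python sketch of what the descriptions suggest. It does not call the library and is not its implementation: the function names and the table representation (a list of rows, with `None` marking a cell covered by a horizontal merge) are assumptions made for this example only, and the space-collapsing helper handles space characters only.

```python
import re


def collapse_spaces(text, max_spaces):
    """Cap every run of space characters at max_spaces (hypothetical helper)."""
    pattern = r" {%d,}" % (max_spaces + 1)
    return re.sub(pattern, " " * max_spaces, text)


def duplicate_merged_cells(rows):
    """Copy the text of a merged cell into the cells it covers.

    Assumption: a cell covered by a horizontal merge is stored as None.
    """
    filled = []
    for row in rows:
        out, last = [], ""
        for cell in row:
            last = cell if cell is not None else last
            out.append(last)
        filled.append(out)
    return filled


def flatten_headers(header_rows):
    """Combine multi-row headers into one row, one label per column."""
    filled = duplicate_merged_cells(header_rows)
    return [" ".join(part for part in column if part) for column in zip(*filled)]


# "Sales" spans two columns in the first header row.
header_rows = [["Region", "Sales", None], ["", "Q1", "Q2"]]
print(flatten_headers(header_rows))
print(collapse_spaces("a    b  c", 1))
```

Duplicating the merged text before joining the header rows is what keeps each column's label complete, so every data row still lines up with a full column name once the table is written out as text.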
New Contributors
Full Changelog: v1.7.2...v1.7.3