
Releases: PaddlePaddle/PaddleOCR

v3.3.0

16 Oct 12:58
c2b0390

2025.10.16 v3.3.0 released

  • Released PaddleOCR-VL:

    • Model Introduction:

      • PaddleOCR-VL is a state-of-the-art (SOTA), resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to recognize document elements accurately. The model supports 109 languages and excels at recognizing complex elements (e.g., text, tables, formulas, and charts) while keeping resource consumption low. In comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, is highly competitive with top-tier VLMs, and delivers fast inference, which makes it well suited for real-world deployment. The model has been released on Hugging Face, and everyone is welcome to download and use it!
    • Core Features:

      • Compact yet Powerful VLM Architecture: We present a novel vision-language model that is specifically designed for resource-efficient inference, achieving outstanding performance in element recognition. By integrating a NaViT-style dynamic high-resolution visual encoder with the lightweight ERNIE-4.5-0.3B language model, we significantly enhance the model’s recognition capabilities and decoding efficiency. This integration maintains high accuracy while reducing computational demands, making it well-suited for efficient and practical document processing applications.
      • SOTA Performance on Document Parsing: PaddleOCR-VL achieves state-of-the-art performance in both page-level document parsing and element-level recognition. It significantly outperforms existing pipeline-based solutions and is highly competitive with leading vision-language models (VLMs) in document parsing. It also excels at recognizing complex document elements such as text, tables, formulas, and charts, and handles challenging content such as handwritten text and historical documents, which makes it versatile across a wide range of document types and scenarios.
      • Multilingual Support: PaddleOCR-VL supports 109 languages, covering major global languages, including but not limited to Chinese, English, Japanese, Latin, and Korean, as well as languages with different scripts and structures, such as Russian (Cyrillic script), Arabic, Hindi (Devanagari script), and Thai. This broad coverage substantially extends the system's applicability to multilingual and global document processing scenarios.
  • Released PP-OCRv5 Multilingual Recognition Model:

    • Improved the accuracy and coverage of Latin-script recognition and added support for Cyrillic, Arabic, Devanagari, Telugu, Tamil, and other scripts, covering recognition of 109 languages in total. The models have only 2M parameters, and the accuracy of some of them has improved by more than 40% compared with the previous generation.
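The "NaViT-style dynamic resolution" encoder described above processes each image close to its native aspect ratio instead of forcing it into a fixed square. The sketch below illustrates only that general idea: both sides are snapped to a whole number of patches and the grid is shrunk uniformly if it exceeds a patch budget. The patch size of 14, the 1024-patch budget, and the helper name dynamic_patch_grid are assumptions for illustration, not PaddleOCR-VL's actual preprocessing.

```python
import math


def dynamic_patch_grid(height: int, width: int, patch: int = 14,
                       max_patches: int = 1024) -> tuple[int, int]:
    """Return (rows, cols) of image patches for an input of the given size.

    Hypothetical helper illustrating NaViT-style dynamic resolution: the
    aspect ratio is roughly preserved, so the grid shape varies with the
    input, and the total number of patches is capped at max_patches.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    # Snap each side to a whole number of patches (at least one).
    rows = max(1, round(height / patch))
    cols = max(1, round(width / patch))
    if rows * cols > max_patches:
        # Shrink both sides by the same factor so the grid fits the budget
        # while keeping approximately the same aspect ratio.
        scale = math.sqrt(max_patches / (rows * cols))
        rows = max(1, math.floor(rows * scale))
        cols = max(1, math.floor(cols * scale))
    return rows, cols


if __name__ == "__main__":
    # A tall receipt and a wide table produce differently shaped grids.
    print(dynamic_patch_grid(1400, 280))
    print(dynamic_patch_grid(700, 2800))
```

Unlike a fixed-size resize, a tall narrow receipt keeps many rows and few columns here, so thin text lines are not squashed.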

v3.2.0

21 Aug 11:11
4e6eab9

2025.8.21 v3.2.0 released

  • Significant Model Additions:

    • Introduced training, inference, and deployment for PP-OCRv5 recognition models in English, Thai, and Greek. The PP-OCRv5 English model delivers an 11% improvement in English scenarios compared to the main PP-OCRv5 model, with the Thai and Greek recognition models achieving accuracies of 82.68% and 89.28%, respectively.
  • Deployment Capability Upgrades:

    • Full support for PaddlePaddle framework versions 3.1.0 and 3.1.1.
    • Comprehensive upgrade of the PP-OCRv5 C++ local deployment solution, now supporting both Linux and Windows, with feature parity and identical accuracy to the Python implementation.
    • High-performance inference now supports CUDA 12, and inference can be performed using either the Paddle Inference or ONNX Runtime backends.
    • The high-stability service-oriented deployment solution is now fully open-sourced, allowing users to customize Docker images and SDKs as required.
    • The high-stability service-oriented deployment solution also supports invocation via manually constructed HTTP requests, enabling client-side code development in any programming language.
  • Benchmark Support:

    • All pipelines now support fine-grained benchmarking, which measures end-to-end inference time as well as per-layer and per-module latency to assist with performance analysis.
    • Documentation has been updated to include key metrics for commonly used configurations on mainstream hardware, such as inference latency and memory usage, providing deployment references for users.
  • Bug Fixes:

    • Fixed an issue where training logs failed to be saved during model training.
    • Upgraded the data augmentation component for formula models for compatibility with newer versions of the albumentations dependency, and fixed deadlock warnings when using the tokenizers package in multi-process scenarios.
    • Fixed toggle parameters such as use_chart_parsing in the PP-StructureV3 configuration files behaving inconsistently with the same parameters in other pipelines.
  • Other Enhancements:

    • Separated core and optional dependencies. Only minimal core dependencies are required for basic text recognition; additional dependencies for document parsing and information extraction can be installed as needed.
    • Enabled support for NVIDIA RTX 50 series graphics cards on Windows; users can refer to the installation guide for the corresponding PaddlePaddle framework versions.
    • PP-OCR series models now support returning single-character coordinates.
    • Added AIStudio, ModelScope, and other model download sources, allowing users to specify the source for model downloads.
    • Added support for chart-to-table conversion via the PP-Chart2Table module.
    • Optimized documentation descriptions to improve usability.
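Because the high-stability serving solution accepts manually constructed HTTP requests, a client needs nothing more than an HTTP library and a JSON encoder, in any language. The Python sketch below builds such a request with the standard library only. The URL, the /ocr path, and the field names file (Base64-encoded content) and fileType are placeholders assumed for illustration; the high-stability solution documents its own request schema, which should be checked before relying on any of these names.

```python
import base64
import json
import urllib.request


def encode_file(data: bytes) -> str:
    """Base64-encode file bytes so they can travel inside a JSON body."""
    return base64.b64encode(data).decode("ascii")


def build_json_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a serving endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and field names: replace them with the schema
    # documented for the service you actually deploy.
    payload = {"file": encode_file(b"...image bytes..."), "fileType": 1}
    req = build_json_request("http://localhost:8080/ocr", payload)
    print(req.get_method(), req.full_url)
    # Sending it against a running service:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Nothing here is Python-specific: any language that can POST a JSON body can produce the same request.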

Full Changelog: v3.1.1...v3.2.0
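The per-module figures from the v3.2.0 pipeline benchmark come from PaddleOCR's own instrumentation, whose switches are described in its documentation and are not reproduced here. As a generic illustration of the measurement idea only, the sketch below times nested stages with time.perf_counter. StageTimer, run_pipeline, and the detect/recognize callables are hypothetical stand-ins, not PaddleOCR APIs.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


class StageTimer:
    """Collect wall-clock latency (ms) per named stage."""

    def __init__(self) -> None:
        self.timings_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = (time.perf_counter() - start) * 1000.0
            # Accumulate, so a stage that runs several times is summed.
            self.timings_ms[name] = self.timings_ms.get(name, 0.0) + elapsed


def run_pipeline(image: bytes, detect: Callable, recognize: Callable,
                 timer: StageTimer) -> List[str]:
    """Hypothetical two-stage OCR flow with end-to-end and per-stage timing."""
    with timer.stage("end_to_end"):
        with timer.stage("detection"):
            boxes = detect(image)
        with timer.stage("recognition"):
            texts = [recognize(image, box) for box in boxes]
    return texts


if __name__ == "__main__":
    # Fake stages that just sleep, standing in for real models.
    def fake_detect(img):
        time.sleep(0.01)
        return ["box1", "box2"]

    def fake_recognize(img, box):
        time.sleep(0.005)
        return box.upper()

    timer = StageTimer()
    run_pipeline(b"img", fake_detect, fake_recognize, timer)
    for name, ms in timer.timings_ms.items():
        print(f"{name}: {ms:.1f} ms")
```

Nesting the stage timers inside the end-to-end timer makes any gap between the total and the sum of stages visible as pipeline overhead.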

v3.1.1

15 Aug 08:55
376035d

2025.8.15 v3.1.1 released

  • Bug Fixes:

    • Added the missing methods save_vector, save_visual_info_list, load_vector, and load_visual_info_list in the PP-ChatOCRv4 class.
    • Added the missing parameters glossary and llm_request_interval to the translate method in the PPDocTranslation class.
  • Documentation Improvements:

    • Added a demo to the MCP documentation.
    • Added the PaddlePaddle and PaddleOCR versions used for performance metric testing to the documentation.
    • Fixed errors and omissions in the documentation for the document translation pipeline.
  • Others:

    • Changed the MCP server dependency to use the pure Python library puremagic instead of python-magic to reduce installation issues.
    • Retested PP-OCRv5 performance metrics with PaddleOCR version 3.1.0 and updated the documentation.
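The switch from python-magic (a wrapper around the native libmagic library) to puremagic avoids installation problems because puremagic identifies files from their leading "magic number" bytes in pure Python. The sketch below illustrates that idea with a handful of well-known signatures; it is not puremagic's API or signature database, and sniff_mime is a hypothetical name.

```python
from typing import Optional

# A few well-known file signatures ("magic numbers") and their MIME types.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"BM", "image/bmp"),
]


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a file, or return None."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


if __name__ == "__main__":
    print(sniff_mime(b"%PDF-1.7\n"))
    print(sniff_mime(b"plain text"))
```

Only the first few bytes are needed, so a server can classify an upload before decoding it fully.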

Full Changelog: v3.1.0...v3.1.1

v3.1.0

29 Jun 06:57
31dd198

2025.6.29 v3.1.0 released

  • Key Models and Pipelines:

    • Added PP-OCRv5 multilingual text recognition models, with training and inference workflows for 37 languages, including French, Spanish, Portuguese, Russian, and Korean. Average accuracy improved by more than 30%.
    • Upgraded the PP-Chart2Table model in PP-StructureV3, further enhancing chart-to-table conversion. On internal custom evaluation sets, the RMS-F1 metric increased by 9.36 percentage points (71.24% -> 80.60%).
    • Launched a new document translation pipeline, PP-DocTranslation, based on PP-StructureV3 and ERNIE 4.5 Turbo. It translates Markdown documents, PDF documents with complex layouts, and document images, and saves the results as Markdown documents.
  • Added an MCP server:

    • Supports both OCR and PP-StructureV3 pipelines.
    • Supports three working modes: local Python library, AIStudio Community Cloud Service, and self-hosted service.
    • Supports invoking local services via stdio and remote services via Streamable HTTP.
  • Documentation Optimization: Improved the descriptions in some user guides for a smoother reading experience.
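The MCP server speaks JSON-RPC 2.0, as defined by the Model Context Protocol specification, and over the stdio transport each message is written as a single line of JSON. As a rough sketch, a client invoking one of the server's tools sends a tools/call request like the one built below. The tool name ocr and the argument input_data are hypothetical placeholders (the server's tools/list response gives the real names), and the initialize handshake a real session performs first is omitted.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_ids = count(1)


def make_tool_call(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated line.

    json.dumps without indentation escapes any newlines inside strings,
    so the message never spans more than one line on stdio.
    """
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    # Hypothetical tool and argument names; query tools/list for real ones.
    print(make_tool_call("ocr", {"input_data": "invoice.png"}), end="")
```

For the Streamable HTTP mode, the same JSON object would be sent as the body of an HTTP POST instead of a line on stdin.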

v3.0.3

26 Jun 10:04
a00f633

v3.0.2

18 Jun 16:38
db3bf4a
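
  • New features:

    • Changed the default model download source from BOS to Hugging Face. Users can still download models from Baidu Object Storage (BOS) by setting the environment variable PADDLE_PDX_MODEL_SOURCE to BOS.
    • Added service invocation examples in six languages (C++, Java, Go, C#, Node.js, and PHP) for pipelines such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4.
    • Improved the layout region ordering algorithm in the PP-StructureV3 pipeline, refining the ordering logic for complex vertical layouts and further improving ordering results on complex layouts.
    • Improved the model selection logic: when a language is specified but a model version is not, the latest model version that supports that language is selected automatically.
    • Set a default upper bound on the MKL-DNN cache size to prevent unbounded cache growth; users can also configure the cache capacity.
    • Updated the default high-performance inference configuration to support Paddle MKL-DNN acceleration, and improved the automatic configuration logic for high-performance inference so that it selects configurations more intelligently.
    • Adjusted the default device selection logic to account for which compute devices the installed Paddle framework actually supports, so that the program behaves more intuitively.
    • Added an Android example for PP-OCRv5.
  • Bug fixes:

    • Fixed some PP-StructureV3 CLI parameters not taking effect.
    • Fixed export_paddlex_config_to_yaml not working correctly in some cases.
    • Fixed the actual behavior of save_path not matching the documentation.
    • Fixed possible multithreading errors in basic serving deployment when MKL-DNN is used.
    • Fixed the channel order in the image preprocessing of the Latex-OCR model.
    • Fixed the channel order of visualization images saved by the text recognition module.
    • Fixed the channel order of table visualization results in PP-StructureV3.
    • Fixed a variable overflow when computing overlap_ratio in extremely rare cases in the PP-StructureV3 pipeline.
  • Documentation improvements:

    • Updated the description of the enable_mkldnn parameter so that it more accurately reflects the program's actual behavior.
    • Fixed errors in the documentation's description of the lang and ocr_version parameters.
    • Added instructions for exporting pipeline configuration files via the CLI.
    • Fixed missing columns in the PP-OCRv5 performance data table.
    • Polished the PP-StructureV3 benchmark metrics for different configurations.
  • Others:

    • Relaxed version constraints on dependencies such as numpy and pandas, restoring support for Python 3.12.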
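The v3.0.2 notes above name the environment variable PADDLE_PDX_MODEL_SOURCE and the value BOS for switching the model download source back to Baidu Object Storage. A minimal Python sketch follows. The notes do not say when the variable is read, so the assumption here is the safe one: set it before importing paddleocr or creating any pipeline. Setting it in the shell before starting Python works equally well.

```python
import os

# Assumption: the library reads this variable when it resolves model
# downloads, so set it before importing paddleocr or creating a pipeline.
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "BOS"

# from paddleocr import PaddleOCR  # import only after the variable is set
print("Model source:", os.environ["PADDLE_PDX_MODEL_SOURCE"])
```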

v3.0.1

05 Jun 03:27
18936e0
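
  • Improved some models and model configurations:
    • Updated the default PP-OCRv5 model configuration: both detection and recognition now use the server models instead of the mobile models. To improve default results in most scenarios, the limit_side_len parameter in the configuration was changed from 736 to 64.
    • Added the text line orientation classification model PP-LCNet_x1_0_textline_ori, with an accuracy of 99.42%. It is now the default text line orientation classifier for the OCR, PP-StructureV3, and PP-ChatOCRv4 pipelines.
    • Improved the text line orientation classification model PP-LCNet_x0_25_textline_ori, raising its accuracy by 3.3 percentage points to 98.85%.
  • Improvements to issues in version 3.0.0:
    • Better CLI experience: running the PaddleOCR CLI without any arguments now prints a usage hint.
    • New parameter: PP-ChatOCRv3 and PP-StructureV3 support the use_textline_orientation parameter.
    • Faster CPU inference: MKL-DNN is enabled by default for CPU inference in all pipelines.
    • C++ inference support: the cascaded detection and recognition stages of PP-OCRv5 support C++ inference.
  • Fixes for issues in version 3.0.0:
    • Fixed PP-StructureV3 inference errors on some CPUs caused by the formula recognition and table recognition models being unable to use MKL-DNN.
    • Fixed the error "FatalError: Process abort signal is detected by the operating system" during inference in some GPU environments.
    • Fixed type hint issues in some Python 3.8 environments.
    • Fixed the missing PPStructureV3.concatenate_markdown_pages method.
    • Fixed model_name not taking effect when both lang and model_name were specified while instantiating paddleocr.PaddleOCR.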
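The v3.0.1 notes lower limit_side_len from 736 to 64 without saying how the value is applied. The sketch below reproduces the resize rule of the PaddleOCR 2.x detection preprocessing (DetResizeForTest): limit_type "max" shrinks images whose longer side exceeds the limit, "min" enlarges only images whose shorter side is below it, and both sides are then rounded to multiples of 32. Assuming the new default pairs 64 with the "min" type (the notes do not state this), typical document images keep roughly their native resolution instead of being shrunk to 736 px. The function name det_resize_shape is hypothetical, and the 3.x implementation may differ in details.

```python
def det_resize_shape(h: int, w: int, limit_side_len: int = 64,
                     limit_type: str = "min") -> tuple[int, int]:
    """Return the (height, width) fed to the text detector.

    Modelled on the 2.x DetResizeForTest rule; an assumption, not the
    exact PaddleOCR 3.x code.
    """
    if limit_type == "max":
        # Shrink only if the longer side exceeds the limit.
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only if the shorter side is below the limit.
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")
    # Scale, then snap each side to the nearest multiple of 32 (minimum 32).
    new_h = max(int(round(int(h * ratio) / 32) * 32), 32)
    new_w = max(int(round(int(w * ratio) / 32) * 32), 32)
    return new_h, new_w


if __name__ == "__main__":
    # With "min" and 64, a 1000x500 page stays near native resolution.
    print(det_resize_shape(1000, 500))
    # Under the old cap-style setting, the same page would be shrunk.
    print(det_resize_shape(1000, 500, limit_side_len=736, limit_type="max"))
```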

v3.0.0

20 May 12:16
a847428
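
  • Released PP-OCRv5, an all-scenario text recognition model: a single model supports five text types as well as complex handwriting, and overall recognition accuracy improves by 13 percentage points over the previous generation.

  • Released PP-StructureV3, a general document parsing solution: it parses PDFs from many scenarios and with diverse layouts with high accuracy, and leads many open-source and closed-source solutions on public benchmarks.

  • Released PP-ChatOCRv4, an intelligent document understanding solution: it natively supports ERNIE 4.5 Turbo, and its accuracy improves by 15 percentage points over the previous generation.

  • Refactored deployment capabilities and unified the inference interface: PaddleOCR 3.0 incorporates the underlying capabilities of the PaddleX 3.0 toolkit, comprehensively upgrades the inference and deployment modules, improves on the design of version 2.x, and unifies and optimizes the Python API and command-line interface (CLI). Deployment now covers three scenarios: high-performance inference, serving deployment, and on-device deployment.

  • Adapted to PaddlePaddle framework 3.0 and optimized the training workflow: the new version is compatible with the latest PaddlePaddle 3.0 features, such as the CINN compiler, and the static graph model file name has changed from xxx.pdmodel to xxx.json.

  • Unified model names: updated the naming scheme for models supported by PaddleOCR 3.0 to follow more standardized, consistent rules, laying the groundwork for future iteration and maintenance.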
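The v3.0.0 notes rename the static-graph model file from xxx.pdmodel to xxx.json. Scripts that load exported models directly may need to accept both names during a migration; the sketch below prefers the new .json file and falls back to .pdmodel. The helper name find_inference_model and the default file prefix "inference" are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional


def find_inference_model(model_dir: str, prefix: str = "inference") -> Optional[Path]:
    """Return the static-graph model file in model_dir, or None.

    Prefers the PaddlePaddle 3.0 style <prefix>.json and falls back to
    the older <prefix>.pdmodel for models exported with earlier versions.
    """
    base = Path(model_dir)
    for suffix in (".json", ".pdmodel"):
        candidate = base / f"{prefix}{suffix}"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(find_inference_model(target))
```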

v2.10.0

07 Mar 07:03
e218c0a

What's Changed


v2.9.1

22 Oct 05:57
0760342

What's Changed

  • [cherry-pick] update paddle2onnx doc by @inisis in #14051
  • [cherry-pick] fix isnan is not supported and update pyproject by @GreatV in #14061
  • [cherry-pick] ci: Fixed docs multi version error by @GreatV in #14063

Full Changelog: v2.9.0...v2.9.1