OCR transformer model

Textline recognition model, implemented using PyTorch, specialised for the recognition of multi-script and multi-language lines containing Polytonic Greek and other scripts/languages.

This custom model was trained on ~6.2M artificially generated lines and 350k real-world lines. It reaches a character-level accuracy of 98.2% on lines containing mixed Latin and Greek alphabets (an improvement of 8% over our Tesseract baseline).
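For readers who want to reproduce a comparable figure on their own data, here is a minimal sketch of one common way to compute character-level accuracy: one minus the total edit distance divided by the total number of reference characters. This definition is an assumption, since the evaluation protocol behind the 98.2% figure is not spelled out here. The function names are illustrative and not part of this repository. Strings are normalised to NFC first, because polytonic Greek characters can be encoded either precomposed or as base letters plus combining marks.

```python
import unicodedata
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(references: Sequence[str], predictions: Sequence[str]) -> float:
    """Assumed definition: 1 - (total edit distance / total reference characters)."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    # Normalise so that precomposed and decomposed diacritics compare equal.
    refs = [unicodedata.normalize("NFC", r) for r in references]
    preds = [unicodedata.normalize("NFC", p) for p in predictions]
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(levenshtein(r, p) for r, p in zip(refs, preds))
    return 1.0 - total_errors / total_chars
```

Under this definition, a single wrong character in a four-character reference line gives an accuracy of 0.75. Accuracy can become negative if a prediction is much longer than its reference.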

This model is only the core of a broader wrapper that allows it to ingest lines of any length. The modules are described in ajmc_pipeline/ocr/pytorch, and a few example usages can be found in ajmc_pipeline/ocr/_scripts. A more user-friendly API will be released as soon as possible.
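How the wrapper handles lines of arbitrary length is documented in the modules mentioned above, not here. Purely as an illustration of the general idea, the sketch below shows one common strategy: cutting a line image into fixed-width, overlapping horizontal windows. It is not taken from ajmc_pipeline; the function name, the window width and the overlap are all hypothetical.

```python
from typing import List, Tuple


def chunk_spans(line_width: int, chunk_width: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) pixel spans covering a line of the given width.

    Hypothetical helper: consecutive spans share `overlap` pixels, and the
    last span is shifted left so that it ends flush with the right edge.
    """
    if line_width < 0 or chunk_width <= 0 or not 0 <= overlap < chunk_width:
        raise ValueError("invalid widths or overlap")
    if line_width <= chunk_width:
        return [(0, line_width)]  # short line: a single span is enough
    stride = chunk_width - overlap
    spans = []
    start = 0
    while start + chunk_width < line_width:
        spans.append((start, start + chunk_width))
        start += stride
    # Final window, aligned with the right edge of the line.
    spans.append((line_width - chunk_width, line_width))
    return spans
```

For example, `chunk_spans(1000, 400, 100)` yields `[(0, 400), (300, 700), (600, 1000)]`. Merging the per-window predictions back into a single transcription is a separate problem that this sketch does not address.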

Acknowledgements

Code & data in this repository were produced in the context of the Ajax Multi-Commentary project, funded by the Swiss National Science Foundation under an Ambizione grant PZ00P1_186033.

Folders and files

1A_withbackbone_new.json
LICENSE
README.md
best_model.pt
