@@ -35,14 +35,14 @@ Gradio web demos are available! [ (Document Parsing) | 0.7 /<br> 0.7 /<br> 1.2 | 93.9 /<br> 93.6 /<br> 93.5 | [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2) (1280) /<br> [donut-base-finetuned-cord-v1](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1) (1280) /<br> [donut-base-finetuned-cord-v1-2560](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1-2560) | [gradio space web demo](https://huggingface.co/spaces/naver-clova-ix/donut-base-finetuned-cord-v2),<br>[google colab demo](https://colab.research.google.com/drive/1o07hty-3OQTvGnc_7lgQFLvvKQuLjqiw?usp=sharing) |
- | [Train Ticket](https://github.com/beacandler/EATEN) (Document Parsing) | 0.6 | 98.8 | [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket) | [google colab demo](https://colab.research.google.com/drive/16O-hMvGiXrYZnlXA_tfJ9_q760YcLoOj?usp=sharing) |
- | [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip) (Document Classification) | 0.75 | 95.3 | [donut-base-finetuned-rvlcdip](https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip) | [google colab demo](https://colab.research.google.com/drive/1xUDmLqlthx8A8rWKLMSLThZ7oeRJkDuU?usp=sharing) |
- | [DocVQA Task1](https://rrc.cvc.uab.es/?ch=17) (Document VQA) | 0.78 | 67.5 | [donut-base-finetuned-docvqa](https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa) | [google colab demo](https://colab.research.google.com/drive/1Z4WG8Wunj3HE0CERjt608ALSgSzRC9ig?usp=sharing) |
+ | [CORD](https://github.com/clovaai/cord) (Document Parsing) | 0.7 /<br> 0.7 /<br> 1.2 | 93.9 /<br> 93.6 /<br> 93.5 | [donut-base-finetuned-cord-v2](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official) (1280) /<br> [donut-base-finetuned-cord-v1](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1/tree/official) (1280) /<br> [donut-base-finetuned-cord-v1-2560](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v1-2560/tree/official) | [gradio space web demo](https://huggingface.co/spaces/naver-clova-ix/donut-base-finetuned-cord-v2),<br>[google colab demo](https://colab.research.google.com/drive/1o07hty-3OQTvGnc_7lgQFLvvKQuLjqiw?usp=sharing) |
+ | [Train Ticket](https://github.com/beacandler/EATEN) (Document Parsing) | 0.6 | 98.8 | [donut-base-finetuned-zhtrainticket](https://huggingface.co/naver-clova-ix/donut-base-finetuned-zhtrainticket/tree/official) | [google colab demo](https://colab.research.google.com/drive/16O-hMvGiXrYZnlXA_tfJ9_q760YcLoOj?usp=sharing) |
+ | [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip) (Document Classification) | 0.75 | 95.3 | [donut-base-finetuned-rvlcdip](https://huggingface.co/naver-clova-ix/donut-base-finetuned-rvlcdip/tree/official) | [google colab demo](https://colab.research.google.com/drive/1xUDmLqlthx8A8rWKLMSLThZ7oeRJkDuU?usp=sharing) |
+ | [DocVQA Task1](https://rrc.cvc.uab.es/?ch=17) (Document VQA) | 0.78 | 67.5 | [donut-base-finetuned-docvqa](https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/tree/official) | [google colab demo](https://colab.research.google.com/drive/1Z4WG8Wunj3HE0CERjt608ALSgSzRC9ig?usp=sharing) |
The links to the pre-trained backbones are here:
- - [`donut-base`](https://huggingface.co/naver-clova-ix/donut-base): trained with 64 A100 GPUs (~2.5 days), number of layers (encoder: {2,2,14,2}, decoder: 4), input size 2560x1920, swin window size 10, IIT-CDIP (11M) and SynthDoG (ECJK, 0.5M x 4).
- - [`donut-proto`](https://huggingface.co/naver-clova-ix/donut-proto): (preliminary model) trained with 8 V100 GPUs (~5 days), number of layers (encoder: {2,2,18,2}, decoder: 4), input size 2048x1536, swin window size 8, and SynthDoG (EJK, 0.4M x 3).
+ - [`donut-base`](https://huggingface.co/naver-clova-ix/donut-base/tree/official): trained with 64 A100 GPUs (~2.5 days), number of layers (encoder: {2,2,14,2}, decoder: 4), input size 2560x1920, swin window size 10, IIT-CDIP (11M) and SynthDoG (ECJK, 0.5M x 4).
+ - [`donut-proto`](https://huggingface.co/naver-clova-ix/donut-proto/tree/official): (preliminary model) trained with 8 V100 GPUs (~5 days), number of layers (encoder: {2,2,18,2}, decoder: 4), input size 2048x1536, swin window size 8, and SynthDoG (EJK, 0.4M x 3).
Please see [our paper](#how-to-cite) for more details.
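The only change in this diff is that every model link now points at the `official` branch (`/tree/official`) instead of the repository's default view. As a rough illustration, the sketch below rebuilds those pinned links and splits one back into a repo id and a revision, which is the pair that Hub client tooling generally expects. The helper names `pinned_url` and `split_pinned_url` are hypothetical and not part of the Donut codebase; only the model names and the `official` branch name come from the diff.

```python
from urllib.parse import urlparse

HUB = "https://huggingface.co"
ORG = "naver-clova-ix"
REVISION = "official"  # branch name used by the updated README links

# Model names exactly as they appear in the README rows touched by this diff.
MODELS = [
    "donut-base",
    "donut-proto",
    "donut-base-finetuned-cord-v2",
    "donut-base-finetuned-cord-v1",
    "donut-base-finetuned-cord-v1-2560",
    "donut-base-finetuned-zhtrainticket",
    "donut-base-finetuned-rvlcdip",
    "donut-base-finetuned-docvqa",
]


def pinned_url(model: str, revision: str = REVISION) -> str:
    """Hypothetical helper: build the branch-pinned link for a model."""
    return f"{HUB}/{ORG}/{model}/tree/{revision}"


def split_pinned_url(url: str) -> tuple[str, str]:
    """Hypothetical helper: recover (repo_id, revision) from a pinned link."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path shape: <org>/<model>/tree/<revision>
    if len(parts) != 4 or parts[2] != "tree":
        raise ValueError(f"not a branch-pinned model link: {url}")
    return f"{parts[0]}/{parts[1]}", parts[3]


if __name__ == "__main__":
    for name in MODELS:
        print(pinned_url(name))
```

Splitting the link this way keeps the revision explicit, so a loader that accepts a revision argument can be pointed at the same branch the README links to rather than at whatever the default branch currently holds.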