What image input size to use for training? #1895

haimat · 2025-03-12T15:10:03Z

I have a custom dataset which I want to train docTR models on. These images all contain just a single line of text, hence they are about 2000 pixels wide and only 400 pixels in height. Now I saw that the default model input_size (W = H) is 1024 for the training script.

1. When I call a docTR model on an image, is this image then also resized to a fixed size?
2. What is the typical approach to find out what image size I should use during custom model training?
3. Last but not least, is it possible to train a model with different width and height sizes?

Answered by felixT2K

felixT2K · 2025-03-12T15:32:17Z

1. Yes. For inference the image is also resized to 1024x1024, by default keeping the aspect ratio and padding symmetrically. Those last two can be disabled; only the size is fixed.
2. See point 3 ^^ In your case it would be 2048x2048. I think this would slow things down a lot, and I'd expect the results to be worse than with the default 1024. If you run the training with --show-samples, how do the samples look: still "natural", or "compressed"?
3. It still is, but removing it was already planned because it wouldn't work out of the box with the inference pipeline and makes things a lot more complicated. Especially for detection, the default 1024x1024 is a really sound choice when comparing quality and inference latency.
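
To get a feel for what point 1 means for a 2000x400 line image, here is a minimal sketch of the arithmetic behind an aspect-preserving resize with symmetric padding into a 1024x1024 square. The helper name `letterbox_geometry` is hypothetical and the rounding is an assumption; this is not docTR's own code, and docTR's exact pixel values may differ slightly.

```python
def letterbox_geometry(width, height, target=1024):
    """Hypothetical helper: geometry of fitting a width x height image
    into a target x target square while keeping the aspect ratio."""
    # The limiting side decides the scale factor.
    scale = min(target / width, target / height)
    new_w = min(target, round(width * scale))
    new_h = min(target, round(height * scale))
    # Whatever is left over is padding, split as evenly as possible.
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2
    return {
        "scale": scale,
        "content_size": (new_w, new_h),
        "pad_left_right": (left, pad_x - left),
        "pad_top_bottom": (top, pad_y - top),
    }


geom = letterbox_geometry(2000, 400)
print(geom["content_size"])    # (1024, 205)
print(geom["pad_top_bottom"])  # (409, 410)
```

So the text line ends up roughly 205 pixels tall inside the 1024x1024 input, with the rest of the height filled by padding. Checking the --show-samples output is a quick way to see whether the text is still readable at that scale.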

1 reply

haimat Mar 12, 2025
Author

This is the output from --show-samples. I'd say they look "natural" to me.
So I guess I will stick to the default 1024 then.
Thanks for your quick response.
