Failed to continue from: ~/tesstutorial/trainplusminus/eng.lstm #1069

Closed
honeytidy opened this issue Aug 7, 2017 · 7 comments
Comments

@honeytidy
honeytidy commented Aug 7, 2017

Hi, I want to fine-tune Tesseract 4.0 following this: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
but when I run the following commands:

scripts/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
    --fontlist "DejaVu Sans" --noextract_font_properties --langdata_dir ./langdata \
    --tessdata_dir ./tessdata --output_dir ~/tesstutorial/trainplusminus
scripts/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
    --noextract_font_properties --langdata_dir ./langdata \
    --tessdata_dir ./tessdata \
    --fontlist "DejaVu Sans" --output_dir ~/tesstutorial/evalplusminus

combine_tessdata -e baseline/eng.traineddata \
    ~/tesstutorial/trainplusminus/eng.lstm
lstmtraining --model_output ~/tesstutorial/trainplusminus/plusminus \
    --continue_from ~/tesstutorial/trainplusminus/eng.lstm \
    --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
    --old_traineddata baseline/eng.traineddata \
    --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt \
    --max_iterations 3600

I got this error:

14:bigram-dawg:size=16109842, offset=3221531
17:lstm:size=5390718, offset=19331373
18:lstm-punc-dawg:size=4322, offset=24722091
19:lstm-word-dawg:size=7143578, offset=24726413
20:lstm-number-dawg:size=3530, offset=31869991
23:version:size=9, offset=31873521
Loaded file ~/tesstutorial/trainplusminus/eng.lstm, unpacking...
Code range changed from 461 to 461!!
Failed to continue from: ~/tesstutorial/trainplusminus/eng.lstm

Update:
When I remove --old_traineddata baseline/eng.traineddata, as shown here, training runs:

lstmtraining --model_output ~/tesstutorial/trainplusminus/plusminus \
    --continue_from ~/tesstutorial/trainplusminus/eng.lstm \
    --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
    --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt \
    --max_iterations 3600

but I get many warnings like these:

Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /root/tesstutorial/trainplusminus/eng.lstm
Loaded 85/85 pages (1-85) of document /root/tesstutorial/trainplusminus/eng.DejaVu_Sans.exp0.lstmf
Encoding of string failed! Failure bytes: ffffffc2 ffffffb1 36 35 ffffffc2 ffffffb0 43 2c 20 41 45 52 4f 4d 45 58 49 43 4f 20 53 55 4d 4d 4f 4e 45 52 20 3d 20 28 31 39 36 31 29 20 41 62 6f 75 74 20 57 41 53 48 49 4e 47 20 4d 69 73 73 6f 75 72 69
Can't encode transcription: 'VOLVO abdomen, ±65°C, AEROMEXICO SUMMONER = (1961) About WASHING Missouri' in language ''
Encoding of string failed! Failure bytes: ffffffc2 ffffffb1 32 3f 20 61 63 74 69 76 69 74 79 20 50 52 4f 50 45 52 54 59 20 4d 41 49 4e 54 41 49 4e 45 44
Can't encode transcription: 'netting Bookmark of WE MORE) STRENGTH IDENTICAL ±2? activity PROPERTY MAINTAINED' in language ''
Encoding of string failed! Failure bytes: ffffffc2 ffffffb1 38 35 ffffffc2 ffffffa2 20 2c 20 72 65 6c 69 61 62 6c 65 20 45 76 65 6e 74 73 20 54 48 4f 55 53 41 4e 44 53 20 54 52 41 44 49 54 49 4f 4e 53 2e 20 41 4e 54 49 2d 55 53 20 42 65 64 72 6f 6f 6d 20 4c 65 61 64 65 72 73 68 69 70
Can't encode transcription: 'TRAVELED ±85¢ , reliable Events THOUSANDS TRADITIONS. ANTI-US Bedroom Leadership' in language ''
...

Why?
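
Note for readers: the "Failure bytes" in the warnings above appear to be the UTF-8 bytes of the part of the transcription that could not be encoded, printed in hex, with bytes above 0x7f shown sign-extended (ffffffc2 rather than c2). The following is a minimal sketch for turning such a list back into text. The function name decode_failure_bytes is hypothetical and not part of Tesseract, and the "ffffff" prefix handling is an assumption based only on the log lines quoted here.

```shell
# Hypothetical helper (not part of Tesseract): decode the hex tokens printed
# after "Failure bytes:" back into readable text.
decode_failure_bytes() {
  for tok in "$@"; do
    # Assumption: bytes >= 0x80 are logged sign-extended, e.g. ffffffc2.
    # Strip that prefix so only the byte value remains.
    hex=${tok#ffffff}
    # Convert the hex byte to a 3-digit octal value, then emit it as a raw
    # byte through the portable \0ddd escape of printf's %b conversion.
    oct=$(printf '%03o' "0x$hex")
    printf '%b' "\\0$oct"
  done
  printf '\n'
}
```

Calling decode_failure_bytes ffffffc2 ffffffb1 36 35 ffffffc2 ffffffb0 43 on a UTF-8 terminal should print the start of the first failing string, which begins with the ± sign. In all three warnings quoted above the failure bytes start at ±, which suggests (but does not prove) that this character is missing from the character set of the model being trained.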

@Shreeshrii
Collaborator

Please see #1060 (comment)

Try again after making sure that you are using the latest source code as well as traineddata from github.

@ahmed-alaa
Hey Shreeshrii,
I pulled and rebuilt the solution, but I'm facing an issue with this command:
training/lstmtraining --stop_training --continue_from ~/tesstutorial/trainplusminus/plusminus_checkpoint --traineddata ~/tesstutorial/engtrain/ara.traineddata --model_output ~/tesstutorial/trainplusminus/any.traineddata

It shows the following, even though the checkpoint was generated using "training/lstmtraining":
Failed to read continue from: /trainplusminus/plusminus_checkpoint

Btw, this issue appeared with the new commit on the master branch 2 days ago.

@Shreeshrii
Collaborator

@ahmed-alaa Please note the commit number since you have isolated the problem.

@theraysmith @stweil FYI.

@ahmed-alaa

@Shreeshrii everything was working well until this commit "4572940"

@honeytidy changed the title from "Failed to continue from: ~/tesstutorial/trainplusminus/chi_sim_checkpoint" to "Failed to continue from: ~/tesstutorial/trainplusminus/eng.lstm" on Aug 8, 2017

@Shreeshrii
Collaborator

Failed to continue from: ~/tesstutorial/trainplusminus/eng.lstm

Please make sure that the file is there. Rerun the following and check.

combine_tessdata -e baseline/eng.traineddata \
    ~/tesstutorial/trainplusminus/eng.lstm
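
One way to do the check suggested above is a small shell function that confirms the extracted file exists and is not empty. This is only a sketch: check_model_file is a hypothetical name, and it does not verify that the file is a valid LSTM model, only that something was written to the path.

```shell
# Hypothetical helper: succeed only if the given path is an existing,
# non-empty file (for example, the eng.lstm written by combine_tessdata -e).
check_model_file() {
  if [ -s "$1" ]; then
    echo "OK: $1 exists and is non-empty"
  else
    echo "Missing or empty: $1" >&2
    return 1
  fi
}
```

For the paths used in this thread, that would be check_model_file "$HOME/tesstutorial/trainplusminus/eng.lstm", run right after the combine_tessdata command.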

@honeytidy
Author

It works now that I am using the latest tessdata from https://github.com/tesseract-ocr/tessdata/tree/master/best.
Thanks for your help, @Shreeshrii.

@Shreeshrii
Collaborator

Please close the issue, since it is solved.
