Discussion:
[tesseract-ocr] Train Error APPLY_BOXES: unlabelled word at :Bounding box=<>
a***@gmail.com
2017-06-28 05:10:01 UTC
Hi!

I have run into a strange problem. I use my training data as the test data,
but the recognition result is completely wrong. I don't know what is wrong
with my training.

My version of tesseract: [image not available]

I followed the official training documentation:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract


1. tesseract eng.test3.exp0.tif -psm 10 eng.test3.exp0 box.train

At this step I ran into the following problem: [image not available]

*However, the later training steps still ran.*

2. unicharset_extractor eng.test3.exp0.box

3. set_unicharset_properties -U unicharset -O unicharset --script_dir=./langdata

4. Create the font file (named "font") with this content:
test3 0 0 0 0 0

5. shapeclustering -F font -U unicharset eng.test3.exp0.tr

6. mftraining -F font -U unicharset -O test3.unicharset eng.test3.exp0.tr

7. cntraining eng.test3.exp0.tr

8. Rename the output files so that they start with the prefix "test3."

9. combine_tessdata test3.
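
For reference, steps 8 and 9 can be sketched as a small shell helper. This is only a sketch under assumptions: the function name prefix_training_files is made up for illustration, and it assumes the four files to rename are inttemp, normproto, pffmtable and shapetable in the current directory (the unicharset already carries the prefix because of the -O test3.unicharset option in step 6).

```shell
#!/bin/sh
# Hypothetical helper: add the language prefix to the training outputs.
# Assumption: the files written by mftraining/cntraining/shapeclustering
# are inttemp, normproto, pffmtable and shapetable in the current directory.
prefix_training_files() {
    lang="$1"
    status=0
    for f in inttemp normproto pffmtable shapetable; do
        if [ -f "$f" ]; then
            mv "$f" "$lang.$f"
        else
            # Report a missing file instead of failing silently.
            echo "missing: $f" >&2
            status=1
        fi
    done
    return $status
}
```

It would be called as "prefix_training_files test3" and followed by "combine_tessdata test3." as in step 9.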

In the end I do get the traineddata file, but the test result is very
strange. *It completely mixes up the digits 6 and 8.*

I don't know what is wrong. Did I use the wrong training procedure, or is
there something wrong with my training data? Could anyone help?
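
One quick check on the training data is to count how many boxes each label has and to flag lines that do not look like valid boxes. The sketch below assumes the Tesseract 3.x box format of six space-separated fields per line (symbol, left, bottom, right, top, page); the function name check_box_file is made up for illustration, and it only inspects the box file, it does not explain the 6/8 confusion by itself.

```shell
#!/bin/sh
# Hypothetical helper: summarize a box file.
# Assumed line format: <symbol> <left> <bottom> <right> <top> <page>
check_box_file() {
    awk '
        # Skip blank lines.
        NF == 0 { next }
        # Every box line is expected to have exactly six fields.
        NF != 6 {
            printf "line %d: expected 6 fields, got %d\n", NR, NF
            bad++
            next
        }
        # A box needs left < right and bottom < top.
        $2 + 0 >= $4 + 0 || $3 + 0 >= $5 + 0 {
            printf "line %d: empty or inverted box for \"%s\"\n", NR, $1
            bad++
            next
        }
        { count[$1]++ }
        END {
            # Print "<symbol><TAB><number of boxes>" in arbitrary order.
            for (sym in count) printf "%s\t%d\n", sym, count[sym]
            exit (bad > 0)
        }
    ' "$1"
}
```

Running "check_box_file eng.test3.exp0.box" would show whether both 6 and 8 appear with a reasonable number of samples, and a non-zero exit status means at least one line was flagged.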

Here is my tif/box pair.
eng.test3.exp0.tif
<https://drive.google.com/open?id=0B9BaT1tS42KuUVBkbU1TUkdpLXc>
eng.test3.exp0.box
<https://drive.google.com/open?id=0B9BaT1tS42KuX0cteVJrZlJsSG8>

Thanks a lot !!!!