a***@gmail.com
2017-06-28 05:10:01 UTC
Hi!
I have a strange problem. I use my training data as the test data, but the test
result is totally wrong. I don't know what's wrong with my training.
My version of Tesseract: (screenshot not preserved)
I followed the official training doc:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract
1. tesseract eng.test3.exp0.tif -psm 10 eng.test3.exp0 box.train
At this step I ran into a problem (screenshot not preserved),
*but training could still continue in the later steps.*
2. unicharset_extractor eng.test3.exp0.box
3. set_unicharset_properties -U unicharset -O unicharset --script_dir=./langdata
4. Create the font file (named font), containing:
test3 0 0 0 0 0
5. shapeclustering -F font -U unicharset eng.test3.exp0.tr
6. mftraining -F font -U unicharset -O test3.unicharset eng.test3.exp0.tr
7. cntraining eng.test3.exp0.tr
8. Rename the output files so that each starts with the prefix test3.
9. combine_tessdata test3.
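
The steps above can be sketched as a single shell script. This is only a sketch under stated assumptions: the DRY_RUN switch, the run helper and the variable names are hypothetical additions for illustration, not part of Tesseract. The four files renamed in step 8 (inttemp, normproto, pffmtable, shapetable) are the ones the Tesseract 3.x training wiki lists; the post itself does not say which files were renamed.

```shell
#!/bin/sh
# Sketch of steps 1-9. By default it only prints the commands (DRY_RUN=1).
BASE=eng.test3.exp0          # tif/box base name from the post
PREFIX=test3                 # language/font prefix from the post
DRY_RUN=${DRY_RUN:-1}        # hypothetical switch: set to 0 to really execute

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

train_steps() {
    run tesseract "$BASE.tif" -psm 10 "$BASE" box.train                  # step 1
    run unicharset_extractor "$BASE.box"                                 # step 2
    run set_unicharset_properties -U unicharset -O unicharset \
        --script_dir=./langdata                                          # step 3
    echo "+ write font file: $PREFIX 0 0 0 0 0"                          # step 4
    if [ "$DRY_RUN" = "0" ]; then
        echo "$PREFIX 0 0 0 0 0" > font
    fi
    run shapeclustering -F font -U unicharset "$BASE.tr"                 # step 5
    run mftraining -F font -U unicharset -O "$PREFIX.unicharset" "$BASE.tr"  # step 6
    run cntraining "$BASE.tr"                                            # step 7
    # Step 8 (assumed file list, see the note above).
    for f in inttemp normproto pffmtable shapetable; do
        run mv "$f" "$PREFIX.$f"
    done
    run combine_tessdata "$PREFIX."                                      # step 9
}

train_steps
```

Run as-is it only lists the commands; with DRY_RUN=0 it executes them and stops at the first failure.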
In the end I do get the traineddata file, but the test result is very
strange. *It completely mixes up the digits 6 and 8.*
I don't know what's wrong. Did I use the wrong training procedure, or is
there something wrong with my training data? Could anyone help?
Here is my tif/box pair.
eng.test3.exp0.tif
<https://drive.google.com/open?id=0B9BaT1tS42KuUVBkbU1TUkdpLXc>
eng.test3.exp0.box
<https://drive.google.com/open?id=0B9BaT1tS42KuX0cteVJrZlJsSG8>
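
One possible sanity check on the box file is to count how often each label occurs, for example to see whether 6 and 8 are both present and labelled as expected. The sketch below assumes the usual Tesseract 3.x box format, with one line per glyph of the form "char left bottom right top page"; count_labels is a hypothetical helper name, and this check is not part of the official training steps.

```shell
#!/bin/sh
# Count how many boxes carry each label (first field of every box line).
count_labels() {
    awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}

# Default to the box file from the post; do nothing if it is not present.
BOX=${1:-eng.test3.exp0.box}
if [ -f "$BOX" ]; then
    count_labels "$BOX"
fi
```

Each output line is a label followed by its number of boxes.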
Thanks a lot!
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+***@googlegroups.com.
To post to this group, send email to tesseract-***@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d7391c4d-cdbd-4a9c-8254-d7f05709f4f7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.