Discussion:
[tesseract-ocr] Error in creating LSTM training data using tesstrain.sh
Shandigutt
2018-09-01 22:10:45 UTC
Hi,

I was trying to create LSTM training data using tesstrain.sh and got the
error below. Can somebody explain what has gone wrong?

*Command I used:*
./src/training/tesstrain.sh --fonts_dir ../Support/font --lang sin \
  --linedata_only --noextract_font_properties \
  --langdata_dir ../langdata --tessdata_dir ../tessdata \
  --output_dir ../training/sintrain --fontlist "BhashitaComplex" \
  --training_text ../langdata/sin/sin.training_text

*Extract of the output:*
=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=../tessdata
[2018 සැඎ්තැඞ්බර් 1 වැනි සෙනසුරාදා 21:41:25 +0300] /usr/local/bin/tesseract
/tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0.tif
/tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0 --psm 6 lstm.train
../langdata/sin/sin.config
read_params_file: Can't open lstm.train
Tesseract Open Source OCR Engine v4.0.0-beta.4-74-gd8237 with Leptonica
Page 1
Page 2
Page 3
ERROR: /tmp/sin-2018-09-01.E4T/sin.BhashitaComplex.exp0.lstmf does not exist or is not readable

*For the complete output please see the attached err.txt*

*After running the command I checked the temporary directory it created.
Its contents were:*

***@tharaka-laptop-ubuntu:~$ cd /tmp/sin-2018-09-01.E4T/
***@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ ll
total 776
drwx------ 2 tharaka tharaka 4096 සැඎ් 1 21:41 ./
drwxrwxrwt 50 root root 4096 සැඎ් 2 00:10 ../
-rw-r--r-- 1 tharaka tharaka 249413 සැඎ් 1 21:41 sin.BhashitaComplex.exp0.box
-rw-r--r-- 1 tharaka tharaka 436290 සැඎ් 1 21:41 sin.BhashitaComplex.exp0.tif
-rw-r--r-- 1 tharaka tharaka 9099 සැඎ් 1 23:27 sin.BhashitaComplex.exp0.txt
-rw-r--r-- 1 tharaka tharaka 6543 සැඎ් 1 21:41 sin.unicharset
-rw-r--r-- 1 tharaka tharaka 3053 සැඎ් 1 21:41 sin.xheights
-rw-r--r-- 1 tharaka tharaka 71704 සැඎ් 1 23:27 tesstrain.log
***@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$

*My tesseract version:*
tesseract 4.0.0-beta.4-74-gd8237
leptonica-1.77.0
libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found SSE

*My OS details:*
***@tharaka-laptop-ubuntu:/tmp/sin-2018-09-01.E4T$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

Appreciate your support on this.
Thanks
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+***@googlegroups.com.
To post to this group, send email to tesseract-***@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7d771008-c142-4302-8b5e-e1fd130cc140%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Shree Devi Kumar
2018-09-02 03:40:42 UTC
Post by Shandigutt
read_params_file: Can't open lstm.train
lstm.train is a config file, and Tesseract could not find it.

It ships in tesseract/tessdata/configs.

Make sure it is present in your tessdata directory (or elsewhere on the
path Tesseract searches) so that it can be found.
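
As a rough sketch of that check, something like the shell below could confirm the file is present and copy it over from a source checkout if it is missing. The function names and the example paths are my own assumptions for illustration; they are not part of Tesseract or tesstrain.sh.

```shell
#!/bin/sh
# Sketch only: the helper names and directory layout are assumptions.

# Succeeds if <tessdata_dir>/configs/<config_name> is a readable regular file.
have_tess_config() {
    tessdata_dir=$1
    config_name=$2
    [ -f "$tessdata_dir/configs/$config_name" ] &&
        [ -r "$tessdata_dir/configs/$config_name" ]
}

# Copies the config from a Tesseract source checkout if it is missing.
#   $1: tessdata directory of the source checkout (assumed to hold configs/)
#   $2: the directory passed to --tessdata_dir
#   $3: config file name, e.g. lstm.train
install_tess_config() {
    source_tessdata=$1
    tessdata_dir=$2
    config_name=$3
    if have_tess_config "$tessdata_dir" "$config_name"; then
        return 0    # already in place, nothing to do
    fi
    mkdir -p "$tessdata_dir/configs" &&
        cp "$source_tessdata/configs/$config_name" "$tessdata_dir/configs/"
}
```

For the layout in the command above, and assuming it is run from the tesseract source directory, the call would be `install_tess_config ./tessdata ../tessdata lstm.train`.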
--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
Shandigutt
2018-09-02 21:20:11 UTC
Thank you, Shree. It works fine now.