[tesseract-ocr] Image file not found

Discussion:

2017-07-02 08:12:17 UTC

Through Homebrew, I have installed the Tesseract OCR engine on my Mac.

All the directories (*jpeg, leptonica, libpng, libtiff, openssl, tesseract*)
are now installed in */usr/local/Cellar*

Before putting an image in the *Cellar* directory, I tried the following
at the command line, which obviously fails:

$ tesseract image.png outcome

So, because there is no such image, I get the following messages:

Error in fopenReadStream: file not found

Error in findFileFormat: image file not found

Error during processing.

Where are the programs/scripts that generate these messages? I can only
find *include* files in the installed Tesseract directory...

Which files contain the text of these error messages (for a missing image,
and so on)?

Where are the scripts/programs that perform *image pre-processing* (such as
segmentation, binarization, etc...) before Tesseract actually does the OCR
on the image?

Thanks,

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+***@googlegroups.com.
To post to this group, send email to tesseract-***@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/40c1541c-3ec1-4062-b809-f7305ce0439f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ShreeDevi Kumar

2017-07-02 09:34:03 UTC

These errors are from Leptonica.
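
Both function names in those messages, fopenReadStream and findFileFormat, belong to Leptonica, the image library Tesseract relies on to open and decode files. The image also does not have to sit in the Cellar directory: the path is resolved relative to the directory the command is run from. A minimal sketch, assuming Python 3 and a tesseract executable on the PATH, that checks the path up front so the failure is reported before Leptonica is involved (build_tesseract_command is a hypothetical helper name, not part of any library):

```python
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argument list for `tesseract IMAGE OUTPUTBASE`.

    Raises FileNotFoundError early, instead of letting Leptonica report
    "Error in fopenReadStream: file not found".
    """
    image = Path(image_path)
    if not image.is_file():
        # Show the absolute path so a wrong working directory is easy to spot.
        raise FileNotFoundError(f"No such image: {image.resolve()}")
    return ["tesseract", str(image), str(output_base)]
```

Passing the returned list to subprocess.run(..., check=True) would then run the same command as the one in the original post.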

The image processing within Tesseract is limited.

It is preferable to preprocess the image before calling Tesseract.
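
As an illustration of one such preprocessing step, here is a minimal sketch of global Otsu binarization in plain Python. It assumes the image is already loaded as rows of 8-bit grayscale values (0 to 255). The function names are hypothetical, and this is not Tesseract's or Leptonica's actual code; in practice an image library or tool would likely be used instead of nested lists.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]        # pixels with value <= level
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # pixels with value > level
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(rows):
    """Map each grayscale pixel to 0 (dark) or 255 (light)."""
    flat = [value for row in rows for value in row]
    threshold = otsu_threshold(flat)
    return [[255 if value > threshold else 0 for value in row] for row in rows]
```

The binarized rows would then be written back to an image file and passed to Tesseract in place of the original.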

ShreeDevi

2017-07-02 09:41:20 UTC

Thanks for your reply.

Do you know where exactly in Leptonica?
I would like to take a look at its source code...

I understand that image processing in Tesseract is limited.
So, I would first like to see exactly what is done internally (by
default) before Tesseract actually does the OCR.
Do you know which script or program contains the limited image-processing
steps within Tesseract?

Thanks,

ShreeDevi Kumar

2017-07-02 09:51:24 UTC

You can browse the source code via Doxygen. For page segmentation, start at
https://ub-mannheim.github.io/tesseract/a00113_source.html
and follow the links.

ShreeDevi

2017-07-04 12:06:54 UTC

Hi Shree,

I can see all the programs in the source code. Quite a few...

How do I go about seeing an image transformation at every step of
image-processing done by Tesseract internally?

I have packaged the standard tess-two library as a JAR in an app, which means
*it basically acts like a black box*. It receives the captured frame, carries
out several image-processing steps (for example, segmentation), and then
outputs a string, so all I can do is compare what goes in with what is
recognized.

So, how can I find out what the initial image goes through (segmentation,
etc.)? For example, if someone said, "Show me the *final image* (after
pre-processing is finished) that the OCR works on", how would I obtain that?
Are there command-line (Terminal) commands or functions that show what the
image looks like, or something I can call to save the pre-processed images
to a file so that I can inspect them?

Thanks,

ShreeDevi Kumar

2017-07-04 12:23:33 UTC

See:

https://groups.google.com/forum/#!topic/tesseract-ocr/l918_ouIH98

https://groups.google.com/forum/#!topic/tesseract-ocr/hOvr20u71dY

https://groups.google.com/forum/#!topic/tesseract-ocr/nr095u8w7iU
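
Independently of the linked threads, one option for inspecting the image that the OCR works on is the tessedit_write_images configuration variable, which asks Tesseract to save its thresholded input (as tessinput.tif in the current directory on the versions where this has been observed; the filename should be verified against your build). For API users, the C++ class TessBaseAPI also offers GetThresholdedImage(); whether a given wrapper exposes it would need checking. A minimal sketch, assuming Python 3, a tesseract executable on the PATH, and a build that recognizes this variable (the helper names are hypothetical):

```python
import subprocess


def debug_image_command(image_path, output_base):
    """Build a tesseract command that also dumps the thresholded input image."""
    return [
        "tesseract", str(image_path), str(output_base),
        # Assumption: this Tesseract build recognizes tessedit_write_images.
        "-c", "tessedit_write_images=true",
    ]


def run_with_debug_image(image_path, output_base):
    """Run Tesseract; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(debug_image_command(image_path, output_base), check=True)
```

The equivalent Terminal command would be: tesseract image.png outcome -c tessedit_write_images=true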
