Discussion:
[tesseract-ocr] Server performance is 3x as slow versus local machine
David Tran
2018-10-17 19:01:24 UTC
Local machine: 3.50 GHz, 16 GB RAM, Windows 7 64-bit
Server: 2.30 GHz, 32 GB RAM, Windows Server 2012 64-bit

tesseract v4.0.0-beta.4.20180912 64 bit

Current Behavior: Processing a 64-page PDF (2,733 KB) on my local machine
takes 286 seconds, while on our server it takes a whopping 842 seconds.

Expected Behavior: That it wouldn't be 3x as slow on the server versus on
my local machine.

What could be the root cause of the degradation in performance?
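
For reference, the figures above can be put side by side with a few lines of Python. This is only a rough sketch: the function name slowdown_summary is made up for illustration, and the "clock ratio" line assumes runtime scales inversely with clock speed, which ignores core count, SIMD support, memory bandwidth and other load on the machine.

```python
# Figures taken from the post above.
PAGES = 64
LOCAL_SECONDS = 286
SERVER_SECONDS = 842
LOCAL_GHZ = 3.50
SERVER_GHZ = 2.30


def slowdown_summary(pages, local_s, server_s, local_ghz, server_ghz):
    """Return per-page times and how much of the slowdown clock speed could explain.

    Assumption (hypothetical model): runtime is inversely proportional to
    clock frequency. Real OCR throughput depends on much more than that.
    """
    observed = server_s / local_s        # how many times slower the server is
    clock_ratio = local_ghz / server_ghz  # slowdown expected from clock alone
    return {
        "local_s_per_page": local_s / pages,
        "server_s_per_page": server_s / pages,
        "observed_slowdown": observed,
        "clock_ratio": clock_ratio,
        "unexplained_factor": observed / clock_ratio,
    }


if __name__ == "__main__":
    summary = slowdown_summary(
        PAGES, LOCAL_SECONDS, SERVER_SECONDS, LOCAL_GHZ, SERVER_GHZ
    )
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under that simple model the clock difference accounts for roughly half of the gap, so something else on the server is likely contributing as well.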
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+***@googlegroups.com.
To post to this group, send email to tesseract-***@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/56c60ba4-b3f2-40cb-91ef-6b0a75f2d8be%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
shree
2018-10-17 23:48:57 UTC
Added to issue at
https://github.com/tesseract-ocr/tesseract/issues/1278#issuecomment-430827712
shree
2018-10-18 11:01:39 UTC
Reply by @stweil in the issue tracker. Please continue further discussion there.

It looks like the local machine is rather new hardware, while the server is older. So the difference could be AVX versus SSE versus no SIMD support at all. The user can run tesseract --version on both machines to see whether SSE and AVX are found.

The number of CPU cores and the memory bandwidth are also very important.
And of course it makes a difference if there are other processes running in parallel on the server.

The user uses the UB Mannheim installer for Windows. He should update to the latest version.
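
The check suggested above can be scripted so that the same report is produced on both machines. The following is a minimal sketch, not part of Tesseract itself. It assumes that the installed build announces SIMD support in its --version output with lines of the form "Found AVX" or "Found SSE" (builds that print nothing of the kind will simply yield an empty list), and the helper names simd_found and tesseract_version_text are made up for illustration.

```python
import os
import shutil
import subprocess


def simd_found(version_text):
    """Return the features listed on 'Found ...' lines of `tesseract --version`.

    Assumption: the build reports SIMD support as lines such as 'Found AVX'.
    """
    found = []
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("Found "):
            found.append(line[len("Found "):])
    return found


def tesseract_version_text(exe="tesseract"):
    """Run `tesseract --version`; return its output, or None if not on PATH."""
    if shutil.which(exe) is None:
        return None
    # Some builds write the version banner to stderr, so merge both streams.
    result = subprocess.run(
        [exe, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    print("Logical CPU cores:", os.cpu_count())
    text = tesseract_version_text()
    if text is None:
        print("tesseract executable not found on PATH")
    else:
        features = simd_found(text)
        print("SIMD features reported:", ", ".join(features) or "none")
```

Running it on the local machine and on the server gives two short reports that can be compared directly, together with the core count mentioned above.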
Zdenko Podobny
2018-10-18 11:20:02 UTC
Why? What is the Tesseract issue here? That Tesseract does not run at the
same speed on different hardware? That is expected. David started the
discussion in the right place: the forum.

Please use the Tesseract issue tracker only for issues that can be fixed on
the Tesseract side. We cannot fix problems on the user's side.

Zdenko
David Tran
2018-10-19 13:39:25 UTC
SSE and AVX were not found on either my local machine or the server.

tesseract v4.0.0-rc3.20181014
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0

Is there a way to add those, if it would help boost the speed of the engine?