[tesseract-ocr] Tesseract with phone images of receipts
Wayne Rumble
2016-06-23 12:56:31 UTC
I am writing a program for my final project, and part of it extracts the
quantity, item name, and price from a restaurant receipt using Tesseract. I
am using Ionic with Angular on the phone and a Rails API on the server: the
phone sends the image to the Rails API, which converts the image, runs
Tesseract on it, and passes the extracted information back to be displayed
by the Ionic/Angular app.

When I tested with a restaurant receipt image found online, cropped to
contain just the items and total, it worked fine.

Receipt image I was using:
[image]

But when I print out the same receipt image, take a photo of the printout
with my phone, then crop it and pass it to the following methods, the
results are basically useless.

Here is the image processing code:


module Converter

  def tesseract
    system("convert #{Bill.last.image.url} -scale 50% receipt.jpg")
    system("convert receipt.jpg -type Grayscale receipt.jpg")
    system("tesseract receipt.jpg output")
    find_total
    create_items
    system("rm output.txt")
    system("rm receipt.jpg")
  end

  private

  def find_total
    a = File.readlines('./output.txt').grep(/TOTAL/)
    b = a.map { |x| x[/\d+(?:[.,]\d+)?/].to_f }[0]
    Bill.last.update(total: "#{b}")
  end

  def create_items
    File.open './output.txt', 'r' do |file|
      file.each_line do |line|
        if search_for_words(line).length != 0
          Item.create(
            name: search_for_words(line),
            price: search_for_float(line),
            quantity: search_for_integer(line),
            bill_id: Bill.last.id
          )
        end
      end
    end
  end

  def search_for_float(line)
    line.gsub!(',', '.')
    line.scan(/(\d+[,.]\d+)/).flatten[0].to_f
  end

  def search_for_integer(line)
    line.gsub!(',', '.')
    line.scan(/(\d+)/).flatten[0].to_i
  end

  def search_for_words(line)
    line.split(" ").select { |word| word.match(/([a-z])/) }.join(" ")
  end
end
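
To make the parsing step easier to check on its own, here is a minimal standalone sketch of what the three search_for_* helpers do to a single line. It assumes OCR output lines shaped like "2 Burger 12,50"; the method name parse_receipt_line and the sample lines are made up for illustration and are not part of my app. Unlike the helpers above, it does not mutate the line, and it only takes the quantity from a leading integer.

```ruby
# Hypothetical helper (not in the app): parse one OCR'd receipt line.
# Returns nil when the line has no word containing a lowercase letter,
# mirroring the search_for_words check in create_items.
def parse_receipt_line(line)
  normalized = line.tr(',', '.') # treat "12,50" as "12.50" without mutating line

  # Keep only words containing a lowercase letter (same rule as search_for_words).
  name = normalized.split.select { |word| word.match?(/[a-z]/) }.join(' ')
  return nil if name.empty?

  price = normalized[/\d+\.\d+/]              # first decimal number on the line
  quantity = normalized[/\A\s*(\d+)\s/, 1]    # leading integer followed by a space

  { name: name, price: price&.to_f, quantity: quantity&.to_i }
end
```

Calling parse_receipt_line("2 Burger 12,50") gives a hash with the name, price, and quantity, while an all-caps line such as "TOTAL 25.00" is skipped (nil), just as the lowercase-only match skips it in create_items.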

I had version and compatibility troubles when using the tesseract gem, so I resorted to calling Tesseract via the command line instead. Any insights on whether I should be resizing the image, or preprocessing it in some other way, would be great.
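
For anyone wanting to reproduce the command-line steps outside Rails, the following is a rough sketch of the same two calls, assuming ImageMagick's convert and the tesseract binary are on the PATH. The method names convert_command, tesseract_command, and run_ocr are hypothetical. It passes arguments to system as a list rather than one interpolated string, and merges the scale and grayscale steps into a single convert call.

```ruby
# Hypothetical helpers (not in the app): build argument lists for the CLI tools.

# Scale to 50% and convert to grayscale in one ImageMagick call.
def convert_command(source, dest)
  ['convert', source, '-scale', '50%', '-type', 'Grayscale', dest]
end

# tesseract appends ".txt" to the output base name it is given.
def tesseract_command(image, output_base)
  ['tesseract', image, output_base]
end

# Runs both steps and returns the OCR text, or nil if either command fails.
# The temporary files are removed whether or not the commands succeed.
def run_ocr(source)
  system(*convert_command(source, 'receipt.jpg')) or return nil
  system(*tesseract_command('receipt.jpg', 'output')) or return nil
  File.read('output.txt')
ensure
  File.delete('receipt.jpg') if File.exist?('receipt.jpg')
  File.delete('output.txt') if File.exist?('output.txt')
end
```

With the list form, system does not go through a shell, so a path or URL containing spaces or shell characters is passed to convert as a single argument.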

Thanks in advance