Channel: FiveTech Software tech support forums

OCR for scanned documents

Hi Frank,

I'm currently using the Tesseract API from xHarbour to process thousands of scanned .tif documents. The results are about 80% accurate, and I need better than that. For Tesseract to be more accurate on the kind of documents I'm OCRing, I would need to change the page segmentation mode (PSM) to 3, which is the default when running tesseract from the command line. Setting PSM to 3 through the API, however, makes the OCR engine fail with a runtime error. It may work a few times, but after a number of runs it breaks and my Harbour program stops working.

Just FYI: each of these documents contains a unique identifier that matches a customer's account number in the database. That way the documents are indexed automatically and saved into the customer's file without human intervention. Thousands of documents are fed into a commercial scanner each day, and each one ends up stored in a blob field, with the customer's account number in another indexed char field. 80% accuracy means that the account number was not read on 20% of the documents, so someone has to open each of those documents by hand and attach it to the correct customer.

If you are interested in how to use the Tesseract API from (x)Harbour, I will gladly provide source samples for you to try. I'd love to solve the problem of not being able to switch to PSM 3 for better accuracy with my documents. Maybe you can help.

Reinaldo.
