December, 1996
Imaging-OCR
The process of turning data on paper into organized, usable electronic document information is nebulous at best. Describing it can be even worse. Technology vendors sometimes talk in terms so multisyllabic that you can't see the forest for the trees. Some might skip the nitty-gritty altogether. Not a good idea. Try to follow me on this. Look at your present (or future) imaging system as a forest. The scanners, servers, and workstations that inhabit your forest are the trees.
Still with me? Now think of your software and network architecture as the roots and branches that connect the trees making up your forest/imaging system. What keeps the whole thing alive? OCR/ICR technology is akin to the chlorophyll in the leaves.
Like the green stuff that feeds trees by turning sunlight into sugar, character recognition is a microscopic, yet essential, element in any intelligent information processing system. Blaine Owens of BancTec (Irving, TX 214-579-6000) points out in Harvey Spencer's new book "Automated Forms Processing" (Flatiron Publishing, 212-691-8215) that understanding the chlorophyll of imaging is requisite if you are to see a cost-effective forest.
"Within forms processing," Owens writes, "the major objective is to capture the data on a form as fast and as efficiently as possible. The fastest scanner, the best image enhancer and the most efficient data completion subsystem [Editor's note: trees and roots] won't justify the cost if the OCR/ICR does not perform. OCR/ICR technologies should be the deciding factor when justifying a system."
Which technology to use for your application? Hard choice. Character recognition comes in many varieties: OCR and ICR, "voting engines" and "neural networks," zone and full-text recognition. No single approach is best. Comparing them against each other is, if you'll excuse me again, like comparing apples and oranges.
Even definitions of OCR and ICR are confusing. Some people define OCR as reading just machine print and ICR as reading everything else, like handprint. From a forms processing perspective, Owens writes, OCR technology does its thing, turning images into characters, at the time the image is being captured.
It's usually tightly integrated with your scanner hardware and is most useful in high-volume applications. ICR usually reads data from a stored digitized image that was captured some time before.
The ICR server, sometimes configured as a standalone processor, takes the image from the image server, performs recognition on it and attaches the results back to the original image.
Buying OCR/ICR products can be confusing as well. "Engines" can be installed as components by a developer into an existing system. The same engine can be repackaged into a suite or "toolkit" of components. Sometimes the engine will be combined with additional recognition features and sold as an integrated piece of software. Other times, the character recognition aspects are just one piece of a complete "turnkey" system that integrators might offer.
Let's look at some different OCR/ICR technologies, how they are sold, and the applications for which they work best -- in other words, the path best travelled to good character recognition in your imaging system.
Recognize Your Right To Vote
Maxsoft-Ocron (Fremont, CA 510-252-0200) uses OCR voting technology they call MORE (Multiple Optical Recognition Engine) in a number of different products, including their recognition toolkit called Recore 5.0 ($4,000).
MORE uses four different OCR "engines" to translate paper-based images into a digital format. Each of the four separate OCR engines is based on a different OCR technology, so if a character or shape is not recognized by one engine, another may recognize it.
One of the criticisms connected with the voting approach to OCR is that it is slower than other approaches. After all, processing the data four times will take longer than once. Maxsoft's Multiple Engine Manager (MEM) helps mitigate this. It decides how many engines should be turned on, based on the quality of the document.
For a high-quality document with clear characters, the software will only use one or two engines. During the recognition process, if the software encounters a problem, the MEM will turn on another engine for extra "help."
MORE also gives the user control over the process, allowing him or her to decide how many engines to use during recognition. If you want to triple-check all of the characters, you can manually turn on all the engines -- even if the software is unwilling to use them all.
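The escalation idea behind the MEM can be sketched in a few lines of Python. This is a minimal illustration, not Maxsoft-Ocron's code: the document quality score, the confidence threshold and the engine interface (a function returning a character and a confidence between 0 and 1) are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical engine interface: image in, (character, confidence 0..1) out.
Engine = Callable[[object], Tuple[str, float]]


def recognize_with_escalation(image: object, engines: List[Engine], quality: float,
                              threshold: float = 0.9,
                              force_all: bool = False) -> List[Tuple[str, float]]:
    """Run as few engines as possible, turning on more only when needed.

    quality is an assumed 0..1 estimate of how clean the document is.
    force_all mimics the user manually switching every engine on.
    """
    if force_all:
        start = len(engines)
    elif quality >= 0.8:   # clean page: begin with one engine
        start = 1
    else:                  # poorer page: begin with two
        start = 2
    start = min(start, len(engines))
    results = [engine(image) for engine in engines[:start]]

    # Escalate: add the next engine while no answer is confident enough.
    for engine in engines[start:]:
        if results and max(conf for _, conf in results) >= threshold:
            break
        results.append(engine(image))
    return results
```

Starting small and escalating only on low confidence is what keeps a voting system from paying for four full recognition passes on every character.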
The Maxsoft-Ocron Voting Manager coordinates information from the different engines and uses it to make final decisions. The Voting Manager knows the relative strengths and weaknesses of each engine and how to combine their confidence measures and data.
If two engines come up with one answer and the third shows something different, MORE sides with the majority. If there is no clear majority, an Internal Rule Based System (which favors the engine most qualified to translate the particular data) determines which version is used.
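A simplified version of that decision logic: a plain vote per character, with ties broken by whichever engine is ranked most qualified. A real voting manager also weighs each engine's confidence measures, which this sketch leaves out, and the expertise ranking is an assumed input, not part of any published Maxsoft-Ocron interface.

```python
from collections import Counter
from typing import Dict, List, Tuple


def vote(answers: List[Tuple[str, str]], expertise: Dict[str, int]) -> str:
    """Combine per-engine answers for one character.

    answers:   (engine_name, character) pairs.
    expertise: hypothetical ranking of how qualified each engine is for
               this kind of data (higher means more qualified).
    """
    counts = Counter(char for _, char in answers)
    top = max(counts.values())
    leaders = [char for char, n in counts.items() if n == top]
    if len(leaders) == 1:
        return leaders[0]      # one answer has the most votes: it wins

    # No clear winner: defer to the most qualified engine backing a leader.
    _, char = max(((engine, ch) for engine, ch in answers if ch in leaders),
                  key=lambda pair: expertise.get(pair[0], 0))
    return char
```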
The company also creates its own consumer products to compete with mid-level OCR products such as Caere's (San Jose, CA 408-395-7000) OmniPage and Xerox's (Waltham, MA 617-672-7500) Textbridge Pro. In addition, Maxsoft-Ocron works on turnkey solutions, providing hardware and software packages, such as business card readers and complete scanning solutions, to customers.
Prime Recognition (San Carlos, CA 415-637-8382) offers their PrimeOCR 2.1 ($3,000) for high-volume OCR. They provide this product both as a Windows 3.1 and NT-based OCR engine and as an Application Programming Interface (API) toolkit.
Prime combines their proprietary voting algorithms and recognition technology with software licensed from such retail OCR vendors as DataCap (Tarrytown, NY 914-332-7515). DataCap claims their product reduces OCR errors and thus the cost of manual error correction by an average of 65% over conventional OCR products. This lets you save big -- fixing OCR errors by hand often accounts for over 50% of the total annual imaging system cost. PrimeOCR also offers upgrade options that provide 75% and 80% higher recognition accuracy.
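To see why error rates dominate the economics, combine the two figures quoted above. If manual correction is about half of annual system cost and a product cuts it by 65%, total cost falls by roughly a third. The sketch below just does that multiplication; the $200,000 budget is a made-up example, and the 50% and 65% inputs are the article's rough and vendor-reported numbers, not measurements.

```python
def annual_savings(total_cost: float, correction_share: float,
                   correction_cut: float) -> float:
    """Dollars saved per year when manual error correction, which makes up
    correction_share of total_cost, is reduced by correction_cut."""
    return total_cost * correction_share * correction_cut


# Hypothetical $200,000/year system; correction is ~50% of cost, cut by 65%.
saved = annual_savings(200_000, 0.50, 0.65)
print(f"${saved:,.0f} saved ({saved / 200_000:.1%} of total cost)")
```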
The 2.1 version adds several functions, including auto-zoning, which automatically selects areas of text on the page to OCR, batch job management, character training, and a Visual Basic applications programming interface.
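PrimeOCR's auto-zoning method isn't described here, but the basic idea of finding text areas automatically can be illustrated with one of the simplest techniques, a horizontal projection profile: rows that contain ink belong to a text band, and runs of blank rows separate bands. This is a hedged sketch on a tiny binary image (1 = black pixel); real auto-zoners also split columns and cope with skew and noise.

```python
from typing import List, Tuple


def find_text_bands(page: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each horizontal band of text.

    page is a binary image (1 = ink). A band ends once min_gap consecutive
    blank rows are seen; shorter gaps (e.g. between lines of a paragraph)
    are bridged so the lines stay in one zone.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    end = -1
    blank_run = 0
    for y, row in enumerate(page):
        if any(row):
            if start is None:
                start = y
            end = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                bands.append((start, end))
                start = None
    if start is not None:
        bands.append((start, end))
    return bands
```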
OCR/ICR And More
If talk of engines and toolkits sounds too nitty-gritty, other vendors can provide you more integrated applications. These products usually come with added functions, like indexing and image cleanup, that spread into the greater imaging forest. Some competing products will even have the same OCR engine but handle other aspects of the imaging process differently.
These applications are also sometimes scaled to the size of your operation. OCR for Forms 2.1 ($9,500) from MTI Enterprises (Tampa, FL 813-222-0414) adds over 30 new features to the company's OCR software.
Version 2.1 lets users of OCR for Forms gain precise control over many aspects of the OCR and imaging process. Among the functions it automates are data classification, data extraction, system configuration, forms management, quality control, data validation and systems administration.
One new feature, double key entry, addresses the need to provide higher-than-usual quality control. MTI says that OCR for Forms consistently attains accuracy standards that match the performance a human could achieve.
But there are certain procedures, such as transaction processing in banking and finance, that require 100% accuracy.
With this new feature, key operators enter data independently of the automated process. The product automatically compares output -- including check box data -- from OCR for Forms against the manual data entry. The same or another operator reviews exceptions for correction.
According to MTI, this system of quality control meets the highest standards required by production operations.
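The comparison step of double key entry amounts to a field-by-field diff between the OCR result and the independently keyed record. The sketch below assumes both arrive as simple field-name-to-text mappings, with check boxes represented as text such as "X" or an empty string; that data layout is an illustration, not MTI's format.

```python
from typing import Dict, List, Tuple


def compare_double_entry(ocr: Dict[str, str],
                         keyed: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Compare OCR output with independent manual keying, field by field.

    Returns (field, ocr_value, keyed_value) for every disagreement; these
    exceptions are what an operator would review and correct. A field
    missing on one side is treated as empty. Surrounding whitespace is
    ignored.
    """
    exceptions = []
    for field in sorted(set(ocr) | set(keyed)):
        ocr_value = ocr.get(field, "")
        keyed_value = keyed.get(field, "")
        if ocr_value.strip() != keyed_value.strip():
            exceptions.append((field, ocr_value, keyed_value))
    return exceptions
```

Only disagreements reach a person, so the operator's time goes to the handful of fields where machine and keyer differ.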
Other features include: address extraction, zip code verification, the ability to deal with attachments to forms, new methods for launching tasks from zones, password security options and the ability to change recognition results on a font-specific basis.
The Hungarian company Recognita (Budapest, 36-1-201-7973) offers a family of recognition products.
Recognita Plus ($700), a multipurpose font-independent OCR package for PCs, recognizes more than 80 different languages. Recognita Plus is for the SOHO user. It supports most commonly used desktop scanners. It runs under DOS, Windows 3.1, NT and OS/2. For integrators, a Development Toolkit (DTK) ($930) provides access to all functions required by target applications.
Recognita Select ($700) meets moderate, less complex OCR needs. It also recognizes 80 languages. Its main purpose is to process daily business correspondence. Information flows into an internal database, which lets the user scroll, search or sort records, compile them into lists, filter them, print them or export them to ASCII.
Recognita Form ($2,300) processes data extracted from forms in a Windows environment. Forms can contain numbers, checkboxes and even barcodes. Form fields can be identified by their content, and data can be exported to ASCII or XBase format.
Finally, Recognita GO-CR ($95) is an OCR software program for hand scanners that runs on Windows 3.1. According to the company, GO-CR recognizes mono-spaced and proportional text with a high accuracy rate. It comes with a built-in text editor.
Give me an I(CR)
Everybody who knows OCR knows Nestor. They also know that NCS (Providence, RI 401-331-9640) took Nestor's technology as their own (well, bought it!). And if they don't already know it, they should make the acquaintance of NestorReader 3.0 ($3,500). This product is a set of ICR development tools that let systems integrators, developers, and OEMs like Diamond Head Software (Richardson, TX 972-479-9205) and Imagination Software (Silver Spring, MD 301-588-8411) build forms processing solutions.
Nestor technology uses multiple neural-network architectures that go beyond voting algorithms.
Characters filter through the recognition process depending on their complexity and difficulty. This process recognizes neatly printed, isolated characters differently from sloppy, run-on or distorted characters.
Features like pre-processing, segmentation, recognition and context help your system process handwritten text much the way that humans recognize words. Image pre-processing removes noise and lines and enhances text. Since this stuff is integrated with the recognition functions, the system uses its intelligence to distinguish character accents and punctuation marks from noise.
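Two classic clean-up steps can be sketched on a binary image (1 = black pixel): erasing long horizontal runs, which on a form are usually printed rules, and erasing isolated specks. These are generic textbook filters with assumed thresholds, not Nestor's pre-processing.

```python
from typing import List

Image = List[List[int]]  # binary image, 1 = black pixel


def remove_horizontal_lines(img: Image, min_run: int) -> Image:
    """Erase horizontal black runs at least min_run pixels long (form rules)."""
    out = [row[:] for row in img]
    for row in out:
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_run:       # long enough to be a rule
                    for i in range(start, x):
                        row[i] = 0
            else:
                x += 1
    return out


def remove_isolated_pixels(img: Image) -> Image:
    """Erase black pixels that have no black 8-connected neighbour."""
    h, w = len(img), len(img[0]) if img else 0
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            has_neighbour = any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if img[y][x] and not has_neighbour:
                out[y][x] = 0
    return out
```

A filter this naive would also erase a one-pixel period or the dot on an "i", which is exactly why integrating pre-processing with recognition matters for telling punctuation and accents from noise.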
Word- and character-segmentation parameters separate words and characters. There are four types of word-level segmentation: Multiple Lines (separates lines vertically, as in an address block), Single Line (groups words and spaces on output), Single Word (compresses spaces) and Single Character (shuts off word segmentation).
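How NestorReader implements these settings isn't described, so the sketch below only mimics what each mode does to a zone's output, starting from already-recognized lines of text. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import List


class WordSegmentation(Enum):
    MULTIPLE_LINES = "multiple_lines"      # keep lines separate (address block)
    SINGLE_LINE = "single_line"            # one line, words separated by spaces
    SINGLE_WORD = "single_word"            # spaces compressed out
    SINGLE_CHARACTER = "single_character"  # no word grouping: one char per item


def format_zone(lines: List[str], mode: WordSegmentation) -> List[str]:
    """Regroup a zone's recognized text according to the segmentation mode."""
    words_per_line = [line.split() for line in lines]
    if mode is WordSegmentation.MULTIPLE_LINES:
        return [" ".join(words) for words in words_per_line if words]
    all_words = [w for words in words_per_line for w in words]
    if mode is WordSegmentation.SINGLE_LINE:
        return [" ".join(all_words)]
    if mode is WordSegmentation.SINGLE_WORD:
        return ["".join(all_words)]
    return [ch for w in all_words for ch in w]   # SINGLE_CHARACTER
```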
Recognition uses neural networks to identify hand-print and machine-print. The original Nestor algorithms are more than 20 years old. The heart of the system is a collection of trained data sets called "Recognition Memories." These memories may be called from each individual data zone.
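The value of calling a memory per zone is that the same shape can be read differently depending on what the field should contain. In the sketch below a tiny nearest-prototype classifier stands in for a trained Recognition Memory; it is a deliberately simple substitute for Nestor's neural networks, and the two-number "features" and zone names are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]


class RecognitionMemory:
    """Hypothetical stand-in for a trained data set: labelled prototype
    feature vectors, classified by nearest neighbour."""

    def __init__(self, prototypes: List[Tuple[Features, str]]):
        self.prototypes = prototypes

    def classify(self, features: Features) -> str:
        _, label = min(self.prototypes,
                       key=lambda p: math.dist(p[0], features))
        return label


# Each memory is trained for one kind of data.
memories: Dict[str, RecognitionMemory] = {
    "digits": RecognitionMemory([((0.0, 1.0), "1"), ((1.0, 1.0), "7")]),
    "letters": RecognitionMemory([((0.0, 1.0), "l"), ((1.0, 0.0), "L")]),
}
# Each data zone names the memory it should call.
zone_memory = {"zip_code": "digits", "surname": "letters"}


def recognize(zone: str, features: Features) -> str:
    return memories[zone_memory[zone]].classify(features)
```

The same vertical stroke comes back as "1" in the ZIP code zone and "l" in the surname zone.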
NestorReader 3.0 contains two levels of context -- standard English context (this can be modified for international languages) and user-defined dictionaries. The Context setting uses statistical frequencies of character strings to assist the recognition process.
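Statistical context of this kind can be illustrated with character-pair (bigram) frequencies: candidate readings made of common letter pairs score higher than ones made of rare pairs. This is a hedged sketch; the tiny training word list, the add-one smoothing and the weighting against the recognizer's own score are all assumptions, not NestorReader's model.

```python
import math
from collections import Counter
from typing import Dict, List


def train_bigrams(corpus: List[str]) -> Counter:
    """Count adjacent character pairs; ^ and $ mark word start and end."""
    counts: Counter = Counter()
    for word in corpus:
        padded = f"^{word.lower()}$"
        counts.update(padded[i:i + 2] for i in range(len(padded) - 1))
    return counts


def context_score(candidate: str, bigrams: Counter) -> float:
    """Sum of log bigram frequencies. Adding one to each count keeps an
    unseen pair from giving log(0); it is penalised instead."""
    padded = f"^{candidate.lower()}$"
    total = sum(bigrams.values())
    return sum(math.log((bigrams[padded[i:i + 2]] + 1) / (total + 1))
               for i in range(len(padded) - 1))


def pick_with_context(candidates: Dict[str, float], bigrams: Counter,
                      weight: float = 1.0) -> str:
    """candidates maps each reading to the recognizer's own log-score."""
    return max(candidates,
               key=lambda c: candidates[c] + weight * context_score(c, bigrams))
```

Here a recognizer that slightly prefers "tbe" is overruled because "th" and "he" are far more frequent pairs than "tb" and "be".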
User-defined dictionaries work like standard context, except that customized word lists are used. This helps in forms processing apps, where each field can have unique characteristics (state abbreviations, names, etc.).
Dictionaries can be set for recognition assist (character strings assist recognition) or forced match (optimal word or number in the dictionary is returned).
Dictionaries can be very powerful recognition tools. Not only do they use primary recognition results, they also offer you backup word alternatives so you can achieve an optimal word value.
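The two dictionary modes can be contrasted in a short sketch. Recognition assist prefers the best-ranked alternative that appears in the dictionary but will still return a non-dictionary reading; forced match always returns a dictionary entry, chosen here by Levenshtein edit distance to the recognizer's alternatives. Edit distance is an assumed yardstick for illustration, since the article doesn't say how "optimal" is measured.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def recognition_assist(candidates: List[str], dictionary: List[str]) -> str:
    """Prefer the highest-ranked alternative found in the dictionary,
    otherwise keep the recognizer's first choice."""
    words = set(dictionary)
    for c in candidates:
        if c in words:
            return c
    return candidates[0]


def forced_match(candidates: List[str], dictionary: List[str]) -> str:
    """Always return a dictionary entry: the one closest to any alternative."""
    return min(dictionary,
               key=lambda word: min(edit_distance(word, c) for c in candidates))
```

For a state-abbreviation field, assist turns the alternatives "NV", "NY" into "NY" only if "NV" is missing from the list, while forced match maps even a garbled "T%" to "TX".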