Intelligent Enterprise featuring Transform

December 1998

The State of Intelligent Character Recognition

Arthur Gingrande

ICR has evolved from the cutting edge to commonplace in many forms applications. With validation and context analysis, hand print is old hat. Cursive is the next big challenge.

Once thought of as "bleeding edge," intelligent character recognition (ICR) of hand print is now taken for granted in many forms processing applications. Whether it's a catalog company accepting order entries or the IRS processing your tax forms, many users now expect ICR software to perform as well as or better than human data-entry operators.

Virtually every ICR classifier or "engine" in today's marketplace involves some kind of neural network to refine its recognition accuracy. Neural network engines emulate the way humans think. They can be trained to accurately recognize instances of highly variable patterns like hand print by exposing them to a multitude of examples of hand print characters.

ICR engines not only provide their best guess as to what a particular character is, they are programmed to reveal their second and third (or even fourth) choices. They also assign a numerical index or "confidence value" to indicate the level of faith in each choice. The more ambiguous a character appears, the lower the confidence value will be for that character selection.

ICR engines can be programmed to accept or reject character selections based upon a user-definable confidence threshold value.
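As a rough illustration of how ranked choices and a confidence threshold might work together, here is a minimal Python sketch. It is not taken from any vendor's toolkit: the `Candidate` class, the 0.0-to-1.0 confidence scale, the 0.80 default threshold and the "?" reject mark are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    char: str          # one of the engine's guesses for a character
    confidence: float  # assumed scale: 0.0 (no faith) to 1.0 (certain)


def accept_or_reject(candidates, threshold=0.80):
    """Return the top-ranked character if it meets the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best.char if best.confidence >= threshold else None


def read_field(per_char_candidates, threshold=0.80, reject_mark="?"):
    """Build a field string, marking rejected characters for operator review."""
    out = []
    for candidates in per_char_candidates:
        accepted = accept_or_reject(candidates, threshold)
        out.append(accepted if accepted is not None else reject_mark)
    return "".join(out)


# Hypothetical engine output for a two-character field.
field = [
    [Candidate("7", 0.95), Candidate("1", 0.40)],
    [Candidate("5", 0.55), Candidate("S", 0.52)],  # ambiguous: low confidence
]
print(read_field(field))                 # second character is rejected
print(read_field(field, threshold=0.5))  # a looser threshold accepts it
```

Raising the threshold sends more characters to a human operator; lowering it accepts more characters automatically at the risk of more errors.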

9 Leading ICR Vendors

There are close to one hundred different companies of various stripes -- vendors, VARs, toolkit suppliers and systems integrators -- that make or assemble from components some kind of ICR-based, automated forms processing solution. There are fewer than a dozen companies, however, that still go to the trouble to manufacture their own hand print character recognition engines.

Virtually every ICR engine is accurate enough to be used in a generic hand print recognition application. At the same time, some engines are better-suited than others to certain vertical applications. Here is a summary of products from nine leading suppliers.

Caere. Even though it pioneered some of the technology used in ICR, Caere (Los Gatos, CA 408-395-7000 www.caere.com) never got around to offering an ICR product -- until now. Caere's acquisition of Recognita last year gave it hand print ICR technology, which it is now releasing as part of Developers Kit 2000, an advanced toolkit released last month. The kit offers open architecture and a single source for multiple recognition technologies, including OCR, ICR, bar code, MICR and OMR.

Developers Kit 2000 actually features two hand print ICR engines. The Recognita engine supports numbers only and is designed for applications where a restricted character set is desired. The second hand print module is from reRecognition and is designed for alphanumeric hand print recognition. The latter can handle 10 languages and run at speeds of 150 CPS. The product, which is priced at $5,495 including the ICR engines, is intended to simplify the integration of multiple recognition modules.

Ceresoft. Ceresoft (Silver Spring, MD 301-445-8413 www.ceresoft.com) tackles unconstrained hand print and natural cursive handwriting with FreeStyle. The engine integrates a low-level neural network with a high-level linguistic processor to achieve high accuracy on handwritten text. Words can appear as free text or in data fields on forms.

FreeStyle uses finely tuned dictionaries, lexical and geometrical context, and word-element analysis to read cursive handwriting. FreeStyle was introduced at last May's AIIM show with a knockout demo that was capable of reading ordinary cursive handwriting on the spot. The engine can also recognize machine print.

Ceresoft has its own modular forms processing solution called FormAgent Suite, and the company also offers a toolkit version called FormAgent Freestyle Developers toolkit, with prices starting at $8,750.

Gentriqs USA. This company is the new US distribution arm of Gentriqs of Frankfurt, Germany. Their engine, Cleqs, was formerly distributed in the US by Gator Bait Software under the name Interpregator. Gentriqs USA (Tampa, FL 727-736-7728 www.cleqs.de) was formed in August after acquiring the programming assets of Gator Bait out of bankruptcy.

Cleqs, which means ink spot in German, was originally introduced as an ICR toolkit to compete against Nestor Reader and AEG solely on the basis of accuracy. The engine is said to offer improvements and innovations in the use of context, syntax and lexical analysis. The German parent company has OEM relationships with international forms processing vendors ReadSoft, Top Image Systems and Easy Archiv.

Cleqs has a reputation for accuracy on numerical fields, and it also handles segmentation of touching alpha and numerical characters. Cleqs' DLL prices are based on performance. The toolkit starts at $6,200 and prices go up to $50,000 for a 300-character-per-second license with hand print, machine print, OMR, barcode, MICR and OCR. The company has more than 10 language classifiers including the major European languages.

Mitek. Mitek's Quickstrokes engine offers outstanding accuracy in reading numerical characters, particularly the difficult hand printed courtesy amount field on checks (as opposed to the legal amount field that consists of handwritten words). Quickstrokes handles background and form removal and is capable of recognizing machine print on forms as well as unconstrained hand print.

Quickstrokes is heavily used in banking and financial service applications. About a year ago the company partnered with Parascript on a product called CheckScript, which uses a voting algorithm to obtain high accuracy on the cursively written legal amount on a check.
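The vendors do not publish how their voting works, so the following Python sketch shows only the general idea: several engines read the same field, and a value is accepted when enough of them agree. The `vote` function, the `min_agreement` parameter and the sample readings are hypothetical.

```python
from collections import Counter


def vote(readings, min_agreement=2):
    """Pick the value most engines agree on.

    readings: one string per engine, or None where an engine rejected the field.
    Returns the winning value, or None on a tie or too little agreement.
    """
    counts = Counter(r for r in readings if r is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    value, n = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == n:
        return None  # tie between the two leading values: send to an operator
    return value if n >= min_agreement else None


# Three hypothetical engines read the same amount field.
print(vote(["125.00", "125.00", "175.00"]))  # two of three agree
print(vote(["125.00", "175.00", None]))      # no agreement
```

A production system would more likely weight each engine's vote by its confidence value rather than count votes equally.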

Mitek (San Diego, CA 619-635-5900 www.miteksys.com) formerly sold the Quickstrokes engine only as a component, but it has added its own forms processing solution called Premier Forms Processor, which is priced at $33,780 for a complete system. The Quickstrokes toolkit starts at $2,800. Courtesy amount recognition brings the toolkit pricing up to $17,000.

NCS. Originally developed by the neural network development firm Nestor, NestorReader in 1991 became the first low-cost ICR engine that could run under Windows. As a result, NestorReader quickly developed the largest installed base of any ICR toolkit in the business. Nestor pioneered features like multiple-zone definition, context and syntax analysis, multiple and customized memories, automatic fax recognition and routing, and built-in deskew and line removal.

Now owned by National Computer Systems' NCS Recognition Products Group (Lincoln, RI 401-334-4811), NestorReader remains an outstanding hand print character recognition engine. NCS has refined the basic engine's image pre-processing, word and character segmentation, use of context and dictionaries and forms processing capabilities to improve accuracy. Toolkit pricing starts at $3,500.

Orbograph. Orbograph focuses on check recognition with its OrboCAR engine. OrboCAR reads unconstrained, handwritten numeric courtesy amounts from checks, remittance stubs and bank control documents.

In July, Orbograph (Billerica, MA 978-901-5042 www.orbograph.com) introduced OrboCAR Gemini, which combines OrboCAR with ICR technology licensed from Lucent Technologies. The combined product employs voting to help handle touching and overlapping characters, patterned backgrounds, inconsistent character densities and a wide range of cent formats. Pricing starts at $20,000.

Parascript. Parascript (Boulder, CO 303-381-3116 www.parascript.com) is a spin-off from ParaGraph International, the company that originally developed the recognition technology behind Apple's Newton. Parascript concentrates on recognizing natural handwriting, that sloppy, wobbly cursive that gets progressively less legible as we get older.

Parascript markets two recognition engine toolkits: FieldScript, for recognizing handwriting data fields on forms, and CheckScript, for recognizing check amounts. Unlike conventional ICR engines, Parascript technology is not neural network based, and it recognizes words, not individual characters. This is done with the aid of complex data validation routines, which compare ICR results with words through the use of special word and element dictionaries. The result is an ICR engine that can handle natural, cursive handwriting as well as unconstrained hand print.
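Parascript's dictionaries and validation routines are proprietary, but the basic notion of checking a raw recognition result against a word list can be sketched with Python's standard `difflib` module. The `correct_word` helper, the sample lexicon and the 0.8 similarity cutoff are assumptions chosen for illustration, and this is ordinary string similarity rather than Parascript's method.

```python
import difflib


def correct_word(raw, lexicon, cutoff=0.8):
    """Return the closest lexicon word to a raw recognition result, or None.

    cutoff is the minimum similarity ratio (0.0 to 1.0) required for a match.
    """
    matches = difflib.get_close_matches(raw.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical lexicon for a "city" field on a form.
cities = ["boston", "austin", "dallas"]
print(correct_word("Bostcn", cities))  # one misread letter is repaired
print(correct_word("xyzzy", cities))   # nothing close enough: rejected
```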

CheckScript software boosts accuracy with algorithms that cross-validate the handwritten words in the legal amount field against the handwritten numeric characters in the courtesy amount field on a check. Parascript is banking on CheckScript becoming its "killer app." The company's pricing starts at around $15,000.
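To make the cross-validation idea concrete, the sketch below converts an already-recognized legal amount (the words) to a number and compares it with the recognized courtesy amount (the digits). It assumes both fields have been read into text, handles only simple English amounts up to the thousands, and is not a description of CheckScript's actual algorithm. The function names are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def legal_amount_to_decimal(text):
    """Parse e.g. 'One hundred twenty-five and 50/100'; None if unparseable."""
    text = text.lower().replace("-", " ")
    cents = Decimal(0)
    fraction = re.search(r"(\d{1,2})\s*/\s*100", text)
    if fraction:
        cents = Decimal(fraction.group(1)) / 100
        text = text[:fraction.start()]
    total = current = 0
    for word in re.findall(r"[a-z]+", text):
        if word in ("and", "dollar", "dollars"):
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            total += current * 1000
            current = 0
        else:
            return None  # a word the parser does not know
    return Decimal(total + current) + cents


def amounts_agree(legal_text, courtesy_text):
    """True only when both fields parse and yield the same amount."""
    legal = legal_amount_to_decimal(legal_text)
    try:
        cleaned = courtesy_text.replace("$", "").replace(",", "").strip()
        courtesy = Decimal(cleaned)
    except InvalidOperation:
        return False
    return legal is not None and legal == courtesy


print(amounts_agree("One hundred twenty-five and 50/100", "$125.50"))
print(amounts_agree("One hundred twenty-five and 50/100", "$175.50"))
```

When the two fields disagree, a system could fall back to each engine's second or third choice before routing the check to an operator.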

Siemens/CGK. Siemens/CGK (Vienna, VA 703-848-2117 www.cgk.de) now owns two recognition software technologies -- AEG Recognition Software and RecoStar. There is a planned migration path that will merge the two products by the year 2000. Until then, their products remain distinct and defined by function.

AEG Recognition Software ($4,500) has always been known as a premier ICR engine for high-speed production systems. AEG is noted for its rigor in tackling difficult character types by refining its ICR algorithms to improve recognition performance.

Speed has always been AEG's forte. Over the past year, AEG released a software toolkit version for use in midrange forms processing software products, with pricing based on selective, negotiated, volume-purchase agreements. This engine is licensed by a number of prominent forms processing and capture vendors, including MTI, Datacap, Captiva, Input Software and Cardiff.

CGK RecoStar differentiates itself from AEG by virtue of the fact that it can read unconstrained hand print highly accurately. It does an excellent job at segmenting touching characters and is also capable of removing complex backgrounds from hand printed data. Like the AEG engine, RecoStar uses trigram analysis to validate recognition results.
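The article does not describe how AEG or RecoStar implement trigram analysis. One common form of the technique, sketched here in Python under that caveat, scores a recognized string by how many of its three-character sequences also occur in a reference vocabulary, then prefers the most plausible alternative. All names and the sample word list are hypothetical.

```python
def trigrams(word):
    """Return the overlapping 3-character sequences of a space-padded word."""
    padded = f"  {word.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_model(reference_words):
    """Collect every trigram seen in a reference vocabulary."""
    known = set()
    for word in reference_words:
        known.update(trigrams(word))
    return known


def plausibility(text, known):
    """Fraction of the text's trigrams that appear in the reference set."""
    grams = trigrams(text)
    return sum(g in known for g in grams) / len(grams)


def pick_best(alternatives, known):
    """Choose the alternative reading whose trigrams look most familiar."""
    return max(alternatives, key=lambda t: plausibility(t, known))


known = build_model(["boston", "austin", "houston", "dallas"])
# Competing readings of one field, e.g. built from second-choice characters.
print(pick_best(["b0ston", "boston", "bosfon"], known))
```

Real engines would typically use trigram frequencies from a large corpus rather than a simple seen-or-not test.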

RecoStar's prowess in courtesy amount recognition (CAR) helps make it the clear leader in the banking market in Europe. The company's toolkit pricing starts at $4,500.

What About Speed?

Virtually all vendors are selling products that start out at speeds of 25 characters per second (CPS) for their entry-level product. In fact, most of them (Mitek, AEG, CGK, Orbograph) put artificial "brakes" on their systems in order to price them by speed, regardless of the CPU you're using. This means that if you find yourself paying $15,000, $20,000 or more for a high-production system, chances are that your ICR engine is running at 100+ CPS.

There are no "bad" ICR engines anymore, but each ICR engine tends to possess its own point of view, or recognition bias, which makes some ICR engines better at certain applications than others.

Mitek, for example, is outstanding at recognizing hand printed numerics, and, along with Orbograph and CGK, is among the handful of vendors that can recognize the courtesy amount on a bank check. AEG combines high speed and accuracy. NestorReader has broad market acceptance while Caere has entered the ICR market with an integrated toolkit. Parascript and Ceresoft, finally, are exploring the cutting edge with cursive recognition.

Arthur Gingrande is a partner of Imerge Consulting. He is based in Arlington, MA, and can be reached at arthur@imergeconsult.com (781-646-1893).


A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005