Intelligent Enterprise featuring Transform

December 1998

Heavyweight Competition

Leading OCR vendor Caere (Los Gatos, CA 408-395-5148 www.caere.com) has upped the ante in the recognition toolkit business with the recent release of Developer's Kit 2000, an integrated offering packing nine recognition engines in all. Designed to simplify the integration of multiple recognition modules, the toolkit is priced at $5,495.

Developer's Kit 2000 is a single-source toolkit that includes OCR (machine print), ICR (handprint), bar code, OCR-A, OCR-B, MICR (E13B) and OMR. Two OCR engines are included. The first, the Omnifont machine print OCR recognition module, is the one used in Caere OmniPage. It recognizes English, UK English, Spanish, French, German, Italian, Portuguese, Swedish, Danish, Dutch, Norwegian and Brazilian Portuguese. A second OCR module from Caere's Recognita unit recognizes more than 110 languages.

Output can be generic Unicode or ANSI text, or it can be filtered through INSO conversion modules into popular word processing formats such as Microsoft Word, WordPerfect and RTF (Rich Text Format), as well as HTML.

Developer's Kit 2000 offers an open API architecture, so 32-bit software developers have the flexibility to integrate other individual image capture technologies or products not shipped in the Caere product. Developers can drive document processing through either ActiveX or C interfaces.

Last January, Caere rival ScanSoft, a Xerox company (Peabody, MA 978-977-2000 www.textbridge.com), unveiled V4.5 of the TextBridge Application Programmer Interface (API). The toolkit is designed to help developers build and customize their own OCR applications. It employs cooperating expert subsystems, working through central controllers, that contribute to the analysis and recognition of characters and words, as well as to an understanding of the underlying page. It is priced at about $5,000.

The TextBridge API is designed for the Microsoft Visual C and C++ development environment. In addition to improved recognition, the TextBridge API gives developers automatic preprocessing capabilities such as page segmentation, rotation, fax/dot matrix detection and adjustment, line art detection, reverse-video detection, noise removal and deskew.

The API supports 12 languages (English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish) as well as 15 built-in lexical classes. An additional feature of the API is the Expanded Lexifier(tm), a natural language system that increases recognition accuracy for classes of text commonly found in business documents that are not true words, including telephone numbers, dates and social security numbers. Once text has been processed, it can be output in ISO or XDOC format (XDOC being Xerox's fully documented rich text mark-up language). Output data can be sent to either a file or an application-defined buffer.
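For readers unfamiliar with the idea of lexical classes, the short Python sketch below shows one way pattern-based classes can help choose between competing recognition results. It is an illustration only: the class names, the patterns and the `classify` and `pick_candidate` functions are hypothetical and are not taken from the TextBridge API or the Expanded Lexifier.

```python
import re

# Hypothetical lexical classes. These patterns are illustrative assumptions,
# not the classes shipped with any vendor toolkit.
LEXICAL_CLASSES = {
    "us_phone": re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"),
    "date": re.compile(r"\d{1,2}/\d{1,2}/\d{2}(\d{2})?"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify(token):
    """Return the name of the first class whose pattern matches the whole token, else None."""
    for name, pattern in LEXICAL_CLASSES.items():
        if pattern.fullmatch(token):
            return name
    return None


def pick_candidate(candidates):
    """Pick from a non-empty list of OCR candidates ordered best-first.

    Prefer the first candidate that fits a known lexical class;
    otherwise fall back to the engine's top choice.
    """
    for text in candidates:
        if classify(text) is not None:
            return text
    return candidates[0]
```

In this sketch, a string such as "4O8-395-5148" (with the letter O misread for a zero) fits no class, so a lower-ranked candidate that does look like a telephone number is chosen instead. A production system would presumably weigh recognition confidence as well, which this example leaves out.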



A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005