Intelligent Enterprise featuring Transform

March 2000

New Directions in Data Capture

By Penny Lunt

Do you want to put those high-volume forms on the Internet to ease the burden of paper processing? Do you have handfuls or scores of locations all shipping paper off to a central site? Are you tired of sorting and batching paper-based documents that are similar but not identical (e.g., purchase orders)?

The latest document and data capture solutions offer answers for all these challenges. They also offer more built-in functionality and customization features that will give your specific application that much more power and efficiency.

Reader Resources

Captiva Software
San Diego, CA
888-320-1000
ProductInfo 300

Cardiff Software
Vista, CA
760-936-4500
ProductInfo 301

Ceresoft
Silver Spring, MD
301-445-8413
ProductInfo 302

Input Software
San Jose, CA
408-325-3950
ProductInfo 303

Kofax
Irvine, CA
949-727-1733
ProductInfo 304

Microsystems Technology
Tampa, FL
813-222-0414
ProductInfo 305

Mitek
San Diego, CA
888-363-6767
ProductInfo 306

Recognition Research
Blacksburg, VA
540-961-6500
ProductInfo 307

Top Image Systems America
Carlsbad, CA
760-918-1660
ProductInfo 308

The basic things you look for in a data capture or forms processing application are accuracy, speed, scalability and reliability. Most forms processing software products now provide image enhancement, forms ID, OCR and ICR (machine print and handwriting recognition), database lookups, rules and validations, and key from image.

Reviewing the newest features in this year's crop of upgrades, we found three common themes: e-savvy capture, remote capture and freeform recognition. To go beyond the marketing spin, we've tracked down users who have actually tried these new features. Read on to learn from their experience and advice.

Processing Paper, Fax and Internet Input Streams

When drug companies want research on what drugs doctors prescribe, they often turn to Health Products Research (HPR) in Whitehouse, NJ. Merck, for example, might want a report that shows its popularity versus competitors and the sales performance of each of its drugs. HPR's surveys are as long as 12 pages, with questions on the front and back. They contain check boxes, bubbles and handwriting (where the doctors explain why they prescribe particular drugs). HPR receives 600 to 1,000 of these surveys a day.

HPR wanted to start accepting electronic as well as paper forms. "When doctors fill forms out electronically, it's easier on us," explains Bret Piano, executive director, software development. "There's less to verify and it's easier than reading handwriting. Doctors' handwriting isn't the easiest thing to read."

On the other hand, not all doctors have access to the Internet or email. Piano estimates that only 35% of doctors have computers. So it was equally important to continue mailing and, in the future, faxing the surveys.

Last September, HPR purchased Teleform Enterprise (which starts at $8,995) from Cardiff Software (www.cardiffsw.com), and they have since implemented paper-, HTML- and PDF-based forms processing. Paper forms don't have to be batched. They are identified and processed automatically.

"We just throw all the mail we get into the scanner (an 8080D from Bell & Howell) and Teleform recognizes what type of form each one is," says Piano. The software can handle forms designed within the software as well as pre-existing forms. It accepts address changes and status changes (such as a doctor's retirement or death) and updates the HPR database accordingly.

Piano says he likes being able to write "helper applications" to modify Teleform. These are written in Visual Basic and access Teleform's Pervasive SQL database. HPR added a helper application that lets two operators verify different fields on the same form at the same time. One person might be looking only at company names while the other might be looking only at email addresses.

HPR is also customizing the dictionaries used for validation and word fill-in. If the operator wants the company "Pfizer," they can simply type "Pf" and the rest is filled in automatically, saving keystrokes.

"We wrote a little application that goes into the back end and updates those tables," he explains. "There might be a new company called 'Pfizer-Smith.' We can update that in the dictionary."
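The fill-in behavior Piano describes can be illustrated with a small sketch. This is not Teleform's implementation or its Pervasive SQL schema; the class name FillInDictionary and its methods are hypothetical, and the sketch assumes a prefix is filled in only when it matches exactly one dictionary entry.

```python
from bisect import bisect_left


class FillInDictionary:
    """Hypothetical word fill-in dictionary backed by a sorted list."""

    def __init__(self, words=()):
        # Sorted list of (casefolded key, display form) pairs.
        self._entries = []
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a new entry, keeping the list sorted and free of duplicates."""
        entry = (word.casefold(), word)
        i = bisect_left(self._entries, entry)
        if i == len(self._entries) or self._entries[i] != entry:
            self._entries.insert(i, entry)

    def matches(self, prefix):
        """Return every entry that starts with the typed prefix."""
        key = prefix.casefold()
        i = bisect_left(self._entries, (key,))
        found = []
        while i < len(self._entries) and self._entries[i][0].startswith(key):
            found.append(self._entries[i][1])
            i += 1
        return found

    def complete(self, prefix):
        """Fill in the rest only when the prefix is unambiguous."""
        found = self.matches(prefix)
        return found[0] if len(found) == 1 else None


companies = FillInDictionary(["Merck", "Pfizer"])
print(companies.complete("Pf"))   # unambiguous, so the entry is filled in
companies.add("Pfizer-Smith")     # the back-end update Piano mentions
print(companies.matches("Pf"))    # now two candidates share the prefix
```

Once "Pfizer-Smith" is added, "Pf" alone no longer identifies a single company, so in this sketch the operator would have to type further or pick from the candidates.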

HPR has used Cardiff's HTML- and PDF-based modules for placing forms online. Piano reports no glitches in using the HTML forms module. They use ASP scripts to get Teleform to read emails delivered from the Web server.

Thus far, Teleform PDF+Forms has been used for one project with mixed results.

"We liked [PDF+Forms] because we could email the file to the doctor and the doctor could fill it out on a PC or laptop or print it out and fill it out on a plane, then send it back at their leisure by email or fax," Piano says. "We could design the form once and put it on the Web and paper."

There were, however, some technical hiccups. HPR had difficulty populating Adobe's proprietary FDF (Forms Data Format) files with their own data to create the forms. And some doctors with older versions of Adobe Acrobat Reader were unable to fill out the forms. Piano says HPR will figure out the workarounds and try PDF+Forms again.

Piano has been satisfied with Teleform's accuracy and speed. On one project, 9,000 forms were returned from the post office with notices that the doctors had moved. It took a mere three hours to scan all those forms and collect the doctors' ID numbers. Only 200 contained questionable characters that needed to be verified.
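To put those figures in perspective, the arithmetic can be worked through in a few lines. The numbers come straight from the project described above; the variable names are illustrative only.

```python
# Figures quoted for the returned-mail project.
forms_scanned = 9_000
scan_hours = 3
forms_flagged = 200

forms_per_hour = forms_scanned / scan_hours      # scanning throughput
forms_per_minute = forms_per_hour / 60
flagged_share = forms_flagged / forms_scanned    # share needing an operator

print(f"{forms_per_hour:.0f} forms per hour")
print(f"{forms_per_minute:.0f} forms per minute")
print(f"{flagged_share:.1%} of forms sent to verification")
```

That works out to 3,000 forms an hour, with roughly one form in 45 needing an operator's attention.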

Soon HPR will try Teleform's fax feature, which pulls faxed-in information from a fax server and processes it like any other form.

E-Savvy Capture

Description: In this scenario you can use a single platform to capture and process paper, Web, fax, email and/or EDI input streams. Everything is processed with the same rules and can be exported to the same or, in some cases, multiple destinations.

Benefits: Consistent data through uniform application of rules across all input streams. Single investment and training program saves time and money. Forms have a consistent look and feel for customers, partners and employees no matter how they interact with you. Complete reports spanning all input streams support better decision making.

Vendors: Cardiff Software (www.cardiff.com) supports HTML- and PDF-based online forms as well as fax input (see case study above). OCR For Forms from Microsystems Technology (www.microsystemsonline.com) lets you create online entry screens and Internet forms. Data collection is handled through a Java applet or dynamic HTML.

Three more e-savvy offerings are planned for the second quarter. Captiva Software (www.captivasw.com) will add Internet forms support to FormWare 3.0. Input Software (www.inputsoftware.com) will support both capture and distribution of data through EDI, BizTalk, XML and e-mail in its InputAccel product. FormWorks 3.5 Internet Edition from Recognition Research (www.rrinc.com) will provide Web-based image retrieval and Web-based forms with online field validations and lookups. It will incorporate EDI features as well as all the FormWorks paper forms processing capabilities.


Distributed Capture

Description: Remote capture lets you scan forms and other documents at satellite offices and pass the images over a LAN, a WAN or the Internet. Paper is retained locally while indexing, validation and processing can be handled centrally.

Benefits: Rapid return on investment through savings on shipping costs and faster processing and approval times. Remote capture isn't entirely new, but it's just catching on at banks, brokerages, trucking companies and other organizations that need quick turnaround times.

Vendors: Kofax supports distributed capture through Ascent Capture 3.0 (see the J.C. Bradford case study). Input Software not only supports distributed capture with InputAccel, it also lets you export data to multiple systems in multiple locations, whether databases, ERP systems or document archives. Microsystems Technology's OCR for Forms supports remote scanning over the Internet and local networks. Captiva's FormWare 3.0, to be released in April, will include remote scanning clients that communicate with the central server via File Transfer Protocol (FTP). Cardiff's Teleform system supports fax-based capture.


Freeform Recognition

Description: The ability to capture and process similar forms that don't follow a designated template, such as invoices from different suppliers or purchase orders from different customers.

Benefits: Lets you automate processing of unstructured forms that until recently required manual keying. Purchase orders, delivery logs and certain legal documents can have their data automatically stripped out and entered in a workflow.

Vendors: Mitek (www.miteksys.com) and Ceresoft (www.ceresoft.com) both offer technology they call "document understanding." Input Software is adding technology licensed and enhanced from Mitek's Doctus and Cogniform products to InputAccel 3.0, which will be released in April. Captiva will be releasing its own freeform recognition system with FormWare 3.0, also set for an April debut.

Capturing Data & Docs From Remote Locations

New account documents at brokerage firm J.C. Bradford used to travel by interoffice envelope from 80 branches to the headquarters mailroom in Nashville. From there, the documents entered a workflow that passed through several departments. At various points in between, many of these documents were lost and had to be recreated by branch staff or, worse, prospective customers. Lost documents also meant regulatory trouble when a stock exchange or the SEC made one of their frequent, random compliance checks.

To rectify this problem, the firm sought a remote capture and forms processing system that would let it scan new account documents at the branches and index and process the images quickly at headquarters.

"These documents have to be processed in a timely manner," says Thurman Bush, imaging administrator.

About 350 to 400 new accounts are opened a day, generating around 10,000 documents. As of late January, 22 of the branches had begun scanning these documents and sending them to headquarters using Ascent Capture 3.0 from Kofax (www.kofax.com). Eventually all branches will migrate to the new system.

Ascent Capture 3.0 software offers image processing features, but Bradford has kept things simpler for remote operators by deploying Fujitsu scanners featuring Kofax's Virtual ReScan technology. Virtual ReScan is a combination hardware and software system that makes automatic improvements in image quality by referring to grayscale versions of the images. "That makes the images trouble free," Bush says.

At each branch, the staff who handle the scanning batch the documents into three classes: new account forms, W9 forms and all others. The images are sent via T1 lines to the Nashville data processing center after midnight, when network traffic is minimal.

At the central server, the Kofax software identifies and processes the new account forms and W9s as forms. Other documents are sent to indexing workstations and then into an Eastman Software workflow system. If there's a problem with an improperly scanned document, operators email that image back to the remote site with instructions to rescan the documents. The branches keep the paperwork onsite for five days just in case. Once processed, the imaged documents are stored offsite for seven years to meet SEC requirements.

J.C. Bradford has cut FedEx and other shipping costs and improved efficiency with the new system. The next projects will automate purchase orders, expense reports and anything else that is now sent through interoffice mail.

Document Understanding

Most forms processing products have automatic form identification that looks for registration marks, certain words or numbers or even the topology of the page. This type of form ID works great if your forms have a consistent look to them. If your forms are things like invoices or purchase orders from different companies with different layouts and terms, such automatic ID doesn't work.

Freeform recognition lets you process unstructured forms. This is new technology and no vendor was able to provide a customer reference for it at press time. However, we think this feature could be valuable to readers because it saves the time spent separating paper into batches.

Ceresoft's DocAgent software classifies documents into three groups. The first is forms that have a similar format but have an indefinite number of items in a particular column, such as phone bills of the same design. The software allows for an undefined column length. The second is forms that have a similar but not identical layout, such as varying but similar health care claim forms. Here the software uses a flexible version of template matching. The third type is forms that have similar content but very different layouts, such as invoices from different companies. Here DocAgent OCRs the entire page and applies an "Intelligent Script" to determine what data elements need to be extracted and how to find them.

Mitek's version, Cogniforms, uses machine learning. You scan in samples of a form and highlight the data elements that need to be captured. Cogniforms automatically learns the characteristics of the document and picks up clues about where to find the data. For example, if every sample has the words "Go to:" to the left of a billing address, then it will look for "Go to:" on future forms. You can highlight fields and rows and tell the system that their lengths will vary, which accommodates things like phone bills.
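The idea of learning a label from highlighted samples can be shown with a toy sketch. It is not Mitek's algorithm, which the article does not detail. It assumes each sample page has already been OCRed into text lines and that the operator's highlight is available as a string; the function names learn_label and extract are hypothetical.

```python
def learn_label(samples):
    """Find the text that sits to the left of the highlighted value on every sample.

    samples is a list of (page_lines, highlighted_value) pairs.
    Returns the shared label, or None if the samples disagree.
    """
    labels = set()
    for lines, value in samples:
        for line in lines:
            pos = line.find(value)
            label = line[:pos].strip() if pos > 0 else ""
            if label:
                labels.add(label)
                break
        else:
            return None  # no labeled occurrence of the value on this sample
    return labels.pop() if len(labels) == 1 else None


def extract(lines, label):
    """On a new page, return whatever follows the learned label."""
    for line in lines:
        pos = line.find(label)
        if pos != -1:
            return line[pos + len(label):].strip()
    return None


samples = [
    (["Invoice 17", "Go to: 12 Elm St"], "12 Elm St"),
    (["Go to: 9 Oak Ave", "Total 40"], "9 Oak Ave"),
]
label = learn_label(samples)
print(label)
print(extract(["Acme Supply", "Go to: 5 Pine Rd"], label))
```

A production system would work from word positions on the page image rather than text lines, and would tolerate OCR errors in the label itself.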

Depending on the forms and how similar they are, Cogniforms can use rules to tell very similar forms apart. For forms that are all completely different, such as invoices from 300 different companies, you might be able to create a few templates that cover groups of the forms.

Captiva will be introducing freeform recognition with the next release of FormWare, 3.0. Their technology will OCR/ICR a region of the document, several regions or the entire document while using fuzzy searching to find a key word or data pattern such as "invoice date." This works for completely unstructured documents such as invoices and purchase orders. It also works for forms that have some fixed data elements and some floating ones. Dialog boxes let you tell the software where to look for each element.
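Fuzzy searching for a key phrase in imperfect OCR output can be sketched with Python's standard difflib module. This is an illustration of the general technique, not Captiva's implementation; the function names, the 0.8 similarity threshold and the sample text are all assumptions.

```python
from difflib import SequenceMatcher


def find_keyword(words, keyword, threshold=0.8):
    """Return the index just past the best fuzzy match of keyword, or None."""
    size = len(keyword.split())
    target = keyword.lower()
    best_score, best_end = 0.0, None
    # Slide a window the size of the keyword across the OCRed words.
    for i in range(len(words) - size + 1):
        candidate = " ".join(words[i:i + size]).lower().rstrip(":")
        score = SequenceMatcher(None, candidate, target).ratio()
        if score > best_score:
            best_score, best_end = score, i + size
    return best_end if best_score >= threshold else None


def value_after(text, keyword, threshold=0.8):
    """Return the word that follows the fuzzily matched keyword."""
    words = text.split()
    end = find_keyword(words, keyword, threshold)
    if end is None or end >= len(words):
        return None
    return words[end]


# OCR has misread "Invoice Date:" as "lnvoice Dale:".
ocr_text = "ACME Corp lnvoice Dale: 03/14/2000 Total 118.00"
print(value_after(ocr_text, "invoice date"))
```

Lowering the threshold tolerates worse OCR at the cost of more false matches, so in practice it would be tuned against real documents.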

FormWare could already handle forms with sections of different lengths, such as phone bills, with "table zones."

More Innovations

This article focused on three types of upgrades in forms processing products. But manufacturers have made other improvements in their software, and several plan to unveil these upgrades at the upcoming AIIM show in New York, April 10-12.

Key-free indexing — Input Software's InputAccel 3.0, to be announced at AIIM, will include a new indexing module that lets you simply highlight the required fields rather than type them in (they'll be automatically OCRed and processed). Input will also show a beta version of DynamicInput at the AIIM show. It will provide a Web form that doesn't look like a form, with more intuitive screens.

Faster form ID — Microsystems Technology is soon to release a new form ID feature that is said to improve throughput. It looks at the topology of a form and puts secondary and tertiary maps behind it that provide a claimed 99.9% accuracy.

Color support and QC — Recognition Research's FormWorks 3.5 Internet Edition (scheduled for AIIM) will support BancTec scanners and scan applications and the Siemens ScanStar scanner, which provides a scan module for dropping out mixed colored forms. RRI will also unveil a new SQC Tool for quality control and reporting.

Recognition improvements — Top Image Systems (www.topimagesystems.com) has added two new elements to its AFPSPro software. SuperICR provides trainable ICR voting. Multiple ICR engines vote on their recognition of characters. Voting is continuously compared to truth data tables until the ICR engines learn to vote better. That reduces false positives. Fixing a false positive costs ten times as much as simply having someone type in the character, according to TIS America president Joseph Busque. The second addition, the Just ICR engine, is a trainable software recognition engine. It was created for an application in Japan whose forms carried a stamp. By scanning the stamp several times, they taught the system to recognize the stamp as a character. You could take a bizarre font, scan it in and train the software to read that.
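Voting among recognition engines can be sketched as follows. This is a generic weighted-vote illustration, not the SuperICR algorithm. It assumes each engine is weighted by its accuracy on a truth set and that a character is accepted only when the winning reading holds a minimum share of the weighted vote; the engine names and the 0.6 cutoff are made up.

```python
from collections import defaultdict


def engine_weights(truth, engine_outputs):
    """Weight each engine by the share of truth characters it read correctly."""
    weights = {}
    for name, reads in engine_outputs.items():
        correct = sum(1 for got, want in zip(reads, truth) if got == want)
        weights[name] = correct / len(truth)
    return weights


def vote(reads, weights, min_share=0.6):
    """Return the winning character, or None to route it to an operator."""
    tally = defaultdict(float)
    for name, char in reads.items():
        tally[char] += weights[name]
    total = sum(tally.values())
    if total == 0:
        return None
    winner = max(tally, key=tally.get)
    return winner if tally[winner] / total >= min_share else None


# Hypothetical engines scored against a five-character truth sample.
truth = "12345"
outputs = {"engine_a": "12345", "engine_b": "12845", "engine_c": "72395"}
weights = engine_weights(truth, outputs)

print(vote({"engine_a": "4", "engine_b": "4", "engine_c": "9"}, weights))
print(vote({"engine_a": "4", "engine_b": "9", "engine_c": "1"}, weights))
```

When the engines split three ways, no reading reaches the cutoff and the character goes to a person, which is the cheaper outcome if Busque's ten-to-one cost estimate holds.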

Clearly, forms processing products are heading in an electronic direction. Next month we will take a closer look at e-forms and how they can be processed with or without traditional paper forms.


Upgrading and Customizing a System

Medical Diagnostics had a forms processing system capable of handling 30,000 double-sided forms a day, but found out that it was impossible to upgrade. Having purchased the system only two years earlier, the Miami office of this California-based company learned an expensive lesson.

"We didn't want to have any problems with coding or upgrades," says Art Kozel, systems engineer. "We wanted to make sure the accuracy was good, with less than .05% substitution rate. [We didn't want] bogus data in our database."

Medical Diagnostic Systems manufactures medical quality control products that are used by clinical laboratories worldwide to test and register medical diagnostic tools. The Miami division provides quality assurance programs for customer labs. They collect instrument readings from labs all over the world, tabulate the results and deliver the data back to the laboratories.

The company needed a system that could recognize hand-written numbers and query a database to find an acceptable range for each type of test. (This is called a "one-to-many" lookup versus the "one-to-one" lookup that is common in forms applications.) They chose OCR for Forms from Microsystems Technology, which let them easily create extra validations while leaving an easy path for future upgrades.

The system, which was installed last summer, reads a barcode at the top of each form. The barcode provides references to the range lookups for each row on the form. OCR for Forms recognizes the numbers, links up to Medical Diagnostics' Oracle database via ODBC and requests the appropriate ranges. Numbers that fall out of range go to a verifier, as do any numbers under the 95% confidence level.
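The routing rule in that paragraph can be sketched in a few lines. This is not the OCR for Forms scripting interface or the company's schema. It uses an in-memory SQLite table as a stand-in for the Oracle database reached over ODBC, and the table layout, form code and sample readings are invented for illustration.

```python
import sqlite3

CONFIDENCE_FLOOR = 0.95  # readings below this go to a verifier


def build_demo_db():
    """Create a stand-in range table keyed by form code and row number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ranges (form_code TEXT, row_no INTEGER, low REAL, high REAL)"
    )
    conn.executemany(
        "INSERT INTO ranges VALUES (?, ?, ?, ?)",
        [("QC-A", 1, 3.5, 5.5), ("QC-A", 2, 70.0, 110.0)],
    )
    return conn


def load_ranges(conn, form_code):
    """One barcode value fetches the acceptable range for every row on the form."""
    rows = conn.execute(
        "SELECT row_no, low, high FROM ranges WHERE form_code = ?", (form_code,)
    )
    return {row_no: (low, high) for row_no, low, high in rows}


def needs_verification(conn, form_code, readings):
    """Return the row numbers whose readings should go to a verifier.

    readings is a list of (row_no, value, confidence) tuples.
    """
    ranges = load_ranges(conn, form_code)
    flagged = []
    for row_no, value, confidence in readings:
        low, high = ranges.get(row_no, (None, None))
        in_range = low is not None and low <= value <= high
        if confidence < CONFIDENCE_FLOOR or not in_range:
            flagged.append(row_no)
    return flagged


conn = build_demo_db()
readings = [(1, 4.2, 0.99), (2, 140.0, 0.99)]
print(needs_verification(conn, "QC-A", readings))
```

Rows with no range on file are also flagged in this sketch, on the assumption that an unvalidated number should never reach the database unreviewed.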

Medical Diagnostics' in-house software development team used simple Visual Basic scripting to create the database queries between the Unix database and OCR For Forms on NT. The company had other requirements. For example, if someone deletes a form, they want to know who deleted it. That required more VB scripting.

About 40% of Medical Diagnostics' forms arrive by fax; the other 60% come in through the mail. The system includes two identification workstations, two Kodak Imagelink 500 Scanners, four extract workstations and ten verify workstations.

Kozel tested the system with a run of 12,000 customer forms. "We checked to see whether the system could handle the volume, then went character-by-character to check how the system was doing," Kozel says. "First we had problems with 4s and 5s. Then [Microsystems] switched us to the Nestor [recognition engine], which did well with the numbers.

"The overall process has dramatically improved since we made the change," Kozel reports. "The old system had problems with higher volumes. With this system the work is taking us a fraction of the time."

 





A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005