July 2000
SCAN STATION:
Color Indexing
By Maria Medina
Identifying and indexing large volumes of documents can be extremely time-consuming, especially if the fields you need to index don't appear in consistent locations on the page. Two new developers' toolkits, Colindex by Dunord (www.dunord.com) and Color ID by Kofile (www.kofile.com), let you capture fields with the stroke of an ordinary highlighter pen. The toolkits recognize the highlighted fields on the page, pick them out, remove the color and send the fields to an application.
These toolkits can be used to capture whatever portions of text you need, for many purposes. In a legal environment, for instance, you can mark the pros and cons of a case with different-colored highlighters and pass each set to another application. You can also single out key words in the document to assist with retrieval. In accounts receivable departments, data extraction can be simplified by highlighting purchase order numbers in yellow and customer names in green.
The highlighting toolkits can also be used for document identification. Think about the variety of documents generated from a single loan applicant. Aside from the application form, the lender has to keep credit reports, titles, collateral documentation, reference letters and more in a single file. You could assign each record type a color, highlight the records using a single line at the top of the page and link each color to an appropriate workflow. The stroke of a highlighter could also separate batches, denote attachments or indicate the start of a new multi-page document.
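The color-to-workflow idea can be pictured with a short sketch. It is not taken from either toolkit: the workflow names, hue centers and tolerance below are assumptions chosen for illustration, and the sketch assumes the sampled pixel is already known to come from a highlighter mark rather than from blank paper.

```python
import colorsys

# Hypothetical mapping from highlighter color to record type (illustrative only).
WORKFLOWS = {
    "yellow": "loan_application",
    "green": "credit_report",
    "pink": "title",
}

# Assumed hue centers in degrees on the HSV color wheel.
HUE_CENTERS = {"yellow": 60.0, "green": 120.0, "pink": 330.0}


def hue_degrees(rgb):
    """Convert an 8-bit (r, g, b) tuple to a hue angle in degrees."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0


def hue_distance(a, b):
    """Shortest distance between two hue angles, allowing for wraparound."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def route_page(mark_rgb, tolerance=25.0):
    """Return the workflow for the nearest known color, or None if none is close."""
    hue = hue_degrees(mark_rgb)
    name = min(HUE_CENTERS, key=lambda n: hue_distance(hue, HUE_CENTERS[n]))
    if hue_distance(hue, HUE_CENTERS[name]) > tolerance:
        return None
    return WORKFLOWS[name]
```

Calling route_page((255, 255, 0)) on a pure yellow sample would select the loan application workflow in this sketch, while a color far from every known hue is left unrouted so an operator can review it.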
Quick Scan
Product: Colindex
Supplier: Dunord (www.dunord.com) Montreal, Quebec, 514-284-3123
File Formats: JPEG, BMP
Street Price: $1,495
Strengths: Simple, easy to use and affordable with a helpful Wizard feature. Toolkit bundle includes color dropout application.
Weaknesses: Recognition software is not included.
Product: Color ID
Supplier: Kofile (www.kofile.com) Rochester, NY, 716-424-1950
File Formats: JPEG, BMP
Price: $10,000 including a grayscale OCR toolkit.
Strengths: Toolkit bundle includes grayscale optical character recognition software. Color ID had accurate hue differentiation; integrations planned with Kofile's Kovis and Kodak's MVCS software.
Weaknesses: Trouble reading some fields; treated some single fields as multiple fields.
We tested demos of each of these toolkits in our lab to see what color highlighting is all about. Obviously, you'll need a color scanner to take advantage of this technology. We used a Digital Science Scanner 3590C from Kodak (www.kodak.com/go/docimaging).
Colindex is bundled with Dunord's Colerase color dropout application at a combined price of $1,495 (run-time licenses are priced depending on volumes). This system works independently from color scanning applications, and it lets you use as many as seven different colors. You can run the highlighter over, just above or around the area you want to capture. The image snippet is then ready to be sent to other applications.
We liked the Colindex Wizard feature, which guides you through such parameter settings as mark size and spacing. Setting the correct parameters helps capture fields more accurately. For example, with the right settings you could avoid capturing a yellow logo of similar color and size to a yellow-highlighted text or data field.
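To see why mark-size parameters matter, consider a rough sketch of how a size filter might reject a logo. This is not Dunord's algorithm. The Region type, the plausible_marks function and the threshold values are all hypothetical, and the sketch assumes some earlier step has already found the bounding boxes of patches in the highlighter color.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Bounding box of one patch of highlighter color, in pixels."""
    x: int
    y: int
    width: int
    height: int


def plausible_marks(regions, min_width=60, max_height=40, min_aspect=3.0):
    """Keep regions shaped like a pen stroke: wide, short and elongated.

    The thresholds are assumed values; in practice they would depend on
    scan resolution and the width of the highlighter tip.
    """
    kept = []
    for r in regions:
        if r.height <= 0:
            continue  # skip degenerate boxes
        aspect = r.width / r.height
        if r.width >= min_width and r.height <= max_height and aspect >= min_aspect:
            kept.append(r)
    return kept
```

With these assumed thresholds, a 300-by-25-pixel stroke passes while an 80-by-80-pixel yellow logo is dropped for being too tall. A logo shaped like a stroke would still slip through, which is why spacing and other parameters are needed as well.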
Kofile's Color ID works similarly to Colindex, but Kofile is bundling it with its grayscale optical character recognition toolkit so you can both capture and recognize text highlighted with color. The toolkit is priced at $10,000 (not including run-time licenses). Color ID will be available separately from the grayscale recognition toolkit, but a price has yet to be set. Kofile also plans to integrate Color ID functionality with its own Kovis document management software (as a $1,500 option) as well as with Kodak's mid-volume capture software (as a $3,500 option).
Color ID handles 20 different highlighter hues. The hue recognition worked well. When we tested variations of yellow highlighters, the tool correctly picked out only the shade used during calibration.
Kofile's demo was fussier about how we highlighted. If we made the highlight line a little too curved or missed a small area, it treated a single field as multiple fields. This could be a problem in extracting and recognizing data, but the product was still in beta testing. Kofile said it would address this problem before the product's debut in July.
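One common way to tolerate a small missed area is to merge stroke fragments that sit close together on the same line. The sketch below illustrates that general idea only. It is not Kofile's method, and the merge_fragments function and its max_gap value are assumptions.

```python
def merge_fragments(spans, max_gap=10):
    """Merge horizontal (start, end) pixel spans separated by small gaps.

    spans: iterable of (start, end) tuples along one highlighted line.
    max_gap: largest gap, in pixels, still treated as the same field
             (an assumed value; it would depend on scan resolution).
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous span: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Here a stroke broken at pixel 180 and resumed at 185 would be rejoined into one field, while a second mark 140 pixels farther along stays separate. Setting max_gap too high would have the opposite problem and fuse neighboring fields.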
It may be a while before color highlighting is used extensively. For now, sophisticated integrators and independent software vendors will take the lead in adopting this technology. The next step will be to integrate it with off-the-shelf software.
Maria Medina