September 1998
The Case for Color in Imaging
by Harvey Spencer
Color not only looks better, on some documents it carries important meaning. With costs falling in line, it's time to give color a look.
Once a distant frontier, high-volume color document imaging is well on
its way to becoming a reality. Costs are coming in line, and the user-
and accuracy-friendly attributes of color are becoming increasingly
compelling.
Today's imaging systems use black-and-white (bitonal) images because
they are fairly small (around 50 Kbytes compressed on average) and
because the compression used (TIFF Group 4) does not lose data. But
despite many tricks, bitonal imaging has drawbacks, particularly when
shading varies across the image or the image contains a stamp or other
item that requires a better view.
Clearly, color images would make the documents easier to read. But in
some cases color is critical to an image's usefulness, and this is
increasingly true as color spreads through printed and Internet-published
documents.
Pictured below are a few examples of documents that a user might want
to store and subsequently retrieve. Documents 1, 2 and 3 convert into
black and white fairly well. The critical information is easy enough to
read, but the reader does get a clearer understanding when the document
is in color. In document 2, for example, the first four digits of the
client identification number are more clearly seen to be pre-printed when
the image is in color. On document 3, the "Cancelled" notation is much
more noticeable in color.
While it's nice to have color with the first three documents, it's
more than a luxury in the case of documents 4 and 5. The information
conveyed in color is important and can be obliterated by a bitonal
scanner despite careful adjustment (see document 5).
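To make the point concrete, here is a minimal Python sketch of what can happen to color information when it is reduced to gray and then to bitonal. The Rec. 601 luma weights, the fixed threshold of 128 and the two bar colors are assumptions chosen for illustration, not measurements taken from document 5.

```python
# Sketch: how a bitonal scan can erase meaning carried by color.
# Assumptions (not from the article): Rec. 601 luma weights for the
# color-to-gray step and a single global threshold of 128 for bitonal.

def luma(rgb):
    """Approximate perceived brightness (0-255) of an 8-bit RGB triple."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_bitonal(rgb, threshold=128):
    """Return 1 (black) if the pixel is darker than the threshold, else 0 (white)."""
    return 1 if luma(rgb) < threshold else 0

# Two bars from a hypothetical color bar chart.
red_bar = (255, 0, 0)
green_bar = (0, 128, 0)

for name, color in [("red", red_bar), ("green", green_bar)]:
    print(f"{name}: gray={luma(color):.1f} bitonal={to_bitonal(color)}")
```

The two bars land about one gray level apart, and at this threshold both become solid black, so the distinction the chart's colors carried is gone.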
What About Storage?
So much for the theory, but can you afford to store color images? To
give you some idea, the 200 dpi bitonal scan (TIFF Group 4) of document 4
compressed to 112 Kbytes. Surprisingly, the same document scanned in
color (JPEG) compressed to 131 Kbytes, only about 17% larger. The JPEG
settings used introduced a 10% loss in image detail, but this was not
discernible to the naked eye.
In some cases, color images will be considerably larger. The 200 dpi
color (JPEG) scan of document 6, for example, compressed to 434 Kbytes,
about five times the size of the 84-Kbyte compressed bitonal image. JPEG
is the current standard for color, but it was designed for photographs,
so companies are working on more efficient formats geared to text and
line art. One of these, DjVu (http://djvu.research.att.com), was
demonstrated by AT&T at AIIM and provided lossless 100x color compression.
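The size comparisons above reduce to simple ratios. This short Python sketch uses only the compressed sizes quoted in the text; it does not model how those files were produced.

```python
# Compressed sizes reported in the article, in Kbytes.
samples = {
    "document 4": {"bitonal_g4": 112, "color_jpeg": 131},
    "document 6": {"bitonal_g4": 84, "color_jpeg": 434},
}

for doc, size in samples.items():
    # How many times larger the color JPEG is than the bitonal Group 4 file.
    ratio = size["color_jpeg"] / size["bitonal_g4"]
    print(f"{doc}: color is {ratio:.2f}x the bitonal size "
          f"({(ratio - 1) * 100:.0f}% larger)")
```

The ratios come out to roughly 1.17x for document 4 and 5.17x for document 6, which is why the storage question depends so heavily on the kind of document being scanned.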
If size is a problem, you can reduce the resolution (dots per inch)
without losing readability, because color and grayscale images carry
much more information per pixel. A color pixel usually has three color
channels (red, green and blue) with 8 bits of shading information each,
for a total of 24 bits per pixel; a grayscale pixel has 8 bits, and a
bitonal pixel has just one. Therefore, you can usually reduce the
resolution substantially and still get a very legible image. I scanned
the color bar chart (document 5) at 150 dpi, and it is still completely
readable thanks to the added tonal information. Still, most OCR requires
a minimum of 200 dpi, so in a forms processing application you will need
the higher resolution.
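The bits-per-pixel arithmetic, and why lowering the resolution helps, can be checked with a quick calculation of uncompressed page sizes. This Python sketch assumes a US letter page (8.5 x 11 inches) and no compression; real files are far smaller after Group 4 or JPEG compression, but the relative sizes show the trade-off.

```python
# Uncompressed size of one page at different resolutions and bit depths.
# Assumptions: US letter page (8.5 x 11 in), no compression, 8 bits = 1 byte.
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0

def raw_bytes(dpi, bits_per_pixel):
    """Raw bytes needed to hold the whole page at the given dpi and bit depth."""
    width_px = round(PAGE_W_IN * dpi)
    height_px = round(PAGE_H_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8

modes = [
    ("bitonal, 200 dpi", 200, 1),
    ("grayscale, 200 dpi", 200, 8),
    ("color, 200 dpi", 200, 24),
    ("color, 150 dpi", 150, 24),
]
for label, dpi, bpp in modes:
    print(f"{label:>20}: {raw_bytes(dpi, bpp):>12,} bytes")
```

At the same 200 dpi, a color pixel carries 24 times the raw data of a bitonal pixel. Dropping a color scan from 200 to 150 dpi cuts the pixel count to (150/200)^2, about 56%, which is why a 150 dpi color scan can stay legible while costing considerably less storage.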
However, even if color image files are several times larger than their
bitonal counterparts, it's important to consider the new realities of
storage cost. In the early days of imaging you could buy a 120MB disk for
$200. Now you can buy something 30 times larger for the same price. Years
ago we were working with ISA buses, 33MHz 386 processors and maybe 2MB of
RAM on the desktop. Today you can easily multiply these capacities and
speeds by a factor of ten.
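To put the storage argument in numbers, here is a rough Python calculation of how many images fit on a $200 disk then and now. It assumes 1 Mbyte = 1,000 Kbytes for simplicity, takes "30 times larger" literally (3,600 Mbytes), and reuses the average and sample sizes quoted earlier in this article.

```python
# Rough images-per-disk comparison. Assumption: 1 Mbyte = 1,000 Kbytes.
OLD_DISK_KB = 120 * 1000          # the 120-Mbyte, $200 disk of the early days
NEW_DISK_KB = 30 * OLD_DISK_KB    # "something 30 times larger for the same price"

def images_per_disk(disk_kb, image_kb):
    """Whole images of a given compressed size that fit on the disk."""
    return disk_kb // image_kb

print("old disk, bitonal (50 Kbytes): ", images_per_disk(OLD_DISK_KB, 50))
print("new disk, color (131 Kbytes):  ", images_per_disk(NEW_DISK_KB, 131))
print("new disk, color (434 Kbytes):  ", images_per_disk(NEW_DISK_KB, 434))
```

Under these assumptions, today's $200 disk holds roughly 8,300 of the largest color images from the earlier samples, several times the 2,400 bitonal images the old disk could hold.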
Many people tell me that most business documents are black and white,
so you don't need the color overhead. True, many documents are strictly
black and white, but take a look at document 6. This invoice is mostly
black and white, but it contains shadings. When converted to bitonal,
these shadings interfere with the text and can make recognition
impossible. If you remove the shadings, you might lose the data.
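The shading problem can be illustrated with a single global threshold, the simplest way to turn gray levels into black and white. In this Python sketch the gray values for paper, shading, dark text and a pale mark are hypothetical, and real scanners often use adaptive thresholding instead; the point is only that one cutoff cannot serve every tone on the page.

```python
# Hypothetical 8-bit gray levels (0 = black, 255 = white), chosen for illustration.
PAPER, SHADING, TEXT, PALE_MARK = 245, 170, 60, 190

# One scan line: paper, a shaded band with dark text in it, paper, a pale mark, paper.
scan_line = [PAPER, SHADING, TEXT, SHADING, PAPER, PALE_MARK, PAPER]

def binarize(line, threshold):
    """Global threshold: '#' (black) if darker than threshold, else '.' (white)."""
    return "".join("#" if v < threshold else "." for v in line)

print(binarize(scan_line, 200))  # high cutoff: pale mark kept, text merges into black shading
print(binarize(scan_line, 128))  # low cutoff: shading dropped, text kept, pale mark lost

# Try every cutoff: can any one drop the shading (index 1) AND keep the pale mark (index 5)?
workable = [t for t in range(256)
            if binarize(scan_line, t)[1] == "." and binarize(scan_line, t)[5] == "#"]
print("thresholds that do both:", workable)
```

With a high threshold the shaded band and the text inside it become one solid black run; with a low threshold the shading disappears, but so does the pale mark. A grayscale or color image keeps all four tones distinct.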
Color gives you better readability, as does grayscale (though grayscale
files are nearly as large as color files). This is clearly visible in
the case of document 7, an enlargement of the back of a check. In the
bitonal image, you cannot clearly make out the endorsements. In
grayscale, the layered endorsements jump out; the user can read the data
and tell which stamp came first. Best of all, however, is the color
image, which makes the data clearer still.
It is true that most OCR and ICR work on bitonal patterns, but color
and grayscale can improve forms processing in other ways.
ScanOptics (Manchester, CT 860-645-7878) uses grayscale in its
9000 scanner, and SCS (Birmingham, AL 205-251-2985), which was
recently acquired by ScanOptics, can use color images to improve key
entry in its Vista Capture forms processing product. OrboGraph (Israel,
972-8-942-3769), which specializes in courtesy amount recognition, says
that it can improve recognition by as much as 20% by using grayscale
or color images. NCS (Lincoln, RI 401-334-4811) says its Mark
Sense accuracy is improved using grayscale.
What About Scanner Costs?
If color is compelling now, then why aren't more document imaging
systems working in color -- or at least able to store color as needed?
One reason is the lack of affordable high-speed color scanners. Another
reason has been the difficulty of compressing and storing larger files
fast enough once they are scanned. This latter issue seems to have been
solved with specialized JPEG chips, such as those now available from
Picture Elements (Boulder, CO 303-444-6767).
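A back-of-the-envelope calculation shows why compression speed matters. This Python sketch assumes a simplex scanner, a US letter page at 200 dpi in 24-bit color, 1 Kbyte = 1,000 bytes, and the 131-Kbyte compressed size reported for document 4; actual rates depend on the scanner and the document.

```python
# Rough data rates a color scanner must compress and store.
# Assumptions: simplex, US letter page, 200 dpi, 24-bit color, 1 Kbyte = 1,000 bytes.

def raw_page_bytes(dpi=200, bits_per_pixel=24, w_in=8.5, h_in=11.0):
    """Uncompressed bytes for one page side."""
    return round(w_in * dpi) * round(h_in * dpi) * bits_per_pixel // 8

def data_rates(pages_per_minute, compressed_page_bytes=131 * 1000):
    """Return (raw bytes/s into the compressor, compressed bytes/s to storage)."""
    pages_per_second = pages_per_minute / 60
    raw_rate = raw_page_bytes() * pages_per_second
    out_rate = compressed_page_bytes * pages_per_second
    return raw_rate, out_rate

raw, out = data_rates(60)
print(f"60 ppm: {raw / 1e6:.1f} Mbytes/s raw, {out / 1e3:.0f} Kbytes/s compressed")
```

At 60 ppm, the compressor has to absorb about 11 Mbytes of raw color data every second while writing only about 131 Kbytes per second to disk, which is the kind of load dedicated JPEG hardware is meant to carry.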
Until recently, there were only two high-performance (60 or more pages
per minute) color scanners available on the market: the ImageTrac from
Imaging Business Machines LLC (Birmingham, AL 205-956-4071) and
the RecoScan from CGK/Siemens Nixdorf Information Systems (Vienna,
VA 703-848-2117). BancTec (Dallas 972-341-4000) demonstrated a new
color version of its S-Series scanner at AIIM, and it is expected to ship
this fall. All three of these scanners cost more than $50,000, making
them expensive for mainstream use, though they have their place in
specialized applications that demand color. Fujitsu sells the 600C, a
15-ppm bitonal/2-ppm color scanner, for under $2,000, but it's a simplex
scanner that lacks compression. You wouldn't use this scanner for
production color scanning.
What we really need is a higher-end 20- to 30-ppm color duplex scanner
priced at around $25,000 to $30,000 with on-board compression. It doesn't
have to be 300 dpi. Whoever comes up with this device will transform
document imaging systems.