February 1998
PUTTING IMAGES ONLINE: IMAGE FILE FORMATS
Image files are the most varied files stored in document management systems. They can come from anywhere - the Web, document scanners, email, applications and in-house publication departments.
Storing a file to disk is not a simple procedure. A typical file will have a header telling any application trying to read the file about its contents. The header will identify the type of data in the file and what compression techniques were used. There will be esoterica such as byte order, version number and file size. There may even be a field in the header to store user notes.
Text files are easy. No matter which application creates a file, you always know that deep down in the file there will be ASCII text. There is not a lot of room for innovation in storing text (various versions of Microsoft Word notwithstanding). Image files, on the other hand, are all over the map in terms of how they can be set up. At the top level there is the difference between vector and bitmap images. Vector images are stored as instructions for drawing lines and then filling the spaces between the lines with color and texture. Bitmap images are literal translations of images, pixel by pixel, into bits.
Headers for image files have to tell applications whether the image is color, black and white or grayscale. If the image is color or grayscale, the header has to store the bit depth. Most images are compressed to save space. The header has to store the name of the compression algorithm used. Some compression algorithms require translation tables and custom color palettes.
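To make the idea of a header concrete, here is a minimal sketch in Python that decodes the first eight bytes of a TIFF file: a two-byte byte-order mark, a two-byte version number (always 42) and a four-byte offset to the first image directory. The function name read_tiff_header and the sample bytes are made up for illustration; a real reader would go on to parse the directory the offset points to.

```python
import struct

def read_tiff_header(data: bytes) -> dict:
    """Decode the 8-byte header at the start of a TIFF file."""
    if len(data) < 8:
        raise ValueError("a TIFF header needs at least 8 bytes")
    order_mark = data[:2]
    if order_mark == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif order_mark == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    # Bytes 2-3 hold the version number, bytes 4-7 the offset of the
    # first image file directory; both use the byte order found above.
    version, first_ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if version != 42:
        raise ValueError("not a classic TIFF file: version is not 42")
    return {
        "byte_order": "little" if endian == "<" else "big",
        "version": version,
        "first_ifd_offset": first_ifd_offset,
    }

# Hypothetical sample: a little-endian header whose directory starts at byte 8.
sample = b"II" + struct.pack("<HI", 42, 8)
print(read_tiff_header(sample))
```

Bit depth, color type and compression are not in these eight bytes; in TIFF they live in the tagged directory entries that follow, which is why every header has to be read in the byte order it announces.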
Finally, just to make life interesting for all the image viewer and toolkit vendors, every application software developer has to have their own custom, proprietary image file format. The typical application developer just knows theirs is the best way to store complex images. Whether this is true or not is immaterial. Most file formats become widely accepted for one of two reasons. Either the application using the format is so compelling that everyone uses it or the file format in question is free of any licensing fees.
Cimmetry Systems' (Cambridge, MA 514-735-3219) AutoVue ($400-$600) lets you view more than 190 file formats. Spicer's (Kitchener, Ontario 519-748-2462) Imagenation ($200) opens more than 130 image file formats. AccuSoft's (Westborough, MA 508-898-2770) ImageGear toolkit lets programmers add more than 50 raster image file formats to their applications. Lead Technologies' (Charlotte, NC 704-332-5532) LeadTools ($2,000) supports more than 50 image file formats.
That doesn't include different word processing file formats, spreadsheet formats and audio file formats or all of the possible permutations that come from combining different compression algorithms with different codecs.
Image file formats multiply like rabbits. New file formats pop up all the time, especially on the Web. The hardest formats to track down are permutations on existing file formats. While there seems to be a new TIFF image format every week, many new file formats are simply minor modifications to existing formats.
New image file formats come about for several reasons. New technology creates new formats. GIF was created for online services, and JPEG found its largest audience on the Internet. FlashPix was created to serve the image editing market. Fax technology created Group 3 compressed TIFFs.
Image file formats are also created for commercial reasons. Developers know how lucrative it is to control a worldwide standard. Although many of these proprietary formats rarely progress beyond a couple of Web sites or a couple of applications, they can create confusion.
Image File Formats Used in Document Imaging
Most business documents are scanned in grayscale or black and white. Faxed documents are black and white and are compressed. Document formats are designed to be compact and efficient. Image quality is important only insofar as it makes the document easy to read on a computer display or makes it easy for computers to recognize handwriting, text and optical marks.
Business documents are compressed to save storage space, network bandwidth and backup time. Common compression schemes are Group 3 and Group 4 fax compression and JBIG. These compression algorithms are lossless and are usually contained within another file format like TIFF.
Adobe Portable Document Format. This is the format used by Adobe's (San Jose, CA 408-536-6000) Acrobat applications. PDF is a cross-platform file format that makes sure documents look the same on any computer or when printed from any printer. It is popular for posting promotional literature, technical bulletins and FAQs on the Internet.
Brooktrout format. Files from Brooktrout Technologies (Needham, MA 781-449-4100) have the extensions .301 or .BRK. This company has a number of fax products. The Brooktrout file format stores Group 3 compressed black and white images with a proprietary Brooktrout header.
Converted Image File. IBM has created or had a hand in creating a large number of image file formats used in document imaging. The Converted Image File is a special bitmap format for storing images of checks. The file format's header stores account information. The CIF format stores images of the front and back of a check in black and white and four bit grayscale.
Image Object Content Architecture. IOCA is another IBM image file format for document storage. It stores images as bitmaps in two, four, eight and 16 bit depths. Images can be compressed with IBM's proprietary ABIC compression, Group 3 or Group 4 compression or as JPEGs. IOCA is a format frame in which any kind of compressed image can be stored. It is set apart by the IBM specific nomenclature.
There are a number of extensions to IOCA. Mixed Object Document Content Architecture (MODCA) is a header where multiple images can be collected and stored as a single file. Many document management systems store multipage documents in this type of format. It saves space and simplifies database entries for multipage documents.
PC Paintbrush File Format. Although this is a graphic arts format originally designed for drawing programs, it is listed with the document imaging file formats because PCs use it for faxes. The DCX extension allows multiple pages to be stored in a single image file, perfect for faxes and similar documents.
WordPerfect Graphics Metafile. The WordPerfect metafile is a "wrapper" that allows other types of image files to be added to WordPerfect documents. Pictures can be added to wordprocessing documents in a standardized way. It also lets WordPerfect have its own drawing tools.
Image File Formats on the Web
The Internet is a hub of activity when it comes to developing new image file formats. The driving force behind Internet file formats is bandwidth, or the lack of it. All image file formats used on the Internet are compressed in some way. There's a constant search for more efficient compression algorithms. This has led to a huge number of file formats with their own proprietary or non-proprietary compression.
In spite of (or perhaps because of) all the different image file formats on the Internet, the two most popular remain Graphics Interchange Format (GIF) and Joint Photographic Experts Group (JPEG). If you surf the Web or download an image from Usenet, you can be almost 100% certain that the image file will be in one of these formats.
AutoCad Drawing Web Format. This format allows AutoCad images to be posted to the Internet. DWF images need to be viewed with a browser plug-in.
Graphics Interchange Format. This file format was popularized by CompuServe (Columbus, OH 614-457-8600) in the late eighties. It features lossless compression of 8-bit color and line art images. GIF's capabilities have been extended over the years. GIFs can store still images, progressive images and even simple animations. The file format uses LZW compression. Software developers who need to add GIF support to their applications can get a license for LZW compression from Unisys (Blue Bell, PA 716-742-6780).
JPEG File Interchange Format. JPEG-JFIF is one of the most popular file formats on the Internet. JPEG is based on a lossy compression scheme developed by the Joint Photographic Experts Group from which it gets its name.
The official name of the format is JFIF. JPEG compression is found in some other file formats including PICT and TIFF. JPEG-JFIF is best suited to color photographs and high bit-depth artwork. Line art and spot color images do not compress well using JPEG. JPEG owes much of its popularity to the fact that it is free.
JPEG compression has limitations. Compression is limited to a 30-to-one ratio. More compression results in visually unappealing artifacts. Almost all new file formats on the Internet are designed to replace JPEG. Because most of these JPEG replacements are proprietary, none of them have been able to break JPEG's popularity.
Portable Network Graphics. This format was created as an alternative to GIF. The Portable Network Graphics file format has improved compression and additional features like color correction and simple error detection. PNG has had a tough time gaining acceptance due to GIF's dominance. This may change because the newest versions of Web browsers offer limited PNG support.
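An application that receives an image from the Web cannot trust the file extension, but GIF, JPEG and PNG files each begin with a fixed signature. The Python sketch below checks those opening bytes; the function name sniff_image_format is hypothetical, and the check only identifies the container. It does not validate the rest of the file.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess a Web image format from the first bytes of the file."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"        # signature plus version, in plain ASCII
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"       # start-of-image marker followed by another marker
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"        # eight-byte PNG signature
    return "unknown"

print(sniff_image_format(b"GIF89a" + bytes(10)))
print(sniff_image_format(b"plain text, not an image"))
```

Reading the first eight bytes of a file is enough to feed this function.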
Sending Image Files Across the Internet
The Internet was originally created to move seven-bit ASCII files from computer to computer. This works well for email and text documents. Graphic images do not conform to seven-bit ASCII and cannot be sent across the Internet like text documents. They either need to be converted to ASCII files or moved using a protocol that supports binary files.
The File Transfer Protocol (FTP) can move non-text files across the Internet. But FTP is not always practical. It requires a server and a continuous connection to the Internet. FTP doesn't offer built-in compression. Sending uncompressed files across a network wastes bandwidth.
This problem is not limited to image files. The Internet has always had problems carrying non-text documents. This is why most software distributed on the Internet is compressed: partly to save space, but also to "package" the document into a form that's less likely to be mangled by the various Internet routers that lie between the file's source and destination.
Since the original Internet could only carry seven-bit ASCII characters, image files had to be converted to ASCII before they could be sent over the Internet. The standard way to do this is by uuencoding. Uuencoding is not a compression algorithm -- most files are larger after being uuencoded. Uuencoding only converts a document to seven-bit ASCII characters.
Uuencoding is still used a lot in Usenet newsgroups. It is also good when sending or receiving email with primitive email applications. Some email applications only handle documents below a certain size. Uuencoding applications automatically break up large files into smaller chunks so they can be emailed. Uudecoding applications collect the file segments and knit them back together as part of the decoding process. Fortunately, this process can be automated.
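The size penalty of uuencoding is easy to demonstrate. The Python sketch below uses the standard binascii module, whose b2a_uu function encodes at most 45 bytes per output line; the helper name uuencode_lines and the test payload are made up for illustration, and the "begin" and "end" lines that wrap a real uuencoded file are left out.

```python
import binascii

def uuencode_lines(data: bytes) -> list[bytes]:
    """Return the uuencoded body lines for data (no begin/end lines)."""
    # Each uuencoded line carries at most 45 bytes of input.
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]

payload = bytes(range(256)) * 4          # 1,024 bytes of arbitrary binary data
lines = uuencode_lines(payload)
encoded = b"".join(lines)

print(len(payload), "bytes in,", len(encoded), "bytes out")

# Every output byte fits in seven bits, so it survives a text-only channel.
print(all(byte < 128 for byte in encoded))

# Decoding line by line knits the original file back together.
decoded = b"".join(binascii.a2b_uu(line) for line in lines)
print(decoded == payload)
```

Three input bytes become four output characters, and each line adds a length character and a newline, so the encoded text is a little over a third larger than the original.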
New email applications support Multipurpose Internet Mail Extensions (MIME). Uuencoding is normally handled by external helper applications, whereas MIME is built into email applications. MIME's usual encoding for binary files, base 64, converts three eight-bit bytes into four seven-bit ASCII characters that the Internet can handle.
MIME encoding is handled automatically by the email applications that support it. You can also use MIME to attach an image file to a Usenet posting. Most Usenet newsreaders don't support MIME directly. Base 64 decoder applications are available separately as helper programs.
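The three-bytes-to-four-characters conversion can be seen directly with Python's standard base64 module. This is a small sketch rather than a full MIME encoder: it leaves out the message headers and line wrapping that an email application adds, and the helper name base64_size is hypothetical.

```python
import base64

raw = bytes([0x89, 0x50, 0x4E])          # three arbitrary eight-bit bytes
encoded = base64.b64encode(raw)
print(encoded)                           # b'iVBO': four ASCII characters

def base64_size(n_bytes: int) -> int:
    """Number of base 64 characters needed for n_bytes of input."""
    # Input is taken three bytes at a time; a short final group is padded.
    return 4 * ((n_bytes + 2) // 3)

print(base64_size(300))                  # 400
print(base64.b64decode(encoded) == raw)  # True
```

As with uuencoding, the encoded file is about one third larger than the original.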
Even with uuencoding and MIME you often see images encoded into other file formats. While these formats are customarily used for applications, audio, video and animation, they can be used for image files.
Binhex. Binhex is used mostly on Macintoshes but can be found on other systems as well. Binhex is an ASCII format. It accomplishes the same thing as uuencoding.
MacBinary. Macintosh files have a resource fork and a data fork. When you put Macintosh files on a non-Macintosh system the resource fork is lost. MacBinary encoding preserves the resource fork when files are sent to non-Mac systems.
Stuffit. Stuffit ($130) is a commercial program from Aladdin Systems (Watsonville, CA 408-761-6200). Stuffit is available for all platforms. There are shareware versions available. Stuffit takes multiple files and combines them into one compressed archive for sending across any network.
Zip. Zip is the original compressed archive file format. Zip was introduced by PKWare (Brown Deer, WI 414-354-8699). PKZip ($80) combines multiple files into a compressed archive. It can also create self-extracting archives. These are compressed archives with the decompressing program built-in.
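Combining several files into one compressed archive can be sketched with Python's standard zipfile module. The file names and contents below are invented for illustration, and the archive is built in memory rather than on disk; the helper name make_archive is hypothetical.

```python
import io
import zipfile

def make_archive(files: dict[str, bytes]) -> bytes:
    """Pack the named files into a single compressed Zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()

# Hypothetical inputs: a blank scanned page and a short note.
files = {
    "page1.raw": b"\x00" * 5000,
    "notes.txt": b"scanned invoice\n" * 100,
}
blob = make_archive(files)
print(sum(len(c) for c in files.values()), "bytes in,", len(blob), "bytes archived")

# Reading the archive back recovers every file unchanged.
with zipfile.ZipFile(io.BytesIO(blob)) as archive:
    restored = {name: archive.read(name) for name in archive.namelist()}
print(restored == files)
```

Unlike uuencoding or base 64, the result is binary, so an archive still has to be encoded before it can travel through a seven-bit channel.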
Image File Formats Used in Graphical Applications
Some of the oldest image file formats used on computers were created for graphical applications. When computers finally acquired graphics capabilities, the first applications using them were graphic arts programs.
Graphical applications are not limited to photo editing and line art. Computer Aided Design (CAD), medical and scientific imaging fall into this category. Graphical imaging is very demanding on computer hardware. The images have very high resolution and high bit depths. These images need to be easily manipulated for creative purposes or to bring out hidden features.
Even worse from the standpoint of some users is that graphic file formats are unpredictable. Every image editing application has its own proprietary format. Some formats like TIFF have new versions coming out regularly. A large document management system will sooner or later have to deal with a new and undocumented image file format.
AutoCad Drawing File Format. AutoCad is the most popular computer aided design application. It's used to design everything from plastic toys to skyscrapers. The DWG format uses AutoCad's internal libraries to store technical illustrations. This makes it hard for non-AutoCad applications to open up and convert DWG files.
AutoCad Drawing Exchange Format. Another AutoCad file format. Because DWG is not portable, DXF was designed to allow other applications to share files created with AutoCad. DXF stores the elements of an AutoCad image as text. Unlike other AutoCad file formats, DXF is fully documented.
Bitmap. This is the original file format for Windows. Bitmaps can be compressed or uncompressed. Bitmap compression uses run length encoding, similar to Group 3 encoding. Bitmaps can store images of any size and bit depth. In keeping with its DOS origins, the bitmap format is very simple. The latest bitmap file format, designed for Windows 95, makes provisions for color matching software and desktop publishing.
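Run length encoding replaces a row of identical pixels with a count and a value. The Python sketch below shows the general idea on a single row of eight-bit pixels. It is a simplified illustration, not the exact byte layout that Windows bitmaps use, and the function names are hypothetical.

```python
def rle_encode(pixels: bytes) -> list[tuple[int, int]]:
    """Collapse a row of pixels into (count, value) runs of at most 255."""
    runs: list[tuple[int, int]] = []
    for value in pixels:
        if runs and runs[-1][1] == value and runs[-1][0] < 255:
            # Same value as the previous pixel: lengthen the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # Value changed (or the run is full): start a new run.
            runs.append((1, value))
    return runs

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (count, value) runs back into the original row of pixels."""
    return b"".join(bytes([value]) * count for count, value in runs)

row = b"\x00" * 10 + b"\xff" * 3 + b"\x00" * 2
runs = rle_encode(row)
print(runs)                      # [(10, 0), (3, 255), (2, 0)]
print(rle_decode(runs) == row)   # True
```

The scheme pays off on images with long flat areas, such as scanned text. On a noisy photograph nearly every run has a length of one, and the encoded row can end up larger than the original.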
CALS Raster. This monochrome file format was created by the Department of Defense for technical graphics and CAD/CAM applications. The goal is to have a single universal file format for government agencies and contractors. This way everyone could speak the same language.
CALS files are bitmaps that are either uncompressed or compressed with Group 4. There are two versions. The original format is called Type I. The newer version, Type II, allows tiling for quicker downloading across networks.
Dr. Halo. This oddly named file format was designed by Media Cybernetics (Silver Spring, MD 301-495-3305) for advanced imaging applications. Media Cybernetics makes software for scientific applications. The Dr. Halo format is split into two different files. The CUT file contains the pixel data. A separate file, PAL, stores the eight bit color information. The format is well suited to scientific applications where monochrome images are used.
Encapsulated PostScript File. This format was created by the desktop publishing software company Adobe (San Jose, CA 408-536-6000). PostScript is a programming language designed for printing.
The Encapsulated PostScript (EPS) File stores an image with a PostScript header. An EPS file can be sent directly to a PostScript-capable printer or stored on computer disk. When stored on disk, the EPS file may or may not include a preview image. The preview image allows applications that cannot interpret PostScript to "see" the image contained within the EPS file.
Electronic Arts Interchange File Format. Electronic Arts is known as a game and edutainment software publisher. Their Interchange File Format (IFF) encapsulates other image file formats for easy transportability between programs. IFF will hold images, text, fonts, sounds and animation files.
Fractal Image Format. Fractal encoding creates a mathematical description of a bitmap. Although it can be used as a compression algorithm, its primary purpose is to rescale images without losing detail in the textures. Fractal Image Format is a proprietary image file format created by Iterated Systems (Atlanta, GA 404-264-8000) for fractally encoded images.
Hewlett-Packard Graphics Language. For many years Hewlett-Packard (Palo Alto, CA 415-857-1501) has maintained a line of plotters for scientific and technical applications. To operate those plotters, HP created a graphics language. Many CAD applications can output HPGL in a standard file format. There are two versions of HPGL. The newer version supports more plotters and allows for a proprietary compression scheme.
Macintosh Paint. One of the oldest image file formats, Macintosh Paint was created by Apple for its drawing program MacPaint. It is a simple monochrome format that is easily converted for PCs to use. The dimensions are fixed at 576 x 720. At one bit per pixel, the uncompressed image always occupies about 51 kilobytes (576 x 720 / 8 = 51,840 bytes).
MS Paint. MS Paint (MSP) is a drawing program format similar to Macintosh Paint. MS Paint is the monochrome bitmap image file format for Windows. It's slightly newer than Macintosh Paint so it's a little more sophisticated. It is slowly being supplanted by the bitmap format.
Portable File Formats. Jef Poskanzer created four image file formats, Portable Bitmap File Format, Portable Graymap File Format, Portable Any-Map File Format and Portable Pixmap File Format, as intermediate file formats for conversion utilities.
Normally a programmer has to write a custom filter if they want to convert from one image file format to another. If an application must deal with a dozen different formats, the programmer has to write as many as 132 filters, one for each ordered pair of distinct formats (12 x 11).
By using the portable formats as intermediaries, a programmer only needs 24 filters. While these portable formats are not designed for file storage, they do simplify application programming.
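The arithmetic behind the intermediary approach is short. A direct converter is needed for every ordered pair of distinct formats, which for a dozen formats is 12 x 11 = 132 (or 144 if same-format copies are counted). With a common intermediate format each format needs only one reader and one writer. The Python sketch below counts both cases and writes a tiny binary Portable Graymap file to show how simple the intermediate format is; the function names are hypothetical.

```python
def direct_filters(n_formats: int) -> int:
    """Converters needed when every format converts directly to every other."""
    return n_formats * (n_formats - 1)

def hub_filters(n_formats: int) -> int:
    """Converters needed when every format goes through one intermediary."""
    return 2 * n_formats      # one reader and one writer per format

def write_pgm(width: int, height: int, pixels: bytes, maxval: int = 255) -> bytes:
    """Build a binary Portable Graymap file from 8-bit grayscale pixels."""
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match the stated dimensions")
    # The header is plain ASCII: magic number P5, width, height, maximum value.
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    return header + pixels

print(direct_filters(12), hub_filters(12))                 # 132 24
print(write_pgm(2, 2, bytes([0, 85, 170, 255])))
```

The gap widens quickly: the direct approach grows with the square of the number of formats, while the intermediary approach grows linearly.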
Kodak Photo CD. This bitmap-style format was originally created by Kodak (Rochester, NY 716-726-7260) to store scanned photographs on CD-ROM. Photo CD doesn't just specify an image file format; it specifies an entire CD format. CD-ROM technology has improved since Photo CD was developed, but the format has held on. Photo CD is the original tiling format. It stores five different resolutions of the same image: 192 x 128, 384 x 256, 768 x 512, 1536 x 1024 and 3072 x 2048.
PICT. The Macintosh PICT format is similar to EPS files. The PICT format takes a conventional image and wraps it with QuickDraw instructions. QuickDraw is the language Macintoshes use to draw objects to the display. The PICT format is popular because it requires only a minimal amount of programming to get the picture on the display. The display instructions are included in the file.
Adobe Photoshop. The most popular photo image editing software has its own image file format. PSD files are tailored to the way Adobe (San Jose, CA 408-536-6000) Photoshop ($900) works. The file format contains information for multiple layers, color information and masks. Any serious image editing and management application will deal with this format.
Sun Raster Data Format. This is an exclusive Sun Microsystems (Palo Alto, CA 650-786-7737) format. It stores data as a simple bitmap. This format is used in Solaris and other Unix computers.
SGI Image File Format. Like Sun, Silicon Graphics (Mountain View, CA 650-960-1980) developed its own file format. The SGI format stores eight- or 24-bit images in color or black and white.
Targa. This format was created by AT&T to capture and store high quality 24-bit color images. The file format is now owned by Truevision (Santa Clara, CA 408-562-4200), a desktop digital video hardware maker. The Targa format has achieved popularity because it was the first high-quality image format created for PC compatibles.
Tagged Image File Format (TIFF). This is the most ubiquitous and flexible file format in the imaging industry. TIFF was originally developed to store black and white scans. Over the years it has mutated to include color images and all manner of lossless and lossy compressed images. The most recent versions of TIFF let you store color correcting information for desktop publishing. TIFF also supports Group 3 and Group 4 compression for document images. Whatever your application, there will be a TIFF.
TIFF's best advantage, flexibility, is its greatest disadvantage. TIFF formats multiply with every new compression algorithm and business application. This challenges toolkit developers trying to keep up with the file format. It also presents a problem to end users who never know if they'll be able to read the next TIFF to come along. If your application uses TIFF images, either confine yourself to the formats you need or be prepared to track down every TIFF image viewer available.
Microsoft Windows Metafile. This format lets images be transferred between Windows applications. It's also found on non-Windows computers, primarily for compatibility purposes.