August 1999
FIRST LOOKS
A Competitive Choice for PDF
Even before the Web became the way to share and distribute documents, the portable document format (PDF) was already a de facto standard. Now businesses and government agencies alike are scrambling to convert their paper documents and legacy images to PDF.
Enter DocuLex (Winter Haven, FL) and its new scanning and PDF conversion package, PDF.Capture. Released this spring and upgraded to Version 1.3 in June, PDF.Capture is a straightforward scan, index and conversion system with built-in OCR and batch processing features. It's designed for SCSI, TWAIN and Kofax Adrenaline-supported scanners, and it delivers TIFF images as well as PDF Image-only and PDF Image+Text files.
DocuLex is certified by Adobe, the San Jose, CA-based creator of PDF, but the company has designed its own PDF conversion engine rather than licensing Adobe's toolkit. The upside, according to DocuLex, is that its engine is purpose-built for high-speed production use and is up to three times faster than competing engines. (A head-to-head test was beyond the scope of this review, but we hope to compare leading PDF products this fall.)
If you're converting large volumes of documents, one clear advantage of PDF.Capture is its cost: $6,995 complete, with no per-click charges. Adobe Acrobat Capture is cheaper at $895, but that price covers only 20,000 images; each additional 20,000 pages costs $595, or you can license 200,000 pages at $4,995. An unlimited license costs $14,995.
Another alternative is InputAccel EZ from Input Software (San Jose, CA). It's priced at $6,995 including a licensed Adobe engine, but you have to pay per-click charges ranging from $420 for 15,000 pages (2.8 cents per image) down to less than two cents per image at multi-million-page volumes.
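To see how these price points play out at a given volume, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not a vendor price calculator. The function names are ours. Acrobat Capture's 200,000-page bundle is left out because the figures above do not say how it combines with the base license, and InputAccel EZ is charged at the entry-level rate of $420 per 15,000 pages throughout, which overstates its cost at high volumes.

```python
def ceil_div(a, b):
    """Integer ceiling division, so partial page blocks count as whole blocks."""
    return -(-a // b)

def pdf_capture_cost(pages):
    # Flat price, no per-click charges.
    return 6995

def acrobat_capture_cost(pages):
    # $895 covers the first 20,000 images; each further 20,000 costs $595.
    # Assumption: the $14,995 unlimited license acts as a ceiling, and the
    # 200,000-page bundle is ignored.
    extra_blocks = max(0, ceil_div(pages - 20000, 20000))
    return min(895 + 595 * extra_blocks, 14995)

def inputaccel_ez_cost(pages):
    # $6,995 for the software plus per-click charges.
    # Assumption: clicks are bought in 15,000-page blocks at $420 each,
    # with no volume discount, so this is an upper bound.
    return 6995 + 420 * ceil_div(pages, 15000)

for volume in (20000, 100000, 220000, 500000):
    print(volume,
          pdf_capture_cost(volume),
          acrobat_capture_cost(volume),
          inputaccel_ez_cost(volume))
```

Under these assumptions, Acrobat Capture's pay-as-you-go pricing stays below PDF.Capture's flat fee up to 220,000 pages and passes it after that.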
There are drawbacks to the DocuLex engine. Namely, it doesn't support color, it can't handle non-TIFF formats and it can't convert imported files to PDF Normal, one of the three variations of the standard. PDF Normal offers compact, searchable files that are often used in publishing and Web presentation.
If you're dealing with document imaging, PDF.Capture gives you the format most in demand for archival needs. PDF Image+Text gives you an image of the scanned page together with searchable OCRed text hidden behind the bitmap image. This is the ideal format for meeting legal requirements to preserve the original scanned image, according to Adobe.
PDF.Capture also lets you convert to compact PDF Image-only files for cross-platform delivery of non-searchable images. Both varieties of PDFs generated by the DocuLex product can be viewed with Acrobat Reader. If you plan to download the files, there is a 7-bit data structure option to improve modem communications.
As a production tool, PDF.Capture provides nine fields of indexing information. Four of these fields (author, title, subject and keywords) can be incorporated in the PDF metadata. You can also brand some or all of this data directly on the image, either pre- or post-OCR, as part of the software's versatile endorsing feature.
Batches can be set up manually, or you can print out a stock, pre-configured separator sheet from the PDF.Capture File menu. The sheet has a barcode indicating the start of a new batch or document, as well as 13 mark sense choices for document grouping, OCR, resolution and indexing options. As you scan, the software automatically recognizes the document breaks and processes according to the selections you marked off.
You can import and convert existing images with PDF.Capture, but they have to be single-page TIFFs. (Multi-page TIFFs cannot be converted directly, as yet, but DocuLex offers a utility that will break them up for conversion.) There are five methods of importing files and creating new indexes. Between the four standard methods and one user-defined import mode, you can accommodate just about any indexing scheme.
Once the scanned or imported batches are ready for conversion, you can choose from a variety of OCR, annotation, bookmark and default navigation options. The OCR engine is from Expervision (Fremont, CA), and it handles nine major languages. You can add up to 10,000 custom words to the built-in 100,000+ word dictionary.
Installation took less than half an hour. Batch processing with the stock separator sheets worked beautifully. New documents were initiated, and we were prompted to enter or verify the four index fields as indicated on the separator sheet.
We scanned and converted a mix of documents. A 25-page single-column text document with tables and graphics took about two minutes (4.8 seconds per image) to convert to PDF Image+Text complete with OCRing and endorsing. A six-page newsletter with three-column text and a variety of type styles averaged 12 seconds per image.
Our tests were performed on a 233 MHz processor. DocuLex says processing averages as fast as 4 seconds per page with a 450 MHz processor.
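For capacity planning, the per-image timings translate into hourly throughput with simple arithmetic. The sketch below is our own illustration. The helper names are hypothetical, and it assumes a single workstation converting continuously with no operator pauses, which real production runs will not match.

```python
SECONDS_PER_HOUR = 3600

def pages_per_hour(seconds_per_page):
    """Continuous throughput implied by a per-page conversion time."""
    return SECONDS_PER_HOUR / seconds_per_page

def hours_for_batch(pages, seconds_per_page):
    """Conversion time in hours for a batch at a fixed per-page rate."""
    return pages * seconds_per_page / SECONDS_PER_HOUR

# Timings reported in this review (seconds per image).
timings = {
    "text document, 233 MHz": 4.8,
    "three-column newsletter, 233 MHz": 12,
    "vendor figure, 450 MHz": 4,
}

for label, secs in timings.items():
    print(f"{label}: {pages_per_hour(secs):.0f} pages/hour")
```

At 4.8 seconds per image that works out to roughly 750 pages per hour; at 12 seconds, 300; and at the vendor's 4 seconds, 900.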
The final images were identical to the TIFFs, but you could select, cut and paste the text. Searches in Acrobat Reader showed the OCR to be accurate for all but a few display typefaces.
A light version of PDF.Capture (without import) is being bundled with Ricoh's ISO1 mid-range scanner as well as with the Savin 35 and Savin 45 digital copiers. A free trial version (without scanning) can be downloaded from the DocuLex Web site.
If you have more sophisticated applications requiring customization and automation, DocuLex offers a Control Module for $2,995. It lets you set up and maintain document and file naming conventions, image specs, scripting rules, batch management and quality control processing. The module maintains authority over multiple scanning stations and multithreaded processing of, for example, PDF conversion across several workstations for higher throughput. The module also provides extensive export capabilities.
PDF.Capture is not a full-blown capture solution without the optional Control Module, but as a purpose-built PDF conversion tool, it is cost-effective and easy to use. DocuLex has its roots (and significant market share) in the legal community, but this product will help bring the company into the document solutions mainstream.
Doug Henschen