Intelligent Enterprise featuring Transform

June 2001

SCAN & CAPTURE

Make Use of PDF Image Content

by Adam Throne

PDF is the de facto standard for publishing images online, but many users are unaware that there are three types of PDF files: Image Only, Image Plus Text and PDF Normal. Image Only files do not include optical character recognition results, so they are usually the smallest of the three file types.

While some businesses and government agencies choose to post Image Only PDFs online - to speed viewing and spare network bandwidth - this leaves users frustrated when they attempt to search or copy text. Enter iCopy, an Acrobat Reader plug-in from Image Solutions Inc. (ISI) of Morristown, NJ.

ICopy performs optical character recognition on image files within Acrobat Reader, and it can copy and paste the resulting text (as well as images) into any word processor. ICopy lets you extract as much as an entire document, and it also works with the other two PDF file types by copying the available text.

Quick Scan

Supplier: Image Solutions Inc., Morristown, NJ, 973-292-6444, www.imagesolutions.com

Product: iCopy

Description: Acrobat Reader plug-in that lets you recognize (OCR) and copy text from Image Only or text-based PDF files.

Strengths: Fast, reliable, inexpensive recognition of text within PDF Image Only files and easy copying of images. Support for nine languages.

Weakness: Doesn't recognize multicolumn formatting.

Price: Starts at $95 per seat; volume discounts available.

ICopy's plug-in design makes it easy to use. Once installed (on Acrobat Reader version 3.02 or higher), the plug-in adds three buttons to the Acrobat Reader toolbar. One button lets you select the portion of text you wish to copy; the image snippet is immediately recognized and the results are saved to the clipboard for pasting into a word processor. If you wish to copy multiple paragraphs, you can maintain the paragraph structure by selecting an "additional line breaks between paragraphs" setting.

Another button handles multipage documents. You can choose to run optical character recognition (OCR) on some or all of the pages in the document, and you can save the results to the clipboard or to a text file. We ran OCR on a text-heavy 100-page document in about four minutes on a 650 MHz PC.
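For planning purposes, that test figure works out to roughly 2.4 seconds per page. The short Python sketch below does the arithmetic; the function names are ours, and the projection assumes processing time scales linearly with page count on similar hardware and similar page content, which the test did not verify.

```python
def seconds_per_page(pages: int, minutes: float) -> float:
    """Average OCR time per page, in seconds."""
    return minutes * 60 / pages


def estimate_minutes(pages: int, sec_per_page: float) -> float:
    """Projected OCR time in minutes, assuming linear scaling (an assumption)."""
    return pages * sec_per_page / 60


# Figures from our test: 100 text-heavy pages in about four minutes.
rate = seconds_per_page(100, 4)
print(f"{rate:.1f} seconds per page")

# Hypothetical 250-page document at the same rate.
print(f"{estimate_minutes(250, rate):.0f} minutes for 250 pages")
```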

You can adjust for varying resolution levels up to 400 dpi, but as in any OCR operation, accuracy varies depending on type size, font and image quality. ICopy recognized 10- and 8-point type with high accuracy, but was less accurate with 6- and 4-point type.

A properties menu offers choices including an OCR confidence setting. When recognition confidence falls below the threshold you select, iCopy can insert a character of your choice (such as an asterisk) into the text. This lets you use a spell checker to find the doubtful spots and review the accuracy of the results. There is also an anti-alias text setting for grayscale, and iCopy works in nine major European languages.
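The idea behind a confidence threshold can be sketched in a few lines of Python. This is not iCopy's code: the per-character confidence scores, the function name and the choice to substitute the marker for the doubtful character are all assumptions made for illustration.

```python
from typing import List, Tuple


def mark_low_confidence(
    chars: List[Tuple[str, float]], threshold: float, marker: str = "*"
) -> str:
    """Return the recognized text, with the marker substituted for every
    character whose confidence score is below the threshold."""
    return "".join(ch if conf >= threshold else marker for ch, conf in chars)


# Hypothetical OCR output: (character, confidence between 0 and 1).
recognized = [
    ("P", 0.99), ("D", 0.97), ("F", 0.41), (" ", 1.0),
    ("f", 0.95), ("i", 0.38), ("l", 0.92), ("e", 0.96),
]

print(mark_low_confidence(recognized, threshold=0.5))  # PD* f*le
```

Because the marked words are no longer valid words, a spell checker will stop on each of them, which is how the review step described above would work.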

A third iCopy button allows you to copy and paste photographs, graphics, nonmachine-readable text or tables. You can adjust the resolution and anti-alias settings of these images.

The one thing that frustrated us about iCopy is that it won't recognize multicolumn formatting - though this is a common complaint about many PDF optical character recognition systems. The OCR engine scans all the way across the page - stringing together disjointed lines of text - rather than finishing one column and then returning to the top of the page to read another. The only way around this is to manually copy and paste one column at a time.
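To see why the reading order matters, consider the Python sketch below. It is only an illustration of the two orderings, not a description of how iCopy or any other OCR engine works internally; the `Line` type, the coordinates and the fixed `gutter_x` column boundary are hypothetical.

```python
from typing import List, NamedTuple


class Line(NamedTuple):
    x: float   # left edge of the text line
    y: float   # top edge, increasing down the page
    text: str


def read_across(lines: List[Line]) -> List[str]:
    """Order lines top to bottom, then left to right: the behavior
    that strings two columns together."""
    return [l.text for l in sorted(lines, key=lambda l: (l.y, l.x))]


def read_by_column(lines: List[Line], gutter_x: float) -> List[str]:
    """Finish the left column before starting the right one, assuming
    a two-column page split at gutter_x."""
    left = sorted((l for l in lines if l.x < gutter_x), key=lambda l: l.y)
    right = sorted((l for l in lines if l.x >= gutter_x), key=lambda l: l.y)
    return [l.text for l in left + right]


page = [
    Line(50, 100, "Left column, line 1"),
    Line(320, 100, "Right column, line 1"),
    Line(50, 120, "Left column, line 2"),
    Line(320, 120, "Right column, line 2"),
]

print(read_across(page))
print(read_by_column(page, gutter_x=300))
```

Selecting and copying one column at a time, as suggested above, amounts to doing the column split by hand.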

ICopy starts at $95 per seat, with volume discounts available for five or more users. While it has been most popular in governmental and legal research applications, iCopy is an answer for anyone who frequently encounters PDF files that can't be searched or copied.

 




A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005