Intelligent Enterprise featuring Transform
February, 1997

Text Retrieval Tackles Intranets

Text retrieval is a critical technique for helping your users find documents on your corporate intranet. Doculabs explains the key factors to look for in an intranet text retrieval product.

So you're thinking about using an intranet to share your corporate documents. Have you thought about how your users will find what they're looking for?

In the vast public Internet world, there's a host of well-known HTML indexing and search tools, such as Lycos, InfoSeek and AltaVista. But in a real-world corporate intranet, you'll find documents in all sorts of non-HTML formats, such as word processing documents, spreadsheets, presentations and scanned image files. In fact, it's been estimated that up to 80% of a company's documents reside in these unstructured formats.

So how will your users be able to search for the documents they need over your corporate intranet?

Enter text indexing and retrieval tools from such vendors as

Dataware (Cambridge, MA 617-621-0820),
Excalibur (Vienna, VA 703-761-3700),
Fulcrum (Ottawa, Canada 613-238-1761),
Inmagic (Woburn, MA 617-938-4442),
PLS (Rockville, MD 301-990-1155),
Verity (Sunnyvale, CA 415-960-7600)
and ZyLAB (Rockville, MD 301-590-0900).

These packages are designed to help you index your document repositories and let your users search those indices directly from the PCs on their desks with industry-standard Web browsers.

This means your users can easily search for the documents they need, saving time and making your employees more productive.

What Is Text Retrieval?

The concept behind text retrieval tools is simple: Let users search indices of your documents. Indexing is the key to increasing search performance across large document repositories. Scanning every document in a large repository for each query is slow. Searching a compressed full-text index that was built ahead of time is much faster.
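To make the difference concrete, here is a minimal sketch of the idea in Python. It assumes a simple inverted index (a map from each word to the documents that contain it), which is the general technique rather than any particular vendor's implementation. Commercial products add compression, relevance ranking and filters for non-HTML formats. The file names and the functions tokenize, build_index and search are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query word (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # Start with the hits for the first word, then intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample repository: file name -> extracted text.
documents = {
    "memo.doc": "Quarterly sales memo for the Chicago office",
    "plan.xls": "Sales forecast and budget plan",
    "scan.tif": "Scanned contract for the Chicago lease",
}

index = build_index(documents)          # done once, ahead of time
print(sorted(search(index, "chicago sales")))
```

The index is built once, and each query then touches only the entries for the words it contains instead of rereading every document.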

Search mechanisms provided by your operating system (such as the Find feature in Windows 95) do not create indices, so searching can be slow, and you're limited to searching one machine or logical drive at a time. With indexing and retrieval tools, you can create indices for documents anywhere and get good searching performance.

Many text indexing and retrieval products grew out of the CD publishing market, where indices offered a way to help users quickly find specific information stored on slower media.

Today, most of the major text retrieval vendors offer Internet implementations, letting organizations deploy them in their corporate intranets.

While similar in concept, text indexing and retrieval products differ in functionality and approach, and different tools are suited to different applications.

When sizing up text indexing and retrieval tools to use in your intranet, take a good look at your organizational requirements and ask questions.

For example, what kind of documents are you trying to manage? What level of searching functionality do you need? Is security an issue? Do you have the resources to set up and manage the system? Do you want your index database accessible to other systems? Do you need to customize your interfaces? The answers to these questions will narrow your choice. Then you can look for products that meet your needs.

What Kind of Problems Do They Solve?

Internet indexing and text retrieval products do a great job solving some specific business problems:

There's too much data... We're suffering from information overload. Particularly on the Internet, there's so much information out there it's difficult to find what you're looking for.

Searching for "business process reengineering" and getting a hit list of 1,000 items doesn't get you any closer to answering your BPR question.

This problem is made worse by search utilities that are primitive and difficult to use, requiring knowledge of specific operators.

But today's search tools are designed to address these shortcomings. Beyond standard Boolean searching, vendors now offer advanced search techniques that refine searching and return more manageable hit lists.

PLS offers "concept" searching that creates a list of words statistically related to the original query -- giving you a new list of words to search on that you might not have thought of originally. Inmagic has powerful fielded searching and lets you link and relate fields with one another. This means you can link a particular image to a particular record. Products from Verity and PLS can return summaries of documents, with the user controlling the summary length.
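As a rough illustration of how a list of statistically related words might be produced, the Python sketch below counts which words share documents with the query word. This is an assumption made for illustration, a simple co-occurrence count, and not a description of how PLS or any other vendor actually implements concept searching. The sample documents, the stopword list and the related_terms function are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def related_terms(documents, query_word, stopwords, top_n=3):
    """Rank words by the number of documents they share with query_word."""
    counts = Counter()
    for text in documents:
        words = set(tokenize(text))
        if query_word in words:
            # Count every other non-stopword appearing in the same document.
            counts.update(words - {query_word} - stopwords)
    # Sort by count (highest first), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


# Hypothetical sample collection.
documents = [
    "Business process reengineering improves workflow",
    "Workflow automation supports reengineering projects",
    "Reengineering the claims workflow with imaging",
    "Imaging systems for scanned documents",
]
stopwords = {"the", "with", "for"}

print(related_terms(documents, "reengineering", stopwords, top_n=2))
```

A user who searched on "reengineering" could then be offered "workflow" as an additional search term.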

Advanced search techniques like these have gone a long way toward helping users zero in on the information they want.

Text retrieval products are getting easier and easier to use. Many tools now offer "natural language" searching, which means you can type your searches in plain English, rather than learning clever operators.

...on too many systems... Historically, text retrieval was embedded as a feature within specific applications. But with the new generation of text retrieval products, we see truly powerful applications for indexing and searching across document repositories and formats. This is a major departure with key organizational benefits.

Fulcrum FIND! even lets you submit a single query from a browser to search simultaneously across multiple heterogeneous data stores (such as file systems, groupware repositories and the Web). It returns a combined hit list.
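The general pattern of one query fanned out to several stores and merged into one hit list can be sketched in Python as follows. This is not Fulcrum's API. The make_source helper, the store names and the scoring rule (the fraction of query words found in a document) are assumptions made for illustration, and the sketch assumes every store returns scores on the same 0-to-1 scale, which real heterogeneous engines generally do not.

```python
def make_source(docs):
    """Build a stand-in search function over a dict of title -> text."""
    def search_fn(query):
        terms = query.lower().split()
        hits = []
        for title, text in docs.items():
            words = set(text.lower().split())
            matched = sum(1 for term in terms if term in words)
            if matched:
                # Score = fraction of the query words found in the document.
                hits.append((title, matched / len(terms)))
        return hits
    return search_fn


def federated_search(query, sources):
    """Send one query to every source and merge the hits into one list."""
    combined = []
    for source_name, search_fn in sources.items():
        for title, score in search_fn(query):
            combined.append(
                {"source": source_name, "title": title, "score": score}
            )
    # Highest score first; equal scores keep the order of the sources.
    combined.sort(key=lambda hit: hit["score"], reverse=True)
    return combined


# Hypothetical data stores standing in for a file system, groupware and the Web.
sources = {
    "file system": make_source({
        "budget.xls": "travel budget 1997",
        "memo.doc": "holiday schedule",
    }),
    "groupware": make_source({"thread-12": "budget review meeting"}),
    "web": make_source({"index.html": "travel policy and travel budget"}),
}

for hit in federated_search("travel budget", sources):
    print(hit["score"], hit["source"], hit["title"])
```

The user submits one query and sees one list, with each hit labeled by the store it came from.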

...with too rigid search tools... And because text retrieval products are robust applications, you can do more things with them than with an embedded search feature.

Many products offer customizable HTML forms and macros for modifying the user interface. Tailor your search forms for the needs of your organization and your users. Some products, like Dataware's NetAnswer, even let you present different search interfaces to different users, based on parameters like user ID or logon. Thus, you can present a complex search form to an advanced user, while passing a more basic form to a novice or external user.

...that are too hard to maintain... Administration is getting easier. Setup and indexing have historically required scripting, but more and more functions are being incorporated into easy-to-use graphical interfaces. Products like ZyLAB's ZyINDEX for Internet offer a simple installation routine and easy-to-follow Wizards that let less technical people and remote administrators set up the system.

And by exposing indices to standard query tools like SQL, proprietary index databases can be accessed by outside applications.

...and take too much effort to access. Typically, it's up to the user to initiate a search. But wouldn't it be nice if users could set up searches that periodically execute on their behalf?

We're talking about search agents, the next big thing in text retrieval. The first step was the ability to save searches for reexecution -- relatively standard functionality today.

The next step was to let users set up personal search agents that execute automatically and deliver the results directly to their desktops. Verity, PLS and Excalibur all offer ways to monitor specific areas and notify users of changes and updates. For example, agents can monitor newsfeeds or stock tickers.
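In outline, such an agent is a saved query plus a record of what it has already reported. The Python sketch below assumes a hypothetical SearchAgent class and a stand-in newsfeed list. It does not reflect any vendor's product, and a real agent would be driven by a scheduler and deliver results by e-mail or to the desktop rather than returning them to the caller.

```python
class SearchAgent:
    """A saved search that reports only hits it has not reported before."""

    def __init__(self, owner, query, search_fn):
        self.owner = owner          # user the results are delivered to
        self.query = query          # the saved search
        self.search_fn = search_fn  # any function: query -> list of hits
        self.seen = set()           # hits already reported

    def run_once(self):
        """Run the saved query and return the new hits, sorted by title."""
        hits = set(self.search_fn(self.query))
        new_hits = sorted(hits - self.seen)
        self.seen |= hits
        return new_hits


# Hypothetical newsfeed that grows over time.
newsfeed = ["Acme Corp reports quarterly earnings", "Markets close higher"]


def search_newsfeed(query):
    """Stand-in search: case-insensitive substring match on each headline."""
    return [item for item in newsfeed if query.lower() in item.lower()]


agent = SearchAgent("jsmith", "acme", search_newsfeed)

first = agent.run_once()                       # initial run
newsfeed.append("Acme Corp announces merger")  # a new item arrives later
second = agent.run_once()                      # scheduled rerun
third = agent.run_once()                       # nothing new since last run

print(first, second, third)
```

Because the agent remembers what it has already delivered, each scheduled run notifies the user only of items that are new since the previous run.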

This represents a major shift in focus. Now, instead of users actively "pulling" the information they want, agents can automatically "push" this information to the users. So text retrieval is evolving from being a helpful utility into a true productivity tool.

Are We There Yet?

Not quite. The problem of finding information keeps getting bigger and bigger. Many organizations have already invested significant resources in creating indices for their unstructured data.

One of the biggest challenges for the text retrieval market is to leverage an organization's existing indexing investment. Unfortunately, users still need to be aware of the type of data they're looking for, know which repository to search and submit separate searches to different repositories. We'd like to see tools that let you search multiple repositories and indices and return a combined hit list. Every organization is a sea of "islands" of information and repositories. Text retrieval could provide a means to bridge the islands.

Another challenge is that, generally, the system administrator still has to categorize information and set up indexing. These tasks should be handled by a person more familiar with the content. Making indexing tools simpler will put the power of index management in the hands of content managers.

Then there's the Microsoft question. Microsoft is scheduled to offer an indexing and retrieval package with BackOffice soon. Microsoft's entrance into the market indicates how mainstream text retrieval has become. It will be interesting to see which vendors thrive after Microsoft jumps into the fray.

James Watson Jr., Jeetu Patel and Joe Fenner are analysts with Doculabs (Chicago, IL 312-433-7793, info@doculabs.com). They specialize in electronic document management.






A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005