Intelligent Enterprise featuring Transform
February 2002

Four Organizations Tackle 'Too Much' Information

by Jeff Morris

If there's one clear need in search and retrieval technologies today, it's for the ability to search across an enterprise, tapping both disparate databases and multiple content sources.

In fact, Nick Wilkoff, an analyst for TechRankings Research at Forrester Research, says the Cambridge, MA-based analysis and consulting firm is now focusing on "enterprise content management," which encompasses Web content, digital asset management and legacy content management.

"We're seeing a convergence," says Wilkoff. "A number of unstructured areas, including legal documents and contracts, were typically managed offline; now, more and more of these documents are being published online on Internet portals and intranets. Search engines now allow searching across these different repositories."

Wilkoff emphasizes that enterprises need to concentrate on integrating the repositories and on enforcing some sort of standard taxonomy for describing content. This was a significant challenge at AMR Research, Boston, where taxonomy and search technology are helping analysts stay abreast of internal and external research available in disparate formats.

"Doing a search against one Web site under your control is very different from searching through all the information published by your company in the last five years," points out Scott Lundstrom, AMR's chief technology officer. "There are multiple formats — product pricelists, product history, and so on — that don't exist on the Web."

Forrester's Wilkoff also emphasizes the importance of being able to search through rich media. "The technology in this area is maturing," he says, "with digital asset management adding context around extremely unstructured data types."

Video is one of the fastest-growing sources of rich media in both corporate and government settings, yet it is one of the most challenging types of information to turn into a searchable resource. Visual search technology has helped the National Aeronautics and Space Administration (NASA) organize and provide fast access to round-the-clock video feeds from the International Space Station.

You might think HighWire Press would have an easier time than AMR or NASA in providing effective searching. The online publisher's www.highwire.org site is focused exclusively on scholarly journals — text-centric documents that are all converted to the same format for Web publication. Yet John Sack, HighWire's director, says that conventional search capabilities were not enough.

"The problem is that, in general, sites without search engines tend to be more usable than sites with search engines," says Sack. "Consumers, especially, tend to type in the wrong search terms, and come away thinking that there's nothing available. Browsing is a much more effective method of finding what you need."

HighWire solved its search challenges by relying on technology to suggest categories so site visitors can drill down to exactly what they need.

When members of the Certified General Accountants of Ontario (CGA) can't find what they need online, they don't give up; they call up. That has meant added cost and chaos for the organization's call center. With the help of search technology, the organization has put the "self" in its online self-service efforts.

While AMR, NASA, HighWire and the CGA all faced very different challenges, they all found answers with better search technology. Read on to find out how they turned "too much information" into strategic assets.

AMR Takes Its Own Advice

CASE STUDY: AMR Research
CHALLENGE: Track and retrieve documents internally and pull information from external sites
SEARCH PRODUCT: Autonomy
VENDOR: Autonomy, San Francisco, 415-243-9955, www.autonomy.com

"This is a case of an analyst actually putting his money where his mouth is," says Scott Lundstrom, chief technology officer of Boston-based AMR Research, one of the leading high-tech analyst firms. According to Lundstrom, most analyst firms were once small service organizations that typically ran without much infrastructure at all.

"But in the late '90s, the analyst industry went through an awakening, realizing that we had to become consumers of our own information," Lundstrom says. "The basis of our business is information on vendors, but the average analyst is overwhelmed with information. In a given month, hundreds of different outlets may publish documents on a particular company. In addition, we have a repository of our own, as our analysts meet with individual vendors. One or two analysts may meet with a company, but their notes might be of interest to the entire analyst group; there's no clear delineation among markets or analysts."

Thus, the challenge is grabbing as much information as possible from corporate Web sites and newswires, adding all the notes generated by analysts, and then tagging all of this information and making it available in a collaborative environment.

That environment made for an additional challenge. "Analysts [would sometimes] start working on reports about a particular market segment not knowing that someone else was already working on the same topic," Lundstrom explains. "We needed to have a way to avoid this kind of wasteful, and embarrassing, duplication."

AMR began a selection process in the fall of 2000, considering search technologies from such providers as Microsoft, AltaVista, Verity and Inktomi. But it was difficult to find a solution that satisfied requirements for use in a collaborative environment and provided an internal management scheme.

"The system had to be able to figure out what documents were about and assign tags automatically; we didn't want to have to administer meta tags by hand," Lundstrom emphasizes.

After four months, AMR decided that Autonomy best met its needs. One criterion was the "services-to-application" ratio: "Through our experience with Autonomy, we knew that the implementation cost was a little less than one times the license fee," says Lundstrom.

Autonomy's software was implemented within 90 days, initially covering categorization of content related to 100 vendors. Since then, the number of vendors covered has continued to expand, and keyword and concept search capabilities are now being extended to AMR's data via its public Web site.

AMR's collaborative needs are now satisfied by the system's real-time functionality. "As an analyst is saving [a Word] file into a particular directory on the server, a window pops up to say, 'Here's what we've written on this topic in the last month,'" Lundstrom explains. "That analyst can see what everyone else is doing."

The system is essentially tagging documents and making them available as they're written, which provides analysts with background information, prevents duplication of effort and reduces the need for analysts to do research up front.
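AMR's save-time alert ("Here's what we've written on this topic in the last month") can be approximated, very loosely, with a bag-of-words similarity check. The sketch below is not Autonomy's algorithm, which the article does not describe; it assumes plain-text documents, a naive tokenizer, TF-IDF weighting with cosine similarity, and a hypothetical `related_on_save` function with an illustrative 0.2 threshold.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each doc id to a {term: tf * smoothed idf} weight dict."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(counts)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: one count per document
    return {
        doc_id: {term: tf * (math.log((1 + n) / (1 + df[term])) + 1.0)
                 for term, tf in c.items()}
        for doc_id, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_on_save(new_id, new_text, archive, threshold=0.2):
    """Hypothetical hook: archived docs similar to the one being saved, best first."""
    corpus = dict(archive)
    corpus[new_id] = new_text
    vectors = tfidf_vectors(corpus)
    scores = [(doc_id, cosine(vectors[new_id], vectors[doc_id])) for doc_id in archive]
    return sorted((s for s in scores if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    archive = {
        "note-1": "Supply chain planning vendor announces new demand forecasting module",
        "note-2": "ERP vendor quarterly earnings beat expectations",
    }
    print(related_on_save("draft", "Report on demand forecasting in supply chain planning", archive))
```

Only `note-1` clears the threshold here, because the draft shares no terms with `note-2`. A production system would also restrict matches to the last month and use far richer concept models than raw word overlap.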

Assessing return on investment, Lundstrom says client research managers can now process a larger number of inquiries in a shorter period of time — an advantage that should pay for the software within two years. More importantly, "at least four times since the implementation, the system has prevented us from issuing something publicly that was either incorrect or a duplication," Lundstrom adds. "There's no way to put a price on that."

NASA Taps Mission-Critical Video

CASE STUDY: NASA/InDyne
CHALLENGE: Allow NASA scientists and engineers to locate specific segments of space mission videos and give them remote access to their selections.
SEARCH PRODUCT: Convera Screening Room
VENDOR: Convera, Vienna, VA, www.convera.com

Back in 1998, NASA could see it coming. It would be big — no, it would be massive. And it couldn't be stopped.

Nor would anybody want to stop it. After all, the International Space Station would mark the dawn of a new era in space exploration. Still, to Silvia Stewart, video repository supervisor at the Houston-based Johnson Space Center, and her five staff members, the impact would be comparable to a giant meteor.

"It scared us," recalls Stewart. "Until then, we had been responsible for videos from about six space shuttle missions each year. But we knew that the space station would be operating 24 hours a day, seven days a week, 365 days a year — and would be capable of [simultaneously] downloading four channels of video. How would we be able to handle all of that, plus the shuttle missions?"

Stewart's trepidation was justified. She and her group (who are actually employed by InDyne, a McLean, VA-based service provider that counts NASA among its largest clients) were charged, among other duties, with writing scene-by-scene text descriptions of all mission videos. Those descriptions are essential to assist NASA engineers and scientists in finding specific sections of video needed to support crucial decisions and plan future missions. Stewart reports that since the Expedition One crew first opened the Space Station hatch on November 2, 2000, her department's workload has tripled.

Fortunately, they were prepared. Planning for the project and for the broader switch to digital video technology began two years before that first Space Station expedition. A lengthy vendor selection process culminated in the implementation of Screening Room software from Vienna, VA-based Convera (www.convera.com) in September 2000.

Screening Room is a modular video content management solution incorporating four integrated components: advanced video capture and analysis, intelligent indexing, acceleration of the video production process and publishing of video content to the Web. Video assets are controlled from a single platform by the creation of a digital archive.

Mission videos, meaning anything documenting activities in a manned space flight (including launch, landing, video downlinks and all crew recordings), account for half of the content in the repository that Stewart oversees. The other half is ground-based content. As downlink video is captured, members of Stewart's staff working in three shifts write scene-by-scene descriptions in real time. These descriptions are entered through Screening Room's Edit client directly into the Capture client, which captures the video and creates a storyboard of key frames. A Browser client provides the system's Web interface.

It's clear, says Stewart, that the system has provided benefits, not the least of which has been the ability to handle the Space Station workload without an increase in staff. But perhaps more importantly, it has allowed Mission Control personnel to review material earlier if, for example, they need to analyze a hardware problem.

"We traditionally had to fill requests from Mission Control by searching through the database, pulling the appropriate tapes and displaying them in a screening room," notes Stewart. "Now, anyone [using the browser client] can access the video file and storyboards, do their own search and view the video segments remotely. As soon as a video downlink ends, we post the video to the server; since most downlinks last an hour or less, those videos are viewable within the hour."

Remote access has extended fast video access beyond Mission Control to NASA engineers, mission analysts, mission planners and those involved with the shuttle and space station robotic arms. Remote access covers many NASA locations, including the Ames Research Center in Mountain View, CA, the Jet Propulsion Laboratory in California and the Kennedy Space Center in Florida.

Stewart expects one of the biggest paybacks of the Convera system will be in savings generated by digital video production.

"Projects often require that a compilation tape be produced consisting of numerous scenes and segments scattered across a lot of different tapes," she explains. "We research an engineer's request and give the video production facility a list containing all the start/stop times. Production then takes the list into the dub room, pulls off all the segments specified, and copies them onto one tape. It's really labor intensive. But now Convera generates an edit decision list, which is a key element of digital video production. We now have an editing system that can read the decision list, find the right files and communicate directly between our MPEG files and production's. We no longer need to do it all by hand."
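The edit decision list Stewart describes is, at heart, a list of source segments plus the positions they will occupy on the finished compilation. A minimal sketch, assuming 30 fps non-drop-frame timecode and hypothetical clip names; the columns are loosely modeled on a conventional EDL event (event number, source, source in/out, record in/out), not on Convera's actual output format.

```python
from dataclasses import dataclass

FPS = 30  # assumption: 30 fps non-drop-frame timecode, for simplicity


def tc_to_frames(tc: str) -> int:
    """Convert HH:MM:SS:FF timecode to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * FPS + ff


def frames_to_tc(frames: int) -> str:
    """Convert an absolute frame count back to HH:MM:SS:FF timecode."""
    ff = frames % FPS
    total_seconds = frames // FPS
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


@dataclass
class Segment:
    source: str   # hypothetical clip/file identifier
    src_in: str   # start timecode within the source
    src_out: str  # end timecode (exclusive) within the source


def build_edl(segments, record_start="01:00:00:00"):
    """Lay the requested segments end to end on one compilation timeline."""
    events = []
    rec = tc_to_frames(record_start)
    for number, seg in enumerate(segments, start=1):
        length = tc_to_frames(seg.src_out) - tc_to_frames(seg.src_in)
        if length <= 0:
            raise ValueError(f"segment {number} has non-positive length")
        events.append((number, seg.source, seg.src_in, seg.src_out,
                       frames_to_tc(rec), frames_to_tc(rec + length)))
        rec += length  # the next segment starts where this one ends
    return events


if __name__ == "__main__":
    request = [
        Segment("downlink_a.mpg", "00:10:00:00", "00:10:30:00"),
        Segment("downlink_b.mpg", "00:02:15:10", "00:02:20:10"),
    ]
    for num, src, s_in, s_out, r_in, r_out in build_edl(request):
        print(f"{num:03d}  {src:<16} {s_in} {s_out} {r_in} {r_out}")
```

The record columns are what replaces the manual dubbing step: an editing system reading this list knows both where to pull each segment from and where to place it on the output.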

It doesn't take a rocket scientist to see that digital asset management has translated into one giant leap for NASA.

Keywords Don't Cut It at HighWire

CASE STUDY: HighWire Press
CHALLENGE: Help clients find the information they need among 13 million research papers.
SEARCH PRODUCT: Semio Taxonomy
VENDOR: Semio Corp., San Mateo, CA, 650-638-3330, www.semio.com

HighWire Press isn't actually a press, but it certainly has to perform a high-wire act. The nonprofit organization puts half of the world's most prestigious scientific, engineering and medical journals online. It then has to help people find exactly what they're looking for among the many long, complicated articles containing figures and thousands of words. The online database currently includes more than 13 million documents.

"It's really like finding the proverbial needle in a haystack," says John Sack, HighWire's director.

HighWire is a department within Stanford University akin to the Stanford University Press. The site began with a single online journal in 1995, and it now produces 296 journals online, with many more planned. Several years ago it became clear that HighWire's original search engine was not producing the needed results.

"A keyword search [wouldn't] cut it," says Sack. "That only worked when there wasn't much material online. Any term thrown at the search engine brought back hundreds of hits, and we realized that results/relevance ranking wasn't really helping people get to the right stuff."

Sack's solution was to give people alternative search terms with the help of taxonomy technology from Semio, San Mateo, CA. The effort began with 5,000 concepts and has since expanded to contain some 20,000 concepts. Developing the taxonomy has been a painstaking process; it's taken nearly two years to process 12 million records and finally get the new HighWire site to the beta stage. Was it worth it?

"Absolutely," Sack affirms. "It was essential that we give researchers the ability to see exactly what topics are covered in an article and to narrow their searches accordingly."
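The category drill-down HighWire relies on can be illustrated with a result set whose documents already carry taxonomy concepts: count the concepts across the hits, offer the most common ones as refinements, and filter when the reader picks one. This is a hypothetical sketch of the general idea, not Semio's product; the `concepts` field, the function names and the sample data are all assumptions.

```python
from collections import Counter


def suggest_concepts(hits, top_n=5):
    """Count how many hits carry each concept; return the most common first."""
    counts = Counter()
    for hit in hits:
        # dict.fromkeys drops duplicate tags within one hit, keeping order
        for concept in dict.fromkeys(hit["concepts"]):
            counts[concept] += 1
    return counts.most_common(top_n)


def narrow(hits, concept):
    """Keep only the hits tagged with the chosen concept."""
    return [hit for hit in hits if concept in hit["concepts"]]


if __name__ == "__main__":
    hits = [
        {"id": "a1", "concepts": ["apoptosis", "p53"]},
        {"id": "a2", "concepts": ["p53", "tumor suppressor"]},
        {"id": "a3", "concepts": ["apoptosis", "caspase"]},
        {"id": "a4", "concepts": ["p53", "caspase"]},
    ]
    print(suggest_concepts(hits, top_n=3))
    print([hit["id"] for hit in narrow(hits, "caspase")])
```

Instead of scrolling through every hit for a keyword, the reader sees which topics dominate the results and can cut the list down in one step.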

As a nonprofit institution, HighWire doesn't measure returns in hard dollars. "Our goal is to help people find what they need," says Sack. "Someone about to do an experiment needs to know if it has already been done. By allowing searches to quickly be narrowed to a specific set of topics, I think we've succeeded."

E-Learning Starts with Search

CASE STUDY: Certified General Accountants of Ontario
CHALLENGE: Index and access one gigabyte of course materials and supporting information
SEARCH PRODUCT: Fulcrum KnowledgeServer from Hummingbird
VENDOR: Hummingbird, North York, Ontario, Canada, www.hummingbird.com

Go to the Web site of the Certified General Accountants of Ontario (www.cga-ontario.org) and click on "search." When you arrive at the search engine, click to expand the folder marked "Web Sites." You'll find a list of 13 additional folders, and you'll get a good idea of how Fulcrum KnowledgeServer helped make a broad range of materials easier to find.

The Certified General Accountants (CGA) of Ontario is a self-governing body that guides the professional standards, conduct and discipline of its approximately 13,000 members and 8,000 students in the province of Ontario.

CGA designation is achieved by completing assignments and national examinations, passing a comprehensive final exam, fulfilling practical work experience requirements and meeting a university degree requirement. To meet all these requirements, members need information — and lots of it.

According to Boyd Dyer, manager of Web technology, CGA must furnish members with an incredible amount of supporting documentation, including course materials.

After converting paper-based course material to a Web-presentable form in the summer of 2000, the CGA discovered it had nearly a full gigabyte of information on its Web site. As call after call came in asking, "Where do I start looking?", it became clear to Dyer that something had to be done.

The CGA decided on Hummingbird and its Fulcrum KnowledgeServer technology. The fact that Hummingbird was a local company certainly entered into the decision, notes Dyer. But, he says, "a key selling point was that KnowledgeServer accepts multiple formats; it's not just text-based. And we upload many media commercials, PDF files, Word documents, executable files, MP3s ... our content runs the gamut."

Dyer says he did the installation himself in December 2000 and had the system up and running within one week. Since then, he's upgraded to newer versions; with the latest version, he says, all he needs to do is create the basic index, point the software toward the site, and it does everything else. The result, says Dyer, has been to free up a lot of the call center's time.

"Users can now find a lot of the information they need by themselves," he reports. "Our call center is always overburdened; using KnowledgeServer for search and retrieval has reduced that burden and made it more manageable."


Spotting the Trends in Search

The latest trend in search — according to Susan Feldman, doyenne among classification and retrieval analysts — is a move toward hybrid systems. "There are holes in every algorithm," notes Feldman, director of content and retrieval software research at IDC, Framingham, MA. "It doesn't matter what you're using them for; every algorithm is good at some things and not at others. Hybrid systems make up for some of those faults by looking at things from multiple viewpoints."

The use of several complementary algorithms to classify and tag information can produce more accurate retrieval results, says Feldman. For example, a mixture of natural language processing techniques with statistical processing can make up for deficiencies in either. In Feldman's opinion, one of the best examples of a hybrid system is Stratify (formerly PurpleYogi), Mountain View, CA, which uses four different algorithms for categorization. Other vendors offering a hybrid approach to classification include MediaSite, Pittsburgh, PA, and Quiver, San Francisco.
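Feldman's point about hybrid systems can be shown with two deliberately simple categorizers, one rule-based and one statistical, whose scores are blended with weights. The sketch assumes hypothetical trigger-phrase rules, category vocabularies and equal weights; it illustrates score fusion in general and does not reproduce the algorithms used by Stratify or any other vendor named here.

```python
import re


def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rule_scorer(text, rules):
    """Rule-based view: 1.0 for a category if any of its trigger phrases appears."""
    lowered = text.lower()
    return {cat: 1.0 if any(p in lowered for p in phrases) else 0.0
            for cat, phrases in rules.items()}


def statistical_scorer(text, profiles):
    """Statistical view: share of the document's tokens found in each category vocabulary."""
    toks = tokens(text)
    if not toks:
        return {cat: 0.0 for cat in profiles}
    return {cat: sum(1 for t in toks if t in vocab) / len(toks)
            for cat, vocab in profiles.items()}


def hybrid_classify(text, rules, profiles, weights=(0.5, 0.5)):
    """Blend both views with a weighted sum; ties go to the alphabetically first category."""
    views = [rule_scorer(text, rules), statistical_scorer(text, profiles)]
    categories = set(rules) | set(profiles)
    combined = {cat: sum(w * view.get(cat, 0.0) for w, view in zip(weights, views))
                for cat in categories}
    best = max(sorted(combined), key=combined.get)
    return best, combined


if __name__ == "__main__":
    rules = {"mergers": ["acquired", "merger"], "earnings": ["quarterly results"]}
    profiles = {"mergers": {"deal", "acquire", "acquired", "buyout"},
                "earnings": {"revenue", "profit", "quarter"}}
    print(hybrid_classify("The company acquired a rival in a deal worth millions",
                          rules, profiles))
```

Each view covers gaps in the other: the rule fires on an exact phrase the word counts might dilute, while the vocabulary score still gives partial credit when no rule matches.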

While classification (also known as taxonomy) tools can organize and tag information independently of search engines, the two technologies can also work hand-in-hand. Some vendors provide technology for both classification and retrieval, tuning search engines to leverage the results of the taxonomy technology.

Beyond the move toward hybrid systems, Feldman points to several other vendors that are introducing new approaches and technologies to classification and retrieval:

ClearForest, New York, mines text, categorizes it, and cuts it into chunks so that it is stored in small pieces that can answer questions such as, "What happened in data warehousing in October?" or "Show me the mergers and acquisitions that have occurred in the last six months in the petroleum industry." Information is reassembled on the fly in the form of interactive timelines and relationship grids.

iPhrase, Cambridge, MA, uses natural-language processing to improve retrieval, but it also improves navigation by presenting results in an easy-to-understand, tabular format, rather than in a long list of documents. Customers include Charles Schwab and CNET.

Knowmadic, Santa Clara, CA, offers KM Studio, which sets up agents that record repetitive information-seeking forays on the Web and then lock onto the data the user needs to monitor. KM Studio then checks this data at regular intervals to see what has changed.

Inxight, Santa Clara, CA, has products that categorize, search and extract entities and provide visual navigation aids. Inxight's LinguistX platform is widely used by other retrieval vendors to improve the quality of search results. Customers include Lotus, Bertelsmann, Microsoft, Battelle, Verity and Factiva.

Primus Knowledge Solutions, Seattle, offers The Answer Engine (originally AnswerLogic), which provides direct answers to customer questions on the Web rather than lists of documents. The system employs tools that analyze patterns of questions in order to improve future responses.

Clairvoyant, Saratoga, CA, can detect kinds and degrees of emotion in email to rescue sales that are on the brink of failure. Angry customers can be detected before they are so irate that they walk away.

Solutions-United, Syracuse, NY, offers !metaMarker, which extracts intentions and urgencies in email in addition to emotions. These additional dimensions reportedly go beyond detection of angry customers by helping users formulate a rescue strategy. The system also categorizes and mines text documents.
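The interval checking attributed to Knowmadic's KM Studio comes down to remembering what a monitored item looked like last time and flagging any difference. A minimal sketch, assuming the content has already been fetched as text (the `fetch_all` callable is hypothetical) and using SHA-256 fingerprints from Python's standard `hashlib`; it reports only that something changed, not what changed, and it is not Knowmadic's implementation.

```python
import hashlib
import time


def fingerprint(content: str) -> str:
    """SHA-256 digest of the text, a compact stand-in for the full content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class ChangeMonitor:
    """Remembers the last fingerprint of each watched item."""

    def __init__(self):
        self._seen = {}

    def check(self, current):
        """current: dict of item name -> latest content. Returns new or changed names."""
        changed = []
        for name, content in current.items():
            digest = fingerprint(content)
            if self._seen.get(name) != digest:
                changed.append(name)
                self._seen[name] = digest
        return changed


def poll(monitor, fetch_all, interval_seconds, rounds):
    """Re-fetch the watched items every interval_seconds and print what changed."""
    for _ in range(rounds):
        changed = monitor.check(fetch_all())
        if changed:
            print("changed:", ", ".join(changed))
        time.sleep(interval_seconds)


if __name__ == "__main__":
    monitor = ChangeMonitor()
    print(monitor.check({"competitor-price": "$499"}))  # ['competitor-price'] (first sighting)
    print(monitor.check({"competitor-price": "$499"}))  # []
    print(monitor.check({"competitor-price": "$449"}))  # ['competitor-price']
```

Storing a digest rather than the page itself keeps the monitor's memory small; a fuller agent would also keep the previous text so it could show the user the actual difference.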

Says Feldman, "It is clear that we have emerging technologies that will prove vital to most enterprises. They will improve the accuracy of search and enable users to explore the contents of their text collections in new ways. These technologies are out of the laboratory and into commercial use, but there are no clear leaders among the many small vendors that offer them."

Jeff Morris (jpm55@earthlink.net) is a freelance writer based in South Salem, NY.




A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005