February 2002
Four Organizations Tackle 'Too Much' Information
by Jeff Morris
If there's one clear need in search and retrieval technologies today, it's for the ability to
search across an enterprise, tapping both disparate databases and multiple content sources.
In fact, Nick Wilkoff, Analyst for Techrankings Research at Forrester Research, says the
Cambridge, MA-based analysis and consulting firm is now focusing on "enterprise content management,"
which encompasses Web content, digital asset management and legacy content management.
"We're seeing a convergence," says Wilkoff. "A number of unstructured areas, including legal
documents and contracts, were typically managed offline; now, more and more of these documents are
being published online on Internet portals and intranets. Search engines now allow searching across
these different repositories."
Wilkoff emphasizes that enterprises need to concentrate on integrating the repositories and on
enforcing some sort of standard taxonomy for describing content. This was a significant challenge at
AMR Research, Boston, where taxonomy and search technology are helping analysts stay abreast of
internal and external research available in disparate formats.
"Doing a search against one Web site under your control is very different from searching through
all the information published by your company in the last five years," points out Scott Lundstrom,
AMR's chief technology officer. "There are multiple formats (product pricelists, product history,
and so on) that don't exist on the Web."
Forrester's Wilkoff also emphasizes the importance of being able to search through rich media.
"The technology in this area is maturing," he says, "with digital asset management adding context
around extremely unstructured data types."
Video is one of the fastest-growing sources of rich media in both corporate and government
settings, yet it is one of the most challenging types of information to turn into a searchable
resource. Visual search technology has helped the National Aeronautics and Space Administration (NASA)
organize and provide fast access to round-the-clock video feeds from the International Space
Station.
You might think HighWire Press would have an easier time than AMR or NASA in providing effective
searching. The online publisher's www.highwire.org site is focused exclusively on scholarly journals:
text-centric documents that are all converted to the same format for Web publication. Yet John
Sack, HighWire's director, says that conventional search capabilities were not enough.
"The problem is that, in general, sites without search engines tend to be more usable than sites
with search engines," says Sack. "Consumers, especially, tend to type in the wrong search terms, and
come away thinking that there's nothing available. Browsing is a much more effective method of
finding what you need."
HighWire solved its search challenges by relying on technology to suggest categories so site
visitors can drill down to exactly what they need.
When members of the Certified General Accountants of Ontario (CGA) can't find what they need
online, they don't give up, they call up; but this has meant added cost and chaos to the
organization's call center. With the help of search technology, the organization has put the "self"
in its online self-service efforts.
While AMR, NASA, HighWire and the CGA all faced very different challenges, they all found answers
with better search technology. Read on to find out how they turned "too much information" into
strategic assets.
AMR Takes Its Own Advice
CASE STUDY: AMR Research
CHALLENGE: Track and retrieve documents internally and pull information from external sites
SEARCH PRODUCT: Autonomy
VENDOR: Autonomy, San Francisco, 415-243-9955, www.autonomy.com
"This is a case of an analyst actually putting his money where his mouth is," says Scott
Lundstrom, chief technology officer of Boston-based AMR Research, one of the leading high-tech
analyst firms. According to Lundstrom, most analyst firms were once small service organizations that
typically ran without much infrastructure at all.
"But in the late '90s, the analyst industry went through an awakening, realizing that we had to
become consumers of our own information," Lundstrom says. "The basis of our business is information
on vendors, but the average analyst is overwhelmed with information. In a given month, hundreds of
different outlets may publish documents on a particular company. In addition, we have a repository
of our own, as our analysts meet with individual vendors. One or two analysts may meet with a
company, but their notes might be of interest to the entire analyst group; there's no clear
delineation among markets or analysts."
Thus, the challenge is grabbing as much information as possible off of corporate Web sites and
newswires, adding all the notes generated by analysts, and then tagging all of this information and
making it available in a collaborative environment.
That environment made for an additional challenge. "Analysts [would sometimes] start working on
reports about a particular market segment not knowing that someone else was already working on the
same topic," Lundstrom explains. "We needed to have a way to avoid this kind of wasteful, and
embarrassing, duplication."
AMR began a selection process in the fall of 2000, considering search technologies from such
providers as Microsoft, Alta Vista, Verity and Inktomi. But it was difficult to find a solution that
satisfied requirements for use in a collaborative environment and provided an internal management
scheme.
"The system had to be able to figure out what documents were about and assign tags automatically;
we didn't want to have to administer meta tags by hand," Lundstrom emphasizes.
After four months, AMR decided that Autonomy best met its needs. One criterion was the
"services-to-application" ratio: "Through our experience with Autonomy, we knew that the
implementation cost was a little less than one times the license fee," says Lundstrom.
Autonomy's software was implemented within 90 days, initially covering categorization of content
related to 100 vendors. Since then, the number of vendors covered has continued to expand, and
keyword and concept search capabilities are now being extended to AMR's data via its public Web
site.
AMR's collaborative needs are now satisfied by the system's real-time functionality. "As an
analyst is saving [a Word] file into a particular directory on the server, a window pops up to say,
'Here's what we've written on this topic in the last month,'" Lundstrom explains. "That analyst can
see what everyone else is doing."
The system is essentially tagging documents and making them available as they're written, which
provides analysts with background information, prevents duplication of effort and reduces the need
for analysts to do research up front.
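
The save-time lookup Lundstrom describes can be illustrated with a minimal Python sketch. It is not Autonomy's method, which the article does not detail; it assumes plain word overlap stands in for concept matching, and the names `related`, `archive` and `draft` are hypothetical.

```python
import re
from collections import Counter

# A tiny stopword list, chosen only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "its"}

def terms(text):
    """Count lowercase words, ignoring a few common stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def related(new_text, archive, min_shared=2):
    """Return titles of archived notes that share at least min_shared
    distinct terms with new_text, most overlap first."""
    new_terms = set(terms(new_text))
    scored = []
    for title, body in archive.items():
        shared = new_terms & set(terms(body))
        if len(shared) >= min_shared:
            scored.append((len(shared), title))
    # Sort by descending overlap, then alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]

# Hypothetical archive of analyst notes (invented sample text).
archive = {
    "Vendor A supply chain briefing": "Vendor A announced new supply chain planning software",
    "Vendor B pricing note": "Vendor B changed its license pricing",
}
draft = "Draft report: supply chain planning market outlook"
print(related(draft, archive))
```

A production system would match on concepts rather than literal words, but the flow is the same: analyze the document as it is saved, compare it with what is already on file, and show the analyst the closest matches.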
Assessing return on investment, Lundstrom says client research managers can now process a larger
number of inquiries in a shorter period of time, an advantage that should pay for the software
within two years. More importantly, "at least four times since the implementation, the system has
prevented us from issuing something publicly that was either incorrect or a duplication," Lundstrom
adds. "There's no way to put a price on that."
NASA Taps Mission-Critical Video
CASE STUDY: NASA/InDyne
CHALLENGE: Allow NASA scientists and engineers to locate specific segments of space mission
videos and give them remote access to their selections.
SEARCH PRODUCT: Convera Screening Room
VENDOR: Convera, Vienna, VA, www.convera.com
Back in 1998, NASA could see it coming. It would be big; no, it would be massive. And it
couldn't be stopped.
Nor would anybody want to stop it. After all, the International Space Station would mark the dawn
of a new era in space exploration. Still, to Silvia Stewart, video repository supervisor at the
Houston-based Johnson Space Center, and her five staff members, the impact would be comparable to a
giant meteor.
"It scared us," recalls Stewart. "Until then, we had been responsible for videos from about six
space shuttle missions each year. But we knew that the space station would be operating 24 hours a
day, seven days a week, 365 days a year and would be capable of [simultaneously] downloading four
channels of video. How would we be able to handle all of that, plus the shuttle missions?"
Stewart's trepidation was justified. She and her group (who are actually employed by InDyne, a
McLean, VA-based service provider that counts NASA among its largest clients) were charged, among
other duties, with writing scene-by-scene text descriptions of all mission videos. Those
descriptions are essential to assist NASA engineers and scientists in finding specific sections of
video needed to support crucial decisions and plan future missions. Stewart reports that since the
Expedition One crew first opened the Space Station hatch on November 2, 2000, her department's
workload has tripled.
Fortunately, they were prepared. Planning for the project and for the broader switch to digital
video technology began two years before that first Space Station expedition. A lengthy vendor
selection process culminated in the implementation of Screening Room software from Vienna, VA-based
Convera (www.convera.com) in September 2000.
Screening Room is a modular video content management solution incorporating four integrated
components: advanced video capture and analysis, intelligent indexing, acceleration of the video
production process and publishing of video content to the Web. Video assets are controlled from a
single platform by the creation of a digital archive.
Mission videos, meaning anything documenting activities in a manned space flight (including
launch, landing, video downlinks and all crew recordings), account for half of the content in the
repository that Stewart oversees. The other half is ground-based content. As downlink video is
captured, members of Stewart's staff, working in three shifts, write scene-by-scene descriptions in
real time. These descriptions are entered through Screening Room's Edit client directly into the
Capture client, which captures and creates a storyboard of key frames. A Browser client provides the
system's Web interface.
It's clear, says Stewart, that the system has provided benefits, not the least of which has been
the ability to handle the Space Station workload without an increase in staff. But perhaps more
importantly, it has allowed Mission Control personnel to review material earlier if, for example,
they need to analyze a hardware problem.
"We traditionally had to fill requests from Mission Control by searching through the database,
pulling the appropriate tapes and displaying them in a screening room," notes Stewart. "Now, anyone
[using the browser client] can access the video file and storyboards, do their own search and view
the video segments remotely. As soon as a video downlink ends, we post the video to the server;
since most downlinks last an hour or less, those videos are viewable within the hour."
Remote access has extended fast video access beyond Mission Control to NASA engineers, mission
analysts, mission planners and those involved with the shuttle and space station robotic arms.
Remote access covers many NASA locations, including the Ames Research Center in Mountain View, CA,
the Jet Propulsion Laboratory in California and the Kennedy Space Center in Florida.
Stewart expects one of the biggest paybacks of the Convera system will be in savings generated by
digital video production.
"Projects often require that a compilation tape be produced consisting of numerous scenes and
segments scattered across a lot of different tapes," she explains. "We research an engineer's
request and give the video production facility a list containing all the start/stop times.
Production then takes the list into the dub room, pulls off all the segments specified, and copies
them onto one tape. It's really labor intensive. But now Convera generates an edit decision list,
which is a key element of digital video production. We now have an editing system that can read the
decision list, find the right files and communicate directly between our MPEG files and
production's. We no longer need to do it all by hand."
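
For readers unfamiliar with edit decision lists, here is a minimal Python sketch of the idea. It assumes a simplified one-line-per-edit layout invented for illustration, not Convera's output or any industry EDL format, and the file names and times are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    source_file: str  # hypothetical name of a digitized video file
    start_s: int      # segment start, in seconds from the start of the file
    end_s: int        # segment end, in seconds from the start of the file

def timecode(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def build_edl(segments):
    """Lay the segments end to end and return one text line per edit:
    edit number, source file, source in/out, compilation in/out."""
    lines = []
    record_pos = 0  # running position in the compiled program
    for number, seg in enumerate(segments, start=1):
        if seg.end_s <= seg.start_s:
            raise ValueError(f"segment {number} has no duration")
        length = seg.end_s - seg.start_s
        lines.append(
            f"{number:03d} {seg.source_file} "
            f"{timecode(seg.start_s)} {timecode(seg.end_s)} "
            f"{timecode(record_pos)} {timecode(record_pos + length)}"
        )
        record_pos += length
    return lines

# Hypothetical request: two scenes from two different downlink files.
edl = build_edl([
    Segment("downlink_a.mpg", 60, 90),
    Segment("downlink_b.mpg", 3600, 3725),
])
for line in edl:
    print(line)
```

Each line tells an editing system which file to open, where the scene starts and stops, and where it lands in the compilation, which is the information the dub room once worked out by hand.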
It doesn't take a rocket scientist to see that digital asset management has translated into one
giant leap for NASA.
Keywords Don't Cut It at HighWire
CASE STUDY: HighWire Press
CHALLENGE: Help clients find information needed among 13 million research papers.
SEARCH PRODUCT: Semio Taxonomy
VENDOR: Semio Corp., San Mateo, CA, 650-638-3330, www.semio.com
HighWire Press isn't actually a press, but it certainly has to perform a high-wire act. The
nonprofit organization puts half of the world's most prestigious scientific, engineering and medical
journals online. It then has to help people find exactly what they're looking for among the many
long, complicated articles containing figures and thousands of words. The online database currently
includes more than 13 million documents.
"It's really like finding the proverbial needle in a haystack," says John Sack, HighWire's
director.
HighWire is a department within Stanford University akin to the Stanford University Press. The
site began with a single online journal in 1995, and it now produces 296 journals online, with many
more planned. Several years ago it became clear that HighWire's original search engine was not
producing the needed results.
"A keyword search [wouldn't] cut it," says Sack. "That only worked when there wasn't much
material online. Any term thrown at the search engine brought back hundreds of hits, and we realized
that results/relevance ranking wasn't really helping people get to the right stuff."
Sack's solution was to give people alternative search terms with the help of taxonomy technology
from Semio, San Mateo, CA. The effort began with 5,000 concepts and has since expanded to contain
some 20,000 concepts. Developing the taxonomy has been a painstaking process; it's taken nearly two
years to process 12 million records and finally get the new HighWire site to the beta stage. Was it
worth it?
"Absolutely," Sack affirms. "It was essential that we give researchers the ability to see exactly
what topics are covered in an article and to reduce their searches accordingly."
As a nonprofit institution, HighWire doesn't measure returns in hard dollars. "Our goal is to
help people find what they need," says Sack. "Someone about to do an experiment needs to know if it
has already been done. By allowing searches to quickly be narrowed to a specific set of topics, I
think we've succeeded."
E-Learning Starts with Search
CASE STUDY: Certified General Accountants of Ontario
CHALLENGE: Index and access one gigabyte of course materials and supporting information
SEARCH PRODUCT: Fulcrum KnowledgeServer from Hummingbird
VENDOR: Hummingbird, North York, Ontario, Canada, www.hummingbird.com
Go to the Web site of the Certified General Accountants of Ontario (www.cga-ontario.org) and
click on "search." When you arrive at the search engine, click to expand the folder marked "Web
Sites." You'll find a list of 13 additional folders, and you'll get a good idea of how Fulcrum
KnowledgeServer helped make a broad range of materials easier to find.
The Certified General Accountants (CGA) of Ontario is a self-governing body that guides the
professional standards, conduct and discipline of its approximately 13,000 members and 8,000
students in the province of Ontario.
CGA designation is achieved by completing assignments and national examinations, passing a
comprehensive final exam, fulfilling practical work experience requirements and meeting a university
degree requirement. To meet all these requirements, members need information, and lots of it.
According to Boyd Dyer, manager of Web technology, CGA must furnish members with an incredible
amount of supporting documentation, including course materials.
After converting paper-based course material to a Web presentable form in the summer of 2000, the
CGA discovered it had nearly an entire gigabyte of information on its Web site. It became clear to
Dyer that something had to be done as call after call to the organization asked, "Where do
I start looking?"
The CGA decided on Hummingbird and its Fulcrum KnowledgeServer technology. The fact that
Hummingbird was a local company certainly entered into the decision, notes Dyer. But, he says, "a
key selling point was that KnowledgeServer accepts multiple formats; it's not just text-based. And
we upload many media: commercials, PDF files, Word documents, executable files, MP3s ... our content
runs the gamut."
Dyer says he did the installation himself in December 2000 and had the system up and running
within one week. Since then, he's upgraded to newer versions; with the latest version, he says, all
he needs to do is create the basic index, point the software toward the site, and it does everything
else. The result, says Dyer, has been to free up a lot of the call center's time.
"Users can now find a lot of the information they need by themselves," he reports. "Our call
center is always overburdened; using KnowledgeServer for search and retrieval has reduced that
burden and made it more manageable."
Spotting the Trends in Search
The latest trend in search, according to Susan Feldman, doyenne among classification and
retrieval analysts, is a move toward hybrid systems. "There are holes in every algorithm," notes
Feldman, director of content and retrieval software research at IDC, Framingham, MA. "It doesn't
matter what you're using them for; every algorithm is good at some things and not at others. Hybrid
systems make up for some of those faults by looking at things from multiple viewpoints."
The use of several complementary algorithms to classify and tag information can produce more
accurate retrieval results, says Feldman. For example, a mixture of natural language processing
techniques with statistical processing can make up for deficiencies in either. In Feldman's opinion,
one of the best examples of a hybrid system is Stratify (formerly PurpleYogi), Mountain View, CA,
which uses four different algorithms for categorization. Other vendors offering a hybrid approach to
classification include MediaSite, Pittsburgh, PA, and Quiver, San Francisco.
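
The hybrid idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any named vendor's method: a phrase-rule matcher stands in for the linguistic side, a word-overlap score stands in for the statistical side, and the categories, rules and training sentences are invented.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def rule_scores(text, rules):
    """Rule-based view: score 1.0 for each category whose trigger phrase appears."""
    lowered = text.lower()
    return {cat: 1.0 for cat, phrases in rules.items()
            if any(phrase in lowered for phrase in phrases)}

def train_counts(examples):
    """Collect the words seen in labeled example texts, per category."""
    counts = defaultdict(Counter)
    for text, cat in examples:
        counts[cat].update(tokenize(text))
    return counts

def statistical_scores(text, counts):
    """Statistical view: share of the text's words seen in each category's examples."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    return {cat: sum(1 for t in tokens if t in seen) / len(tokens)
            for cat, seen in counts.items()}

def hybrid_classify(text, rules, counts):
    """Add the two views' scores and return the best category, or None."""
    combined = Counter()
    for scores in (rule_scores(text, rules), statistical_scores(text, counts)):
        for cat, score in scores.items():
            combined[cat] += score
    if not combined or max(combined.values()) == 0:
        return None
    # Ties go to the alphabetically first category.
    return max(sorted(combined), key=lambda c: combined[c])

# Invented rules and training examples for two hypothetical categories.
rules = {"mergers": ["acquired", "merger"], "pricing": ["price list"]}
counts = train_counts([
    ("the company acquired a rival in a merger deal", "mergers"),
    ("new license price and discount schedule", "pricing"),
])
print(hybrid_classify("Discount schedule and license terms updated", rules, counts))
print(hybrid_classify("Vendor acquired by competitor", rules, counts))
```

In the first call no rule fires, and the statistical view alone identifies the pricing document; in the second, the rule carries most of the weight. That is the sense in which each method covers for the other's holes.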
While classification (also known as taxonomy) tools can organize and tag information
independently of search engines, the two technologies can also work hand-in-hand. Some vendors
provide technology for both classification and retrieval, tuning search engines to leverage the
results of the taxonomy technology.
Beyond the move toward hybrid systems, Feldman points to several other vendors that are
introducing new approaches and technologies to classification and retrieval:
ClearForest, New York, mines text, categorizes it, and cuts it into chunks so that it is stored
in small pieces that can answer questions such as, "What happened in data warehousing in October?"
or "Show me the mergers and acquisitions that have occurred in the last six months in the petroleum
industry." Information is reassembled on the fly in the form of interactive timelines and
relationship grids.
iPhrase, Cambridge, MA, uses natural-language processing to improve retrieval, but it also
improves navigation by presenting results in an easy-to-understand, tabular format, rather than in a
long list of documents. Customers include Charles Schwab and CNET.
Knowmadic, Santa Clara, CA, offers a KM Studio that sets up agents that can record repetitive
information-seeking forays on the Web and then lock onto the data that the user needs to monitor. KM
Studio then checks on this data at regular intervals to see what has changed.
Inxight, Santa Clara, CA, has products that categorize, search and extract entities and provide
visual navigation aids. Inxight's LinguistX platform is widely used by other retrieval vendors to
improve the quality of search results. Customers include Lotus, Bertelsmann, Microsoft, Battelle,
Verity and Factiva.
Primus Knowledge Solutions, Seattle, offers The Answer Engine (originally AnswerLogic), which
provides direct answers to customer questions on the Web rather than lists of documents. The system
employs tools that analyze patterns of questions in order to improve future responses.
Clairvoyant, Saratoga, CA, can detect kinds and degrees of emotion in email to rescue sales that
are on the brink of failure. Angry customers can be detected before they are so irate that they walk
away.
Solutions-United, Syracuse, NY, offers !metaMarker, which extracts intentions and urgencies in
email in addition to emotions. These additional dimensions reportedly go beyond detection of angry
customers by helping users formulate a rescue strategy. The system also categorizes and mines text
documents.
Says Feldman, "It is clear that we have emerging technologies that will prove vital to most
enterprises. They will improve the accuracy of search and enable users to explore the contents of
their text collections in new ways. These technologies are out of the laboratory and into commercial
use, but there are no clear leaders among the many small vendors that offer them."
Jeff Morris (jpm55@earthlink.net) is a freelance writer based in South Salem, NY.