September 2003
Putting it Together: Taxonomy, Classification & Search
by Jeff Morris
Coming Soon to a Web Site Near You
According to Woods of Ovum, a real transformation in search and information discovery is coming as a result of the integration of several innovations: taxonomy navigation, personalization, bridging the gap between structured and unstructured information, and the use of visualization techniques. He foresees a flexible categorization of information and knowledge sources (structured and unstructured, explicit and tacit) that can be easily visualized, navigated and tailored to the specific search requirements of both organizations and individual users.
In fact, this transformation has already begun. Led by consumer demand for faster, more complete product information, many enterprises are discovering that the e-commerce model speeds access to all kinds of corporate information. With the leading search vendors already on board, innovative ways of utilizing taxonomy and classification in various combinations with search are proliferating, making information retrieval not only faster and easier, but also more accurate and cost-effective than ever before.
A Quick Game of Tag
What if you have massive quantities of information that need to be tagged fast? Before any type of search can be conducted employing taxonomy or classification, information has to be tagged to allow it to be categorized and, ultimately, found.
Congressional Quarterly (CQ), a Washington, D.C.-based publishing company, faced a particularly daunting tagging challenge. Among the company's products is a Web-based legislative tracking service encompassing 24 databases, including five databases of documents from the Government Printing Office (GPO). While some of these documents are digests, most are large, 3- to 4-MB files. Yet end users want to be able to locate the relevant portions of these documents almost immediately.
Susan Shipp, CQ's managing editor for new media, explains that although these documents are freely available to the public through government Web sites, CQ is able to attract subscribers by adding value in two ways:
• First, adding structure: The way in which bill texts and legal language are presented is very important, but the original documents could only be searched in a full-text approach. By converting these documents to XML, CQ can provide versions that look exactly like the print version while also providing a computer-navigable structure. The Library of Congress Web site offers only ASCII text versions of these same documents.
• Second, adding metadata and contextual linking: CQ is able to link to specific documents and portions of documents from its own content. As Shipp notes, "the Congressional Record can run up to 100 pages per day, and it's a daunting task to go through all that material just to find the small portion that deals with the bill or legislation in which you're interested." At the document level, CQ is able to provide links to related bills. (In order to preserve the integrity of the source document, these links appear in the margin.) The links are accomplished through a combination of metadata and CQ's own editorial system.
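To make the two ideas above concrete, here is a minimal sketch of how structure plus metadata might let a reader jump straight to the relevant portion of a long document and find related bills. It is an illustration only: the element names, the "bills" attribute, the sample bill identifiers and the helper functions are all hypothetical, and nothing here reflects CQ's actual schema or editorial system.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names, attribute names and bill identifiers
# are invented for illustration and are not CQ's or the GPO's real schema.
RECORD_XML = """
<record>
  <section id="s1" bills="HR-1">
    <heading>Debate on HR-1</heading>
    <para>Text of the debate on the first bill.</para>
  </section>
  <section id="s2" bills="HR-2 S-5">
    <heading>Debate on HR-2</heading>
    <para>Text mentioning a companion measure.</para>
  </section>
  <section id="s3" bills="S-5">
    <heading>Remarks on S-5</heading>
    <para>Further remarks.</para>
  </section>
</record>
"""


def sections_for_bill(root, bill_id):
    """Return (section id, heading) pairs for sections tagged with bill_id."""
    matches = []
    for section in root.iter("section"):
        tagged = section.get("bills", "").split()
        if bill_id in tagged:
            matches.append((section.get("id"), section.findtext("heading")))
    return matches


def related_bills(root, bill_id):
    """Return the other bills tagged on any section that mentions bill_id."""
    related = set()
    for section in root.iter("section"):
        tagged = section.get("bills", "").split()
        if bill_id in tagged:
            related.update(b for b in tagged if b != bill_id)
    return sorted(related)


root = ET.fromstring(RECORD_XML)
print(sections_for_bill(root, "S-5"))
print(related_bills(root, "HR-2"))
```

A plain ASCII version of the same text would support only full-text search; once each section carries its own tags, finding the few relevant passages becomes a lookup rather than a read-through.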
The key to these value-added capabilities lies in the ability to quickly metatag great quantities of information. That has been made possible by DataStream Conversion Services, a College Park, MD-based company that was born out of the University of Maryland's Technology Advancement Program business incubator.