Intelligent Enterprise featuring Transform

December 1998

Content Management Breaks Down Your Docs

by Liz Levy

Where traditional document management ends, content management begins. Document management handles information from a file perspective. Each file is its own entity, and it is indexed, stored, retrieved and used at the file level. Content management breaks down a file into its component parts, and these elements are indexed, stored, retrieved and used at the content level.

A document file can be made up of multiple types of information, such as charts, tables, headlines, captions, text and even sound or video. Content management breaks out the content from the document. The information can then be accessed individually and brought together in different ways. As a result, you can "repurpose" information for different mediums, including print, CD-ROM and the Web.

With the emergence of the Web, many businesses are discovering that they are in the publishing business. According to analyst firm International Data Corp. of Framingham, MA, "One of the biggest challenges facing organizations deploying intranet applications is the ongoing organization and management of the dynamic set of interrelated documents [content], generally authored as a collaborative process, that compose the content found on these corporate sites."

Document management systems have served many organizations well in this regard, providing a way to share information internally and externally across intranets. But what document management has been lacking is the ability to use and reuse bits and pieces of information for different purposes and with many different delivery options. This is where content management comes in.

There is some overlap in the document management and content management (as well as knowledge management) markets. "The process is more important than the product," says Priscilla Emery, senior VP of information products and services at AIIM International. "The product must accomplish your process. With the growth of the Web in business, everyone is becoming a publisher and content management deals with those publishing issues. It is a publishing process that works in front of the business process."

XML Provides a Foundation

XML (Extensible Markup Language) is a technology that is bringing together the separate worlds of document management and content delivery. Using XML, a document can be separated into its discrete elements. For example, a 50-page report might contain a five-page executive summary, three 15-page chapters, 20 charts, five tables, eight photographs, 40 headlines and 30 captions. XML offers a way to identify the different types of information presented and the relationships between that information.
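As a rough illustration of the report example, the sketch below marks up a small slice of such a report and counts its element types. The tag names (report, chapter, chart, caption and so on) are invented for this example and do not come from any standard DTD. It uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup for a tiny slice of the report; tag names are illustrative.
REPORT_XML = """
<report title="Annual Review">
  <summary><headline>Overview</headline><para>Sales rose.</para></summary>
  <chapter number="1">
    <headline>Markets</headline>
    <para>Demand grew in every region.</para>
    <chart id="c1"><caption>Revenue by region</caption></chart>
    <table id="t1"><caption>Quarterly totals</caption></table>
  </chapter>
</report>
"""

root = ET.fromstring(REPORT_XML)

# Identify the different types of information by counting each tag.
counts = Counter(elem.tag for elem in root.iter())

# Recover a relationship: which chart or table does each caption belong to?
caption_owners = [
    (parent.tag, parent.get("id"), caption.text)
    for parent in root.iter()
    for caption in parent.findall("caption")
]

print(counts["headline"], counts["caption"])
print(caption_owners)
```

Because each caption sits inside the chart or table it describes, the relationship is carried by the markup itself rather than by how the text happens to look on the page.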

We are all familiar with HTML (Hypertext Markup Language), the language most popularly used to display information on the Web. HTML deals primarily with presentation of electronic documents and does not offer much in the way of providing structure for the content. In the 50-page report example used above, HTML would not be able to provide information on the difference or relationship between a headline and a caption, with the possible exception that they might be presented in bold type.

HTML was developed from the more complex SGML (Standard Generalized Markup Language), which was developed in 1978 with the idea that a standard "markup" could create structure separate from the appearance of the document. Markup is simply providing information along with a document that says something about its appearance or structure. SGML is a language that is used to code nearly any type of document to describe its structure. SGML is not as widely used as HTML because of its complexity and the high cost of SGML editing tools.

XML was developed as a simpler version of SGML. XML documents are composed of many entities better described as objects, such as the Word, PowerPoint, TIF or Wave files that might make up a document. Each object can contain one or more elements, each of which is a single bit of information usually encapsulated as a string of text. Each of these elements can have certain attributes or properties that describe the way in which the information is to be processed.

XML provides a way of describing the relationships between these entities, elements and attributes. It tells the computer how it can recognize the component parts of a document. It accomplishes this through the use of metadata, which is simply coding information about the information. Metadata is any information that accompanies an XML document, including tags, DTDs, links and style sheets.

Elements in the document are marked with and identified by tags. The tags are placed at the beginning and end of a string of text to describe the attributes of the element. Tags do not describe the format, such as whether the text is bold or italic. Nor do they instruct the computer what to do in a Web page when people click on it. Tags simply identify what the element is. This is different from HTML, where the tags must perform all these functions at once.

Separating the formatting from the element's identity makes it easier to repurpose information for different mediums. With HTML you have to write new tags each time you reuse an element for display on a different medium. With XML a Document Type Definition (DTD) lets you define the tags that were created and then use and reuse the elements.
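A minimal sketch of that separation follows. The same tagged content is rendered twice, once for the Web and once for plain-text print, using two small rule tables. The tag names and both rule tables are assumptions made up for this illustration; they stand in for a real style sheet language and are not XSL.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content identifies what each element is, with no formatting information.
CONTENT = "<item><headline>New Model Ships</headline><caption>The tower case</caption></item>"

# Hypothetical style rules, kept apart from the content itself.
WEB_STYLE = {"headline": "<h1>{}</h1>", "caption": "<p><i>{}</i></p>"}
PRINT_STYLE = {"headline": "*** {} ***", "caption": "Caption: {}"}

def render(xml_text, style, escape_text=False):
    """Apply one formatting rule per tag to each child of the root element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root:
        text = elem.text or ""
        if escape_text:
            text = escape(text)  # protect &, < and > when the target is HTML
        lines.append(style[elem.tag].format(text))
    return "\n".join(lines)

web_page = render(CONTENT, WEB_STYLE, escape_text=True)
print_page = render(CONTENT, PRINT_STYLE)
print(web_page)
print(print_page)
```

Adding a third medium means adding a third rule table; the stored content is not touched.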

A DTD is created once to describe the document type (e.g., expense report, fiscal statement, press release), what the content elements are and their structure. It does not define aspects of format, such as bold or centered. Formatting is done later through XML style sheets using a format not yet finalized called XSL (Extensible Style Language).

Once a file is created and submitted to the central repository, it is validated against the DTD and parsed, meaning its content elements are broken out. Validation is a key function of XML. It lets the structure of data be checked before it is used in an application.
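The idea of checking structure before use can be sketched as follows. Python's standard library parser does not validate against a DTD, so this example substitutes a hand-written content model for the press-release DTD fragment shown in the string. The DTD text, the model list and the function name check_structure are all assumptions for illustration; a validating parser would do this work from the DTD itself.

```python
import xml.etree.ElementTree as ET

# A DTD fragment shown for orientation only; this sketch does not parse it.
PRESS_RELEASE_DTD = """
<!ELEMENT release (headline, dateline, para+)>
<!ELEMENT headline (#PCDATA)>
<!ELEMENT dateline (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""

# Hand-written stand-in for the DTD above: (tag, minimum, maximum or None).
RELEASE_MODEL = [("headline", 1, 1), ("dateline", 1, 1), ("para", 1, None)]

def check_structure(xml_text, root_tag, model):
    """Return a list of structural errors; an empty list means the check passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != root_tag:
        return [f"expected root <{root_tag}>, found <{root.tag}>"]
    children = [child.tag for child in root]
    errors, pos = [], 0
    for tag, minimum, maximum in model:
        seen = 0
        while (pos < len(children) and children[pos] == tag
               and (maximum is None or seen < maximum)):
            pos += 1
            seen += 1
        if seen < minimum:
            errors.append(f"missing <{tag}> before child {pos}")
    if pos < len(children):
        errors.append(f"unexpected <{children[pos]}> at child {pos}")
    return errors

GOOD = "<release><headline>Launch</headline><dateline>Boston</dateline><para>Text.</para></release>"
BAD = "<release><headline>Launch</headline><para>Text.</para></release>"

good_errors = check_structure(GOOD, "release", RELEASE_MODEL)
bad_errors = check_structure(BAD, "release", RELEASE_MODEL)
print(good_errors, bad_errors)
```

A repository could refuse check-in whenever the returned list is not empty, which is the role validation plays in the workflow described above.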

Once parsed, an XML document is manipulated through an object model (or API). A standard DOM (Document Object Model) is being developed by the World Wide Web Consortium (W3C), the international body that develops standards for the Web.

The DOM lays out an XML document and its elements in a tree structure.
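To make the tree structure concrete, the sketch below parses a short document with Python's xml.dom.minidom, a small implementation of the DOM interface, and prints an indented outline. The sample document and the helper name outline are illustrative assumptions.

```python
from xml.dom import minidom

def outline(node, depth=0):
    """Walk the DOM tree and return one indented line per element or text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    return lines

doc = minidom.parseString(
    "<report><chapter><headline>Markets</headline>"
    "<para>Demand grew.</para></chapter></report>"
)
tree = outline(doc.documentElement)
print("\n".join(tree))
```

Each element becomes a node whose children are the elements and text nested inside it, so an application can navigate from the report to a chapter to a single headline.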

Simply stated, XML provides a standard file format for presenting data and a standard way to include information about the data (metadata) to describe its own structure. XML Version 1.0 was finalized as a W3C Recommendation in February of this year. Work is still underway on the accompanying style language, called XSL (Extensible Style Language), and a link language, called XLL (Extensible Linking Language).

The XML standard has been adopted by both Microsoft and Sun. Microsoft has implemented support for XML within Internet Explorer 4.0 and will continue support with 5.0. XML will also be adopted in the next release of Office and other Microsoft products. When these products arrive, you will be able to "Save as XML" as simply as you do any other format today. Standard DTDs are being developed for many other popular applications, but as of now, users will need to design their own DTDs, which is not an easy task.

How Content Management Works

The model for content management is to store and manage XML content from a central repository, usually an object-oriented or relational database or a hybrid of both. Content management systems offer many of the same functions as document management systems, such as storage, version control, check-in/check-out and text searches. What's added is the ability to tag, index, search and reuse smaller elements of a document.

Content management systems output content from the central repository through any means of delivery available. Web delivery is particularly efficient. Take as an example a company that manufactures custom PCs that needs to provide customized manuals on demand for each configuration ordered. Content management would let this manufacturer break out and store content on individual PC parts as separate items with their relationships to other parts.

The content on just those parts ordered by a specific customer could be pulled together to create a manual. The manual could be printed, output to CD-ROM or served up on a Web site at the customer's request. The result is more efficient, dynamic and responsive use of content, and this translates into cost savings for the manufacturer and better, more personalized service for the customer.
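The custom-manual scenario might be sketched like this. The repository is a plain dictionary standing in for a real content database, and the part identifiers, tag names and the function assemble_manual are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: one stored XML fragment per PC part.
REPOSITORY = {
    "cpu-400": "<section part='cpu-400'><headline>Processor</headline>"
               "<para>Do not remove the heat sink.</para></section>",
    "hdd-8gb": "<section part='hdd-8gb'><headline>Hard Drive</headline>"
               "<para>Back up data regularly.</para></section>",
    "modem-56k": "<section part='modem-56k'><headline>Modem</headline>"
                 "<para>Connect the phone line to the LINE jack.</para></section>",
}

def assemble_manual(order_id, part_ids):
    """Build one manual from the stored sections for the parts actually ordered."""
    manual = ET.Element("manual", {"order": order_id})
    for part_id in part_ids:
        manual.append(ET.fromstring(REPOSITORY[part_id]))
    return manual

manual = assemble_manual("A-1001", ["cpu-400", "modem-56k"])
xml_out = ET.tostring(manual, encoding="unicode")
print(xml_out)
```

The assembled XML can then be handed to whichever output step is needed, whether print, CD-ROM or the Web, without rewriting any of the stored sections.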

Like many of today's document management systems, XML will take advantage of three-tier architectures that support Web applications. In this model, a browser front end interacts with a middle-tier Web server, which in turn communicates with a back-end database server for central storage. Information can be converted to XML on the middle tier, offering new ways to access stored information from mainframes and databases. This data can then be delivered to the Web and exchanged online as easily as HTML pages using HTTP.
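As one possible picture of the middle-tier conversion, the sketch below reads rows from a database and wraps them in XML. An in-memory SQLite database stands in for the back-end server, and the table, column and function names are assumptions for this example only.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(conn, table_tag, row_tag, query):
    """Run a query and wrap each row in XML, one child element per column."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table_tag)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Stand-in for the back-end database tier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (sku TEXT, price REAL)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [("cpu-400", 199.0), ("hdd-8gb", 149.5)],
)

xml_text = rows_to_xml(conn, "parts", "part", "SELECT sku, price FROM parts ORDER BY sku")
print(xml_text)
```

In a three-tier deployment, a function like this would run on the Web server tier and return its result to the browser over HTTP.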

A Look At Three Products

The three content management systems described below are providing innovative solutions for using information efficiently and for automating the publication of information to the Web. The main features that are common to all these products are a central repository based on a database, XML content, HTML conversion and Web publishing. Many of these vendors and products also have a history of supporting SGML. This may offer those looking to move from SGML to XML some added expertise and convenience.

DynaBase v3.0 from Inso (Boston, MA 617-753-6500 www.inso.com) is an XML-based content management and Web publishing system. XML content management capabilities include importing, parsing and storing XML components with indexing and version control. Content for publishing can also be served on the Web with XML tag-level scripting and XML tag-level search and retrieval.

The content management components include a Data Server and a Web Manager Client. The Data Server plugs into existing Web servers for content management. It is a repository that includes a built-in object-oriented database for storing any and all Web content. The content can be XML, HTML, graphics, video, applets or scripts.

The Web Manager Client is the interface to the Data Server. From the Web Manager Client, users can browse for files and check out content from the Data Server or import new content using HTTP. The Web Manager automatically launches authoring applications when you want to edit content.

Inso's publishing system includes the Web Server Plug-In and the Web Developer Client. The system lets publishers create Web sites and pages on the fly.

Version 3.0 of DynaBase, which was released in September, includes Inso's "Outside In" Web server-based file format converter, which automatically converts popular file formats like Microsoft Word, Excel and PowerPoint to HTML, GIF or JPEG. Also included are new workflow capabilities that can be applied to any authoring, editing and publishing process.

DynaBase supports Mac and Unix platforms through a cross-platform Java client. There is also added plug-in support for Microsoft Internet Information Server 4.0 running Windows NT and Netscape Enterprise Server 3.5.1 running Windows NT 4.0 and Sun Solaris 2.6.

Starting pricing for a DynaBase enterprise installation is about $47,000. This includes ten client licenses, one data server that supports unlimited users, three Web server plug-ins for management and delivery to the Web and ten host or domain names for supporting ten separate Web projects at once.

Blade Runner from Interleaf (Waltham, MA 781-768-1578 www.interleaf.com) is an enterprise content management system scheduled for release this month. Blade Runner addresses the entire life cycle of content in three layers: content creation, content repository and content publishing.

Content is created in Blade Runner using an XML creation toolbar that is installed right into Microsoft Word. The XML toolbar lets you use Word as an XML editor. Standard DTDs are implemented and used to create Word templates that enforce certain rules or constraints on the author. Users do not have to know XML. They create Word files using a template that matches the XML DTD. The author is guided with lists of choices for valid content. The document is then validated against the DTD and saved as an XML document. The document goes through a final validation before check-in to the content repository.

A DTD creation tool, Microstar's Near and Far Designer, is included, but any other DTD creation tools can be used. Authors are given lists of choices for content structure that adhere to the XML DTD. The content is validated against the DTD before it is checked into the content repository.

Blade Runner's content repository is built on an object-oriented database from Poet (San Mateo, CA 650-286-4640 www.poet.com). Once the content is validated, the XML document is "burst" (i.e., broken down) into its discrete elements. The reusable content elements can be brought together and formatted for output using the menu-driven Composer/Style Editor.
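The general idea of bursting can be sketched as below. This is not Interleaf's implementation, which the article does not describe in detail. It is a generic illustration in which every element is stored under a key derived from its position in the tree. The function name burst and the key format are assumptions.

```python
import xml.etree.ElementTree as ET

def burst(xml_text):
    """Break a document into separately addressable elements keyed by tree position."""
    root = ET.fromstring(xml_text)
    store = {}

    def visit(elem, path):
        store[path] = {
            "tag": elem.tag,
            "attrs": dict(elem.attrib),
            "text": (elem.text or "").strip(),
        }
        for index, child in enumerate(elem):
            visit(child, f"{path}/{child.tag}[{index}]")

    visit(root, root.tag)
    return store

store = burst(
    "<release><headline>Q3 Results</headline>"
    "<para>Revenue rose.</para><para>Costs fell.</para></release>"
)

# Reuse: pull every headline out of the store, whatever document it came from.
headlines = [item["text"] for item in store.values() if item["tag"] == "headline"]
print(sorted(store))
print(headlines)
```

Once elements are stored individually, a query for a single tag type can reach across many documents, which is what makes reuse at the element level practical.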

The database stores XML content along with the DTDs, XSL style sheets and XLinks. The system provides versioning, check-in, check-out, searching, navigational linking and referencing. Searches can be performed on text as well as content, properties, structure and metadata.

A tool is provided for creating XSL style sheets, which are applied to the XML content to render a document. Users can preview the layout and edit the final document.

The XML content is published using a two-part Batch Publishing Engine that prepares content for print, CD or the Web. Since browsers other than MS Internet Explorer have yet to natively support XML, the publishing engine converts content to HTML for presentation on the Web. The Assembly Engine uses Cascading Style Sheets for batch conversion to HTML for the Web.

The XML toolbar can also be installed in the Interleaf 7 publishing system. Interleaf supplies data adapters with Blade Runner that support the integration of applications like SAP and Baan and the use of this information by the content management system as XML content. Blade Runner operates on Windows NT servers and Windows NT/9X clients.

Pricing has not yet been set but will be announced this month at the release date and will be in the six-figure range for enterprise applications, depending on how the solution is packaged.

Information Manager 2.0 from Texcel (Cambridge, MA 617-621-7004 www.texcel.no) puts a great deal of emphasis on collaboration. It is a content and "process" management system that includes collaborative tools for using content. Users can find, edit, review, reuse and assemble information managed from an Information Manager (IM) database.

IM identifies all the elements of a document defined by XML or SGML markup as separate objects. These can be searched, retrieved and accessed for editing or viewing. IM tracks object versions in the database, and the software manages the links between objects and documents along with metadata.

The system integrates with SGML/XML editors such as Adobe's FrameMaker+SGML and ArborText's Adept Editor, making it easy to update XML and SGML data from the database in their native application. For Microsoft Word files and other file types, users can select the application used to open them. Integrated ActiveX controls can be used to make different file types, like MS Word files, accessible from the database.

Using a Windows Explorer-like interface, users can browse multiple databases and click on an XML or SGML document. IM displays a view of a document's contents in a small window. Search tools let users find any content, link or metadata.

To support collaboration, an Electronic Review tool lets users insert comments that can be passed along with the document. IM includes a workflow system that lets you design and implement work processes.

IM's Document Assembly tool lets you gather content to create documents or publications that can be changed or updated for different purposes. A Document Assembly template can incorporate boilerplate text and database queries.

Visual Basic can be used to add customizations. Open APIs are also available to developers. IM supports Windows 95 and Windows NT clients. It also can support Web browsers, but the separate IM Web Application is needed. Windows NT servers and Unix servers are supported.

Information Manager costs $25,000 for the starter kit, which includes the server, software, development, workflow and four concurrent user licenses.

Doc. Managers Tackle Content

Many document management vendors have been slow to adopt XML. Those that have adopted it offer useful systems for managing information at the content level and delivering content to the Web as a front end to their enterprise systems.

Without XML, you cannot achieve the same level of granular information management. The 50-page report mentioned earlier could not be broken down into its discrete parts unless those parts were saved as separate files. To reuse these elements, you would have to cut and paste content together in a manual, labor-intensive way.

This is the crucial difference between content management and compound document management, which has long been supported by many document management systems. With compound document management, different objects, such as Word files, Excel files, images, graphics, sound or video can be accessed from one system and pulled together for delivery to the Web.

FileNet (Costa Mesa, CA 714-966-3400 www.filenet.com) is taking on Web publishing and talking about "content" management with Panagon Web Publisher, which was introduced in November. The product offers an automated solution for managing and incorporating multiple types of information and delivering that information to the Web.

In most organizations, information such as financial reports, marketing materials and customer records is authored or created in conventional applications. When the company wants to put that information online, it sends it to the Web master. The Web master is the middle man responsible for managing the site, including the structure, updates, revisions and purges.

Panagon Web Publisher distributes the job of the Web master by providing tools to automate the publication of information to the Web. The system converts documents (or the discrete files of a compound document) into HTML on the fly.

FileNet's Web Publisher is a toolkit for the Panagon IDM (Integrated Document Management) system and can be used with imaging, COLD and workflow. Features include the PWP Project Manager, which draws source documents from Panagon libraries, organizes them into related sections and subsections and automatically publishes linked Web sites and online compound documents. Web sites can be published and managed from the Panagon libraries within the document repository.

PWP Station uses translation templates that control the automatic creation of HTML renditions of source documents, including format and style. The PWP Station also generates hyperlinks, tables of contents, keyword indexes and reference lists.

PWP Scheduler updates Web publications on a set schedule or whenever new versions of existing materials become available. This automates the process of revising Web publications to offer the very latest information.
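The version-triggered part of such a scheduler could look roughly like the sketch below. It is a generic illustration and not FileNet's design. The Publication record, its field names and the two functions are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Publication:
    name: str
    source_version: int     # latest version in the document library
    published_version: int  # version currently live on the Web site

def stale(publications):
    """Names of publications whose source is newer than what is published."""
    return [p.name for p in publications if p.source_version > p.published_version]

def republish(publications):
    """Bring every stale publication up to date and report which ones changed."""
    updated = stale(publications)
    for publication in publications:
        if publication.name in updated:
            publication.published_version = publication.source_version
    return updated

site = [
    Publication("price-list", source_version=4, published_version=3),
    Publication("annual-report", source_version=1, published_version=1),
]
changed = republish(site)
print(changed)
```

A timer-driven job could call republish at fixed intervals, which covers the set-schedule case as well.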

Related project files maintain an audit trail of what was published at any given time, an important feature as the legal liabilities of Web publishing emerge.

FileNet says it is waiting for enabling technologies, such as wider Web browser support, before incorporating XML. When this happens, the system will support publishing both discrete content and compound documents in XML as well as HTML. Panagon Web Publisher is base priced at $19,500 for the server and three clients with Web publishing stations.

Documentum (Pleasanton, CA 925-463-6800 www.documentum.com) describes "content" as any business-critical document or information in any format, such as word processing files, spreadsheets, graphics, emails, HTML, Java, etc.

RightSite is the module included in Documentum's Enterprise Document Management System (EDMS) that enables the assembly and delivery of content-based information to intranets. Using RightSite, EDMS users can store, manage, retrieve and deliver mixed "content" from a common repository. All objects are stored in the DocuBase object-relational database and are shared with the RightSite Web server.

RightSite is not new, but support for XML has been added through integration with other tools for enhanced content management capabilities. Documentum promises to add native XML functionality in a future release. Presently RightSite supports XML through Documentum's partnership with ArborText (Waltham, MA 781-529-1000 www.arbortext.com), maker of the Adept XML editor.

RightSite includes native support for SGML, which can be used along with HTML to bring content to the Web. WebQL, a superset of the Structured Query Language (SQL), is a published set of HTML directives for dynamic page assembly. WebQL is used to tag items with attributes and create live links to other items. RightSite automates the process of link creation and deletion.

Saillant Consulting Group's (Denver, CO 303-708-8209) HTMLRender is integrated with Documentum's RightSite to automate the conversion of documents to HTML to simplify creating Web-ready documents.

Documentum says new APIs and SDKs will be offered in future releases to add more flexibility to RightSite as well as added support for active server pages and XML publishing. RightSite is included in EDMS 98 and is not sold separately.

PC Docs (Burlington, MA 781-273-3800 www.pcdocs.com) offers what it calls a "compound document management and publishing module," called Docs Binder, for its Docs Open document management system. Docs Binder is geared toward collaborative environments where documents are made up of multiple items and have multiple authors spread across the enterprise.

Within a "binder," the separate components of varied file formats are managed as individual documents. The documents can be edited separately or simultaneously within the larger binder document. Individual documents can be used in more than one binder, allowing for the reuse of content. The content can be updated within all binders automatically.

The contents of the binder are viewed in an Explorer-like tree diagram. Documents can be dragged and dropped between and reordered within binders. Binders are stored in XML format and contain tags that identify links between documents.
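One way to picture the binder model is a set of links into a shared document store, as in the sketch below. The binder and link tag names, the ref attribute and the sample data are invented for illustration and are not the actual Docs Binder format.

```python
import xml.etree.ElementTree as ET

# Shared store: each document exists once, whatever binders point at it.
documents = {"spec": "Spec v1", "terms": "Standard terms"}

# Binders hold ordered references to documents, not copies of them.
binders = {"proposal": ["spec", "terms"], "contract": ["terms"]}

def render_binder(name):
    """Resolve a binder's references against the current document store."""
    return [documents[doc_id] for doc_id in binders[name]]

def binder_to_xml(name):
    """Serialize a binder as XML whose tags identify the linked documents."""
    root = ET.Element("binder", {"name": name})
    for doc_id in binders[name]:
        ET.SubElement(root, "link", {"ref": doc_id})
    return ET.tostring(root, encoding="unicode")

# One edit to the shared document shows up in every binder that links to it.
documents["terms"] = "Standard terms (revised)"
proposal = render_binder("proposal")
contract = render_binder("contract")
proposal_xml = binder_to_xml("proposal")
print(proposal, contract)
print(proposal_xml)
```

Because a binder stores references rather than copies, updating a document in one place is enough to update every binder that includes it.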

For publishing, PC Docs integrates with third-party products such as PDFfusion from Computerized Document Control (Monmouthshire, UK 888-240-1752 www.docctrl.com). PDFfusion will publish binders as PDF documents on the fly. The software offers templates for layout. Other third-party publishing systems can be integrated using the Docs Binder toolkit.

 





A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005