August 2002
Choose the Right Tools for Content Editing
by Bill Trippe
It's ironic. Technology providers have been eager to shelve the term document management and
embrace content management, but when you look at what many organizations need to manage, the most
challenging problem remains ... um ... document content.
Why are documents so challenging? Because they have structure and detail that other applications
require: everything from manufacturing part numbers to the names on contracts. As content
management systems (CMSs) are integrated with other enterprise applications, a key question is, "How
can documents be created and kept up to date by human editors, while at the same time providing the
structure and detail that the CMS needs?"
For most CMS implementations, template-based and HTML forms-based interfaces still predominate,
especially in applications that connect human users to "fielded" data. For example, marketing staff
can change a product price by entering the new value in the CMS interface's price box.
But forms-based interfaces don't hold up well with document-length content. Users are much more
comfortable with a familiar word processing interface, including the formatting and editorial tools,
such as spell checking, that have become a normal part of document workflow. As a result, many CMS vendors
have adopted mechanisms for at least loosely coupling word processing tools with their
applications.
In light of the ongoing need to manage document content, it's natural to wonder whether the
documents themselves, or some components of the documents, should be maintained as XML data. Would
this choice mean that the people responsible for creating and updating the documents would need to
learn XML coding and use XML editing tools?
These questions are only beginning to be answered, and as a result, the state of the art in
editorial interfaces is a mixed bag. The CMS implementations now in the field reveal a combination
of XML editing tools, template editing and some specialized tools for content conversion, pretagging
and posttagging.
Different Needs Demand Different Interfaces
If organizations are going to take full advantage of CMSs, knowledge workers must be able to
create and update content themselves and add intelligence to the content being managed. As more and
more people in the enterprise contribute and update content, the need increases for simple-to-learn
yet powerful tools that allow contributors to concentrate on the material they're creating or
updating. They shouldn't have to worry about the intricacies of how content should be tagged for
manipulation by the CMS.
When considering editorial interfaces for content management, organizations have a number of
options:
Template (or forms-based) interfaces have become de rigueur for CMSs, mainly because CMS
applications are often based on relational repositories, and the shortest distance between a thin
client and a relational database is an HTML- or Java-based form. The bulk of CMS applications are
reliant on this kind of interface as the primary means of entering and updating content and
assigning metadata at a coarse level.
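To make the form-to-database path concrete, here is a minimal sketch in Python using the standard sqlite3 module. The product table, its columns and the handle_price_form function are hypothetical. The point is only that each box on a template form maps onto one column, as in the price-change example earlier, and that the database itself can reject bad values.

```python
import sqlite3

# Hypothetical repository table: one row per product, one column per "fielded" value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " sku TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " price REAL NOT NULL CHECK (price >= 0))"
)
conn.execute("INSERT INTO product VALUES ('A-100', 'Widget', 19.95)")


def handle_price_form(conn, form):
    """Apply a submitted form (already decoded to a dict) to the product table.

    Each form box maps to exactly one column, which is why template
    interfaces pair so naturally with relational repositories.
    Returns the number of rows updated.
    """
    price = float(form["price"])  # form values arrive as strings
    with conn:  # commits on success, rolls back if the CHECK constraint fails
        cur = conn.execute(
            "UPDATE product SET price = ? WHERE sku = ?", (price, form["sku"])
        )
    return cur.rowcount


handle_price_form(conn, {"sku": "A-100", "price": "24.50"})
print(conn.execute("SELECT price FROM product WHERE sku = 'A-100'").fetchone()[0])
```

A negative price raises sqlite3.IntegrityError and leaves the stored value untouched, a small example of the repository enforcing control over what the form lets through.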
Word processing applications such as Microsoft Word are used to create content to be stored in
its native form or saved as HTML, XML or another neutral format. The challenge, of course, is
integrating the proprietary structures in Word with the more neutral structures in which the CMS stores
content: relational databases, some sort of object storage or XML. The granularity of tagging
available with this option can vary greatly. While Word itself typically applies styles at a
paragraph level, some editorial interfaces based on Word can impose XML or object-level
granularity.
HTML editors are used to create and edit content stored as HTML or smaller, relatively discrete
"chunks" of content that can be mapped from HTML to the underlying data structures. This approach
offers little efficiency for situations in which content must be finely grained and/or of high
quality.
XML editing tools can be used to create and edit all or some of the content and can also interact
with metadata. This approach provides the strongest capability for finely grained content tagging
but also can be more expensive in terms of software seats, training and support.
Preprocessing and postprocessing tools convert content as it enters and leaves the CMS. For example,
proprietary formats such as Microsoft Word and QuarkXPress can be "debinarized" and then run through
filters on their way into the CMS (including into XML-based relational database storage). The reverse
process can be performed on the way out. When using this approach, the level of tag granularity
depends on the quality and capabilities of the filters that convert binary formats into XML or other
object structures.
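A filter of this kind can be sketched in a few lines. The example below assumes the Word file has already been "debinarized" into a list of (paragraph style, text) pairs. The style names, the element names and both functions are hypothetical, and a production filter would also have to handle character formatting, tables and graphics.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping between Word paragraph styles and XML element names.
STYLE_TO_TAG = {"Heading 1": "title", "Body Text": "para", "Warning": "warning"}
TAG_TO_STYLE = {tag: style for style, tag in STYLE_TO_TAG.items()}


def word_to_xml(paragraphs):
    """Inbound filter: (style, text) pairs -> <document> element.

    Paragraphs with unmapped styles become <para> but keep their style name
    in an attribute, so tag granularity is only as good as the mapping.
    """
    root = ET.Element("document")
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        elem = ET.SubElement(root, tag or "para")
        if tag is None:
            elem.set("unmapped-style", style)
        elem.text = text
    return root


def xml_to_word(root):
    """Outbound filter: reconstruct (style, text) pairs from the XML."""
    return [
        (elem.get("unmapped-style") or TAG_TO_STYLE[elem.tag], elem.text or "")
        for elem in root
    ]


doc = [("Heading 1", "Installation"), ("Body Text", "Shut off the water."),
       ("Warning", "Do not overtighten."), ("Caption", "Figure 1")]
tree = word_to_xml(doc)
print(ET.tostring(tree, encoding="unicode"))
```

Calling xml_to_word(tree) rebuilds the original list, which is the round trip the reverse process depends on.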
Enterprises that have implemented content management typically use a combination of interfaces,
with template interfaces being the most popular choice. Each of these tools has different strengths
and weaknesses and may be more or less appropriate for your content needs, staffing situation and
user requirements.
As a general rule, users should be provided with authoring tools that match their level of
content contribution, training and flexibility (see "Authoring Tools by Type of User").
Treat Editing as a CM Application
Vendors and end-user organizations should treat the editorial interface to the content database
as an application unto itself. In effect, the template interface is an application, yet it has too
often been little more than a loose coupling of scripted, HTML-based forms linked to some kind of
back-end repository. If the coupling is too loose, you'll lose strict control over the content
that's being entered in the forms, and you may also lose the ability to effectively manage longer
text elements.
Early CMS projects were plagued with difficulties in establishing template interfaces and later
with maintaining them, but better editing tools are now available to integrate with CMSs. The
management applications have also matured, gaining improved support for file formats such as
Microsoft Word documents. CMSs are also better at controlling the editing of text via templates or
forms, mainly because specialized editing tools for integration with templates have emerged.
Two examples of better editing tools for use in templating environments are from RealObjects,
Saarbrücken, Germany, and Ektron, Amherst, NH. RealObjects' Edit-on is a Java applet that has been
integrated with FatWire and other systems. Ektron offers eWebEditPro, an editing control that can be
added to template interfaces, and eWebEditPro+XML, a tool for adding and managing XML. Ektron's
editing tools are used by CMS vendors including Vignette, divine, Eprise and Microsoft.
Used properly, editing controls such as those from Ektron and RealObjects provide much more
granular handling of text that was previously only crudely tagged. These newer editing controls are,
at the least, improving content accuracy and, at best, introducing more fine-grained markup,
including XML markup. With these and other improved tools for editing, newer-generation CMSs are
relying more on XML and the greater flexibility it provides in modeling content.
In the past, if a CMS customer implemented an editorial interface and then later needed to change
underlying data structures, the enterprise likely had to heavily modify or completely rewrite the
interface, especially with CMSs that used relational databases as the underlying data store.
One of XML's advantages is that it makes modifications to data structure much easier. If the
underlying data store is relational and the interface is a heavily programmed template, changing the
underlying data structure is a complex, programming-heavy task. If the underlying data structure is
XML, changing that structure typically means modifying the Document Type Definition (DTD) or XML
Schema and then running a process to update the XML editing interface, often automatically. The XML
editor is then ready to parse the text according to the revised DTD or Schema.
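Python's standard library does not validate against DTDs, so the sketch below uses a deliberately simplified, hypothetical content model (a dictionary listing each element's required children, in order) as a stand-in for a DTD or Schema. What it is meant to show is the division of labor described above: the validation code stays the same, and a structural change is made by editing the model.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a DTD or XML Schema: each element
# lists the child elements it must contain, in order.
MODEL_V1 = {"part": ["number", "name"], "number": [], "name": []}


def validate(elem, model):
    """Return a list of problems found when checking elem against model."""
    if elem.tag not in model:
        return [f"undeclared element <{elem.tag}>"]
    problems = []
    expected = model[elem.tag]
    actual = [child.tag for child in elem]
    if actual != expected:
        problems.append(f"<{elem.tag}> has {actual}, expected {expected}")
    for child in elem:
        problems.extend(validate(child, model))
    return problems


doc = ET.fromstring("<part><number>K-1234</number><name>Faucet</name></part>")
print(validate(doc, MODEL_V1))  # []

# Changing the structure means editing the model, not the validator code.
MODEL_V2 = {**MODEL_V1, "part": ["number", "name", "finish"], "finish": []}
print(validate(doc, MODEL_V2))  # reports the missing <finish> element
```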
"Integrating an XML editor with a [CMS] provides customers with immediate benefits," says Bruce
Sharpe, executive vice president of XML Content Solutions at Corel, Ottawa, Ontario, Canada, the
company that recently acquired the XML editing tool, XMetaL. "Information can be accessed and edited
through a single interface. This makes a huge difference for customer[s] and their
productivity."
While XML editors offer clear benefits, not all editorial interfaces must be XML editors. Indeed,
the nature of enterprise content management is that underlying data structures will be a combination
of relational, XML and other data types. There will also be all manner of content in terms of
length, value and shelf life. The editorial interface(s) should then be appropriate to:
1. The content type: both data type and length.
2. The user type, from occasional contributor to IT administrator.
3. The content's point in the lifecycle, from initial creation through editing and updating.
4. The content's shelf life.
5. The requirement (or lack thereof) for content to be tagged at a granular level.
With these factors in mind, those implementing CMSs should consider supporting a variety of
interfaces. For example, regular contributors of complex and lengthy documents to an XML database
would be well served by a tightly integrated XML editor. But ad hoc contributors to that same
database should be given a simple-to-use tool that exposes precisely the content they need to edit
and validate before returning it to the repository. Ad hoc users could also be supported by a
workflow process that forwards their revised content to more skilled editors to ensure that content
was entered or updated correctly. In yet another example, corporate users of an intranet that
supports a small number of simple document types could use a set of Microsoft Word templates. The
Word files could then be processed through a tool that normalizes the files into the format required
by the CM repository. When the documents need to be modified, reverse processes could reconstruct
the Word files for further updating and editing.
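These examples amount to a rough decision table. A minimal sketch follows, in which the Contributor fields, the order of the rules and the interface labels are all illustrative assumptions rather than any product's logic.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    frequency: str             # "regular" or "ad hoc"
    content_length: str        # "short" or "long"
    needs_fine_tagging: bool   # must content be tagged at a granular level?
    simple_doc_type: bool      # e.g., intranet documents with a few fixed layouts


def choose_interface(c: Contributor) -> str:
    """Hypothetical rules mirroring the examples in the text. A real project
    would weigh more factors (lifecycle stage, shelf life and so on)."""
    if c.needs_fine_tagging and c.frequency == "regular" and c.content_length == "long":
        return "integrated XML editor"
    if c.simple_doc_type and not c.needs_fine_tagging:
        return "Word template + normalizing filter"
    if c.frequency == "ad hoc":
        return "simple targeted form + review workflow"
    return "template (forms-based) interface"


print(choose_interface(Contributor("ad hoc", "short", True, False)))
```

Putting the XML-editor rule first reflects the view that heavy, granular contributors justify the extra cost in seats and training; an organization could just as reasonably order the rules differently.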
Metadata is likely a combination of XML, relational data and other data types. By its nature,
metadata is often structured, discrete and relatively short in length. For example, a CMS for
an electronics company could maintain detailed information on parts, product availability and
maintenance procedures, all as metadata. Such information could include fixed values, choice groups
("yes" or "no," or "X," "Y" or "Z."), and other data types that lend themselves to structured
interfaces and enforced validation.
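Validation of fields like these is straightforward to enforce. The sketch below assumes a hypothetical metadata schema for an electronics part, in which a tuple lists the allowed values of a choice group and a Python type names a value that must convert to that type.

```python
# Hypothetical metadata schema for an electronics part.
PART_METADATA = {
    "part_number": str,          # free text, but required
    "in_stock": ("yes", "no"),   # choice group
    "grade": ("X", "Y", "Z"),    # choice group
    "rated_voltage": float,      # typed value
}


def validate_metadata(record, schema=PART_METADATA):
    """Return a dict of field -> error message; an empty dict means valid."""
    errors = {}
    for field, rule in schema.items():
        if field not in record or record[field] in ("", None):
            errors[field] = "required"
        elif isinstance(rule, tuple):
            if record[field] not in rule:
                errors[field] = f"must be one of {', '.join(rule)}"
        else:
            try:
                rule(record[field])
            except (TypeError, ValueError):
                errors[field] = f"must be a {rule.__name__}"
    for field in record.keys() - schema.keys():
        errors[field] = "unknown field"
    return errors


print(validate_metadata({"part_number": "R-220", "in_stock": "yes",
                         "grade": "Y", "rated_voltage": "5"}))  # {}
print(validate_metadata({"part_number": "R-220", "in_stock": "maybe", "grade": "Y"}))
```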
If we consider again our range of authors, from occasional contributor to power user, they
likely have a similar range of needs for editing metadata. An infrequent contributor adding a
Microsoft Word file could be required to fill out a simple form or even the "Property Sheet"
embedded within Word (File menu, Properties). On the other end of the spectrum, a
knowledge worker could be provided with an XML editing tool as an interface to the required
metadata. The GUI of a commercial XML editor such as Arbortext Epic or Corel XMetaL can be
configured to behave like a forms-based interface while capturing and storing XML data.
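Whatever the front end, the captured values end up as XML. Here is a minimal sketch using Python's xml.etree.ElementTree, with hypothetical element names (title, author, keywords) chosen to echo the fields on Word's Properties sheet. In a configured XML editor the element names would come from the DTD or Schema rather than from the form.

```python
import xml.etree.ElementTree as ET


def form_to_xml(fields, root_tag="metadata"):
    """Store form-style field values as child elements of one XML element."""
    root = ET.Element(root_tag)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_form(xml_text):
    """Read stored XML back into field values for redisplay in the form."""
    return {child.tag: child.text or "" for child in ET.fromstring(xml_text)}


props = {"title": "Service Manual", "author": "Tech Pubs", "keywords": "valve, cartridge"}
stored = form_to_xml(props)
print(stored)
```

The user sees only labeled boxes; xml_to_form(stored) gives back the same field values, so the form can be redisplayed for later edits.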
Not all content will have to be XML. However, if you handle complex or lengthy documents, if you
need content to be "componentized" and you need it to be accessible to a variety of editorial
interfaces, XML makes sense. XML has helped fixture manufacturer Kohler Plumbing of Kohler, WI,
better manage the creation and maintenance of some 10,000 active documents ranging from in-box
literature, service documents, service manuals and tech sheets, to presales sheets and installation
guides.
"With our needs for multiple output, multilingual, multibrand, multilocalization and
multicustomers, we couldn't stay working at the document level," says Mark Peterson, manager of
Kohler Plumbing's North America technical communications department. "We had to move to working at
an object level." Moving to the object level has meant moving to XML editing.
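What "working at an object level" can mean is sketched below under simple, hypothetical assumptions: each reusable component is stored once per language, and a document is an ordered list of component IDs rather than a block of text. The component IDs, the sample text and the assemble function are illustrative only; they do not describe Kohler's actual system.

```python
# Hypothetical component store: each reusable object is kept once per language.
COMPONENTS = {
    "warn-shutoff": {"en": "Shut off the water supply.",
                     "fr": "Coupez l'alimentation en eau."},
    "step-remove-cap": {"en": "Remove the cap.",
                        "fr": "Retirez le capuchon."},
}

# A document is an ordered list of component IDs, not a block of text.
INSTALL_GUIDE = ["warn-shutoff", "step-remove-cap"]
SERVICE_SHEET = ["warn-shutoff"]  # the same warning, reused


def assemble(doc, lang):
    """Build one language version of a document from shared components."""
    return "\n".join(COMPONENTS[cid][lang] for cid in doc)


print(assemble(INSTALL_GUIDE, "fr"))
```

Because both documents reference the same warning component, a correction to that component reaches both of them, and the same store can feed print, CD and online outputs.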
Peterson has a team of 12, including writers, illustrators (many of the smaller documents are 50
percent illustration), translators and layout specialists. Many documents find form in print, CD and
online, as well as in support of catalog production. Peterson admits that the move brought some
"culture shock," a condition that a full-time, specialized worker may have to adjust to, but the
average business user may not.
According to Tony White, senior director of product marketing at BroadVision, Redwood City, CA,
the management system should be transparent to business users. "They should be able to create and
modify content using tools such as Microsoft Word or Web-based forms, without needing to know their
work is being submitted," White says.
As detailed in "CM System Editing Tools and Integrations," content management vendors
have worked hard to provide easy-to-use tools for business users. For the wider enterprise, it is
the range of options that counts. Not every user will be productive with the same kind of interface,
and not every organization will have the same kind of content. Try to match your content and users
with the most appropriate interfaces.
The good news is that as CMSs mature, the challenge of providing a better user experience is
being met, and both organizations and individual users will be the beneficiaries.
Bill Trippe (btrippe@nmpub.com) is president of New Millennium Publishing (www.nmpub.com), Boston,
a consultancy specializing in electronic publishing, content management, SGML and XML.