Intelligent Enterprise featuring Transform

July 2000

e.docs:

Learning the Basics of XML

A short tutorial for XML novices.

By Penny Lunt

You hear lots of hype about XML, and with good reason. XML (Extensible Markup Language) is the standard for presenting data over the Web and wireless devices as well as for data integration. It’s useful for publishing to the Web because it separates data from presentation, letting you “repurpose” the same content from a business document to a browser to a personal digital assistant to a cell phone. It’s useful for data integration because two applications that output XML have only to agree on a Document Type Definition or schema to seamlessly pass information back and forth.

Despite all its power and importance, many people are unfamiliar with the basics of XML. Responding to this need, David Booth, senior research architect at Bluestone Software (www.bluestone.com), has developed a seminar covering the fundamentals outlined below.

XML is a data format language that is actually a simpler, easier-to-use subset of Standard Generalized Markup Language (SGML). The XML structure describes a “tree” of text (or other data), and it describes content only — it has no concern for presentation. Style sheets can be applied to an XML document to provide the layout and font types and styles.

Elements are the most common form of XML markup. They contain a start tag and an end tag delimited by angle brackets (<>) with data between the tags: <myTag>...</myTag>. The tags identify the nature of the content they surround, although they can also be empty and act as placeholders in the document. Elements are not predefined; that’s what makes the language extensible. They’re usually self-describing. A phone number, for example, would get a logical tag such as <phonenum>. Furthermore, elements may be nested inside other elements in a tree structure.
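
As a minimal sketch of the ideas above, the following Python snippet builds a small tree of nested elements with the standard library's xml.etree.ElementTree module and serializes it back to markup. The element names (contact, name, phonenum, fax) and the sample values are made up for illustration; XML itself predefines none of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names: XML does not predefine any of them.
contact = ET.Element("contact")

# Child elements are nested inside the parent, forming a tree.
name = ET.SubElement(contact, "name")
name.text = "Betty White"

phonenum = ET.SubElement(contact, "phonenum")
phonenum.text = "555-0100"

# An element with no content, acting as a placeholder.
ET.SubElement(contact, "fax")

# Serialize the tree back to XML markup as a string.
markup = ET.tostring(contact, encoding="unicode")
print(markup)
```

The printed markup contains one start tag and one end tag for each element that has content, with the children nested inside the contact element.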

Resource Locator

• XML-related news and resources can be found at www.xml.com.

• Information about the World Wide Web Consortium’s XML standard can be found at www.w3.org/TR/REC-xml

• Interdoc provides conferences, training and consulting on XML, content management and document management. Check out their schedule at www.interdoc.ca.

• IBM’s developer Web site, www.ibm.com/developer/xml/, offers free online XML courses and articles. IBM also offers an XML tutorial at www-4.ibm.com/software/developer/education/xmlintro/xmlintro.html

• David Booth, senior research architect at Bluestone Software (www.bluestone.com), recommends the following books on XML and DTDs: XML Bible, by Elliotte Rusty Harold; JustXML, by John E. Simpson; The XML Handbook, by Charles F. Goldfarb and Paul Prescod; and XML by Example, by Sean McGrath.

Info Byte

By 2002, escalating costs of managing Web content and components will drive more than 80 percent of Global 2000 enterprise sites to purchase packages or build applications to automate these functions, according to GartnerGroup.

The Web content management software market will be $2.5 billion in 2003, according to Gartner.

Well-formed XML obeys basic syntactic rules, three of which are:
1. The structure must be properly nested in a tree.
2. Every start tag, e.g. <myTag>, must have a corresponding end tag, e.g. </myTag>.
3. There can be no illegal characters.

You can check whether an XML document you’ve written is well-formed by opening it in a parser, of which there are many. Microsoft Internet Explorer 5.0, for example, has a built-in XML parser. If you open a non-well-formed XML document in IE 5.0, you will get an error message.
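
Any XML parser can perform the same check. The sketch below uses Python's standard xml.etree.ElementTree parser rather than a browser; the helper name is_well_formed and the sample strings are illustrative assumptions, one for each of the three rules above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser rejects any document that breaks a syntax rule.
        return False
    return True


# Illustrative samples: one good document and one violation per rule.
samples = {
    "properly nested": "<a><b>ok</b></a>",
    "overlapping tags": "<a><b>bad</a></b>",
    "missing end tag": "<a><b>bad</a>",
    "illegal character": "<a>1 < 2</a>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample passes. The last one fails because a bare < inside text is read as the start of a tag; it would have to be written as &lt; instead.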

Here’s an example of a short, well-formed XML document:

<?xml version="1.0"?>
<author_list>
  <author>
    <au_id>172-32-1176</au_id>
    <au_lname>White</au_lname>
    <au_fname>Betty</au_fname>
  </author>
  <author>
    <au_id>213-46-8915</au_id>
    <au_lname>Green</au_lname>
    <au_fname>Marjorie</au_fname>
  </author>
</author_list>

The first line above, <?xml version="1.0"?>, states that this document is written in XML version 1.0. All XML is currently considered version 1.0, but the declaration acknowledges that there may be future versions of XML.
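
To show how an application might consume the sample document, here is a minimal sketch that parses it with Python's standard xml.etree.ElementTree module and walks the tree. The variable names are illustrative, and the choice of Python is an assumption; any language with an XML parser would work similarly.

```python
import xml.etree.ElementTree as ET

# The sample document from the article, held in a string.
SAMPLE = """<?xml version="1.0"?>
<author_list>
  <author>
    <au_id>172-32-1176</au_id>
    <au_lname>White</au_lname>
    <au_fname>Betty</au_fname>
  </author>
  <author>
    <au_id>213-46-8915</au_id>
    <au_lname>Green</au_lname>
    <au_fname>Marjorie</au_fname>
  </author>
</author_list>
"""

# The root of the tree is the author_list element.
root = ET.fromstring(SAMPLE)

# Each author element is a child of the root; its own children hold the data.
authors = [
    (a.findtext("au_id"), a.findtext("au_fname"), a.findtext("au_lname"))
    for a in root.findall("author")
]

for au_id, first, last in authors:
    print(au_id, first, last)
```

Because the tags describe the content, the code can ask for au_lname or au_id by name instead of relying on the position of a field.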

A Document Type Definition (DTD) defines the structure and tags used in a particular kind of XML document. The DTD, if used, is declared at the beginning of the document. The example above does not use a DTD. By agreeing on DTDs, companies can ensure that they’re speaking the same language when they exchange data.
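
For illustration only, the sketch below shows what a DTD for the author list might look like when declared at the top of the document. The DTD itself is an assumption, not part of the original example. Note also that Python's standard parser reads the declaration but does not validate the document against it; enforcing the DTD would require a validating parser.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD, declared inside the document before the root element.
# It says an author_list holds zero or more authors, and each author holds
# an au_id, an au_lname and an au_fname, in that order, each containing text.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE author_list [
  <!ELEMENT author_list (author*)>
  <!ELEMENT author (au_id, au_lname, au_fname)>
  <!ELEMENT au_id (#PCDATA)>
  <!ELEMENT au_lname (#PCDATA)>
  <!ELEMENT au_fname (#PCDATA)>
]>
<author_list>
  <author>
    <au_id>172-32-1176</au_id>
    <au_lname>White</au_lname>
    <au_fname>Betty</au_fname>
  </author>
</author_list>
"""

# The standard-library parser is non-validating: it accepts the DTD
# syntax but only checks that the document is well-formed.
root = ET.fromstring(DOCUMENT)
print(root.tag, len(root.findall("author")))
```

Two companies exchanging author data could agree on a DTD like this one so that both sides expect the same elements in the same order.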

Because of some technical shortcomings of DTDs, the World Wide Web Consortium is working on an alternative called XML Schema Language. This language will also define the structure and tags of a document, but in addition provide stronger data typing and better reusability than DTDs. “If they do this in a reasonable timeframe, schemas will replace DTDs,” Booth predicts.




A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005