July 2000
e.docs:
Learning the Basics of XML
A short tutorial for XML novices.
By Penny Lunt
You hear lots of hype about XML, and with good reason. XML (Extensible Markup Language) is the standard for presenting data over the Web and wireless devices as well as for data integration. It's useful for publishing to the Web because it separates data from presentation, letting you repurpose the same content from a business document to a browser to a personal digital assistant to a cell phone. It's useful for data integration because two applications that output XML have only to agree on a Document Type Definition or schema to pass information back and forth seamlessly. Despite all its power and importance, many people are unfamiliar with the basics of XML. Responding to this need, David Booth, senior research architect at Bluestone Software (www.bluestone.com), has developed a seminar covering the fundamentals outlined below.
XML is a data format language that is actually a simpler, easier-to-use subset of Standard Generalized Markup Language (SGML). The XML structure describes a tree of text (or other data), and it describes content only; it has no concern for presentation. Style sheets can be applied to an XML document to provide the layout and font types and styles.
Elements are the most common form of XML markup. An element consists of a start tag and an end tag, each delimited by angle brackets (<>), with data between the tags: <myTag>...</myTag>. The tags identify the nature of the content they surround, although elements can also be empty and act as placeholders in the document. Elements are not predefined; that's what makes the language extensible. Their names are usually self-describing. A phone number, for example, would get a logical tag such as <phonenum>. Furthermore, elements may be nested inside other elements in a tree structure.
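To make the tree idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The contact record, its tag names and the phone number are made up for illustration; only <phonenum> comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact record; tag names and values are invented for this sketch.
doc = """
<contact>
  <name>Betty White</name>
  <phonenum>555-0100</phonenum>
  <fax/>
</contact>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing the nesting as a tree."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(doc.strip())
print("\n".join(outline(root)))
```

The empty <fax/> element is the placeholder case: it carries no data but still occupies a position in the tree.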
Resource Locator
XML-related news and resources can be found at www.xml.com.
Information about the World Wide Web Consortium's XML standard can be found at www.w3.org/TR/REC-xml.
Interdoc provides conferences, training and consulting on XML, content management and document management. Check out their schedule at www.interdoc.ca.
IBM's developer Web site, www.ibm.com/developer/xml/, offers free online XML courses and articles. IBM also offers an XML tutorial at www-4.ibm.com/software/developer/education/xmlintro/xmlintro.html.
David Booth, senior research architect at Bluestone Software (www.bluestone.com), recommends the following books on XML and DTDs: XML Bible, by Elliotte Rusty Harold; Just XML, by John E. Simpson; The XML Handbook, by Charles F. Goldfarb and Paul Prescod; and XML by Example, by Sean McGrath.
Info Byte
By 2002, escalating costs of managing Web content and components will drive more than 80 percent of Global 2000 enterprise sites to purchase packages or build applications to automate these functions, according to GartnerGroup.
The Web content management software market will be $2.5 billion in 2003, according to Gartner.
Well-formed XML obeys basic syntactic rules, three of which are: 1. The structure must be properly nested in a tree. 2. Every start tag, e.g. <myTag>, must have a corresponding end tag, e.g. </myTag>. 3. There can be no illegal characters. You can check whether an XML document you've written is well formed by opening it in a parser, of which there are many. Microsoft Internet Explorer 5.0, for example, has a built-in XML parser. If you open an XML document that is not well formed in IE 5.0, you will get an error message.
Here's an example of a short, well-formed XML document:
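Any XML parser can stand in for the browser here. As a rough sketch, the Python snippet below feeds a few test strings to the standard library's xml.etree.ElementTree parser and reports whether each one is well formed. The function name is_well_formed and the sample strings are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

# One sample that follows the rules and one that breaks each of the three rules.
samples = {
    "properly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",
    "missing end tag": "<a><b>text</b>",
    "illegal character": "<a>fish & chips</a>",
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    print(label, "->", "well-formed" if ok else "error: " + error)
```

The bare ampersand in the last sample is illegal because XML reserves it for entity references; written as &amp; it would parse.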
<?xml version="1.0"?>
<author_list>
  <author>
    <au_id>172-32-1176</au_id>
    <au_lname>White</au_lname>
    <au_fname>Betty</au_fname>
  </author>
  <author>
    <au_id>213-46-8915</au_id>
    <au_lname>Green</au_lname>
    <au_fname>Marjorie</au_fname>
  </author>
</author_list>
The first line above, <?xml version="1.0"?>, indicates that this document is written in XML version 1.0. All XML is currently considered version 1.0, but the declaration acknowledges that there may be future versions of XML.
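Once a document is well formed, an application can read it without knowing anything else about it. The following sketch, again using Python's xml.etree.ElementTree, parses the author list and pulls out each author's fields. The variable names are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

# The author list from the article, including its XML declaration.
author_xml = """<?xml version="1.0"?>
<author_list>
  <author>
    <au_id>172-32-1176</au_id>
    <au_lname>White</au_lname>
    <au_fname>Betty</au_fname>
  </author>
  <author>
    <au_id>213-46-8915</au_id>
    <au_lname>Green</au_lname>
    <au_fname>Marjorie</au_fname>
  </author>
</author_list>"""

root = ET.fromstring(author_xml)

# Collect (id, first name, last name) for every <author> under the root.
authors = [
    (a.findtext("au_id"), a.findtext("au_fname"), a.findtext("au_lname"))
    for a in root.findall("author")
]

for au_id, first, last in authors:
    print(au_id, first, last)
```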
A Document Type Definition (DTD) defines the structure and tags used in a particular kind of XML document. The DTD, if used, is declared at the beginning of the document. The example above does not use a DTD. By agreeing on DTDs, companies can ensure that they're speaking the same language when they exchange data.
Because of some technical shortcomings of DTDs, the World Wide Web Consortium is working on an alternative called XML Schema Language. This language will also define the structure and tags of a document, but it will provide stronger data typing and better reusability than DTDs. "If they do this in a reasonable timeframe, schemas will replace DTDs," Booth predicts.
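To give a feel for what such an agreement checks, the sketch below restates a hypothetical DTD for the author list as hand-written Python rules. This is not a DTD validator: the parser in Python's standard library does not validate against DTDs, so the function structure_errors and the DTD shown in the comments are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for the author list, shown only for reference:
#   <!ELEMENT author_list (author+)>
#   <!ELEMENT author (au_id, au_lname, au_fname)>
#   <!ELEMENT au_id (#PCDATA)>   (likewise au_lname and au_fname)
# The constants below restate the first two rules by hand; the text-only
# (#PCDATA) rule is not checked in this sketch.
ROOT_TAG = "author_list"
AUTHOR_CHILDREN = ["au_id", "au_lname", "au_fname"]

def structure_errors(text):
    """List the ways a document departs from the agreed author_list layout."""
    root = ET.fromstring(text)
    if root.tag != ROOT_TAG:
        return ["root element is <%s>, expected <%s>" % (root.tag, ROOT_TAG)]
    errors = []
    if len(root) == 0:
        errors.append("<author_list> must contain at least one <author>")
    for position, child in enumerate(root, start=1):
        if child.tag != "author":
            errors.append("child %d is <%s>, expected <author>" % (position, child.tag))
            continue
        found = [field.tag for field in child]
        if found != AUTHOR_CHILDREN:
            errors.append("author %d has children %s, expected %s"
                          % (position, found, AUTHOR_CHILDREN))
    return errors

good = ("<author_list><author><au_id>172-32-1176</au_id>"
        "<au_lname>White</au_lname><au_fname>Betty</au_fname></author></author_list>")
bad = ("<author_list><author><au_lname>Green</au_lname>"
       "<au_id>213-46-8915</au_id></author></author_list>")

print(structure_errors(good))
print(structure_errors(bad))
```

Both sample documents are well formed, but only the first follows the agreed layout. That is the distinction a DTD adds: two companies' documents can each parse cleanly and still fail to match each other's expectations.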