Network Working Group                                      S. Hollenbeck
Internet-Draft                                            VeriSign, Inc.
Expires: October 4, 2002                                         M. Rose
                                            Dover Beach Consulting, Inc.
                                                             L. Masinter
                                              Adobe Systems Incorporated
                                                           April 5, 2002


            Guidelines For The Use of XML in IETF Protocols
              draft-hollenbeck-ietf-xml-guidelines-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 4, 2002.

Copyright Notice

   Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

   The eXtensible Markup Language (XML) is a framework for structuring
   data. While it evolved from SGML -- a markup language primarily
   focused on structuring documents -- XML has become a widely-used
   mechanism for representing structured data.

   There is a wide variety of Internet protocols; many need a
   representation for structured data relevant to their application.
   There has been much interest in the use of XML as a representation
   method. This document describes basic XML concepts, analyzes various
   alternatives in the use of XML, and provides guidelines for the use
   of XML within IETF standards-track protocols.

Hollenbeck, et al.       Expires October 4, 2002                [Page 1]

Internet-Draft            XML in IETF Protocols               April 2002

Intended Publication Status

   It is the goal of the authors that this draft (when completed and
   then approved by the IESG) be published as a Best Current Practice
   (BCP).

Conventions Used In This Document

   This document recommends, as policy, what specifications for
   Internet protocols -- and, in particular, IETF standards track
   protocol documents -- should include as normative language within
   them. The keywords "SHOULD", "MUST", "MAY", etc. are used as they
   would be in other documents, with the meanings specified in RFC 2119
   [1].

Discussion Venue

   The authors welcome discussion and comments relating to the topics
   presented in this document. Though direct comments to the authors
   are welcome, public discussion is taking place on the "ietf-xml-use"
   mailing list. To join the list, send a message to
   "ietf-xml-use-request@imc.org" with the word "subscribe" in the body
   of the message. There is a web site for the list archives at
   http://www.imc.org/ietf-xml-use/.

Table of Contents

   1.   Introduction and Overview
   1.1  Intended Audience
   1.2  Scope
   1.3  XML Evolution
   1.4  XML Users, Support Groups, and Additional Information
   2.   XML Selection Considerations
   3.   XML Alternatives
   4.   XML Use Considerations and Recommendations
   4.1  XML Declarations
   4.2  XML Processing Instructions
   4.3  Well-Formedness
   4.4  Validity
   4.5  Namespaces
   4.6  Element and Attribute Design Considerations
   4.7  Binary Data
   5.   Internationalization Considerations
   5.1  Character Sets
   5.2  Language Declaration
   5.3  Other Considerations
   6.   IANA Considerations
   7.   Security Considerations
   8.   Acknowledgements
        Normative References
        Informative References
        Authors' Addresses
        Full Copyright Statement

1. Introduction and Overview

   The eXtensible Markup Language (XML) is a framework for structuring
   data. While it evolved from the Standard Generalized Markup Language
   (SGML) [18] -- a markup language primarily focused on structuring
   documents -- XML has become a widely-used mechanism for representing
   structured data in protocol exchanges. See [34] for an introduction
   to XML.

1.1 Intended Audience

   Many Internet protocol designers are considering using XML and XML
   fragments within the context of existing and new Internet protocols.
   This document is intended as a guide to XML usage and as IETF policy
   for standards track documents. Experienced XML practitioners will
   likely already be familiar with the background material here, but
   the guidelines are intended to be appropriate for those readers as
   well.

1.2 Scope

   This document is intended to give guidelines for the use of XML
   content within a larger protocol.

   There are a number of protocol frameworks already in use or under
   development which focus entirely on "XML protocol": the exclusive
   use of XML as the data representation in the protocol. For example,
   the World Wide Web Consortium (W3C) is developing an XML Protocol
   framework [31] based on the Simple Object Access Protocol (SOAP)
   [35]. The applicability of those protocols is not part of the scope
   of this document.

   In addition, there are higher-level representation frameworks, based
   on XML, that have been designed as carriers of certain classes of
   information; for example, the Resource Description Framework (RDF)
   [30] is an XML-based representation for logical assertions. This
   document does not provide guidelines for the use of such frameworks.

1.3 XML Evolution

   Since XML was originally published in February 1998 [29], its
   popularity has led to several additions to the base specification.
   Although these additions are designed to be consistent with version
   1.0 of XML, they have varying levels of stability, consensus, and
   implementation. Accordingly, this document identifies the major
   evolutionary features of XML and makes suggestions as to the
   circumstances in which each feature should be used.

1.4 XML Users, Support Groups, and Additional Information

   There are many XML support groups, some devoted to the entire XML
   industry (e.g., http://xml.org/), some devoted to developers (e.g.,
   http://xmlhack.com/), some devoted to the business applications of
   XML (e.g., http://oasis-open.org/), and many, many groups devoted to
   the use of XML in a particular context. It is beyond the scope of
   this document to provide a comprehensive list of referrals.
   Interested readers are directed to the three links above as starting
   points, as well as their favorite Internet search engine.

   (TBD: pointers to other best practice and design guidelines, such as
   http://www.xfront.com/BestPracticesHomepage.html and
   http://www.goland.org/xmlschema.htm)

2. XML Selection Considerations

   XML is a tool that provides a means towards an end. Choosing the
   right tool for a given task is an essential part of ensuring that the
   task can be completed in a satisfactory manner. This section
   describes factors to be aware of when considering XML as a tool for
   use in IETF protocols:

   o  XML is a meta-markup language that can be used to define markup
      languages for specific domains and problem spaces.

   o  XML provides both logical structure and physical structure to
      describe data. Data framing is built-in.

   o  XML includes features to support internationalization and
      localization.

   o  XML is extensible. New tags (and thus new protocol elements) can
      be defined without requiring changes to XML itself.

   o  XML is still evolving. The formal specifications are still being
      influenced and updated as experience is gained and applied.

   o  XML is text-based, so XML fragments are easily created, edited,
      and managed using common utilities. Further, being text-based
      means it more readily supports incremental development, debugging,
      and logging.

   o  XML is verbose when compared with many other data encapsulation
      languages. A representation with element extensibility and human
      readability typically requires more bits than one optimized for
      efficient machine processing.

   o  XML implementations are still relatively new. As designers and
      implementers gain experience, it is not uncommon to find defects
      in early and current products.

   o  XML support is available in a large number of software
      development utilities, in both open source and proprietary
      products.

   o  XML processing speed can be an issue in some environments.
      XML processing can be slower because XML data streams may be
      larger than other representations, and the use of general-purpose
      XML parsers will add a software layer with its own performance
      costs.

3. XML Alternatives

   This document focuses on guidelines for the use of XML, but it is
   useful to consider why one would use XML as opposed to some other
   mechanism. This section considers some other commonly used
   representation mechanisms and compares XML to those alternatives.

   For example, Abstract Syntax Notation 1 (ASN.1) [16] along with the
   corresponding Basic Encoding Rules (BER) [17] are part of the OSI
   communication protocol suite, and have been used in many subsequent
   communications standards (e.g., the ANSI Information Retrieval
   protocol [15] and the Simple Network Management Protocol (SNMP)
   [21]). The eXternal Data Representation (XDR) [22] and variations of
   it have been used in many other distributed network applications
   (e.g., the Network File System protocol [28]). With ASN.1, data
   types are explicit in the representation, while with XDR, the data
   types of components are described externally as part of an
   interface specification.

   Many other protocols use data structures directly (without data
   encapsulation) by describing the data structure with Backus Normal
   Form (BNF) [13]; many IETF protocols use an Augmented Backus-Naur
   Form (ABNF) [24]. The Simple Mail Transfer Protocol [27] is an
   example of a protocol specified using ABNF.

   These representation methods differ from XML in several important
   ways:

   Specification encoding: XML Schemas are themselves represented in
      XML, and the specification itself can be written using arbitrary
      characters from the language. The specification of
      representations in other systems (ASN.1, XDR, ABNF) is generally
      written in ASCII [14] text.

   Text Encoding and character sets: the character encoding used to
      represent a formal specification.
      XML defines a consistent character model based on ISO 10646 [19],
      requires support for at least UTF-8 [4] and UTF-16 [26], and
      allows for other encodings. While ASN.1 and XDR may carry strings
      in any encoding, there is no common mechanism for defining
      character encodings within them. ABNF definitions tend to be
      written in terms of octets or ASCII characters.

   Data Encoding: XML is based on a character model. XML Schema [11]
      includes mechanisms for representing some datatypes (integer,
      date, array, etc.), but other binary datatypes are encoded in
      Base64 [23]. ASN.1 and XDR have rich mechanisms for encoding a
      wide variety of datatypes.

   Extensibility: XML has a rich extensibility model: XML
      representations can frequently be versioned independently. Many
      XML representations can be extended by adding new element or
      attribute names to an existing namespace (if done compatibly);
      other extensions can be made by defining new namespaces. ASN.1 is
      similarly extensible through the use of Object Identifiers
      (OIDs). XDR representations tend not to be independently
      extensible by different parties because the framing and datatypes
      are implicit and not self-describing. The extensibility of
      BNF-based protocol elements needs to be explicitly planned.

   Legibility of protocol elements: As noted above, XML is text-based,
      and thus carries the advantages (and disadvantages) of text-based
      protocol elements. (A)BNF-defined protocol elements typically
      share this property. ASN.1 and XDR use binary encodings which are
      not directly human-readable.

   ASN.1, XDR, and BNF are described here as examples of alternatives
   to XML for use in IETF protocols. There are other alternatives, but
   a complete enumeration of all possible alternatives is beyond the
   scope of this document.

4. XML Use Considerations and Recommendations

   This section notes several aspects of XML and makes recommendations
   for use. Since the 1998 publication of XML version 1.0 [29], an
   editorial second edition [8] was published in 2000; this section
   refers to the second edition.

4.1 XML Declarations

   An XML declaration (defined in section 2.8 of [8]) is a small header
   at the beginning of an XML data stream that indicates the XML
   version and the character encoding used. For example,

      <?xml version="1.0" encoding="UTF-8"?>

   specifies the use of XML version 1.0 and UTF-8 character encoding.

   Protocol specifications must be clear about use of XML declarations.
   In some cases, the XML used is a small fragment in a larger context,
   where the XML version and character encoding are specified
   externally. In those cases, the XML declaration might add extra
   overhead. In other cases, the XML is a larger component which may
   find its way alone as an external entity body, transported as a MIME
   message. In those cases, the XML declaration is an important marker
   and useful for reliability and extensibility.

   In general, an XML protocol element should either disallow XML
   declarations ("MUST NOT be used") or require one ("MUST have"). A
   design which allows but does not require an XML declaration leads to
   unreliable implementations. When in doubt, require an XML
   declaration.

4.2 XML Processing Instructions

   An XML processing instruction (defined in section 2.6 of [8]) is a
   component of an XML document that signals extra "out of band"
   information to the receiver; a common use of XML processing
   instructions is for document applications. For example, the XML2RFC
   application used to generate this document and described in [25]
   supports a "table of contents" processing instruction:

      <?rfc toc="yes"?>

   Again, protocol specifications must be clear about whether -- and if
   so, what kind of -- XML processing instructions are allowed.
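
   A receiver can check declaration and processing-instruction policies
   like these mechanically. The following Python sketch is illustrative
   only and is not part of any protocol specification. It uses the
   standard library's expat binding (xml.parsers.expat) to report
   whether an instance carries an XML declaration and which processing
   instructions it contains, then applies a hypothetical policy that
   requires the declaration and disallows processing instructions. The
   function names inspect_prolog and conforms are assumptions made for
   this example. Expat reports the XML declaration through its own
   handler rather than as a processing instruction, so the two cases
   can be told apart, and it raises ExpatError on input that is not
   well-formed.

```python
import xml.parsers.expat


def inspect_prolog(data: bytes) -> dict:
    """Report the XML declaration and processing instructions in `data`.

    Raises xml.parsers.expat.ExpatError if `data` is not well-formed.
    """
    found = {"declaration": None, "pi_targets": []}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # Called only if an XML declaration is present; the declaration
        # is not reported to the processing-instruction handler.
        found["declaration"] = (version, encoding)

    def on_pi(target, pi_data):
        # Called once per processing instruction, e.g. <?rfc toc="yes"?>
        # yields target "rfc".
        found["pi_targets"].append(target)

    parser.XmlDeclHandler = on_xml_decl
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(data, True)  # True: this is the final (only) chunk
    return found


def conforms(data: bytes) -> bool:
    """Hypothetical policy: an XML declaration MUST be present and
    processing instructions MUST NOT be used."""
    info = inspect_prolog(data)
    return info["declaration"] is not None and not info["pi_targets"]
```

   For example, conforms() returns True for
   b'<?xml version="1.0" encoding="UTF-8"?><greeting>hi</greeting>' and
   False once the declaration is removed. A protocol that instead
   forbids the declaration would invert the first test.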

   However, XML processing instructions appear to have rare
   applicability to XML fragments embedded in Internet protocols, and
   it is recommended that their use be explicitly disallowed ("MUST NOT
   use"). In cases where XML processing instructions are allowed, the
   nature of the allowable processing instructions should be specified
   explicitly.

4.3 Well-Formedness

   A well-formed XML instance is one in which all character and markup
   data conforms to a specific set of structural rules defined in
   section 2.1 of [8]. An XML instance that is not well-formed is not
   really XML; well-formedness is the basis for syntactic compatibility
   with XML. Without well-formedness, most of the advantages of using
   XML disappear. For this reason, it is imperative that protocol
   specifications require that XML instances MUST be well-formed.

4.4 Validity

   Beyond well-formedness there are additional mechanisms that define a
   set of structural and data format constraints. Two mechanisms are
   commonly used to define grammars for classes of XML documents:
   Document Type Definitions (DTDs) (defined in section 2.8 of [8]) and
   XML Schema (defined in [10] and [11]). DTDs are an older technology
   that has been found to have drawbacks, particularly in the features
   provided for extensibility and data typing. XML Schema was designed
   to address many DTD shortcomings.

   For example, with a DTD a validating parser can confirm that an
   element contains character data, but with XML Schema a validating
   parser can also confirm that the value of an element matches a
   particular regular expression. XML Schema provides powerful features
   to define a complete and precise specification of allowable protocol
   syntax and data type definitions.
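
   To make the DTD versus XML Schema contrast concrete, the Python
   sketch below hand-codes the kind of value check that an XML Schema
   "pattern" facet asks a validating parser to perform. It is not a
   schema validator (the Python standard library does not include one),
   and the "address" element name and dotted-quad pattern are
   hypothetical. A DTD content model of (#PCDATA) would accept any
   character data in the element; the pattern adds a constraint on its
   value. XML Schema patterns are implicitly anchored to the entire
   value, which is why the sketch uses re.fullmatch rather than
   re.search.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical value constraint in the style of an XML Schema pattern
# facet. [0-9] restricts the match to ASCII digits, and this pattern
# reads the same way in XML Schema and Python regex syntax.
DOTTED_QUAD = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")


def addresses_valid(instance: str) -> bool:
    """Return True if every <address> element's text matches DOTTED_QUAD.

    ET.fromstring raises ET.ParseError if the instance is not
    well-formed, so well-formedness is checked before any value check.
    """
    root = ET.fromstring(instance)
    for elem in root.iter("address"):  # includes root if it is <address>
        # A DTD declaring <address> as (#PCDATA) would accept any text.
        # fullmatch mirrors the implicit anchoring of schema patterns;
        # an empty element (text is None) is treated as "".
        if DOTTED_QUAD.fullmatch(elem.text or "") is None:
            return False
    return True
```

   Like a schema pattern, this checks only the lexical form of the
   value: "10.1.2.3" passes and "example.com" fails, but so would
   "999.1.1.1" pass. Range checks of that kind are what XML Schema's
   other facets, or protocol-level processing, are for.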

   In order to obtain the advantages of XML as a data structure
   specification system, protocol specifications should supply an XML
   Schema and insist that XML instances MUST be valid according to that
   schema.

4.5 Namespaces

   XML namespaces, defined in [9], provide a means of assigning markup
   to a specific vocabulary. If two elements or attributes from
   different vocabularies have the same name, they can be distinguished
   unambiguously if they belong to different namespaces. Additionally,
   namespaces provide significant support for protocol extensibility as
   they can be defined, reused, and processed dynamically.

   Markup vocabulary collisions are a real risk when namespaces are not
   used to separate and uniquely identify vocabularies. Protocol
   definitions should use existing XML namespaces where appropriate.
   When new namespaces are needed, the namespace name (a URI) should be
   defined within the RFC itself, and the IETF URN namespace described
   in [20] should be used to designate the namespace; for example:

      xmlns:abc="urn:ietf:params:xml:ns:abc"

4.6 Element and Attribute Design Considerations

   XML provides much flexibility in allowing a designer to use either
   elements or element attributes to carry data. Element attributes are
   generally intended to contain meta-data that describes the value of
   the element, and as such they are subject to the following
   restrictions:

   o  Attributes are unordered, and

   o  Attribute values can only contain simple XML data types.

   Consider the following example that describes an IP address using a
   "type" attribute to describe the address value:

      10.1.2.3

   XML allows the same information to be encapsulated using a