Network Working Group                                      S. Hollenbeck
Internet-Draft                                            VeriSign, Inc.
Expires: December 3, 2002                                        M. Rose
                                            Dover Beach Consulting, Inc.
                                                             L. Masinter
                                              Adobe Systems Incorporated
                                                            June 4, 2002
Guidelines for the Use of XML within IETF Protocols
draft-hollenbeck-ietf-xml-guidelines-04.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 3, 2002.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
The Extensible Markup Language (XML) is a framework for structuring
data. While it evolved from SGML -- a markup language primarily
focused on structuring documents -- XML has evolved to be a widely-
used mechanism for representing structured data.
There is a wide variety of Internet protocols being developed; many
need a representation for structured data relevant to their
application. There has been much interest in the use of XML as a
Hollenbeck, et al. Expires December 3, 2002 [Page 1]
Internet-Draft XML Within IETF Protocols June 2002
representation method. This document describes basic XML concepts,
analyzes various alternatives in the use of XML, and provides
guidelines for the use of XML within IETF standards-track protocols.
Intended Publication Status
It is the goal of the authors that this draft (when completed and
then approved by the IESG) be published as a Best Current Practice
(BCP).
Conventions Used In This Document
This document recommends, as policy, what specifications for Internet
protocols -- and, in particular, IETF standards track protocol
documents -- should include as normative language within them. The
capitalized keywords "SHOULD", "MUST", "REQUIRED", etc. are used in
the sense of how they would be used within other documents with the
meanings as specified in RFC 2119 [1].
Discussion Venue
The authors welcome discussion and comments relating to the topics
presented in this document. Though direct comments to the authors
are welcome, public discussion is taking place on the
"ietf-xml-use@imc.org" mailing list. To join the list, send a message
to "ietf-xml-use-request@imc.org" with the word "subscribe" in the body
of the message. List archives [49] are available on the World Wide
Web.
Table of Contents
1. Introduction and Overview
1.1 Intended Audience
1.2 Scope
1.3 XML Evolution
1.4 XML Users, Support Groups, and Additional Information
2. XML Selection Considerations
3. XML Alternatives
4. XML Use Considerations and Recommendations
4.1 XML Declarations
4.2 XML Processing Instructions
4.3 XML Comments
4.4 XML Syntax, Information Set, and Alternative Syntaxes
4.5 Well-Formedness
4.6 Validity and Extensibility
4.7 Namespaces
4.7.1 Namespaces and Attributes
4.8 Element and Attribute Design Considerations
4.9 Binary Data and Text with Control Characters
4.10 Incremental Processing
4.11 Entity Declarations
4.12 URI Processing in XML
4.13 Interaction with the IANA
5. Internationalization Considerations
5.1 Character Sets and Encodings
5.2 Language Declaration
5.3 Other Internationalization Considerations
6. IANA Considerations
7. Security Considerations
8. Acknowledgements
Normative References
Informative References
Authors' Addresses
A. Appendix A: Change History
Full Copyright Statement
1. Introduction and Overview
The Extensible Markup Language (XML, [8]) is a framework for
structuring data. While it evolved from the Standard Generalized
Markup Language (SGML, [32]) -- a markup language primarily focused
on structuring documents -- XML has evolved to be a widely-used
mechanism for representing structured data in protocol exchanges.
See "XML in 10 points" [46] for an introduction to XML.
1.1 Intended Audience
Many Internet protocol designers are considering using XML and XML
fragments within the context of existing and new Internet protocols.
This document is intended as a guide to XML usage and as IETF policy
for standards track documents. Experienced XML practitioners will
likely already be familiar with the background material here, but the
guidelines are intended to be appropriate for those readers as well.
1.2 Scope
This document is intended to give guidelines for the use of XML
content within a larger protocol. The goal is not to suggest that
XML is the "best" or "preferred" way to represent data; rather, the
goal is to lay out the context for the use of XML within a protocol
once other factors point to XML as a possible data representation
solution. The Common Name Resolution Protocol (CNRP, [13]) is an
example of a protocol that would be addressed by these guidelines if
it were being newly defined. This document does not address sending
XML as a document over MIME and MIME-like protocols such as SMTP or
HTTP.
There are a number of protocol frameworks already in use or under
development which focus entirely on "XML protocol" -- the exclusive
use of XML as the data representation in the protocol. For example,
the World Wide Web Consortium (W3C) is developing an XML Protocol
framework based on SOAP ([43] and [44]). The applicability of such
protocols is not part of the scope of this document.
In addition, there are higher-level representation frameworks, based
on XML, that have been designed as carriers of certain classes of
information; for example, the Resource Description Framework (RDF,
[37]) is an XML-based representation for logical assertions. This
document does not provide guidelines for the use of such frameworks.
1.3 XML Evolution
XML 1.0 was originally published as a W3C recommendation in February
1998 [36], and was revised in a 2nd edition [8] in October 2000.
Several additional facilities have also been defined that layer on
the base specification. Although these additions are designed to be
consistent with XML 1.0, they have varying levels of stability,
consensus, and implementation. Accordingly, this document identifies
the major evolutionary features of XML and makes suggestions as to
the circumstances in which each feature should be used.
1.4 XML Users, Support Groups, and Additional Information
There are many XML support groups, with some devoted to the entire
XML industry [50], some devoted to developers [51], some devoted to
the business applications of XML [52], and many, many groups devoted
to the use of XML in a particular context.
It is beyond the scope of this document to provide a comprehensive
list of referrals. Interested readers are directed to the three
references above as starting points, as well as their favorite
Internet search engine.
2. XML Selection Considerations
XML is a tool that provides a means towards an end. Choosing the
right tool for a given task is an essential part of ensuring that the
task can be completed in a satisfactory manner. This section
describes factors to be aware of when considering XML as a tool for
use in IETF protocols:
o XML is a meta-markup language that can be used to define markup
languages for specific domains and problem spaces.
o XML provides both logical structure and physical structure to
describe data. Data framing is built-in.
o XML instances can be validated against the formal definition of a
protocol specification.
o XML supports internationalization.
o XML is extensible. Unlike some other markup languages (such as
HTML), new tags (and thus new protocol elements) can be defined
without requiring changes to XML itself.
o XML is still evolving. The formal specifications are still being
influenced and updated as use experience is gained and applied.
o XML does not provide native mechanisms to support detailed data
typing. Additional mechanisms are required to specify abstract
protocol data types.
o XML is text-based, so XML fragments are easily created, edited,
and managed using common utilities. Further, being text-based
means it more readily supports incremental development, debugging,
and logging. A simple "canned" XML fragment can be embedded
within a program as a string constant, rather than having to be
constructed.
o Binary data has to be encoded into a text-based form to be
represented in XML.
o XML is verbose when compared with many other structured data
representation languages. A representation with element
extensibility and human readability typically requires more bits
when compared to one optimized for efficient machine processing.
o XML implementations are still relatively new. As designers and
implementers gain experience, it is not uncommon to find defects
in early and current products.
o XML support is available in a large number of software development
utilities, available in both open source and proprietary products.
o XML processing speed can be an issue in some environments. XML
processing can be slower because XML data streams may be larger
than other representations, and the use of general purpose XML
parsers will add a software layer with its own performance costs
(though these costs can be reduced through consistent use of an
optimized parser). In some situations, processing XML requires
examining every byte of the entire XML data stream, with higher
overhead than with representations where uninteresting segments
can be skipped.
3. XML Alternatives
This document focuses on guidelines for the use of XML. It is useful
to consider why one might use XML as opposed to some other mechanism.
This section considers some other commonly used representation
mechanisms and compares XML to those alternatives.
For many fundamental protocols, the extensibility requirements are
modest, and the performance requirements are high enough that fixed
binary data blocks are the appropriate representation; mechanisms
such as XML merely add bloat. RFC 3252 [26] describes a humorous
example of XML as protocol bloat.
In addition, there are other representation and extensibility
frameworks that have been used successfully within communication
protocols. For example, Abstract Syntax Notation 1 (ASN.1) [30]
along with the corresponding Basic Encoding Rules (BER, [31]) are
part of the OSI communication protocol suite, and have been used in
many subsequent communications standards (e.g., the ANSI Information
Retrieval protocol [29] and the Simple Network Management Protocol
(SNMP, [16])). The External Data Representation (XDR, [17]) and
variations of it have been used in many other distributed network
applications (e.g., the Network File System (NFS) protocol [25]).
With ASN.1, data types are explicit in the representation, while with
XDR, the data types of components are described externally as part of
an interface specification.
Many other protocols use data structures directly (without data
encapsulation) by describing the data structure with Backus Normal
Form (BNF, [27]); many IETF protocols use an Augmented Backus-Naur
Form (ABNF, [19]). The Simple Mail Transfer Protocol (SMTP, [24]) is
an example of a protocol specified using ABNF.
ASN.1, XDR, and BNF are described here as examples of alternatives to
XML for use in IETF protocols. There are other alternatives, but a
complete enumeration of all possible alternatives is beyond the scope
of this document.
Representation methods differ from XML in several important ways:
Specification encoding: XML schemas (defined in [11] and [12]) are
themselves represented in XML. The specification of representations
in other systems (ASN.1, XDR, ABNF) is generally in ASCII [28] text.
Text Encoding and character sets: the character encoding used to
represent a formal specification. XML defines a consistent character
model based on ISO 10646 [33], and requires that XML parsers accept
at least UTF-8 [4] and UTF-16 [23], and allows for other encodings.
While ASN.1 and XDR may carry strings in any encoding, there is no
common mechanism for defining character encodings within them.
Typically, ABNF definitions tend to be defined in terms of octets or
characters in ASCII.
Data Encoding: XML is defined as a sequence of characters, rather
than a sequence of bytes. XML Schema [12] includes mechanisms for
representing some data types (integer, date, array, etc.) but many
binary data types are encoded in Base64 [18] or hexadecimal. ASN.1
and XDR have rich mechanisms for encoding a wide variety of data
types.
Extensibility: XML has a rich extensibility model such that XML
specifications can frequently be versioned independently.
Specifications can be extended by adding new element names and
attributes (if done compatibly); other extensions can be added by
defining new XML namespaces [9], though there is no standard
mechanism in XML for indicating whether or not new extensions are
mandatory to recognize. ASN.1 is similarly extensible through the
use of Object Identifiers (OIDs). XDR specifications tend to not be
independently extensible by different parties because the framing and
data types are implicit and not self-describing. The extensibility
of BNF-based protocol elements needs to be explicitly planned.
Legibility of protocol elements: As noted above, XML is text-based,
and thus carries the advantages (and disadvantages) of text-based
protocol elements. Typically this is shared with (A)BNF-defined
protocol elements. ASN.1 and XDR use binary encodings, which are not
readable without decoding tools.
4. XML Use Considerations and Recommendations
This section notes several aspects of XML and makes recommendations
for use. Since the 1998 publication of XML version 1 [36], an
editorial second edition [8] was published in 2000; this section
refers to the second edition.
4.1 XML Declarations
An XML declaration (defined in section 2.8 of [8]) is a small header
at the beginning of an XML data stream that indicates the XML version
and the character encoding used. For example,

   <?xml version="1.0" encoding="UTF-8"?>

specifies the use of XML version 1.0 and UTF-8 character encoding.
In some uses of XML as an embedded protocol element, the XML used is
a small fragment in a larger context, where the XML version is fixed
at "1.0" and the character encoding is known to be "UTF-8". In those
cases, an XML declaration might add extra overhead. In cases where
the XML is a larger component which may find its way alone as an
external entity body (transported as a MIME message, for example),
the XML declaration is an important marker and is useful for
reliability and extensibility. The XML declaration is also an
important marker for character set/encoding (see Section 5.1), if any
encoding other than UTF-8 is allowed.
Protocol specifications must be clear about use of XML declarations.
XML [8] notes that "XML documents should begin with an XML
declaration which specifies the version of XML being used." In
general, an XML declaration should be encouraged ("SHOULD be
present") and must always be allowed ("MAY be sent"). An XML
declaration should be required whenever a protocol permits character
encodings other than UTF-8 and such an encoding is used.
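The role of the declaration as an encoding marker can be illustrated with a short Python sketch. It uses the xml.etree.ElementTree module from the Python standard library; the element name "name" and the sample string are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = "caf\u00e9"  # hypothetical sample data with one non-ASCII character

# With a declaration, the parser learns that the bytes are ISO-8859-1.
declared = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>' + SAMPLE + "</name>"
).encode("iso-8859-1")
decoded_text = ET.fromstring(declared).text

# Without a declaration the parser assumes UTF-8, and the same bytes
# are rejected because they are not a valid UTF-8 sequence.
undeclared = ("<name>" + SAMPLE + "</name>").encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
```

In this sketch the declared document decodes to the original string, while the undeclared one fails to parse. This is the failure mode that a required declaration is meant to prevent.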
4.2 XML Processing Instructions
An XML processing instruction (defined in section 2.6 of [8]) is a
component of an XML document that signals extra "out of band"
information to the receiver; a common use of XML processing
instructions is for document applications. For example, the XML2RFC
application used to generate this document and described in RFC 2629
[22] supports a "table of contents" processing instruction:

   <?rfc toc="yes"?>

Processing instructions can be ignored by processors because they are
not part of a document's character data. As a consequence, it is
recommended that processing instructions be ignored when encountered
in normal protocol processing. It is thus also recommended that
processing instructions not be used to define normative protocol data
structures or extensions.
4.3 XML Comments
An XML comment (defined in section 2.5 of [8]) is a component of an
XML document that provides descriptive information that is not part
of the document's character data. XML comments, like comments used
in programming languages, are often used to provide explanatory
information in human-understandable terms. An example:

   <!-- This is an example of a comment. -->

XML comments are ignored by conformant processors. As a consequence,
it is strongly recommended that comments not be used to define
normative protocol data structures or extensions. It is thus also
strongly recommended that comments be ignored if encountered in
normal protocol processing.
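As a minimal sketch of the recommendation that processing instructions and comments be ignored, the following Python fragment parses a message containing both. It assumes the default tree builder of xml.etree.ElementTree, which discards them; the "hint" target and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<msg>"
    '<?hint cache="no"?>'        # processing instruction (hypothetical target)
    "<!-- explanatory note -->"  # comment
    "<body>hello</body>"
    "</msg>"
)

root = ET.fromstring(DOC)
# The default tree builder drops comments and processing instructions,
# so only the <body> element is visible to protocol code.
child_tags = [child.tag for child in root]
body_text = root.findtext("body")
```

Because neither construct reaches the application here, a protocol that carried normative data in them would not interoperate with implementations built this way.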
4.4 XML Syntax, Information Set, and Alternative Syntaxes
XML [8] is defined in terms of a concrete syntax: a sequence of
characters, using the characters "<", "=", "&", etc. as delimiters.
However, there is also an abstract specification of the
"Information Set" [40] of a well-formed XML document that
conforms to the XML namespace [9] recommendation. One might think of
an XML parser as consuming the concrete syntax and producing an XML
Information Set for further processing.
In some cases, higher-level protocols have been defined using the XML
Information Set of the XML items transferred, rather than in terms of
the current XML concrete syntax. Other concrete syntax
representations can be defined for the XML Information Set, but this
is not common. Since the context of XML embedded within other
Internet protocols requires an unambiguous definition of the concrete
syntax, defining a protocol element in terms of its XML Information
Set, or allowing other concrete syntax representations, is out of
scope for this document.
4.5 Well-Formedness
An instance is XML if and only if it is well-formed, i.e., all
character and markup data conforms to a specific set of structural
rules defined in section 2.1 of [8].
Character and markup data that is not well-formed is not XML; well-
formedness is the basis for syntactic compatibility with XML.
Without well-formedness, all of the advantages of using XML
disappear. For this reason, it is recommended that protocol
specifications explicitly require XML well-formedness ("MUST be well-
formed").
The IETF has a long-standing tradition of "be liberal in what you
accept" that might seem to be at odds with this recommendation.
Given that XML requires well-formedness, conformant XML parsers are
intolerant of well-formedness errors. When specifying the handling of
erroneous XML protocol elements, a protocol design must never
recommend attempting to partially interpret non-well-formed instances
of an element which is required to be XML. Reasonable behaviors in
such a scenario could include attempting retransmission or aborting
an in-progress session.
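The all-or-nothing handling described above might be sketched as follows in Python. The function name parse_element and the sample elements are assumptions; the only library behavior relied upon is that xml.etree.ElementTree raises ParseError for input that is not well-formed.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_element(data: bytes) -> Optional[ET.Element]:
    """Return the parsed element, or None if data is not well-formed XML."""
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        # Do not try to salvage part of the input; the caller decides
        # whether to request retransmission or abort the session.
        return None


good = parse_element(b"<greeting><to>peer</to></greeting>")
bad = parse_element(b"<greeting><to>peer</greeting>")  # <to> is never closed
```

The second call yields None rather than a partial tree, which leaves the choice of recovery action to the protocol layer.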
4.6 Validity and Extensibility
One important value of XML is that there are formal mechanisms for
defining structural and data content constraints; these constrain the
identity of elements or attributes or the values contained within
them. There is more than one such formalism:
o A "Document Type Definition" (DTD) is defined in section 2.8 of
[8]; the concept came from a similar mechanism for SGML. There is
significant experience with using DTDs, including in IETF
protocols.
o XML Schema (defined in [11] and [12]) provides additional features
to allow a tighter and more precise specification of allowable
protocol syntax and data type specifications.
o There are also a number of other mechanisms for describing XML
instance validity; these include, for example, Schematron [48],
RELAX NG [47], and the Document Schema Definition Language [34].
There is ongoing discussion (and controversy) within the XML
community on the use and applicability of various validity constraint
mechanisms. The choice of tool depends on the needs for
extensibility or for a formal language and mechanism for constraining
permissible values and validating adherence to the constraints.
There are cases where protocols have defined validity using one or
another validity mechanism, but the protocol definitions have not
insisted that all corresponding protocol elements be "valid". The
decision depends in part on the design for protocol extensibility.
Each formalism has different ways of allowing for future extensions;
in addition, a protocol design may have its own versioning mechanism,
its own way of updating the schema, or a way of pointing to a new one.
The use of XML
namespaces (Section 4.7) with XML Schema allows other kinds of
extensibility without compromising schema validity.
No matter what formalism is chosen, there are usually additional
syntactic constraints, and inevitably additional semantic
constraints, on the validity of XML elements that cannot be expressed
in the formalism.
This document makes the following recommendations for the definition
of protocols using XML:
o Protocols should use an appropriate formalism for defining
validity of XML protocol elements. XML Schema should be used as
the formalism in the absence of clearly stated reasons to choose
another.
o Protocols may or may not insist that all corresponding protocol
elements be valid, according to the validity mechanism chosen; in
either case, the extensibility design should be clear, and the
specification should state what happens if the data is not valid.
o As described in Section 3 there is no standard mechanism in XML
for indicating whether or not new extensions are mandatory to
recognize. XML-based protocol specifications should thus
explicitly describe extension mechanisms and requirements to
recognize or ignore extensions.
An idealized model for XML processing might first check for
well-formedness; if that check succeeds, apply the primary formalism
and, if the instance "passes", apply the other constraints so that the
entire set (or as much as is machine processable) can be checked at
the same time. However, it is reasonable to allow conforming
implementations to skip validation at run-time and rely instead on
ad-hoc code. This avoids the higher expense of, for example, schema
validation, and additional hand-crafted semantic validation will
likely be needed in any case.
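The layered model can be sketched in Python with ad-hoc code standing in for a schema processor, since the Python standard library does not include one. The "address" element, its "addrType" attribute, and the result strings are assumptions chosen for illustration.

```python
import ipaddress
import xml.etree.ElementTree as ET


def check_address(data: bytes) -> str:
    """Classify a hypothetical <address addrType="...">...</address> element."""
    # Step 1: well-formedness.
    try:
        elem = ET.fromstring(data)
    except ET.ParseError:
        return "not well-formed"
    # Step 2: structural constraints, hand-coded here in place of a schema.
    if elem.tag != "address" or elem.get("addrType") not in ("ipv4", "ipv6"):
        return "invalid structure"
    # Step 3: semantic constraints that a schema could not easily express.
    try:
        parsed = ipaddress.ip_address((elem.text or "").strip())
    except ValueError:
        return "invalid value"
    expected = 4 if elem.get("addrType") == "ipv4" else 6
    return "ok" if parsed.version == expected else "invalid value"
```

Each step runs only if the previous one succeeded, so a caller can distinguish a well-formedness failure from a validity failure and react as the protocol specification directs.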
4.7 Namespaces
XML namespaces, defined in [9], provide a means of assigning markup
to a specific vocabulary. If two elements or attributes from
different vocabularies have the same name, they can be distinguished
unambiguously if they belong to different namespaces. Additionally,
namespaces provide significant support for protocol extensibility as
they can be defined, reused, and processed dynamically.
Markup vocabulary collisions are very possible when namespaces are
not used to separate and uniquely identify vocabularies. Protocol
definitions should use existing XML namespaces where appropriate.
When a new namespace is needed, the "namespace name" is a URI that is
used to identify the namespace; it's also useful for that URI to
point to a description of the namespace. Typical practice (and the
practice recommended by the W3C) is to assign namespace names using
persistent http URIs.
In the case of namespaces in IETF standards-track documents, it would
be useful if there were some permanent part of the IETF's own web
space that could be used for this purpose. In lieu of such, other
permanent URIs can be used, e.g., URNs in the IETF URN namespace (see
[14] and [15]).
4.7.1 Namespaces and Attributes
There is a frequently misunderstood aspect of the relationship
between unprefixed attributes and the default XML namespace - the
natural assumption is that an unprefixed attribute is qualified by
the default namespace, but this is not true. Rather, the unprefixed
attribute belongs to a set of attributes that are defined
specifically for the element to which it is applied. Thus, in the
following:
The meaning of attribute "a" is defined separately for each
element. By comparison, the prefixed attribute "n:b" is defined
independently of the element to which it is applied.
Note that an attribute used without a namespace prefix does not adopt
the default namespace; rather, it is interpreted according to the
semantics of the containing element. For more details, see appendix
A.2 of the W3C XML namespace recommendation [9].
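A namespace-aware parser makes this behavior visible. In the Python sketch below, the namespace names "urn:example:default" and "urn:example:n" and the element name "item" are hypothetical; xml.etree.ElementTree reports qualified names in the form "{namespace}local".

```python
import xml.etree.ElementTree as ET

DOC = (
    '<item xmlns="urn:example:default" xmlns:n="urn:example:n"'
    ' a="1" n:b="2"/>'
)

elem = ET.fromstring(DOC)
element_name = elem.tag                # qualified by the default namespace
attribute_names = sorted(elem.attrib)  # "a" stays unqualified
```

The element picks up the default namespace, the prefixed attribute is reported with its own namespace, and the unprefixed attribute "a" is reported with no namespace at all.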
One way to deal with this is to use attributes that can be applied to
any element from any namespace with a namespace prefix, even when
that namespace is also the default namespace. Consider the following
example in which new elements can be added into existing elements.
The default behavior is for a recipient to ignore any unrecognized
element; however, a "mustUnderstand" attribute is defined such that
recipients must recognize and understand any new element that has one
with value "true". An instance might look like this:
foo
bar
In this example the element is a local extension to the
element that has to be understood (indicated by the
attribute tns:mustUnderstand="true") by the recipient of this
information to be used sensibly.
Note that this attribute is defined in the "tns" namespace, rather
than the namespace of the containing element. In terms of XML
namespaces [9], this means that it belongs to a global namespace and
has the same meaning wherever it appears. Further, that meaning is
defined according to the semantics associated with the "tns"
namespace. If the attribute were defined without a namespace prefix,
its meaning would be dependent on the containing element (in this
case, an element from the "local" namespace), which would make it
difficult to ensure that it would always be interpreted according to
the intended semantics.
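A recipient's handling of such an attribute might look like the following Python sketch. The namespace names, the element names ("record", "name", "note", "lock"), and the helper unsupported_mandatory are all assumptions for illustration and are not taken from any specification.

```python
import xml.etree.ElementTree as ET

TNS = "urn:example:tns"          # hypothetical protocol namespace
MUST_UNDERSTAND = "{%s}mustUnderstand" % TNS
KNOWN_TAGS = {"{%s}name" % TNS}  # elements this recipient implements


def unsupported_mandatory(root: ET.Element) -> list:
    """List unrecognized child elements that the sender marked as mandatory."""
    return [
        child.tag
        for child in root
        if child.tag not in KNOWN_TAGS and child.get(MUST_UNDERSTAND) == "true"
    ]


DOC = (
    '<tns:record xmlns:tns="urn:example:tns" xmlns:local="urn:example:local">'
    "<tns:name>foo</tns:name>"
    "<local:note>ignored if unknown</local:note>"
    '<local:lock tns:mustUnderstand="true">bar</local:lock>'
    "</tns:record>"
)
problems = unsupported_mandatory(ET.fromstring(DOC))
```

Because the attribute is looked up by its qualified name, the same check applies to an extension element from any namespace, which is the point of defining the attribute globally.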
4.8 Element and Attribute Design Considerations
XML provides much flexibility in allowing a designer to use either
elements, attributes, or element content to carry data. This section
gives a flavor of the design considerations; there is much written
about this in the XML literature. Consistent use of elements,
attributes, and values is an important characteristic of a sound
design.
Attributes are generally intended to contain meta-data that describes
the value of the element, and as such they are subject to the
following restrictions:
o Attributes are unordered,
o There can be no more than one instance of a given attribute within
a given element (although an attribute may contain several values,
separated by [:space:]),
o Attribute values can have no internal XML markup for providing
internal structure, and
o Attribute values are normalized ([8], section 3.3) before
processing.
Consider the following example that describes an IP address using an
attribute to describe the address value:
10.1.2.3
One might encode the same information using an element
instead of an "addrType" attribute:
ipv4
10.1.2.3
Another way of encoding the same information would be to use markup
for the "addrType":
10.1.2.3
Choosing between these designs involves tradeoffs concerning, among
other considerations, the likely extensibility patterns and the
ability of the formalism to constrain the values appropriately. In
the first example, the attribute can be thought of as meta-data to
the element which it modifies, and provides for a kind of "element
extensibility". The third example allows for a different kind of
extensibility: the "ipv4" space can be extended using other
namespaces, and the element can include additional markup.
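The three designs can be compared from a consumer's point of view with a short Python sketch. The element names "address" and "value" are assumptions introduced here; only "addrType" and "ipv4" come from the discussion above.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = '<address addrType="ipv4">10.1.2.3</address>'
AS_ELEMENT = "<address><addrType>ipv4</addrType><value>10.1.2.3</value></address>"
AS_MARKUP = "<address><ipv4>10.1.2.3</ipv4></address>"


def read_address(doc: str) -> tuple:
    """Return (type, value) from any of the three hypothetical encodings."""
    root = ET.fromstring(doc)
    if root.get("addrType") is not None:   # type carried in an attribute
        return root.get("addrType"), root.text
    if root.find("addrType") is not None:  # type carried in element content
        return root.findtext("addrType"), root.findtext("value")
    child = root[0]                        # type carried in the element name
    return child.tag, child.text


results = [read_address(d) for d in (AS_ATTRIBUTE, AS_ELEMENT, AS_MARKUP)]
```

All three encodings carry the same information, so the choice rests on the extensibility and validation tradeoffs rather than on expressive power.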
Many protocols include parameters that are selected from an
enumerated set of values. Such enumerated values can be encoded as
elements, attributes, or strings within element values. Any protocol
design should consider how the set of enumerated values is to be
extended: by revising the protocol, by including different values in
different XML namespaces, or by establishing an IANA registry (as per
RFC 2434 [21]). In addition, a common practice in XML is to use a
URI as an XML attribute value or content.
Languages that describe syntactic validity (including XML Schema and
DTDs) often provide a mechanism for specifying "default" values for
an attribute. If an element does not specify a value for the
attribute, then the "default" value is used. The use of default
values for attributes is discouraged by this document. Although the
use of this feature can reduce both the size and clutter of XML
documents, it has a negative impact on software which doesn't know
the document's validity constraints (e.g., for packet tracing or
digital signature).
4.9 Binary Data and Text with Control Characters
XML is defined as a character stream rather than a stream of octets.
There is no way to embed raw binary data directly within an XML data
stream; all binary data must be encoded as characters. There are a
number of possible encodings; for example, XML Schema [12] defines
encodings using decimal digits for integers, Base64 [18], or
hexadecimal digits. In addition, binary data might be transmitted
using some other communication channel, and referenced within the XML
data itself using a URI.
Protocols that need a container that can hold both structural data
and large quantities of binary data should consider carefully whether
XML is appropriate, since the Base64 and hex encodings are
inefficient (Base64 expands data by about one third, and hexadecimal
doubles it). Otherwise, protocols should use the mechanisms of XML
Schema to represent binary data; the Base64 encoding is best for
larger quantities of data.
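The size cost of the two character encodings can be seen in a short
Python sketch. The element name "data" and its "encoding" attribute
are hypothetical and chosen only for illustration; a real protocol
would more likely declare the element with the XML Schema
base64Binary or hexBinary datatype.

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def wrap_binary(payload, encoding):
    """Return a <data> element carrying payload as Base64 or hex characters."""
    if encoding == "base64":
        text = base64.b64encode(payload).decode("ascii")
    elif encoding == "hex":
        text = binascii.hexlify(payload).decode("ascii").upper()
    else:
        raise ValueError("unsupported encoding: " + encoding)
    elem = ET.Element("data", {"encoding": encoding})
    elem.text = text
    return elem


def unwrap_binary(elem):
    """Recover the original octets from an element built by wrap_binary."""
    if elem.get("encoding") == "base64":
        return base64.b64decode(elem.text)
    return binascii.unhexlify(elem.text)


payload = bytes(range(256)) + bytes(44)   # 300 arbitrary octets
b64 = wrap_binary(payload, "base64")
hexed = wrap_binary(payload, "hex")

# Octets in, characters out for each encoding.
print(len(payload), len(b64.text), len(hexed.text))  # 300 400 600
```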
XML does not allow "control" characters (0x00-0x1F) except for TAB
(0x09), LF (0x0A), and CR (0x0D). They may not be specified even
using character entity references. There is currently no common way
of encoding them within what is otherwise ordinary text. This means
that strings that might be considered "text" within an ABNF-defined
protocol element may need to be treated as binary data within an XML
representation, or some other encoding mechanism might need to be
invented.
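The restriction is easy to observe with any conforming parser. The
following Python sketch uses the standard library parser; the "field"
element and the Base64 workaround are assumptions made for the
example, not a recommended encoding.

```python
import base64
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted as well-formed XML 1.0."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def encode_field(value):
    """Hypothetical workaround: carry control-laden text as Base64."""
    elem = ET.Element("field", {"encoding": "base64"})
    elem.text = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


print(parses("<field>a&#x9;b</field>"))   # TAB as a character reference
print(parses("<field>a\x01b</field>"))    # raw U+0001 in the content
print(parses("<field>a&#x1;b</field>"))   # U+0001 as a character reference
print(parses(encode_field("a\x01b")))     # same text wrapped as Base64
```

The second and third documents are rejected; the first and last are
accepted.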
4.10 Incremental Processing
In some situations, it is possible to incrementally process an XML
document as each tag is received; this is analogous to the process by
which browsers incrementally render HTML pages as they are received.
Note that incremental processing is difficult to implement if
interspersed across multiple interactions. In other words, if a
protocol requires incremental processing across both directions of a
bidirectional stream, then it may place significant burden on
protocol implementers.
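A minimal sketch of incremental processing in one direction, using
the pull parser from the Python standard library, is shown below. The
"msgs" and "msg" element names and the chunk boundaries are
hypothetical. A parser is free to buffer input, so the sketch does not
assume that an element is reported as soon as its end tag has been
fed; it simply drains whatever is available after each chunk and once
more at end of input.

```python
import xml.etree.ElementTree as ET


def process_incrementally(chunks):
    """Feed chunks as they arrive and handle each <msg> once it is complete."""
    parser = ET.XMLPullParser(events=("end",))
    handled = []

    def drain():
        for _event, elem in parser.read_events():
            if elem.tag == "msg":
                handled.append((elem.get("id"), elem.text))
                elem.clear()  # release content that has been processed

    for chunk in chunks:
        parser.feed(chunk)
        drain()   # some elements may already be complete
    parser.close()
    drain()       # pick up anything reported only at end of input
    return handled


chunks = ['<msgs><msg id="1">fir', 'st</msg><msg id="2">sec', 'ond</msg></msgs>']
print(process_incrementally(chunks))
```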
4.11 Entity Declarations
In addition to its role as a validity mechanism, an XML DTD provides
a facility for "Entity Declarations" ([8], section 4.2). An Entity
Declaration defines, in the DTD, a kind of macro capability where an
"entity reference" may be used to call up and include the content of
the entity declaration.
This feature adds complexity to XML processing, and seems more
appropriate for use of XML in document processing than in data
representation. As such, this document recommends avoiding entity
declarations in protocol specifications.
4.12 URI Processing in XML
The XML Base specification [41] defines an attribute "xml:base" in
the XML namespace that is intended to affect the "base" to be used
for relative URI processing described in RFC 2396 [20]. The
facilities of xml:base for controlling URI processing may be useful
to protocol designers, but if xml:base is allowed the interaction
with any other protocol facilities for establishing URI context must
be specified clearly.
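One possible interaction rule is sketched below in Python: a base URI
supplied by the protocol is used until an xml:base attribute overrides
it, and nested xml:base values are resolved against the base already
in effect. The "href" attribute, the element names, and the example
URIs are hypothetical; a protocol could just as legitimately choose a
different precedence, which is why it needs to be stated.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_links(elem, base, out):
    """Collect absolute forms of every 'href' attribute, honoring xml:base."""
    # An absent xml:base leaves the inherited base unchanged.
    base = urljoin(base, elem.get(XML_BASE, ""))
    href = elem.get("href")
    if href is not None:
        out.append(urljoin(base, href))
    for child in elem:
        resolve_links(child, base, out)
    return out


doc = ET.fromstring(
    '<list xml:base="http://example.org/a/">'
    '<item href="one"/>'
    '<group xml:base="b/"><item href="two"/></group>'
    '<item href="/three"/>'
    '</list>'
)
# The second argument stands in for a base established by the protocol.
links = resolve_links(doc, "http://example.net/ignored/", [])
print(links)
```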
Note also that, in many cases, the term "URI" and the syntactic use
of URIs within XML allow non-ASCII characters within URIs. For
example, the XML Schema "anyURI" datatype ([12] section 3.2.17)
allows for direct encoding of characters outside of the US-ASCII
range. Most current IETF protocols and specifications do not allow
this syntax. Protocol specifications should be clear about the range
of characters specified, e.g., by adding a restriction to the range
of characters allowed in the anyURI schema datatype, or by specifying
that characters outside the US-ASCII range should be escaped when
passed to older protocols or APIs.
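The escaping option can be sketched in Python as follows. The sketch
assumes the convention of encoding each disallowed character as UTF-8
and then writing every resulting octet as %HH; it leaves the reserved
and unreserved characters of RFC 2396 untouched, and it makes no
attempt to handle non-ASCII host names, which need separate treatment.
The function name and example URI are hypothetical.

```python
from urllib.parse import quote

# Reserved characters, the mark characters, and "%" are kept as-is so
# that the structure of the URI and any existing escapes are preserved.
_KEEP = ";/?:@&=+$,#%" + "-_.!~*'()"


def escape_non_ascii(uri):
    """Percent-escape characters outside the RFC 2396 URI repertoire."""
    return quote(uri, safe=_KEEP, encoding="utf-8")


original = "http://example.org/caf\u00e9?q=na\u00efve#r\u00e9sum\u00e9"
print(escape_non_ascii(original))
```

A URI that is already restricted to US-ASCII and properly escaped
passes through unchanged.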
4.13 Interaction with the IANA
When XML is used in an IETF protocol there are multiple factors that
might require IANA action, including:
o XML media types. A piece of XML in a protocol element is
sometimes intrinsically bound to the protocol context in which it
appears, and in particular might be directly derived from and/or
input to protocol state-machine implementations. In cases where
the XML content has no relevant meaning outside its original
protocol context, there is no reason to register a MIME type.
When it is possible that XML content can be interpreted outside of
its original context (such as when that XML content is being
stored in a file system or tunneled over another protocol), then a
MIME type should be registered to specify the specific format for
the data and to provide a hint as to how it might be processed.
If MIME labeling is needed, then the advice of RFC 3023 [5]
applies. In particular, if the XML represents a new language or
document type, a new MIME media type should be registered for the
reasons described in RFC 3023 sections 7 and A.1. In situations
where XML is used to encode generic structured data (e.g., a
document-oriented application that involves combining XML with a
stylesheet), "application/xml" might be appropriate ("MAY be
used"). The "text/xml" media type is not recommended ("SHOULD NOT
be used") because of issues involving display behavior and default
charsets.
o URI registration. There is an ongoing effort ([14], [15]) to
create a URN namespace explicitly for defining URIs for namespace
names and other URI-designated protocol elements for use within
IETF standards track documents; it might also establish IETF
policy for such use.
5. Internationalization Considerations
This section describes internationalization considerations for the
use of XML to represent data in IETF protocols. In addition to the
recommendations here, IETF policy on the use of character sets and
languages described in RFC 2277 [3] also applies.
5.1 Character Sets and Encodings
IETF protocols frequently speak of the "character set" or "charset"
of a string, which is used to denote both the character repertoire
and the encoding used to represent sequences of characters as
sequences of bytes.
XML performs all character processing in terms of the Universal
Character Set (UCS, [33] and [35]). XML requires all XML processors
to support both the UTF-8 [4] and UTF-16 [23] encodings of UCS,
although other encodings (charsets) compatible with UCS may be
allowed. External parsed entities encoded in UTF-16 are required to
begin with a Byte Order Mark ([8] section 4.3.3).
IETF policy [3] requires that the UTF-8 charset be allowed for all
text.
This document requires that IETF protocols using XML allow for the
UTF-8 encoding of XML data, and recommends, for simplicity, that only
UTF-8 be allowed. In those situations where other charsets are
allowed, the encoding must be specified using an "encoding" attribute
in the XML declaration (see Section 4.1), even if there might be
other protocol mechanisms for noting it. If the UTF-16 encoding is
allowed, XML requires that UTF-16 encoded external entities start
with a byte order mark (BOM).
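A sender that follows this advice might serialize as in the Python
sketch below, which emits an XML declaration naming UTF-8 and checks
that non-ASCII content survives a round trip. The "greeting" element
is hypothetical. The last lines only illustrate that a UTF-16
serialization is expected to begin with a byte order mark; they rely
on the behavior of Python's "utf-16" codec, which writes one.

```python
import codecs
import xml.etree.ElementTree as ET


def serialize_utf8(root):
    """Serialize with an explicit XML declaration naming UTF-8."""
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


root = ET.Element("greeting")
root.text = "gr\u00fc\u00df dich"          # non-ASCII text content
wire = serialize_utf8(root)

print(wire.split(b"\n", 1)[0])            # the XML declaration line
print(ET.fromstring(wire).text == root.text)

# Python's "utf-16" codec prepends a byte order mark automatically.
utf16 = '<?xml version="1.0" encoding="UTF-16"?><greeting/>'.encode("utf-16")
print(utf16.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)))
```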
5.2 Language Declaration
Text encapsulated in XML can be represented in many different human
languages, and it is often useful to explicitly identify the language
used to present the text. XML defines a special attribute in the
"xml" namespace, xml:lang, that can be used to specify the language
used to represent data in an XML document. The xml:lang attribute
(which has to be explicitly declared for use within a DTD or XML
Schema) and the values it can assume are defined in section 2.12 of
[8].
It is strongly recommended that protocols representing data in a
human language mandate use of an xml:lang attribute if the XML
instance might be interpreted in language-dependent contexts.
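A receiver can recover the language of each piece of text by applying
the inheritance rule for xml:lang, under which a value applies to an
element and its descendants until overridden. The Python sketch below
shows one way to do so; the "response" and "msg" elements are
hypothetical, and text that follows a child element (tail text) is
ignored for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_with_language(elem, inherited=None):
    """Yield (language, text) pairs, applying xml:lang inheritance."""
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield lang, elem.text.strip()
    for child in elem:
        yield from text_with_language(child, lang)


doc = ET.fromstring(
    '<response xml:lang="en">'
    '<msg>Command completed</msg>'
    '<msg xml:lang="fr">Commande termin\u00e9e</msg>'
    '</response>'
)
print(list(text_with_language(doc)))
```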
5.3 Other Internationalization Considerations
There are standard mechanisms in the typography of some human
languages that can be difficult to represent using merely XML
character string data types. For example, pronunciation clues can be
provided using Ruby annotation [38], and embedding controls (such as
those described in section 3.4 of [45]) or an XHTML [39] "dir"
attribute can be used to note the proper display direction for
bidirectional text.
There are a number of tricky issues that can arise when using
extended character sets with XML document formats. For example:
o There are different ways of representing characters consisting of
combining characters, and
o There has been some debate about whether URIs should be
represented using a restricted US-ASCII subset or arbitrary
Unicode (e.g. "URI character sequence" vs "original character
sequence" in RFC 2396 [20]).
Some of these issues are discussed, with recommendations, in the
W3C's "Character Model for the World Wide Web" document [42].
It is strongly recommended that protocols representing data in a
human language reuse existing mechanisms as needed to ensure proper
display of human-legible text.
6. IANA Considerations
This memo, per se, has no impact on the IANA.
7. Security Considerations
Being text-based, protocols built with XML face significant threats,
including unintended disclosure, modification, and replay. Simple
passive attacks, such as packet sniffing, allow an attacker to
capture and view information intended for someone else. Captured
data can be modified and replayed to the original intended recipient,
who then has no way to know that the information has been
compromised, to detect modifications, to be assured of the sender's
identity, or to confirm which protocol instance is legitimate.
Several security service options are available to mitigate these
risks. Though XML does not include any built-in security services,
other protocols and protocol layers provide services that can be used
to protect XML protocols. XML encryption [10] provides privacy
services to prevent unintended disclosure. Canonical XML [6] and XML
digital signatures [7] provide integrity services to detect
modification and authentication services to confirm the identity of
the data source. Other IETF security protocols (e.g., the Transport
Layer Security (TLS) protocol [2]) are also available to protect data
and service endpoints as appropriate. Given the lack of security
services in XML, it is imperative that protocol specifications
mandate additional security services to counter common threats and
attacks; the specific required services will depend on the protocol's
threat model.
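The role of canonicalization in integrity checking can be illustrated
with a short Python sketch. Two assumptions should be noted: the
standard library function used here implements Canonical XML 2.0
rather than the version 1.0 cited above, and a bare SHA-256 digest
stands in for a real XML digital signature, which involves
considerably more than hashing. The "order" document is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest(text):
    """Hex SHA-256 digest of the UTF-8 encoding of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two serializations of the same information: attribute order, quoting
# style, and empty-element syntax differ.
a = '<order id="7" currency="USD"><item/></order>'
b = "<order currency='USD' id='7'><item></item></order>"

print(digest(a) == digest(b))                                    # False
print(digest(ET.canonicalize(a)) == digest(ET.canonicalize(b)))  # True
```

Without canonicalization, a harmless re-serialization by an
intermediary would be indistinguishable from tampering.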
8. Acknowledgements
The authors would like to thank the following people who have
provided significant contributions to the development of this
document:
Mark Baker, Tim Berners-Lee, Tim Bray, Josh Cohen, Alan Crouch,
Martin Duerst, Jun Fujisawa, Yaron Goland, Graham Klyne, Dan Kohn,
Chris Lilley, Murata Makoto, Michael Mealling, Jean-Jacques Moreau,
Andrew Newton, Julian Reschke, Jonathan Rosenberg, Simon St Laurent,
and Daniel Veillard.
Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[2] Dierks, T., Allen, C., Treese, W., Karlton, P., Freier, A. and
P. Kocher, "The TLS Protocol Version 1.0", RFC 2246, January
1999.
[3] Alvestrand, H., "IETF Policy on Character Sets and Languages",
BCP 18, RFC 2277, January 1998.
[4] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC
2279, January 1998.
[5] Murata, M., St.Laurent, S. and D. Kohn, "XML Media Types", RFC
3023, January 2001.
[6] Boyer, J., "Canonical XML Version 1.0", RFC 3076, March 2001.
[7] Eastlake, D., Reagle, J. and D. Solo, "(Extensible Markup
Language) XML-Signature Syntax and Processing", RFC 3275, March
2002.
[8] Bray, T., Paoli, J., Sperberg-McQueen, C. and E. Maler,
"Extensible Markup Language (XML) 1.0 (2nd ed)", W3C REC-xml,
October 2000.
[9] Bray, T., Hollander, D. and A. Layman, "Namespaces in XML", W3C
REC-xml-names, January 1999.
[10] Imamura, T., Dillaway, B., Schaad, J. and E. Simon, "XML
Encryption Syntax and Processing", W3C REC-xmlenc-core, October
2001.
[11] Thompson, H., Beech, D., Maloney, M. and N. Mendelsohn, "XML
Schema Part 1: Structures", W3C REC-xmlschema-1, May 2001.
[12] Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes", W3C
REC-xmlschema-2, May 2001.
Informative References
[13] Mealling, M., Popp, N. and M. Moseley, "Common Name Resolution
Protocol (CNRP)", draft-ietf-cnrp-12 (work in progress),
February 2002.
[14] Masinter, L., Mealling, M., Klyne, G. and T. Hardie, "An IETF
URN Sub-namespace for Registered Protocol Parameters", draft-
mealling-iana-urn-03 (work in progress), May 2002.
[15] Mealling, M., "The IETF XML Registry", draft-mealling-iana-
xmlns-registry-03 (work in progress), November 2001.
[16] Case, J., Fedor, M., Schoffstall, M. and J. Davin, "Simple
Network Management Protocol (SNMP)", STD 15, RFC 1157, May
1990.
[17] Srinivasan, R., "XDR: External Data Representation Standard",
RFC 1832, August 1995.
[18] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies",
RFC 2045, November 1996.
[19] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
[20] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 2396, August
1998.
[21] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
Considerations Section in RFCs", BCP 26, RFC 2434, October
1998.
[22] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, June
1999.
[23] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
RFC 2781, February 2000.
[24] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April
2001.
[25] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame,
C., Eisler, M. and D. Noveck, "NFS version 4 Protocol", RFC
3010, December 2000.
[26] Kennedy, H., "Binary Lexical Octet Ad-hoc Transport", RFC 3252,
1 April 2002.
[27] Backus, J., "The syntax and semantics of the proposed
international algebraic language of the Zurich ACM-GAMM
conference", June 1959.
[28] American National Standards Institute, "Code Extension
Techniques for Use with the 7-bit Coded Character Set of
American National Standard Code (ASCII) for Information
Interchange", ANSI X3.41, FIPS PUB 35, 1974.
[29] American National Standards Institute, "Information Retrieval:
Application Service Definition and Protocol Specification",
ANSI Z39.50, ISO Standard 23950, 1995.
[30] International Organization for Standardization, "Information
Processing Systems - Open Systems Interconnection -
Specification of Abstract Syntax Notation One (ASN.1)", ISO
Standard 8824, December 1990.
[31] International Organization for Standardization, "Information
Processing Systems - Open Systems Interconnection -
Specification of Basic Encoding Rules for Abstract Syntax
Notation One (ASN.1)", ISO Standard 8825, December 1990.
[32] International Organization for Standardization, "Information
processing - Text and office systems - Standard Generalized
Markup Language (SGML)", ISO Standard 8879, 1988.
[33] International Organization for Standardization, "Information
Technology - Universal Multiple-octet coded Character Set (UCS)
- Part 1: Architecture and Basic Multilingual Plane", ISO
Standard 10646-1, May 1993.
[34] International Organization for Standardization, "Document
Description and Processing Languages", December 2001.
[35] Unicode Consortium, "Unicode 3.2", UAX 28, March 2002.
[36] Bray, T., Paoli, J. and C. Sperberg-McQueen, "Extensible Markup
Language (XML) 1.0", W3C REC-xml-1998, February 1998.
[37] Lassila, O. and R. Swick, "Resource Description Framework (RDF)
Model and Syntax Specification", W3C REC-rdf-syntax, February
1999.
[38] Suignard, M., Ishikawa, M., Duerst, M. and T. Texin, "Ruby
Annotation", W3C REC-RUBY, May 2001.
[39] Pemberton, S., "XHTML 1.0: The Extensible HyperText Markup
Language", W3C REC-XHTML, January 2000.
[40] Cowan, J. and R. Tobin, "XML Information Set", W3C REC-infoset,
October 2001.
[41] Marsh, J., "XML Base", W3C REC-xmlbase, June 2001.
[42] Duerst, M., Yergeau, F., Ishida, R., Wolf, M., Freytag, A. and
T. Texin, "Character Model for the World Wide Web 1.0", April
2002.
[43] Gudgin, M., Hadley, M., Moreau, JJ. and H. Nielsen, "SOAP
Version 1.2 Part 1: Messaging Framework", December 2001.
[44] Gudgin, M., Hadley, M., Moreau, JJ. and H. Nielsen, "SOAP
Version 1.2 Part 2: Adjuncts", December 2001.
[45] Duerst, M. and A. Freytag, "Unicode in XML and other Markup
Languages", February 2002.
[46] W3C Communications Team, "XML in 10 points", November 2001.
[47] OASIS Technical Committee: RELAX NG, "RELAX NG Specification",
December 2001.
[48] Jelliffe, R., "The Schematron", November 2001.
Authors' Addresses
Scott Hollenbeck
VeriSign, Inc.
21345 Ridgetop Circle
Dulles, VA 20166-6503
US
Phone: +1 703 948 3257
EMail: shollenbeck@verisign.com
Marshall T. Rose
Dover Beach Consulting, Inc.
POB 255268
Sacramento, CA 95865-5268
US
Phone: +1 916 483 8878
EMail: mrose@dbc.mtview.ca.us
Larry Masinter
Adobe Systems Incorporated
Mail Stop W14
345 Park Ave.
San Jose, CA 95110
US
Phone: +1 408 536 3024
EMail: LMM@acm.org
URI: http://larry.masinter.net
Appendix A. Change History
The following changes were made to produce version -04 from -03:
o More minor editorial changes in several places.
o Reworded Section 1.3.
o Modified the last point in Section 2.
o Added Section 4.4 to address XML Information Set.
o Updated the first and third paragraphs of Section 4.5.
o More rework in Section 4.7.1.
o Added new Section 4.12 to address URI processing and
xml:base.
o Updated Section 4.13.
o Added some more text to Section 5.1 to address byte order marks
and UTF-16.
The following changes were made to produce version -03 from -02:
o Minor editorial fixes throughout.
o Minor updates in Section 2.
o Updated Section 4.2.
o Added Section 4.3 to address XML comments.
o Updated Section 4.5.
o Moved the last paragraph of Section 4.7.1 to Section 4.6.
o Added an additional example to Section 4.7.1.
o Modified Section 4.6, Section 4.8, Section 4.9, and Section 5.1 to
address received comments.
o Moved discussion of IANA interactions to Section 4 and noted that
this document has no direct impact on IANA.
o Updated the Schematron reference.
o Fixed the "XML-in-10-points" reference.
The following changes were made to produce version -02 from -01:
o Changed the title slightly ("in IETF" to "within IETF") to help
clarify the scope.
o Changed the abstract slightly (added "being developed" to the
first sentence).
o Changed the "conventions" paragraph slightly.
o Added text to the introduction/scope to clarify that the document
is not intended as an endorsement to use XML.
o Removed TBD from Section 1.
o Added an additional list element on binary data encoding in
Section 2, added another sentence to the "text based" list
element, and modified the "processing speed" list element.
o Rewrote the first paragraphs of Section 3, adding a reference to
RFC 3252.
o Rewrote Section 4.1.
o Reworded and added text to Section 4.5.
o Changed "in lieu of" to "in the absence of" in old paragraph 7 of
Section 4.6.
o Restructured Section 4.6 to acknowledge that there is still some
controversy surrounding XML Schema.
o Added paragraph on default attributes to Section 4.8, added a new
paragraph to address value enumeration, reworked the example, and
changed the last paragraph slightly.
o Rewrote Section 4.9.
o Added Section 4.10 to address incremental processing.
o Rewrote portions of Section 5; adding references to Unicode 3.2
and ISO 10646.
The following changes were made to produce version -01 from -00:
o Changed "eXtensible" to "Extensible" throughout.
o Fixed the discussion mailing list name in the front matter.
o Changed use of "data encapsulation" to "structured data
representation" (or similar) throughout.
o Added namespace reference and text to discussion of extensibility
in Section 3.
o Rewrote Section 4.6 and added needed references.
o Added text to address extension recognition and attributes in
Section 4.7.
o Added another attribute restriction in Section 4.8.
o Added reference to the "An IETF URN Sub-namespace for Registered
Protocol Parameters" I-D in Section 6.
o Added reference to RFC 2396 and W3C character model in Section 5.
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.