Network Working Group                                           A. Mouat
Internet-Draft                                                   diffxml
Expires: April 19, 2006                                 October 16, 2005


                    A delta format for XML documents
                        draft-mouat-xml-patch-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 19, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document specifies an implementation-independent format for
   expressing a set of changes between two XML documents.  This set of
   changes is commonly referred to as a "delta" in computing
   terminology.  The delta can be used to automatically transform (or
   "patch") one XML document into another.

Mouat                    Expires April 19, 2006                 [Page 1]
Internet-Draft                   diffxml                    October 2005

Table of Contents

   1.  Requirements notation
   2.  Introduction
   3.  Hierarchical vs Line Based Differencing
   4.  Structure of DUL Document
     4.1.  Insert Operation
       4.1.1.  Attributes
       4.1.2.  Content
       4.1.3.  Example
     4.2.  Insert Attribute Operation
       4.2.1.  Attributes
       4.2.2.  Content
       4.2.3.  Example
     4.3.  Delete Operation
       4.3.1.  Attributes
       4.3.2.  Examples
     4.4.  Update Operation
       4.4.1.  Attributes
       4.4.2.  Content
       4.4.3.  Examples
     4.5.  Move Operation
       4.5.1.  Attributes
       4.5.2.  Example
     4.6.  Complete Example
     4.7.  Context Information
   5.  Formal Definitions
   6.  Security Considerations
   7.  IANA Consideration
     7.1.  MIME type registration
   8.  URN Sub-Namespace Registration
   9.  Normative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [5].

2.  Introduction

   The Delta Update Language (DUL) is an application-agnostic format for
   describing changes to XML documents.  It has potential uses in many
   applications.  One example is reducing network traffic: a sender can
   transmit a small delta instead of an entire XML document when only a
   small part of the document has changed.

3.  Hierarchical vs Line Based Differencing

   Standard UNIX tools exist for comparing (diff) and patching (patch)
   files.  They operate on a line-by-line basis, using well-studied
   methods for computing the longest common subsequence.  Using these
   tools on hierarchically structured data, such as XML, leads to
   suboptimal results, as they are incapable of recognizing the
   tree-based structure of the files.  For example, the XML fragments in
   Figure 1 and Figure 2 are identical in XML terms but substantially
   different in line-by-line terms:

      CDATA x

                                 Figure 1

      CDATA x

                                 Figure 2

   For this reason, the output format is an XML document that deals in
   terms of tree operations upon nodes of a given document.

4.  Structure of DUL Document

   DUL deltas are XML [9] documents that MUST be well-formed and SHOULD
   be valid.  DUL documents MUST be based on XML 1.0.

   This specification makes use of XML namespaces for identifying DUL
   documents and document fragments.  The namespace URI for elements
   defined by this specification is a URN [6], using the namespace
   identifier 'ietf' defined by [7] and extended by [8].  This URN is:

      urn:ietf:params:xml:ns:dul

   The prefix "dul" is used throughout to specify elements in the DUL
   namespace.
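   As a non-normative illustration of the namespace rule, the following
   Python sketch picks out the DUL operation elements in a delta by
   namespace URI rather than by prefix.  The root element name "delta",
   the function name dul_operations and the urn:example:other namespace
   are assumptions made for this sketch only; they are not defined by
   this document.  The five operation names are those listed later in
   this section.

```python
import xml.etree.ElementTree as ET

# Namespace URN given in this specification.
DUL_NS = "urn:ietf:params:xml:ns:dul"
# The five operation element names named in Section 4.
OPERATIONS = {"insert", "insertAttr", "delete", "move", "update"}


def dul_operations(delta_xml):
    """Return the local names of top-level DUL operations, in order.

    Matching is done on the namespace URI, so any prefix bound to the
    DUL namespace works, not only "dul".
    """
    root = ET.fromstring(delta_xml)
    ops = []
    for child in root:
        # ElementTree spells qualified names as "{namespace-uri}local".
        if child.tag.startswith("{" + DUL_NS + "}"):
            local = child.tag.split("}", 1)[1]
            if local in OPERATIONS:
                ops.append(local)
    return ops


# Hypothetical delta: the root element name "delta" is an assumption.
sample = (
    '<dul:delta xmlns:dul="urn:ietf:params:xml:ns:dul">'
    '<dul:delete node="/node()[1]/node()[2]"/>'
    '<other xmlns="urn:example:other"/>'
    '</dul:delta>'
)
print(dul_operations(sample))
```

   Elements outside the DUL namespace, such as the "other" element in
   the sample, are skipped by this sketch rather than treated as errors;
   how a real processor should treat them is not stated here.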

   This document assumes the reader has a working knowledge of XML [9],
   XPath [10] and DOM Level 2 [11].

   There are five basic operations represented by XML elements.  These
   are insert, insertAttr, delete, move and update, defined as follows:

4.1.  Insert Operation

      CDATA xml fragment

                                 Figure 3

   Represents the insertion of an XML fragment into the document.

4.1.1.  Attributes

4.1.1.1.  The parent Attribute

   The parent attribute identifies the node under which the new node is
   inserted.

   The variable "xpathexpr" is an XPath expression that MUST uniquely
   identify the node.  The XPath expression SHOULD be restricted to
   using node tests of the form "node()", which matches any XPath node,
   followed by an abbreviated position predicate of the form ["x"] where
   "x" is the position number of the node.

   The parent attribute MUST be present.

4.1.1.2.  The childno Attribute

   The childno attribute represents the position at which to insert the
   new node.

   The variable "cn" is the child position under the parent node at
   which the new node is inserted.  If there is already a node at this
   position, that node will be moved to position "cn+1".  The number
   represents the XPath "node()" position, which is not necessarily the
   same as the DOM node index.

   When inserting attributes the child number is unused and MAY be
   omitted, as attributes have no defined order.

4.1.1.3.  The charpos Attribute

   The charpos attribute is required for cases where an insert is made
   in the middle of, immediately after or immediately before character
   data.  It holds the character position at which to insert the node.

   The variable "char" is the numeric position at which to insert the
   node.  The first character of a text node is 1, in accordance with
   the XPath standard.  Setting the attribute to 1 is equivalent to
   inserting before the text.

   The charpos attribute MAY be omitted, and in such cases "char"
   defaults to 1.

4.1.2.  Content

   The content of an insert element is an XML fragment to be inserted
   verbatim.  All whitespace, comments and processing instructions which
   are children of the element MUST be inserted exactly as they appear.

   Any XML included as content MUST be well-formed.

4.1.3.  Example

   The following delta fragment represents the insertion of the element
   followed by the text "Coleridge" into a document.

      Coleridge

                                 Figure 4

4.2.  Insert Attribute Operation

      CDATA value

                                 Figure 5

   Represents the insertion of an attribute into the document.

4.2.1.  Attributes

4.2.1.1.  The parent Attribute

   The parent attribute specifies the parent of the attribute being
   inserted.

   The variable "xpathexpr" is an XPath expression that MUST uniquely
   identify the node.  The XPath expression SHOULD be restricted to
   using node tests of the form "node()", which matches any XPath node,
   followed by an abbreviated position predicate of the form ["x"] where
   "x" is the position number of the node.

   The parent attribute MUST be present.

4.2.1.2.  The name Attribute

   The name attribute represents the name of the new attribute.  It MUST
   be a valid name for an XML attribute.

   The name attribute MUST be present.

4.2.2.  Content

   The content of the insertAttr element is the value to be given to the
   attribute.  It MUST be a valid value for an XML attribute.

4.2.3.  Example

   The following delta fragment represents the insertion of the
   attribute title with the value "poetry" into the document.

      poetry

                                 Figure 6

4.3.  Delete Operation

                                 Figure 7

   Represents the deletion of a subtree, text or attribute from the
   document.

4.3.1.  Attributes

4.3.1.1.  The node Attribute

   The node attribute identifies the node to perform the delete
   operation on.

   The variable "xpathexpr" is an XPath expression which uniquely
   identifies the node to be deleted.  Attributes may be deleted by an
   appropriate XPath expression which specifies their name.

   The variable "xpathexpr" is subject to the same restrictions as for
   an insert, with the exception that when an attribute is being deleted
   it is specified as the last predicate of "xpathexpr".

   The node attribute MUST NOT be omitted.

4.3.1.2.  The charpos Attribute

   The charpos attribute is only used when character data is being
   deleted, and is used in conjunction with the length attribute.

   The variable "char" is the index of the first character to delete,
   counting in the same way as for the insert operation.  It is unused
   in cases where the node is not a text node.

   The charpos attribute MAY be omitted, and in such cases "char"
   defaults to 1.

4.3.1.3.  The length Attribute

   The length attribute is only used when character data is being
   deleted and identifies how many characters to delete.

   The variable "len" is the number of characters to delete, from and
   including the character specified by the charpos attribute.

   The length attribute MAY be omitted, and in such cases "len" defaults
   to 0.  Therefore, if length is unspecified when deleting a text node,
   no deletion will occur.

   If the length specified is greater than the length to the end of the
   node, the length is treated as being equal to the length to the end
   of the node.  Note that entity references may be changed by this
   operation.  Specifying a greater length does not allow deletion of
   other nodes.

4.3.2.  Examples

   The following delta fragment represents the deletion of the title
   attribute of an element.

                                 Figure 8

   The following delta fragment represents the deletion of a text node,
   removing the first 7 characters from the node identified.

                                 Figure 9

4.4.  Update Operation

      value

                                 Figure 10

   Represents the updating of a value associated with the given node.

4.4.1.  Attributes

4.4.1.1.  The node Attribute

   The node attribute identifies the node to be updated.

   The variable "xpathexpr" is an XPath expression which uniquely
   identifies the node to be updated.  Attributes may be updated by an
   appropriate XPath expression which specifies their name.

   The variable "xpathexpr" is subject to the same restrictions as for
   the insert operation, with the exception that when an attribute is
   being updated it is specified as the last predicate of "xpathexpr".
   Also, the XPath expression may not point to an element, as elements
   have no associated value that can be updated.  The names of
   attributes and elements cannot be changed with this operation.

   The node attribute MUST NOT be omitted.

4.4.1.2.  The charpos Attribute

   The charpos attribute is used when character data is being updated,
   and is used in conjunction with the length attribute.

   The variable "char" is the index of the first character to replace,
   counting in the same way as for the insert operation.  It is unused
   in cases where the node is not a text node.

   The charpos attribute MAY be omitted, and in such cases defaults to
   1.

4.4.1.3.  The length Attribute

   The length attribute is used when character data is being updated and
   identifies how many characters to replace.

   The variable "len" represents the number of characters to replace,
   from and including the character specified by the charpos attribute.

   The length attribute MAY be omitted, and in such cases defaults to 0.

   The number of characters specified by the length attribute is always
   replaced.  If the new text is fewer than "len" characters long, the
   old text is truncated.  Similarly, if the new text is more than "len"
   characters long, the excess text is inserted without overwriting.
   Hence, if the length attribute is unspecified or 0 when updating a
   text node, the new text is inserted at the appropriate position
   without overwriting the old text.

4.4.2.  Content

   The variable "value" represents the new value for the node.  The
   meaning of the variable is dependent on the type of node being
   updated.  The content MUST NOT contain XML elements.
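
   As a non-normative illustration, the charpos and length rules for
   text nodes in the delete and update operations might be sketched in
   Python as follows.  The function names delete_chars and update_chars
   are hypothetical, and a Python string stands in for the character
   data of a single text node; counting starts at 1 as described above.

```python
def delete_chars(text, charpos=1, length=0):
    """Delete `length` characters starting at 1-based `charpos`.

    Both defaults follow the text: charpos defaults to 1 and length
    defaults to 0, so omitting length deletes nothing.
    """
    start = charpos - 1  # convert the 1-based position to a 0-based index
    # Python slicing clamps at the end of the string, so an over-long
    # length deletes only up to the end of this node's text.
    return text[:start] + text[start + length:]


def update_chars(text, value, charpos=1, length=0):
    """Replace `length` characters at 1-based `charpos` with `value`.

    A short value truncates the old text; a long value has its excess
    inserted without overwriting.  With length 0 this is a pure insert.
    """
    start = charpos - 1
    return text[:start] + value + text[start + length:]


print(delete_chars("Coleridge wrote", charpos=1, length=7))
print(update_chars("sometext", "more", charpos=5))
print(update_chars("abcdef", "XY", charpos=2, length=4))
```

   Entity references and the mapping from XPath character positions to
   a concrete DOM are outside the scope of this sketch.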

   In cases where character data is being updated, the new text
   overwrites characters beginning at position "char" and ending at
   "char + len".  Excess characters in "value" are appended without
   overwriting.

4.4.3.  Examples

   The following delta fragment represents an update of a non-attribute
   node:

      this is a comment

                                 Figure 11

   The following delta fragment updates the value of an attribute called
   "title" to "Arch Bishop":

      Arch Bishop

                                 Figure 12

4.5.  Move Operation

                                 Figure 13

   Represents the move of a subtree or leaf node within a document.

4.5.1.  Attributes

4.5.1.1.  The node Attribute

   The node attribute identifies the node or subtree to be moved.

   The variable "xpathexpr" is an XPath expression which uniquely
   identifies the node or subtree to be moved.  Attributes may not be
   moved.  The variable "xpathexpr" is subject to the same restrictions
   as for the insert operation.

   The node attribute MUST NOT be omitted.

4.5.1.2.  The oldCharpos Attribute

   The oldCharpos attribute is used in cases where a move is made from
   the middle of, immediately after or immediately before character
   data.  It holds the character position of the node to be moved.

   The variable "ochar" is the numeric position of the node or the first
   text character to move.  The first character of a text node is 1, in
   accordance with the XPath standard.

   The oldCharpos attribute MAY be omitted, and in such cases defaults
   to 1.

4.5.1.3.  The length Attribute

   The length attribute identifies the number of characters to move.  It
   is unnecessary when not moving a text node.

   The length attribute MAY be omitted, and in such cases defaults to 0.
   When moving a text node, no move will take place if the variable
   "len" is 0.

4.5.1.4.  The parent Attribute

   The parent attribute identifies the new parent for the node or
   subtree.

   The "parxpathexpr" variable uniquely identifies the element that the
   node identified by "xpathexpr" is to become a child of.  The XPath
   expression is restricted as for the insert operation.

   The parent attribute MUST NOT be omitted.

4.5.1.5.  The childno Attribute

   The childno attribute identifies the node position at which to insert
   the moved node or subtree.

   The variable "cn" is the child position, under the node identified by
   "parxpathexpr", at which the moved node or subtree is inserted.  Any
   existing node at this position becomes the "cn+1" node.  The variable
   "cn" is the XPath "node()" position that the node will have (as
   opposed to the DOM node index).

   The childno attribute MUST NOT be omitted.

4.5.1.6.  The newCharpos Attribute

   The newCharpos attribute is used when a move is made to a position in
   the middle of, immediately after or immediately before character
   data.

   The variable "nchar" is the numeric character position at which to
   insert the node, counting in the same way as for the insert
   operation.  The first character of a text node is 1, in accordance
   with the XPath standard.  Setting the attribute to 1 represents an
   insertion before the text.

   The newCharpos attribute MAY be omitted, and in such cases defaults
   to 1.

4.5.2.  Example

   The following delta fragment represents the move of the subtree at
   the 2nd child of the 3rd child of the root element, to the 2nd child
   of the 2nd child of the root element.

                                 Figure 14

4.6.  Complete Example

   The DUL document in Figure 17 represents the changes required to
   transform the document in Figure 15 into the document in Figure 16.

      CDATA sometext

                                 Figure 15

      CDATA textmoretext

                                 Figure 16

      moretext

                                 Figure 17

4.7.  Context Information

   As with the UNIX diff and patch utilities, it would be useful to
   support patching of arbitrary files via context matching.  This would
   require DUL documents to contain extra data pertaining to the context
   of nodes.  This is considered to be a future concern and is not
   currently supported.
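
   As a non-normative illustration of the move operation in Section 4.5,
   the following Python sketch applies the move described in Section
   4.5.2 to a small document.  The helper names resolve and move, and
   the sample document, are hypothetical.  The sketch assumes that, for
   this simple input, the DOM child index of each node equals its XPath
   "node()" position; as noted above, that is not true in general.

```python
from xml.dom import minidom


def resolve(doc, path):
    """Resolve a restricted path such as "/node()[1]/node()[3]".

    Assumption: DOM child index equals XPath node() position, which
    holds for the simple, normalized sample document used below.
    """
    node = doc
    for step in path.strip("/").split("/"):
        # Each step must look like "node()[x]", with x counted from 1.
        if not (step.startswith("node()[") and step.endswith("]")):
            raise ValueError("unsupported step: " + step)
        index = int(step[len("node()["):-1])
        node = node.childNodes[index - 1]
    return node


def move(doc, node_path, parent_path, childno):
    """Move the subtree at node_path to position childno under parent_path."""
    node = resolve(doc, node_path)
    new_parent = resolve(doc, parent_path)
    node.parentNode.removeChild(node)
    siblings = new_parent.childNodes
    # Any node already at position childno becomes the childno+1 node;
    # a childno past the last child appends instead.
    ref = siblings[childno - 1] if childno <= len(siblings) else None
    new_parent.insertBefore(node, ref)


# Hypothetical sample document; the root element is node()[1] of the document.
doc = minidom.parseString("<r><a/><b><x/><y/></b><c><p/><q/></c></r>")
doc.normalize()
# 2nd child of the 3rd child of the root -> 2nd child of the 2nd child.
move(doc, "/node()[1]/node()[3]/node()[2]", "/node()[1]/node()[2]", 2)
result = doc.documentElement.toxml()
print(result)
```

   The sketch resolves both paths before removing the node and does not
   address how positions should be counted when the old and new
   locations share a parent, since the text above does not specify this.
   Character-level moves (oldCharpos, newCharpos, length) are also left
   out.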

5.  Formal Definitions

6.  Security Considerations

   There are no special security considerations for this specification.
   Security considerations are more appropriate in documents describing
   protocols that might use the delta format described in this
   specification.

7.  IANA Consideration

7.1.  MIME type registration

   To: ietf-types@iana.org

   Subject: Registration of MIME media type application/xml-diff

   MIME media type name:  application

   MIME subtype name:  xml-diff

   Required parameters:  none

   Optional parameters:  none

   Encoding Considerations:  Same considerations as for XML.

   Security Considerations:  See Section 6.

   Interoperability Considerations:  TODO

   Published Specification:  This document is the published
      specification for the MIME type being registered.

   Applications which use this media type:  Applications maintaining
      configuration or application information on HTTP/WebDAV servers
      are expected to use this media type.

   Additional Information:  There is no magic number or file extension
      associated with this MIME type.

   Person & email address to contact for further information:  Adrian
      Mouat (amouat@postmaster.co.uk).

   Intended usage:  Common

   Author/Change Controller:  TODO

   TODO: also register as an instance manipulation for use in RFC 3229

8.  URN Sub-Namespace Registration

   URN Sub-Namespace Registration for urn:ietf:params:xml:ns:dul

   This section registers a new XML namespace, as per the guidelines in
   [8].

   URI:  The URI for this namespace is urn:ietf:params:xml:ns:dul.

   TODO: Fill out the rest of this section as per the guidelines in [8].

9.  Normative References

   [1]   Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
         Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
         HTTP/1.1", RFC 2616, June 1999.

   [2]   Clemm, G., Amsden, J., Ellison, T., Kaler, C., and J.
         Whitehead, "Versioning Extensions to WebDAV (Web Distributed
         Authoring and Versioning)", RFC 3253, March 2002.

   [3]   Goland, Y., Whitehead, E., Faizi, A., Carter, S., and D.
         Jensen, "HTTP Extensions for Distributed Authoring -- WEBDAV",
         RFC 2518, February 1999.

   [4]   Mogul, J., Krishnamurthy, B., Douglis, F., Feldmann, A.,
         Goland, Y., van Hoff, A., and D. Hellerstein, "Delta encoding
         in HTTP", RFC 3229, January 2002.

   [5]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [6]   Moats, R., "URN Syntax", RFC 2141, May 1997.

   [7]   Moats, R., "A URN Namespace for IETF Documents", RFC 2648,
         August 1999.

   [8]   Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
         January 2004.

   [9]

   [10]

   [11]

Author's Address

   Adrian Mouat
   diffxml.sf.net
   Filsa
   Quarff, Shetland  ZE2 9EY

   Email: amouat@postmaster.co.uk

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.