Network Working Group                                Donald Eastlake 3rd
INTERNET-DRAFT                                                  Motorola
Expires: June 2001                                         December 2000


                Protocol versus Document Points of View
                -------- ------ -------- ------ -- ----


Status of This Document

   This draft is intended to become an Informational RFC. Its
   distribution is unlimited. Please send comments to the author.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026. Internet-Drafts are
   working documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups. Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   Two points of view are contrasted: the "document" point of view,
   where objects of interest are like pieces of paper, and the
   "protocol" point of view, where objects of interest are like
   composite protocol messages. While each point of view has its place,
   inappropriate adherence to a purely document point of view is
   detrimental to protocol design. Understanding both points of view
   should lessen conflicts between them.


D. Eastlake 3rd                                                 [Page 1]

INTERNET-DRAFT    Protocol versus Document Viewpoints      December 2000


Table of Contents

      Status of This Document
      Abstract
      Table of Contents

      1. Introduction
      2. Points of View
      2.1 Basic Point of View
      2.2 The Question of Meaning
      2.3 Meaning and Adjuncts
      2.4 Processing or Lack Thereof
      2.5 Canonicalization and Security
      2.6 Unique Internal Labels
      3. Examples
      4. Synthesis of the Points of View
      5. Conclusion

      References
      Author's Address
      Expiration and File Name


1. Introduction

   Much of the IETF's traditional work has concerned low-level binary
   protocol constructs. These are almost always viewed from the
   protocol point of view as defined below. But as higher-level
   application constructs and syntaxes are involved in the standards
   process, difficulties can arise due to participants who are fixated
   on the document point of view as defined below.

   Those practiced in and accustomed to one point of view may have
   difficulty in understanding the other. Even after they understand
   the other, it is very easy to slip back into thinking about things
   only from their accustomed point of view.

   Section 2 below tries to define and explore the differences between
   these points of view. Section 3 gives some examples. Section 4 tries
   to synthesize the views and give general design advice in areas
   which can reasonably be viewed either way.


2. Points of View

   The following subsections contrast the document and protocol points
   of view. Each view is exaggerated for effect.
   The document point of view is indicated in paragraphs headed "DOCUM"
   while the protocol point of view is indicated in paragraphs headed
   "PROTO".


2.1 Basic Point of View

   DOCUM: What is important are complete digital documents viewed by
   people or things which are very close equivalents. A major concern
   is to be able to present such documents as directly as possible to a
   court or adjudicator should a dispute arise. Since what is presented
   to the person is all that is important, anything which can affect
   this, such as a "style sheet", should be considered part of the
   document.

   PROTO: What is important are bits on the wire generated and consumed
   by well-defined computer processes or things which are very close
   equivalents. Pieces of such messages may ordinarily end up being
   included in or influencing data displayed to a person, but it is
   just as common for no person to ever see any of it. A message as a
   whole is only viewed by a geek when debugging. If you actually ever
   have to prove something about such a message in a court, there isn't
   any way to avoid having expert witnesses interpret it.


2.2 The Question of Meaning

   DOCUM: The "meaning" of a document is a deep and interesting human
   question. It is probably necessary for the document to include or
   reference human language policy and/or warranty/disclaimer
   information. It is reasonable to consult attorneys and require some
   minimal human-readable statements to be "within the four corners" of
   the document (i.e., actually embedded in the digital structure).

   PROTO: The "meaning" of a protocol message is clear from the
   protocol specification and is frequently defined in terms of the
   state machines of the sender and recipient. Protocol messages are
   only truly meaningful to the processes producing and consuming them,
   and those processes have additional context. Adding human-readable
   text that is not functionally required is silly.
   Consulting attorneys may needlessly complicate the protocol and in
   the worst case tie any design effort in knots.


2.3 Meaning and Adjuncts

   DOCUM: From a document point of view, at the top level we have the
   equivalent of a person looking at a piece of paper. So machine
   detectable and processable adjunct items such as digital signatures,
   persons' names, dates, etc., must, in general, be self-documenting
   as to meaning. Thus a digital signature needs to include what that
   signature means (is the signer a witness, author, guarantor, or
   what?). Similarly, a person's name or a date might need to include
   what that person's role is or the meaning of the date, such as
   editor, author, or contributor, or date of creation, modification,
   or distribution. Furthermore, given the unrestrained scope of what
   can be documented, there is a risk of trying to enumerate and
   standardize all possible "semantic tags" for each type of adjunct
   data, which can be a difficult, complex, and hazy task.

   PROTO: From a protocol point of view, the semantics of the message
   and every adjunct in it are defined in the protocol specification.
   Thus, if there is a slot for a digital signature, a person's name, a
   date, or whatever, the party that is to enter that data, the party
   or parties that are to read it, and its meaning are all predefined.
   Even if there are several possible meanings, the specific meaning
   that applies can be selected by a separate field, and only the
   meanings relevant to the particular protocol need be considered.
   Thus, from the protocol point of view, there is no need to expand
   each adjunct with a meaning field. Another way to look at this is
   that the meaning of each adjunct, instead of being pushed into the
   adjunct as the document point of view encourages, is promoted to the
   level of the document or the protocol specification, resulting in
   simpler adjuncts.
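
   As an illustration of the PROTO position in Section 2.3, the
   following Python sketch shows a signature slot whose meaning is
   selected by a separate field. Everything in it is hypothetical and
   not taken from any real protocol: the SignerRole values, the one-byte
   selector layout, and the names SignatureSlot and parse_slot are
   assumptions made only for this example. The point is that the slot
   carries a small selector, while what each value means would live in
   the protocol specification rather than inside the adjunct.

```python
from dataclasses import dataclass
from enum import Enum


class SignerRole(Enum):
    """Hypothetical selector field; its values would be fixed by the spec."""
    AUTHOR = 1
    WITNESS = 2
    GUARANTOR = 3


@dataclass(frozen=True)
class SignatureSlot:
    """A signature adjunct: a role selector plus opaque signature bytes."""
    role: SignerRole
    value: bytes


def parse_slot(wire: bytes) -> SignatureSlot:
    """Parse an assumed layout: one role byte, then the signature bytes."""
    if len(wire) < 2:
        raise ValueError("slot needs a role byte and at least one signature byte")
    # SignerRole(...) raises ValueError for selector values the spec does not define.
    return SignatureSlot(role=SignerRole(wire[0]), value=wire[1:])


# Example: selector byte 2 followed by three placeholder signature bytes.
slot = parse_slot(bytes([2]) + b"\x8f\x01\xaa")
```

   Only the meanings relevant to this one imagined protocol are
   enumerated, and an unknown selector value is rejected rather than
   interpreted.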


2.4 Processing or Lack Thereof

   DOCUM: The standard model of a document is as a quasi-static object
   somewhat like a piece of paper. About all you do to documents is
   transfer them as a whole, from one storage area to another, or add
   attachments. (Possibly you might want an extract from a document or
   to combine multiple documents into a summary, but this isn't the
   common case.)

   PROTO: The standard model of a protocol message is as an ephemeral
   composite object created by a source process and consumed by a
   destination process. Normally a message is constructed from
   information contained in or pieces of other messages previously
   received by the sending process, as well as local information.


2.5 Canonicalization and Security

   Canonicalization is the transformation of the information in a
   message into a "standard" form, discarding "insignificant"
   information. Examples include encoding into a standard character set
   or changing line endings into a standard encoding, and discarding
   the information as to what the original character set or line ending
   encodings were. Obviously, what is "standard" and what is
   "insignificant" varies with the application or protocol and can be
   tricky to determine.

   DOCUM: From the document point of view, canonicalization is
   extremely suspect if not outright evil. After all, if you have a
   piece of paper with writing on it, any modification to "standardize"
   its format can be an unauthorized change in the original message as
   created by the author. From the document point of view, digital
   signatures are like authenticating signatures or seals or time
   stamps on the bottom of the "piece of paper". They do not justify
   and should not depend on the slightest change in the message
   appearing above them. Similarly, from the document point of view,
   encryption is just putting the "piece of paper" in a vault that only
   certain people can open, and does not justify any standardization or
   canonicalization of the message.

   PROTO: From the protocol point of view, you know that you just have
   a pile of bits that have never been seen and never will be seen by a
   person. In some cases, a human-sensible representation of some of
   the bits may be shown to a person. But, for protocols of realistic
   complexity, most of the parts of the message will be artifacts of
   encoding, protocol structure, and computer representation rather
   than anything intended for a person to see.

   In theory, the "original" idiosyncratic form of any digitally signed
   part could be conveyed unchanged through the computer processes
   which implement the protocol and usefully signed in that form, but
   in practical systems of any complexity, this always proves
   unreasonably difficult for at least some parts of some messages.
   Thus, the signed data must be canonicalized as part of the signing
   and verification processes. Even if, miraculously, an initial system
   design avoids all cases of signed message part reconstruction based
   on processed data or re-encoding based on character set or line
   ending or capitalization or numeric representation or time zones or
   whatever, later revisions and extensions are almost certain to
   require such reconstruction and/or re-encoding.

   Because of this, from the protocol point of view, canonicalization
   is a necessity. It is just a question of exactly what
   canonicalization or canonicalizations.

   Thus, for protocol systems of practical complexity, you are faced
   with the choice of (1) doing no canonicalization and having brittle
   signatures, useless due to insignificant failures to verify, or
   (2) doing the sometimes difficult and tricky work of designing an
   appropriate canonicalization or canonicalizations to be used as part
   of signature generation and verification, producing robust and
   useful signatures.
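
   The choice between brittle and robust signatures can be made
   concrete with a minimal Python sketch. It rests on assumptions that
   are not part of this document: the canonical form chosen here
   (Unicode NFC, LF line endings, UTF-8) is only one hypothetical
   canonicalization, the key and message text are invented, and
   HMAC-SHA256 from the standard library stands in for a real digital
   signature algorithm. The names canonicalize and sign are likewise
   illustrative.

```python
import hashlib
import hmac
import unicodedata


def canonicalize(text: str) -> bytes:
    """Assumed canonical form: NFC characters, LF line endings, UTF-8 bytes."""
    normalized = unicodedata.normalize("NFC", text)
    unified = normalized.replace("\r\n", "\n").replace("\r", "\n")
    return unified.encode("utf-8")


def sign(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256, used here only as a stand-in for a signature algorithm."""
    return hmac.new(key, data, hashlib.sha256).digest()


KEY = b"demo-key"  # illustrative key, not a recommendation

# The same logical content before and after an intermediate process
# changed the line endings and the character composition.
sent = "amount: 10\r\npayee: Zo\u00eb\r\n"    # CRLF, precomposed character
received = "amount: 10\npayee: Zoe\u0308\n"   # LF, decomposed character

# Signing the raw bytes: verification fails on insignificant differences.
raw_match = hmac.compare_digest(
    sign(KEY, sent.encode("utf-8")), sign(KEY, received.encode("utf-8"))
)

# Signing the canonical form: both sides compute the same value.
canonical_match = hmac.compare_digest(
    sign(KEY, canonicalize(sent)), sign(KEY, canonicalize(received))
)
```

   In this sketch raw_match is False and canonical_match is True, which
   corresponds to choices (1) and (2) above. Which differences count as
   "insignificant" remains a per-protocol design decision.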

   While the application of canonicalization is more obvious with
   digital signatures, it may also apply to encryption, particularly
   encryption of parts of a message. Sometimes elements of the
   environment where the encrypted data is found affect its
   interpretation. Examples include the character encoding or the
   bindings of dummy symbols. When the data is decrypted, it may be
   into an environment with a different character encoding and
   different dummy symbol bindings. With a plain text message part, it
   is usually clear which of these environmental elements need to be
   incorporated in or conveyed with the message. But an encrypted
   message part is opaque. Thus some canonical representation that
   incorporates such environmental factors may be needed.


2.6 Unique Internal Labels

   It is sometimes considered desirable to be able to reference parts
   of structured objects by some sort of "label" or "id" or "tag". The
   idea is that this forms a fixed "anchor" that can be used
   "globally", at least within an application domain, to reference the
   tagged part.

   DOCUM: From the document point of view, it seems logical to just
   provide for a text tag. The concept would be that users or
   applications could easily come up with short readable tags. These
   would probably be meaningful to a person if humanly generated (e.g.,
   "Susan") and at least fairly short and systematic if automatically
   generated (e.g., "A123"). The ID attribute type in XML [XML] appears
   to have been thought of this way, although it can be used in other
   ways.

   PROTO: From a protocol point of view, unique internal labels look
   very different than they do from a document point of view. Since
   pieces of different protocol messages may later be combined in a
   variety of ways, previously unique labels may conflict. There are
   really only three possibilities if you need such tags, as follows:

   (1) Have a system for dynamically rewriting such tags to maintain
       uniqueness. This is usually a disaster as it (a) invalidates any
       stored copies of the tags that are not rewritten, and it is
       usually impossible to be sure there aren't more copies lurking
       somewhere you failed to update, and (b) invalidates digital
       signatures.

   (2) Use some form of hierarchically qualified tags. Thus the total
       tag can remain unique even if a part is moved, because its
       qualification changes. This avoids the digital signature
       problems of possibility 1. But it destroys the concept of a
       globally unique anchor embedded in and moving with the data, and
       stored tags are still invalidated by data moves.

   (3) Construct a lengthy globally unique tag string. This can be done
       successfully by using a good enough random number generator and
       big enough random tags or, more sequentially, in the way email
       message IDs are created [RFC 822].

   Thus, from a strict protocol point of view, only choice 3 works.


3. Examples

   An example of something designed, to a significant extent, from the
   document point of view is the X.509v3 Certificate [X509v3].

   An example of something that can easily be viewed both ways, and
   where the best results frequently require attention to not only the
   document but also the protocol point of view, is the eXtensible
   Markup Language (XML [XML]).

   (more to be added)


4. Synthesis of the Points of View

   There are some merits to each point of view. Certainly the document
   point of view has some intuitive simplicity and appeal and is fine
   for applications where it meets the needs.

   The protocol point of view can come close to encompassing the
   document point of view as a limiting case.
   In particular, as the complexity of messages declines to a single
   payload (perhaps with attachments), the mutability of the payload
   declines to some standard binary format that needs no
   canonicalization, the number of parties and amount of processing as
   messages are transferred declines, and the portion of the message
   intended for more or less direct human consumption increases, the
   protocol point of view would be narrowed to something close to the
   document point of view. Even when the document point of view is
   questionable, the addition of a few options to a protocol, such as
   minimal and/or no canonicalization or optional policy
   statement/pointer inclusion, will usually satisfy the perceived
   needs of those holding a document point of view.

   On the other hand, the document point of view is hard to stretch to
   encompass the protocol case. From an extreme document point of view,
   canonicalization is wrong, inclusion of human language policy text
   within every object and a meaning with every adjunct should be
   mandatory, etc. Failure to incorporate the protocol viewpoint as
   described above in the design of protocols of realistic complexity
   may have fatal consequences.


5. Conclusion

   The author hopes that this document will help explain to those of
   either point of view where those with the other view are coming
   from, decrease conflict, and lead to better consensus protocol
   design.


References

   [RFC 822] - "Standard for the format of ARPA Internet text
   messages", D. Crocker, Aug-13-1982.

   [X509v3] - "ITU-T Recommendation X.509 version 3 (1997), Information
   Technology - Open Systems Interconnection - The Directory
   Authentication Framework", ISO/IEC 9594-8:1997.

   [XML] - Extensible Markup Language (XML) 1.0 Recommendation. T.
   Bray, J. Paoli, C. M. Sperberg-McQueen. February 1998.


Author's Address

   The author of this document is:

      Donald E. Eastlake 3rd
      Motorola
      155 Beaver Street
      Milford, MA 01757 USA

      Phone: +1 508-261-5434 (w)
             +1 508-634-2066 (h)
      Fax:   +1 508-261-4777 (w)
      EMail: Donald.Eastlake@motorola.com


Expiration and File Name

   This draft expires June 2001.

   Its file name is .