Network Working Group                                Donald Eastlake 3rd
INTERNET-DRAFT                                                  Motorola
Expires: June 2001                                         December 2000


                Protocol versus Document Points of View
                -------- ------ -------- ------ -- ----
                 <draft-eastlake-proto-doc-pov-01.txt>


Status of This Document

   This draft is intended to become an Informational RFC.  Its
   distribution is unlimited. Please send comments to the author.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.  Internet-Drafts are
   working documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   Two points of view are contrasted: the "document" point of view,
   where objects of interest are like pieces of paper, and the
   "protocol" point of view where objects of interest are like composite
   protocol messages.  While each point of view has its place,
   inappropriate adherence to a purely document point of view is
   detrimental to protocol design.  Understanding both of these points
   of view should lessen conflicts between them.


D. Eastlake 3rd                                                 [Page 1]


INTERNET-DRAFT    Protocol versus Document Viewpoints      December 2000


Table of Contents

      Status of This Document....................................1
      Abstract...................................................1

      Table of Contents..........................................2

      1. Introduction............................................3
      2. Points of View..........................................3
      2.1 Basic Point of View....................................3
      2.2 The Question of Meaning................................4
      2.3 Meaning and Adjuncts...................................4
      2.4 Processing or Lack Thereof.............................5
      2.5 Canonicalization and Security..........................5
      2.6 Unique Internal Labels.................................6
      3. Examples................................................7
      4. Synthesis of the Points of View.........................8
      5. Conclusion..............................................8

      References.................................................9

      Author's Address..........................................10
      Expiration and File Name..................................10




1. Introduction

   Much of the IETF's traditional work has concerned low level binary
   protocol constructs.  These are almost always viewed from the
   protocol point of view as defined below.  But as higher level
   application constructs and syntaxes are involved in the standards
   process, difficulties can arise due to participants who are fixated
   on the document point of view as defined below.  Those practiced in
   and accustomed to one point of view may have difficulty in
   understanding the other.  Even after they understand the other, it is
   very easy to slip back into thinking about things only from their
   accustomed point of view.

   Section 2 below tries to define and explore the differences between
   these points of view.  Section 3 gives some examples.  And Section 4
   tries to synthesize the views and give general design advice in areas
   which can reasonably be viewed either way.


2. Points of View

   The following subsections contrast the document and protocol points
   of view.  Each view is exaggerated for effect.

   The document point of view is indicated in paragraphs headed "DOCUM"
   while the protocol point of view is indicated in paragraphs headed
   "PROTO".


2.1 Basic Point of View

   DOCUM: What is important are complete digital documents viewed by
      people or things which are very close equivalents.  A major
      concern is to be able to present such documents as directly as
      possible to a court or adjudicator should a dispute arise.  Since
      what is presented to the person is all that is important, anything
      which can affect this, such as a "style sheet", should be
      considered part of the document.

   PROTO: What is important are bits on the wire generated and consumed
      by well defined computer processes or things which are very close
      equivalents.  Pieces of such messages may ordinarily end up being
      included in or influencing data displayed to a person, but it is
      just as common for no person to ever see any of it.  A message as
      a whole is only viewed by a geek when debugging.  If you actually
      ever have to prove something about such a message in a court,
      there isn't any way to avoid having expert witnesses interpret it.




2.2 The Question of Meaning

   DOCUM: The "meaning" of a document is a deep and interesting human
      question.  It is probably necessary for the document to include or
      reference human language policy and/or warranty/disclaimer
      information.  It is reasonable to consult attorneys and require
      some minimal human readable statements to be "within the four
      corners" of the document (i.e., actually embedded in the digital
      structure).

   PROTO: The "meaning" of a protocol message is clear from the protocol
      specification and is frequently defined in terms of the state
      machines of the sender and recipient.  Protocol messages are only
      truly meaningful to the processes producing and consuming them,
      which processes have additional context.  Adding human readable
      text that is not functionally required is silly.  Consulting
      attorneys may needlessly complicate the protocol and in the worst
      case tie any design effort in knots.


2.3 Meaning and Adjuncts

   DOCUM: From a document point of view, at the top level we have the
      equivalent of a person looking at a piece of paper.  So machine
      detectable and processable adjunct items such as digital
      signatures, person's names, dates, etc., must, in general, be self
      documenting as to meaning.  Thus a digital signature needs to
      include what that signature means (is the signer a witness,
      author, guarantor, or what?).  Similarly, a person's name or date
      might need to include what that person's role is or the meaning of
      the date such as editor, author, contributor or date of creation,
      modification, or distribution.  Furthermore, given the
      unrestrained scope of what can be documented, there is a risk of
      trying to enumerate and standardize all possible "semantic tags"
      for each type of adjunct data, which can be a difficult, complex,
      and hazy task.

   PROTO: From a protocol point of view, the semantics of the message
      and every adjunct in it are defined in the protocol specification.
      Thus, if there is a slot for a digital signature, person's name, a
      date, or whatever, the party that is to enter that data, the party
      or parties that are to read it, and its meaning are all pre-
      defined.  Even if there are several possible meanings, the
      specific meaning that applies can be selected by a separate field
      and only the meanings relevant to the particular protocol need be
      considered.  Thus, from the protocol point of view, there is no
      need to expand each adjunct with a meaning field.  Another way to
      look at this is that the meaning of each adjunct, instead of being
      pushed into the adjunct as the document point of view encourages,
      is promoted to the level of the document or the protocol
      specification, resulting in simpler adjuncts.
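
   The contrast can be sketched in code.  The following is only an
   illustration under assumed names: SignerRole, DocStyleSignature,
   ProtoStyleMessage, and describe are hypothetical and do not come
   from any protocol specification.  It shows a self-documenting
   adjunct next to a message whose single selector field picks among
   the few meanings a protocol defines.

```python
from dataclasses import dataclass
from enum import Enum


class SignerRole(Enum):
    """Hypothetical roles that one particular protocol cares about."""
    AUTHOR = 1
    WITNESS = 2


@dataclass(frozen=True)
class DocStyleSignature:
    """DOCUM style: the adjunct carries an open-ended statement of meaning."""
    value: bytes
    meaning: str  # free text that every reader must interpret


@dataclass(frozen=True)
class ProtoStyleMessage:
    """PROTO style: the meaning of each slot is fixed by the specification.

    One selector field chooses among the meanings the protocol defines,
    so the signature adjunct itself stays a bare value.
    """
    signer_role: SignerRole
    signature: bytes
    payload: bytes


def describe(msg: ProtoStyleMessage) -> str:
    """Interpret the signature using only the protocol-defined selector."""
    return f"payload signed by {msg.signer_role.name.lower()}"
```

   In this sketch the set of roles is closed and small, which is what
   lets the adjunct stay simple; the document-style record instead has
   to accept any text in its meaning field.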


2.4 Processing or Lack Thereof

   DOCUM: The standard model of a document is as a quasi-static object
      somewhat like a piece of paper.  About all you do to documents is
      transfer them as a whole, from one storage area to another, or add
      attachments.  (Possibly you might want an extract from a document
      or to combine multiple documents into a summary but this isn't the
      common case.)

   PROTO: The standard model of a protocol message is as an ephemeral
      composite object created by a source process and consumed by a
      destination process.  Normally a message is constructed from
      information contained in or pieces of other messages previously
      received by the sending process, as well as local information.


2.5 Canonicalization and Security

   Canonicalization is the transformation of the information in a
   message into a "standard" form, discarding "insignificant"
   information.  Examples include encoding into a standard character
   set or changing line endings into a standard encoding, and
   discarding the information as to what the original character set or
   line ending encodings were.  Obviously, what is "standard" and what
   is "insignificant" varies with the application or protocol and can be
   tricky to determine.
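
   A minimal sketch of such a transformation follows.  The choice of
   UTF-8 and LF as the "standard" forms, and the function name
   canonicalize, are assumptions made for illustration; a real
   protocol would define its own canonical form.

```python
def canonicalize(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode text as UTF-8 with LF line endings.

    The original character set and line-ending convention are
    deliberately discarded; only the text itself survives.
    """
    text = raw.decode(source_encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


# The same text as produced by two different environments.
dos_latin1 = "caf\u00e9\r\nmenu\r\n".encode("latin-1")
unix_utf8 = "caf\u00e9\nmenu\n".encode("utf-8")

same_canonical_form = (
    canonicalize(dos_latin1, "latin-1") == canonicalize(unix_utf8, "utf-8")
)
```

   The two inputs differ byte for byte, yet they map to one canonical
   form because the differences are ones this sketch treats as
   insignificant.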

   DOCUM: From the document point of view, canonicalization is extremely
      suspect if not outright evil.  After all, if you have a piece of
      paper with writing on it, any modification to "standardize" its
      format can be an unauthorized change in the original message as
      created by the author.  From the document point of view, digital
      signatures are like authenticating signatures or seals or time
      stamps on the bottom of the "piece of paper".  They do not justify
      and should not depend on the slightest change in the message
      appearing above them.  Similarly, from the document point of view,
      encryption is just putting the "piece of paper" in a vault that
      only certain people can open, and does not justify any
      standardization or canonicalization of the message.

   PROTO: From the protocol point of view, you know that you just have a
      pile of bits that have never been seen and never will be seen by a
      person.  In some cases, a human sensible representation of some of
      the bits may be shown to a person.  But, for protocols of
      realistic complexity, most of the parts of the message will be
      artifacts of encoding, protocol structure, and computer
      representation rather than anything intended for a person to see.
      In theory, the "original" idiosyncratic form of any digitally
      signed part could be conveyed unchanged through the computer
      processes which implement the protocol and usefully signed in that
      form, but in practical systems of any complexity, this always
      proves unreasonably difficult for at least some parts of some
      messages. Thus, the signed data must be canonicalized as part of
      the signing and verification processes.  Even if, miraculously, an
      initial system design avoids all cases of signed message part
      reconstruction based on processed data or re-encoding based on
      character set or line ending or capitalization or numeric
      representation or time zones or whatever, later revisions and
      extensions are almost certain to require such reconstruction
      and/or re-encoding.  Because of this, from the protocol point of
      view, canonicalization is a necessity.  It is just a question of
      exactly what canonicalization or canonicalizations.

   Thus, for protocol systems of practical complexity, you are faced
   with the choice of
      (1) doing no canonicalization and having brittle signatures,
          useless due to insignificant failures to verify, or
      (2) doing the sometimes difficult and tricky work of designing an
          appropriate canonicalization or canonicalizations to be used
          as part of signature generation and verification, producing
          robust and useful signatures.
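
   The two choices can be illustrated with a small sketch.  It is not
   a real digital signature scheme: an HMAC from the Python standard
   library stands in for the signature, the key is a made-up demo
   value, and the canonicalization only normalizes line endings.

```python
import hashlib
import hmac

KEY = b"demo-key-not-for-real-use"  # hypothetical shared key


def canonicalize(raw: bytes) -> bytes:
    """Toy canonicalization: turn CRLF and bare CR line endings into LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def sign(data: bytes, canonical: bool) -> bytes:
    """Compute the stand-in signature, optionally over the canonical form."""
    if canonical:
        data = canonicalize(data)
    return hmac.new(KEY, data, hashlib.sha256).digest()


def verify(data: bytes, tag: bytes, canonical: bool) -> bool:
    """Recompute the stand-in signature and compare in constant time."""
    return hmac.compare_digest(sign(data, canonical), tag)


sent = b"amount: 10\r\npayee: Susan\r\n"
received = b"amount: 10\npayee: Susan\n"  # a relay rewrote the line endings
tampered = b"amount: 99\npayee: Susan\n"  # a significant change

# Choice (1): no canonicalization, so an insignificant change breaks it.
brittle_ok = verify(received, sign(sent, canonical=False), canonical=False)

# Choice (2): canonicalize on both sides, so only real changes break it.
robust_ok = verify(received, sign(sent, canonical=True), canonical=True)
tamper_ok = verify(tampered, sign(sent, canonical=True), canonical=True)
```

   Here brittle_ok is False although nothing meaningful changed, while
   robust_ok is True and tamper_ok is False, which is the behavior
   choice (2) is after.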

   While the application of canonicalization is more obvious with
   digital signatures, it may also apply to encryption, particularly
   encryption of parts of a message.  Sometimes elements of the
   environment where the encrypted data is found affect its
   interpretation, for example, the character encoding or bindings of
   dummy symbols.  When the data is decrypted, it may be into an
   environment with a different character encoding and dummy symbol
   bindings.  With a plain text message part, it is usually clear which
   of these environmental elements need to be incorporated in or
   conveyed with the message.  But an encrypted message part is opaque.
   Thus some canonical representation that incorporates such
   environmental factors may be needed.
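
   As one possible illustration, assume that "bindings of dummy
   symbols" covers something like XML namespace prefix bindings; that
   reading, the sample document, and the function name
   extract_self_contained are all assumptions.  The sketch uses
   Python's xml.etree.ElementTree to serialize a message part so that
   the binding it depends on travels with it.  The encryption step
   itself is omitted.

```python
import xml.etree.ElementTree as ET

# The prefix "p" is bound on the enclosing element, not on the part.
ENCLOSING = '<order xmlns:p="urn:example:pay"><p:amount>10</p:amount></order>'


def extract_self_contained(document: str, index: int = 0) -> bytes:
    """Serialize one child element together with its namespace binding.

    ElementTree stores the expanded name of each element, so the
    serialized part declares the namespace it needs on itself.
    """
    root = ET.fromstring(document)
    return ET.tostring(root[index])


# This is what would be handed to the encryption step.
plaintext_part = extract_self_contained(ENCLOSING)

# After decryption, the part parses on its own, outside its old context.
reparsed = ET.fromstring(plaintext_part)
```

   The serialized part may use a different prefix than the original,
   but it resolves to the same expanded name, so its interpretation no
   longer depends on the environment it was cut from.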


2.6 Unique Internal Labels

   It is sometimes considered desirable to be able to reference parts of
   structured objects by some sort of "label" or "id" or "tag".  The
   idea is that this forms a fixed "anchor" that can be used "globally",
   at least within an application domain, to reference the tagged part.




   DOCUM: From the document point of view, it seems logical to just
      provide for a text tag.  The concept would be that users or
      applications could easily come up with short readable tags.  These
      would probably be meaningful to a person if humanly generated
      (e.g., "Susan") and at least fairly short and systematic if
      automatically generated (e.g., "A123").  The ID attribute type in
      XML [XML] appears to have been thought of this way although it can
      be used in other ways.

   PROTO: From a protocol point of view, unique internal labels look
      very different than they do from a document point of view.  Since
      pieces of different protocol messages may later be combined in a
      variety of ways, previously unique labels may conflict.  There are
      really only three possibilities if you need such tags, as
      follows:
      (1) Have a system for dynamically rewriting such tags to maintain
          uniqueness.  This is usually a disaster as it (a) invalidates
          any stored copies of the tags that are not rewritten, and it
          is usually impossible to be sure there aren't more copies
          lurking somewhere you failed to update, and (b) invalidates
          digital signatures.
      (2) Use some form of hierarchically qualified tags.  Thus the total
          tag can remain unique even if a part is moved, because its
          qualification changes.  This avoids the digital signature
          problems of possibility 1.  But it destroys the concept of a
          globally unique anchor embedded in and moving with the data
          and stored tags are still invalidated by data moves.
      (3) Construct a lengthy globally unique tag string.  This can be
          done successfully by using a good enough random number
          generator and big enough random tags or more sequentially as
          in the way email message IDs are created [RFC 822].
      Thus, from a strict protocol point of view, only choice 3 works.
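
   A minimal sketch of the conflict and of choice 3 follows.  The
   helper names random_tag and merge, the "id-" prefix, and the 128
   bit tag size are assumptions chosen for illustration, not
   recommendations from this document.

```python
import secrets


def random_tag(num_bytes: int = 16) -> str:
    """Return a long random tag; 16 bytes gives 128 bits of randomness."""
    return "id-" + secrets.token_hex(num_bytes)


def merge(parts_a: dict, parts_b: dict) -> dict:
    """Combine labelled parts of two messages, refusing label conflicts."""
    clash = parts_a.keys() & parts_b.keys()
    if clash:
        raise ValueError(f"label conflict: {sorted(clash)}")
    return {**parts_a, **parts_b}


# Short systematic labels were unique within each original message...
short_a = {"A123": "part from message one"}
short_b = {"A123": "part from message two"}
try:
    merge(short_a, short_b)
    short_labels_collide = False
except ValueError:
    short_labels_collide = True  # ...but conflict once parts are combined.

# Long random labels can be combined without any rewriting.
long_a = {random_tag(): "part from message one"}
long_b = {random_tag(): "part from message two"}
merged = merge(long_a, long_b)
```

   Because no tag is rewritten when the parts are merged, stored
   copies of the tags and any signatures covering them stay valid.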


3. Examples

   An example of something designed, to a significant extent, from the
   document point of view is the X.509v3 Certificate [X509v3].  An
   example of something that can easily be viewed both ways and where
   the best results frequently require attention to not only the
   document but also the protocol point of view is the eXtensible
   Markup Language (XML [XML]).

   (more to be added)




4. Synthesis of the Points of View

   There are some merits to each point of view.  Certainly the document
   point of view has some intuitive simplicity and appeal and is fine
   for applications where it meets the needs.

   The protocol point of view can come close to encompassing the
   document point of view as a limiting case.  In particular, as

      the complexity of messages declines to a single payload (perhaps
      with attachments) and

      the mutability of the payload declines to some standard binary
      format that needs no canonicalization and

      the number of parties and amount of processing as messages are
      transferred declines and

      the portion of the message intended for more or less direct human
      consumption increases,

   the protocol point of view would be narrowed to something close to
   the document point of view.  Even when the document point of view is
   questionable, the addition of a few options to a protocol, such as
   minimal and/or no canonicalization or optional policy
   statement/pointer inclusion, will usually satisfy the perceived needs
   of those holding a document point of view.

   On the other hand, the document point of view is hard to stretch to
   encompass the protocol case.  From an extreme document point of view,
   canonicalization is wrong, inclusion of human language policy text
   within every object and a meaning with every adjunct should be
   mandatory, etc.

   Failure to incorporate the protocol point of view as described above
   in the design of protocols of realistic complexity may have fatal
   consequences.


5. Conclusion

   The author hopes that this document will help explain to those of
   either point of view where those with the other view are coming from,
   decrease conflict, and lead to better consensus protocol design.




References

   [RFC 822] - "Standard for the format of ARPA Internet text messages",
   D. Crocker, Aug-13-1982.

   [X509v3] - "ITU-T Recommendation X.509 version 3 (1997), Information
   Technology - Open Systems Interconnection - The Directory:
   Authentication Framework",  ISO/IEC 9594-8:1997.

   [XML] - Extensible Markup Language (XML) 1.0 Recommendation. T. Bray,
   J. Paoli, C. M. Sperberg-McQueen. February 1998.
   <http://www.w3.org/TR/1998/REC-xml-19980210>




Author's Address

   The author of this document is:

        Donald E. Eastlake 3rd
        Motorola
        155 Beaver Street
        Milford, MA 01757 USA

        Phone:  +1 508-261-5434 (w)
                +1 508-634-2066 (h)
        Fax:    +1 508-261-4777 (w)
        EMail:  Donald.Eastlake@motorola.com


Expiration and File Name

   This draft expires June 2001.

   Its file name is <draft-eastlake-proto-doc-pov-01.txt>.

