Internet-Draft                                  K. Calvert, J. Griffioen
                                                  University of Kentucky

Expires January 2001                                           July 2000


                        Internet Concast Service

                     draft-calvert-concast-svc-00.txt 


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   The distribution of this memo is unlimited.  It is filed as <draft-
   calvert-concast-svc-00.txt> and expires January 16, 2001.  Please
   send comments to the authors.


Abstract

   Concast is a many-to-one best-effort network service that allows a
   receiver to treat a group of senders as a single entity, in much the
   same way that IP multicast allows a sender to treat a group of
   receivers as one.  Each concast datagram delivered to a receiver is
   derived from (possibly many) datagrams sent by different members of
   the concast group to that receiver, according to a "merge
   specification".  Concast allows the semantics of this merging
   operation to vary to suit the needs of different applications.  This
   document describes the concast service and presents a framework for
   defining merge semantics.  It also describes the processing of
   concast datagrams by concast-capable routers and hosts.  The concast
   signaling protocol (CSP), an integral part of the concast service, is
   specified in a separate document.


Calvert et al                                                   [Page 1]

INTERNET-DRAFT          Internet Concast Service               July 2000


   Concast is incrementally deployable and backward compatible with IPv4
   and IPv6.  It can be implemented entirely in end systems, but is most
   scalable when supported by routers in the network.


1. Introduction


   Multicast has been an Internet service for many years now [2].  Its
   semantics are simple: when a host sends a packet to a multicast
   address, the network makes its best effort to deliver a copy to all
   hosts in the group.  The network keeps track of receiving hosts'
   locations, and duplicates datagrams as needed while forwarding them
   toward all receivers.  The power of multicast is in its abstraction
   mechanism, which enables a sender to treat an arbitrary number of
   receivers as a single entity.

   Concast is intended to provide a similar abstraction in the reverse
   direction: it enables a receiver to treat an arbitrary number of
   senders as a single entity.  When multiple senders transmit concast
   datagrams to the same receiver, the network makes its best effort to
   "merge" them into a single message for delivery to that receiver.
   The utility of such a service depends on the semantics of the merging
   operation performed by the network layer.  It seems unlikely that any
   single (necessarily application-independent) semantics would have
   sufficiently broad applicability to justify implementation of the
   concast service.  Therefore concast is designed to allow for a broad
   range of merge semantics, all fitting within a certain framework.
   The following examples illustrate a range of possible merge
   semantics:

     o Inverse multicast/duplicate suppression: at most one copy of
       any datagram is delivered to the receiver within a particular
       window of time.

     o Voting: each datagram contains a value chosen by its sender.
       When some threshold number of datagrams has been sent, a single
       datagram containing the value that occurred most often in the
       sent datagrams is delivered.

     o Applying an associative and commutative operator: each datagram
       contains a value. The maximum (minimum, sum, product,
       conjunction, disjunction, bitwise conjunction, bitwise
       disjunction) of the values in all sent datagrams is placed in the
       datagram delivered to the receiver.
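
   Because such operators are associative and commutative, partial
   results can be computed at any node and in any order.  The Python
   sketch below is illustrative only and is not part of the concast
   service definition; the router names and the values are made up.  It
   compares merging at intermediate nodes with merging everything at
   the receiver.

```python
from functools import reduce

def merge_max(values):
    """Merge a non-empty list of values with the max operator."""
    return reduce(max, values)

# Hypothetical values sent by six senders, grouped by the (hypothetical)
# concast-capable router that each sender's datagrams pass through.
senders_by_router = {
    "router_a": [17, 4, 23],
    "router_b": [8, 42],
    "router_c": [15],
}

# In-network merging: each router forwards one partial result downstream.
partials = [merge_max(vals) for vals in senders_by_router.values()]
in_network = merge_max(partials)

# End-system-only merging: the receiver handles every datagram itself.
all_values = [v for vals in senders_by_router.values() for v in vals]
at_receiver = merge_max(all_values)

print(in_network, at_receiver, len(partials), len(all_values))
```

   Both strategies yield the same value, but with in-network merging
   the receiver handles three datagrams instead of six.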

   It is envisioned that certain simple merge functions like these will
   be "hardwired" into the network.  The merge framework defined later
   allows for new merge semantics to be specified simply by defining
   certain functions that make up the framework.  For maximum
   flexibility, receivers could supply such definitions using an
   encoding interpreted by all concast-capable nodes.  The nature of
   this encoding determines the power of the computations permitted for
   merge specifications.

   While concast can be used alone, it is especially useful in
   conjunction with multicast.  Many multicast applications require some
   form of feedback from the receiver set.  For such applications,
   implosion at the multicast source or at internal network nodes is a
   real problem as group sizes grow large, because (in the absence of
   concast) the only way to convey feedback is via unicast datagrams.
   This fundamentally breaks the multicast abstraction by forcing the
   sender to deal with individuals instead of the group as a whole.
   Moreover, in most cases, the feedback recipient is not interested in
   the individuals' information, but rather some function -- for example
   the maximum or minimum -- of the group's information.  Support for
   the concast abstraction allows such "summary information" to be
   provided in a scalable way, by computing it at strategic points along
   the way.

   Support for concast requires modifications to the Internet Protocol
   modules of hosts and routers.  It does not, however, require any
   additional routing or forwarding capabilities beyond those required
   for unicast.  In particular, concast does not depend on multicast in
   any way.  Concast service can be provided on an end-system-only
   basis, though router support is necessary for scalability (in terms
   of the group size supportable without implosion).  Partial deployment
   among routers is beneficial, and indeed most of the scalability and
   implosion-prevention benefits are likely to be attainable by
   deployment of concast at select routers at domain boundaries.  This
   document describes extensions to Version 4 of the Internet Protocol.
   Similar extensions can be defined for IPv6.

   The next section provides an overview of the service and its use.
   Section 3 defines the semantic framework for merging datagrams, and
   gives an example of its use.  Section 4 describes the processing of
   concast datagrams by the IP implementations of concast-capable
   nodes, in terms of the semantic framework. Security considerations
   are discussed in Section 5.


2. Service Overview


   The unit of concast service is the "flow".  Concast flows are
   unidirectional: data travels only from the senders to the (single)
   receiver.  Each concast flow is identified by a pair (R,G), where R
   is the (unicast) IP address of the receiver and G is a concast group
   identifier.  Concast group IDs are 32-bit numbers chosen by the
   receiver.  Note that different receiving applications on the same
   host need to use different group IDs so their flows can be
   distinguished.

   Each concast flow has an associated Merge Specification, which is
   chosen by the receiver and specified at flow creation time.  The
   Merge Specification defines the relationship between datagrams
   delivered to the receiver application and those transmitted by the
   senders.

   Thus to use concast, receiver and senders must agree (through some
   out-of-band means) on two things: the concast group ID and the Merge
   Specification.  Senders must transmit datagrams containing
   information in the format expected by the Merge Specification.

   A concast-capable node N maintains state information for each concast
   flow (R,G) passing through it (i.e., for which N is on the path to R
   from some sender participating in the flow).  Responsibility for
   establishment and maintenance of this per-flow information belongs
   to the Concast Signaling Protocol (CSP), which is described in a
   separate document [1].  CSP uses soft-state techniques to ensure that
   the concast service is robust in the face of route changes.  The
   per-flow information includes the identities of all concast-capable
   nodes "upstream" of N on the flow, and state relevant to the ongoing
   merge processing of messages sent on the flow.
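
   As a rough illustration, the per-flow state could be organized as in
   the Python sketch below.  The class and field names are hypothetical
   and the addresses are documentation examples; the actual contents of
   the state are determined by CSP [1] and by the Merge Specification.

```python
from dataclasses import dataclass, field

@dataclass
class FlowState:
    """Hypothetical per-flow record kept by a concast-capable node."""
    merge_spec: str                                   # stand-in for the Merge Specification
    upstream: set = field(default_factory=set)        # upstream neighbor addresses
    merge_state: dict = field(default_factory=dict)   # state of in-progress merges

class FlowTable:
    """Maps a flow identifier (R, G) to its FlowState."""

    def __init__(self):
        self._flows = {}

    def install(self, receiver, group_id, merge_spec):
        if not 0 <= group_id < 2**32:
            raise ValueError("concast group IDs are 32-bit numbers")
        return self._flows.setdefault((receiver, group_id), FlowState(merge_spec))

    def look_up(self, receiver, group_id):
        return self._flows.get((receiver, group_id))  # None if the flow is unknown

table = FlowTable()
flow = table.install("192.0.2.1", 7, "max")
flow.upstream.add("198.51.100.9")
print(table.look_up("192.0.2.1", 7) is flow, table.look_up("192.0.2.1", 8))
```

   Keying the table on the pair (R,G) keeps flows to the same receiver
   but with different group IDs separate, as required above.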

   In contrast to multicast -- as multicast is currently specified and
   implemented in the Internet [2] -- both senders and receiver are
   required to signal the network before using the concast service.
   (Multicast only requires receivers to signal.)  A benefit of this
   "uniform" signaling requirement is that it provides an opportunity
   for authentication and authorization checks on users of the service.
   This is likely to be important since router support for concast
   requires the maintenance of per-flow state.

2.1 Concast Datagram Format

   Concast datagrams are distinguished from ordinary IP datagrams simply
   by the presence of a "Concast ID" option in the IP header.  The
   Concast ID option contains the group number G, and allows concast
   datagrams to be recognized as such by concast-capable routers so they
   can be diverted for processing.  (Concast-oblivious routers do not
   recognize the concast ID option, and simply forward concast datagrams
   as if they were regular unicast datagrams.)

   The source address field of a concast datagram's IP header contains
   the IP address of the last concast-capable node to process the
   datagram.  This enables a node processing an incoming datagram to
   check that the datagram was forwarded by one of that node's known
   upstream neighbors.  (This check is of course not secure. See Section
   5.)
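
   This memo does not define the encoding of the Concast ID option.  The
   Python sketch below assumes the usual IPv4 type-length-value option
   layout with a 32-bit group ID; the option type value 0x9E is a
   hypothetical placeholder, not an assigned number.  It also shows the
   upstream-neighbor check on the source address.

```python
import struct

CONCAST_OPT_TYPE = 0x9E   # hypothetical placeholder; no value is assigned
CONCAST_OPT_LEN = 6       # type (1) + length (1) + 32-bit group ID (4)

def encode_concast_option(group_id):
    """Pack the group ID as a type-length-value IPv4 option."""
    return struct.pack("!BBI", CONCAST_OPT_TYPE, CONCAST_OPT_LEN, group_id)

def find_concast_group(options):
    """Scan an IPv4 options field; return the group ID, or None."""
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == 0:            # End of Option List
            break
        if opt_type == 1:            # No-Operation (single octet)
            i += 1
            continue
        if i + 1 >= len(options):
            break                    # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                    # malformed option
        if opt_type == CONCAST_OPT_TYPE and length == CONCAST_OPT_LEN:
            return struct.unpack("!I", options[i + 2:i + 6])[0]
        i += length
    return None

def accept_from(source_addr, upstream_neighbors):
    """The (insecure) check that a datagram came from a known neighbor."""
    return source_addr in upstream_neighbors

opts = b"\x01" + encode_concast_option(7) + b"\x00"   # NOP, option, EOL
print(find_concast_group(opts), find_concast_group(b"\x01\x01\x01\x00"))
```

   A datagram whose options field yields None would be handled as an
   ordinary unicast datagram.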

2.2 Concast Flow Lifecycle

   The normal sequence of events for establishing and using a concast
   flow is as follows:

     1. The receiver creates the flow by supplying its local IP concast
     module with the concast group ID G, the preferred receiver address
     R, and the Merge Specification.

     2. Nodes wishing to join the group and participate as senders do so
     by supplying (R,G) to their local IP concast module. This invokes
     the signaling protocol, which causes flow state (including the
     Merge Specification) to be established along the paths from the
     senders to the receiver.

     3. Senders transmit packets as usual.  Each sender's IP concast
     implementation ensures that each sent concast datagram carries a
     concast group ID option with value G.

     4. As concast datagrams travel hop-by-hop toward the receiver, at
     each concast-capable node (including --at least-- the receiving
     host) they are diverted for concast processing.  This involves
     retrieving the state for the (R,G) flow and carrying out the
     computation defined by the merge specification for that flow.
     Packets are forwarded (toward R) after processing only under the
     conditions defined by the merge specification.

     5. When the receiving application receives data from the flow via
     its network API, concast messages appear to have been sent by the
     concast group; they contain the result of merging the messages sent
     by group members.

     6. Senders may leave the concast group at any time.  When a sender
     leaves the group, the signaling protocol is invoked to inform the
     sender's downstream neighbor that one of its upstream neighbors is
     going away.  When a node has no remaining upstream neighbors, it
     recursively informs its downstream neighbor that it is leaving the
     flow. In this way, the concast "tree" grows and shrinks as senders
     join and leave the group.

     7. The receiving application may tear down the flow at any time.
     The signaling protocol then notifies all upstream neighbors that
     the flow has gone away.  Those nodes inform their neighbors, and so
     on, until all state for the flow has been removed from the system.
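
   The growth and shrinkage of the concast tree in steps 2 and 6 can be
   modeled as in the Python sketch below.  It is an in-memory model
   only: the class and method names are hypothetical, and the real
   signaling messages and soft-state refreshes belong to CSP [1].

```python
class Node:
    """A concast-capable node on one flow; `downstream` is the next hop toward R."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.upstream = set()       # names of upstream neighbors
        self.is_sender = False

    def join(self):
        """A local application joins the flow as a sender."""
        self.is_sender = True
        self._attach()

    def _attach(self):
        d = self.downstream
        if d is not None and self.name not in d.upstream:
            d.upstream.add(self.name)
            d._attach()             # make sure d is itself on the tree

    def leave(self):
        """The local sender leaves the flow."""
        self.is_sender = False
        self._detach_if_idle()

    def _detach_if_idle(self):
        d = self.downstream
        if d is None or self.is_sender or self.upstream:
            return                  # receiver, or still needed on the tree
        d.upstream.discard(self.name)
        d._detach_if_idle()         # prune recursively toward the receiver

receiver = Node("R")
router = Node("router", downstream=receiver)
s1 = Node("s1", downstream=router)
s2 = Node("s2", downstream=router)
s1.join()
s2.join()
s1.leave()
print(router.upstream, receiver.upstream)   # router still serves s2
s2.leave()
print(router.upstream, receiver.upstream)   # tree fully pruned
```

   The router remains on the tree while at least one upstream neighbor
   is left and removes itself only when the last one has gone.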

   The Concast Signaling Protocol, including the soft-state techniques
   that are used to detect route changes and connectivity problems, is
   defined elsewhere [1].


3. Merge Semantics


   The nature of the merge function determines the utility of the
   concast service for any particular application; different
   applications in general need different semantics.  We therefore fix
   the general form of the merge computation by defining certain steps
   to be common to all merge functions, and by defining the "shape" of
   the variable parts.

   The merge semantics tells which datagrams within a flow are to be
   merged together, and defines the computation that takes as input the
   (possibly many) messages sent, and produces as output the message
   ultimately delivered to the receiver.

   In the context of a particular concast flow, a Datagram Equivalence
   Class (DEC) is defined as a set of concast datagrams to be merged
   together.  A flow may have datagrams belonging to multiple DECs in
   the network at the same time.  For example, in the inverse multicast
   (duplicate suppression) service mentioned in the first section, two
   packets could be in the same DEC if they have the same destination
   IP address, concast group ID, and IP payload.

   Because packets are processed one at a time as they arrive, each
   concast node maintains a "merge state block" (MSB) for each active
   DEC of a flow.  To limit the amount of per-flow state, the size and
   number of active MSBs will in general be limited by the concast
   implementation.  When a merge computation is "finished", a concast
   datagram is constructed using information in the MSB, and forwarded
   toward the destination R.

   Based on the foregoing, we can outline the steps in "merge"
   processing, given the merge specification for flow (R,G):

     1. Determine the datagram equivalence class to which the datagram
     belongs.

     2. Retrieve the Merge State Block for that DEC.

     3. Update the contents of the MSB using the old contents and the
     datagram according to the merge specification.

     4. If the computation is finished according to the merge
     specification, construct and forward an IP datagram with
     destination address equal to R, source address equal to the IP
     address of the interface that leads toward R (or the concast group
     ID if this node is R), the concast group ID option with value G,
     and payload constructed from the MSB according to the merge spec.

3.1 Variable Components of the Merge Specification

   The semantics of merge are defined by giving definitions for certain
   types and methods.  In defining those methods, the following types
   are considered to be given:

     The type DECTag, of tags that identify Datagram Equivalence
     Classes.

   A Merge Specification consists of precise definitions of the
   following types and functions:

     The type MergeStateBlock, which defines the state information to be
     stored for in-progress merges.  The maximum size of a
     MergeStateBlock must be fixed at the time of the definition.

     The function getTag(), which takes a concast datagram as input and
     returns a DECTag.  This function determines the Datagram
     Equivalence Class to which a given packet belongs.  Typically this
     function will extract a value from a particular location or
     locations in the datagram (header and/or payload).  It might also
     compute a digest of the datagram's payload.

     The function merge(), which takes a MergeStateBlock, a concast
     datagram, and per-flow information and returns an updated
     MergeStateBlock.  This function does the real work of merging,
     combining information from an incoming datagram with information
     derived from previously processed datagrams.

     The predicate done() on MergeStateBlocks, which returns "true" when
     a datagram needs to be constructed and forwarded to R.

     The function buildDatagram(), which takes a MergeStateBlock and
     returns the (possibly updated) MergeStateBlock together with an
     8-bit IP protocol number and a concast datagram payload.
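
   To make the shape of these definitions concrete, the Python sketch
   below gives one possible Merge Specification that computes the
   maximum of one value per upstream neighbor.  It is illustrative
   only: the datagram is modeled as a dictionary with hypothetical
   fields "src", "round" and "value", and, following the pseudocode of
   Section 3.2, build_datagram also returns the state.

```python
from dataclasses import dataclass

@dataclass
class MaxState:                     # plays the role of MergeStateBlock
    round: int
    best: int
    waiting: frozenset              # upstream neighbors not yet heard from

class MaxSpec:
    """Hypothetical Merge Specification: maximum of one value per neighbor."""

    def get_tag(self, dgram):
        # The DEC is the set of datagrams carrying the same round number.
        return dgram["round"]

    def merge(self, state, dgram, upstream):
        if state is None:           # first datagram of this DEC
            state = MaxState(dgram["round"], dgram["value"], frozenset(upstream))
        return MaxState(state.round,
                        max(state.best, dgram["value"]),
                        state.waiting - {dgram["src"]})

    def done(self, state):
        # Finished once every upstream neighbor has contributed.
        return not state.waiting

    def build_datagram(self, state):
        payload = {"round": state.round, "value": state.best}
        return (state, 17, payload)  # 17 is the UDP protocol number

spec = MaxSpec()
upstream = {"10.0.0.1", "10.0.0.2"}   # per-flow information (example addresses)
state = None
for d in ({"src": "10.0.0.1", "round": 1, "value": 5},
          {"src": "10.0.0.2", "round": 1, "value": 9}):
    state = spec.merge(state, d, upstream)
print(spec.done(state), spec.build_datagram(state)[1:])
```

   The size of MaxState is bounded by the number of upstream neighbors,
   which is in keeping with the requirement that the maximum size of a
   MergeStateBlock be fixed.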

   The method of specification of the above functions is beyond the
   scope of this memo.

3.2 Generic Portion of the Merge Specification

   The fixed component of the merge semantics defines that portion of
   the merge computation that is the same on every node, for every merge
   function.  It is specified by the pseudocode below.

          ProcessDatagram(IPAddr R, ConcastGroupID G, ConcastDatagram m)
          // Generic concast merge processing
          {
           FlowStateBlock fsb;          // flow state for flow (R,G)
           DECTag t;                    // tag for m's DEC
           MergeStateBlock s;

           fsb = lookUpFlow(R,G);        // get relevant flow state
           if (fsb != NULL) {
              t = fsb.getTag(m);         // get the DEC
              s = fsb.findMergeState(t); // get state of in-progress merge
              s = fsb.merge(s,m,fsb);    // merge computation
              if (fsb.done(s)) {         // time to send something on?
                ProtocolNumber p;        // ...of the forwarded datagram
                IPPayload merged;        // partial merge result
                (s,p,merged) = fsb.buildDatagram(s);
                forwardConcastDG(R,G,p,merged); // on toward R
              }
              fsb.saveMergeState(s,t);   // replace old state
           }
          }
   The data type "FlowStateBlock" encapsulates flow-specific information
   that might be useful to the merge computation, for example the list
   of upstream neighbors of the current node in the concast tree.  The
   semantics of the methods "lookUpFlow" and "findMergeState" should be
   clear.

   The method "forwardConcastDG" constructs an IP datagram with a given
   destination address, a source address equal to that of the interface
   that leads toward the given destination, the given Concast
   Group ID option value, a given protocol number in the protocol field,
   and a given payload, and then forwards that datagram toward the given
   destination.
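
   For readers who want to experiment, the generic processing can be
   transcribed into runnable form roughly as follows.  This Python
   sketch is not normative: the toy merge specification (forward the
   sum of every three values), the dictionary datagram format and the
   list that stands in for forwardConcastDG are all assumptions made
   for illustration.

```python
class SumOfThree:
    """Toy merge specification: forward the sum of every three values."""

    def get_tag(self, m):
        return m["round"]

    def merge(self, s, m, fsb):
        if s is None:
            s = {"round": m["round"], "sum": 0, "n": 0}
        return {"round": s["round"], "sum": s["sum"] + m["value"], "n": s["n"] + 1}

    def done(self, s):
        return s["n"] == 3

    def build_datagram(self, s):
        payload = {"round": s["round"], "value": s["sum"]}
        fresh = {"round": s["round"], "sum": 0, "n": 0}   # start a new merge
        return (fresh, 17, payload)

class FlowStateBlock:
    def __init__(self, spec):
        self.spec = spec
        self.merge_states = {}      # DEC tag -> merge state

flows = {}                          # (R, G) -> FlowStateBlock
forwarded = []                      # stands in for forwardConcastDG

def process_datagram(R, G, m):
    fsb = flows.get((R, G))                     # lookUpFlow
    if fsb is None:
        return                                  # no state for this flow
    t = fsb.spec.get_tag(m)                     # get the DEC
    s = fsb.merge_states.get(t)                 # findMergeState
    s = fsb.spec.merge(s, m, fsb)               # merge computation
    if fsb.spec.done(s):
        s, p, merged = fsb.spec.build_datagram(s)
        forwarded.append((R, G, p, merged))     # on toward R
    fsb.merge_states[t] = s                     # saveMergeState

flows[("192.0.2.1", 7)] = FlowStateBlock(SumOfThree())
for v in (1, 2, 3, 10):
    process_datagram("192.0.2.1", 7, {"round": 0, "value": v})
print(forwarded)
```

   After four arrivals exactly one datagram, carrying the sum 6, has
   been forwarded; the fourth value is held in the merge state for the
   next merge.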

   The semantics of "saveMergeState" require further discussion.  The
   intent is that an association (t,s) is created between the given tag
   and the given state, replacing previous association (t,s') for this
   flow.  In order to bound the amount of per-flow state kept at a node,
   a limit (MAX_ACTIVE_DECS) is placed on the number of distinct DECTag
   values that may have MSBs bound to them at any instant.  When that
   limit is reached, calls to saveMergeState with a new DECTag value
   will result in some (tag, state) pair being evicted from the state
   store.  Applications can avoid this by ensuring that concast
   datagrams belonging to at most MAX_ACTIVE_DECS datagram equivalence
   classes are in the network at any time.  The value of MAX_ACTIVE_DECS
   should be globally defined and published, so that applications can
   limit their sending rate to observe this limitation.
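
   A bounded state store with these semantics might look like the
   Python sketch below.  Two points are assumptions rather than part of
   this memo: the value chosen for MAX_ACTIVE_DECS, and the policy of
   evicting the oldest binding (the text only requires that some pair
   be evicted).

```python
from collections import OrderedDict

MAX_ACTIVE_DECS = 2     # illustrative value only; this memo does not fix one

class MergeStateStore:
    """Per-flow (tag -> state) store holding at most `limit` bindings."""

    def __init__(self, limit=MAX_ACTIVE_DECS):
        self.limit = limit
        self._states = OrderedDict()

    def find_merge_state(self, tag):
        return self._states.get(tag)            # None if no merge in progress

    def save_merge_state(self, state, tag):
        """Bind tag to state; return the evicted (tag, state) pair, if any."""
        evicted = None
        if tag not in self._states and len(self._states) >= self.limit:
            # Assumed policy: evict the binding that was created first.
            evicted = self._states.popitem(last=False)
        self._states[tag] = state
        return evicted

store = MergeStateStore()
store.save_merge_state("s1", "tag1")
store.save_merge_state("s2", "tag2")
store.save_merge_state("s1'", "tag1")           # replaces (tag1, s1); no eviction
print(store.save_merge_state("s3", "tag3"))     # a third tag forces an eviction
```

   Returning the evicted pair lets the caller decide whether to discard
   the partial merge or to forward whatever has been accumulated.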

3.3 Example Merge Function

   To illustrate the use of the Merge Specification framework, we
   present a definition of the "inverse multicast" (duplicate
   suppression) service mentioned in the Introduction.  To implement
   this service, the network must "remember" each datagram that is
   delivered to the receiver, and suppress subsequent copies without
   forwarding them.  For the purposes of this service, two datagrams
   belonging to the same flow are considered "identical" if they have
   the same payload.  In other words, datagrams with the same payload
   belong to the same DEC.

   In principle, the only state needed for a DEC is the fact that a
   datagram in that DEC has already been forwarded by a node.  However,
   because of the way the generic part of the computation is structured,
   on the first arrival of a datagram from a DEC, the merge state must
   record the fact that no merge state was found for that DEC
   originally.  Therefore we define:
       typedef MergeStateBlock {
          boolean forwarded;
          ConcastDatagram pendingDG;
       }
   The DECTag value is computed by taking the MD5 hash [3] of the
   payload of the given datagram:
       DECTag getTag(ConcastDatagram m)
       {
         return (DECTag) MD5hash(m.IPpayload);
       }
   The merge() function simply records the datagram for forwarding.
       MergeStateBlock merge(MergeStateBlock s, ConcastDatagram m, FSB f)
       {
         if (s==NULL) {
           create a new MergeStateBlock newState;
           newState.forwarded := false;
           newState.pendingDG := m;
           return newState;
         } else
           return s;
       }
   The done() function simply checks whether the packet has already been
   forwarded:
       boolean done(MergeStateBlock s)
       {
         return NOT(s.forwarded);
       }
   The buildDatagram() function updates the state to indicate that the
   datagram has been forwarded, and returns the updated state along with
   the protocol number and payload of the datagram:
       (MergeStateBlock,ProtocolNumber,IPPayload)
        buildDatagram(MergeStateBlock s)
        {
          s.forwarded := true;
          return (s,s.pendingDG.IPprotocol,s.pendingDG.IPpayload);
        }
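
   The pseudocode above can be exercised with the following Python
   transcription.  It is a sketch under simplifying assumptions:
   datagrams are dictionaries with hypothetical "protocol" and
   "payload" fields, a list stands in for delivery, and merge state is
   never expired or evicted.

```python
import hashlib

def get_tag(m):
    """DEC tag: MD5 digest of the payload."""
    return hashlib.md5(m["payload"]).digest()

def merge(s, m, fsb=None):
    """Record the first datagram of a DEC; later copies leave the state alone."""
    if s is None:
        return {"forwarded": False, "pending": m}
    return s

def done(s):
    return not s["forwarded"]

def build_datagram(s):
    s = dict(s, forwarded=True)
    return s, s["pending"]["protocol"], s["pending"]["payload"]

merge_states = {}       # DEC tag -> merge state
delivered = []          # stands in for forwarding toward R

def process(m):
    t = get_tag(m)
    s = merge(merge_states.get(t), m)
    if done(s):
        s, p, payload = build_datagram(s)
        delivered.append((p, payload))
    merge_states[t] = s

for payload in (b"report-1", b"report-1", b"report-2", b"report-1"):
    process({"protocol": 17, "payload": payload})
print(delivered)
```

   Of the four datagrams sent, only the first copy of each distinct
   payload is delivered.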

3.4  Discussion

   The definition of particular merge functions using code like that of
   Section 3.3 does not imply that the actual processing should be
   accomplished in software.  For the example above, the computation can
   be very efficiently implemented in hardware, and it is expected that
   routers supporting this merge function would do so.

   However, for flexibility it may also be useful to support merge
   functions given in user-supplied software coded in some restricted-
   but-high-level language, for example a limited subset of Java.
   Obviously some constructs (e.g. recursion, dynamic storage
   allocation, unbounded iteration) should be prohibited in such code.
   On the other hand, certain functionality may be needed by such code.

   For example, in some cases it is useful to initiate merge processing
   via the passage of time, rather than datagram arrival.  This
   capability can be provided by providing a method by which user-
   supplied code can arrange for the last portion of the merge
   processing (beginning with the done() test) to be executed a
   specified amount of time in the future.  To limit overhead, each flow
   is permitted at most one pending timeout-callback at any time.
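
   One way to enforce the one-pending-timeout rule is sketched below in
   Python.  The scheduler class, its method names and the simulated
   clock are hypothetical; on_timeout stands for the tail of the
   generic merge processing, beginning with the done() test.

```python
import heapq

class TimeoutScheduler:
    """Simulated-clock scheduler allowing one pending timeout per flow."""

    def __init__(self):
        self.now = 0.0
        self._queue = []            # heap of (deadline, seq, flow, tag)
        self._pending = set()       # flows with a timeout outstanding
        self._seq = 0               # tie-breaker for equal deadlines

    def request_timeout(self, flow, tag, delay):
        """Schedule a callback; refuse if the flow already has one pending."""
        if flow in self._pending:
            return False
        self._pending.add(flow)
        heapq.heappush(self._queue, (self.now + delay, self._seq, flow, tag))
        self._seq += 1
        return True

    def advance(self, t, on_timeout):
        """Move the clock to t, firing every timeout that has come due."""
        while self._queue and self._queue[0][0] <= t:
            deadline, _, flow, tag = heapq.heappop(self._queue)
            self.now = deadline
            self._pending.discard(flow)
            on_timeout(flow, tag)
        self.now = t

fired = []
sched = TimeoutScheduler()
flow = ("192.0.2.1", 7)
first = sched.request_timeout(flow, "tag1", delay=0.5)    # accepted
second = sched.request_timeout(flow, "tag2", delay=0.1)   # refused: one pending
sched.advance(1.0, lambda f, t: fired.append((f, t)))
print(first, second, fired)
```

   Once the pending callback has fired, the flow may request another
   timeout.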

3.5 Fragmentation

   The notion of a datagram equivalence class is well-defined only for
   complete (unfragmented) IP datagrams. Therefore it is necessary for
   applications using concast to send datagrams that will not be
   fragmented in the network.  This can be achieved either by performing
   path MTU discovery for the path between each sender and the receiver,
   or by sending datagrams no larger than the smallest MTU that any IP
   link is required to support.
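
   As a worked example, the largest payload that avoids fragmentation
   for a known path MTU can be computed as in the Python sketch below.
   The 6-octet size of the Concast ID option is an assumption (one
   octet each for type and length plus the 32-bit group ID), and the
   MTU values are examples only.

```python
IPV4_BASE_HEADER = 20      # octets, without options
CONCAST_OPTION = 6         # assumed option size: type + length + 32-bit ID

def max_concast_payload(path_mtu):
    """Largest payload that fits in one unfragmented concast datagram."""
    # The IPv4 header length is a multiple of 4 octets, so pad the options.
    options = (CONCAST_OPTION + 3) // 4 * 4
    return path_mtu - IPV4_BASE_HEADER - options

print(max_concast_payload(1500), max_concast_payload(576))
```

   Under these assumptions the header occupies 28 octets, leaving 1472
   octets of payload for a 1500-octet path MTU and 548 octets for a
   576-octet one.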


4. Levels of Support


   A node participating in a concast flow MUST implement some part of
   the Concast Signaling Protocol (CSP) [1].  The parts of CSP that
   must be implemented depend on the level of concast support provided
   by the node.  Some nodes may only be able to originate concast
   datagrams and thus do not need to implement the receiving or merging
   components of CSP.  Other nodes may only be able to receive concast
   messages.  Some nodes (in fact we expect most end systems) will
   support both sending and receiving.  Internal network nodes need
   only support the merge processing described earlier.  Legacy nodes
   that do not support concast at all simply need to forward concast
   packets as if they were unicast.

4.1 Sending Host Processing

   In order for a host to participate as a sender in a concast group, it
   needs to support the portion of CSP that signals the node's intent to
   join (or leave) as a sender.  Once CSP has established the necessary
   state information to link the sender into the concast flow, the
   sender can begin transmitting concast datagrams.  Specifically,
   senders must mark outgoing packets as concast packets requiring
   hop-by-hop processing.  This is achieved simply by inserting a
   "Concast ID" option in the IP header containing the group ID G of the
   concast flow to which the packet belongs.  The packet is then routed
   and transmitted using the standard IP mechanism.

4.2 Receiving Host Processing

   Applications join the group G as a receiver by identifying (via a
   system call) the concast flow to be joined (R,G) and providing the
   Merge Specification for the flow.  Concast receivers must support the
   parts of CSP that respond to requests for Merge Specifications and
   Join requests.  Once CSP establishes the flow and distributes the
   Merge Specification, concast datagrams will begin arriving at the
   receiver.  The receiver's IP module must recognize the "Concast ID"
   option and divert the incoming packet for merge processing as
   specified in Section 3.  The only difference between merge processing
   at internal network nodes and merge processing at the receiver is
   that the forwardConcastDG function must deliver the resulting packet
   to a local application rather than applying normal IP routing and
   sending it over the wire.

4.3 Per-flow State Considerations

   Because sender-only nodes simply mark outgoing packets as concast
   packets, the state information maintained at such nodes will be
   minimal.  However, because merge state accumulates at internal
   concast-capable nodes and at concast receivers, state size could
   potentially grow without bound.  Consequently we rely on the
   fsb.saveMergeState function to limit the amount of state information
   any particular flow can consume.  The language used to construct the
   Merge Specification may also impose limits on the amount of state
   information that can be saved.  Assuming the merge state is bounded,
   the state needed to maintain information about flows is similar to
   the state used by shortest-path multicast routing protocols.


5. Security Considerations


   In the absence of strong authentication applied to each packet at
   each concast-capable node, packets can be inserted into a concast
   flow by nodes that have not joined the flow.  This may corrupt the
   information delivered to the receiver. However, the same threat
   exists when unicast is used to deliver the information.

   It is straightforward to add an authentication check to the generic
   merge processing of Section 3.2.  Moreover, the signaling phase
   provides an opportunity for establishment of the necessary security
   associations between neighbors in the concast tree (indeed this is
   one motivation for requiring signaling by all parties).  However, the
   security provided is necessarily hop-by-hop.
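
   This memo does not specify an authentication mechanism.  Purely as
   an illustration, the Python sketch below checks a per-neighbor
   HMAC-SHA256 tag before a datagram is admitted to merge processing.
   The shared keys (assumed to be established during signaling), the
   "auth" field and the dictionary datagram format are hypothetical.

```python
import hashlib
import hmac

def sign(key, payload):
    """Compute the authentication tag an upstream neighbor would attach."""
    return hmac.new(key, payload, hashlib.sha256).digest()

def authenticate(m, neighbor_keys):
    """Return True if m carries a valid tag from a known upstream neighbor."""
    key = neighbor_keys.get(m["src"])
    if key is None:
        return False                      # sender is not an upstream neighbor
    return hmac.compare_digest(sign(key, m["payload"]), m["auth"])

# Hypothetical key shared with one upstream neighbor.
neighbor_keys = {"198.51.100.9": b"key-established-during-signaling"}

good = {"src": "198.51.100.9", "payload": b"value=3"}
good["auth"] = sign(neighbor_keys["198.51.100.9"], good["payload"])
forged = dict(good, payload=b"value=999")     # payload altered in transit
outsider = dict(good, src="203.0.113.5")      # node that never joined

results = [authenticate(m, neighbor_keys) for m in (good, forged, outsider)]
print(results)
```

   Because each node verifies only its immediate upstream neighbors and
   then re-signs what it forwards, the protection is hop-by-hop: the
   receiver must still trust every merging node on the tree.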

   [Discussion of extension of concast tree to be added.]

   In some earlier descriptions of concast, concast datagrams were
   identified by the presence of a group address (e.g. a Class E or
   multicast address) in the Source field of the IP header.  While this
   approach is nicely symmetric with multicast (no individual address is
   associated with the source of the datagram), it conflicts with anti-
   spoofing source checks applied by some nodes in the current Internet.
   Unfortunately, modification of such checks to allow packets with
   concast source addresses to pass opens up the possibility of
   untraceable denial-of-service attacks on concast-capable hosts that
   reside in domains with no concast-capable routers, and therefore the
   method of marking concast datagrams was changed.


6. Acknowledgements


   The authors acknowledge the contributions of Amit Sehgal, Billy
   Mullins, and Su Wen to this work.

   The support of the Defense Advanced Research Projects Agency (DARPA)
   and Air Force Research Laboratory, Air Force Materiel Command, USAF,
   under agreement number F30602-99-1-0514, is gratefully acknowledged.
   The views and conclusions contained herein are those of the authors
   and should not be interpreted as necessarily representing the
   official policies or endorsements, either expressed or implied, of
   the Defense Advanced Research Projects Agency (DARPA), the Air Force
   Research Laboratory, or the U.S. Government.


References


   [1] Calvert, K. L. and Griffioen, J. N., "Concast Signaling
       Protocol", Internet-Draft, in preparation.

   [2] Deering, S., "Host Requirements for IP Multicasting", RFC 1112,
       August 1989.

   [3] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April
       1992.


Authors' Address:

   Kenneth L. Calvert  (calvert@dcs.uky.edu)
   James N. Griffioen (griff@dcs.uky.edu)
   Computer Science Department
   University of Kentucky
   773 Anderson Hall
   Lexington, KY 40506-0046

