Internet-Draft                                 K. Calvert, J. Griffioen
                                                 University of Kentucky
Expires January 2001                                          July 2000


                        Internet Concast Service
                    draft-calvert-concast-svc-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

The distribution of this memo is unlimited. It is filed as and expires January 16, 2001. Please send comments to the authors.

Abstract

Concast is a many-to-one best-effort network service that allows a receiver to treat a group of senders as a single entity, in much the same way that IP multicast allows a sender to treat a group of receivers as one. Each concast datagram delivered to a receiver is derived from (possibly many) datagrams sent by different members of the concast group to that receiver, according to a "merge specification". Concast allows the semantics of this merging operation to vary to suit the needs of different applications.

This document describes the concast service and presents a framework for defining merge semantics. It also describes the processing of concast datagrams by concast-capable routers and hosts. The concast signaling protocol (CSP), an integral part of the concast service, is specified in a separate document.
Calvert et al [Page 1]

INTERNET-DRAFT Internet Concast Service July 2000

Concast is incrementally deployable and backward compatible with IPv4 and IPv6. It can be implemented entirely in end systems, but is most scalable when supported by routers in the network.

1. Introduction

Multicast has been an Internet service for many years now [2]. Its semantics are simple: when a host sends a packet to a multicast address, the network makes its best effort to deliver a copy to all hosts in the group. The network keeps track of receiving hosts' locations, and duplicates datagrams as needed while forwarding them toward all receivers.

The power of multicast is in its abstraction mechanism, which enables a sender to treat an arbitrary number of receivers as a single entity. Concast is intended to provide a similar abstraction in the reverse direction: it enables a receiver to treat an arbitrary number of senders as a single entity. When multiple senders transmit concast datagrams to the same receiver, the network makes its best effort to "merge" them into a single message for delivery to that receiver.

The utility of such a service depends on the semantics of the merging operation performed by the network layer. It seems unlikely that any single (necessarily application-independent) semantics would be broadly applicable enough to justify implementing the concast service. Therefore concast is designed to allow for a broad range of merge semantics, all fitting within a common framework. The following examples illustrate a range of possible merge semantics:

o Inverse multicast/duplicate suppression: at most one copy of any datagram is delivered to the receiver within a particular window of time.

o Voting: each datagram contains a value chosen by its sender. When some threshold number of datagrams has been sent, a single datagram containing the value that occurred most often in the sent datagrams is delivered.
o Applying an associative and commutative operator: each datagram contains a value. The maximum (minimum, sum, product, conjunction, disjunction, bitwise conjunction, bitwise disjunction) of the values in all sent datagrams is placed in the datagram delivered to the receiver.

It is envisioned that certain simple merge functions like these will be "hardwired" into the network. The merge framework defined later allows new merge semantics to be specified simply by defining certain functions that make up the framework. For maximum flexibility, receivers could supply such definitions using an encoding interpreted by all concast-capable nodes. The nature of this encoding determines the power of the computations permitted for merge specifications.

While concast can be used alone, it is especially useful in conjunction with multicast. Many multicast applications require some form of feedback from the receiver set. For such applications, implosion at the multicast source or at internal network nodes is a real problem as group sizes grow large, because (in the absence of concast) the only way to convey feedback is via unicast datagrams. This fundamentally breaks the multicast abstraction by forcing the sender to deal with individuals instead of the group as a whole. Moreover, in most cases the feedback recipient is not interested in the individuals' information, but rather in some function of the group's information, for example the maximum or minimum. Support for the concast abstraction allows such "summary information" to be provided in a scalable way, by computing it at strategic points along the way.

Support for concast requires modifications to the Internet Protocol modules of hosts and routers. It does not, however, require any additional routing or forwarding capabilities beyond those required for unicast. In particular, concast does not depend on multicast in any way.
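Computing summary information inside the network is only safe when the answer does not depend on where partial merges happen or in what order datagrams arrive, which is why the associative and commutative operators listed above are natural candidates. The following Python sketch is illustrative only: it models a concast tree as nested lists (a hypothetical representation, not part of this specification) and checks that merging at every intermediate node yields the same result as merging all senders' values at the receiver.

```python
from functools import reduce
from typing import Callable, List, Union

# Hypothetical model of a concast tree: an int is the value carried by one
# sender's datagram; a list is a concast-capable node whose entries are the
# subtrees (upstream neighbors) feeding it.
Tree = Union[int, List["Tree"]]
Op = Callable[[int, int], int]


def merge_in_network(tree: Tree, op: Op) -> int:
    """Each node forwards one datagram: op applied to its upstream results."""
    if isinstance(tree, int):
        return tree
    return reduce(op, (merge_in_network(child, op) for child in tree))


def merge_at_receiver(tree: Tree, op: Op) -> int:
    """The same summary computed by the receiver from every sender's value."""
    def leaves(t: Tree) -> List[int]:
        return [t] if isinstance(t, int) else [v for c in t for v in leaves(c)]
    return reduce(op, leaves(tree))


# Feedback values from seven group members, merged hop by hop on the way
# to the feedback recipient (the tree shape is arbitrary).
tree: Tree = [[3, 9, 4], [7, [1, 12]], 5]
add: Op = lambda a, b: a + b
for op in (max, min, add):
    assert merge_in_network(tree, op) == merge_at_receiver(tree, op)
print(merge_in_network(tree, max), merge_in_network(tree, add))  # 12 41
```

With a non-associative operator such as subtraction the two functions generally disagree, and the delivered result would depend on the shape of the concast tree.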
Concast service can be provided on an end-system-only basis, though router support is necessary for scalability (in terms of the group size supportable without implosion). Partial deployment among routers is beneficial; indeed, most of the scalability and implosion-prevention benefits are likely to be attainable by deploying concast at select routers at domain boundaries.

This document describes extensions to Version 4 of the Internet Protocol. Similar extensions can be defined for IPv6.

The next section provides an overview of the service and its use. Section 3 defines the semantic framework for merging datagrams, and gives an example of its use. Section 4 describes the processing of concast datagrams by the IP implementations of concast-capable nodes, in terms of the semantic framework. Security considerations are discussed in Section 5.

2. Service Overview

The unit of concast service is the "flow". Concast flows are unidirectional: data travels only from the senders to the (single) receiver. Each concast flow is identified by a pair (R,G), where R is the (unicast) IP address of the receiver and G is a concast group identifier. Concast group IDs are 32-bit numbers chosen by the receiver. Note that different receiving applications on the same host need to use different group IDs so their flows can be distinguished.

Each concast flow has an associated Merge Specification, which is chosen by the receiver and specified at flow creation time. The Merge Specification defines the relationship between datagrams delivered to the receiver application and those transmitted by the senders. Thus to use concast, receiver and senders must agree (through some out-of-band means) on two things: the concast group ID and the Merge Specification. Senders must transmit datagrams containing information in the format expected by the Merge Specification.
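As a rough illustration of flow identification, the sketch below keys per-node flow state on the pair (R,G) and rejects group IDs that do not fit in 32 bits. The names FlowKey, FlowState, and FlowTable, and the use of a string to name the Merge Specification, are hypothetical; this memo does not define a data structure or API for flow state.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address
from typing import Dict, Optional, Set

MAX_GROUP_ID = 2**32 - 1  # concast group IDs are 32-bit numbers


@dataclass(frozen=True)
class FlowKey:
    """Identifies a concast flow (R, G)."""
    receiver: IPv4Address  # R: the receiver's unicast address
    group: int             # G: group ID chosen by the receiver

    def __post_init__(self) -> None:
        if not 0 <= self.group <= MAX_GROUP_ID:
            raise ValueError(f"group ID {self.group} does not fit in 32 bits")


@dataclass
class FlowState:
    merge_spec: str  # hypothetical: a name for the agreed Merge Specification
    upstream: Set[IPv4Address] = field(default_factory=set)


class FlowTable:
    """Hypothetical per-node table of concast flow state, keyed by (R, G)."""

    def __init__(self) -> None:
        self._flows: Dict[FlowKey, FlowState] = {}

    def create(self, key: FlowKey, merge_spec: str) -> FlowState:
        # Two receiving applications on one host must use different G values,
        # otherwise their flows would be indistinguishable.
        if key in self._flows:
            raise KeyError(f"flow {key} already exists")
        state = FlowState(merge_spec)
        self._flows[key] = state
        return state

    def lookup(self, key: FlowKey) -> Optional[FlowState]:
        return self._flows.get(key)


table = FlowTable()
r = IPv4Address("192.0.2.1")
table.create(FlowKey(r, 7), "max")
table.create(FlowKey(r, 8), "vote")  # a second application on the same host
assert table.lookup(FlowKey(r, 7)).merge_spec == "max"
```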
A concast-capable node N maintains state information for each concast flow (R,G) passing through it (i.e., for which N is on the path to R from some sender participating in the flow). Responsibility for establishing and maintaining this per-flow information belongs to the Concast Signaling Protocol (CSP), which is described in a separate document [1]. CSP uses soft-state techniques to keep the concast service robust in the face of route changes. The per-flow information includes the identities of all concast-capable nodes "upstream" of N on the flow, and state relevant to the ongoing merge processing of messages sent on the flow.

In contrast to multicast as it is currently specified and implemented in the Internet [2], concast requires both senders and receiver to signal the network before using the service. (Multicast only requires receivers to signal.) A benefit of this "uniform" signaling requirement is that it provides an opportunity for authentication and authorization checks on users of the service. This is likely to be important, since router support for concast requires the maintenance of per-flow state.

2.1 Concast Datagram Format

Concast datagrams are distinguished from ordinary IP datagrams simply by the presence of a "Concast ID" option in the IP header. The Concast ID option contains the group number G, and allows concast-capable routers to recognize concast datagrams as such and divert them for processing. (Concast-oblivious routers do not recognize the Concast ID option, and simply forward concast datagrams as if they were regular unicast datagrams.)

The source address field of a concast datagram's IP header contains the IP address of the last concast-capable node to process the datagram. This enables a node processing an incoming datagram to check that the datagram was forwarded by one of that node's known upstream neighbors.
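This memo states only that the Concast ID option carries G; it does not assign an option type value or fix a layout. The Python sketch below assumes a 6-octet option in the usual IPv4 type-length-value form, with a placeholder type value (0x9E) chosen purely for illustration, and shows how a node might recognize the option and perform the upstream-neighbor check on the source address.

```python
import struct
from ipaddress import IPv4Address
from typing import Iterable, Optional

# Hypothetical values: the draft does not assign an option type or fix a layout.
CONCAST_ID_TYPE = 0x9E  # placeholder IPv4 option type octet
CONCAST_ID_LEN = 6      # type (1) + length (1) + group ID G (4)

EOOL, NOP = 0, 1        # standard IPv4 End of Option List and No Operation


def encode_concast_id(group: int) -> bytes:
    """Build a Concast ID option carrying the 32-bit group ID G."""
    return struct.pack("!BBI", CONCAST_ID_TYPE, CONCAST_ID_LEN, group)


def find_concast_id(options: bytes) -> Optional[int]:
    """Scan an IPv4 options field; return G if a Concast ID option is present."""
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == EOOL:
            break
        if opt_type == NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        if opt_type == CONCAST_ID_TYPE and length == CONCAST_ID_LEN:
            return struct.unpack_from("!I", options, i + 2)[0]
        i += length
    return None  # ordinary IP datagram: forward as usual


def from_known_upstream(src: IPv4Address, upstream: Iterable[IPv4Address]) -> bool:
    """Source field names the last concast hop (not a secure check)."""
    return src in set(upstream)


opts = bytes([NOP]) + encode_concast_id(0xC0FFEE) + bytes([EOOL])
assert find_concast_id(opts) == 0xC0FFEE
```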
(This check is of course not secure. See Section 5.)

2.2 Concast Flow Lifecycle

The normal sequence of events for establishing and using a concast flow is as follows:

1. The receiver creates the flow by supplying its local IP concast module with the concast group ID G, the preferred receiver address R, and the Merge Specification.

2. Nodes wishing to join the group and participate as senders do so by supplying (R,G) to their local IP concast module. This invokes the signaling protocol, which causes flow state (including the Merge Specification) to be established along the paths from the senders to the receiver.

3. Senders transmit packets as usual. Each sender's IP concast implementation ensures that each sent concast datagram carries a Concast ID option with value G.

4. As concast datagrams travel hop-by-hop toward the receiver, they are diverted for concast processing at each concast-capable node (including, at a minimum, the receiving host). This involves retrieving the state for the (R,G) flow and carrying out the computation defined by the merge specification for that flow. After processing, a datagram is forwarded toward R only under the conditions defined by the merge specification.

5. When the receiving application receives data from the flow via its network API, concast messages appear to have been sent by the concast group; they contain the result of merging the messages sent by group members.

6. Senders may leave the concast group at any time. When a sender leaves the group, the signaling protocol is invoked to inform the sender's downstream neighbor that one of its upstream neighbors is going away. When a node has no remaining upstream neighbors, it in turn informs its own downstream neighbor that it is leaving the flow, and so on recursively. In this way, the concast "tree" grows and shrinks as senders join and leave the group.

7. The receiving application may tear down the flow at any time.
   The signaling protocol then notifies all upstream neighbors that the flow has gone away. Those nodes inform their neighbors, and so on, until all state for the flow has been removed from the system.

The Concast Signaling Protocol, including the soft-state techniques that are used to detect route changes and connectivity problems, is defined elsewhere [1].

3. Merge Semantics

The nature of the merge function determines the utility of the concast service for any particular application; different applications in general need different semantics. We therefore constrain the general form of the merge computation by defining certain steps to be common to all merge functions, and by defining the "shape" of the variable parts.

The merge semantics specify which datagrams within a flow are to be merged together, and define the computation that takes as input the (possibly many) messages sent and produces as output the message ultimately delivered to the receiver. In the context of a particular concast flow, a Datagram Equivalence Class (DEC) is defined as a set of concast datagrams to be merged together. A flow may have datagrams belonging to multiple DECs in the network at the same time. For example, in the inverse multicast (duplicate suppression) service mentioned in the first section, two packets could be in the same DEC if they have the same destination IP address, concast group ID, and IP payload.

Because packets are processed one at a time as they arrive, each concast node maintains a "merge state block" (MSB) for each active DEC of a flow. To limit the amount of per-flow state, the size and number of active MSBs will in general be limited by the concast implementation. When a merge computation is "finished", a concast datagram is constructed using information in the MSB, and forwarded toward the destination R.
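To make the DEC and MSB concepts concrete, the following sketch uses the voting service from the Introduction under an assumed payload layout: a 4-byte round number followed by a 4-byte vote. This layout and the helper names are hypothetical. Taking the round number as the DEC tag lets datagrams from several rounds be in flight at once, each accumulating into its own MSB.

```python
import struct
from collections import Counter
from typing import Dict, List


def make_vote(rnd: int, value: int) -> bytes:
    """Hypothetical payload layout: 4-byte round number, 4-byte vote."""
    return struct.pack("!II", rnd, value)


def round_tag(payload: bytes) -> int:
    """DEC tag: the round number at the start of the payload."""
    return struct.unpack_from("!I", payload, 0)[0]


def group_into_msbs(payloads: List[bytes]) -> Dict[int, Counter]:
    """One merge state block per active DEC; here each MSB tallies votes."""
    msbs: Dict[int, Counter] = {}
    for p in payloads:
        tag = round_tag(p)
        value = struct.unpack_from("!I", p, 4)[0]
        msbs.setdefault(tag, Counter())[value] += 1
    return msbs


# Datagrams from rounds 1 and 2 of a vote are in the network at the same time.
arrivals = [make_vote(1, 5), make_vote(2, 3), make_vote(1, 5), make_vote(1, 9)]
msbs = group_into_msbs(arrivals)
assert sorted(msbs) == [1, 2]
assert msbs[1].most_common(1) == [(5, 2)]
```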
Based on the foregoing, we can outline the steps in "merge" processing, given the merge specification for flow (R,G):

1. Determine the datagram equivalence class to which the datagram belongs.

2. Retrieve the Merge State Block for that DEC.

3. Update the contents of the MSB using the old contents and the datagram, according to the merge specification.

4. If the computation is finished according to the merge specification, construct and forward an IP datagram with destination address equal to R, source address equal to the IP address of the interface that leads toward R (or the concast group ID if this node is R), the Concast ID option with value G, and payload constructed from the MSB according to the merge specification.

3.1 Variable Components of the Merge Specification

The semantics of merge are defined by giving definitions for certain types and methods. In defining those methods, the following types are considered to be given:

o The type DECTag, of tags that identify Datagram Equivalence Classes.

A Merge Specification consists of precise definitions of the following types and functions:

o The type MergeStateBlock, which defines the state information to be stored for in-progress merges. The maximum size of a MergeStateBlock must be fixed at the time of the definition.

o The function getTag(), which takes a concast datagram as input and returns a DECTag. This function determines the Datagram Equivalence Class to which a given packet belongs. Typically this function will extract a value from a particular location or locations in the datagram (header and/or payload). It might also compute a digest of the datagram's payload.

o The function merge(), which takes a MergeStateBlock, a concast datagram, and per-flow information and returns an updated MergeStateBlock.
   This function does the real work of merging, combining information from an incoming datagram with information derived from previously processed datagrams.

o The predicate done() on MergeStateBlocks, which returns "true" when a datagram needs to be constructed and forwarded to R.

o The function buildDatagram(), which takes a MergeStateBlock and returns a triple consisting of an updated MergeStateBlock, an 8-bit IP protocol number, and a concast datagram payload.

The method of specification of the above functions is beyond the scope of this memo.

3.2 Generic Portion of the Merge Specification

The fixed component of the merge semantics defines that portion of the merge computation that is the same on every node, for every merge function. It is specified by the pseudocode below.

   ProcessDatagram(IPAddr R, ConcastGroupID G, ConcastDatagram m)
   // Generic concast merge processing
   {
       FlowStateBlock fsb;    // flow state for flow (R,G)
       DECTag t;              // tag for m's DEC
       MergeStateBlock s;     // state of the in-progress merge

       fsb = lookUpFlow(R,G);              // get relevant flow state
       if (fsb != NULL) {
           t = fsb.getTag(m);              // get the DEC
           s = fsb.findMergeState(t);      // get state of in-progress merge
           s = fsb.merge(s,m,fsb);         // merge computation
           if (fsb.done(s)) {              // time to send something on?
               ProtocolNumber p;           // protocol of forwarded datagram
               IPPayload merged;           // partial merge result
               (s,p,merged) = fsb.buildDatagram(s);
               forwardConcastDG(R,G,p,merged);   // on toward R
           }
           fsb.saveMergeState(s,t);        // replace old state
       }
   }

The data type "FlowStateBlock" encapsulates flow-specific information that might be useful to the merge computation, for example the list of upstream neighbors of the current node in the concast tree. The method "lookUpFlow" returns the flow state for (R,G), or NULL if the node holds none; "findMergeState" returns the MSB currently associated with the given tag, or NULL if no merge is in progress for that DEC.
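The generic pseudocode above can be transcribed into Python roughly as follows. This is a sketch, not a reference implementation: the snake_case method names, the in-memory dictionaries standing in for lookUpFlow, findMergeState, and saveMergeState, and the list standing in for forwardConcastDG are assumptions. The example merge specification (MaxOfUpstream, which forwards the maximum value of each round once every upstream neighbor has contributed) is hypothetical; it shows why merge() receives per-flow information, since the expected number of contributions comes from the flow's upstream neighbor list. Protocol number 253, reserved for experimentation, is used only as a sample value.

```python
import struct
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, Tuple


@dataclass
class ConcastDatagram:
    protocol: int  # 8-bit IP protocol number
    payload: bytes


class MergeSpec(Protocol):
    """Variable components of Section 3.1 (Python names are assumptions)."""
    def get_tag(self, m: ConcastDatagram) -> Any: ...
    def merge(self, s: Optional[Any], m: ConcastDatagram, fsb: "FlowStateBlock") -> Any: ...
    def done(self, s: Any) -> bool: ...
    def build_datagram(self, s: Any) -> Tuple[Any, int, bytes]: ...


@dataclass
class FlowStateBlock:
    spec: MergeSpec
    upstream: List[str]                                  # upstream concast neighbors
    msbs: Dict[Any, Any] = field(default_factory=dict)   # tag -> MSB (unbounded here)
    forwarded: List[Tuple[int, bytes]] = field(default_factory=list)


def process_datagram(flows: Dict[Tuple[str, int], FlowStateBlock],
                     r: str, g: int, m: ConcastDatagram) -> None:
    fsb = flows.get((r, g))                    # lookUpFlow(R,G)
    if fsb is None:
        return                                 # the pseudocode does nothing more
    t = fsb.spec.get_tag(m)                    # get the DEC
    s = fsb.msbs.get(t)                        # findMergeState: None if new DEC
    s = fsb.spec.merge(s, m, fsb)              # merge computation
    if fsb.spec.done(s):                       # time to send something on?
        s, p, merged = fsb.spec.build_datagram(s)
        fsb.forwarded.append((p, merged))      # stands in for forwardConcastDG
    fsb.msbs[t] = s                            # saveMergeState


@dataclass
class MaxState:
    rnd: int
    count: int
    expected: int   # copied from per-flow state, since done() sees only the MSB
    best: int
    protocol: int


class MaxOfUpstream:
    """Hypothetical spec: payload = 4-byte round + 4-byte value."""

    def get_tag(self, m: ConcastDatagram) -> int:
        return struct.unpack_from("!I", m.payload, 0)[0]

    def merge(self, s: Optional[MaxState], m: ConcastDatagram,
              fsb: FlowStateBlock) -> MaxState:
        rnd, value = struct.unpack_from("!II", m.payload, 0)
        if s is None:
            return MaxState(rnd, 1, len(fsb.upstream), value, m.protocol)
        s.count += 1
        s.best = max(s.best, value)
        return s

    def done(self, s: MaxState) -> bool:
        return s.count == s.expected

    def build_datagram(self, s: MaxState) -> Tuple[MaxState, int, bytes]:
        return s, s.protocol, struct.pack("!II", s.rnd, s.best)


flows = {("192.0.2.1", 7): FlowStateBlock(MaxOfUpstream(), ["a", "b", "c"])}
for rnd, v in [(1, 4), (2, 3), (1, 11), (1, 6)]:
    process_datagram(flows, "192.0.2.1", 7,
                     ConcastDatagram(253, struct.pack("!II", rnd, v)))
# Round 1 is forwarded once all three neighbors have reported; round 2 is pending.
assert flows[("192.0.2.1", 7)].forwarded == [(253, struct.pack("!II", 1, 11))]
```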
The method "forwardConcastDG" constructs an IP datagram with the given destination address, a source address equal to that of the interface that leads toward the given destination, the given Concast ID option value, the given protocol number in the protocol field, and the given payload, and then forwards that datagram toward the given destination.

The semantics of "saveMergeState" require further discussion. The intent is that an association (t,s) is created between the given tag and the given state, replacing any previous association (t,s') for this flow. In order to bound the amount of per-flow state kept at a node, a limit (MAX_ACTIVE_DECS) is placed on the number of distinct DECTag values that may have MSBs bound to them at any instant. When that limit is reached, calls to saveMergeState with a new DECTag value will result in some (tag, state) pair being evicted from the state store. Applications can avoid this by ensuring that concast datagrams belonging to at most MAX_ACTIVE_DECS datagram equivalence classes are in the network at any time. The value of MAX_ACTIVE_DECS should be globally defined and published, so that applications can limit their sending rate to observe this limitation.

3.3 Example Merge Function

To illustrate the use of the Merge Specification framework, we present a definition of the "inverse multicast" (duplicate suppression) service mentioned in the Introduction. To implement this service, the network must "remember" each datagram that is delivered to the receiver, and suppress subsequent copies without forwarding them. For the purposes of this service, two datagrams belonging to the same flow are considered "identical" if they have the same payload. In other words, datagrams with the same payload belong to the same DEC.

In principle, the only state needed for a DEC is the fact that a datagram in that DEC has already been forwarded by a node.
However, because of the way the generic part of the computation is structured, on the first arrival of a datagram from a DEC the merge state must record the fact that no merge state was found for that DEC originally. Therefore we define:

   typedef struct {
       boolean         forwarded;   // has this DEC's datagram been sent on?
       ConcastDatagram pendingDG;   // the datagram to forward
   } MergeStateBlock;

The DECTag value is computed by taking the MD5 hash [3] of the payload of the given datagram:

   DECTag getTag(ConcastDatagram m)
   {
       return (DECTag) MD5hash(m.IPpayload);
   }

The merge() function simply records the datagram for forwarding:

   MergeStateBlock merge(MergeStateBlock s, ConcastDatagram m,
                         FlowStateBlock f)
   {
       if (s == NULL) {               // first datagram of this DEC
           MergeStateBlock newState;  // create a new merge state block
           newState.forwarded = false;
           newState.pendingDG = m;
           return newState;
       } else
           return s;                  // later copies change nothing
   }

The done() function simply checks whether the packet has already been forwarded:

   boolean done(MergeStateBlock s)
   {
       return NOT(s.forwarded);
   }

The buildDatagram() function updates the state to indicate that the datagram has been forwarded, and returns the updated state along with the protocol number and payload of the datagram:

   (MergeStateBlock, ProtocolNumber, IPPayload)
   buildDatagram(MergeStateBlock s)
   {
       s.forwarded = true;
       return (s, s.pendingDG.IPprotocol, s.pendingDG.IPpayload);
   }

3.4 Discussion

The definition of particular merge functions using code like that of Section 3.3 does not imply that the actual processing must be carried out in software. The computation in the example above can be implemented very efficiently in hardware, and routers supporting this merge function are expected to do so. However, for flexibility it may also be useful to support merge functions given as user-supplied software coded in some restricted but high-level language, for example a limited subset of Java. Constructs whose cost cannot be bounded in advance (e.g. recursion, dynamic storage allocation, unbounded iteration) should be prohibited in such code.
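For comparison with the pseudocode of Section 3.3, here is a runnable Python sketch of the duplicate-suppression specification, driven by the generic steps of Section 3.2. Each specification function is straight-line code with constant-size state, in the spirit of the restrictions described above. The dataclass names and the run_node() driver are illustrative assumptions; only the logic of getTag(), merge(), done(), and buildDatagram() follows Section 3.3.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ConcastDatagram:
    protocol: int  # IP protocol number
    payload: bytes


@dataclass
class DupState:                # the MergeStateBlock of Section 3.3
    forwarded: bool
    pending: ConcastDatagram


def get_tag(m: ConcastDatagram) -> bytes:
    return hashlib.md5(m.payload).digest()   # same payload => same DEC


def merge(s: Optional[DupState], m: ConcastDatagram) -> DupState:
    # Per-flow argument omitted: this specification never uses it.
    return DupState(False, m) if s is None else s


def done(s: DupState) -> bool:
    return not s.forwarded


def build_datagram(s: DupState) -> Tuple[DupState, int, bytes]:
    s.forwarded = True
    return s, s.pending.protocol, s.pending.payload


def run_node(arrivals: List[ConcastDatagram]) -> List[Tuple[int, bytes]]:
    """Apply the generic steps of Section 3.2 at one node (state unbounded)."""
    msbs: Dict[bytes, DupState] = {}
    out: List[Tuple[int, bytes]] = []
    for m in arrivals:
        t = get_tag(m)
        s = merge(msbs.get(t), m)
        if done(s):
            s, p, payload = build_datagram(s)
            out.append((p, payload))
        msbs[t] = s
    return out


a = ConcastDatagram(253, b"NAK seq=17")
b = ConcastDatagram(253, b"NAK seq=18")
assert run_node([a, b, a, a, b]) == [(253, b"NAK seq=17"), (253, b"NAK seq=18")]
```

MD5 is used here only because Section 3.3 uses it. Since MD5 collisions can be constructed deliberately, a sender could make two different payloads fall into one DEC; a collision-resistant hash such as SHA-256 (hashlib.sha256) would avoid that.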
On the other hand, user-supplied merge code may need certain functionality. For example, in some cases it is useful to initiate merge processing via the passage of time, rather than datagram arrival. This capability can be provided through a method by which user-supplied code can arrange for the last portion of the merge processing (beginning with the done() test) to be executed a specified amount of time in the future. To limit overhead, each flow is permitted at most one pending timeout callback at any time.

3.5 Fragmentation

The notion of a datagram equivalence class is well-defined only for complete (unfragmented) IP datagrams. Therefore applications using concast must send datagrams that will not be fragmented in the network. This can be achieved either by performing path MTU discovery for the path between each sender and the receiver, or by sending datagrams no larger than the size that every IP module must be able to forward without fragmentation (68 octets for IPv4).

4. Levels of Support

A node participating in a concast flow MUST implement some part of the Concast Signaling Protocol (CSP) [1]. The parts of CSP that must be implemented depend on the level of concast support provided by the node. Some nodes may only be able to originate concast datagrams and thus do not need to implement the receiving or merging components of CSP. Other nodes may only be able to receive concast messages. Some nodes (in fact we expect most end systems) will support both sending and receiving. Internal network nodes need only support the merge processing described earlier, together with the corresponding parts of CSP. Legacy nodes that do not support concast at all simply forward concast packets as if they were unicast.

4.1 Sending Host Processing

In order for a host to participate as a sender in a concast group, it needs to support the portion of CSP that signals the node's intent to join (or leave) as a sender.
Once CSP has established the necessary state information to link the sender into the concast flow, the sender can begin transmitting concast datagrams. Specifically, senders must mark outgoing packets as concast packets requiring hop-by-hop processing. This is achieved simply by inserting into the IP header a "Concast ID" option containing the group ID G of the flow to which the packet belongs. The packet is then routed and transmitted using the standard IP mechanism.

4.2 Receiving Host Processing

Applications join the group G as a receiver by identifying (via a system call) the concast flow to be joined (R,G) and providing the Merge Specification for the flow. Concast receivers must support the parts of CSP that respond to Join requests and to requests for Merge Specifications. Once CSP establishes the flow and distributes the Merge Specification, concast datagrams will begin arriving at the receiver. The receiver's IP module must recognize the "Concast ID" option and divert the incoming packet for merge processing as specified in Section 3. The only difference between merge processing at internal network nodes and merge processing at the receiver is that the forwardConcastDG function must deliver the resulting packet to a local application rather than applying normal IP routing and sending it over the wire.

4.3 Per-flow State Considerations

Because sender-only nodes simply mark outgoing packets as concast packets, the state information maintained at such nodes will be minimal. However, because merge state accumulates at internal concast-capable nodes and at concast receivers, state size could potentially grow without bound. Consequently we rely on the fsb.saveMergeState function to limit the amount of state information any particular flow can consume.
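One way saveMergeState could enforce the MAX_ACTIVE_DECS bound of Section 3.2 is sketched below. This memo requires only that some (tag, state) pair be evicted when the limit is reached; evicting the least recently saved tag, the class name, and the value 4 for MAX_ACTIVE_DECS are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional

MAX_ACTIVE_DECS = 4  # hypothetical; the draft wants a globally published value


class MergeStateStore:
    """Per-flow MSB store holding at most `limit` DEC tags.

    Evicting the least recently saved tag is an assumption; the draft only
    says that some (tag, state) pair is evicted when the limit is reached.
    """

    def __init__(self, limit: int = MAX_ACTIVE_DECS) -> None:
        self.limit = limit
        self._store: "OrderedDict[Hashable, Any]" = OrderedDict()

    def find_merge_state(self, tag: Hashable) -> Optional[Any]:
        return self._store.get(tag)

    def save_merge_state(self, state: Any, tag: Hashable) -> Optional[Hashable]:
        """Bind tag -> state, replacing any earlier binding; return evicted tag."""
        evicted = None
        if tag in self._store:
            self._store.move_to_end(tag)          # refresh an existing DEC
        elif len(self._store) >= self.limit:
            evicted, _ = self._store.popitem(last=False)  # drop the oldest DEC
        self._store[tag] = state
        return evicted

    def __len__(self) -> int:
        return len(self._store)


store = MergeStateStore(limit=2)
store.save_merge_state("s1", "dec-a")
store.save_merge_state("s2", "dec-b")
assert store.save_merge_state("s3", "dec-c") == "dec-a"  # dec-a evicted
assert store.find_merge_state("dec-a") is None and len(store) == 2
```

Eviction discards an in-progress merge. With the duplicate-suppression specification of Section 3.3, for example, a later copy of an evicted DEC's payload would find no state and be forwarded again, which is why applications are asked to keep at most MAX_ACTIVE_DECS classes in the network at once.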
The language used to construct the Merge Specification may also impose limits on the amount of state information that can be saved. Assuming the merge state is bounded, the state needed to maintain information about flows is similar to the state used by shortest-path multicast routing protocols.

5. Security Considerations

In the absence of strong authentication applied to each packet at each concast-capable node, packets can be inserted into a concast flow by nodes that have not joined the flow. This may corrupt the information delivered to the receiver. However, the same threat exists when unicast is used to deliver the information.

It is straightforward to add an authentication check to the generic merge processing of Section 3.2. Moreover, the signaling phase provides an opportunity for establishment of the necessary security associations between neighbors in the concast tree (indeed this is one motivation for requiring signaling by all parties). However, the security provided is necessarily hop-by-hop. [Discussion of extension of concast tree to be added.]

In some earlier descriptions of concast, concast datagrams were identified by the presence of a group address (e.g. a Class E or multicast address) in the Source field of the IP header. While this approach is nicely symmetric with multicast (no individual address is associated with the source of the datagram), it conflicts with anti-spoofing source checks applied by some nodes in the current Internet. Unfortunately, modifying such checks to let packets with concast source addresses pass would open up the possibility of untraceable denial-of-service attacks on concast-capable hosts that reside in domains with no concast-capable routers; therefore the method of marking concast datagrams was changed.

6. Acknowledgements

The authors acknowledge the contributions of Amit Sehgal, Billy Mullins, and Su Wen to this work.
The support of the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-99-1-0514, is gratefully acknowledged. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory, or the U.S. Government.

References

[1] Calvert, K. L. and Griffioen, J. N., "Concast Signaling Protocol", Internet-Draft, in preparation.

[2] Deering, S., "Host Extensions for IP Multicasting", RFC 1112, August 1989.

[3] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992.

Authors' Address:

Kenneth L. Calvert (calvert@dcs.uky.edu)
James N. Griffioen (griff@dcs.uky.edu)
Computer Science Department
University of Kentucky
773 Anderson Hall
Lexington, KY 40506-0046