Internet Engineering Task Force
IPTEL WG
INTERNET-DRAFT                                          Ronald Davis
draft-ietf-iptel-pgrp-framework-00.txt           Lucent Technologies
                                                    20 November 1998
Expires 20 May 1999

         A Framework for a Peer Gatekeeper Routing Protocol

STATUS OF THIS MEMO

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

Within the ITU H.323 recommendation, the gatekeeper is a central point of communication for H.323 elements within a zone. All elements within the zone communicate with the gatekeeper over a registration, admission, and status (RAS) channel. Elements within a zone register with the gatekeeper and are subsequently able to communicate with other elements in the zone or outside of the zone. What is needed in addition to the normal registration and connection admission procedures of H.323 is a means of acquiring information about elements in other zones. This document describes the framework for a peer gatekeeper routing protocol (PGRP), which allows gatekeepers to exchange information with other gatekeepers about elements in their respective zones.
PGRP is a protocol that supports the exchange of information among gatekeepers, which may be used to make call routing decisions in a network. PGRP attempts to extend reliability concepts from telecommunications by incorporating the exchange of maintenance state information. This allows call routing decisions to be based upon the operational state of elements in the network. This feature is particularly useful in the H.323 gatekeeper-mediated call model, where a call may be completed to a pool of connection endpoints. An example of such a scenario is the use of an H.323 network to interconnect telecom end offices, each of which connects to multiple H.323 gateways. In this context, PGRP is also able to support call distribution to balance the call load among the gateways serving an end office.

Table of Contents

1. Introduction
2. Peer Gatekeeper Routing Protocol
   2.1 Protocol Overview
   2.2 Network Initialization
      2.2.1 Gatekeeper Initialization
         2.2.1.1 Zone Registration
         2.2.1.2 Locating a Topology Server
         2.2.1.3 Gatekeeper <--> Topology Server Information Exchange
      2.2.2 Support for Redundant Gatekeeper Architecture
   2.3 Scaling the Topology Server Architecture
      2.3.1 Topology Server Initialization
         2.3.1.1 Hello Protocol Description
         2.3.1.2 Election of Designated and Backup Topology Servers
         2.3.1.3 Topology State Information Exchange
      2.3.2 Requesting Topology State Updates
      2.3.3 Topology State Information Updates
   2.4 Summary of System Initialization
3. Recovery from System Errors
   3.1 Introduction
   3.2 Aging of Topology State Information
   3.3 Failure of Designated Topology Server
   3.4 Failure of Backup Topology Server
   3.5 Topology Server Failure Detected by a Gatekeeper
   3.6 Gatekeeper Failure Detected by a Topology Server
   3.7 Gatekeeper Failure Detected by Elements in Zone
      3.7.1 Detection of Simultaneous Registration in Multiple Zones
      3.7.2 Handling of Existing Calls by a New Gatekeeper
   3.8 Element Failure Detected by the Gatekeeper
   3.9 Connection Failure Between Elements in the Same Zone
   3.10 Connection Failure Between Elements in Different Zones
   3.11 Connection Failure Between Elements in Different Areas
4. References
5. Full Copyright Statement
6. Security
7. Author's Address

1. Introduction

ITU recommendation H.323 supports mechanisms by which H.323 terminals and gateways may register with a gatekeeper [1]. Through the information exchanged during this registration process [2], the gatekeeper derives direct knowledge of the elements in its zone, which it is able to use in routing calls among elements in the zone. What is needed in addition to the information obtained through the normal registration and connection admission procedures of H.323 is a means of acquiring information about elements in other zones.
There are two means of gaining this information:

1. Static provisioning, in which the gatekeeper and/or elements within the zone maintain a static table containing information about all elements in all zones.

2. Gatekeeper information exchange, in which gatekeepers acquire information about elements in other zones through a peer exchange of information with other gatekeepers.

One of the key disadvantages of the static provisioning approach is its inability to capture dynamic state information about the network. For example, if certain elements are unavailable or otherwise out of service, there is no direct means for a call-initiating element to take this into account when deciding how to route a call through the network.

While H.323 was initially proposed as an access-type protocol along the lines of Q.931, a number of proposals have emerged for the use of H.323 networks to provide tandem interconnection between telecom end offices, using gateways as originating and terminating endpoints in the H.323 network. In this context, there may be multiple H.323 gateways serving a given end office. Thus, for a network consisting of multiple zones, it is useful for the gatekeeper serving the originating endpoint (the originating gatekeeper) to have information about terminating endpoints in other zones in order to route a call to a selected terminating endpoint. This information is not available to the originating gatekeeper through normal H.323 registration procedures.

PGRP provides a mechanism by which gatekeepers may acquire both static and dynamic information from other zones. Some of the key concepts in PGRP are:

- Zone: The basic information unit in PGRP. As defined in ITU recommendation H.323, a zone consists of a gatekeeper and the H.323 elements (endpoints and gateways) which establish RAS channel communications with the gatekeeper.
- Topology Server: An element which distributes information among the gatekeepers in an area. This exchange allows a gatekeeper to obtain information to be used for establishing connectivity to elements outside of its zone.

- Area: A collection of zones associated with a topology server. Gatekeepers within an area establish a client-server relationship with the topology server serving the area.

- Network: A collection of areas. Topology servers exchange information with one another about their respective areas. In turn, each topology server distributes the information to each of the gatekeepers within its area. As a result, each gatekeeper has a network-level view of connectivity among elements in the network.

- Maintenance of Dynamic State Information: PGRP keeps gatekeepers up to date about changes in the network. This includes changes in the operational state of existing elements, the addition of new elements, and the removal of existing elements.

PGRP allows gatekeepers to discover the network/transport addresses of H.323 elements in other zones. In addition, PGRP extends reliable system concepts traditionally used in telecom networks by incorporating in-service/out-of-service operational state information about elements. Thus, PGRP allows gatekeepers to route calls based not only upon static address/location information, but also upon dynamic operational state information.

PGRP is useful with both the direct and the gatekeeper-mediated call models. In the direct call model, PGRP gives the originating gatekeeper information that allows intelligent selection of a destination endpoint for a call. In the gatekeeper-mediated call model, PGRP allows an originating gatekeeper to route a call to a specific terminating gatekeeper. This capability is useful at large end offices which may be connected to gateways that are members of multiple H.323 zones.
This selection may be based either upon the operational state as represented by the zone's gatekeeper, or upon considerations of distributing calls among the zones connecting to the destination end office.

2. Peer Gatekeeper Routing Protocol

2.1 Protocol Overview

In PGRP, the gatekeeper aggregates information collected from the registrations of individual elements in its zone to form a zone view of connectivity, or topology state view, among elements in the zone. Information aggregated in this topology state view includes:

- ELEMENT ADDRESS: the transport address of the element being registered.

- ELEMENT IDENTIFIER: a name for the element being registered. Unlike the element transport address, this identifier refers to a physical/logical element. Thus, an element with interfaces on multiple local area networks will have a different element address for each interface but only one element identifier.

- ELEMENT TYPE: identifies the element as a gateway, terminal, gatekeeper, MCU, or vendor-specific element.

- LOGICAL ELEMENT IDENTIFIER: allows the topology server to treat distinct elements in a zone having the same logical element identifier as a single logical entity. This field applies to the gatekeeper element type only and is primarily useful in redundant gatekeeper architectures, as described in section 2.2.2.

- REGISTRATION TIME TO LIVE: the period of time for which the registration is valid at the gatekeeper. The element must reregister before this interval expires in order to maintain membership in the zone. To prevent global synchronization, which could produce floods of registrations at the same time, the reregistering element is to multiply this value by a randomly selected scale factor between 0.75 and 0.9 to determine when to reregister.
After each registration, a new scale factor is to be selected to determine when the next registration is to occur.

- ZONE IDENTIFIER: identifies the zone in which the element is being registered. This identifier is to be unique in the network and takes the value of the gatekeeper IP address by default. This field allows gatekeepers to detect element interfaces that are registered to more than one gatekeeper at the same time (which is an H.323 protocol violation).

- LIST OF ELEMENTS: the elements with which the registrant is able to connect. For each element in the list, the gatekeeper maintains a connection state which represents the state of connectivity between the registrant and that element, indicating whether connection from the registering element to that element is enabled or disabled. By default, it is assumed that all elements are able to establish connections with one another.

- OPERATIONAL STATE: indicates the operational state of the registering element, i.e., whether the element is available to originate or terminate calls. Currently identified states are:

   - ACTIVE: the element is in service and able to handle both new and established calls;

   - OUT OF SERVICE: the element is not available for receiving new calls and is no longer able to process any established calls. Any established calls are terminated when the element enters this state;

   - OUT OF SERVICE-TRANSITION: the element is not available for completing new calls but is able to continue processing established calls.

- ADMINISTRATIVE STATE: provides supplementary information related to the operational state.
Administrative states include:

   - MANUAL: indicates that the current operational state of the element is the result of manual action at an element management system;

   - AUTOMATIC: indicates that the current operational state of the element is the result of automatic fault recovery actions.

- OVERLOAD STATE: gives an indication of the call processing load being placed upon the element. The gatekeeper may use this information to perform call load balancing among elements. Overload states are LOW, MEDIUM, and HIGH.

- SERVICE MAPPING: includes the mapping of network addresses to E.164 addresses, or other information related to the services associated with a given alias address.

In PGRP, a topology server establishes client-server communication with each of the gatekeepers in an area. Each gatekeeper then sends a summary of its topology state view to the topology server. The topology server in turn integrates the individual topology state views from each of the zones into an overall network topology state view, which is then advertised to each of the gatekeepers. In this way, the gatekeepers acquire enough knowledge to route connections between any elements in the network.

The LIST OF ELEMENTS consists of a linked list of elements which describes the state of connectivity between the registering element (as identified by the `element id' and `element address' fields) and each of the elements in the linked list. If the linked list member is a gateway, terminal, or MCU, the linked list contains the following information for each element entry:

- ELEMENT TYPE of the linked list member.

- ELEMENT ADDRESS of the linked list member.

- CONNECTIVITY STATE: describes the ability of the registering element to connect to the element address in the entry.
Values for this parameter are:

   - ENABLED: indicates that the registering element is able to establish a direct connection to the element;

   - DISABLED: indicates that the registering element is not currently able to establish a direct connection to the element. The service state of the element, however, is not OUT OF SERVICE, which implies that other elements are believed able to connect to this element. In this case, the registering element is to occasionally test for network layer connectivity to the element. An element whose service state is OUT OF SERVICE is to be removed from the linked list.

- OVERLOAD STATE of the linked list member.

- SERVICE MAPPING information pertaining to the linked list member.

If the linked list member is a gatekeeper, the entry is used for gatekeeper-mediated signalling. In this case the element entry contains the following information:

- ELEMENT TYPE: in gatekeeper-mediated signalling, the gatekeeper acts as a proxy for elements within its zone. In this case, the element type field represents the type of element to which the gatekeeper will mediate signalling within its zone. In this context, the gatekeeper may mediate signalling for multiple elements of this type, represented by a single entry in the LIST OF ELEMENTS linked list. This hides the details of the contents of a zone by abstracting, for broadcast across the network, only the information relating to the types of services available within the zone. Elements aggregated by a single entry in this list must have common ELEMENT TYPE and SERVICE MAPPING fields.

- ELEMENT ADDRESS of the gatekeeper which is to mediate signalling to the element type.

- CONNECTIVITY STATE: in this case, indicates the ability of the registering element to establish connectivity with the gatekeeper in another zone.
- OPERATIONAL STATE: in gatekeeper-mediated signalling, the gatekeeper represents an aggregate operational state for the elements represented by this entry. The operational state of the aggregate set of elements is ACTIVE as long as at least one element in the aggregated group is ACTIVE. Since signalling is gatekeeper-mediated, elements outside of the zone do not need to know the operational state of individual elements within the zone. It is the responsibility of the gatekeeper to mediate signalling and to relay received messages to elements within the zone which are in an operational state to receive them.

- OVERLOAD STATE: as with the OPERATIONAL STATE parameter, this represents an aggregated view of the overload state for the represented elements. The aggregate group is in the LOW overload state as long as at least one of the represented elements is in that state. As with the OPERATIONAL STATE parameter, elements outside of the zone do not need to know the details of elements within the zone.

- SERVICE MAPPING information pertaining to the aggregate group. All elements aggregated in a common LIST OF ELEMENTS entry must support a common service mapping. Elements may, however, support services in addition to this common service mapping, so a given element in a zone may be represented by multiple entries in the linked list. This detail is hidden from elements outside of the zone.

2.2 Network Initialization

The details of how elements select and register with a gatekeeper are beyond the scope of this document; it is assumed that some procedure exists by which this is done. The elements of interest for PGRP are those which are actively engaged in the distribution of topology state information across the network: gatekeepers and topology servers.
In the following, we describe how a network initializes within the context of PGRP.

2.2.1 Gatekeeper Initialization

2.2.1.1 Zone Registration

Upon initialization, the gatekeeper enters a registration_wait state, in which it waits for elements to register for membership in its zone. A registration is effected when the gatekeeper receives an H.323 registration request (RRQ) message from an element and returns an H.323 registration confirmation (RCF) message to that element. After the registration_wait period has elapsed, the gatekeeper leaves the registration_wait state and aggregates a topology state view of its zone based upon the registrations received during the wait state.

2.2.1.2 Locating a Topology Server

If the gatekeeper is configured to join an area upon initialization, it next enters the topology_server_locate state. In this state the gatekeeper sends a topology_server_request (TSR) message to the topology server multicast address (TS_MULTI_ADDR). All topology servers join this multicast group upon initialization [3]. The TSR message identifies the gatekeeper which is sending it and the topology area in which it is seeking membership. The topology server which serves the requested area returns a topology_server_confirmation (TSC) message, identifying the topology server for the area, as a unicast message sent directly to the requesting gatekeeper.

To handle a failure to locate a topology server, the gatekeeper may be configured with a list of areas in which it may seek topology server membership. After a provisionable number of attempts to locate a topology server in a given area, the gatekeeper may attempt to locate a topology server for another area.
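The area fallback described above amounts to a nested retry loop. The sketch below is illustrative only: the `TopologyServerRequest` class, the `locate_topology_server` function, and the `send_tsr` callback (which stands in for multicasting a TSR to TS_MULTI_ADDR and waiting for the unicast TSC) are hypothetical names, not part of PGRP.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TopologyServerRequest:
    """Contents of a TSR message (section 2.2.1.2)."""
    gatekeeper_id: str  # gatekeeper sending the request
    area_id: str        # topology area in which membership is sought


# Hypothetical transport hook: multicasts the TSR to TS_MULTI_ADDR and returns
# the topology server named in the unicast TSC reply, or None on timeout.
SendTsr = Callable[[TopologyServerRequest], Optional[str]]


def locate_topology_server(gatekeeper_id: str,
                           areas: Sequence[str],
                           attempts_per_area: int,
                           send_tsr: SendTsr) -> Optional[Tuple[str, str]]:
    """Try each provisioned area in order; return (area, server) or None."""
    for area in areas:
        # Provisionable number of attempts before moving on to the next area.
        for _ in range(attempts_per_area):
            server = send_tsr(TopologyServerRequest(gatekeeper_id, area))
            if server is not None:
                return area, server
    return None
```

Passing the transport in as a callback keeps the retry policy separate from the messaging, so the loop can be exercised against a stub that answers only for certain areas.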
As an alternative to dynamic topology server location, the gatekeeper may, by static provisioning, be configured with a prioritized list of potential topology servers.

2.2.1.3 Gatekeeper <--> Topology Server Information Exchange

After locating a topology server, the gatekeeper initiates communication by sending a topology_state_channel_open (TSC_OPEN) message to the topology server. In response, the topology server sends a TSC_OPEN message back to the gatekeeper. After this exchange of TSC_OPEN messages, the gatekeeper and topology server enter the topology_server_connection_established state, in which the gatekeeper has established a bidirectional communication channel with the topology server. Next, the gatekeeper and topology server engage in a topology state information exchange (described in section 2.3.1.3). Since the relationship between gatekeeper and topology server is client-server rather than peer (as is the case between topology servers), the roles of the gatekeeper and topology server are predetermined: the gatekeeper acts in the controller role, while the topology server acts in the responder role.

2.2.2 Support for Redundant Gatekeeper Architecture

Reliability considerations may lead to the deployment of a redundant gatekeeper architecture in which there is an ACTIVE gatekeeper and a STANDBY gatekeeper which is able to take over management of the zone should the active gatekeeper fail. The topology server architecture provides an update mechanism to keep the topology state databases of the redundant gatekeepers synchronized. To achieve this, the standby gatekeeper, gatekeeper2, declares membership in the same zone in its registration with the topology server with which the active gatekeeper, gatekeeper1, has registered.
Both gatekeepers must also have the same LOGICAL ELEMENT IDENTIFIER field, which allows the topology server to distinguish the two gatekeepers from a single gatekeeper (represented by an ELEMENT IDENTIFIER field) with multiple ELEMENT ADDRESS values. The topology server sends the same information to both gatekeepers, effecting synchronization between the pair. The topology server is not concerned with which elements register with gatekeeper1 versus gatekeeper2, but through the use of the LOGICAL ELEMENT IDENTIFIER field it is able to determine whether an element has simultaneously registered with both gatekeepers. This mechanism is also able to support a pool of gatekeepers.

2.3 Scaling the Topology Server Architecture

To provide scalability as the number of zones in a network increases, PGRP supports a distributed topology server architecture in which multiple topology servers exist in the network. In this case, the network is divided into areas, with each area consisting of one or more zones. A zone is to reside entirely within an area and is to be a member of only one area at any time. The gatekeeper establishes a client-server relationship with the topology server within its area. Topology servers in different areas form relationships for information exchange across a multicast channel. Among the topology servers, a designated topology server (DTS) is elected which has the responsibility for distributing information among the topology servers. The election of the designated topology server is described in section 2.3.1.2. Through this information exchange, each topology server is able to aggregate topology state information from the areas in the network to form a network-level topology state view. The topology servers then distribute this view among the gatekeepers in their respective areas.
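As a rough illustration of the aggregation step, a topology server might fold per-area views into one network-level view keyed by zone, checking along the way that no zone is claimed by two areas. This is a sketch only: the dict-based representation and the `merge_area_views` name are hypothetical, and PGRP does not define how topology state views are stored.

```python
from typing import Dict, Mapping

# Hypothetical representation: a zone view maps element identifiers to their
# topology state records (section 2.1); an area view maps zone ids to zone views.
ZoneView = Mapping[str, dict]
AreaView = Mapping[str, ZoneView]


def merge_area_views(area_views: Mapping[str, AreaView]) -> Dict[str, ZoneView]:
    """Combine per-area views into a single network-level view keyed by zone.

    Raises ValueError if a zone appears in more than one area, since a zone
    is to be a member of only one area at any time.
    """
    network_view: Dict[str, ZoneView] = {}
    zone_area: Dict[str, str] = {}  # which area each zone was seen in
    for area_id, zones in area_views.items():
        for zone_id, zone_view in zones.items():
            if zone_id in zone_area:
                raise ValueError(
                    f"zone {zone_id} claimed by areas {zone_area[zone_id]} and {area_id}")
            zone_area[zone_id] = area_id
            network_view[zone_id] = zone_view
    return network_view
```

Keying the merged view by zone makes the one-area-per-zone rule a simple duplicate check rather than something each gatekeeper must detect later.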
The topology server model requires information to be provisioned at the gatekeeper for each zone. Such information includes:

- topology server identifier: a prioritized list of network addresses for the topology servers to be used by the gatekeeper. An address can refer to a separate processor whose sole function is aggregation of topology information, or it can be the address of a gatekeeper in implementations which colocate topology server and gatekeeper functionality in a single unit. Address 0.0.0.0 means that there is no topology server. In this case the gatekeeper has knowledge of its own zone and no other zone; furthermore, it will not accept information about other zones.

- topology area identifier: identifies the topology area of which the gatekeeper is a member. This is typically set to the IP address of the primary topology server. Topology area 0.0.0.0 indicates that the gatekeeper belongs to a single-area network.

In addition, information provisioned at the topology server includes:

- topology area exchange: a boolean value which indicates whether the topology server is enabled to exchange information with other topology servers. If topology area exchange is not enabled, the topology server will not exchange information with topology servers in other areas. Furthermore, it will not participate in the hello protocol and will, therefore, not be discovered by other topology servers.

- topology server multicast address (TS_MULTI_ADDR): a multicast address used by the topology servers to discover other topology servers using the hello protocol. This address is also used by the designated topology server to distribute topology state updates among the topology servers.
- designated topology server multicast address (DTS_MULTI_ADDR): a multicast address used by topology servers to send information to both the designated topology server and the backup designated topology server.

- topology server priority: a value used during the hello protocol to elect a designated topology server. The designated topology server is the only topology server which forms peering relationships with the other topology servers for the exchange of topology state information across zones. The designated topology server distributes topology update information using the topology server multicast address.

2.3.1 Topology Server Initialization

Prior to engaging in exchanges with other topology servers, an initializing topology server waits a preliminary_wait period for gatekeepers to enlist in the area and to send their topology state views to the topology server. After this period expires, the topology server has acquired a view of its area and is able to begin exchanging area views with other topology servers. This exchange begins with the hello protocol.

2.3.1.1 Hello Protocol Description

The hello protocol begins with the topology server sending HELLO packets to the topology server multicast address. An initializing topology server starts out with no knowledge of which other topology servers are present in the network. These HELLO packets contain the following information:

- topology server identifier: identifies the topology server sending the HELLO packet.

- topology area identifier: identifies the topology area served by the topology server.

- HELLO packet interval: identifies the frequency with which HELLO packets will be sent by this topology server.

- peer server dead interval: specifies the period of time after which a peer topology server will be considered dead if no HELLO packets are received from that peer.
- MINIMUM MESSAGE TRANSFER INTERVAL: specifies the minimum interval for transmission of non-HELLO messages between topology servers. This mechanism allows message traffic between topology servers to be throttled. Message types which may be throttled are:

   - Generic Error messages: messages which indicate a possible error condition within an area that is detected outside of the area.

   - Topology State Update messages: the types of TSU messages which may be throttled are described in section 2.3.3.

- designated topology server (DTS): specifies the topology server which the sender considers to be the designated topology server. The sender will establish a peering relationship with this server and send any topology state updates to it for distribution among the set of topology servers.

- backup topology server (BTS): specifies the topology server which the sender considers to be the backup topology server. The backup topology server will become the designated topology server in the event of failure of the designated topology server (or if the designated topology server relinquishes its designated status for any reason).

- known topology servers: a list of topology server identifiers known to the server sending the HELLO packet. This information informs receiving topology servers that their presence is known to the sender and that a topology state information exchange may begin between the two servers.

At initialization, the topology server begins sending HELLO packets at regular intervals. To avoid global synchronization, in which multiple topology servers predictably send HELLO packets at the same time, the topology server uses a randomly selected scale factor between 0.75 and 1.0 to determine the exact time to send each HELLO packet.
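The randomized timers here and in section 2.1 share one pattern: scale a base interval by a fresh random factor each time. The sketch below also shows a check against each peer's advertised dead interval. It is illustrative only: the function names are hypothetical, drawing the factor uniformly is an assumption (the text says only "randomly selected"), and treating a peer as dead only when the silence strictly exceeds the interval is likewise an assumption.

```python
import random
from typing import Dict, List, Tuple


def jittered_delay(base: float, low: float, high: float,
                   rng: random.Random) -> float:
    """Scale a base interval by a factor drawn uniformly from [low, high]."""
    return base * rng.uniform(low, high)


def next_hello_delay(hello_interval: float, rng: random.Random) -> float:
    # HELLO packets: scale factor between 0.75 and 1.0 (section 2.3.1.1).
    return jittered_delay(hello_interval, 0.75, 1.0, rng)


def next_reregistration_delay(ttl: float, rng: random.Random) -> float:
    # Element reregistration: scale factor between 0.75 and 0.9 of the
    # REGISTRATION TIME TO LIVE (section 2.1); a new factor is drawn each time.
    return jittered_delay(ttl, 0.75, 0.9, rng)


def dead_peers(peers: Dict[str, Tuple[float, float]], now: float) -> List[str]:
    """Return peers silent for longer than their advertised dead interval.

    peers maps a topology server id to (time the last HELLO was received,
    peer server dead interval advertised in that HELLO).
    """
    return sorted(pid for pid, (last_seen, dead_interval) in peers.items()
                  if now - last_seen > dead_interval)
```

Drawing a new factor for every transmission, rather than fixing one per server, is what keeps independently started timers from settling back into step.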
2.3.1.2 Election of Designated and Backup Topology Servers

When the topology server begins sending HELLO packets, it transitions to an initialization_wait state, during which the designated topology server and backup topology server are determined. If designated and backup topology servers have already been elected, they are accepted by the initializing topology server. If there is no DTS and BTS, they are determined by election.

The designated topology server and backup topology server serve a central role in the topology state information distribution architecture. All other topology servers send topology state information to these servers, which then distribute the received information among all the topology servers. This structure requires each topology server to know about only two topology servers (designated and backup). For a network of n topology servers, a total of 2n information exchange relationships is needed to distribute information across the network. If there were no designated and backup topology server roles, each topology server would have to establish individual information exchange relationships with every other topology server; in this case, a network of n topology servers would require a total of n*(n-1)/2 information exchange relationships.

Information exchange in this model uses two different multicast addresses. Information is sent to the designated topology server and the backup topology server using the designated topology server multicast address. Topology state updates are distributed to the other topology servers only by the designated topology server, using the topology server multicast address.

In the election procedure, the BTS is elected first. The topology server inspects the HELLO packets received from each topology server.
if the topology server sees its own topology server id in the known topology servers list of a HELLO packet, the packet is inspected to determine whether the topology server which sent the HELLO packet is declaring itself to be the BTS. if it is, the topology server id and topology server priority fields are evaluated. if, during this waiting state, multiple topology servers declare themselves to be the BTS, the topology server with the highest priority is selected. in the event of a tie, the topology server with the highest topology server id is selected. if no topology server has declared candidacy to be the BTS, the topology server with the highest priority is selected. topology servers which declare themselves to be the DTS are ineligible for consideration as BTS.

next, the DTS is selected. again, if multiple topology servers declare themselves to be the DTS, the topology server with the highest priority is selected. if no topology server declares itself to be the DTS, the BTS is selected as the DTS. the BTS election is then repeated, but this time the previous BTS has been promoted to DTS and is therefore eliminated from consideration as the BTS. the remaining list of contending BTS candidates is evaluated and a new BTS selected. the new DTS and BTS register for membership at the DTS multicast address.

a topology server declares a BTS or DTS value of 0 if it does not wish to declare a known BTS or DTS. when a topology server initializes, 0 is the initial value used for BTS and DTS. a topology server with a priority value of 0 is ineligible for election as either BTS or DTS.

2.3.1.3 topology state information exchange

once the designated and backup topology servers have been determined, the initializing topology server enters the information_exchange_negotiation state. topology state information exchange is based upon communication between topology servers in which one acts as a controller while the other acts as a responder.
the controller polls the responder for topology state information, which the responder sends only upon being polled by the controller. the negotiation begins with the initiating topology server sending a topology_exchange_negotiation (TPEN) packet to the designated and backup topology servers over the topology server multicast channel. this packet contains the topology server id and topology server priority. in addition, the TPEN packet contains a control/respond bit which the topology server sets to declare itself to be the controller of the information exchange.

`state' in this context refers to the status of a dialog between a given pair of topology servers in the network. in these information exchanges, one of the topology servers is the designated topology server, and its state with respect to the dialog should reflect that of the other topology server in the dialog. however, since the HELLO packet exchange is not reliable, at the beginning of this negotiation stage the entities in the dialog are not certain of each other's role. therefore, each topology server in this negotiation stage sends a TPEN packet asserting itself to be the controller. upon receipt of a TPEN packet, the receiving topology server inspects the topology server priority field in the packet to determine which entity has the higher priority. the higher priority topology server is declared the controller in the exchange.

information exchange begins while still in the information_exchange_negotiation state, with the controlling topology server sending a topology state advertisement (TSA). the TSA is sequence numbered. the TSA message is comprised of element descriptors, which are headers that describe the state of elements within the network. the TSA message header identifies the sender of the message. element descriptors contain the following:

1. element address: as described above.
2. timestamp: indicates the last time that a state change occurred on the element referenced by the element address.

multiple TSAs may be required to exchange topology state information between topology servers. if there are further TSAs to be sent, the more bit is set in the packet. the control/respond bit is also set to indicate the TSA sender's role in the exchange; in this case the bit is set to indicate the control role. all TSAs also contain the topology server priority of the sender.

when a responding topology server receives this TSA it replies with its own TSA. the sequence number in the response TSA is to be the same as that of the received TSA which triggered the response. the control/respond bit in the reply TSA is set to indicate a respond role. during this exchange of TSAs, only a single TSA is to be outstanding from a topology server at a time. TSAs are sequence numbered and acknowledged by the responding topology server by sending a TSA with the same sequence number as was in the control TSA. this procedure ensures a reliable exchange of topology state information between the topology servers.

this exchange also negotiates the control/respond roles for the exchange. if, in the above exchange, both TSAs indicated that each sender still considered itself to be the controller, the TSAs would be retransmitted until the roles are successfully negotiated. after determination of the controlling and responding entities in the ensuing information exchange, the initializing topology server enters the information_exchange state.

while in the information_exchange state, the topology server notes the timestamp associated with each element descriptor and compares it against information in its current topology state database. if the element does not currently exist in the database, the topology server puts the element descriptor on a topology state request list.
if the element does currently exist but the timestamp in the element descriptor indicates that it is more recent than what is currently in the topology state database, the topology server also puts the element descriptor on the topology state request list. TSA exchange continues in the information_exchange state until both topology servers have sent element descriptors for all known H.323 elements.

2.3.2 requesting topology state updates

after completion of information exchange the topology servers inspect their topology state request lists. if the list is not empty, the topology server sends a topology state request (TSR) message to request updated information about any elements on the list. the TSR contains the list of element descriptors received during the information exchange procedure. when a topology server receives a TSR message it is to retrieve the topology state information that it has for the elements and package it in a topology state update (TSU) message.

the TSU is flooded among the topology servers: the designated topology server floods its TSUs using the topology server multicast address; other (non-DTS/non-BTS) topology servers ensure that their TSUs reach both DTS and BTS by using the DTS multicast address. flooding ensures that all topology servers get the latest topology state information because it allows each topology server to see updated topology state information even if it did not initiate the request for the information. if we assume that the topology state information is synchronized among the topology servers, then if one topology server sends a TSR to the designated topology server, all of the other topology servers can be expected to follow with TSRs for the same information.
so a flooded TSU over the topology server multicast address provides a more efficient method of responding to a TSR from any topology server than would be the case if the designated topology server sent a unicast response to each topology server individually.

in order that topology state updates be reliable, all TSUs must be acknowledged by a topology state acknowledgement (TSACK) message. the TSACK contains only those element descriptors whose receipt is being acknowledged by the sending topology server. thus, if a topology server receives a TSU message which contains unrecognized topology state information, the TSACK message will reflect the fact that not all elements were successfully received by that topology server. unrecognized elements may then be retransmitted in another TSU message. Retransmitted TSUs are unicast to the topology server(s) requiring retransmission.

TSACK messages are unicast by the DTS and BTS to the topology server which sent the TSU. TSACK messages are sent by non-DTS/non-BTS topology servers using the designated topology server multicast address. this allows the DTS, BTS, and each other topology server to stay in topology state database synchronization with a single message transmission. A non-DTS/non-BTS topology server expects to receive TSACKs from both DTS and BTS in response to a sent TSU. The DTS and BTS expect to receive TSACKs from all non-DTS/non-BTS topology servers in response to a sent TSU. Since only the DTS distributes TSUs to all topology servers, the designated topology server also expects to receive a TSACK from the backup topology server.

2.3.3 topology state information updates

gatekeepers are to send topology state update messages to the topology server when a new element registers with the gatekeeper or when a registered element changes operational state.
in order to prevent a flood of messages to the topology servers, TSUs from gatekeepers may be throttled by setting a minimum_tsu_interval (mti) parameter. this prevents the gatekeeper from sending TSUs at intervals shorter than this period. at the end of the period, a single TSU is sent summarizing the state changes that occurred during it. optional parameters associated with the mti parameter enable different types of TSU information to be throttled. the types of TSU information include:

- new registration-reporting of registration of new elements.
- network connectivity change-the gatekeeper is advertising a new (network address, network mask) tuple for the zone.
- out of service to active state change-reporting of existing elements which have changed from an out of service (OOS) to active maintenance state.
- active to out of service state change-reporting of existing elements which have changed from an active to OOS maintenance state.

the optional parameter allows, for example, throttling of new registrations and network connectivity changes while state changes of existing elements are reported immediately. this throttling procedure is to be bidirectional, with the capability to throttle TSUs from the topology server to each of the gatekeepers. these parameters are communicated between the gatekeeper and the topology server during the topology_state_channel_open message exchange.

2.4 summary of system initialization

the following is a summary of the steps toward system initialization. in this summary, steps with the same number indicate tasks which may be performed independently:

1. gatekeeper initialization.
1. topology server initialization.
2. topology server registers to the topology server multicast address.
3. gatekeeper generates a topology state map for the zone based upon registrations.
4. topology server sends HELLO packets over the topology server multicast address to discover other topology servers.
5. gatekeeper locates a topology server serving the gatekeeper's topology area.
6. a backup topology server is elected.
7. gatekeeper opens a two way dialog with the topology server.
8. the backup topology server registers to the designated topology server multicast address.
9. gatekeeper and topology server exchange information: gatekeeper reports zone topology state information to topology server and topology server sends information aggregated from other zones to gatekeeper.
10. a designated topology server is elected.
11. the designated topology server registers to the designated topology server multicast address.
12. topology server and designated topology server negotiate topology state information exchange relationship.
13. topology server and designated topology server exchange topology state advertisements.
14. topology server sends topology state updates to designated and backup topology servers over the designated topology server multicast address.
15. designated topology server sends topology state updates to all topology servers over the topology server multicast address.
16. topology state information is to be refreshed periodically based upon zone reregistrations. aged information is to be marked as invalid in the topology state database.

3. recovery from system errors

3.1 introduction

this section considers possible failure modes and the strategies for recovering from them. failure modes may be grouped into two categories:

1. failures associated with elements: topology servers, gatekeepers, gateways, terminals, &c.
2. topology state synchronization errors: inconsistent topology state information among elements.

in this section, the following failure modes are considered:

1. aging of topology state database entries
2. failure of the designated topology server
3. failure of the backup topology server
4. topology server failure detected by a gatekeeper
5. gatekeeper failure detected by a topology server
6. gatekeeper failure detected by elements in zone
7. element failure detected by a gatekeeper
8. connection failure between elements in the same zone
9. connection failure between elements in different zones-same topology server area
10. connection failure between elements in different zones-different topology server areas

3.2 aging of topology state information

as elements reregister with the gatekeeper in their zone, and as gatekeepers in turn send topology state advertisements to their topology servers, the timestamp associated with each element entry is updated to reflect the most recent element registrations. if, however, an element fails to reregister with its gatekeeper, or if the gatekeeper fails to send TSAs to the topology server, the topology state information is `aged out'.

if an element within a zone does not reregister within the registration lifetime specified by the gatekeeper when the element registered, the gatekeeper is to age out the element registration. in response, the gatekeeper is to send an H.323 unregistration request (URQ) to the element. upon receipt of the URQ, the element is to respond with an unregistration confirmation (UCF) to the gatekeeper. the gatekeeper then invalidates the topology state entry that it maintained for the element and sends a topology state update message containing information about the change in state for the element to its topology server. the topology server forwards the information to the designated topology server, which distributes the information among the other topology servers. each topology server in turn distributes the information among the gatekeepers in its respective area.
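The gatekeeper side of this aging procedure might be sketched as follows. The class and field names (`Registration`, `ZoneTable`, `outbox`) and the `"unregistered"` TSU payload are hypothetical; messages are modelled as tuples appended to an outbox rather than sent over the RAS channel or the topology state channel, and what happens to an element that never answers the URQ is left open, as it is in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Registration:
    element_address: str
    registered_at: float      # time of the most recent (re)registration
    lifetime: float           # registration lifetime granted by the gatekeeper
    valid: bool = True
    urq_pending: bool = False

@dataclass
class ZoneTable:
    entries: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)   # messages to send, in order

    def reregister(self, address, now, lifetime):
        """Record a registration or reregistration, resetting its age."""
        self.entries[address] = Registration(address, now, lifetime)

    def age(self, now):
        """Send a URQ to each element whose registration lifetime has expired."""
        for reg in self.entries.values():
            expired = now - reg.registered_at > reg.lifetime
            if reg.valid and expired and not reg.urq_pending:
                reg.urq_pending = True
                self.outbox.append(("URQ", reg.element_address))

    def on_ucf(self, address):
        """On UCF: invalidate the entry and report it in a TSU to the topology server."""
        reg = self.entries.get(address)
        if reg is not None and reg.urq_pending:
            reg.valid = False
            reg.urq_pending = False
            self.outbox.append(("TSU", address, "unregistered"))
```

Calling `age()` from a periodic timer sends at most one URQ per expired registration; the TSU is queued only once the UCF arrives, mirroring the order of steps above.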
3.3 failure of designated topology server

a failure of a topology server is detected by other topology servers when the interval between its HELLO packets exceeds the peer_server_dead interval specified in the HELLO packet. when this occurs with the designated topology server, the topology servers reenter the initialization_wait state to elect a replacement. in this case, the backup topology server is elected as designated topology server and a new backup topology server is elected. this new backup topology server registers to join the designated topology server multicast channel.

this realignment to elect new designated and backup topology servers causes a temporary disruption in the ability of the topology servers to distribute topology state information across the network, since only the designated topology server distributes topology state information across the network of topology servers. the backup topology server, however, does receive all messages being sent to the failed designated topology server. since each topology server is expecting an acknowledgement from both the designated and backup topology server, the topology servers retransmit their topology state messages as unicasts to the designated topology server until a new DTS has been elected. sending retransmissions as unicasts spares elements from having to process message traffic for which they have already returned an acknowledgement.

the backup topology server expects the designated topology server to distribute the topology state updates to the network of topology servers. when this does not happen, the backup topology server distributes the updates when it is elected to the role of designated topology server. this procedure allows topology state information to be distributed without loss of information, albeit with some delay, even if the designated topology server is lost.
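A sketch of DTS failure detection followed by the election rules of section 2.3.1.2, under several assumptions that this document does not fix: a server "declares itself" DTS or BTS by placing its own id in the corresponding HELLO field; ties among DTS candidates are broken by the highest server id (the text specifies this tie-break only for the BTS); and the caller decides whether the electing server's own state is included in the list. `Hello`, `alive` and `elect` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hello:
    server_id: int     # topology server id of the sender
    priority: int      # 0 = ineligible for election as DTS or BTS
    dts: int           # DTS declared by the sender (0 = none declared)
    bts: int           # BTS declared by the sender (0 = none declared)
    sees_us: bool      # our id appears in the sender's known topology servers list
    last_seen: float   # arrival time of the sender's most recent HELLO

def alive(hellos, now, dead_interval):
    """Keep only servers heard from within the peer_server_dead interval."""
    return [h for h in hellos if now - h.last_seen <= dead_interval]

def _best(candidates):
    """Highest priority wins; ties go to the highest server id."""
    return max(candidates, key=lambda h: (h.priority, h.server_id), default=None)

def elect(hellos):
    """Return (dts_id, bts_id); 0 means no server could be elected."""
    peers = [h for h in hellos if h.sees_us and h.priority > 0]

    def pick_bts(excluded):
        # servers declaring themselves DTS are ineligible as BTS
        eligible = [h for h in peers
                    if h.dts != h.server_id and h.server_id not in excluded]
        declared = [h for h in eligible if h.bts == h.server_id]
        return _best(declared) or _best(eligible)

    bts = pick_bts(excluded=set())
    dts = _best([h for h in peers if h.dts == h.server_id])
    if dts is None and bts is not None:
        dts = bts                                  # promote the BTS to DTS
        bts = pick_bts(excluded={dts.server_id})   # and elect a new BTS
    return (dts.server_id if dts else 0, bts.server_id if bts else 0)

if __name__ == "__main__":
    hellos = [
        Hello(1, 1, dts=3, bts=2, sees_us=True, last_seen=95.0),
        Hello(2, 5, dts=3, bts=2, sees_us=True, last_seen=98.0),
        Hello(3, 5, dts=3, bts=2, sees_us=True, last_seen=40.0),  # DTS, now silent
    ]
    print(elect(hellos))                                         # (3, 2)
    print(elect(alive(hellos, now=100.0, dead_interval=40.0)))   # (2, 1)
```

Once server 3 has been silent longer than the dead interval, no remaining server declares itself DTS, so the declared BTS (server 2) is promoted and the BTS election is rerun without it, which matches the recovery described above.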
when a topology server is declared dead, it is removed from the list of known topology servers in the HELLO packets of the detecting topology servers. this makes it possible for the implicated topology server to detect when other topology servers have declared it to be dead. in addition, entries in the topology state database which are associated with the dead topology server are invalidated. this action effectively isolates the area served by the dead topology server from the network.

note that the election scheme is based upon the assumption that all topology servers are able to communicate with one another across the topology server multicast channel. it is assumed that if a topology server is not able to communicate with another topology server over this channel, other topology servers will have the same difficulty, since they use a common multicast channel for communication rather than individual point to point communication between pairs of topology servers (as would be the case if each topology server formed a peering relationship with every other topology server). consequently, if one topology server declares another topology server to be dead due to a lack of HELLO message exchanges, the other topology servers will make the same declaration. thus, we do not expect an exception case where one topology server declares another topology server to be dead while other topology servers consider the implicated topology server to still be alive.

if the topology area exchange parameter at a topology server is set to disable topology information exchange, the topology server is to cease sending HELLO packets to other topology servers and is to ignore traffic from other topology servers. in addition, the topology server is to invalidate topology state information associated with all other areas. this effectively allows the area to isolate itself from all other areas in the network.
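The two invalidation rules of this section (dropping a dead server from the known list, and isolating an area when topology area exchange is disabled) reduce to simple operations on the topology state database. The sketch assumes each database entry records the id of the topology server serving the element's area; that field and the names `Entry`, `declare_dead` and `isolate_local_area` are hypothetical, and stopping HELLO transmission or ignoring peer traffic is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    element_address: str
    area_server: int        # id of the topology server serving the element's area
    timestamp: float        # last state change reported for the element
    valid: bool = True

def declare_dead(dead_id, known_servers, database):
    """Drop a dead peer from our HELLO known list and invalidate its area."""
    known_servers.discard(dead_id)
    for entry in database.values():
        if entry.area_server == dead_id:
            entry.valid = False

def isolate_local_area(own_id, known_servers, database):
    """Topology area exchange disabled: forget all peers and all other areas."""
    known_servers.clear()
    for entry in database.values():
        if entry.area_server != own_id:
            entry.valid = False

if __name__ == "__main__":
    db = {a: Entry(a, srv, 0.0)
          for a, srv in [("10.0.0.1", 1), ("10.0.1.1", 2), ("10.0.2.1", 3)]}
    known = {2, 3}                      # peers known to topology server 1
    declare_dead(3, known, db)
    print(known, [e.element_address for e in db.values() if e.valid])
    # {2} ['10.0.0.1', '10.0.1.1']
```

Because removal from the known list is visible in the next HELLO, the implicated server can notice it has been declared dead, as described above.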
3.4 failure of backup topology server

a failure of the backup topology server results in each of the topology servers reverting to the initialization_wait state, and in the subsequent election of a new backup topology server. the existing designated topology server remains the same.

3.5 topology server failure detected by a gatekeeper

in order to reduce the amount of message traffic that is propagated across the network, the gatekeeper only sends information to the topology server when there is a topology state view change within the zone. thus, topology servers do not age topology state information as gatekeepers do. however, there is still a need to determine the status of the channel between the gatekeeper and the topology server even if there is no information exchange for extended periods of time. to this end the gatekeeper and topology server are to send KEEPALIVE messages across the channel at regular intervals, as determined during the topology_state_channel_open message exchange.

if the topology server does not send a KEEPALIVE message to the gatekeeper within the specified interval, the gatekeeper is to declare the topology server dead. in this case, the gatekeeper is to perform the following steps:

1. the gatekeeper is to close the communication channel with the topology server by sending a topology_state_channel_close message;
2. the topology server is to send a topology_state_channel_close message to the gatekeeper in response;
3. if the gatekeeper is not provisioned for standalone zone operation (standalone operation being indicated by a primary topology server id of 0.0.0.0), the gatekeeper is to attempt to join another area if it is provisioned to do so, in which case it is provisioned either with a backup topology server or with an alternate area and can use the procedures of section 2.2.1.2 to locate the topology server serving that area. in any case, the gatekeeper is to continue attempting to join an area until successful.
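A sketch of the gatekeeper's KEEPALIVE watchdog and of steps 1 and 3 above, assuming `check()` is called periodically and that the fallback topology servers or areas are provisioned as a list. `TopologyChannel`, its fields and the 30 second default are hypothetical; step 2 (the topology server's reply) and the join attempts themselves are not modelled.

```python
from dataclasses import dataclass, field

STANDALONE_ID = "0.0.0.0"   # primary topology server id meaning standalone zone operation

@dataclass
class TopologyChannel:
    primary_server: str
    alternates: list = field(default_factory=list)  # backup servers or alternate areas
    keepalive_interval: float = 30.0   # hypothetical; agreed at topology_state_channel_open
    last_keepalive: float = 0.0
    is_open: bool = True
    outbox: list = field(default_factory=list)

    def on_keepalive(self, now):
        """Record a KEEPALIVE received from the topology server."""
        self.last_keepalive = now

    def check(self, now):
        """Run periodically; return fallback servers to try joining, in order."""
        if not self.is_open or now - self.last_keepalive <= self.keepalive_interval:
            return []
        # step 1: declare the topology server dead and close the channel
        self.is_open = False
        self.outbox.append("topology_state_channel_close")
        # step 3: a standalone gatekeeper takes no further action
        if self.primary_server == STANDALONE_ID:
            return []
        return list(self.alternates)
```

The caller would walk the returned list, starting over from its head, until a join succeeds, which corresponds to "continue attempting to join an area until successful".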
upon joining a new area, the subsequent exchange of topology state information between the gatekeeper and the new topology server will allow the topology server to detect, based upon the current information in the topology state database, that the elements in the zone were previously registered in another area. in this case the new topology server is to take the following actions:

1. compare the timestamps of the element descriptors in the topology state advertisement sent from the gatekeeper with current topology state database information;
2. if the timestamp of an element descriptor in the TSA is no more recent than the information in the topology state database, update the topology state view to represent the current area in which the element is a member;
3. if the timestamp of an element descriptor in the TSA is more recent than the information in the topology state database, send a topology state request to the gatekeeper to get further information on the element;
4. send the more recent topology state update information to the designated topology server, which distributes the information among the other topology servers.

if the gatekeeper is provisioned for standalone zone operation, no further action is taken by the gatekeeper after closing the channel to the topology server. provisioning the gatekeeper for standalone zone operation, where it was previously not so provisioned, is to trigger the gatekeeper to initiate closing of an open channel to the topology server.

3.6 gatekeeper failure detected by a topology server

if the gatekeeper does not send a KEEPALIVE message to its topology server within the specified interval, the topology server is to declare the gatekeeper dead. in this case, the topology server invalidates all topology state information associated with the zone.
from this point the process is as follows:

1. the topology server is to close the communication channel with the gatekeeper by sending a topology_state_channel_close message to the gatekeeper;
2. the gatekeeper is to send a topology_state_channel_close message to the topology server in response;
3. the topology server which invalidates the zone information is to send a topology state update message to the designated topology server;
4. the designated topology server sends topology state update messages to each of the topology servers reflecting the change in topology state view;
5. the topology servers then send topology state update messages to each of the gatekeepers in their areas.

if the gatekeeper is not provisioned for standalone zone operation, the gatekeeper is to take actions to allow the elements in the zone to reregister with another gatekeeper. in this event, the gatekeeper is to unregister each of the elements in its zone by sending H.323 URQ messages over the RAS channel. each element is to respond with a UCF message to the gatekeeper and subsequently reregister with another gatekeeper if it is provisioned with an alternative gatekeeper.

3.7 gatekeeper failure detected by elements in zone

3.7.1 detection of simultaneous registration in multiple zones

Failure of a gatekeeper may prompt elements in the zone to register with a gatekeeper in a different zone. The specific mechanisms through which this happens are beyond the scope of this document. However, should the element register in a different zone, the gatekeeper serving that zone is to detect a topology state information change in its zone. the gatekeeper next sends a topology state update message to the topology server. if the topology server detects registration of the element address in another zone, it reconciles the conflict between registrations using the procedures described in section 3.5.
upon resolving this conflict, the topology server is to send a generic error message to the `losing' gatekeeper with error cause: redundant registration in multiple zones. the generic error message is to implicate the element for which the redundant registration was detected. in response, the gatekeeper is to issue an unregistration request message to the implicated element.

in the event that the `losing' gatekeeper is in a different area, the topology server detecting the registration conflict is to send the generic error message to the topology server which serves the area in which the `losing' gatekeeper is located. this procedure ensures that the losing gatekeeper receives only one generic error message, from its own topology server, as opposed to being potentially bombarded with multiple messages from every topology server detecting the registration conflict.

if the element registers in the new zone using a different network address, no registration conflict is detected by the topology server, and the original registration is either aged out by the original gatekeeper or invalidated by the topology server serving the original gatekeeper.

A minimum message transmission interval may be defined during the TSC_OPEN message exchange when the channel between the gatekeeper and the topology server is established. This parameter may be used to throttle GENERIC_ERROR message traffic over the channel.

3.7.2 handling of existing calls by a new gatekeeper

at the time of registration to a new gatekeeper, the element may have existing communications which were initiated based upon ARQ/ACF exchanges with the previous gatekeeper. in such a case, the new gatekeeper will have no state information associated with those existing communications, as pgrp does not attempt to disseminate call state information.
in such cases the element is not to attempt to repeat the admission request procedure with the new gatekeeper for any communications in progress prior to registration. subsequent RAS channel messages (such as disengage requests, &c.) are to be sent to the new gatekeeper.

3.8 Element Failure Detected by the Gatekeeper

Element failure is detected by a gatekeeper through the H.323 message exchange over the RAS channel. If the gatekeeper declares an element to have failed by this mechanism, the element is to have its operational state changed to OUT OF SERVICE and all connectivity to this element is to be represented as being in the DISABLED state. The gatekeeper is also to send a TOPOLOGY STATE UPDATE message to the topology server reflecting the change of operational state.

3.9 Connection Failure Between Elements in the Same Zone

In the event that the gatekeeper determines that an originating element has failed in an attempt to communicate with a target element within the same zone, the gatekeeper is to update the topology state information for the originating element to indicate that its connectivity with the target element is represented as being in the DISABLED state. The gatekeeper is also to send a TOPOLOGY STATE UPDATE message to the topology server; however, this information is of local (to the zone) interest only and is not propagated to other zones.

3.10 Connection Failure Between Elements in Different Zones but the Same Area

Element failure is detected by a gatekeeper through the H.323 message exchange over the RAS channel. If the gatekeeper declares an element to have failed by this mechanism, the element is to have its operational state changed to OUT OF SERVICE and all connectivity to this element is to be represented as being in the DISABLED state. The gatekeeper is also to send a TOPOLOGY STATE UPDATE message to the topology server reflecting the change of operational state. At this point this information is of originating zone interest only.
The gatekeeper is also to send a GENERIC ERROR message to the topology server with cause: connection failure. This message is to implicate the target element for which the connection attempt failed. The topology server, upon determining that the implicated element is within its area, is to send the GENERIC ERROR message to the gatekeeper serving the zone of the target element. The receiving gatekeeper may use this information to perform diagnostics to determine whether the target element has failed. If the target element has indeed failed, this case is handled as described in section 3.8.

3.11 Connection Failure Between Elements in Different Areas

The gatekeeper serving the originating element follows the procedure described in section 3.10. This time, however, the topology server determines that the target element is in an area served by another topology server. The originating topology server then sends the GENERIC ERROR message to the destination topology server. The destination topology server then locates the zone in which the target element is a member and sends the GENERIC ERROR message to the gatekeeper for that zone. The receiving gatekeeper follows the procedure described in section 3.10.

4. References

[1] ITU-T Recommendation H.323, 1998.
[2] ITU-T Recommendation H.225.0, Call Signalling Protocols and Media Stream Packetization for Packet Based Multimedia Communications Systems, February 1998.
[3] Fenner, W., Internet Group Management Protocol, Version 2, RFC 2236, November 1997.
[4] Moy, J., OSPF Version 2, STD 54, RFC 2328, April 1998.
[5] Rekhter, Y., and T. Li, A Border Gateway Protocol 4 (BGP-4), RFC 1771, March 1995.

5. Full Copyright Statement

Copyright (C) The Internet Society (1996). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

6. security

authentication and security issues for pgrp are to be addressed in a future version of this document.

7. Author's Address

ronald h. davis
Lucent Technologies
2000 North Naperville Road
Naperville, IL 60566-7033
Phone: 630.979.1720
email: ronald.h.davis@lucent.com