Internet Draft
draft-bala-protection-restoration-signaling-00.txt
Expires on: 5/14/2002

Bala Rajagopalan
Debanjan Saha
Tellium, Inc.

G. Bernstein
Ciena Corp.

Vishal Sharma
Metanoia, Inc.

Ayan Banerjee
John Drake
Jonathan Lang
Calient Networks

Jennifer Yates
Guangzhi Li
AT&T

Signaling for Protection and Restoration in Optical Mesh Networks

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

Protection and restoration of switched connections under tight time constraints is a challenging problem in optical mesh networks. This draft describes different local and end-to-end protection modes for connections, and the message flow required for protection and restoration-related signaling.

Expires on 5/14/2002 Page 1 draft-bala-protection-restoration-signaling-00.txt

2. Introduction

Protection and restoration of switched connections under tight time constraints is a challenging problem in optical mesh networks. Such a network consists of optical or photonic cross-connects (referred to as "nodes") connected in a general topology [1].
Restoration typically involves the activation of an alternate (or "protection") path for a connection when a failure is encountered in the primary (or "working") path. A path for a connection (working or protection) is characterized by an ingress port, an egress port, and a set of intermediate nodes and links through which the connection is routed. The working and protection paths are typically resource-disjoint (e.g., node or link disjoint) other than the ingress and egress ports, which remain the same.

A bi-directional link between neighboring nodes is usually realized as a pair of unidirectional links. The end-to-end path for a bi-directional connection therefore consists of a series of bi-directional segments between the source and destination nodes, traversing intermediate nodes.

The following distinction is made between the terms "protection" and "restoration", even though these terms are often used interchangeably [2]. Protection is defined as the paradigm whereby a dedicated protection path is pre-established for a connection, and the connection is merely switched at the endpoints from the working to the protection path after a failure. The term restoration, on the other hand, is used to denote the paradigm whereby a protection path for a connection may be selected a priori, but its establishment occurs only after a failure in the working path. This distinction is subtle, and both protection and restoration require signaling.

Protection can be "local span" or "end-to-end". Local span protection refers to the protection of the link (and hence connection segments routed over the link) between two neighboring switches. End-to-end protection refers to the protection of an entire connection from the ingress to the egress port. A connection may be subject to both local span protection (for each of its segments) and end-to-end protection (when local protection does not succeed or is not desired).
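The path model introduced above (ingress port, egress port, intermediate nodes and links, with working and protection paths disjoint except at the endpoints) can be illustrated with a minimal Python sketch. The class name `Path`, its field names, and the string identifiers are assumptions made for illustration; the draft does not define any data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical record for a working or protection path."""
    ingress_port: str
    egress_port: str
    nodes: Tuple[str, ...]  # every node traversed, end nodes included
    links: Tuple[str, ...]  # link IDs in traversal order


def link_disjoint(working: Path, protection: Path) -> bool:
    # True when the two paths have no link in common.
    return not set(working.links) & set(protection.links)


def node_disjoint(working: Path, protection: Path) -> bool:
    # The end nodes are shared by definition, so only the
    # intermediate nodes are compared.
    return not set(working.nodes[1:-1]) & set(protection.nodes[1:-1])


def same_endpoints(working: Path, protection: Path) -> bool:
    # Ingress and egress ports remain the same on both paths.
    return (working.ingress_port == protection.ingress_port
            and working.egress_port == protection.egress_port)


working = Path("A:1", "B:7", ("A", "C", "B"), ("A-C", "C-B"))
protection = Path("A:1", "B:7", ("A", "D", "B"), ("A-D", "D-B"))
```

With these two example paths, all three checks hold: the paths share only their ingress and egress ports.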
Under local span and end-to-end protection schemes, it may be required that when a failure affects any one direction of the connection, both directions of the connection are switched to a new link or path, respectively. In the following, therefore, any reference to a "link" indicates a bi-directional link (realized as a pair of unidirectional links), unless noted otherwise.

2.1 Local Span Protection

Considering local span protection, suppose a connection segment is routed over link i between two nodes A and B. The following protection modes may be used:

1+1 (unidirectional): A dedicated link j is pre-assigned to protect working link i. Connection traffic is simultaneously sent on both links and received independently by A and B from one of the functioning links, i or j. Thus, it is possible that nodes A and B may be receiving traffic from different links.

1+1 (bi-directional): A dedicated link j is pre-assigned to protect link i. Connection traffic is simultaneously sent on both links and, under normal conditions, the traffic from link i is received by nodes A and B (in the appropriate directions). A failure affecting link i results in both A and B switching to the traffic on link j in the respective directions.

1:N: A dedicated link j between A and B is pre-assigned to protect a set of N links (which includes i). A failure affecting any link in this set results in the corresponding traffic being restored to link j. Clearly, if more than one link in the set of N links is concurrently affected by failures, the traffic on only one of the N links may be restored over link j.

M:N (with pre-configured protection groups): A protection group of M+N links consists of a set of N links protected by a set of M other links, with M < N. (Link i must be one of the N links in a protection group, and link j is one of the M links).
A failure in any of the N links results in traffic being switched to one of the (available) M links. The number of protection groups between A and B, the values of M and N, as well as the specific links in each protection group are pre-configured. Since M < N, it is possible that not all failed links in the set of N links may be protected from the same failure event.

M:N (pooled protection links): Under this mode, a total of M links are assigned to protect a total of N other links between A and B, where M+N is the *total* number of links between A and B. (Link i must be one of the set of N links, and link j is one of the set of M links). This mode thus differs from the previous one, where there could be multiple hard-configured protection groups and working links in one group cannot be protected by protection links in another group. Furthermore, under this mode, the number of protection links need not be pre-configured and may vary depending on the demand from working traffic. Indeed, any available link can be used for protection purposes during a failure event. Thus, this is a more general and flexible protection mode than the previous one. For administrative purposes, however, the values of M and N, and the specific links in the set of M protection links, may be pre-determined. As before, since M < N, it is possible that not all failed links in the set of N links may be protected from the same failure event.

Since M:N (with pre-configured protection groups) and 1:N are special cases of M:N (pooled protection links), we focus on the signaling requirements for M:N protection with pooled protection links.

2.2 End-to-End Protection and Restoration

Considering end-to-end protection, suppose a connection's primary path is from an ingress port in node A to an egress port in node B over a set of intermediate nodes.
The following protection and restoration modes may be used:

1+1 (unidirectional) protection: A dedicated, resource-disjoint alternate path is pre-established to protect the connection. Connection traffic is simultaneously sent on both paths and received from one of the functional paths by the end nodes, A and B.

1+1 (bi-directional) protection: A dedicated, resource-disjoint alternate path is pre-established to protect the connection. Connection traffic is simultaneously sent on both paths; under normal conditions, the traffic from the primary path is received by nodes A and B (in the appropriate directions). A failure affecting the primary path results in both A and B switching to the traffic on the back-up path in the respective directions.

Shared mesh restoration: An alternate path is pre-assigned to protect the connection, but the resources along the alternate path may be shared among multiple connections being protected (based on criteria described later). In this case, the resources are allocated in real-time for one of the protected connections whose primary path is affected by a failure. If more than one connection sharing a resource is concurrently affected by a failure, only one of them will be allocated the shared resource.

New protocol mechanisms are required to realize both 1+1 (bi-directional) protection and shared mesh restoration in optical networks. Specifically, 1+1 (bi-directional) protection requires coordination between the end nodes to switch to the protection path, and shared mesh restoration additionally involves the intermediate nodes in the protection path.

The aim of this draft is to define the message flows for M:N pooled local span protection, 1+1 (bi-directional) protection and shared mesh restoration. The main requirements on these protocols are simplicity and speed.
The latency requirement on switching to protection paths is typically specified in tens to hundreds of milliseconds, the performance depending on the number of hops involved [2].

Sections 3 and 4 describe local span and end-to-end protection protocols in detail. Section 5 describes certain administrative procedures related to restoration. Section 6 presents some discussion items and Section 7 presents the conclusions.

3. Local Span Protection

Local span protection is described with respect to two neighboring nodes A and B. The scenario considered for local span protection (M:N with pooled protection links) is as follows:

o At any point in time, there are two sets of links between A and B, i.e., a working set of N (bi-directional) links carrying traffic subject to protection and a protection set of M (bi-directional) links. A protection link may have no traffic on it, or it may be carrying traffic that could be preempted. There is no a priori relationship between the two sets of links, but the values of M and N may be pre-configured. The specific links in the protection set MAY be pre-configured to be physically diverse to avoid the possibility that failure events affect a large proportion of protection links (along with working links).

o When a link in the working set is affected by a failure, the traffic on it is diverted to a link in the protection set, if such a link is available. Note that such a link might carry more than one connection, e.g., an OC-192 link carrying four OC-48 connections.

o More than one link in the working set may be affected by the same failure event. In this case, there may not be an adequate number of protection links to accommodate all of the affected traffic carried by failed working links.
The set of affected working links that are actually restored over available protection links is then subject to policies (e.g., based on relative priority of working traffic). These policies are not specified in this draft.

o Each node is assumed to have an identifier, called the Node ID. Each node is also assumed to have the mapping of its local link (or port) ID to the corresponding ID at the neighbor. This mapping could be configured, or obtained automatically using a neighbor discovery procedure (e.g., LMP [3]).

o When traffic must be diverted from a failed link in the working set to a protection link, the decision as to which protection link is chosen is always made by one of the nodes, A or B. As per this draft, the node with the numerically higher Node ID is considered the "master" and it is required to both apply any policies and select specific protection links to divert working traffic. The other node is considered the "slave". The determination of the master and the slave may be based on configured information, or as a result of running a neighbor discovery procedure.

o Failure events themselves are assumed to be detected by lower layer mechanisms (e.g., SONET). Since the bi-directional links are formed by a pair of unidirectional links, a failure in the link from A to B is typically detected by B and a failure in the opposite direction is detected by A. It is possible that a failure simultaneously affects both directions of the bi-directional link. In this case, A and B will concurrently detect failures, in the B-to-A direction and in the A-to-B direction, respectively.

The basic steps in local span protection are as follows:

1. If the master detects a failure of a working link, it autonomously invokes a process to allocate a protection link to the affected traffic.

2. If the slave detects a failure of a working link, it must inform the master of the failure.
The master then invokes the same procedure as above to allocate a protection link. (It is possible that the master has itself detected the same failure, for example, a failure simultaneously affecting both directions of a link).

3. Once the master has determined the identity of the protection link, it indicates this to the slave and requests the switchover of the traffic. Prior to this, if the protection link is carrying traffic that could be preempted, the master stops using the link for this traffic (i.e., the traffic is dropped by the master and not forwarded into or out of the protection link).

4. The slave sends an acknowledgement to the master. Prior to this, if the selected protection link is carrying traffic that could be preempted, the slave stops using the link for this traffic (i.e., the traffic is dropped by the slave and not forwarded into or out of the protection link). It then starts sending the (failed) working link traffic on the selected protection link.

5. When the master receives the acknowledgement, it starts sending and receiving the (failed) working link traffic over the new link.

From the description above, it is clear that local span protection may require up to three messages for each working link being switched: a failure indication message, a switchover request message and a switchover response message. The following identifiers are also needed:

3.1 Identifiers

Node ID: An identifier that uniquely identifies each node in the network.

Link ID: An identifier that uniquely identifies a bi-directional link at the sending and the receiving node.

The messages are as follows. All these messages must be transmitted reliably from the message source to the message destination (master or slave).

3.2 Failure Indication Message

This message is sent from the slave to the master to indicate the failure of one or more working links.
(This message may not be necessary when the underlying link technology itself provides for such a notification). The number of links included in the message would depend on the number of failures detected within a window of time by the sending node. A node may choose to send separate failure indication messages in the interest of completing the restoration for a given link within an implementation-dependent time constraint.

The ID of the failed link is the identification used at the slave node. The master must convert this to the corresponding ID at its side.

3.3 Switchover Request Message

This message is sent from the master to the slave (reliably) to indicate whether the traffic on the failed working link can be switched to a free link. If so, the ID of the free link must be indicated. The link IDs are based on the identification used at the master. The slave must convert them to the corresponding local IDs. The message ID uniquely identifies the message at the master.

A link being protected may carry multiple connections. Since the entire working link is switched to a protection link, it may be possible for the connections on the working link to be mapped to the protection link by the master and slave without coordination (e.g., if the channel assignments (i.e., "labels") are the same on the working and protection links). Optionally, if it is necessary, the channel assignments (labels) may be explicitly coordinated between the master and the slave (e.g., when a smaller capacity link is protected by a larger capacity link). In this case, the Switchover Request message should carry the new label mappings selected by the master.

The master may not be able to find protection links to accommodate all failed working links.
Thus, if this message is generated in response to a Failure Indication message from the slave, then the set of failed links in the message may be a subset of the links received in the Failure Indication message. Depending on time constraints, the master may switch the set of failed links in smaller batches. Thus, a failure event may result in the master sending more than one Switchover Request message to the same slave node.

3.4 Switchover Response Message

This message is sent from the slave to the master (reliably) to indicate the completion (or failure) of switchover at the slave. In this message, the slave may indicate that it cannot switch over to the corresponding free link for some reason. The action to be taken by the master in this case is undefined (for example, the master may abort the switchover of the traffic on the failed working link, and perhaps trigger end-to-end protection).

3.5 Preventing Unintended Connections

An unintended connection occurs when traffic from the wrong source is delivered to a receiver. Such connections should be prevented during protection switching. This is a concern only when the protection link is being used to carry (unprotected) traffic that could be preempted. In this case, it must be ensured that the traffic being switched from the failed working link to the protection link is not delivered to the receiver of the preempted traffic. Thus, in the message flow described above, the master should disconnect (any) preempted traffic on the selected protection link before sending the Switchover Request. The slave should also disconnect preempted traffic before sending the Switchover Response. In addition, the slave should start receiving traffic for the protected connection from the protection link. Finally, the master should start sending protected traffic on the protection link upon receipt of the Switchover Response.

4. End-to-End Protection

One of the significant differences between end-to-end protection and local span protection (as considered in this draft) is that the former is on a per-connection basis while the latter is on a per-link basis. In other words, span protection switches over the entire traffic on a link, which may consist of multiple connections. End-to-end protection, on the other hand, switches over individual connections. In this case, there is a working connection path and a protection path.

Another difference between end-to-end and local protection is that signaling messages may have to be transmitted over multiple hops to effect restoration. The signaling messages are transmitted to the source of the connection. The messages are typically forwarded along the connection path, working or protection, where it is assumed that there is a control channel between each pair of intermediate nodes. If the optical network has routing intelligence, some of these messages can also be routed over other paths.

There are two cases to be considered: signaling for bi-directional 1+1 protection and for shared mesh restoration. The description below is in the context of an end-to-end connection between a source node A and a destination node B.

4.1 Bi-directional 1+1 Protection

Under bi-directional 1+1 protection, the connection traffic is being sent on both working and protection paths by A and B, but received only from the working path. After a failure event, signaling between A and B is required to ensure that both A and B start receiving from the protection path.

When a node in the working path detects a failure event, it must send a failure indication signal towards the source of the connection. This message may be forwarded along the working path, or routed over a different path if the network has general routing intelligence.
Mechanisms provided by the lower layer may also be used for this, if available.

The action when the source node is notified of a failure is as follows:

o Start receiving from the protection path. At the same time, send a message to the destination node to enable switching at the destination.

The action when the destination node receives the above message is as follows:

o Start receiving from the protection path. At the same time, send an acknowledgement to the source node.

(These two messages may be forwarded along the protection path if no other routing intelligence is available in the network.)

4.1.1 Identifiers

Connection ID: A unique ID for each connection.

Source ID: ID of the source (e.g., IP address).

Destination ID: ID of the destination (e.g., IP address).

4.1.2 Nodal Information

Each node that is on the working or protection path of a connection must at least have knowledge of the connection identifier, the previous and next nodes in the connection path and the type of protection being afforded to the connection (i.e., 1+1 or shared). This is so that restoration-related messages may be forwarded properly. The optical network may also have additional routing intelligence. In this case, messages may be forwarded along paths different from the connection path.

The nodal information may be assembled when the working and protection paths of the connections are provisioned using signaling, or may be configured in the case of NMS-based provisioning. The information must remain until the connection is explicitly de-provisioned.

4.1.3 End-to-End Failure Indication Message

This message is sent (reliably) by an intermediate node towards the source of a connection. For instance, such a node might have attempted local span protection and failed. This message may not be necessary if the lower layer provides mechanisms for detection of connection failure by the endpoints.
Consider a node detecting a link failure. The node must determine the identities of all connections that are affected by the failure of the link, and send an end-to-end failure indication message to the source of each connection. Each intermediate node receiving such a message must determine the appropriate next node to forward the message to, such that the message reaches the connection source. Furthermore, if an intermediate node is itself generating a failure indication message, there should be a mechanism to suppress all but one source of failure indication messages. Finally, the failure indication message must be sent reliably from the node detecting the failure to the connection source. Reliability may be achieved, for example, by re-transmitting the message until an acknowledgement is received.

4.1.4 End-to-End Failure Acknowledge Message

This message is sent by the source node in response to an End-to-End Failure Indication message. This message is sent to the originator of the failure indication message. The acknowledge message should be sent for each failure indication message received. Each intermediate node receiving the acknowledge message must forward it towards the destination of the message.

4.1.5 End-to-End Switchover Request Message

This message is generated by the source node receiving an indication of failure in a connection. It is sent to the connection destination, and it carries the Connection ID of the connection being restored. This message must indicate whether the source is able to switch over to the protection path or not. If the source is not able to switch over, the destination should not switch over either. The End-to-End Switchover Request message must be sent reliably from the source to the destination of the connection.
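The end-node behavior described in Section 4.1 can be sketched as follows. This is a minimal Python illustration, not a definition of the protocol: the class names `EndNode` and `Message`, the message kind strings, and the `outbox` list standing in for reliable transport are all assumptions. The draft also does not say how a source treats repeated failure indications for the same connection; the sketch assumes that each one is acknowledged but only the first triggers a Switchover Request.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

WORKING = "working"
PROTECTION = "protection"


@dataclass(frozen=True)
class Message:
    kind: str        # hypothetical message type name
    to: str          # node the message is addressed to
    conn_id: str     # Connection ID (Section 4.1.1)
    ok: bool = True  # whether the sender was able to switch over


@dataclass
class EndNode:
    node_id: str
    receive_from: Dict[str, str] = field(default_factory=dict)
    requested: Set[str] = field(default_factory=set)
    outbox: List[Message] = field(default_factory=list)

    def add_connection(self, conn_id: str) -> None:
        # Under 1+1, traffic is sent on both paths and received
        # from the working path until a failure occurs.
        self.receive_from[conn_id] = WORKING

    def on_failure_indication(self, conn_id: str, reporter: str,
                              destination: str,
                              can_switch: bool = True) -> None:
        # Source side: every indication is acknowledged.
        self.outbox.append(Message("FailureAck", reporter, conn_id))
        if conn_id in self.requested:
            return  # assumption: duplicates do not trigger a new request
        self.requested.add(conn_id)
        if can_switch:
            self.receive_from[conn_id] = PROTECTION
        self.outbox.append(
            Message("SwitchoverRequest", destination, conn_id, can_switch))

    def on_switchover_request(self, conn_id: str, source: str,
                              source_ok: bool) -> None:
        # Destination side: switch only if the source could switch,
        # then answer the request.
        if source_ok:
            self.receive_from[conn_id] = PROTECTION
        self.outbox.append(
            Message("SwitchoverResponse", source, conn_id, source_ok))
```

In a run of this sketch, a source that receives a failure indication from an intermediate node queues an acknowledgement and a Switchover Request, and the destination that processes the request selects the protection path and queues a response. Retransmission until acknowledgement is left out.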
4.1.6 End-to-End Switchover Response Message

This message is sent by the destination node receiving an End-to-End Switchover Request message towards the source of the connection. This message should indicate the Connection ID of the connection being switched over. This message must be transmitted in response to each End-to-End Switchover Request message received.

4.2 Shared Mesh Restoration

Shared mesh restoration requires prior soft-reservation of capacity along the protection path [4]. Furthermore, after a failure event, the protection path must be explicitly activated. This requires actions at each intermediate node along the protection path. It is possible that a protection path may not be successfully activated when multiple, concurrent failure events occur. In this case, shared mesh restoration capacity may be claimed for more than one failed connection and the protection path can be activated only for one of them (at most).

For implementing shared mesh restoration, the identifier and nodal information related to signaling along the control path are as defined for 1+1 protection in Sections 4.1.1 and 4.1.2. In addition, each node must also keep information needed to establish the data plane of the protection path. This information could be fine-grained, indicating the cross-connect that must be established to activate the protection path for each connection, as follows:

{ Connection ID, , }

The precise nature of the Port, Channel, etc. information would depend on the type of node and connection (the Generalized MPLS signaling draft describes different types of switches [5]). On the other hand, this information could be coarse-grained, indicating

{ Connection ID, , }

In this case, a specific component link and channel on the TE link is allocated only when the protection path is activated.
While the coarser specification allows some flexibility in selection of the precise resource to activate, it also brings in more complexity in decision making and signaling during the time-critical restoration phase. Furthermore, the procedures for the assignment of bandwidth to protection paths must take into account the total resources in a TE link so that single-failure survivability requirements are satisfied.

4.2.1 End-to-End Failure Indication and Acknowledgement

The End-to-End failure indication and acknowledgement procedures and messages are as defined in Sections 4.1.3 and 4.1.4.

4.2.2 End-to-End Switchover Request

This message is generated by the source node receiving an indication of failure in a connection. It is sent to the connection destination along the protection path, and it carries the Connection ID of the connection being restored. This message must allow intermediate nodes to record whether they are able to activate the (shared) protection path. If any intermediate node is not able to establish cross-connects for the protection path, then it is desirable that no other node in the path establishes cross-connects for the path. This would allow shared mesh restoration paths to be efficiently utilized. This requirement implies that switchover to the protection path occurs in two phases: in the forward phase, the Switchover Request message indicates the switching over action to intermediate nodes in the protection path and collects information as to their ability to switch over. In the reverse phase, the actual switchover occurs if all nodes in the path indicate their ability to switch over.

The End-to-End Switchover Request message must be sent reliably from the source to the destination of the connection along the protection path.

4.2.3 End-to-End Switchover Response

This message is sent by the destination node receiving an End-to-End Switchover Request message towards the source of the connection, along the protection path.
This message should indicate the ID of the connection being switched over, and whether all intermediate nodes have agreed to switch over (as determined in the forward phase using the Switchover Request message). This message must be transmitted in response to each End-to-End Switchover Request message received.

5. Reversion and other Administrative Procedures

Reversion refers to the process of moving a connection back to the original working path from its protection path once the working path has been repaired following a failure. Reversion applies both to local span and end-to-end path protected connections. Reversion is desired for the following reasons. First, the routing of the protection path often may not be as efficient as the routing of the working path. Second, moving a connection to its working path allows the protection resources to be used to protect other connections. Reversion implies that a working path remains allocated to the connection that was originally routed over it even after a failure.

It is important to have mechanisms that allow reversion to be performed without disrupting service to the customer. This can be achieved if reversion is implemented using a "bridge-and-switch" approach (often referred to as make-before-break). The basic steps involved in bridge-and-switch are:

1. The source node commences the process by "bridging" the signal onto both the working and the protection paths (or links in the case of span protection).

2. Once the bridging process is complete, the source node sends a Bridge and Switch Request message to the destination, identifying the connection and other information necessary to perform reversion. Upon receipt of this message, the destination selects the signal from the working path. At the same time, it bridges the transmitted signal onto both the working and protection paths.

3. The destination then sends a Bridge and Switch Response message to the source confirming the completion of the operation.

4. When the source receives this message, it switches to receive from the working path, and stops transmitting traffic on the protection path. The source then sends a Bridge and Switch Completed message to the destination confirming that the connection has been reverted.

5. Upon receipt of this message, the destination stops transmitting along the protection path and de-activates the connection along this path. The de-activation procedure should remove the cross-connections along the protection path (and free the resources to be used for restoring other failures).

Administrative procedures other than reversion include the ability to force a switchover (from working to protection or vice versa), and locking out switchover, i.e., administratively preventing a connection from moving from working to protection or vice versa. These administrative conditions have to be supported by signaling.

6. Discussion

6.1 Relationship between Local and End-to-End Protection Procedures

In general, local protection may be attempted before invoking end-to-end protection. The exception to this is when end-to-end 1+1 protection is used for a connection. In this case, it is better to directly invoke end-to-end protection since alternate path resources are already active for the connection. Thus, the general guideline that may be considered is to note the protection type of connections in intermediate nodes during provisioning, and invoke local span protection only for working links carrying connections that are not 1+1 protected end-to-end. This implies that when a working link carries more than one connection, all the connections must have the same end-to-end protection type. The provisioning process must ensure this.
If this is not possible, then local span protection may be invoked for working links that have at least one connection that is not end-to-end 1+1 protected.

6.2 Connection Priorities During Protection

The local protection procedure described in this draft switches all the connections on a failed working link onto a protection link. The advantage of this approach is that the signaling between nodes is at the level of links and not at the level of connections. This is beneficial if a link could potentially carry a large number of connections. On the other hand, it limits flexibility, since a working link must carry connections of similar priority. Otherwise, it is not possible to ensure that higher priority connections are favored over lower priority connections when a failure event affects more than one working link and there are fewer protection links than failed working links.

Also, under the above failure scenario, a decision must be made as to which working links (and therefore connections) are chosen to be protected and in what priority order. In general, a node may detect failed working links one after another rather than simultaneously. In this case, as per the proposed signaling procedures, connections on a working link may be switched over to a given protection link, but another failure (of a working link carrying higher priority connections) may be detected soon afterwards. The newly affected connections may then bump the ones previously switched over to the protection link.

In the case of end-to-end shared mesh restoration, priorities may be implemented for allocating shared link resources under multiple failure scenarios. Note that shared mesh restoration works under the assumption that the primary paths of connections whose backup paths share resources are SRLG-disjoint [1]. Under single-failure scenarios, this ensures that only one connection will "claim" the allocated (shared) resource.
Under multiple failure scenarios, however, more than one connection can claim shared resources. If such resources are allocated to a lower priority connection, they may have to be reclaimed and allocated to a higher priority connection. Furthermore, the lower priority connection must be de-provisioned along the protection path (this can be done using the signaling mechanisms developed for provisioning, rather than restoration signaling). The proposed signaling mechanisms can support connection-priority based allocation of shared resources during restoration signaling (specifically, during the Switchover Response step). A way to simplify end-to-end shared mesh restoration is to allocate a given set of shared resources only to connections of the same priority. This way, a connection will not first be allocated shared resources and then be bumped from the protection path.

6.3 Routing Aspects

To compute end-to-end protection paths, it is necessary to know which network resources can be used. For end-to-end 1+1 protection, any free resource in the network can be used. In this regard, the computation of the working and the protection paths is similar. For shared mesh restoration, however, it is necessary to know the availability of shareable as well as free resources. Generally, protection paths may share resources if the corresponding working paths will not be affected by the same failure. Thus, to determine shareable resources for a given protection path optimally, it is necessary to have full information about other working paths. Maintaining this sort of information may be feasible in a centralized routing implementation, but it may not be scalable under distributed routing. Under distributed routing, heuristics are often used to provision shared protection paths [12].
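
The sharing condition stated above can be illustrated with a minimal sketch. It assumes a centralized view in which the SRLG membership of every link and the working paths of all connections are known. The data model (a mapping from link IDs to SRLG IDs) and the function names path_srlgs, can_share, and is_shareable are hypothetical and are not defined by this draft.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical data model: each link ID maps to the set of SRLG IDs it belongs to.
LinkSrlgs = Dict[str, Set[int]]


def path_srlgs(path: Iterable[str], link_srlgs: LinkSrlgs) -> Set[int]:
    """Collect every SRLG touched by the links of one working path."""
    srlgs: Set[int] = set()
    for link in path:
        srlgs |= link_srlgs[link]
    return srlgs


def can_share(working_a: List[str], working_b: List[str],
              link_srlgs: LinkSrlgs) -> bool:
    """Two working paths are SRLG-disjoint if no single SRLG failure hits both."""
    return path_srlgs(working_a, link_srlgs).isdisjoint(
        path_srlgs(working_b, link_srlgs))


def is_shareable(new_working: List[str],
                 protected_workings: List[List[str]],
                 link_srlgs: LinkSrlgs) -> bool:
    """A protection resource may be shared by a new connection only if its
    working path is SRLG-disjoint from every working path already protected
    by that resource."""
    return all(can_share(new_working, w, link_srlgs)
               for w in protected_workings)


# Illustrative topology data (assumed, not taken from the draft).
LINK_SRLGS: LinkSrlgs = {
    "A-B": {1}, "B-C": {2}, "A-D": {3}, "D-C": {2, 4}, "E-F": {5},
}
W1 = ["A-B", "B-C"]  # touches SRLGs 1 and 2
W2 = ["A-D", "D-C"]  # touches SRLGs 2, 3 and 4 (shares SRLG 2 with W1)
W3 = ["E-F"]         # touches SRLG 5 only

print(is_shareable(W3, [W1], LINK_SRLGS))  # True
print(is_shareable(W2, [W1], LINK_SRLGS))  # False
```

The check needs the working paths of all connections that already use the resource, which is the full-information requirement noted above. A distributed heuristic would have to approximate this test with whatever summary information is advertised.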
The specific routing information to be propagated and the signaling for the provisioning of shared protection paths are topics to be dealt with in separate drafts.

6.4 Multi-Domain Restoration

When an end-to-end connection follows a path through multiple routing or administrative domains, it may be necessary to consider an intermediate form of restoration, called "intra-domain end-to-end restoration". With this approach, a failure within a domain would result in end-to-end restoration between the connection ingress and egress points within the domain (perhaps after local span restoration is attempted). When this fails, or if a failure occurs in an inter-domain link, full end-to-end restoration could be attempted (inter-domain links could also be subject to local span protection). This type of structured approach to restoration is particularly useful in the near term, when an optical network may be constructed by interconnecting multi-vendor optical subnetworks [1]. In this case, intra-domain restoration may be proprietary, with standard restoration signaling implemented between border nodes. This type of restoration, however, also requires some hardware support at the border nodes.

6.5 Optical mesh restoration and MPLS-based recovery

Over the past year or so, there has been considerable work on MPLS-based recovery under the auspices of the MPLS WG (see, for example, [6-11]), with a framework document [6] being adopted as a WG document. The terminology outlined at the start of this document is also explained in the MPLS-recovery framework document [6], in the context of MPLS LSP-based recovery. The failure indication message of Section 4 is quite similar to the failure indication signal (FIS) defined in [7], and elaborated on in [10] and [11].
A difference between the schemes and message formats discussed in this document and those presented in [7], [10], and [11] is that those documents focus primarily on MPLS LSP restoration. As such, the messages defined therein contain explicit label information for packet LSPs, which is not required in optical networks. Further, [7] does not specifically cover the coordinated signaling required for local span protection and for M:N protection with pooled protection links, which are central to this proposal.

6.6 Implementation Considerations

As described in this draft, restoration signaling does not require any central actions (such as admission control or centralized resource allocation) within a node for end-to-end protection. Local span protection may require the consideration of all available protection link resources at the master. End-to-end protection, which is more difficult from a latency perspective, can be handled by running multiple, independent protocol instances in a node such that each instance covers a subset of the connections passing through that node. Such optimizations would depend on the architecture of the systems implementing the proposed protocol.

7. Conclusion

In this draft, the signaling message flows for protection and restoration in optical mesh networks were described. The protection modes considered were local span protection and end-to-end protection, both 1+1 and shared. Specific protocol realizations of the message flows will be described in other drafts.

8. References

1. B. Rajagopalan, et al., "IP over Optical Networks: A Framework," draft-ietf-ipo-framework-00.txt.

2. W. S. Lai, et al., "Network Hierarchy and Multilayer Survivability," Internet Draft, draft-team-tewg-restore-hierarchy-00.txt, July 2001.

3. J. P. Lang, et al., "Link Management Protocol," draft-ietf-mpls-lmp-02.txt.

4. G. Li, et al., "RSVP-TE Extensions For Shared-Mesh Restoration in Transport Networks," draft-li-shared-mesh-restoration-00.txt.

5. P. Ashwood-Smith, et al., "Generalized MPLS: Signaling Functional Specification," draft-ietf-mpls-generalized-signaling-06.txt.

6. Makam, et al., "Framework for MPLS-based Recovery," draft-ietf-mpls-recovery-frmwrk-03.txt.

7. K. Owens, et al., "A Path Protection/Restoration Mechanism for MPLS Networks," draft-chang-mpls-path-protection-03.txt.

8. S. Kini, et al., "Shared Backup Label Switched Path Restoration," draft-kini-restoration-shared-backup-01.txt.

9. F. Hellstrand and L. Andersson, "Extensions to CR-LDP and RSVP-TE for setup of pre-established recovery tunnels," draft-hellstrand-recovery-merge-01.txt.

10. K. Owens, et al., "Extensions to RSVP-TE for MPLS Path Protection," draft-chang-mpls-rsvpte-path-protection-ext-01.txt.

11. K. Owens, et al., "Extensions to CR-LDP for MPLS Path Protection," draft-owens-mpls-crldp-path-protection-ext-01.txt.

12. S. Sengupta and R. Ramamurthy, "Capacity Efficient Distributed Routing of Mesh-Restored Lightpaths in Optical Networks," Proc. IEEE Globecom 2001, November 2001.

9. Author Information

Bala Rajagopalan
Debanjan Saha
Tellium, Inc.
2 Crescent Pl.
Ocean Port, NJ 07757
Email: {braja, dsaha}@tellium.com

Greg Bernstein
Ciena Corp.
10480 Ridgeview Ct.
Cupertino, CA 94014
Email: Greg@ciena.com

Vishal Sharma
Metanoia, Inc.
305 Elan Village Lane, Unit 121
San Jose, CA 95134
Email: V.Sharma@ieee.org

Ayan Banerjee
John Drake
Jonathan Lang
Calient Networks
5853 Rue Ferrari
San Jose, CA 95138
Email: {abanerjee, Jdrake, jplang}@calient.net

Jennifer Yates
Guangzhi Li
AT&T
180 Park Ave.
Florham Park, NJ 07932
Email: {jyates, gli}@research.att.com