Internet Draft                                  Bala Rajagopalan
draft-bala-protection-restoration-signaling-    Debanjan Saha
00.txt                                            Tellium, Inc.
Expires on: 5/14/2002                           G. Bernstein
                                                  Ciena Corp.
                                                Vishal Sharma
                                                  Metanoia, Inc.
                                                Ayan Banerjee
                                                John Drake
                                                Jonathan Lang
                                                  Calient Networks
                                                Jennifer Yates
                                                Guangzhi Li
                                                  AT&T


   Signaling for Protection and Restoration in Optical Mesh Networks


Status of this Memo


   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 except that the right to
   produce derivative works is not granted.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


1. Abstract

   Protection and restoration of switched connections under tight time
   constraints is a challenging problem in optical mesh networks. This
   draft describes different local and end-to-end protection modes for
   connections, and the message flow required for protection and
   restoration-related signaling.


                         Expires on 5/14/2002                   Page 1

          draft-bala-protection-restoration-signaling-00.txt


2. Introduction

   Protection and restoration of switched connections under tight time
   constraints is a challenging problem in optical mesh networks. Such
   a network consists of optical or photonic cross-connects (referred
   to as "nodes") connected in a general topology [1]. Restoration
   typically involves the activation of an alternate (or "protection")
   path for a connection when a failure is encountered in the primary
   (or working) path. A path for a connection (working or protection)
   is characterized by an ingress port, an egress port, and a set of
   intermediate nodes and links through which the connection is routed.
   The working and protection paths are typically resource-disjoint
   (e.g., node or link disjoint) other than the ingress and egress
   ports which remain the same.

   A bi-directional link between neighboring nodes is usually realized
   as a pair of unidirectional links. The end-to-end path for a bi-
   directional connection therefore consists of a series of bi-
   directional segments between the source and destination nodes,
   traversing intermediate nodes.

   The following distinction is made between the terms "protection" and
   "restoration", even though these terms are often used
   interchangeably [2]. Protection is defined as the paradigm whereby a
   dedicated protection path is pre-established for a connection, and
   the connection is merely switched at the endpoints from the working
   to the protection path after a failure. The term restoration, on the
   other hand, is used to denote the paradigm whereby a protection path
   for a connection may be selected a priori, but its establishment
   occurs only after a failure in the working path. This distinction is
   subtle, and both protection and restoration require signaling.

   Protection can be "local span" or "end-to-end". Local span
   protection refers to the protection of the link (and hence
   connection segments routed over the link) between two neighboring
   switches. End-to-end protection refers to the protection of an
   entire connection from the ingress to the egress port. A connection
   may be subject to both local span protection (for each of its
   segments) and end-to-end protection (when local protection does not
   succeed or is not desired). Under local span and end-to-end
   protection schemes, it may be required that when a failure affects
   any one direction of the connection, both directions of the
   connection are switched to a new link or path, respectively. In the
   following, therefore, any reference to a "link" indicates a bi-
   directional link (realized as a pair of uni-directional links),
   unless noted otherwise.

2.1 Local Span Protection

   Considering local-span protection, suppose a connection segment is
   routed over link i between two nodes A and B. The following
   protection modes may be used:


        1+1 (unidirectional): A dedicated link j is pre-assigned to
        protect working link i. Connection traffic is simultaneously
        sent on both links and received independently by A and B from
        one of the functioning links, i or j. Thus, it is possible that
        nodes A and B may be receiving traffic from different links.

        1+1 (bi-directional): A dedicated link j is pre-assigned to
        protect link i. Connection traffic is simultaneously sent on
        both links and under normal conditions, the traffic from link i
        is received by nodes A and B (in the appropriate directions). A
        failure affecting link i results in both A and B switching to
        the traffic on link j in the respective directions.

        1:N: A dedicated link j between A and B is pre-assigned to
        protect a set of N links (which includes i). A failure
        affecting any link in this set results in the corresponding
        traffic being restored to link j. Clearly, if more than one
        link in the set of N links is concurrently affected by
        failures, the traffic on only one of the N links may be
        restored over link j.

        M:N (with pre-configured protection groups): A protection group
        of M+N links consists of a set of N links protected by a set of
        M other links, with M < N. (Link i must be one of the N links
        in a protection group, and link j is one of the M links). A
        failure in any of the N links results in traffic being switched
        to one of the (available) M links. The number of protection
        groups between A and B, the values of M and N, as well as the
        specific links in each protection group are pre-configured.
        Since M < N, it is possible that not all failed links in the
        set of N links may be protected from the same failure event.

        M:N (pooled protection links): Under this mode, a total of M
        links are assigned to protect a total of N other links between
        A and B, where M+N is the *total* number of links between A and
        B. (Link i must be one of the set of N links, and link j is one
        of the set of M links). This mode thus differs from the
        previous one, in which there could be multiple hard-configured
        protection groups and working links in one group cannot be
        protected by protection links in another group. Furthermore,
        under this mode, the number of protection links need not be
        pre-configured and may vary depending on the demand from
        working traffic. Indeed, any available link can be used for
        protection purposes during a failure event. Thus, this is a
        more general and flexible protection mode than the previous
        one. For administrative purposes, however, the values of M
        and N, and the specific links in the set of M protection
        links, may be pre-determined. As before, since M < N, it is
        possible that not all failed links in the set of N links may
        be protected from the same failure event.


   Since M:N (with pre-configured protection groups) and 1:N are
   special cases of M:N (pooled protection links), we focus on the
   signaling requirements for M:N protection with pooled protection
   links.
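
   The difference between pre-configured groups and a single pool can
   be illustrated with a small counting sketch. The Python fragment
   below is illustrative only: the function names and link names are
   assumptions made for this example and are not defined by this
   draft. It counts how many failed working links can be diverted
   when protection links are confined to groups and when they are
   pooled.

```python
def restorable_grouped(groups, failed):
    """groups: list of (working_links, protection_count) pairs.

    A failed working link may only use protection links of its own group.
    """
    total = 0
    for working, m in groups:
        hit = len(set(working) & set(failed))
        total += min(hit, m)
    return total


def restorable_pooled(working, m, failed):
    """All M protection links form one pool shared by all N working links."""
    hit = len(set(working) & set(failed))
    return min(hit, m)


# Two 1:2 groups versus one 2:4 pool over the same links (hypothetical IDs).
groups = [(["w1", "w2"], 1), (["w3", "w4"], 1)]
failed = ["w1", "w2"]  # both failures fall in the first group
grouped = restorable_grouped(groups, failed)
pooled = restorable_pooled(["w1", "w2", "w3", "w4"], 2, failed)
```

   With both failures in the same group, the grouped configuration
   can divert one link while the pool can divert both. Setting M to 1
   in the pooled function gives the 1:N case.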

2.2 End-to-End Protection and Restoration

   Considering end-to-end protection, suppose a connection's primary
   path is from an ingress port in node A to an egress port in node B
   over a set of intermediate nodes. The following protection and
   restoration modes may be used:

        1+1 (unidirectional) protection: A dedicated, resource-disjoint
        alternate path is pre-established to protect the connection.
        Connection traffic is simultaneously sent on both paths and
        received from one of the functional paths by the end nodes, A
        and B.

        1+1 (bi-directional) protection: A dedicated, resource-disjoint
        alternate path is pre-established to protect the connection.
        Connection traffic is simultaneously sent on both paths; under
        normal conditions, the traffic from the primary path is
        received by nodes A and B (in the appropriate directions). A
        failure affecting the primary path results in both A and B
        switching to the traffic on the back-up path in the respective
        directions.

        Shared mesh restoration: An alternate path is pre-assigned to
        protect the connection, but the resources along the alternate
        path may be shared among multiple connections being protected
        (based on criteria described later). In this case, the
        resources are allocated in real-time for one of the protected
        connections whose primary path is affected by a failure. If
        more than one connection sharing a resource is concurrently
        affected by a failure, only one of them will be allocated the
        shared resource.

   New protocol mechanisms are required to realize both 1+1 (bi-
   directional) protection and shared mesh restoration in optical
   networks. Specifically, 1+1 (bi-directional) protection requires
   coordination between the end nodes to switch to the protection path,
   and shared mesh restoration additionally involves the intermediate
   nodes in the protection path.

   The aim of this draft is to define the message flows for M:N pooled
   local span protection, 1+1 (bi-directional) protection, and shared
   mesh restoration. The main requirements on these protocols are
   simplicity and speed. The latency requirement on switching to
   protection paths is typically specified in tens to hundreds of
   milliseconds, the performance depending on the number of hops
   involved [2].


   Sections 3 and 4 describe local span and end-to-end protection
   protocols in detail. Section 5 describes certain administrative
   procedures related to restoration. Section 6 presents some
   discussion items and Section 7 presents the conclusions.


3. Local Span Protection

   Local span protection is described with respect to two neighboring
   nodes A and B. The scenario considered for local span protection
   (M:N with pooled protection links) is as follows:

   o    At any point in time, there are two sets of links between A and
        B, i.e., a working set of N (bi-directional) links carrying
        traffic subject to protection and a protection set of M (bi-
        directional) links. A protection link may have no traffic on
        it, or it may be carrying traffic that could be preempted.
        There is no a priori relationship between the two sets of
        links, but the values of M and N may be pre-configured. The
        specific links in the protection set MAY be pre-configured to
        be physically diverse to avoid the possibility that failure
        events affect a large proportion of protection links (along
        with working links).

   o    When a link in the working set is affected by a failure, the
        traffic on it is diverted to a link in the protection set, if
        such a link is available. Note that such a link might carry
        more than one connection, e.g., an OC-192 link carrying four
        OC-48 connections.

   o    More than one link in the working set may be affected by the
        same failure event. In this case, there may not be an adequate
        number of protection links to accommodate all of the affected
        traffic carried by failed working links. The set of affected
        working links that are actually restored over available
        protection links is then subject to policies (e.g., based on
        relative priority of working traffic). These policies are not
        specified in this draft.

   o    Each node is assumed to have an identifier, called the Node ID.
        Each node is also assumed to have the mapping of its local link
        (or port) ID to the corresponding ID at the neighbor. This
        mapping could be configured, or obtained automatically using a
        neighbor discovery procedure (e.g., LMP [3]).

   o    When traffic must be diverted from a failed link in the working
        set to a protection link, the decision as to which protection
        link is chosen is always made by one of the nodes, A or B. As
        per this draft, the node with the numerically higher Node ID is
        considered the "master" and it is required to both apply any
        policies and select specific protection links to divert working
        traffic. The other node is considered the "slave". The
        determination of the master and the slave may be based on
        configured information, or as a result of running a neighbor
        discovery procedure.

   o    Failure events themselves are assumed to be detected by lower
        layer mechanisms (e.g., SONET). Since the bi-directional links
        are formed by a pair of unidirectional links, a failure in the
        link from A to B is typically detected by B and a failure in
        the opposite direction is detected by A. It is possible that a
        failure simultaneously affects both directions of the bi-
        directional link. In this case, A and B will concurrently
        detect failures, in the B-to-A direction and in the A-to-B
        direction, respectively.
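
   The master/slave rule above can be sketched as follows. This is a
   minimal illustration, assuming that Node IDs are IPv4-style dotted
   strings that are compared as integers; the draft does not specify
   the Node ID format, and the function names are hypothetical.

```python
import ipaddress


def node_id_value(node_id):
    """Numeric value of a Node ID (assumed here to be an IPv4-style string)."""
    return int(ipaddress.IPv4Address(node_id))


def elect_master(node_a, node_b):
    """Return (master, slave); the numerically higher Node ID is the master."""
    value_a, value_b = node_id_value(node_a), node_id_value(node_b)
    if value_a == value_b:
        raise ValueError("Node IDs must be unique")
    return (node_a, node_b) if value_a > value_b else (node_b, node_a)
```

   Comparing numeric values rather than strings matters: as text,
   "10.0.0.9" sorts after "10.0.0.10", but numerically it is lower.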

   The basic steps in local span protection are as follows:

   1.   If the master detects a failure of a working link, it
        autonomously invokes a process to allocate a protection link to
        the affected traffic.

   2.   If the slave detects a failure of a working link, it must
        inform the master of the failure. The master then invokes the
        same procedure as above to allocate a protection link. (It is
        possible that the master has itself detected the same failure,
        for example, a failure simultaneously affecting both directions
        of a link).

   3.   Once the master has determined the identity of the protection
        link, it indicates this to the slave and requests the
        switchover of the traffic. Prior to this, if the protection
        link is carrying traffic that could be preempted, the master
        stops using the link for this traffic (i.e., the traffic is
        dropped by the master and not forwarded into or out of the
        protection link).

   4.   The slave sends an acknowledgement to the master. Prior to
        this, if the selected protection link is carrying traffic that
        could be preempted, the slave stops using the link for this
        traffic (i.e., the traffic is dropped by the slave and not
        forwarded into or out of the protection link). It then starts
        sending the (failed) working link traffic on the selected
        protection link.

   5.   When the master receives the acknowledgement, it starts sending
        and receiving the (failed) working link traffic over the new
        link.
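
   The steps above can be sketched as a small message trace. The
   following Python fragment is a simplified model, not a protocol
   implementation: the function name, the tuple layout of a message,
   and the "first free link" selection policy are assumptions made
   for illustration, and both nodes are assumed to use the same link
   IDs.

```python
def local_span_switchover(failed_link, detected_by, free_protection, preemptable):
    """Return (messages, protection_link) for one failed working link.

    detected_by: "master" or "slave".
    free_protection: list of protection link IDs, mutated as links are taken.
    preemptable: set of protection links carrying traffic that may be dropped.
    """
    messages = []
    if detected_by == "slave":
        # Step 2: the slave reports the failure to the master.
        messages.append(("slave", "master", "FailureIndication", failed_link))
    if not free_protection:
        return messages, None  # no protection link is available
    # Steps 1 and 3: the master picks a link and stops any preemptable
    # traffic on it before asking the slave to switch.
    chosen = free_protection.pop(0)
    preemptable.discard(chosen)
    messages.append(("master", "slave", "SwitchoverRequest", (failed_link, chosen)))
    # Step 4: the slave switches over and acknowledges.
    messages.append(("slave", "master", "SwitchoverResponse", (failed_link, chosen)))
    # Step 5: the master starts using the protection link (not modeled further).
    return messages, chosen
```

   A failure detected by the slave produces three messages in this
   model, while a failure detected by the master produces two.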

   From the description above, it is clear that local span protection
   may require up to three messages for each working link being
   switched: a failure indication message, a switchover request message
   and a switchover response message. The following identifiers are
   also needed:

3.1  Identifiers

   Node ID: An identifier that uniquely identifies each node in the
   network.

   Link ID: An identifier that uniquely identifies a bi-directional
   link at the sending and the receiving node.

   The messages are as follows. All these messages must be transmitted
   reliably from the message source to the message destination (master
   or slave).

3.2 Failure Indication Message

   This message is sent from the slave to the master to indicate the
   failure of one or more working links. (This message may not be
   necessary when the underlying link technology itself provides for
   such a notification).

   The number of links included in the message would depend on the
   number of failures detected within a window of time by the sending
   node. A node may choose to send separate failure indication messages
   in the interest of completing the restoration for a given link
   within an implementation-dependent time constraint.

   The ID of the failed link is the identification used at the slave
   node. The master must convert this to the corresponding ID at its
   side.
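
   The ID conversion at the master can be sketched as a table lookup.
   The message layout and names below are hypothetical, since this
   draft does not define an encoding; the mapping table is assumed to
   have been configured or learned through neighbor discovery.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FailureIndication:
    sender_node_id: int
    failed_link_ids: List[int]  # link IDs as known at the slave


def to_master_ids(msg: FailureIndication, slave_to_master: Dict[int, int]) -> List[int]:
    """Translate slave-local link IDs into the master's local link IDs.

    An unknown ID raises KeyError, which signals a stale or missing mapping.
    """
    return [slave_to_master[link_id] for link_id in msg.failed_link_ids]
```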


3.3  Switchover Request Message

   This message is sent from the master to the slave (reliably) to
   indicate whether the traffic on the failed working link can be
   switched to a free link. If so, the ID of the free link must be
   indicated.

   The link IDs are based on the identification used at the master. The
   slave must convert them to the corresponding local IDs. The message
   ID uniquely identifies the message at the master.

   A link being protected may carry multiple connections. Since the
   entire working link is switched to a protection link, it may be
   possible for the connections on the working link to be mapped to the
   protection link by the master and slave without coordination (e.g.,
   if the channel assignments (i.e., "labels") are the same on the
   working and protect links). Optionally, if it is necessary, the
   channel assignments (labels) may be explicitly coordinated between
   the master and the slave (e.g., when a smaller capacity link is
   protected by a larger capacity link). In this case, the Switchover
   Request message should carry the new label mappings selected by the
   master.

   The master may not be able to find protection links to accommodate
   all failed working links. Thus, if this message is generated in
   response to a Failure Indication message from the slave, the set of
   failed links in the message may be a subset of the links received
   in the Failure Indication message. Depending on time constraints,
   the master may switch the set of failed links in smaller batches.
   Thus, a failure event may result in the master sending more than
   one Switchover Request message to the same slave node.
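
   The subset and batching behavior can be sketched as follows. This
   is an illustration under stated assumptions: the message fields,
   the counter used for message IDs, and the rule of pairing failed
   links with free links in list order are all hypothetical. The
   failed links are assumed to be already ordered by whatever policy
   the master applies.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List

_message_ids = count(1)  # message IDs only need to be unique at the master


@dataclass
class SwitchoverRequest:
    message_id: int
    assignments: Dict[int, int]  # failed working link -> protection link (master IDs)


def build_requests(failed: List[int], free_protection: List[int],
                   batch_size: int) -> List[SwitchoverRequest]:
    """Pair failed links with free protection links and split into batches."""
    # zip stops at the shorter list, so surplus failed links stay unprotected.
    pairs = list(zip(failed, free_protection))
    requests = []
    for start in range(0, len(pairs), batch_size):
        batch = dict(pairs[start:start + batch_size])
        requests.append(SwitchoverRequest(next(_message_ids), batch))
    return requests
```

   With three failed links, two free protection links and a batch
   size of one, this yields two requests and leaves the third failed
   link without a protection link.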

3.4  Switchover Response Message

   This message is sent from the slave to the master (reliably) to
   indicate the completion (or failure) of switchover at the slave.

   In this message, the slave may indicate that it cannot switch over
   to the corresponding free link for some reason. The action to be
   taken by the master in this case is undefined (for example, the
   master may abort the switchover of the traffic on the failed working
   link, and perhaps trigger end-to-end protection).

3.5  Preventing Unintended Connections

   An unintended connection occurs when traffic from the wrong source
   is delivered to a receiver. These should be prevented during
   protection switching. This is a concern only when the protection
   link is being used to carry (unprotected) traffic that could be
   preempted. In this case, it must be ensured that the traffic being
   switched from the failed working link to the protection link is not
   delivered to the receiver of the traffic preempted. Thus, in the
   message flow described above, the master should disconnect (any)
   preempted traffic on the selected protection link before sending the
   Switchover Request. The slave should also disconnect preempted
   traffic before sending the Switchover Response. In addition, the
   slave should start receiving traffic for the protected connection
   from the protection link. Finally, the master should start sending
   protected traffic on the protection link upon receipt of the
   Switchover Response.
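
   The ordering constraints described above can be expressed as a
   simple check over an event trace. The event names below are
   invented for this sketch and do not correspond to any defined
   protocol element. A disconnect event may be absent when the
   protection link carried no preemptable traffic.

```python
# Each pair (before, after) must appear in that order when both are present.
REQUIRED_ORDER = [
    ("master_disconnect_preempted", "send_switchover_request"),
    ("slave_disconnect_preempted", "send_switchover_response"),
    ("send_switchover_request", "send_switchover_response"),
    ("send_switchover_response", "master_send_protected"),
]


def broken_constraints(events):
    """Return the (before, after) constraints that a trace violates."""
    position = {name: index for index, name in enumerate(events)}
    broken = []
    for before, after in REQUIRED_ORDER:
        if before in position and after in position:
            if position[before] > position[after]:
                broken.append((before, after))
    return broken
```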


4.  End-to-End Protection

   One of the significant differences between end-to-end protection and
   local span protection (as considered in this draft) is that the
   former is on a per-connection basis while the latter is on a per-
   link basis. In other words, span protection switches over the entire
   traffic on a link which may consist of multiple connections. End-to-
   end protection, on the other hand, switches over individual
   connections. In this case, there is a working connection path and a
   protection path.


   Another difference between end-to-end and local protection is that
   signaling messages may have to be transmitted multiple hops to
   effect restoration. The signaling messages are transmitted to the
   source of the connection. The messages are typically forwarded along
   the connection path, working or protection, where it is assumed that
   there is a control channel between each pair of intermediate nodes.
   If the optical network has routing intelligence, some of these
   messages can also be routed over other paths.

   There are two cases to be considered: signaling for bi-directional
   1+1 protection and for shared mesh restoration. The description
   below is in the context of an end-to-end connection between a source
   node A and a destination node B.

4.1  Bi-directional 1+1 Protection

   Under bi-directional 1+1 protection, the connection traffic is being
   sent on both working and protection paths by A and B, but received
   only from the working path. After a failure event, signaling between
   A and B is required to ensure that both A and B start receiving from
   the protection path.

   A node in the working path detects a failure event. Such a node must
   send a failure indication signal towards the source of the
   connection. This message may be forwarded along the working path, or
   routed over a different path if the network has general routing
   intelligence. Mechanisms provided by the lower layer may also be
   used for this, if available.

   The action when the source node is notified of a failure is as
   follows:

   o    Start receiving from the protection path. At the same time,
        send a message to the destination node to enable switching at
        the destination.

   The action when the destination node receives the above message is
   as follows:

   o    Start receiving from the protection path. At the same time,
        send an acknowledgement to the source node.

   (These two messages may be forwarded along the protection path if no
   other routing intelligence is available in the network.)
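
   The exchange between the end nodes can be sketched as follows. The
   class and function names are assumptions made for illustration.
   Because traffic is already being sent on both paths, only the
   receive selector at each end node changes.

```python
class EndNode:
    def __init__(self, name):
        self.name = name
        # Both paths carry traffic; only the receive selector moves.
        self.receive_from = "working"


def bidirectional_switch(source, destination):
    """Model the two-message exchange after the source learns of a failure."""
    messages = []
    # The source switches its receiver and asks the destination to do the same.
    source.receive_from = "protection"
    messages.append((source.name, destination.name, "SwitchoverRequest"))
    # The destination switches its receiver and acknowledges.
    destination.receive_from = "protection"
    messages.append((destination.name, source.name, "SwitchoverResponse"))
    return messages
```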

4.1.1  Identifiers

   Connection ID: A unique ID for each connection.

   Source ID: ID of the source (e.g., IP address).

   Destination ID: ID of the destination (e.g., IP address).


4.1.2  Nodal Information

   Each node that is on the working or protection path of a connection
   must at least have knowledge of the connection identifier, the
   previous and next nodes in the connection path and the type of
   protection being afforded to the connection (i.e., 1+1 or shared).
   This is so that restoration-related messages may be forwarded
   properly. The optical network may also have additional routing
   intelligence. In this case, messages may be forwarded along paths
   different than the connection path.

   The nodal information may be assembled when the working and
   protection paths of the connections are provisioned using signaling,
   or may be configured in the case of NMS-based provisioning. The
   information must remain until the connection is explicitly de-
   provisioned.

4.1.3  End-to-End Failure Indication Message

   This message is sent (reliably) by an intermediate node towards the
   source of a connection. For instance, such a node might have
   attempted local span protection and failed. This message may not be
   necessary if the lower layer provides mechanisms for detection of
   connection failure by the endpoints.

   Consider a node detecting a link failure. The node must determine
   the identities of all connections that are affected by the failure
   of the link, and send an end-to-end failure indication message to
   the source of each connection. Each intermediate node receiving such
   a message must determine the appropriate next node to forward the
   message such that the message would reach the connection source.
   Furthermore, if an intermediate node is itself generating a failure
   indication message, there should be a mechanism to suppress all but
   one source of failure indication messages. Finally, the failure
   indication message must be sent reliably from the node detecting the
   failure to the connection source. Reliability may be achieved, for
   example, by re-transmitting the message until an acknowledgement is
   received.
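
   The lookup and hop-by-hop forwarding described above can be
   sketched as follows. The data layout is an assumption: each node
   is taken to store, per connection, the previous node on the path
   (None at the source), as suggested by the nodal information in
   Section 4.1.2. Retransmission and suppression of duplicate
   indications are not modeled.

```python
def affected_connections(link_to_connections, failed_link):
    """Connections routed over the failed link, in a stable order."""
    return sorted(link_to_connections.get(failed_link, []))


def route_to_source(nodal_info, detecting_node, connection_id):
    """Follow 'previous node' entries hop by hop until the source is reached.

    nodal_info[node][connection_id] is the previous node on the connection
    path, or None when the node is the connection source.
    """
    hops = [detecting_node]
    node = detecting_node
    while nodal_info[node][connection_id] is not None:
        node = nodal_info[node][connection_id]
        hops.append(node)
    return hops
```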

4.1.4  End-to-End Failure Acknowledge Message

   This message is sent by the source node in response to an End-to-End
   failure indication message. This message is sent to the originator
   of the failure indication message. The acknowledge message should be
   sent for each failure indication message received.

   Each intermediate node receiving the acknowledge message must
   forward it towards the destination of the message.

4.1.5  End-to-End Switchover Request Message


   This message is generated by the source node receiving an indication
   of failure in a connection. It is sent to the connection
   destination, and it carries the Connection ID of the connection
   being restored. This message must indicate whether the source is
   able to switch over to the protection path or not. If the source is
   not able to switch over, the destination may not switch over either.

   The End-to-End Switchover Request message must be sent reliably from
   the source to the destination of the connection.

4.1.6  End-to-End Switchover Response Message

   This message is sent by the destination node receiving an End-to-End
   Switchover Request message towards the source of the connection.
   This message should indicate the Connection ID of the connection
   being switched over.

   This message must be transmitted in response to each End-to-End
   Switchover Request message received.

4.2  Shared Mesh Restoration

   Shared mesh restoration requires prior soft-reservation of capacity
   along the protection path [4]. Furthermore, after a failure event,
   the protection path must be explicitly activated. This requires
   actions at each intermediate node along the protection path. It is
   possible that a protection path may not be successfully activated
   when multiple, concurrent failure events occur. In this case, shared
   mesh restoration capacity may be claimed for more than one failed
   connection and the protection path can be activated only for one of
   them (at most).

   For implementing shared mesh restoration, the identifier and nodal
   information related to signaling along the control path are as
   defined for 1+1 protection in Sections 4.1.1 and 4.1.2. In addition,
   each node must also keep information needed to establish the data
   plane of the protection path. This information could be fine-
   grained, indicating the cross-connect that must be established to
   activate the protection path for each connection, as follows:

   { Connection ID, <Incoming Port, Channel, etc.>, <Outgoing Port,
     Channel, etc.> }

   The precise nature of the Port, Channel, etc. information would
   depend on the type of node and connection (the Generalized MPLS
   signaling draft describes different types of switches [5]).

   On the other hand, this information could be coarse-grained,
   indicating

   { Connection ID, <Incoming TE link>, <Outgoing TE link> }


   In this case, a specific component link and channel on the TE link
   is allocated only when the protection path is activated. While the
   coarser specification allows some flexibility in selection of the
   precise resource to activate, it also brings in more complexity in
   decision making and signaling during the time-critical restoration
   phase. Furthermore, the procedures for the assignment of bandwidth
   to protection paths must take into account the total resources in a
   TE link so that single-failure survivability requirements are
   satisfied.
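
   The two granularities can be sketched as data records. The field
   names and the "first free channel" choice below are assumptions
   for illustration only. The sketch shows the extra decision a
   coarse-grained entry requires at activation time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

PortChannel = Tuple[int, int]  # (port, channel), a simplification


@dataclass(frozen=True)
class FineGrainedEntry:
    connection_id: str
    incoming: PortChannel
    outgoing: PortChannel


@dataclass(frozen=True)
class CoarseGrainedEntry:
    connection_id: str
    incoming_te_link: str
    outgoing_te_link: str


def resolve(entry: CoarseGrainedEntry,
            free_channels: Dict[str, List[PortChannel]]) -> Optional[FineGrainedEntry]:
    """Pick concrete resources for a coarse entry when the path is activated.

    Returns None when either TE link has no free channel left.
    """
    incoming = free_channels.get(entry.incoming_te_link, [])
    outgoing = free_channels.get(entry.outgoing_te_link, [])
    if not incoming or not outgoing:
        return None
    return FineGrainedEntry(entry.connection_id, incoming[0], outgoing[0])
```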

4.2.1  End-to-End Failure Indication and Acknowledgement

   The End-to-End failure indication and acknowledgement procedures and
   messages are as defined in Sections 4.1.3 and 4.1.4.

4.2.2  End-to-End Switchover Request

   This message is generated by the source node receiving an indication
   of failure in a connection. It is sent to the connection destination
   along the protection path, and it carries the Connection ID of the
   connection being restored. This message must allow intermediate
   nodes to record whether they are able to activate the (shared)
   protection path. If any intermediate node is not able to establish
   cross-connects for the protection path then it is desirable that no
   other node in the path establishes cross-connects for the path. This
   would allow shared mesh restoration paths to be efficiently
   utilized. This requirement implies that switchover to the protection
   path occurs in two phases: in the forward phase, the Switchover
   Request message indicates the switching over action to intermediate
   nodes in the protection path and collects information as to their
   ability to switch over. In the reverse phase, the actual switchover
   occurs if all nodes in the path indicate their ability to switch
   over.

   The End-to-End Switchover Request message must be sent reliably from
   the source to the destination of the connection along the protection
   path.


4.2.3 End-to-End Switchover Response

   This message is sent by the destination node receiving an End-to-End
   Switchover Request message towards the source of the connection,
   along the protection path. This message should indicate the ID of
   the connection being switched over, and whether all intermediate
   nodes have agreed to switch over (as determined in the forward
   phase using the Switchover Request message).

   This message must be transmitted in response to each End-to-End
   Switchover Request message received.
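
   The two-phase behavior of the Switchover Request and Response
   messages can be sketched as follows. This is a minimal Python
   illustration, not a protocol definition: the Node class, its
   can_activate flag (standing in for a local resource check), and the
   switchover function are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    can_activate: bool = True  # hypothetical result of a local resource check
    cross_connects: set = field(default_factory=set)


def switchover(connection_id, protection_path):
    """Run both phases over protection_path (source first, destination last).

    Returns True if the connection was switched over.
    """
    # Forward phase: the Switchover Request visits each node in turn and
    # records whether that node is able to activate the protection path.
    # No cross-connect is established yet.
    all_agreed = all(node.can_activate for node in protection_path)

    # Reverse phase: the Switchover Response travels back towards the
    # source. Cross-connects are made only if every node agreed.
    if all_agreed:
        for node in reversed(protection_path):
            node.cross_connects.add(connection_id)
    return all_agreed
```

   Deferring the cross-connects to the reverse phase means that a
   single node that cannot switch over leaves the shared resources on
   every other node untouched.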

5. Reversion and other Administrative Procedures


   Reversion refers to the process of moving a connection back to the
   original working path from its protection path once the working path
   has been repaired following a failure. Reversion applies both to
   local span and end-to-end path protected connections. Reversion is
   desired for the following reasons. First, the routing of the
   protection path is often not as efficient as the routing of the
   working path. Second, moving a connection back to its working path
   allows the protection resources to be used to protect other
   connections.

   Reversion implies that a working path remains allocated to the
   connection that was originally routed over it even after a failure.
   It is important to have mechanisms that allow reversion to be
   performed without disrupting service to the customer. This can be
   achieved if reversion is implemented using a "bridge-and-switch"
   approach (often referred to as make-before-break).

   The basic steps involved in bridge-and-switch are:

   1.  The source node commences the process by "bridging" the signal
       onto both the working and the protection paths (or links in the
       case of span protection).
   2.  Once the bridging process is complete, the source node sends a
       Bridge and Switch Request message to the destination, identifying
       the connection and other information necessary to perform
       reversion. Upon receipt of this message, the destination selects
       the signal from the working path. At the same time, it bridges the
       transmitted signal onto both the working and protection paths.
   3.  The destination then sends a Bridge and Switch Response message to
       the source confirming the completion of the operation.
   4.  When the source receives this message, it switches to receive from
       the working path, and stops transmitting traffic on the protection
       path. The source then sends a Bridge and Switch Completed message
       to the destination confirming that the connection has been
       reverted.
   5.  Upon receipt of this message, the destination stops transmitting
       along the protection path and de-activates the connection along
       this path. The de-activation procedure should remove the
       cross-connections along the protection path and free the
       resources for use in restoring other failures.
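
   The five steps above can be traced with a small Python sketch. It
   models only which paths each end transmits on and receives from. The
   Endpoint class and the hitless check are hypothetical helpers added
   for illustration. The removal of cross-connections at intermediate
   nodes in step 5 is not modeled.

```python
from dataclasses import dataclass, field

WORKING, PROTECTION = "working", "protection"


@dataclass
class Endpoint:
    # Before reversion, traffic is carried on the protection path only.
    tx: set = field(default_factory=lambda: {PROTECTION})
    rx: str = PROTECTION


def hitless(source, dest):
    """True if each end is receiving from a path its peer transmits on."""
    return dest.rx in source.tx and source.rx in dest.tx


def revert(source, dest):
    """Walk through steps 1-5 and return the messages exchanged."""
    messages = []
    source.tx = {WORKING, PROTECTION}              # step 1: source bridges
    assert hitless(source, dest)
    messages.append("Bridge and Switch Request")   # step 2: dest selects
    dest.rx = WORKING                              # working and bridges
    dest.tx = {WORKING, PROTECTION}
    assert hitless(source, dest)
    messages.append("Bridge and Switch Response")  # step 3
    source.rx = WORKING                            # step 4: source switches
    source.tx = {WORKING}
    assert hitless(source, dest)
    messages.append("Bridge and Switch Completed")
    dest.tx = {WORKING}                            # step 5: dest releases
    assert hitless(source, dest)
    return messages
```

   The hitless check holds after every step, which is the property that
   lets reversion proceed without disrupting service.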

   Administrative procedures other than reversion include the ability
   to force a switchover (from working to protect or vice versa), and
   locking out switchover, i.e., preventing a connection from moving
   from working to protect or vice versa administratively. These
   administrative conditions have to be supported by signaling.

6.  Discussion

6.1  Relationship between Local and End-to-End Protection Procedures

   In general, local protection may be attempted before invoking end-
   to-end protection. The exception to this is when end-to-end 1+1
   protection is used for a connection. In this case, it is better to
   directly invoke end-to-end protection since alternate path resources
   are already active for the connection.

   Thus, the general guideline that may be considered is to note the
   protection type of connections in intermediate nodes during
   provisioning, and invoke local span protection only for working
   links carrying connections that are not 1+1 protected end-to-end.
   This implies that when a working link carries more than one
   connection, all the connections must have the same end-to-end
   protection type. The provisioning process must ensure this. If this
   is not possible then local span protection may be invoked for
   working links that have at least one connection that is not end-to-
   end 1+1 protected.
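
   The guideline above reduces to a simple test at the node detecting a
   link failure. The Python sketch below assumes that the node recorded
   a protection-type label for each connection during provisioning. The
   label strings and the function name are hypothetical.

```python
def invoke_local_span_protection(protection_types):
    """Decide whether local span protection should be invoked.

    protection_types holds the end-to-end protection type noted during
    provisioning for each connection on the failed working link, for
    example "1+1", "shared", or "none" (hypothetical labels).
    """
    # Local span protection is skipped only when every connection on the
    # link is already 1+1 protected end-to-end.
    return any(ptype != "1+1" for ptype in protection_types)
```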

6.2   Connection Priorities During Protection

   The local protection procedure described in this draft switches all
   the connections on a failed working link onto a protection link. The
   advantage of this approach is that the signaling between nodes is at
   the level of links and not at the level of connections. This is
   beneficial if a link could potentially carry a number of
   connections. On the other hand, it limits flexibility, since a
   working link must carry connections of similar priority. Otherwise,
   it is not possible to ensure that higher priority connections are
   favored over lower priority connections when a failure event affects
   more than one working link and there are fewer protection links than
   the number of failed working links.

   Also, under the above failure scenario, a decision must be made as
   to which working links (and therefore connections) are chosen to be
   protected and in what priority order. In general, a node might not
   detect all failed working links simultaneously, but only
   sequentially. In this case, as per the proposed signaling
   procedures, connections on a working link may be switched over to a
   given protection link, but another failure (of a working link
   carrying higher priority connections) may be detected soon
   afterwards. The newly affected connections may then bump the ones
   previously switched over to the protection link.

   In the case of end-to-end shared mesh restoration, priorities may be
   implemented for allocating shared link resources under multiple
   failure scenarios. Note that shared mesh restoration works under the
   assumption that the primary paths of connections whose backups share
   resources are SRLG-disjoint [1]. Under single-failure scenarios,
   this would ensure that exactly one connection will "claim" the
   allocated (shared) resource. But under multiple failure scenarios,
   more than one connection can claim shared resources. If such
   resources are allocated to a lower priority connection, they may
   have to be reclaimed and allocated to a higher priority connection.
   Furthermore, the lower priority connection must be de-provisioned
   along the protection path (this can be done using the signaling
   mechanisms developed for provisioning, rather than restoration
   signaling). The proposed signaling mechanisms can support
   connection-priority based allocation of shared resources during
   restoration signaling (specifically, during the Switchover Response
   step).
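
   A node's decision when a second connection claims an already
   allocated shared resource might look like the following Python
   sketch. The tuple layout, the function name, and the convention that
   a smaller number means a higher priority are assumptions made for
   illustration only.

```python
def claim_shared_resource(holder, claimant):
    """Decide which connection keeps a shared protection resource.

    holder is None (resource free) or a (connection_id, priority) tuple;
    claimant is a (connection_id, priority) tuple. A smaller number means
    a higher priority (an assumption of this sketch).

    Returns (winner, loser). A losing holder has to be de-provisioned
    along its protection path; a losing claimant is refused.
    """
    if holder is None:
        return claimant, None
    if claimant[1] < holder[1]:
        return claimant, holder
    return holder, claimant
```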

   A way to simplify end-to-end shared mesh restoration is to allocate
   shared resources to connections of the same priority. This way, a
   connection will not be first allocated shared resources and then
   bumped from the protection path.

6.3  Routing Aspects

   To compute end-to-end protection paths, it is necessary to know
   which network resources can be used. For end-to-end 1+1 protection,
   any free resource in the network can be used. In this regard, the
   computation of the working and the protection paths is similar. For
   shared mesh restoration, however, it is necessary to know the
   availability of shareable as well as free resources. Generally,
   protection paths may share resources if the corresponding working
   paths will not be affected by the same failure. Thus, to determine
   shareable resources for a given protection path optimally, it is
   necessary to know full information about other working paths.
   Maintaining this sort of information may be suitable in a
   centralized routing implementation, but it may not be scalable
   under distributed routing. Under distributed routing, heuristics are
   often used to provision shared protection paths [12]. The specific
   routing information to be propagated and the signaling for the
   provisioning of shared protection paths are topics to be dealt with
   in separate drafts.
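
   The sharing rule stated above can be expressed as a disjointness
   test on the SRLGs traversed by the working paths. The Python sketch
   below assumes full knowledge of those SRLG sets, as a centralized
   implementation might have. The function names are hypothetical.

```python
def may_share(working_srlgs_a, working_srlgs_b):
    """True if two protection paths may share a resource.

    Each argument is the collection of SRLG identifiers traversed by the
    corresponding working path. Sharing is allowed when no single SRLG
    failure can affect both working paths.
    """
    return set(working_srlgs_a).isdisjoint(working_srlgs_b)


def shareable_with_all(new_working_srlgs, existing_working_srlgs):
    """True if a new protection path may join an existing sharing group."""
    return all(may_share(new_working_srlgs, srlgs)
               for srlgs in existing_working_srlgs)
```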

6.4  Multi-Domain Restoration

   When an end-to-end connection follows a path through multiple
   routing or administrative domains, it may be required to consider an
   intermediate form of restoration, called "intra-domain end-to-end
   restoration". With this approach, a failure within a domain would
   result in end-to-end restoration between the connection ingress and
   egress points within the domain (perhaps after local span
   restoration is attempted). When this fails, or if a failure occurs
   in an inter-domain link, full end-to-end restoration could be
   attempted (inter-domain links could also be subject to local span
   protection).
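
   The escalation order described above can be sketched in Python as
   follows. The scope names and the attempt callback, which stands in
   for whatever restoration procedure applies at each scope, are
   hypothetical.

```python
def restoration_order(failure_is_inter_domain):
    """Return the restoration scopes to try, most local first."""
    if failure_is_inter_domain:
        # No single domain contains the failure, so the intra-domain
        # stage does not apply.
        return ["local span", "full end-to-end"]
    return ["local span", "intra-domain end-to-end", "full end-to-end"]


def restore(failure_is_inter_domain, attempt):
    """Try each scope in order; attempt(scope) returns True on success.

    Returns the scope that succeeded, or None if all of them failed.
    """
    for scope in restoration_order(failure_is_inter_domain):
        if attempt(scope):
            return scope
    return None
```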

   This type of structured approach for restoration is particularly
   useful in the near term when an optical network may be constructed
   by interconnecting multi-vendor optical subnetworks [1]. In this
   case, intra-domain restoration may be proprietary, with standard
   restoration signaling implemented between border nodes. But this
   type of restoration also requires some hardware support at the
   border nodes.

6.5  Optical Mesh Restoration and MPLS-Based Recovery


   Over the past year or so, there has been considerable work on
   MPLS-based recovery under the auspices of the MPLS WG (see, for
   example, [6-11]), with a framework document [6] being adopted as a
   WG document.

   The terminology outlined at the start of this document is also
   explained in the MPLS-recovery framework document [6], in the
   context of MPLS LSP-based recovery.

   The failure indication message of Section 4 is quite similar to
   the failure indication signal (FIS) defined in [7] and elaborated
   on in [10] and [11]. A difference between the schemes and message
   formats discussed in this document and those presented in [7],
   [10], and [11] is that those documents focus primarily on MPLS
   LSP restoration. As such, the messages defined therein contain
   explicit label information for packet LSPs, which is not required
   in optical networks. Further, [7] does not specifically cover the
   coordinated signaling required for local span protection and for
   M:N protection with pooled protection links, which are central to
   this proposal.

6.6  Implementation Considerations

   As described in this draft, restoration signaling does not require
   any central actions (such as admission control or centralized
   resource allocation) within a node for end-to-end protection. Local
   span protection may require the consideration of all available
   protection link resources at the master. End-to-end protection,
   which is more difficult from a latency perspective, can be
   controlled by distributing multiple, independent protocol instances
   in a node such that each instance covers a subset of connections
   passing through the node. Such optimizations would depend on the
   architecture of the systems implementing the proposed protocol.
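
   One possible way to divide connections among independent protocol
   instances is to hash the Connection ID, as in the Python sketch
   below. The function name, the use of a string Connection ID, and the
   choice of CRC-32 are assumptions for illustration. The draft does
   not prescribe any particular partitioning.

```python
import zlib


def instance_for(connection_id, num_instances):
    """Map a Connection ID to one of num_instances protocol instances.

    A stable hash keeps every message for a given connection on the same
    instance, so the instances need not coordinate with each other.
    """
    return zlib.crc32(connection_id.encode("utf-8")) % num_instances
```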


7.  Conclusion

   In this draft, the signaling message flows for protection and
   restoration in optical mesh networks were described. The types of
   protection modes considered were local span protection and end-to-
   end protection, 1+1 and shared. Specific protocol realization of the
   message flows will be described in other drafts.


8. References


   1. B. Rajagopalan, et al., "IP over Optical Networks: A Framework,"
      draft-ietf-ipo-framework-00.txt.

   2. W. S. Lai, et al., "Network Hierarchy and Multilayer
      Survivability," Internet Draft,
      draft-team-tewg-restore-hierarchy-00.txt, July 2001.

   3. J. P. Lang, et al., "Link Management Protocol,"
      draft-ietf-mpls-lmp-02.txt.

   4. G. Li, et al., "RSVP-TE Extensions For Shared-Mesh Restoration in
      Transport Networks," draft-li-shared-mesh-restoration-00.txt.

   5. P. Ashwood-Smith, et al., "Generalized MPLS: Signaling Functional
      Specification," draft-ietf-mpls-generalized-signaling-06.txt.

   6. Makam, et al., "Framework for MPLS-based Recovery,"
      draft-ietf-mpls-recovery-frmwrk-03.txt.

   7. K. Owens, et al., "A Path Protection/Restoration Mechanism for
      MPLS Networks," draft-chang-mpls-path-protection-03.txt.

   8. S. Kini, et al., "Shared Backup Label Switched Path Restoration,"
      draft-kini-restoration-shared-backup-01.txt.

   9. F. Hellstrand and L. Andersson, "Extensions to CR-LDP and RSVP-TE
      for setup of pre-established recovery tunnels,"
      draft-hellstrand-recovery-merge-01.txt.

   10. K. Owens, et al., "Extensions to RSVP-TE for MPLS Path
       Protection," draft-chang-mpls-rsvpte-path-protection-ext-01.txt.

   11. K. Owens, et al., "Extensions to CR-LDP for MPLS Path
       Protection," draft-owens-mpls-crldp-path-protection-ext-01.txt.

   12. S. Sengupta and R. Ramamurthy, "Capacity Efficient Distributed
       Routing of Mesh-Restored Lightpaths in Optical Networks," Proc.
       IEEE Globecom 2001, November, 2001.


9. Author Information

   Bala Rajagopalan                       Greg Bernstein
   Debanjan Saha                          Ciena Corp.
     Tellium, Inc.                        10480 Ridgeview Ct.
     2 Crescent Pl.                       Cupertino, CA 94014
     Ocean Port, NJ 07757                 Email: Greg@ciena.com
     Email: {braja, dsaha}@tellium.com

   Vishal Sharma                          Ayan Banerjee
   Metanoia, Inc.                         John Drake
   305 Elan Village Lane, Unit 121        Jonathan Lang
   San Jose, CA 95134                     Calient Networks
   Email: V.Sharma@ieee.org               5853 Rue Ferrari
                                          San Jose, CA 95138
                                          Email: {abanerjee, Jdrake,
                                                  jplang}@calient.net

   Jennifer Yates
   Guangzhi Li
   AT&T
   180 Park Ave.
   Florham Park, NJ 07932
   Email: {jyates, gli}@research.att.com

