Dynamic Host Configuration (DHC)                            T. Mrugalski
Internet-Draft                                                       ISC
Intended status: Standards Track                              K. Kinnear
Expires: September 6, 2012                                         Cisco
                                                           March 5, 2012


                         DHCPv6 Failover Design
             draft-mrugalski-dhc-dhcpv6-failover-design-00

Abstract

   DHCPv6, as defined in [RFC3315], does not offer server redundancy.
   This document defines a design for DHCPv6 failover, a mechanism for
   running two servers on the same network with the capability for
   either server to take over clients' leases in case of server failure
   or network partition.  This is a DHCPv6 failover design document; it
   is not a protocol specification.  It is the second document in a
   planned series of three.  DHCPv6 failover requirements are specified
   in [requirements].  A protocol specification document is planned to
   follow this document.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents


Mrugalski & Kinnear     Expires September 6, 2012               [Page 1]

Internet-Draft           DHCPv6 Failover Design               March 2012


   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Requirements Language  . . . . . . . . . . . . . . . . . . . .  4
   2.  Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Goals  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  5
     3.1.  Additional Requirements  . . . . . . . . . . . . . . . . .  5
   4.  Protocol Overview  . . . . . . . . . . . . . . . . . . . . . .  6
     4.1.  NORMAL State Overview  . . . . . . . . . . . . . . . . . .  7
     4.2.  COMMUNICATION-INTERRUPTED State Overview . . . . . . . . .  7
     4.3.  PARTNER-DOWN State Overview  . . . . . . . . . . . . . . .  8
     4.4.  RECOVERING State Overview  . . . . . . . . . . . . . . . .  8
   5.  Connection Management  . . . . . . . . . . . . . . . . . . . .  8
     5.1.  Creating Connections . . . . . . . . . . . . . . . . . . .  8
     5.2.  Endpoint Identification  . . . . . . . . . . . . . . . . . 10
   6.  Resource Allocation  . . . . . . . . . . . . . . . . . . . . . 11
     6.1.  Proportional Allocation  . . . . . . . . . . . . . . . . . 12
     6.2.  Independent Allocation . . . . . . . . . . . . . . . . . . 12
     6.3.  Determining Allocation Approach  . . . . . . . . . . . . . 13
       6.3.1.  IPv6 Addresses . . . . . . . . . . . . . . . . . . . . 13
       6.3.2.  IPv6 Prefixes  . . . . . . . . . . . . . . . . . . . . 13
   7.  Failover Mechanisms  . . . . . . . . . . . . . . . . . . . . . 13
     7.1.  Time Skew  . . . . . . . . . . . . . . . . . . . . . . . . 13
     7.2.  Time expression  . . . . . . . . . . . . . . . . . . . . . 14
     7.3.  Lazy updates . . . . . . . . . . . . . . . . . . . . . . . 14
     7.4.  MCLT concept . . . . . . . . . . . . . . . . . . . . . . . 14
       7.4.1.  MCLT example . . . . . . . . . . . . . . . . . . . . . 16
     7.5.  Unreachability detection . . . . . . . . . . . . . . . . . 17
     7.6.  Sending Data . . . . . . . . . . . . . . . . . . . . . . . 17
       7.6.1.  Required Data  . . . . . . . . . . . . . . . . . . . . 18
       7.6.2.  Optional Data  . . . . . . . . . . . . . . . . . . . . 18
     7.7.  Receiving Data . . . . . . . . . . . . . . . . . . . . . . 18
       7.7.1.  Conflict Resolution  . . . . . . . . . . . . . . . . . 18
       7.7.2.  Acknowledging Reception  . . . . . . . . . . . . . . . 19
   8.  Endpoint States  . . . . . . . . . . . . . . . . . . . . . . . 19
     8.1.  State Machine Initialization . . . . . . . . . . . . . . . 19
     8.2.  NORMAL State Operation . . . . . . . . . . . . . . . . . . 19
     8.3.  COMMUNICATION-INTERRUPTED State Operation  . . . . . . . . 19
     8.4.  PARTNER-DOWN State Operation . . . . . . . . . . . . . . . 19
     8.5.  RECOVERING State Operation . . . . . . . . . . . . . . . . 19
     8.6.  State Transitions  . . . . . . . . . . . . . . . . . . . . 19
   9.  Proposed extensions  . . . . . . . . . . . . . . . . . . . . . 19
     9.1.  Active-active mode . . . . . . . . . . . . . . . . . . . . 20
   10. Dynamic DNS Considerations . . . . . . . . . . . . . . . . . . 20
   11. Reservations and failover  . . . . . . . . . . . . . . . . . . 20
   12. Protocol entities  . . . . . . . . . . . . . . . . . . . . . . 20
     12.1. Failover Protocol  . . . . . . . . . . . . . . . . . . . . 21
     12.2. Protocol constants . . . . . . . . . . . . . . . . . . . . 21
   13. Open questions . . . . . . . . . . . . . . . . . . . . . . . . 21
   14. Security Considerations  . . . . . . . . . . . . . . . . . . . 21
   15. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 21
   16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22
   17. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22
     17.1. Normative References . . . . . . . . . . . . . . . . . . . 22
     17.2. Informative References . . . . . . . . . . . . . . . . . . 22
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23


1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


2.  Glossary

   This is a supplemental glossary that should be read together with
   the definitions in Section 3 of [requirements].

   o  Failover endpoint - The failover protocol allows for there to be a
      unique failover 'endpoint' per partner per role per relationship
      (where role is primary or secondary and the relationship is
      defined by the relationship-name).  This failover endpoint can
      take actions and hold unique states.  Typically, there is one
      failover endpoint per partner (server), although there may be
      more.  'Server' and 'failover endpoint' are synonymous only if the
      server participates in only one failover relationship.  However,
      for the sake of simplicity 'Server' is used throughout the
      document to refer to a failover endpoint unless to do so would be
      confusing.

   o  Failover transmission - all messages exchanged between partners.

   o  Independent Allocation - a resource allocation algorithm that
      splits the available pool of resources between the primary and
      secondary servers and is particularly well suited for vast pools
      (i.e. when available resources are not expected to deplete).  See
      Section 6.2 for details.

   o  Primary Server

   o  Proportional Allocation - a resource allocation algorithm that
      splits the available free leases between the primary and
      secondary servers and is particularly well suited for more
      limited resources.  See Section 6.1 for details.

   o  Resource - an IPv6 address or an IPv6 prefix.

   o  Secondary Server

   o  Server - A DHCPv6 server that implements DHCPv6 failover.
      'Server' and 'failover endpoint' are synonymous only if the server
      participates in only one failover relationship.


3.  Goals

   The failover protocol design provides a means for cooperating DHCPv6
   servers to work together to provide a DHCPv6 service with
   availability that is increased beyond that which could be provided by
   a single DHCPv6 server operating alone.  It is designed to protect
   DHCPv6 clients against server unreachability, including server
   failure and network partition.  It is possible to deploy exactly two
   servers that are able to continue providing a lease on an IPv6
   address or on an IPv6 prefix without the DHCPv6 client experiencing
   lease expiration or a reassignment of a lease to a different IPv6
   address in the event of failure by one or the other of the two
   servers.

   This protocol defines an active-passive mode, sometimes also called
   the hot standby model.  This means that during normal operation one
   server is active (i.e. actively responds to clients' requests) while
   the second is passive (i.e. it does receive clients' requests, but
   does not respond to them; it only maintains a copy of the lease
   database and is ready to take over incoming queries in case of
   primary server failure).  Active-active mode (i.e. both servers
   actively handling clients' requests) is currently not supported for
   the sake of simplicity.  Such a mode may be defined as an extension
   at a later time.

   The failover protocol is designed to provide lease stability for
   leases with lease times beyond a short period.  Due to the additional
   overhead required, failover is not suitable for leases shorter than
   30 seconds.  The DHCPv6 Failover protocol MUST NOT be used for leases
   shorter than 30 seconds.

   This design attempts to fulfill all DHCPv6 failover requirements
   defined in [requirements].

3.1.  Additional Requirements

   The following requirements are not related to the failover mechanism
   in general, but rather to this particular design.

   1.  Minimize Asymmetry - while there are two distinct roles in
       failover (primary and secondary server), the differences between
       those two roles should be as small as possible.  This will yield
       a simpler design as well as a simpler implementation of that
       design.


4.  Protocol Overview

   The DHCPv6 Failover Protocol is defined as a communication between
   failover partners with all associated algorithms and mechanisms.
   Failover communication is conducted over a TCP connection established
   between the partners.  The protocol reuses the framing format
   specified in Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460], but
   uses different message types.  Additional failover-specific message
   types will be defined.  All information is sent over the connection
   as typical DHCPv6 options, following the format defined in
   Section 22.1 of [RFC3315].
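
   As an illustration only, the framing described above can be sketched
   in Python.  The layout (a 16-bit message-size, then msg-type, a
   3-octet transaction-id, and options encoded as 16-bit code, 16-bit
   length, data) follows Section 5.1 of [RFC5460] and Section 22.1 of
   [RFC3315].  The message type and option code values are placeholders
   chosen for this example; this document does not assign them.

```python
import struct

# Placeholder values: this document does not assign message type or
# option codes, so these numbers are assumptions for illustration.
MSG_TYPE_CONNECT = 250
OPT_EXAMPLE = 65000


def encode_option(code: int, data: bytes) -> bytes:
    """Encode one option: 16-bit code, 16-bit length, then the data."""
    return struct.pack("!HH", code, len(data)) + data


def frame_message(msg_type: int, xid: int, options: bytes) -> bytes:
    """Prefix a message with its 16-bit size for sending over TCP."""
    body = struct.pack("!B", msg_type) + xid.to_bytes(3, "big") + options
    return struct.pack("!H", len(body)) + body


def unframe_message(data: bytes):
    """Split one framed message into (msg_type, xid, options)."""
    (size,) = struct.unpack("!H", data[:2])
    body = data[2:2 + size]
    return body[0], int.from_bytes(body[1:4], "big"), body[4:]
```

   The size prefix lets a receiver find message boundaries in the TCP
   byte stream before it parses any options.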

   After initialization, the primary server establishes a TCP connection
   with its partner.  The primary server sends a CONNECT message with
   initial parameters.  The secondary server responds with a CONNECTACK
   message.

   Depending on the failover state of each partner, they MUST initiate
   one of the binding update procedures.  Each server MAY send an UPDREQ
   message to request that its partner send all updates that have not
   been sent yet (this case applies when the server has an existing
   database and wants to update it).  Alternatively, a server MAY choose
   to send an UPDREQALL message to request a full lease database
   transmission including all leases (this case applies when booting up
   a new server after installation, corruption or complete loss of the
   database, or other catastrophic failure).

   Servers exchange lease information by using BNDUPD messages.
   Depending on the local and remote state of a lease, a server may
   either accept or reject the update.  Reception of lease update
   information is confirmed by responding with a BNDACK message with
   the appropriate status.  The majority of the messages sent over a
   failover TCP connection are BNDUPD and BNDACK messages.

   A subset of available resources (addresses or prefixes) is reserved
   for secondary server use.  This is required for handling a case where
   both servers are able to communicate with clients, but unable to
   communicate with each other.  After the initial connection is
   established, the secondary server requests a pool of available
   addresses by sending a POOLREQ message.  The primary server assigns a
   pool to the secondary by transmitting a POOLRESP message and then
   sending a series of BNDUPD messages.  The secondary server may
   initiate such a pool request at any time while it maintains
   communication with the primary server.

   Failover servers use a lazy update mechanism to update their failover
   partner about changes to their lease state database.  After a server
   performs any modification to its lease state database (assigning a
   new lease, extending an existing one, releasing or expiring a lease),
   it sends its response to the client's request first (performing the
   "regular" DHCPv6 operation) and then informs its failover partner
   using a BNDUPD message.  This BNDUPD message SHOULD be sent soon
   after the response is sent to the DHCPv6 client, but there is no
   specific time limit for doing so.
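
   The ordering described above (reply to the client first, update the
   partner afterwards) can be sketched as follows.  This is a minimal
   illustration, not part of the design: the class name and the two
   transport callables are hypothetical.

```python
from collections import deque


class LazyUpdater:
    """Reply to the client first, then queue a BNDUPD for the partner."""

    def __init__(self, send_to_client, send_to_partner):
        # Both callables are hypothetical transport hooks.
        self.send_to_client = send_to_client
        self.send_to_partner = send_to_partner
        self.pending = deque()

    def lease_changed(self, reply, lease):
        """Handle one lease database change caused by a client request."""
        self.send_to_client(reply)   # regular DHCPv6 response goes out first
        self.pending.append(lease)   # the partner update is deferred

    def flush(self):
        """Send queued updates in order; meant to run soon after replying."""
        while self.pending:
            self.send_to_partner(("BNDUPD", self.pending.popleft()))
```

   Updates that are still queued when the server crashes are exactly
   the ones the partner never learns about.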

   The major problem with the lazy update mechanism is the case when
   the server crashes after sending a response to the client, but before
   sending the lazy update to its partner (or when communication between
   partners is interrupted).  To solve this problem, a concept known as
   the Maximum Client Lead Time (MCLT), initially designed for DHCPv4
   failover, is used.  The MCLT is the maximum amount of time that one
   server can extend a lease for a client's binding beyond the time
   known by its failover partner.  See Section 7.4 for a detailed
   description of how the MCLT affects assigned lease times.
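
   As a worked illustration of the MCLT definition, the sketch below
   caps a lease expiration time at the partner-known time plus the
   MCLT.  The function names are hypothetical.  For a lease the partner
   has never heard of, this sketch assumes the caller passes the current
   time as the partner-known time; the authoritative rules are those of
   Section 7.4.

```python
def max_grantable_expiry(partner_known_expiry: int, mclt: int) -> int:
    """Latest expiry (in seconds) a server may hand out for a lease."""
    return partner_known_expiry + mclt


def capped_expiry(desired_expiry: int, partner_known_expiry: int,
                  mclt: int) -> int:
    """Clamp the expiry a server would like to grant to the MCLT limit."""
    return min(desired_expiry,
               max_grantable_expiry(partner_known_expiry, mclt))
```

   For example, with an MCLT of 3600 seconds and a partner-known expiry
   of 1000, a desired expiry of 10000 is reduced to 4600, while a
   desired expiry of 2000 is granted unchanged.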

   Servers verify each other's availability by periodically exchanging
   CONTACT messages.  See Section 7.5 for a discussion of detecting a
   partner's unreachability.

   A server that is being shut down transmits a DISCONNECT message,
   closes the connection with its failover partner and stops operation.
   A server SHOULD transmit any pending lease updates before
   transmitting the DISCONNECT message.

4.1.  NORMAL State Overview

   During normal operation when two partners are communicating, both
   remain in NORMAL state.  All incoming requests are processed by the
   primary server and the secondary server receives appropriate updates.
   While operating in NORMAL state, a server must switch to
   COMMUNICATIONS-INTERRUPTED state if communication with its partner
   is severed.  If its partner closes the connection using a DISCONNECT
   message, the server moves immediately to either
   COMMUNICATIONS-INTERRUPTED state or PARTNER-DOWN state, as configured
   by the operator.

4.2.  COMMUNICATION-INTERRUPTED State Overview

   When a server discovers that its partner is not reachable, it
   switches into COMMUNICATIONS-INTERRUPTED state.  In that state a
   server MUST NOT extend any lease time more than the MCLT beyond the
   lease time known by its failover partner.  A server will extend
   leases that it previously assigned using the regular RENEW mechanism,
   as clients will send their communications to this server (using a
   multicast RENEW message with the server's DUID or using a unicast
   RENEW message if configured).  A server MUST also extend leases
   assigned by its partner.  This is accomplished by replying to
   clients' REBIND messages.  Again, a server MUST NOT extend a lease by
   more than the configured MCLT value beyond the time known by its
   partner.  While in COMMUNICATIONS-INTERRUPTED state, each server MUST
   assign new leases only from its own pool.  If a server is operating
   in COMMUNICATIONS-INTERRUPTED state and establishes a connection
   with its partner (either by successfully completing a periodic
   connection attempt or by receiving an incoming connection from its
   partner), the server moves either into RECOVERING state or NORMAL
   state, depending on the state that its failover partner is in.  When
   a server moves into NORMAL state, it automatically sends all updated
   lease information to its failover partner.  A server may also be
   administratively switched to PARTNER-DOWN state from
   COMMUNICATIONS-INTERRUPTED state.

4.3.  PARTNER-DOWN State Overview

   While in PARTNER-DOWN state, a server has a guarantee that its
   partner is not serving any leases.  In such a case, it MUST extend
   existing leases that it knows about and may assign new leases from
   its own pool or the pool assigned to its partner.  Since it knows
   that its partner is neither extending existing leases nor assigning
   new ones, it may extend leases by times longer than the MCLT.  It
   MUST NOT reallocate any existing IP address to a new client until
   that lease has expired and the server has waited the MCLT beyond the
   lease's expiration.  This mode of operation is similar to the
   operation of a stand-alone DHCPv6 server and it does not offer any
   redundancy.  In this state a server SHOULD periodically attempt to
   connect to its failover partner.  Once a connection with its partner
   is established, the partner will switch to RECOVERING state.  After
   the partner finishes its recovery and moves to RECOVER-DONE state,
   both servers will move to NORMAL state.
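
   The reallocation rule for PARTNER-DOWN state reduces to a single
   comparison.  The sketch below is illustrative only; the function
   name is hypothetical and all times are assumed to be in seconds on
   the local server's clock.

```python
def may_reallocate(now: int, lease_expiry: int, mclt: int) -> bool:
    """True once the lease has expired and a further MCLT has elapsed."""
    return now >= lease_expiry + mclt
```

   The extra MCLT covers any extension the partner may have granted
   before it failed without having reported it.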

4.4.  RECOVERING State Overview

   This transitional state represents a situation where two servers
   have established a connection, but one server needs to be updated
   with information prior to resuming normal operation.  Upon entering
   the state, both servers transmit UPDREQ or UPDREQALL (depending on
   the state of their local lease databases).  Once all lease
   information is exchanged, the recovering server moves to RECOVER-DONE
   state and then both servers switch to NORMAL state.  While in
   RECOVERING state, a server is not allowed to respond to clients.
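
   The transitions named in this overview can be collected into a small
   table, sketched below.  The event names are hypothetical labels
   invented for this example.  The transition out of
   COMMUNICATIONS-INTERRUPTED state on reconnection is deliberately
   omitted, because it depends on the partner's state and the rule is
   not given in this overview.

```python
from enum import Enum


class State(Enum):
    NORMAL = "NORMAL"
    COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
    PARTNER_DOWN = "PARTNER-DOWN"
    RECOVERING = "RECOVERING"
    RECOVER_DONE = "RECOVER-DONE"


# Event names are hypothetical labels for conditions described in the text.
TRANSITIONS = {
    (State.NORMAL, "communication-lost"): State.COMMUNICATIONS_INTERRUPTED,
    (State.COMMUNICATIONS_INTERRUPTED, "operator-partner-down"):
        State.PARTNER_DOWN,
    (State.RECOVERING, "updates-complete"): State.RECOVER_DONE,
    (State.RECOVER_DONE, "resume-normal"): State.NORMAL,
    (State.PARTNER_DOWN, "partner-recover-done"): State.NORMAL,
}


def next_state(state, event,
               disconnect_target=State.COMMUNICATIONS_INTERRUPTED):
    """Return the next state; unknown events leave the state unchanged."""
    if state is State.NORMAL and event == "partner-disconnect":
        # The operator configures which state follows a DISCONNECT.
        allowed = (State.COMMUNICATIONS_INTERRUPTED, State.PARTNER_DOWN)
        if disconnect_target not in allowed:
            raise ValueError("invalid configured target state")
        return disconnect_target
    return TRANSITIONS.get((state, event), state)
```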


5.  Connection Management

5.1.  Creating Connections

   Every server implementing the failover protocol SHOULD attempt to
   connect to all of its partners periodically, where the period is
   implementation dependent and SHOULD be configurable.  In the event
   that a connection has been rejected by a CONNECTACK message with a
   reject-reason option contained in it or a DISCONNECT message, a
   server SHOULD reduce the frequency with which it attempts to connect
   to that server but it SHOULD continue to attempt to connect
   periodically.

   When a connection attempt succeeds, if the server generating the
   connection attempt is a primary server for that relationship, then it
   MUST send a CONNECT message down the connection.  If it is not a
   primary server for the relationship, then it MUST just drop the
   connection and wait for the primary server to connect to it.

   When a connection attempt is received, the only information that the
   receiving server has is the IP address of the partner initiating a
   connection.  It also knows whether it has the primary role for any
   failover relationships with the connecting server.  If it has any
   relationships for which it is a primary server, it should initiate a
   connection of its own to the partner server, one for each primary
   relationship it has with that server.

   If it has any relationships with the connecting server for which it
   is a secondary server, it should just await the CONNECT message to
   determine which relationship this connection is to serve.

   If it has no secondary relationships with the connecting server, it
   SHOULD drop the connection.

   To summarize -- a primary server MUST use a connection that it has
   initiated in order to send a CONNECT message.  Every server that is a
   secondary server in a relationship attempts to create a connection to
   the server which is primary in the relationship, but that connection
   is only used to stimulate the primary server into recognizing that
   the secondary server is ready for operation.  The reason behind this
   is that the secondary server has no way to communicate to the primary
   server which relationship a connection is designed to serve.
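
   The connection rules summarized above can be sketched as two small
   decision functions.  This is an illustration under assumptions: the
   function names and the action strings are hypothetical, and roles are
   passed as one 'primary' or 'secondary' entry per relationship with
   the peer.

```python
def on_outbound_connected(is_primary: bool) -> str:
    """Action after a connection this server initiated has succeeded."""
    if is_primary:
        return "send CONNECT"
    # A secondary only stimulates the primary; it never sends CONNECT.
    return "drop connection and wait for primary"


def on_inbound_connection(roles_with_peer):
    """Actions after the peer connected to this server.

    roles_with_peer holds this server's role in each relationship it
    has with the connecting peer.
    """
    actions = []
    # One outbound connection per relationship where we are primary.
    actions += ["initiate connection"] * roles_with_peer.count("primary")
    if "secondary" in roles_with_peer:
        actions.append("await CONNECT")
    else:
        actions.append("drop connection")
    return actions
```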

   A server which has multiple secondary relationships with a primary
   server SHOULD only send one stimulus connection attempt to the
   primary server.

   Once a connection is established, the primary server MUST send a
   CONNECT message across the connection.  A secondary server MUST wait
   for the CONNECT message from a primary server.  If the secondary
   server does not receive a CONNECT message from the primary server
   within an installation-dependent amount of time, it MAY drop the
   connection and send another stimulus connection attempt to the
   primary server.


   Every CONNECT message includes a TLS-request option, and if the
   CONNECTACK message does not reject the CONNECT message and the TLS-
   reply option says TLS MUST be used, then the servers will immediately
   enter into TLS negotiation.

   Once TLS negotiation is complete, the primary server MUST resend the
   CONNECT message on the newly secured TLS connection and then wait for
   the CONNECTACK message in response.  The TLS-request and TLS-reply
   options MUST NOT appear in either this second CONNECT or its
   associated CONNECTACK message as they did in the first messages.

   The second message sent over a new connection (either a bare TCP
   connection or a connection utilizing TLS) is a STATE message.  Upon
   the receipt of this message, the receiver can consider communications
   up.

   A secondary server MUST NOT respond to the closing of a TCP
   connection with a blind attempt to reconnect -- there may be another
   TCP connection to the same failover partner already in use.

5.2.  Endpoint Identification

   The proper operation of the failover protocol requires more than the
   transmission of messages between one server and the other.  Each
   endpoint might seem to be a single DHCPv6 server, but in fact there
   are situations where additional flexibility in configuration is
   useful.  A failover endpoint is always associated with a set of
   DHCPv6 prefixes that are configured on the DHCPv6 server where the
   endpoint appears.  A DHCPv6 prefix MUST NOT be associated with more
   than one failover endpoint.

   The failover protocol SHOULD be configured with one failover
   relationship between each pair of failover servers.  In this case
   there is one failover endpoint for that relationship on each failover
   partner.  This failover relationship MUST have a unique name.

   There is typically little need for additional relationships between
   any two servers but there MAY be more than one failover relationship
   between two servers -- however each MUST have a unique relationship
   name.

   Any failover endpoint can take actions and hold unique states.

   This document frequently describes the behavior of the protocol in
   terms of primary and secondary servers, not primary and secondary
   failover endpoints.  However, it is important to remember that every
   'server' described in this document is in reality a failover endpoint
   that resides in a particular process, and that several failover
   endpoints may reside in the same server process.

   It is not the case that there is a unique failover endpoint for each
   prefix that participates in a failover relationship.  On one server,
   there is (typically) one failover endpoint per partner, regardless of
   how many prefixes are managed by that combination of partner and
   role.  Conversely, on a particular server, any given prefix will be
   associated with exactly one failover endpoint.

   When a connection is received from the partner, the unique failover
   endpoint to which the message is directed is determined solely by the
   IP address of the partner, the relationship-name, and the role of the
   receiving server.
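
   One way to picture endpoint identification is a lookup table keyed
   by the three values listed above, with a second index that enforces
   the rule that a prefix belongs to exactly one endpoint.  The sketch
   below is illustrative; the class and method names are hypothetical.

```python
from typing import NamedTuple


class EndpointKey(NamedTuple):
    partner_ip: str
    relationship_name: str
    role: str  # role of the receiving server: 'primary' or 'secondary'


class EndpointTable:
    def __init__(self):
        self._endpoints = {}      # EndpointKey -> set of prefixes
        self._prefix_owner = {}   # prefix -> EndpointKey

    def add(self, key: EndpointKey, prefixes):
        """Register an endpoint; a prefix may belong to only one."""
        for prefix in prefixes:
            if prefix in self._prefix_owner:
                raise ValueError(f"{prefix} already has an endpoint")
        self._endpoints[key] = set(prefixes)
        for prefix in prefixes:
            self._prefix_owner[prefix] = key

    def lookup(self, partner_ip, relationship_name, role):
        """Find the endpoint an incoming connection is directed to."""
        key = EndpointKey(partner_ip, relationship_name, role)
        return key if key in self._endpoints else None

    def endpoint_for_prefix(self, prefix):
        return self._prefix_owner.get(prefix)
```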


6.  Resource Allocation

   Currently there are two allocation algorithms defined for resources
   (addresses or prefixes).  Additional allocation schemes may be
   defined as future extensions.

   1.  Proportional Allocation - This allocation algorithm is a direct
       application of the algorithm defined in [dhcpv4-failover] to
       DHCPv6.  Available resources are split between the primary and
       secondary servers.  Released resources are always returned to the
       primary server.  Primary and secondary servers may initiate a
       rebalancing procedure when the disparity between resources
       available to each server reaches a preconfigured threshold.  Only
       resources that are not leased to any clients are "owned" by one
       of the servers.  This algorithm is particularly well suited for
       scenarios where the amount of available resources is limited, as
       may be the case for prefix delegation.  See Section 6.1 for
       details.

   2.  Independent Allocation - This allocation algorithm assumes that
       available resources are split between primary and secondary
       servers as well.  In this case, however, resources are assigned
       to a specific server for all time, regardless of whether they
       are available or currently used.  This algorithm is much simpler
       than proportional allocation, because resource imbalance doesn't
       have to be checked and there is no rebalancing for independent
       allocation.  This algorithm is particularly well suited for
       scenarios where there is an abundance of available resources,
       which is typically the case for DHCPv6 address allocation.  See
       Section 6.2 for details.


6.1.  Proportional Allocation

   In this allocation scheme, each server has its own pool of available
   resources.  Note that a resource is not "owned" by a particular
   server throughout its entire lifetime.  Only a resource which is
   available is "owned" by a particular server -- once it has been
   leased to a client, it is not owned by either failover partner.  When
   it finally becomes available again, it will be owned initially by the
   primary server, and it may or may not be allocated to the secondary
   server by the primary server.

   So, the flow of a resource is as follows: initially a resource is
   owned by the primary server.  It may be allocated to the secondary
   server if it is available, and then it is owned by the secondary
   server.  Either server can allocate available resources which they
   own to clients, in which case they cease to own them.  When the
   client releases the resource or the lease on it expires, it will
   again become available and will be owned by the primary.

   A resource will not become owned by the server which allocated it
   initially when it is released or the lease expires because, in
   general, that server will have had to replenish its pool of available
   resources well in advance of any likely lease expirations.  Thus,
   having a particular resource cycle back to the secondary might well
   put the secondary more out of balance with respect to the primary
   instead of enhancing the balance of available addresses or prefixes
   between them.
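
   The ownership flow described above can be sketched as a small state
   holder for a single resource.  This is a minimal illustration; the
   class and method names are hypothetical and no rebalancing policy is
   modelled.

```python
class ProportionalResource:
    """Tracks which server, if any, owns one address or prefix."""

    def __init__(self):
        self.owner = "primary"   # a resource is initially owned by the primary
        self.leased = False

    def give_to_secondary(self):
        """The primary hands an available resource to the secondary."""
        if self.leased or self.owner != "primary":
            raise ValueError("only an available, primary-owned resource")
        self.owner = "secondary"

    def lease(self, server: str):
        """A server leases a resource it owns; ownership then ceases."""
        if self.leased or self.owner != server:
            raise ValueError("server does not own this available resource")
        self.owner = None
        self.leased = True

    def release_or_expire(self):
        """On release or expiry the resource returns to the primary."""
        self.leased = False
        self.owner = "primary"
```

   Note that a resource leased by the secondary comes back to the
   primary, not to the secondary that allocated it.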

   TODO: Need to rework this v4-specific vocabulary to v6, once we
   decide how things will look like in v6.

   When they are used, these proportional pools are used for allocation
   in every state except PARTNER-DOWN state.  In PARTNER-DOWN state a
   failover server can allocate from either pool.  The allocation and
   maintenance of these address pools is an area of some sensitivity,
   since the goal is to maintain a more or less constant ratio of
   available addresses between the two servers.

   TODO: Reuse rest of the description from section 5.4 from
   [dhcpv4-failover] here.

6.2.  Independent Allocation

   In this allocation scheme, available resources are split between the
   primary and secondary servers as part of initial connection
   establishment.  Once resources are allocated to each server, there is
   no need to reassign them.  This algorithm is simpler than
   proportional allocation since it does not require a rebalancing
   mechanism (although it requires no less initial communication), but
   it assumes that the pool assigned to each server will never deplete.
   That is often a reasonable assumption for IPv6 addresses (e.g.
   servers are often assigned a /64 pool that contains many more
   addresses than there are electronic devices on Earth).  This
   allocation mechanism SHOULD be used for IPv6 addresses, unless the
   configured address pool is small or is otherwise administratively
   limited.

   Once each server is assigned a resource pool during initial
   connection establishment, it may allocate assigned resources to
   clients.  Once a client releases a resource or its lease expires, the
   resource returns to the pool of the same server.  Resources never
   change servers.
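
   As an illustration of fixed ownership, the sketch below splits a /64
   pool into two halves and looks up the permanent owner of an address.
   Splitting into equal /65 halves is an assumption made for this
   example; this document does not define how the split is chosen.  The
   function names are hypothetical.

```python
import ipaddress


def split_pool(pool: str):
    """Split a pool into two equal halves (an assumed policy)."""
    net = ipaddress.ip_network(pool)
    primary, secondary = net.subnets(prefixlen_diff=1)
    return {"primary": primary, "secondary": secondary}


def owner_of(address: str, pools) -> str:
    """Return the server that owns the address, leased or not."""
    addr = ipaddress.ip_address(address)
    for server, net in pools.items():
        if addr in net:
            return server
    raise ValueError("address is outside the configured pool")
```

   Each half of a /64 still holds 2^63 addresses, which is why depletion
   is not expected.  Because ownership depends only on the address, no
   state needs to be exchanged when a lease is released or expires.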

   During COMMUNICATION-INTERRUPTED events, a partner MAY continue
   extending existing leases when requested by clients.  A healthy
   partner MUST NOT lease resources that were assigned to its downed
   partner and later released by a client unless it is in PARTNER-DOWN
   state.

6.3.  Determining Allocation Approach

6.3.1.  IPv6 Addresses

6.3.2.  IPv6 Prefixes


7.  Failover Mechanisms

   This section lays out an overview of the communication between
   partners and other mechanisms required for failover operation.  As
   this is a design document, not a protocol specification, high level
   ideas are presented without implementation specific details (e.g.
   on-wire formats are not defined).  Implementation details will be
   specified in a separate draft.

7.1.  Time Skew

   Partners exchange information about known lease states.  To reliably
   compare a known lease state with an update received from a partner,
   servers must be able to reliably compare the times stored in the
   known lease state with the times received in the update.  Although a
   simple approach would be to require both partners to use synchronized
   time, e.g. by using NTP, such a service may become unavailable in
   some scenarios that failover expects to cover, e.g. network
   partition.  Therefore a mechanism to measure and track relative time
   differences between servers is necessary.  To do so, each message
   MUST contain an FO_TIMESTAMP option that contains the timestamp of
   the transmission in the time context of the transmitter.  The
   transmitting server MUST set this as close to the actual transmission
   as possible.  The receiving partner MUST store its own timestamp of
   the reception event as close to the actual reception as possible.
   The received timestamp is then compared with the local timestamp.

   To account for packet delay variation (jitter), the measured
   difference is not used directly.  Instead, a moving average of the
   time differences of the last TIME_SKEW_PKTS_AVG packets is
   calculated.  This averaged value is referred to as the time skew.
   Note that the time skew algorithm allows cooperation between servers
   with completely desynchronized clocks as well as those whose
   desynchronization itself is not constant.
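
   A minimal sketch of the moving average described above is given
   below.  The class name TimeSkew, the method names, the placeholder
   value of 10 for TIME_SKEW_PKTS_AVG, and the sign convention (partner
   time minus local time) are assumptions made for illustration; this
   document does not define them.  Timestamp wrap-around is ignored
   here for brevity.

```python
from collections import deque

# Placeholder only: the real value is left to the protocol specification.
TIME_SKEW_PKTS_AVG = 10


class TimeSkew:
    """Moving average of (partner timestamp - local timestamp)."""

    def __init__(self, window=TIME_SKEW_PKTS_AVG):
        # deque with maxlen silently drops the oldest sample when full.
        self.samples = deque(maxlen=window)

    def on_receive(self, partner_ts, local_ts):
        """Record one received message and return the updated time skew."""
        self.samples.append(partner_ts - local_ts)
        return self.skew()

    def skew(self):
        """Average difference in seconds; 0.0 before any message arrives."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)
```

   Because only the last few samples are kept, a clock that drifts over
   time is tracked gradually instead of being compared against a stale
   one-time measurement.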

7.2.  Time expression

   Timestamps are expressed as the number of seconds since midnight
   (UTC), January 1, 2000, modulo 2^32.  Note that this is the same
   approach as used in the creation of DUID-LLT (see Section 9.2 of
   [RFC3315]).

   Time differences are expressed in seconds and are signed.
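
   The time representation above might be computed as in the following
   sketch.  The function names are hypothetical.  The handling of
   wrap-around in time_difference (treating differences of 2^31 or
   more as negative) is an assumption of this example and is not
   stated in this document.

```python
from datetime import datetime, timezone

EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc)
MODULUS = 2 ** 32


def to_failover_time(moment):
    """Seconds since midnight (UTC), January 1, 2000, modulo 2^32."""
    return int((moment - EPOCH_2000).total_seconds()) % MODULUS


def time_difference(a, b):
    """Signed difference a - b in seconds.

    Assumption: the true difference is smaller than 2^31 seconds, so a
    wrapped result can be mapped back to a negative value.
    """
    diff = (a - b) % MODULUS
    return diff - MODULUS if diff >= MODULUS // 2 else diff
```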

7.3.  Lazy updates

   Lazy update refers to the requirement placed on a server implementing
   a failover protocol to update its failover partner whenever the
   binding database changes.  A failover protocol which didn't support
   lazy update would require the failover partner update to complete
   before a DHCPv6 server could respond to a DHCPv6 client request.  The
   lazy update mechanism allows a server to allocate a new or extend an
   existing lease and then update its failover partner as time permits.

   Although the lazy update mechanism does not introduce additional
   delays in server response times, it introduces other difficulties.
   The key problem with lazy update is that when a server fails after
   updating a client with a particular lease time and before updating
   its partner, the partner will believe that a lease has expired even
   though the client still retains a valid lease on that address or
   prefix.

7.4.  MCLT concept

   In order to handle the problem introduced by lazy updates (see
   Section 7.3), a period of time known as the "Maximum Client Lead
   Time" (MCLT) is defined and must be known to both the primary and
   secondary servers.  Proper use of this time interval places an upper
   bound on the difference allowed between the lease time provided to a
   DHCPv6 client by a server and the lease time known by that server's
   failover partner.

   The MCLT is typically much less than the lease time that a server has
   been configured to offer a client, and so some strategy must exist to
   allow a server to offer the configured lease time to a client.
   During a lazy update the updating server typically updates its
   partner with a potential expiration time which is longer than the
   lease time previously given to the client and which is longer than
   the lease time that the server has been configured to give a client.
   This allows that server to give a longer lease time to the client the
   next time the client renews its lease, since the time that it will
   give to the client will not exceed the MCLT beyond the potential
   expiration time acknowledged by its partner.

   The fundamental relationship on which much of the correctness of this
   protocol depends is that the lease expiration time known to a DHCPv6
   client MUST NOT under any circumstances be more than the maximum
   client lead time (MCLT) greater than the potential expiration time
   known to a server's partner.

   The remainder of this section makes the above fundamental
   relationship more explicit.

   This protocol requires a DHCPv6 server to deal with several different
   lease intervals and places specific restrictions on their
   relationships.  The purpose of these restrictions is to allow the
   other server in the pair to be able to make certain assumptions in
   the absence of an ability to communicate between servers.

   The different times are:

   desired valid lifetime:
      The desired valid lifetime is the lease interval that a DHCPv6
      server would like to give to a DHCPv6 client in the absence of any
      restrictions imposed by the failover protocol.  Its determination
      is outside of the scope of this protocol.  Typically this is the
      result of external configuration of a DHCPv6 server.

   actual valid lifetime:
      The actual valid lifetime is the lease interval that a DHCPv6
      server gives out to a DHCPv6 client.  It may be shorter than the
      desired valid lifetime (as explained below).

   potential valid lifetime:
      The potential valid lifetime is the potential lease expiration
      interval the local server tells to its partner in a BNDUPD
      message.

   acknowledged potential valid lifetime:
      The acknowledged potential valid lifetime is the potential lease
      interval the partner server has most recently acknowledged in a
      BNDACK message.

7.4.1.  MCLT example

   The following example demonstrates the MCLT concept in practice.  The
   values used are arbitrarily chosen and are not a recommendation for
   actual values.  The MCLT in this case is 1 hour.  The desired valid
   lifetime is 3 days, and its renewal time is half the valid lifetime.

   When a server makes an offer for a new lease on an IP address to a
   DHCPv6 client, it determines the desired valid lifetime (in this
   case, 3 days).  It then examines the acknowledged potential valid
   lifetime (which in this case is zero) and determines the remainder of
   the time left to run, which is also zero.  To this it adds the MCLT.
   Since the actual valid lifetime cannot be allowed to exceed the
   remainder of the current acknowledged potential valid lifetime plus
   the MCLT, the offer made to the client is for the remainder of the
   current acknowledged potential valid lifetime (i.e., zero) plus the
   MCLT.  Thus, the actual valid lifetime is 1 hour.

   Once the server has sent the REPLY to the DHCPv6 client, it will
   update its failover partner with the lease information.  However, the
   desired potential valid lifetime will be composed of one half of the
   current actual valid lifetime added to the desired valid lifetime.
   Thus, the failover partner is updated with a BNDUPD with a potential
   valid lifetime of 3 days + 1/2 hour.

   When the primary server receives a BNDACK to its update of the
   secondary server's (partner's) potential valid lifetime, it records
   that as the acknowledged potential valid lifetime.  A server MUST NOT
   send a BNDACK in response to a BNDUPD message until it is sure that
   the information in the BNDUPD message has been updated in its lease
   database.  Thus, the primary server in this case can be sure that the
   secondary server has recorded the potential lease interval in its
   stable storage when the primary server receives a BNDACK message from
   the secondary server.

   When the DHCPv6 client attempts to renew at T1 (approximately one
   half an hour from the start of the lease), the primary server again
   determines the desired valid lifetime, which is still 3 days.  It
   then compares this with the remaining acknowledged potential valid
   lifetime (3 days + 1/2 hour) and adjusts for the time passed since
   the secondary was last updated (1/2 hour).  Thus the time remaining
   of the acknowledged potential valid interval is 3 days.  Adding the
   MCLT to this yields 3 days plus 1 hour, which is more than the
   desired valid lifetime of 3 days.  So the client is renewed for the
   desired valid lifetime -- 3 days.

   When the primary DHCPv6 server updates the secondary DHCPv6 server
   after the DHCPv6 client's renewal REPLY is complete, it will
   calculate the desired potential valid lifetime as the T1 fraction of
   the actual client valid lifetime (1/2 of 3 days this time = 1.5
   days).  To this it will add the desired client valid lifetime of 3
   days, yielding a total desired potential valid lifetime of 4.5 days.
   In this way, the primary attempts to have the secondary always "lead"
   the client in its understanding of the client's valid lifetime so as
   to be able to always offer the client the desired client valid
   lifetime.

   Once the initial actual client valid lifetime of the MCLT is past,
   the protocol operates effectively like the DHCPv6 protocol does today
   in its behavior concerning valid lifetimes.  However, the guarantee
   that the actual client valid lifetime will never exceed the remaining
   acknowledged partner server potential valid lifetime by more than the
   MCLT allows full recovery from a variety of failures.
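
   The arithmetic of the example above can be reproduced with the
   following sketch.  The function and variable names are hypothetical;
   the formulas are taken from the example itself (actual lifetime is
   capped by the remaining acknowledged potential valid lifetime plus
   the MCLT, and the potential lifetime is the desired lifetime plus
   the T1 fraction of the actual lifetime).  All values are in seconds.

```python
MCLT = 3600                 # 1 hour, as in the example
DESIRED = 3 * 24 * 3600     # desired valid lifetime: 3 days
T1_FRACTION = 0.5           # renewal time is half the valid lifetime


def actual_valid_lifetime(desired, acked_potential_remaining, mclt=MCLT):
    """Lifetime given to the client: never more than acked remainder + MCLT."""
    return min(desired, acked_potential_remaining + mclt)


def potential_valid_lifetime(desired, actual, t1_fraction=T1_FRACTION):
    """Lifetime sent to the partner in BNDUPD: desired + T1 part of actual."""
    return desired + int(actual * t1_fraction)


# Initial allocation: nothing has been acknowledged by the partner yet.
first_actual = actual_valid_lifetime(DESIRED, 0)
first_potential = potential_valid_lifetime(DESIRED, first_actual)

# Renewal at T1, after the partner acknowledged first_potential.
elapsed = int(first_actual * T1_FRACTION)
remaining = first_potential - elapsed
second_actual = actual_valid_lifetime(DESIRED, remaining)
second_potential = potential_valid_lifetime(DESIRED, second_actual)

print(first_actual, first_potential, second_actual, second_potential)
```

   With the values of the example this yields 1 hour and 3 days + 1/2
   hour for the initial allocation, then 3 days and 4.5 days for the
   renewal.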

7.5.  Unreachability detection

   Each partner maintains an FO_SEND timer for each partner connection.
   The FO_SEND timer is reset every time any message is transmitted.  If
   the timer reaches the FO_SEND_MAX value, a CONTACT message is
   transmitted and the timer is reset.  The CONTACT message may be
   transmitted at any time.

   Discussion: Perhaps it would be more reasonable to use echo-reply
   approach, rather than periodic transmissions?
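
   The FO_SEND timer behaviour described at the start of this section
   could look roughly like the sketch below.  The class SendTimer, its
   methods, and the callback used to send a CONTACT message are
   hypothetical names; the caller is assumed to supply the current time
   in seconds.

```python
class SendTimer:
    """Per-connection FO_SEND timer that triggers CONTACT messages."""

    def __init__(self, fo_send_max, send_contact):
        self.fo_send_max = fo_send_max    # FO_SEND_MAX, in seconds
        self.send_contact = send_contact  # callable that transmits CONTACT
        self.last_sent = 0.0

    def message_sent(self, now):
        """Any transmitted message resets the timer."""
        self.last_sent = now

    def tick(self, now):
        """Send CONTACT and reset the timer if FO_SEND_MAX was reached."""
        if now - self.last_sent >= self.fo_send_max:
            self.send_contact()
            self.last_sent = now
            return True
        return False
```

   Since regular traffic also resets the timer, a busy connection never
   sends CONTACT messages; they appear only when the link is otherwise
   idle.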

7.6.  Sending Data

   Each server updates its failover partner about recent changes in
   lease states.  Each update must include the following information:

   1.  resource type - non-temporary address or a prefix

   2.  resource information - actual address or prefix

   3.  valid life time requested by client

   4.  IAID - Identity Association Identifier used by the client while
       obtaining this lease.  (Note 1: one client may use many IAIDs
       simultaneously.  Note 2: IAIDs for IA, TA and PD are orthogonal
       number spaces.)

   5.  valid life time sent to client

   6.  potential valid life time

   7.  preferred life time sent to client

   8.  CLTT - Client Last Transaction Time, a timestamp of the last
       received transmission from a client

   9.  assigned FQDN names, if any (optional)

   Discussion: Do we need T1 as well?  Something like next expected
   client transmission?

   Q: Maybe we could reuse IA_NA and IA_PD options here?  Yes.

   Q: Do we care about preferred lifetime? (presumably no).  Certainly
   not what was requested by the client.

   Q: Do we care about IAID? (presumably yes) Yes.
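
   As an illustration only, the information listed above could be held
   in a record such as the one sketched below.  All field and type
   names are hypothetical, no on-wire format is implied, and the set of
   fields may change as the open questions above are resolved.  The
   check that the preferred lifetime does not exceed the valid lifetime
   follows normal DHCPv6 practice and is an addition of this example.

```python
import ipaddress
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResourceType(Enum):
    NON_TEMPORARY_ADDRESS = "address"
    PREFIX = "prefix"


@dataclass
class LeaseUpdate:
    resource_type: ResourceType        # 1. address or prefix
    resource: str                      # 2. e.g. "2001:db8::1" or "2001:db8::/56"
    requested_valid_lifetime: int      # 3. requested by the client (seconds)
    iaid: int                          # 4. IAID used by the client
    valid_lifetime: int                # 5. sent to the client (seconds)
    potential_valid_lifetime: int      # 6. told to the partner (seconds)
    preferred_lifetime: int            # 7. sent to the client (seconds)
    cltt: int                          # 8. Client Last Transaction Time
    fqdns: List[str] = field(default_factory=list)  # 9. optional

    def __post_init__(self):
        # Raises ValueError if the resource is not valid IPv6 text.
        if self.resource_type is ResourceType.PREFIX:
            ipaddress.IPv6Network(self.resource)
        else:
            ipaddress.IPv6Address(self.resource)
        if self.preferred_lifetime > self.valid_lifetime:
            raise ValueError("preferred lifetime exceeds valid lifetime")
```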

7.6.1.  Required Data

7.6.2.  Optional Data

7.7.  Receiving Data

7.7.1.  Conflict Resolution

   TODO: This is just a loose collection of notes.  This section will
   probably need to be rewritten as a flowchart of some kind.

   The server receiving a lease update from its partner must evaluate
   the received lease information to see if it is consistent with
   already known state and decide which information - previously known
   or just received - is "better".  The server should take into
   consideration the following aspects: whether the lease is already
   assigned to a specific client, which server had contact with the
   client most recently, the start time of the lease, etc.

   The lease update may be accepted or rejected.  Rejection SHOULD NOT
   change the flag in a lease that says that it should be transmitted to
   the failover partner.  If this flag is set, then it should be
   transmitted, but if it is not already set, the rejection of a lease
   state update SHOULD NOT trigger an automatic update of the failover
   partner sending the rejected update.  The potential for update storms
   is too great, and in the unusual case where the servers simply can't
   agree, that disagreement is better than an update storm.

   Discussion: There will definitely be different types of update
   rejections.  For example, this will allow a server to treat the case
   of receiving a new lease that it has not previously seen differently
   from the case where the partner sends an old version of a lease for
   which a newer state is known.

7.7.2.  Acknowledging Reception


8.  Endpoint States

   The following sections contain detailed descriptions of all possible
   states of a failover endpoint.

8.1.  State Machine Initialization

   TODO

8.2.  NORMAL State Operation

   TODO

8.3.  COMMUNICATION-INTERRUPTED State Operation

   TODO

8.4.  PARTNER-DOWN State Operation

   TODO

8.5.  RECOVERING State Operation

   TODO

8.6.  State Transitions

   TODO


9.  Proposed extensions

   The following section discusses possible extensions to the proposed
   failover mechanism.  Listed extensions must be sufficiently simple to
   not further complicate the failover protocol.  Any proposals that are
   considered complex will be defined as stand-alone extensions in
   separate documents.

9.1.  Active-active mode

   A very simple way to achieve active-active mode is to remove the
   restriction that the secondary server MUST NOT respond to SOLICIT and
   REQUEST messages.  Instead it could respond, but MUST have a lower
   preference than the primary server.  Clients discovering available
   servers will receive ADVERTISE messages from both servers, but are
   expected to select the primary server as it has a higher preference
   value configured.  The following REQUEST message will be directed to
   the primary server.

   Discussion: Do DHCPv6 clients actually do this?  DHCPv4 clients were
   rumored to wait for a "while" to accept the best offer, but to a
   first approximation, they all take the first offer they receive that
   is even acceptable.

   The benefit of this approach, compared to the "basic" active-passive
   solution, is that there is no delay between primary failure and the
   moment when the secondary starts serving requests.

   Discussion: The possibility of setting both servers' preference to an
   equal value could theoretically work as a crude attempt to provide
   load balancing.  It wouldn't do much good on its own, as one (faster)
   server could be chosen more frequently (assuming that with equal
   preference set, clients will pick the first responding server, which
   is not mandated by DHCPv6).  We could design a simple mechanism of
   dynamically updating preference depending on usage of available
   resources.  This concept hasn't been investigated in detail yet.
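
   The client behaviour that this extension relies on can be sketched
   as follows.  This is a simplified model and not a description of any
   real client: the function name and the tuple layout are
   hypothetical, and ties are resolved in favour of the first received
   ADVERTISE, which is the assumption discussed above.

```python
def select_server(advertisements):
    """Pick a server from (server_id, preference) pairs in arrival order.

    The highest preference wins; with equal preference the earliest
    received ADVERTISE is chosen, because max() returns the first
    maximal element it encounters.
    """
    if not advertisements:
        return None
    return max(advertisements, key=lambda adv: adv[1])[0]
```

   With the primary configured at a higher preference than the
   secondary, the primary is chosen regardless of arrival order.  With
   equal values the faster server wins, which is the load balancing
   concern raised in the discussion.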


10.  Dynamic DNS Considerations

   TODO: Describe DNS Update challenges in a failover environment.
   This is nicely described in Section 5.12 of [dhcpv4-failover].


11.  Reservations and failover

   TODO: Describe how lease reservation works with failover.  See
   Section 5.13 in [dhcpv4-failover].


12.  Protocol entities

   Discussion: It is unclear whether the following sections belong in
   the design or the protocol draft.  They are currently kept here as a
   scratchpad listing things that will have to be defined eventually.
   Whether they will stay in this document or be moved to the protocol
   spec
   document is TBD.

12.1.  Failover Protocol

   This section enumerates the options that will be defined in the
   failover protocol specification.  A rough description of the purpose
   and content of each option is given.  The exact on-wire format will
   be defined in the protocol specification.

   1.  OPTION_FO_TIMESTAMP - conveys a timestamp.  It is used by the
       time skew measurement algorithm (see Section 7.1).

12.2.  Protocol constants

   This section enumerates various constants that have to be defined in
   the actual protocol specification.

   1.  TIME_SKEW_PKTS_AVG - number of packets that are used to calculate
       the average time skew between partners (see Section 7.1).


13.  Open questions

   This is a scratchpad.  This section will be removed once the
   questions are answered.

   Q: Do we want to support temporary addresses?  I think not.  They are
   short-lived by definition, so clients should not mind getting new
   temporary addresses.

   Q: Do we want to support CGA-registered addresses?  There is
   currently work in DHC WG about this, but I haven't looked at it yet.
   If that is complicated, we may not define it here, but rather as an
   extension.  [If it moves forward, we need to support it.]


14.  Security Considerations

   TODO: Security considerations section will contain loose notes and
   will be transformed into consistent text once the core design
   solidifies.


15.  IANA Considerations

   IANA is not requested to perform any actions at this time.


16.  Acknowledgements

   This document extensively uses concepts, definitions and other parts
   of the [dhcpv4-failover] document.  The authors would like to thank
   Shawn Routher, Greg Rabil, and Bernie Volz for their significant
   involvement and contributions.

   This work has been partially supported by Department of Computer
   Communications (a division of Gdansk University of Technology) and
   the Polish Ministry of Science and Higher Education under the
   European Regional Development Fund, Grant No.  POIG.01.01.02-00-045/
   09-00 (Future Internet Engineering Project).


17.  References

17.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2131]  Droms, R., "Dynamic Host Configuration Protocol",
              RFC 2131, March 1997.

   [RFC3074]  Volz, B., Gonczi, S., Lemon, T., and R. Stevens, "DHC Load
              Balancing Algorithm", RFC 3074, February 2001.

   [RFC3315]  Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C.,
              and M. Carney, "Dynamic Host Configuration Protocol for
              IPv6 (DHCPv6)", RFC 3315, July 2003.

   [RFC3633]  Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic
              Host Configuration Protocol (DHCP) version 6", RFC 3633,
              December 2003.

   [RFC4704]  Volz, B., "The Dynamic Host Configuration Protocol for
              IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN)
              Option", RFC 4704, October 2006.

   [RFC5460]  Stapp, M., "DHCPv6 Bulk Leasequery", RFC 5460,
              February 2009.

17.2.  Informative References

   [I-D.ietf-dhc-dhcpv6-redundancy-consider]
              Tremblay, J., Brzozowski, J., Chen, J., and T. Mrugalski,
              "DHCPv6 Redundancy Deployment Considerations",
              draft-ietf-dhc-dhcpv6-redundancy-consider-02 (work in
              progress), October 2011.

   [RFC2136]  Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
              "Dynamic Updates in the Domain Name System (DNS UPDATE)",
              RFC 2136, April 1997.

   [dhcpv4-failover]
              Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S.,
              Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover
              Protocol", draft-ietf-dhc-failover-12 (work in progress),
              March 2003.

   [requirements]
              Mrugalski, T. and K. Kinnear, "DHCPv6 Failover
              Requirements",
              draft-ietf-dhc-dhcpv6-failover-requirements-00 (work in
              progress), October 2011.


Authors' Addresses

   Tomasz Mrugalski
   Internet Systems Consortium, Inc.
   950 Charter Street
   Redwood City, CA  94063
   USA

   Phone: +1 650 423 1345
   Email: tomasz.mrugalski@gmail.com


   Kim Kinnear
   Cisco Systems, Inc.
   1414 Massachusetts Ave.
   Boxborough, Massachusetts  01719
   USA

   Phone: +1 (978) 936-0000
   Email: kkinnear@cisco.com

