Network Working Group                                      R. Singh, Ed.
Internet-Draft                                                G. Kalyani
Intended status: Standards Track                                   Cisco
Expires: April 14, 2011                                           Y. Nir
                                                             Check Point
                                                                D. Zhang
                                                                  Huawei
                                                        October 11, 2010


          Protocol Support for High Availability IKEv2/IPsec
                 draft-ietf-ipsecme-ipsecha-protocol-01

Abstract

   IKEv2 and IPsec protocols are widely used for deploying VPNs.  In
   order to make such VPNs highly available, more scalable, and
   fault-tolerant, they are implemented as IKEv2/IPsec Highly Available
   (HA) clusters.  However, IKEv2/IPsec HA clusters raise a number of
   issues.  The draft "IPsec Cluster Problem Statement" enumerates the
   issues encountered in an IKEv2/IPsec HA cluster environment.  This
   document proposes an extension to the IKEv2 protocol to solve the
   main issues of "IPsec Cluster Problem Statement" in a Hot Standby
   cluster and gives implementation advice for the other issues.  The
   main issues to be solved are:

   o  IKEv2 Message Id synchronization: This is done by syncing up the
      expected send and receive message Id values with the peer and
      updating the values at the newly active cluster member after the
      failover.

   o  IPsec Replay Counter synchronization: This is done by syncing up
      bumped-up outgoing SA replay counter values with the peer and
      updating the values at the newly active cluster member after the
      failover.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 14, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Issues solved from IPsec Cluster Problem Statement
   4.  IKEv2/IPsec SA Counter Synchronization Problem
   5.  IKEv2/IPsec SA Counter Synchronization Solution
   6.  IKEv2/IPsec synchronization notification payloads
     6.1.  IKEV2_MESSAGE_ID_SYNC_SUPPORTED
     6.2.  IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED
     6.3.  IKEV2_MESSAGE_ID_SYNC
     6.4.  IPSEC_REPLAY_COUNTER_SYNC
   7.  Details of implementation
   8.  Step-by-Step details
   9.  Security Considerations
   10. Interaction with other drafts
   11. IANA Considerations
   12. Acknowledgements
   13. Change Log
     13.1.  Draft -01
     13.2.  Draft -00
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Appendix A.  IKEv2 Message Id examples
   Authors' Addresses

1.  Introduction

   IKEv2 is used for deploying IPsec-based VPNs.  In order to make such
   VPNs highly available, more scalable, and fault-tolerant, they are
   implemented as IKEv2/IPsec Highly Available (HA) clusters.  However,
   IKEv2/IPsec HA clusters raise a number of issues.  The draft "IPsec
   Cluster Problem Statement" enumerates the issues encountered in an
   IKEv2/IPsec HA cluster.

   In a Hot Standby cluster implementation of IKEv2/IPsec-based VPNs,
   the IKEv2/IPsec session is established between the peer and the
   active member of the cluster.  After that, the active member syncs
   the IKE/IPsec SA state to the standby member of the cluster.  This
   primary SA state sync-up is done on SA bring-up and/or rekey.
   Synchronizing SA state between the active and standby members for
   each IKE and IPsec message is very costly, so it is normally done
   periodically.  As a result, when a "failover" event happens in the
   cluster, the failover is first detected by the standby member, which
   then becomes the active member; this takes considerable time.
   During the failover, while the standby member is becoming the newly
   active member, the peer is unaware of the failover and keeps sending
   IKE requests and IPsec packets to the cluster, as permitted by the
   IKEv2 and IPsec windowing features.  After coming up, the newly
   active member finds a mismatch in IKE message Ids and IPsec replay
   counters.  Please see Section 4 for more details.

   This document proposes an extension to the IKEv2 protocol to solve
   the main issues of IKE message Id sync and IPsec SA replay counter
   sync, and gives implementation advice for the others.  Here is a
   summary of the solutions provided in this document:

   IKEv2 Message Id synchronization: This is done by syncing up the
   expected send and receive message Id values with the peer and
   updating the values at the newly active cluster member after the
   failover.

   IPsec Replay Counter synchronization: This is done by syncing up
   bumped-up outgoing SA replay counter values with the peer and
   updating the values at the newly active cluster member after the
   failover.

   Though this document describes IKEv2 message Id sync and IPsec
   replay counter synchronization in the context of an IPsec HA
   cluster, the solution provided is generic and can be used in other
   scenarios where IKEv2 message Id sync or IPsec SA replay counter
   sync is required.

   While some IPsec HA implementations suffer from the IKEv2 message Id
   synchronization problem, others suffer from the IPsec replay counter
   synchronization problem.  The two problems are handled separately,
   using a separate notify for each.  This provides the flexibility of
   implementing IKEv2 message Id synchronization, IPsec replay counter
   synchronization, or both.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   "SA Counter SYNC Request" is the informational exchange request
   defined in this document to synchronize the IKEv2/IPsec SA counter
   information between a member of the cluster and the peer.

   "SA Counter SYNC Response" is the informational exchange response
   defined in this document to synchronize the IKEv2/IPsec SA counter
   information between a member of the cluster and the peer.

   Below are the terms taken from [IPsec Cluster Problem Statement],
   with added information in the context of this document.

   "Hot Standby Cluster", or "HS Cluster", is a cluster where only one
   of the members is active at any one time.  This member is also
   referred to as the "active", whereas the other(s) are referred to as
   "standbys".  VRRP ([RFC5798]) is one method of building such a
   cluster.  The goal of a Hot Standby Cluster is to create the illusion
   of a single virtual gateway to the peer(s).

   "Active Member" is the primary member in the Hot Standby cluster.  It
   is responsible for forwarding packets for the virtual gateway.

   "Standby Member" is the primary backup router.  This member takes
   control, i.e., becomes the active member, after the "failover" event.

   "Peer" is the IKEv2/IPsec endpoint which establishes a VPN connection
   with the Hot Standby cluster.  The Peer knows the Hot Standby Cluster
   by the cluster's single IP address.  In case of "failover", the
   standby member of the cluster becomes active, so the peer normally
   doesn't notice that "failover" has occurred in the cluster.

   "Multiple failover" is the situation where, in a cluster with three
   or more nodes, failovers happen in rapid succession.  The protocol
   and implementation must be able to handle multiple failover, i.e.,
   they must be able to handle a new failover even while still
   processing the old one.

   "Simultaneous failover" is the situation where failover happens at
   both ends at the same time.  The protocol and implementation must be
   able to handle simultaneous failover.

   The generic term "IKEv2/IPsec SA counters" is used throughout.
   "IKEv2 SA counter" stands for IKEv2 message Ids, and "IPsec SA
   counter" stands for IPsec SA replay counters, which are used to
   provide the optional anti-replay feature.

3.  Issues solved from IPsec Cluster Problem Statement

   IPsec Cluster Problem Statement defines the problems encountered in
   IPsec Clusters.  The problems, along with their section names as
   given in the statement, are as follows.

   o  3.2.  Lots of Long Lived State
   o  3.3.  IKE Counters
   o  3.4.  Outbound SA Counters
   o  3.5.  Inbound SA Counters
   o  3.6.  Missing Synch Messages
   o  3.7.  Simultaneous use of IKE and IPsec SAs by Different Members
      *  3.7.1.  Outbound SAs using counter modes
   o  3.8.  Different IP addresses for IKE and IPsec
   o  3.9.  Allocation of SPIs

   This document solves the main issues using the protocol extension
   and provides implementation advice for the other issues, as follows.

   o  3.2 mentions that there is a lot of state that needs to be
      synchronized.  If state is not synchronized, it's not really an
      interesting cluster: failover will be just like a reboot, so the
      issue need not be solved with protocol extensions.

   o  3.3, 3.4, 3.5, and 3.6 are solved by this document.  Please see
      Section 4 for more details.

   o  3.7 is a problem to be solved while building clusters.  However,
      the peers should be mandated to accept multiple parallel SAs for
      3.7.1.

   o  3.8 can be solved by using the IKEv2 Redirect Mechanism
      [RFC5685].

   o  3.9 is the problem of avoiding collisions of the same SPIs among
      the cluster members.  This is outside the scope of this document,
      since it has to be solved within the context of the cluster and
      not with the peer.

4.  IKEv2/IPsec SA Counter Synchronization Problem

   [RFC5996] states that "An IKE endpoint MUST NOT exceed the peer's
   stated window size for transmitted IKE requests".

   As per the protocol, all IKEv2 packets follow a request-response
   paradigm.  The initiator of an IKEv2 request MUST retransmit the
   request until it has received a response from the peer.  IKEv2
   introduces a windowing mechanism that allows multiple requests to be
   outstanding at a given point of time, but mandates that the sender
   window does not move until the oldest message sent from one peer to
   another is acknowledged.  Loss of even a single packet leads to
   repeated re-transmissions, followed by an IKEv2 SA teardown if the
   re-transmissions are unacknowledged.

   An IPsec Hot Standby Cluster is required to ensure that in case of
   failover of the active member, the standby member becomes active
   immediately.  The standby member is expected to have the exact
   values of the message Id fields of the active member before
   failover.  Even with the best efforts to update the message Id
   values from the active to the standby member, the values at the
   standby member can be stale for the following reasons:

   o  The standby member is unaware of the last message that was
      received and acknowledged by the old active member, as failover
      could have happened before the standby could be updated.

   o  The standby member does not have information about on-going
      unacknowledged requests of the active member before the failover
      event.  So after the failover event, when the standby member
      becomes active, it cannot re-transmit those requests.

   When a standby member takes over as the active member, it would
   start the message Id ranges from the previously updated values.
   This would make it reject requests from the peer, since the values
   would be stale.  As a sender, the standby member may end up reusing
   a stale message Id, which will cause the peer to drop the request.
   Eventually there is a high probability of the IKEv2 SA and the
   corresponding IPsec SAs getting torn down simply because of a
   transitory message Id mismatch and re-transmission of requests.
   This is not a desirable feature of HA.  Even when the standby member
   is updated periodically, the cluster can lose the IKE SA, and with
   it all IPsec SAs, because of a message Id (i.e., SA counter)
   mismatch.

   A similar issue is observed with IPsec counters if anti-replay
   protection/ESN is implemented.  Even with the best efforts to sync
   the ESP and AH SA counter numbers from the active to the standby
   member, there is a chance that the standby member would have stale
   counter values.  The standby member would then send the stale
   counter numbers.  The peer would reject/drop such packets, since
   with the anti-replay protection feature, duplicate use of counters
   is not allowed.  In the case of IPsec, it is OK to skip some counter
   values and start with higher counter values.

   Hence a mechanism is required in HA to ensure that the standby
   member has correct values of message Ids and IPsec counters, so that
   sessions are not torn down just because of mismatching counters.

5.  IKEv2/IPsec SA Counter Synchronization Solution

   When the standby member becomes the active member after a failover
   event in the cluster, the standby member would send an authenticated
   IKEv2 request to the peer asking it to send its values of the SA
   counters.  The standby member would then update its values of the SA
   counters and then start sending/receiving requests.

   First, the peer MUST negotiate its ability to support IKEv2 message
   Id synchronization with the active member of the cluster by sending
   the IKEV2_MESSAGE_ID_SYNC_SUPPORTED notification in the IKE_AUTH
   exchange.

   Similarly, to support IPsec replay counter synchronization, the peer
   MUST negotiate its ability to support IPsec replay counter
   synchronization with the active member of the cluster by sending the
   IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED notification in the IKE_AUTH
   exchange.

   Peer                                                Active Member
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

   HDR, SK {IDi, [CERT], [CERTREQ], [IDr], AUTH,
       N[IKEV2_MESSAGE_ID_SYNC_SUPPORTED],
       N[IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED],
       SAi2, TSi, TSr}  ---------->

               <----------  HDR, SK {IDr, [CERT+], [CERTREQ+], AUTH,
                                N[IKEV2_MESSAGE_ID_SYNC_SUPPORTED],
                                N[IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED],
                                SAr2, TSi, TSr}

   When the peer and the active member both support SA counter
   synchronization, the active member MUST sync the SA counter
   synchronization capability to the standby member after the
   establishment of the IKE SA, so that the standby member is aware of
   the capability and can use it when it becomes the active member
   after a failover event.

   After a failover event, when the standby member becomes the active
   member, it has to request the SA counters from the peer.  The
   standby member would initiate the SYNC Request as an INFORMATIONAL
   exchange with message Id zero, containing the notify
   IKEV2_MESSAGE_ID_SYNC, IPSEC_REPLAY_COUNTER_SYNC, or both, depending
   on whether the synchronization needs to be done for IKEv2 message
   Ids, IPsec replay counters, or both.

   The initiator of the IKEv2 message Id sync request sends its
   expected send and receive message Id values and the "failover count"
   in the IKEV2_MESSAGE_ID_SYNC notify.  The responder of the request
   compares the received values with the available local values.  The
   higher of the two is selected and sent in the sync response with the
   notify IKEV2_MESSAGE_ID_SYNC.  The initiator now updates its send
   and receive IKEv2 message Ids to the values received in the sync
   response and can start normal IKEv2 message exchange.

   The initiator of the IPsec replay counter sync sends its bumped-up
   outgoing IPsec SA replay counter value and the "failover count" in
   the IPSEC_REPLAY_COUNTER_SYNC notify.
   The responder of the request updates its incoming IPsec SA counter
   values and sends its bumped-up outgoing IPsec SA replay counter
   value in the sync response with IPSEC_REPLAY_COUNTER_SYNC.  The
   initiator now updates its incoming IPsec SA counter to the values
   received in the sync response and can start normal IPsec data
   traffic.

   Both notify types, IKEV2_MESSAGE_ID_SYNC and
   IPSEC_REPLAY_COUNTER_SYNC, contain Nonce Data in the payload to
   avoid a DoS attack due to replay of the SA counter sync
   request/response.  The nonces are defined per notify and MUST be
   validated.  The nonce data sent in the response MUST match the nonce
   data sent by the newly active member in the request.  If the nonce
   data received in the response does not match the nonce data sent in
   the request, the standby (i.e., newly active) member MUST discard
   this response, and the normal IKEv2 behavior of re-transmitting the
   request and waiting for a genuine reply from the peer SHOULD follow,
   before tearing down the SA because of re-transmits.

   Standby [Newly Active] Member                              Peer
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

   HDR, SK {N[IKEV2_MESSAGE_ID_SYNC],
       N[IPSEC_REPLAY_COUNTER_SYNC]}  -------->

                     <---------  HDR, SK {N[IKEV2_MESSAGE_ID_SYNC],
                                     N[IPSEC_REPLAY_COUNTER_SYNC]}

6.  IKEv2/IPsec synchronization notification payloads

   Below are the new notify and payload types that are defined in this
   document.

6.1.  IKEV2_MESSAGE_ID_SYNC_SUPPORTED

   IKEV2_MESSAGE_ID_SYNC_SUPPORTED: This notify is included in the
   IKE_AUTH request/response to indicate support for the IKEv2 message
   Id synchronization mechanism described in this document.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Next Payload  |C|  RESERVED   |         Payload Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Protocol ID(=0)| SPI Size (=0) |      Notify Message Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and
   'Notify Message Type' fields are the same as described in Section 3
   of [RFC5996].  The 'SPI Size' field MUST be set to 0 to indicate
   that the SPI is not present in this message.  The 'Protocol ID' MUST
   be set to 0, since the notification is not specific to a particular
   security association.  The 'Payload Length' field is set to the
   length in octets of the entire payload, including the generic
   payload header.  The 'Notify Message Type' field is set to indicate
   the IKEV2_MESSAGE_ID_SYNC_SUPPORTED payload.

6.2.  IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED

   IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED: This notify is included in the
   IKE_AUTH request/response to indicate support for the IPsec SA
   replay counter synchronization mechanism described in this document.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Next Payload  |C|  RESERVED   |         Payload Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Protocol ID(=0)| SPI Size (=0) |      Notify Message Type      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and
   'Notify Message Type' fields are the same as described in Section 3
   of [RFC5996].  The 'SPI Size' field MUST be set to 0 to indicate
   that the SPI is not present in this message.  The 'Protocol ID' MUST
   be set to 0, since the notification is not specific to a particular
   security association.
   The 'Payload Length' field is set to the length in octets of the
   entire payload, including the generic payload header.  The 'Notify
   Message Type' field is set to indicate the
   IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED payload.

6.3.  IKEV2_MESSAGE_ID_SYNC

   IKEV2_MESSAGE_ID_SYNC: This payload type is defined to sync the
   IKEv2 message Ids between the newly active [standby] member and the
   peer.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Next Payload  |   RESERVED    |         Payload Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Failover count                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Nonce Data                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 EXPECTED_SEND_REQ_MESSAGE_ID                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 EXPECTED_RECV_REQ_MESSAGE_ID                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   It contains the following data.

   o  Failover count (4 octets): The failover count within the cluster.
      It increases with each failover event in the HA cluster.

   o  Nonce Data (4 octets): The random nonce data.  The same value
      should be sent in the SYNC Request and Response.  The nonce data
      is used to counter the replay of the IKEV2_MESSAGE_ID_SYNC
      response by an attacker.

   o  EXPECTED_SEND_REQ_MESSAGE_ID (4 octets): This MUST be present
      only if the protocol ID is IKE.  This field is used by the sender
      of this notify to indicate the message Id it will use in the next
      request that it will send to the other side.

   o  EXPECTED_RECV_REQ_MESSAGE_ID (4 octets): This field is used by
      the sender of this notify to indicate the message Id it can
      accept in the next request received from the other side.

6.4.  IPSEC_REPLAY_COUNTER_SYNC

   IPSEC_REPLAY_COUNTER_SYNC: This payload type is defined to sync the
   IPsec SA replay counters between the newly active [standby] member
   and the peer.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Next Payload  |ESN| RESERVED  |         Payload Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Failover count                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outgoing IPsec SA counter                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   It contains the following data.

   o  ESN (1 bit): The ESN bit MUST be ON if the IPsec SAs were
      established with Extended Sequence Numbers.

   o  Failover count (4 octets): The failover count within the cluster.
      It increases with each failover event in the HA cluster.

   o  Outgoing IPsec SA counter (4 octets or 8 octets): The outgoing
      IPsec SA counter is the bumped-up outgoing IPsec SA replay
      counter value, considering ALL Child SAs under the IKEv2 SA.  The
      size of the outgoing IPsec SA counter depends on the ESN bit: if
      the ESN bit is ON, it is 8 octets; otherwise it is 4 octets.

7.  Details of implementation

   The message Id used in the IKEV2_MESSAGE_ID_SYNC exchange MUST be
   zero, so that it is not validated upon receipt as per IKEv2
   windowing.  Message Id zero MUST be permitted only for an
   INFORMATIONAL exchange that carries a Notify of type
   IKEV2_MESSAGE_ID_SYNC.  If any INFORMATIONAL exchange uses message
   Id zero without having this Notify, then such packets MUST be
   discarded upon decryption and an INVALID_SYNTAX notify SHOULD be
   sent.  No other payloads are allowed in this INFORMATIONAL exchange.

   Whenever an IKEV2_MESSAGE_ID_SYNC or IPSEC_REPLAY_COUNTER_SYNC
   notify is received with an invalid failover count or nonce data, the
   event SHOULD be logged.
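
   The message Id zero rule above can be illustrated with a minimal
   sketch.  It is not part of the protocol definition: the function
   name check_message_id_zero, the use of plain strings for exchange
   and payload names, and the (accepted, reason) return value are all
   assumptions made for illustration.  The sketch also assumes that an
   IPSEC_REPLAY_COUNTER_SYNC notify may accompany IKEV2_MESSAGE_ID_SYNC
   in the same message, as the exchange shown in Section 5 suggests.

```python
from typing import Iterable, Tuple

# Notify names come from Section 6. No numeric values are assigned yet,
# so plain strings stand in for them in this sketch (an assumption).
MESSAGE_ID_SYNC = "IKEV2_MESSAGE_ID_SYNC"
REPLAY_COUNTER_SYNC = "IPSEC_REPLAY_COUNTER_SYNC"
SYNC_NOTIFIES = {MESSAGE_ID_SYNC, REPLAY_COUNTER_SYNC}


def check_message_id_zero(exchange: str,
                          payloads: Iterable[str]) -> Tuple[bool, str]:
    """Decide whether a decrypted request with message Id 0, received on
    an already established IKE SA, may be processed.

    `exchange` is the exchange name and `payloads` lists the names of the
    payloads/notifies in the message (a hypothetical representation).
    """
    names = list(payloads)
    if exchange != "INFORMATIONAL":
        return False, "message Id 0 is permitted only for INFORMATIONAL"
    if MESSAGE_ID_SYNC not in names:
        # Section 7: discard the packet; INVALID_SYNTAX SHOULD be sent.
        return False, "INVALID_SYNTAX: IKEV2_MESSAGE_ID_SYNC notify missing"
    extra = [n for n in names if n not in SYNC_NOTIFIES]
    if extra:
        # Assumption: only the two sync notifies may travel together.
        return False, "INVALID_SYNTAX: unexpected payloads " + ", ".join(extra)
    return True, "ok"


if __name__ == "__main__":
    print(check_message_id_zero("INFORMATIONAL", [MESSAGE_ID_SYNC]))
    print(check_message_id_zero("INFORMATIONAL", ["DELETE"]))
```

   A real implementation would run this check only after the message
   has been decrypted and authenticated, and would go on to validate
   the failover count and nonce data before acting on the notify.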

   The standby member can initiate the synchronization of IKEv2 message
   Ids:

   o  When it receives a bad IKEv2/IPsec packet.  A "bad" IKEv2/IPsec
      packet means a packet outside the receive window.

   o  When it has to send an IKEv2/IPsec packet after the failover
      event.

   o  When it has just got control from the active member and wants to
      update the values beforehand, so that it need not start this
      exchange at the time of sending/receiving a request.

   The standby member can initiate the synchronization of IPsec SA
   counters:

   o  If there was traffic using the IPsec SA in the recent past and
      there could be a stale replay counter at the standby member.

   Since there can be many sessions at the standby member, and sending
   exchanges from all of the sessions can cause throttling, the standby
   member can choose to initiate the exchange only when it has to send
   or receive a request.  Thus the trigger to initiate this exchange
   depends on the requirement/discretion of the standby member.

   A member which has not announced its capability
   IKEV2_MESSAGE_ID_SYNC_SUPPORTED MUST NOT send/receive the notify
   IKEV2_MESSAGE_ID_SYNC.

   A member which has not announced its capability
   IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED MUST NOT send/receive the notify
   IPSEC_REPLAY_COUNTER_SYNC.

   If a peer gets an IKEV2_MESSAGE_ID_SYNC or IPSEC_REPLAY_COUNTER_SYNC
   request even though it did not announce its capability in the
   IKE_AUTH exchange, then it MUST ignore this message.

   If any of the Notifies or the SYNC request/response is malformed,
   then it is treated as an INVALID_SYNTAX message.

8.  Step-by-Step details

   The step-by-step details of the synchronization of IKE message Ids
   are as follows.

   o  The active member and the peer device establish the session.
      They announce the capability to sync the counter information by
      sending the IKEV2_MESSAGE_ID_SYNC_SUPPORTED notify in the
      IKE_AUTH exchange.

   o  The active member dies and the standby member takes over.
      The standby member sends its own idea of the IKE message Ids (its
      side) to the peer in an INFORMATIONAL message exchange with
      message Id zero.

   o  The peer first authenticates the message and then validates the
      failover count.  The peer then compares the received values with
      the values available locally and picks the higher value.  It
      updates its message Ids with the higher values and also proposes
      the same in the response.

   o  The peer should not wait for a pending response while responding
      with these message Id values.  For example, if the window size is
      5 and the peer window is 3-7, and the peer has sent requests 3,
      4, 5, 6, 7 but got responses only for 4, 5, 6, 7 and not for 3,
      then it should send the EXPECTED_SEND_REQ_MESSAGE_ID as 8 and
      should not wait for the response to 3 anymore.

   o  The peer should not wait for a pending request either.  For
      example, if the window size is 5 and the peer window is 3-7, and
      the peer has received requests 4, 5, 6, 7 but not 3, then it
      should send the EXPECTED_RECV_REQ_MESSAGE_ID as 8 and should not
      wait for 3 anymore.

   There is a corner case with "failover count" and multiple failover:
   what if the "failover count" is not updated on a member, and the
   next "failover" happens, so that the "failover count" is updated on
   the other side but not on this member?  [[ This needs to be
   discussed on the mailing list. ]]

9.  Security Considerations

   There can be two types of DoS attacks.

   o  Replay of Message SYNC Request.  This is countered by the
      "failover count", since synchronization starts after a failover
      event and each member of the cluster is aware of the failover
      event.  The receiver of a sync request should verify and maintain
      the failover count.  If a peer again receives a sync request with
      the same "failover count", it can safely discard the request if
      it has received a valid request/response from the other side
      after the sync exchange.
      The peer can send the cached response for the sync request as
      long as it has not received a valid request/response from the
      other side and the failover count has not increased.

   o  Replay of Message SYNC Response.  This is countered by sending
      the nonce data along with the sync notify.  The same nonce data
      has to be returned in the response.  Thus the standby member
      accepts a reply only for the current request.  After it receives
      a valid response, it MUST NOT process the same response again and
      MUST discard it.

10.  Interaction with other drafts

   The primary assumption of the IKEv2/IPsec SA Counter Synchronization
   proposal is that an IKEv2 SA has been established between the active
   member of the Hot Standby Cluster and the peer, that a failover
   event then occurred, and that the standby member has now "become"
   active.  It also assumes that the IKEv2 SA state was synced between
   the active and standby members of the Hot Standby Cluster before the
   failover event.

   o  Session Resumption.  Session resumption assumes that the peer
      (i.e., the client or initiator) detects the need to re-establish
      the session.  In IKEv2/IPsec SA counter synchronization, the
      standby member which becomes active (i.e., the gateway or
      responder) detects the need to synchronize the SA counters after
      the failover event.  Also, in a Hot Standby Cluster, the peer
      establishes the IKEv2/IPsec session with the cluster's single IP
      address, so the peer normally does not detect the failover in the
      cluster unless the standby member takes very long to become
      active and the IKEv2 SA times out via liveness check.  So,
      session resumption and SA counter synchronization after failover
      are mutually exclusive.

   o  This document describes the operation of tightly coupled
      clusters, which are the common way of building IPsec clusters.
      In these clusters, all members appear to the peer as one gateway;
      specifically, they share a single IP address.
      High availability can also be provided by loosely coupled
      clusters (for lack of a better term), which are a group of
      gateways that do not share an IP address and do not synchronize
      state.  In this architecture, the client can use Session
      Resumption to fail over from one cluster member to another.
      Specifically this requires:

      *  Support of session resumption on peers and gateways.

      *  A common session resumption ticket format on all gateways (not
         currently standardized).

      *  Configuration on the peers of the group of gateways that
         constitute the cluster.

   o  Redirect.  The redirect mechanism for load-balancing can be used
      during init (IKE_SA_INIT) and auth (IKE_AUTH), and after session
      establishment.  SA counter sync, in contrast, is used after the
      IKE SA has been established and a failover event has occurred, so
      it is mutually exclusive with redirect during init and auth.
      Redirect after session establishment is used for timed or planned
      shutdown/maintenance.  A failover event cannot be detected on the
      active member beforehand, so using redirect after session
      establishment is not possible in case of failover.  So, Redirect
      and SA counter synchronization after failover are mutually
      exclusive.

   o  Crash detection.  This solves a similar problem, where the peer
      detects that a cluster member has crashed based on a token.  It
      is mutually exclusive with HA with SA counter sync.

11.  IANA Considerations

   This document introduces four new IKEv2 Notification Message types
   as described in Section 6.  The new Notify Message Types must be
   assigned values between 16396 and 40959.

   o  IKEV2_MESSAGE_ID_SYNC_SUPPORTED.
   o  IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED.
   o  IKEV2_MESSAGE_ID_SYNC.
   o  IPSEC_REPLAY_COUNTER_SYNC.

12.  Acknowledgements

   We would like to thank Pratima Sethi and Frederic Detienne for their
   review comments and valuable suggestions for the initial version of
   the document.
   We would also like to thank the following people (in alphabetical order) for their review comments and valuable suggestions: Dan Harkins, Paul Hoffman, Steve Kent, Tero Kivinen, David McGrew, Pekka Riikonen, and Yaron Sheffer.

13.  Change Log

   This section lists all the changes in this document.

   NOTE TO RFC EDITOR: Please remove this section before publication.

13.1.  Draft -01

   Added "Multiple and Simultaneous failover" scenarios.

   The document now provides a mechanism to sync either the IKEv2 message Id or the IPsec replay counter, or both, to cater to different types of implementations.

   The HA cluster's "failover count" is used to counter replay of sync requests by an attacker.

   The sync of the IPsec SA replay counter was optimized to use just one global bumped-up outgoing IPsec SA counter for ALL Child SAs under an IKEv2 SA.

   Examples were added for IKEv2 message Id sync to provide more clarity.

   Some edits as per comments on the mailing list to enhance clarity.

13.2.  Draft -00

   Version 00 is identical to draft-kagarigi-ipsecme-ikev2-windowsync-04, started as a WG document.

   Added IPSECME WG HA design team members as authors.

   Added a comment in the Introduction to discuss the window sync process on the WG mailing list to solve some concerns.

14.  References

14.1.  Normative References

   [IPsec Cluster Problem Statement]
              Nir, Y., "IPsec Cluster Problem Statement", July 2010.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5996]  Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, "Internet Key Exchange Protocol: IKEv2", RFC 5996, September 2010.

14.2.  Informative References

   [RFC5685]  Devarapalli, V. and K. Weniger, "Redirect Mechanism for IKEv2", RFC 5685, November 2009.

   [RFC5723]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", RFC 5723, January 2010.

Appendix A.
IKEv2 Message Id examples

   The examples below illustrate how the IKEv2 message Id values are synced. The notation used to denote EXPECTED_SEND_REQ_MESSAGE_ID and EXPECTED_RECV_REQ_MESSAGE_ID on a member is (EXPECTED_SEND_REQ_MESSAGE_ID, EXPECTED_RECV_REQ_MESSAGE_ID).

   Normal failover - Example 1

   Standby [Newly Active] Member                          Peer
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   Request SYNC (2, 3) -------->
                              Peer has values (4, 5), so it sends
                  <------------- (4, 5) Response SYNC

   Normal failover - Example 2

   Standby [Newly Active] Member                          Peer
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   Request SYNC (2, 5) -------->
                              Peer has values (2, 4), so it sends
                  <------------- (5, 4) Response SYNC

   Simultaneous failover

   In case of simultaneous failover, both sides send the SYNC request, and both eventually sync to the values of whichever side has the higher values.

   Standby [Newly Active] Member                          Peer
   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
   request SYNC (4, 4) ----->
                  <-------------- request SYNC (5, 5)
   response SYNC (5, 5) ---->
                  <-------- response SYNC (5, 5)

Authors' Addresses

   Raj Singh (Editor)
   Cisco Systems, Inc.
   Divyashree Chambers, B Wing, O'Shaugnessy Road
   Bangalore, Karnataka  560025
   India

   Phone: +91 80 4301 3320
   Email: rsj@cisco.com

   Kalyani Garigipati
   Cisco Systems, Inc.
   Divyashree Chambers, B Wing, O'Shaugnessy Road
   Bangalore, Karnataka  560025
   India

   Phone: +91 80 4426 4831
   Email: kagarigi@cisco.com

   Yoav Nir
   Check Point Software Technologies Ltd.
   5 Hasolelim st.
   Tel Aviv  67897
   Israel

   Email: ynir@checkpoint.com

   Dacheng Zhang
   Huawei Technologies Ltd.

   Email: zhangdacheng@huawei.com
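
Illustrative sketch of the Appendix A message Id examples

   The three message Id examples in Appendix A can be reproduced with a small, non-normative sketch. It assumes a rule inferred only from those examples: the receiver of a SYNC request compares the requester's send counter with its own receive counter (and the reverse), and keeps the higher value of each pair. The names MsgIds and sync_response are hypothetical and are not defined by this document; the normative behaviour is given in the solution and step-by-step sections.

```python
from typing import NamedTuple


class MsgIds(NamedTuple):
    """(EXPECTED_SEND_REQ_MESSAGE_ID, EXPECTED_RECV_REQ_MESSAGE_ID) on one side."""
    send: int
    recv: int


def sync_response(local: MsgIds, request: MsgIds) -> MsgIds:
    """Values the receiver of a SYNC request adopts and returns.

    Assumption (inferred from the Appendix A examples, not normative):
    the requester's receive counter is paired with the local send counter,
    the requester's send counter with the local receive counter, and the
    higher value of each pair wins.
    """
    return MsgIds(
        send=max(local.send, request.recv),
        recv=max(local.recv, request.send),
    )


# Normal failover, Example 1: request (2, 3), peer holds (4, 5).
example_1 = sync_response(MsgIds(4, 5), MsgIds(2, 3))

# Normal failover, Example 2: request (2, 5), peer holds (2, 4).
example_2 = sync_response(MsgIds(2, 4), MsgIds(2, 5))

# Simultaneous failover: each side answers the other's request.
member_reply = sync_response(MsgIds(4, 4), MsgIds(5, 5))
peer_reply = sync_response(MsgIds(5, 5), MsgIds(4, 4))

print(example_1, example_2, member_reply, peer_reply)
```

   Under this assumed rule the sketch yields (4, 5) for Example 1, (5, 4) for Example 2, and (5, 5) on both sides for the simultaneous case, matching the responses shown in the diagrams.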