IPS Working Group                       M. Rajagopal, R. Bhagwat,
                                          LightSand Communications
INTERNET-DRAFT                          E. Rodriguez, Lucent Technologies
                                        V. Chau, Gadzoox Networks
(Expires October, 2001)                 J. Nelson, Vixel
                                        S. Wilson, Brocade Communications
                                        M. O'Donnell, McDATA
                                        C. Carlson, QLogic
                                        S. Rupanagunta, Aarohi Communications
                                        D. Fraser, Compaq
                                        M. Merhar, Pirus Networks
                                        D. Peterson, Cisco
                                        V. Rangan, Rhapsody Networks
                                        L. Lamers, SAN Valley
                                        April, 2001

                  Fibre Channel Over TCP/IP (FCIP)

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress''.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

1. Abstract

   Fibre Channel (FC) is a dominant technology used in Storage Area
   Networks (SAN). The purpose of this draft (FC over TCP/IP, FCIP) is
   to describe mechanisms that allow islands of FC SANs to be
   interconnected over IP-based networks to form a single, unified FC
   SAN fabric. FC over TCP/IP relies on IP-based network services to
   provide the connectivity between the SAN islands over LANs, MANs, or
   WANs. The FC over TCP/IP specification relies upon TCP for
   congestion control and management and upon both TCP and FC for data
   error and data loss recovery. FC over TCP/IP treats all classes of
   FC frames the same -- as datagrams.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [2].

3. Motivation and Objectives

   Fibre Channel (FC) is a gigabit speed networking technology
   primarily used for Storage Area Networking (SAN). FC is standardized
   under the American National Standard for Information Systems of the
   National Committee for Information Technology Standards
   (ANSI-NCITS), which has specified a number of documents describing
   its protocols, operations, and services [13].

   The motivations behind connecting remote sites include disk or tape
   backup and live mirroring, or simply distance extension between two
   or more FC Switch clusters (SAN islands) or two or more FC devices.

   The first fundamental assumption made in this specification is that
   the Fibre Channel traffic is carried over the IP network in such a
   manner that the Fibre Channel fabric and all Fibre Channel devices
   on the fabric are unaware of the fact. This means that the FC
   datagrams MUST be delivered in such time as to comply with existing
   Fibre Channel specifications. The FC traffic may span LANs, MANs and
   WANs, so long as this fundamental assumption is adhered to.

   The second fundamental assumption made in this specification is that
   all Fibre Channel frames, regardless of whether the frames contain
   errors or not, are carried over the IP network. Thus, an FC frame
   which contains a bad CRC will be encapsulated and delivered to the
   receiving endpoint.

   The third assumption made in this specification is that the IP
   network is engineered in such a way that any errors introduced in
   the FCIP encapsulation layer occur at a very low rate. Thus the FCIP
   design has been optimized with this assumption in mind.
   While tunneling of Fibre Channel traffic over other IP networks not
   so engineered is not precluded, the above environment is an
   important one, and the FCIP design is optimized for such traffic,
   while not over-burdening other configured IP networks.

   Any error introduced in the FCIP encapsulation layer will result in
   the frame being dropped at the receiving end of the IP network. This
   will prevent the frame from being propagated to the FC network.

   The objectives of this document are to:

   1) specify the encapsulation and mapping of FC frames using the FC
      encapsulation method specified in [common encapsulation
      document].

   2) apply the mechanism described in (1) to an FC-switched backbone
      network using an IP-based network as a backbone, or more
      generally, between any two FC devices.

   3) address any FC concerns arising from tunneling FC traffic over an
      IP-based network, including security, data integrity (loss),
      congestion, and performance. This will be accomplished, where
      appropriate, by utilizing the existing IETF-specified suite of
      protocols.

   4) be compatible with existing FC specifications. While new work may
      be undertaken in T11 [13] to optimize and enhance the bridging of
      FC networks/fabrics, this specification will not require
      adherence to such future works.

4. FCIP Protocol

4.1 FCIP Device

   In this specification, the term FCIP device generally refers to any
   device that encapsulates FC frames into TCP byte streams and
   reassembles TCP byte streams to regenerate FC frames.

   Note: In an actual implementation, the FCIP device may be a
   stand-alone box or integrated with an FC device such as an FC
   backbone switch (BBW) or integrated with any TCP/IP device such as
   an IP switch or an IP router.

   The FCIP device is a transparent translation point. The IP network
   is not aware of the FC payload that it is carrying. Likewise, the FC
   fabric and the FC end nodes are unaware of the IP-based transport.

4.2 Protocol

   The FCIP protocol specifies the TCP/IP encapsulation, mapping and
   routing of FC frames and applies these mechanisms to an FC network
   utilizing IP for its backbone, or more generally, between any two FC
   devices. The FCIP protocol is summarized below:

   1. All FCIP protocol devices are peers and communicate using TCP/IP.
      Each FCIP device behaves like a TCP endpoint from the perspective
      of the IP-based network. That is, these devices do not perform IP
      routing or IP switching but simply forward FC frames.

   2. There is no requirement for an FCIP device to establish a login
      with a peer before communication begins. However, FCIP devices
      MAY authenticate the IP packet before accepting it using the
      IPSec protocols. An FCIP device receiver simply listens to the
      appropriate destination TCP port number (to be assigned) to
      commence communication with other FCIP devices.

   3. Each FCIP device MAY be statically or dynamically configured with
      a list of IP addresses corresponding to all the participating
      FCIP devices. Dynamic discovery of participating FCIP devices MAY
      be performed using Internet protocols such as LDAP, DHCP or other
      discovery protocols. (Discovery work is in progress).

   4. Discovery of FC addresses (accessible via the FCIP device) is
      provided by techniques and protocols within the FC architecture.
      These techniques and protocols are described in Fibre Channel
      ANSI standards ([3], [7], [15]). FCIP devices do not participate
      in the discovery of FC addresses although the FC fabric elements
      of which they are a part MAY participate. The establishment of
      relationships between FC addresses and TCP port numbers is
      outside the scope of this document. FCIP devices MAY discover FC
      domains reachable through other FCIP devices by exchanging FC
      routing information with each other.
      The exact method used to exchange FC routing information between
      FCIP devices is beyond the scope of this document. FCIP devices
      MAY also discover FC domains reachable through the fabric region
      they are attached to by exchanging FSPF routing information with
      the FC switches they are connected to.

   5. The exact path (route) taken by an FC over TCP/IP encapsulated
      packet follows the normal procedures of routing any IP packet.
      From the perspective of the FCIP devices this communication is
      between only two FCIP devices for any given packet.

   6. An FCIP device MAY send FC encapsulated TCP/IP packets to more
      than one FCIP device. However, these encapsulated packets are
      treated as separate instances and are not correlated in any way
      by the FCIP protocol devices. The source FCIP device routes its
      packets based on the 3-byte FC destination Address Identifier
      (D_ID) contained in each encapsulated FC frame.

   7. The IPSec architecture MAY be used to provide secure
      communications for the FCIP protocol across the IP-based network.
      Other security protocols are not precluded.

   8. Any re-ordering of data due to IP MTU fragmentation, TCP MSS
      fragmentation, or IP packet re-ordering will be recovered in
      accordance with normal TCP reliable delivery behavior. The FCIP
      device will be aware only of the TCP-delivered byte stream and
      not cognizant of the TCP recovery action taken to deliver the
      byte stream.

   9. FCIP relies on both the TCP error recovery mechanism and normal
      FC recovery mechanisms to detect and recover from data loss and
      corruption within the IP portion of the overall FC and IP
      network.

   10. Fibre Channel provides support for several classes of service
       with differing loss, priority, and capacity characteristics. An
       FCIP device MAY choose to map these classes to available
       DiffServ services of the IP network.

   11. FCIP uses the common encapsulation method as specified by
       [common encapsulation document].

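   Item 6 above can be illustrated with a minimal sketch. It assumes a
   byte-encoded FC frame that begins with a 4-byte SOF followed by the
   24-byte FC frame header, in which byte 0 is R_CTL and bytes 1-3
   carry the D_ID. Keying the lookup on the most significant D_ID byte
   (the domain), the table peers_by_domain, and the function names are
   illustrative assumptions, not part of this specification.

```python
from typing import Dict, Optional

SOF_LEN = 4         # byte-encoded SOF that precedes the FC frame header
FC_HEADER_LEN = 24  # fixed FC frame header length


def extract_d_id(fc_frame: bytes) -> int:
    """Return the 3-byte D_ID of a byte-encoded FC frame as an integer."""
    if len(fc_frame) < SOF_LEN + FC_HEADER_LEN:
        raise ValueError("frame is shorter than SOF plus FC frame header")
    # Header byte 0 is R_CTL; header bytes 1..3 carry the D_ID.
    start = SOF_LEN + 1
    return int.from_bytes(fc_frame[start:start + 3], "big")


def select_peer(fc_frame: bytes,
                peers_by_domain: Dict[int, str]) -> Optional[str]:
    """Pick the peer FCIP device for a frame, or None if no entry matches."""
    domain_id = extract_d_id(fc_frame) >> 16  # most significant D_ID byte
    return peers_by_domain.get(domain_id)


# Usage: hypothetical table mapping FC domains to peer IP addresses.
peers = {0x0A: "192.0.2.10", 0x0B: "192.0.2.11"}
header = bytes([0x06, 0x0A, 0x01, 0x02]) + bytes(20)  # R_CTL, D_ID, zeros
frame = bytes(SOF_LEN) + header
print(hex(extract_d_id(frame)), select_peer(frame, peers))
```

   How the table is populated (static configuration or exchanged FC
   routing information) is left open here, as it is in the text above.
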
4.3 FCIP's Interaction with FC and TCP

   The FCIP device always delivers entire FC frames to the FC ports
   connected to it. The FC ports MUST remain unaware of the existence
   of the IP network that provides, through the FCIP devices, the
   connection for these FC ports.

   The FCIP device SHALL treat all classes of FC frames the same - as
   datagrams to be inserted into the TCP byte stream.

   Since FC Primitives and Primitive Sequences are not exchanged
   between FCIP devices, there may be times when an FC frame is lost
   within the IP network. When this event occurs it is the
   responsibility of the communicating FC devices to detect and correct
   the errors. The FCIP devices MAY choose not to generate Fibre
   Channel's F_BSY or F_RJT frames or otherwise participate in FC frame
   recovery.

   Each FCIP data frame is built by adding an FCIP header to one FC
   frame delivered to the FCIP endpoint for transport. The FCIP data
   frames are handed in their entirety to TCP; TCP is responsible for
   delivering the same series of FCIP data frames to the receiving side
   in the same order as they are transmitted by the sending FCIP
   device.

   The FCIP device MUST find the FCIP headers and deliver the FC frames
   wrapped inside the FCIP data frames to the correct FC ports
   connected to the FCIP device.

   Note that the order of the FC frames sent by the FCIP device may not
   be the same as the order sent by the source FC device. This is due
   to the fact that FC frames may be re-ordered in the Fibre Channel
   network/fabric before reaching the ingress FCIP device.

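   The framing behavior described above (one header per FC frame, with
   the receiver locating headers in the TCP byte stream) can be
   sketched as follows. The real FCIP header is defined by the common
   encapsulation document and is not reproduced here; this sketch
   substitutes a hypothetical 4-byte big-endian length field as the
   entire header. The names encapsulate and Deframer are likewise
   illustrative. The length checks use the FC frame size limits stated
   in Section 5.1 (36 to 2148 bytes, multiple of four).

```python
import struct
from typing import List

HDR = struct.Struct("!I")              # hypothetical header: FC frame length
MIN_FC_FRAME, MAX_FC_FRAME = 36, 2148  # FC frame size limits (Section 5.1)


def encapsulate(fc_frame: bytes) -> bytes:
    """Prefix one FC frame with the (hypothetical) length-only header."""
    return HDR.pack(len(fc_frame)) + fc_frame


class Deframer:
    """Recover whole FC frames from arbitrarily segmented TCP data."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add received bytes; return every frame that is now complete."""
        self._buf += data
        frames: List[bytes] = []
        while len(self._buf) >= HDR.size:
            (length,) = HDR.unpack_from(self._buf)
            if not MIN_FC_FRAME <= length <= MAX_FC_FRAME or length % 4:
                raise ValueError("implausible length; synchronization lost")
            end = HDR.size + length
            if len(self._buf) < end:
                break  # wait for the rest of this frame
            frames.append(bytes(self._buf[HDR.size:end]))
            del self._buf[:end]
        return frames


# Usage: two frames delivered by TCP in 7-byte pieces.
stream = encapsulate(bytes(36)) + encapsulate(bytes(range(40)))
deframer = Deframer()
received: List[bytes] = []
for i in range(0, len(stream), 7):
    received += deframer.feed(stream[i:i + 7])
print(len(received), [len(f) for f in received])
```

   The point of the sketch is that frame boundaries are independent of
   TCP segment boundaries: the receiver sees only a byte stream and
   must rely on the header to delimit frames.
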
   The relationship between FCIP and other protocols is illustrated in
   the following diagram:

      FC switch port           FCIP Device
        +--------+      +------------------------+
        | FC-SW  |      |         FC-IP          |
        +--------+      +--------+------+--------+
        |  FC-2  |      |  FC-2  |      |  TCP   |
        +--------+      +--------+      +--------+
        |  FC-1  |      |  FC-1  |      |   IP   |
        +--------+      +--------+      +--------+
        |  FC-0  |      |  FC-0  |      |  LINK  |
        +--------+      +--------+      +--------+
            |               |           |  PHY   |
            |               |           +--------+
            |               |               |
            |               |               |
            -----------------               -------> to the other
                                                     FC-IP Devices

                  Fig. 2 Protocol Stack Diagram

5. FCIP Encapsulation

5.1 FC Frame Format (INFORMATIVE)

   All FC frames have a standard format, much like that of the LAN
   802.x protocols. However, the exact size of each frame varies
   depending on the size of the variable fields. The size of the
   variable field ranges from 0 to 2112 bytes as shown in the FC Frame
   Format in Fig. 3, resulting in a minimum FC frame size of 36 bytes
   and a maximum FC frame size of 2148 bytes. Valid Fibre Channel frame
   lengths are always a multiple of four bytes.

   +------+--------+-----------+----//-------+------+------+
   | SOF  |Frame   |Optional   |  Frame      | CRC  | EOF  |
   | (4B) |Header  |Header     |  Payload    | (4B) | (4B) |
   |      |(24B)   |<----------------------->|      |      |
   |      |        | Data Field = (0-2112B)  |      |      |
   +------+--------+-----------+----//-------+------+------+

                     Fig. 3 FC Frame Format

   SOF and EOF Delimiters:

   On an FC link, Start-of-Frame (SOF) and End-Of-Frame (EOF) are
   called Ordered Sets and are sent as special words constructed from
   the 8B/10B comma character (K28.5) followed by three additional
   8B/10B data characters, making them uniquely identifiable in the
   data stream.

   On an FC link the SOF delimiter serves to identify the beginning of
   a frame and prepares the receiver for frame reception. The SOF
   contains information about the frame's Class of Service, position
   within a sequence, and in some cases, connection status.

   The EOF delimiter identifies the end of the frame and the final
   frame of a sequence. In addition, it serves to force the running
   disparity to negative. The EOF is used to end the connection in
   connection-oriented classes of service.

   It is therefore important to preserve the information conveyed by
   the delimiters across the IP-based network, so that the receiving
   FCIP device can correctly reconstruct the FC frame in its original
   SOF and EOF format before forwarding it to its ultimate FC
   destination on the FC link.

   When an FC frame is encapsulated and sent over a byte-oriented
   interface, the SOF and EOF delimiters are represented as sequences
   of four consecutive bytes, which carry the same Class of Service and
   frame termination information as the FC ordered sets. This form of
   encoding cannot, however, provide unambiguous identification of
   frame beginning and end, and must rely on other mechanisms provided
   by the encapsulation protocol.

   Frame Header:

   The FC Frame Header is transparent to the FCIP device. The FC Frame
   Header is 24 bytes long and has several fields that are associated
   with the identification and control of the payload. Current FC
   Standards allow up to 3 Optional Header fields [4], [5]:

   - Network_Header (16 bytes)
   - Association_Header (32 bytes)
   - Device_Header (up to 64 bytes).

   Frame Payload:

   The FC Frame Payload is transparent to the FCIP device. An FC
   application level payload is called an Information Unit at the FC-4
   Level. This is mapped into the Frame Payload of the FC Frame. A
   large Information Unit is segmented using a structure consisting of
   FC Sequences. Typically, a Sequence consists of more than one FC
   frame. FCIP does not maintain any state information regarding the
   relationship of frames within an FC Sequence.

   CRC:

   The FC CRC is 4 bytes long and uses the same 32-bit polynomial used
   in FDDI and is specified in ANSI X3.139 Fiber Distributed Data
   Interface.
   This CRC value is calculated over the entire FC header and the FC
   payload; it does not include the SOF and EOF delimiters.

   Note: When FC frames are encapsulated into FCIP frames, the FC frame
   CRC is untouched by the FCIP device.

5.3 TCP Connection Management

   In order to realize a Virtual ISL between two FC end-points, an FCIP
   Device establishes TCP connections with its peer FCIP Device.

   In order to achieve better TCP aggregate throughput properties in
   the face of packet losses, a pair of peer FCIP devices MAY use
   multiple TCP connections between them, and use appropriate policies
   for mapping FC frames to these connections. It may also be useful to
   assign a pool of connections for transmission of priority and
   control messages (e.g., Class F messages) so that they do not
   encounter "head of line" blocking behind Class 2 or Class 3 traffic.
   The use of multiple connections and the policies for distributing
   frames on these connections are described in Section 5.5.

   FCIP Devices SHALL listen for new TCP connection requests on the
   well-known port (to be assigned). Any FCIP device establishing a TCP
   connection SHALL direct it to this well-known port number. Also, an
   FCIP Device MAY use an existing connection, previously established
   by its peer.

   An FCIP device MAY also accept and establish TCP connections to a
   different TCP port number, as configured by the network
   administrator.

   A Virtual ISL and the two FCIP Device endpoints that are involved
   are operational only after the first TCP connection is established.

   The sequence of operations performed in order to establish a Virtual
   ISL is as follows.

   1. The FCIP device initializes its local resources to enable it to
      listen for TCP connection requests.

   2. The FCIP device discovers the FCIP device endpoints with which it
      can establish a Virtual ISL. The result of the discovery SHALL
      be, at the minimum, the IP address and the TCP port of the peer
      endpoint.
      The discovery process may rely on administrative configuration or
      on services such as SLP or iSNS (TBD). (Needs to have its own
      section eventually).

   3. The FCIP device endpoint initiates a TCP connection to the peer
      endpoint. It also sets up operational parameters for both the TCP
      and IP layers for optimal performance, as described in section
      5.3.1.

   4. The FCIP device endpoint SHALL exchange security context and
      authenticate itself to the peer endpoint. The use of security
      context is explained in section TBD. After connection
      establishment, FCIP devices use the FCIP frame encapsulation as
      defined in [common encapsulation document].

   5. At this point the FCIP device endpoint SHALL exchange Fibre
      Channel port initialization frames (Switch ISL) to enable and
      identify port operation. Port state machine and initialization
      are described in Fibre Channel Methodologies for Interconnect
      (FC-SW 2) standards.

   6. An FCIP device operates in E_Port or B_Port mode. When operating
      in E_Port mode, normal FC-SW2 FSPF messages are exchanged and the
      switch port becomes operational.

   7. For computing the link cost of the ISL, the following formula
      SHALL be used: [TBD].

   In certain deployments, a single FCIP device endpoint MAY establish
   Virtual ISLs with multiple FCIP device endpoints. In this situation,
   the FCIP device endpoint SHALL manage TCP operational parameters
   independently for each ISL. Also, the FCIP Device Endpoint SHALL
   perform the E_Port or B_Port initialization independently for each
   connection.

   An FCIP Device Endpoint uses normal TCP-based flow control
   mechanisms to manage its internal resources and matches them to the
   advertised TCP Receiver Window Size. Thus, an FCIP Device endpoint
   is NOT REQUIRED to advertise or manage Fibre Channel BB_Credits or
   process any R_RDY frames.

   An FCIP Device Endpoint SHALL implement established TCP mechanisms
   as defined in RFC 2581 [20] for congestion control on its
   connections.

5.3.1 TCP Connection Parameters

   In order to provide efficient management of FCIP Device resources as
   well as link resources, certain TCP connection parameters are
   recommended.

   It is recommended that FCIP devices use the TCP mechanisms for Long
   Fat Networks (LFNs) (i.e., IP networks with a large
   (bandwidth*delay) product), as defined in RFC 1072 [22].

5.3.1.1 TCP Selective Acknowledgement Option

   The Selective Acknowledgement option RFC 2883 [21] allows the
   receiving end to acknowledge multiple lost packets in a single ACK,
   enabling faster recovery. An FCIP device MAY negotiate use of TCP
   SACK and use it for faster recovery from lost packets and holes in
   TCP sequence number space.

5.3.1.2 TCP Window Scale option

   This option allows TCP window sizes larger than 16-bit limits to be
   advertised by the receiver. It is necessary to allow data in long
   fat networks to fill the available pipe. This also implies buffering
   on the TCP sender that matches the (bandwidth*RTT) product of the
   TCP connection. A TCP endpoint SHALL use locally available
   mechanisms to set a window size that matches the available local
   buffer resources and the desired throughput.

5.3.1.3 IP DSCP Option

   The recommended IP DSCP field setting is 101110, corresponding to
   the EF service. (Need better wording to fit current Diffserv
   specifications.)

5.3.1.4 Protection against sequence number wrap

   It is recommended that TCP endpoints implement protection against
   sequence number wrap. It is quite possible that within a single
   connection, TCP sequence numbers wrap within a timeout window.

5.3.1.5 TCP No Delay Option

   TCP endpoints SHALL disable the Nagle algorithm, that is, set the
   TCP No Delay option. The Nagle algorithm is designed for usage in a
   telnet environment.

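   As an illustration of Section 5.3.1, the following sketch applies
   these parameters to a socket using the standard sockets API. It is
   one possible reading, not a required implementation. The function
   names and the example link figures (1 Gbit/s, 50 ms round trip) are
   assumptions. Window scaling and SACK are negotiated by the host TCP
   stack and are not set explicitly here; the sketch only sizes the
   socket buffers to the (bandwidth*RTT) product, which typically has
   to be done before the connection is opened. The operating system
   may silently clamp the requested buffer sizes.

```python
import socket

EF_DSCP = 0b101110     # DSCP value recommended in Section 5.3.1.3
EF_TOS = EF_DSCP << 2  # DSCP occupies the upper six bits of the TOS byte


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth*delay product in bytes (bits/s times ms, over 8000)."""
    return bandwidth_bps * rtt_ms // 8000


def configure_fcip_socket(sock: socket.socket,
                          bandwidth_bps: int, rtt_ms: int) -> None:
    """Apply the Section 5.3.1 recommendations to an unconnected socket."""
    window = bdp_bytes(bandwidth_bps, rtt_ms)
    # Request buffers large enough to keep the pipe full.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window)
    # TCP_NODELAY = 1 turns the Nagle algorithm off.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Mark outgoing packets with the EF code point where supported.
    if hasattr(socket, "IP_TOS"):
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_TOS)


# Usage: assumed 1 Gbit/s path with a 50 ms round-trip time.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    configure_fcip_socket(sock, 1_000_000_000, 50)
    print(bdp_bytes(1_000_000_000, 50), hex(EF_TOS))
```
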
5.4 TCP Connection Error Recovery

5.4.1 Determining loss of connectivity

   FCIP Device endpoints SHALL implement detection of and recovery from
   lost TCP connections. In idle mode, the TCP "keep alive" option is
   normally used to keep a connection alive. However, this timeout is
   fairly large and may prevent early detection of loss of
   connectivity. In order to facilitate faster detection of loss of
   connectivity, FCIP devices exchange FCIP-specific Extended Link
   Service command messages. These FCIP ELS messages use the same
   encapsulation mechanism as described in TBD.

   Upon detecting a loss of connectivity, an FCIP Device SHALL
   establish a new connection, or SHALL use an existing TCP connection
   to the same FCIP Device endpoint. An FCIP Device SHALL NOT
   retransmit an FCIP frame on the new connection. This is to ensure
   exactly-once delivery semantics to the Fibre Channel endpoint.

5.4.2 TCP Synchronization Errors

   If the FCIP Framing and Encapsulation layer determines that it has
   lost synchronization or has received an FCIP header CRC error, it
   SHALL drop the particular frame and attempt to synchronize at the
   earliest possible subsequent frame.

   For frames with FCIP header errors, the FCIP Device SHALL drop the
   frame and update appropriate error counters.

   FCIP device endpoints assume that if the TCP layer determines that
   there are TCP checksum errors, the TCP layer invokes appropriate TCP
   retransmission and error recovery procedures. The FCIP layer
   therefore gets an ordered delivery of FCIP frames, and these errors
   are transparent to the FCIP layer.

   If the FCIP Device layer is delivered frames which are delayed by
   more than R_A_TOV in the IP network, the FCIP Device layer SHALL
   drop the frame. This SHALL continue until an FCIP encapsulated frame
   arrives whose life in the IP network is smaller than R_A_TOV.

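   The R_A_TOV rule above can be sketched as a simple filter. This
   draft does not say how a receiver learns how long a frame spent in
   the IP network; the sketch assumes, purely for illustration, that
   each frame arrives paired with a send time taken from a clock
   synchronized with the receiver. The function name drop_stale and
   the 10-second R_A_TOV used in the usage lines are also assumptions.

```python
from typing import Iterable, List, Tuple


def drop_stale(frames: Iterable[Tuple[float, bytes]],
               now: float, r_a_tov: float) -> Tuple[List[bytes], int]:
    """Split (sent_at, frame) pairs into deliverable frames and a drop count.

    A frame is dropped when its time in the IP network exceeds r_a_tov;
    delivery resumes with the first frame that is young enough.
    """
    deliverable: List[bytes] = []
    dropped = 0
    for sent_at, frame in frames:
        if now - sent_at > r_a_tov:
            dropped += 1  # too old: must not be emitted into the fabric
        else:
            deliverable.append(frame)
    return deliverable, dropped


# Usage: three frames sent at t=0.0, 5.0 and 11.5 s, examined at t=12.0 s.
queue = [(0.0, b"A"), (5.0, b"B"), (11.5, b"C")]
print(drop_stale(queue, now=12.0, r_a_tov=10.0))
```

   A real device would also update its error counters for each dropped
   frame, as required for other discarded frames in this section.
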
   Note that unlike a physical Fibre Channel link, the path between
   FCIP Device endpoints may involve IP routing dynamics that still
   result in reliable, ordered delivery at the TCP layer, but with the
   result that some FC Fabric operating constraints may be violated.
   The FCIP device is responsible for detecting violations of these FC
   Fabric constraints and discarding affected frames.

5.5 Multiple Connection Management

   A pair of FCIP device endpoints MAY establish a certain number of
   TCP connections between them. Since a Virtual ISL potentially maps a
   fairly large number of FC flows (where a flow is a pair of Fibre
   Channel S_ID, D_ID addresses), it may not be practical to establish
   a separate TCP connection for each Fibre Channel flow.

   In order to address this, an implementation MAY choose to manage a
   pool of TCP connections for a single Virtual ISL and map Fibre
   Channel flows to TCP connections of that ISL. However, while
   assigning Fibre Channel flows to TCP connections, an implementation
   SHALL follow these rules:

   1. Once a Fibre Channel flow is assigned to a TCP connection within
      the Virtual ISL, the implementation SHALL send all Fibre Channel
      frames of that flow on that connection.

   2. When an FCIP endpoint processes any response traffic from a
      particular target, the Endpoint SHALL send the response on the
      same connection on which the request was sent.

   3. Any Class 2 ACK frames SHALL be sent on the same connection on
      which the original frame was sent.

   These rules are in place to honor any in-order delivery guarantees
   that may have been made between the two end points of the Fibre
   Channel flow.

5.6 Multi Virtual ISL Management

   It is quite likely that a single switch may provide multiple Virtual
   ISLs, all providing alternate connectivity paths between two
   switches. In this situation, a switch SHALL select any of the
   available ISLs for mapping an FCIP flow.
   In doing so, a switch MUST follow a flow allegiance model, where a
   pair of Fibre Channel [S_ID, D_ID] end points is always mapped to
   the same Virtual ISL. Furthermore, switches SHALL implement a
   connection allegiance policy, which ensures that the responses to a
   particular [S_ID, D_ID] pair are always sent back on the same
   Virtual ISL.

6. FCIP Network and Device model

6.1 General FCIP Model

   Fibre Channel defines interconnections between FC Fabric Elements
   [3]. Pairs of FCIP devices (described in this draft) connected by a
   TCP/IP network may replace the Fibre Channel connection between two
   FC Fabric Elements in ways specified by the Fibre Channel documents
   that reference this draft [13].

   The concept of the FC/FCIP fabric and interconnections is shown in
   figure 5 below.

                    All Fibre Channel Model
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     Fibre Channel Fabric
   /                                              /
          | | |                      | | |
   /                                              /
        +---------+  Fibre Channel   +---------+
      --| Fabric  |     defined      | Fabric  |--
   /                                              /
      --| Element |------------------| Element |--
      --|    A    |    connection    |    B    |--
   /                                              /
        +---------+                  +---------+
          | | |                        | | |
   /                                              /
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                   Fibre Channel + FCIP Model
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     Fibre Channel Fabric
   /                                              /
          | | |                      | | |
   /                                              /
       +-----------+              +-----------+
     --| Fabric    |     FCIP     |    Fabric |--
   /                                              /
     --| Element FD|--------------|FD Element |--
     --|    A      |  connection  |      B    |--
   /                                              /
       +-----------+              +-----------+
          | | |                      | | |
   /                                              /
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   note: FD = FCIP Device

         Fig. 5 General FCIP Model with Respect to FC Fabric

   It is expected that the FCIP device will be integrated with the FC
   Fabric Element for which it supplies one end of an FCIP connection.
   There is no interface specified between the FCIP device and the FC
   Fabric Element.

   The following requirements are placed on the replacement of a Fibre
   Channel connection with an FCIP connection:

   1. The connection of both Fibre Channel components MUST have an
      analogous connection defined in the ANSI-NCITS Fibre Channel
      Standards [13].

   2. The same type of Fibre Channel components MUST be located at both
      ends of the connection served by FCIP devices.

   3. The requirements in this standard apply equally to any pair of
      FCIP devices and the FC Fabric Elements of which they are a part
      wishing to tunnel FC frames across an IP network.

   Any FC routing protocol exchanges may still occur transparently to
   the pair of FCIP devices providing the connection.

   It should be noted that Fibre Channel Primitive Sequences and
   Primitives are not exchanged between pairs of FCIP devices.

6.2 Model Details for Border Switches

   +-----+                                    +-----+
   | SW1 |                                    | SW3 |
   +-----+                                    +-----+
      |       +-----+              +-----+       |
      |       |     |              |     |       |
   +-----+E  B| BBW |    //////    | BBW |E  E+-----+
   | SW2 |----|  1  |---        ---|  2  |----| SW4 |
   +-----+    |     |   /      /   |     |    +-----+
              +-----+              +-----+
                       /      /
                      /  IP  /
              +-----+   Network    +-----+
              |     |  /      /    |     |
   +-----+E  E| BBW |              | BBW |B  E+-----+
   | SW6 |----|  3  |---/      /---|  4  |----| SW7 |
   +-----+    |     |    //////    |     |    +-----+
      |       +-----+              +-----+       |
      |                                          |
   +-----+                                    +-----+
   | SW5 |                                    | SW8 |
   +-----+                                    +-----+

                Fig. 6 FCIP Border Switch Model

   The Fibre Channel Border Switch model for an FCIP device is an
   extension of the Backbone WAN (BBW) device currently defined for ATM
   (BBW_ATM) and SONET (BBW_SONET) in FC-BB. The planned extensions to
   the current BBW model include:

   1) Defining the Fibre Channel operational details of a BBW_TCP/IP
      device

   2) Providing for three Fibre Channel interconnection configurations
      transported by BBW_TCP/IP devices:

      a) [B_Port]..BBW_TCP/IP<---->BBW_TCP/IP..[B_Port]
      b) [E_Port]..BBW_TCP/IP<---->BBW_TCP/IP..[E_Port]
      c) [B_Port]..BBW_TCP/IP<---->BBW_TCP/IP..[E_Port]
      d) [E_Port]..BBW_TCP/IP<---->BBW_TCP/IP..[B_Port]

      Note: For an FCIP device, d) is the mirror case of c) and is
      architecturally identical to c), but operational conditions
      within the FC fabric may be different depending on whether case
      c) or its mirror d) is in effect.

   3) Defining Virtual inter-switch links (ISLs) between the E_Ports on
      BBW_TCP/IP devices.

   Because substantial Fibre Channel fluency is required for the
   BBW_TCP/IP definition, it is anticipated that FCIP will contain only
   a high level overview of the model, with the details appearing in
   FC-BB-2 (an existing T11 project created specifically to extend the
   Backbone concept to TCP/IP).

7. Security Considerations

   Using a wide-area, general purpose network such as an IP internet in
   a position normally occupied by physical cabling introduces some
   security problems not normally encountered in Fibre Channel storage
   networks. Normal FC media are typically protected physically from
   outside access; IP internets typically invite outside access.

   The general effect is that the security of the entire Fibre Channel
   internetwork is only as good as the security of the entire IP
   internet through which it tunnels. The following broad classes of
   attacks are possible:

   1. Unauthorized Fibre Channel controllers can gain access to
      resources through normal Fibre Channel processes.

   2. Unauthorized agents can monitor and manipulate Fibre Channel
      traffic flowing over physical media used by the IP internet and
      under control of the agent.

   To a large extent, these security risks are typical of the risks
   facing any other application using an IP internet.
   They are mentioned here only because Fibre Channel storage networks
   are not normally suspicious of the media. Fibre Channel storage
   network administrators will need to be aware of these additional
   security risks.

   Security protocols and procedures used in other IP applications may
   be used for FCIP. For Virtual Private Networks, both authentication
   and encryption are generally desired, because it is important both
   to (1) assure that unauthorized users do not penetrate the virtual
   private network and (2) assure that eavesdroppers on the network
   cannot read messages sent over the network.

   Note: Use of the IPSec protocol suite is optional. Security work is
   in progress.

8. Data Integrity Considerations

   The material in this section is subject to change pending work in
   progress in the data integrity area.

8.1 Loss of FCIP synchronization

   The use of the FCIP length with either or both of the EOF byte-code
   immediately preceding the FCIP header and the SOF byte-code
   immediately following the FCIP header provides enough verification
   that the FCIP devices communicating over a particular TCP connection
   are synchronized with each other.

   If a communicating pair of FCIP devices loses synchronization (the
   receiving FCIP device cannot find the next FCIP header) due to data
   loss, network congestion, or other error conditions in the TCP byte
   stream, the receiving FCIP device SHALL reset the TCP connection
   (set the RST bit).

8.2 Loss

   Recovery from data loss due to IP datagram loss is provided via the
   TCP reliable delivery mechanism.

   Note: Due to varying TCP timeouts, competing FC and TCP recovery
   schemes are a possibility. This issue is addressed in section 8.4.

8.3 Corruption

   Data corruption is detected at two different levels: TCP checksum
   and Fibre Channel CRC. Data corruption detected at the TCP level
   SHALL be recovered via TCP reliable data recovery mechanisms.
Data corruption detected at the Fibre Channel level SHALL be handled within the Fibre Channel end nodes. Also, each recovery technique is performed independently of the other.

FCIP devices that perform a CRC integrity check on encapsulated FC frames SHALL modify the EOF sequence from EOF-normal to EOF-normal-invalid for each frame found to have a CRC error.

8.4 Timeouts

Fibre Channel has two important timeouts to consider in FCIP: ED_TOV and R_A_TOV.

ED_TOV determines the life of an individual Fibre Channel frame in any particular fabric element. The effects of ED_TOV on the fabric as a whole are typically cumulative, since each fabric element contains its own ED_TOV timers for any frame received.

R_A_TOV determines the life of an individual Fibre Channel frame in the fabric as a whole. For a fabric, R_A_TOV implies that no particular frame will remain in (and thus be emitted from) the fabric after the timer expires.

TCP has an acknowledgement timeout. This timeout is variable.

8.5 Recovery Mechanisms

When an FCIP data frame is transported over an IP network, there is a possibility of the frame being dropped. This can happen because of congestion along the path within the IP network, because no empty buffers are available on one of the incoming ports, because of bit errors, and so on. When this happens, the TCP acknowledgement will not be received by the source, and normal TCP retry mechanisms will be activated.

An issue may arise during these recovery mechanisms, since the TCP timeout is variable and may exceed the Fibre Channel ED_TOV/R_A_TOV timeouts.

9. Performance Considerations

The FCIP protocol does not crack the FC frame (except for attaching the correct byte-encoded SOF and EOF), nor does it do any FC payload processing. This allows any FC traffic to be tunneled across at high throughput rates.
If fragmentation at the data link and IP layers is avoided by the use of path MTU discovery, throughput performance is enhanced. The Flow Control Protocol (discussed in the next section) provides the ability to stream gigabit FC data when using a large window size.

9.1 QoS Support

The Differentiated Services Architecture (diffserv) provides a "Class of Service" to a flow aggregate [6], [17]. At so-called diffserv boundaries, IP packets are classified and marked. Within the diffserv domain, resources (bandwidth and buffers) are allocated for each classification. Packets with the same classification use the resources allocated for the classification. IP packets with the same destination and class marking exit a diffserv-capable router in the same order they arrived. Packets with the same destination but different class markings exit according to priorities assigned to the different class markings.

Diffserv has redefined the IPv4 TOS field as the Differentiated Services (DS) field, whose six most significant bits carry the Differentiated Services Code Point (DSCP) [6]. The DSCP indicates the particular behavior a packet is to receive at each router. How a packet gets marked is based on a policy administered and configured into the network. [18] and [19] provide various encodings of the DSCP field to achieve a specific behavior from the routers. There may be several ways to administer the policies, and the policy definition is up to the network provider. That is, one network provider may choose to mark all packets going from one source IP address to a specific destination as "high priority", while another might mark just a specific traffic type (e.g., HTTP) as "high priority".

Thus packets carry the desired class information, and each diffserv-capable router treats the packet according to the information in its DSCP field. This is referred to as Per Hop Behavior (PHB).
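As a rough illustration of how a sending endpoint might mark its traffic, the sketch below places a 6-bit DSCP in the upper six bits of the DS (former TOS) octet, as laid out in [6], and applies it to a TCP socket. It is a sketch under stated assumptions, not part of this specification: the function names are hypothetical, the IP_TOS socket option is assumed to exist on the host platform (it does not on all of them), and the example value is the EF code point 101110 defined in [18].

```python
import socket

EF_DSCP = 0b101110  # EF code point from [18]; used here only as an example


def dscp_to_ds_octet(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the DS (former TOS) octet."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    return dscp << 2  # the two low-order bits are left as zero


def mark_socket(sock: socket.socket, dscp: int) -> bool:
    """Try to mark packets sent on sock; return False if the host refuses."""
    ip_tos = getattr(socket, "IP_TOS", None)  # assumption: platform dependent
    if ip_tos is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_IP, ip_tos, dscp_to_ds_octet(dscp))
    except OSError:
        return False
    return True
```

Calling mark_socket(sock, EF_DSCP) on each tunnel connection would request the marking; whether routers along the path honor it depends on the policy configured by the network provider.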
Currently, the IETF standards define essentially three types of service, each corresponding to its own DSCP: Expedited Forwarding (EF) [18], Assured Forwarding (AF) [19], and Default Forwarding (DF) [6], [17].

[19] specifies the AF service. The AF PHB provides a way to prioritize best-effort traffic. Currently, 4 AF classes and 3 drop precedence levels are specified, providing 12 different levels of forwarding assurance. The DSCP value specifies a drop order in the event that a packet experiences congestion at a subsequent diffserv router.

[18] specifies the EF service, with DSCP code point 101110, which is also sometimes referred to as "Premium" service. When supported, this class behavior has the lowest levels of latency, packet loss, and delay variation. This service behavior most closely matches the Fibre Channel characteristics. This is therefore the recommended DSCP setting in the IP DSCP field.

Resources not used for EF and AF are left for the DF service, which is really a best-effort service.

Note that if a packet is being forwarded over an underlying network without diffserv support, then the packet would simply receive best-effort service regardless of its DSCP field setting.

9.2 TCP considerations

In order to achieve better TCP aggregate throughput in the face of packet losses, a pair of peer FCIP devices may use multiple TCP connections between them, and use appropriate policies for mapping FC frames to these connections. The reason for this is TCP's congestion control (slow start and congestion avoidance [20]), which reduces TCP's congestion window whenever it detects congestion in the network. With a single connection, each such reduction throttles all of the tunneled FC traffic. If, on the other hand, the traffic is distributed across multiple connections, not all of the connections will be affected at the same time, resulting in a better aggregate throughput. The use of multiple connections and policies for distributing frames on these connections are described in section TBD.
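Purely as an illustration of what a mapping policy could look like, the sketch below assigns each frame to one of N connections by hashing three FC frame header fields (S_ID, D_ID, and OX_ID), so that all frames of one exchange travel on the same connection and keep their relative order. This is a hypothetical policy, not one defined by this document: the names FrameKey and pick_connection are invented for the example, and it assumes an implementation that is willing to read those header fields.

```python
import zlib
from typing import NamedTuple


class FrameKey(NamedTuple):
    """Header fields used as the hash key (hypothetical choice)."""
    s_id: int   # 24-bit source address identifier
    d_id: int   # 24-bit destination address identifier
    ox_id: int  # 16-bit originator exchange identifier


def pick_connection(key: FrameKey, n_connections: int) -> int:
    """Return the index of the TCP connection that should carry this frame."""
    if n_connections < 1:
        raise ValueError("at least one TCP connection is required")
    raw = (key.s_id.to_bytes(3, "big")
           + key.d_id.to_bytes(3, "big")
           + key.ox_id.to_bytes(2, "big"))
    # The same key always hashes to the same index, so one exchange
    # never has its frames spread over several connections.
    return zlib.crc32(raw) % n_connections
```

A policy of this kind trades load balance for ordering: a single busy exchange stays on one connection, while different exchanges are spread across the set.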
Note that even though multiple connections provide better aggregate throughput (when packet losses occur on IP networks), their use is not a requirement. A pair of FCIP devices may use a single TCP connection to tunnel the FC traffic.

It is recommended that FCIP devices use the TCP mechanisms for Long Fat Networks (LFNs) when they are used in IP networks with a large (bandwidth*delay) product. These mechanisms include the TCP window scale option and Selective Acknowledgement, among others.

10. Flow Control and Congestion Management

The FCIP protocol specifies encapsulating FC frames over IP networks, using TCP connections. The FCIP device is connected to both the FC fabric and the IP network, and it needs to follow the flow control mechanisms on both networks, which work independently of each other. This section provides guidelines as to how the FCIP device can map from one flow control mechanism to the other, while encapsulating FC traffic over TCP connections and vice versa.

There are two scenarios in which flow control management at the FCIP device becomes crucial:

1. When there is a mismatch between the line speeds of the FC and IP networks. Even though it is recommended that both FC and IP networks be of comparable speeds, it is possible that FC traffic is carried over an IP network with a different line speed and bit error rate.

2. When one of the networks (FC or IP) is congested. Even when both FC and IP networks are of comparable speeds, during the course of operation, one of the networks could be congested due to transient conditions.

The FCIP device needs to use the available flow control mechanisms in the TCP and FC protocols to handle these situations. The FCIP protocol does not specify any particular mechanism to handle the flow control, but leaves this to the implementation.
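Since the mechanism is left open, the following is only one possible way to picture it: a bounded buffer that sits between the two networks and tells the producing side when it must be held off. The class name RelayBuffer and its methods are hypothetical, and how "holding off" is signaled on each network (for example, by withholding credit on the FC side or by no longer reading from the TCP connection) is an assumption left to the implementation.

```python
from collections import deque
from typing import Deque, Optional


class RelayBuffer:
    """Bounded FIFO between a producing network and a consuming network."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1 frame")
        self.capacity = capacity
        self._frames: Deque[bytes] = deque()

    def can_accept(self) -> bool:
        """True while the producing side may be allowed to send more."""
        return len(self._frames) < self.capacity

    def push(self, frame: bytes) -> None:
        """Store a frame from the producing side; refuse when full."""
        if not self.can_accept():
            raise BufferError("producer must be held off until space frees up")
        self._frames.append(frame)

    def pop(self) -> Optional[bytes]:
        """Hand the oldest frame to the consuming side, or None if empty."""
        return self._frames.popleft() if self._frames else None
```

The design choice here is that congestion on the consuming network shows up only as a full buffer, so each side keeps using its own native flow control and neither needs to understand the other's.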
10.1 Flow control on FC network

When Fibre Channel traffic is encapsulated over TCP connection(s), the FCIP device needs to ensure that the TCP connections can handle the frame arrival rate from the Fibre Channel network. This MAY require the FCIP device to use Buffer-to-buffer flow control on its Fibre Channel port(s) to control the frame arrival rate. Alternatively, the FCIP device MAY choose to send an F_BSY frame to the originator of the FC frame, for FC Class-1 (Connect Request) and Class-2 frames.

10.2 Flow control on IP network

When the FCIP device needs to forward frames from TCP connection(s) to Fibre Channel ports, it needs to follow the Buffer-to-buffer credit mechanism on its FC port(s). If there is no available credit on the FC port(s), the FCIP device MAY need to control the packet arrival rate from the IP network by using TCP windowing techniques. This MAY involve occasionally advertising a zero window on TCP connections, so that the TCP connection is flow controlled while the FC network is congested.

11. References

[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[3] NCITS 321-200x (ANSI) T11/Project 1305-D/Rev 4.9, "Fibre Channel Switch-Fabric-2" (FC-SW-2), November 14, 2000 (www.t11.org).

[4] Fibre Channel Physical and Signaling Interface-3 (FC-PH-3), Rev. 9.4, ANSI X3.303-1998.

[5] Kembel, R. W., "The Fibre Channel Consultant: A Comprehensive Introduction", Northwest Learning Associates, 1998.

[6] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[7] NCITS T11/Project 1238-D/Rev 4.7, "Fibre Channel Backbone" (FC-BB), June 8, 2000 (www.t11.org).

[8] Kent, S.
and Atkinson, R., "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[9] Kent, S. and Atkinson, R., "IP Authentication Header", RFC 2402, November 1998.

[10] Kent, S. and Atkinson, R., "IP Encapsulating Security Payload (ESP)", RFC 2406, November 1998.

[11] Maughan, D. et al., "Internet Security Association and Key Management Protocol (ISAKMP)", RFC 2408, November 1998.

[12] http://www.isi.edu/in-notes/iana/assignments/protocol-numbers

[13] http://www.t11.org

[14] Fibre Channel Physical and Signaling Interface (FC-PH), Rev 4.3, ANSI X3.230-1994.

[15] Fibre Channel NCITS 321-200x (ANSI) T11/Project 1356-D/Rev 4.3, "Fibre Channel - Generic Services 3", June 2000 (www.t11.org).

[16] ISI, "Transmission Control Protocol", RFC 793, September 1981.

[17] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., Weiss, W., "An Architecture for Differentiated Services", RFC 2475, December 1998.

[18] Jacobson, V., Nichols, K., Poduri, K., "An Expedited Forwarding PHB", RFC 2598, June 1999.

[19] Heinanen, J., Baker, F., Weiss, W., Wroclawski, J., "Assured Forwarding PHB Group", RFC 2597, June 1999.

[20] Allman, et al., "TCP Congestion Control", RFC 2581, April 1999.

[21] Floyd, et al., "An Extension to the Selective Acknowledgement (SACK) Option for TCP", RFC 2883, July 2000.

[22] Jacobson & Braden, "TCP Extensions for Long-Delay Paths", RFC 1072, October 1988.

12. Acknowledgments

13. Authors' Addresses

Murali Rajagopal
LightSand Communications, Inc.
24411 Ridge Route Dr. Suite 135
Laguna Hills, CA 92653
Phone: 949-837-1733 x101
Email: muralir@lightsand.com

Raj Bhagwat
LightSand Communications, Inc.
24411 Ridge Route Dr. Suite 135
Laguna Hills, CA 92653
Phone: 949-837-1733 x104
Email: rajb@lightsand.com

Elizabeth G. Rodriguez
Lucent Technologies
1202 Richardson Drive, Suite 210
Richardson, TX 75080
Phone: +1 972 231 0672
Fax: +1 972 671 5476
Email: egrodriguez@lucent.com

Vi Chau
Gadzoox Networks, Inc.
16241 Laguna Canyon Road, Suite 100
Irvine, CA 92618
Phone: +1 949 789 4639
Fax: +1 949 453 1271
Email: vchau@gadzoox.com

Gaby Hecht
Gadzoox Networks, Inc.
16241 Laguna Canyon Road, Suite 100
Irvine, CA 92618
Phone: +1 949 789 4642
Email: ghecht@Gadzoox.com

Ken Hirata
Vixel Corporation
15245 Alton Parkway, Suite 100
Irvine, CA 92618
Phone: +1 949 788 6368
Email: ken.hirata@vixel.com

Jim Nelson
Vixel Corporation
15245 Alton Parkway, Suite 100
Irvine, CA 92618
Phone: +1 949 450 6159
Fax: +1 949 753 9500
Email: Jim.Nelson@vixel.com

Steve Wilson
Brocade Communications Systems, Inc.
1745 Technology Drive
San Jose, CA 95110
Phone: 408-487-8128
Fax: 408-487-8101
Email: swilson@brocade.com

Bob Snively
Brocade Communications Systems, Inc.
1745 Technology Drive
San Jose, CA 95110
Phone: 408 487 8135
Email: rsnively@brocade.com

Ralph Weber
ENDL Texas, representing Brocade
Suite 102#178
18484 Preston Road
Dallas, TX 75252
Phone: +1 214 912 1373
Email: roweber@acm.org

Michael E. O'Donnell
McDATA Corporation
310 Interlocken Parkway
Broomfield, CO 80021
Phone: +1 303 460 4142
Fax: +1 303 465 4996
Email: modonnell@mcdata.com

Anil Rijhsinghani
McDATA Corporation
5 Brickyard Lane
Westboro, MA 01581
Phone: +1 508 870 6593
Email: anil.rijhsinghani@mcdata.com

Craig W. Carlson
QLogic Corporation
6321 Bury Drive
Eden Prairie, MN 55346
Phone: +1 952 932 4064
Email: craig.carlson@qlogic.com

Sriram Rupanagunta
Aarohi Communications
3200 Montelena Drive
San Jose, CA 95135
Phone: 408-966-8309
Email: sriramr@aarohi-inc.com

Milan J. Merhar
Pirus Networks
Acton, MA 01720
Phone: +1 978 206 9124
Email: Milan@pirus.com

Venkat Rangan
Rhapsody Networks Inc.
3450 W. Warren Ave
Fremont, CA 94538
Phone: +1 510 743 3018
Fax: +1 510 687 0136
Email: venkat@rhapsodynetworks.com

Donald R.
Fraser
Compaq Computer Corporation
301 Rockrimmon Blvd
Colorado Springs, CO 80919
Phone: 719-548-3272
Email: don.fraser@compaq.com

ANNEX A: Relationship between FCIP and IP over FC (IPFC)

IPFC (RFC 2625) describes the encapsulation of IP packets in FC frames. It is intended to facilitate IP communication over an FC network.

FCIP describes the encapsulation of FC frames in TCP segments, which in turn are encapsulated inside IP packets for transport over an IP network. It gives no consideration to the type of FC frame that is being encapsulated. Therefore, the FC frame may actually contain an IP packet as described in the IP over FC specification (RFC 2625). In such a case, the data packet would have:

   Data Link Header
   IP Header
   TCP Header
   FCIP Header
   FC Header
   IP Header

Note: The two IP headers would not be identical to each other. One would have information pertaining to the final destination, while the other would have information pertaining to the FCIP device.

The two documents focus on different objectives. As mentioned above, implementation of FCIP can lead to IP encapsulation within IP. While perhaps inefficient, this should not lead to issues with IP communication. One caveat: if a Fibre Channel device is encapsulating IP packets in an FC frame (e.g., an IPFC device), and that device is communicating with a device running IP over a non-FC medium, a second IPFC device will need to act as a gateway between the two networks. This scenario is not specifically addressed by FCIP.

There is nothing in either of the specifications to prevent a single device from implementing both FCIP and IP-over-FC (IPFC), but this is implementation specific and is beyond the scope of this document.

Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.

[draft-ietf-ips-fcovertcpip-02.txt]

[This INTERNET DRAFT expires in October, 2001]