IP Storage Working Group                          Charles Monia
INTERNET DRAFT                                    Rod Mullendore
Expires November 2001                             Josh Tseng
                                                  Nishan Systems
                                                  Franco Travostino
                                                  Victor Firoiu
                                                  Nortel Networks
                                                  David Robinson
                                                  Sun Microsystems
                                                  Wayland Jeong
                                                  Troika Networks
                                                  Rory Bolt
                                                  Quantum/ATL
                                                  Paul Rutherford
                                                  ADIC
                                                  Mark Edwards
                                                  Eurologic
                                                  May 2001

     iFCP - A Protocol for Internet Fibre Channel Storage Networking

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Comments

Comments should be sent to the ips mailing list (ips@ece.cmu.edu) or to the author(s).

Monia, et al.                   Standards Track
iFCP Revision 2                 May 2001

   Status of this Memo
   Comments
   1.      Abstract
   2.      About This Document
   2.1     Conventions used in this document
   2.2     Purpose of this document
   3.      iFCP Introduction
   3.1     Definitions
   3.2     The iFCP Network Model
   3.3     The N_PORT Addressing Model
   3.3.1   Operation in Address Transparent Mode
   3.3.2   Operation in Address Translation Mode
   3.4     iFCP Layered Services
   3.4.1   Application Layer
   3.4.2   FC-4 Layer (FCP)
   3.4.3   FC-2 Layer
   3.4.4   iFCP Layer
   4.      iFCP Protocol
   4.1     Overview
   4.1.1   iFCP Transport Services
   4.1.2   iFCP Support for Link Services
   4.2     Mandatory FC-2 Functionality
   4.3     FC-2 Functionality Not Supported
   4.4     Optional FC-2 Functionality
   5.      TCP Stream Transport of iFCP Frames
   5.1     TCP Session Model
   5.2     iFCP Session Management
   5.2.1   Creating an N_PORT Login Session
   5.2.2   Terminating an N_PORT Login Session
   5.3     TCP Port Numbers
   6.      Encapsulation of Fibre Channel Frames
   6.1     Encapsulation Header Format
   6.1.1   Common Encapsulation Flags
   6.2     SOF and EOF Delimiter Fields
   6.3     Frame Encapsulation and De-encapsulation
   7.      Link Services
   7.1     Augmented Link Service Messages
   7.2     Augmented Link Services Requiring Payload Address Translation
   7.3     Augmented Link Services
   7.3.1   Abort Exchange (ABTX)
   7.3.2   Discover Address (ADISC)
   7.3.3   Discover Address Accept (ADISC ACC)
   7.3.4   FC Address Resolution Protocol Reply (FARP-REPLY)
   7.3.5   FC Address Resolution Protocol Request (FARP-REQ)
   7.3.6   Logout (LOGO)
   7.3.7   Port Login (PLOGI)
   7.3.8   Read Exchange Concise
   7.3.9   Read Exchange Concise Accept
   7.3.10  Read Exchange Status Block (RES)
   7.3.11  Read Exchange Status Block Accept
   7.3.12  Read Link Error Status (RLS)
   7.3.13  Read Sequence Status Block (RSS)
   7.3.14  Reinstate Recovery Qualifier (RRQ)
   7.3.15  Request Sequence Initiative (RSI)
   7.3.16  Third Party Process Logout (TPRLO)
   8.      TCP Session Control Messages
   8.1     Connection Bind (CBIND)
   8.2     Unbind Connection (UNBIND)
   9.      iFCP Error Detection
   9.1     Overview
   9.2     Timer Definitions and Stale Frame Detection
   9.2.1   Error_Detect_Timeout (E_D_TOV)
   9.2.2   Resource Allocation Timeout (R_A_TOV)
   10.     Fabric Services Supported by an iFCP implementation
   10.1    iFCP Support for the FC Broadcast Service
   11.     Security
   11.1    Overview
   11.2    Physical Security
   11.3    Controlling Access
   11.4    Authentication and Encryption
   11.5    Storage Firewalls
   12.     Quality of Service Considerations
   12.1    Minimal requirements
   12.2    High-assurance
   13.     References
   13.1    Relevant SCSI (T10) Specifications
   10.2    Relevant Fibre Channel (T11) Specifications
   10.3    Relevant RFC Documents
   10.4    Other Reference Documents
   14.     Author's Addresses
   A.      iFCP Support for Fibre Channel Link Services
   A.1     Basic Link Services
   A.2     Link Services Processed Transparently
   A.3     Augmented Link Services
   B.      Performance of The Multi-Connection iFCP Session Model
   B.1     Relationship of Throughput to Packet Losses
   B.2     Background
   Full Copyright Statement

1. Abstract

This document specifies an architecture and gateway-to-gateway protocol for the implementation of Fibre Channel fabric functionality on a network in which TCP/IP switching and routing elements replace Fibre Channel components. The protocol enables the attachment of existing Fibre Channel storage products to an IP network by supporting the fabric services required by such devices.

2. About This Document

2.1 Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [2].

All frame formats are in big-endian network byte order.

2.2 Purpose of this document

This is a standards-track document, which specifies a protocol for the implementation of Fibre Channel transport services on a TCP/IP network. Some portions of this document contain material from standards controlled by NCITS T10 and T11. This material is included here for informational purposes only. The authoritative information is given in the appropriate NCITS standards document.

The authoritative portions of this document specify the protocol for mapping standards-compliant Fibre Channel storage and adapter implementations to TCP/IP. This mapping includes sections of this document which describe the "iFCP Protocol" (see section 4).

3. iFCP Introduction

iFCP is a gateway-to-gateway protocol that provides Fibre Channel fabric services to FCP-based Fibre Channel devices over a TCP/IP network. iFCP uses TCP to provide congestion control, error detection and recovery. iFCP's primary objective is to allow interconnection and networking of existing Fibre Channel devices at wire speeds over an IP network.

The protocol and method of frame translation described in this document permit the transparent attachment of Fibre Channel storage devices to an IP-based fabric by means of lightweight gateways.

The protocol achieves this transparency through an address translation process that allows normal frame traffic to pass through the gateway directly, with provisions for intercepting and emulating the fabric services required by an FCP device.

3.1 Definitions

This section defines the terms needed to clarify the concepts presented in this document.
Address-translation mode - A mode of gateway operation in which the scope of N_PORT fabric addresses for locally attached devices is local to the iFCP gateway.

Address-transparent mode - A mode of gateway operation in which the N_PORT fabric addresses of all Fibre Channel devices are unique within the logical fabric to which the gateway belongs.

Gateway Region - The portion of the storage network accessed through an iFCP gateway. Devices in the region consist of all Fibre Channel devices directly attached to the gateway.

Logical Fabric - A collection of iFCP gateways configured to interoperate in address-transparent mode.

Fibre Channel Network - A native Fibre Channel fabric and all attached Fibre Channel devices.

Fabric - The part of a Fibre Channel network that provides the transport services defined in the FC-FS specification. A fabric may be implemented in the IP framework by means of the architecture and protocols discussed in this document.

FC-2 - The Fibre Channel transport services layer described in the FC-FS specification.

FCP Portal - An IP-addressable entity representing the point at which a logical or physical iFCP device is attached to the IP network.

N_PORT - An iFCP or Fibre Channel entity representing the interface to Fibre Channel device functionality. This interface implements the Fibre Channel N_PORT semantics specified in the FC-FS standard [FC-FS].

N_PORT fabric address - The address of an N_PORT within the Fibre Channel fabric.

N_PORT Network Address - The address of an N_PORT in the IP fabric. This address consists of the IP address of the FCP Portal and the N_PORT ID of the directly attached Fibre Channel device.

F_PORT - The interface used by an N_PORT to access Fibre Channel fabric and fabric services functionality.

iFCP - The protocol discussed in this document.

Logical FCP Device - The abstraction representing a single Fibre Channel device as it appears on an iFCP network.

iSNS - The protocol by which storage name services are implemented. Resolution of Fibre Channel network object names is provided by an iSNS name server.

N_PORT Session - An association created when two N_PORTs have executed a PLOGI operation. It comprises the N_PORTs and the TCP connection that carries traffic between them.

iFCP Frame - The frame inserted into the TCP stream, which contains the Fibre Channel frame and the iFCP header.

Port Login (PLOGI) - The Fibre Channel Extended Link Service (ELS) that establishes an N_PORT login session through the exchange of identification and operation parameters between an originating N_PORT and a responding N_PORT.

DOMAIN_ID - The value contained in the high-order byte of a 24-bit N_PORT Fibre Channel address.

3.2 The iFCP Network Model

The purpose of the iFCP protocol is to enable the implementation of Fibre Channel fabric functionality on an IP network in which IP components and technology replace the Fibre Channel infrastructure.

The following diagram shows a Fibre Channel fabric with attached devices. These are connected to the fabric through N_PORT and F_PORT interfaces, whose behavior is specified in [FGS].

Within the Fibre Channel device domain, fabric-addressable entities consist of other N_PORTs and devices internal to the fabric that perform the fabric services defined in [FGS]. In this case, the N_PORT Fibre Channel addresses are 24-bit quantities that are unique within the scope of the FC fabric. N_PORTs that perform fabric services are assigned well-known addresses starting at the top end of the 24-bit Fibre Channel address space.
                Fibre Channel Network

             +--------+        +--------+
             |   FC   |        |   FC   |
             | Device |        | Device |
             |........|        |........|      Fibre Channel
             | N_PORT |<------>| N_PORT |      Device Domain
             +---+----+        +----+---+            ^
                 |                  |                |
             +---+----+        +----+---+            |
             | F_PORT |        | F_PORT |            |
   ==========+========+========+========+==============
             |         Fabric &         |            |
             |     Fabric Services      |            v
             |                          |      Fibre Channel
             +--------------------------+      Fabric Domain

                   An iFCP Network with iFCP Gateways

    Fibre Channel Devices            Fibre Channel Devices
    +--------+  +--------+           +--------+  +--------+
    |   FC   |  |   FC   |           |   FC   |  |   FC   |
    | Device |  | Device |   Fibre   | Device |  | Device |   Fibre
    |........|  |........|  Channel  |........|  |........|   Channel
    | N_PORT |  | N_PORT |<--------->| N_PORT |  | N_PORT |   Device
    +---+----+  +---+----+  Traffic  +----+---+  +----+---+   Domain
        |           |                     |           |         ^
    +---+----+  +---+----+           +----+---+  +----+---+     |
    | F_PORT |  | F_PORT |           | F_PORT |  | F_PORT |     |
   =+========+==+========+===========+========+==+========+==========
    |     iFCP Layer     |<--------->|     iFCP Layer     |     |
    |....................|     ^     |....................|     |
    |     FCP Portal     |     |     |     FCP Portal     |     v
    +--------+-----------+     |     +----------+---------+    IP
             |              Control             |            Fabric
             |               Data               |               |
             |                                  |               |
             |<------Encapsulated Frames------->|
             |      +------------------+        |
             |      |                  |        |
             +------+    IP Network    +--------+
                    |                  |
                    +------------------+

The above diagram shows the simplest implementation of an equivalent iFCP fabric. Two gateway regions are shown. Each consists of Fibre Channel devices directly connected to the iFCP fabric through F_PORTs implemented as part of the edge switch or gateway.

Looking into the F_PORT on the Fibre Channel side of the gateway, the network appears as a Fibre Channel fabric. Here, the gateway presents remote N_PORTs as directly attached devices. Conversely, on the IP network side, the gateway presents each locally connected N_PORT as a logical fibre channel device.
An important property of this gateway architecture is that the fabric configuration and topology within the gateway region are opaque to the IP network. That is, the topology in the Fibre Channel domain, whether it is loop- or switch-based, is hidden from the IP network and from other gateways. Consequently, support for such FC fabric topologies becomes a gateway implementation option. In such cases, the gateway incorporates whatever functionality is required to distil and present locally attached N_PORTs (or NL_PORTs) as logical iFCP devices.

N_PORT to N_PORT communications that traverse a TCP/IP network require the intervention of the iFCP layer. This consists of the following operations:

a) Execution of the frame addressing and mapping functions described in section 3.3.

b) Execution of fabric-supplied link services addressed to one of the well-known Fibre Channel N_PORT addresses.

c) Encapsulation of Fibre Channel frames for injection into the TCP/IP network and de-encapsulation of Fibre Channel frames received from the TCP/IP network.

d) Establishment of an N_PORT login session in response to a PLOGI directed to a remote device.

The following sections discuss the frame addressing mechanism and the way in which it is used to achieve communications transparency between N_PORTs.

3.3 The N_PORT Addressing Model

This section discusses the role of the N_PORT addressing model in the routing of frames between locally and remotely attached N_PORTs.

In the case of a remote N_PORT, where the frame traffic must traverse the IP network, the gateway must perform this routing transparently with respect to the locally attached N_PORT. To provide such transparency, the gateway maintains an association between the Fibre Channel address of a remote N_PORT, as seen by a locally attached device, and the corresponding address of the remote device on the IP network.
To establish this association, the iFCP gateway assigns and manages Fibre Channel N_PORT fabric addresses as described in the following sections.

The fabric address of an N_PORT device is a 24-bit value having the following format defined by the Fibre Channel specification [FCS]:

   Bit   23       16 15         8 7        0
        +-----------+------------+----------+
        | Domain ID |  Area ID   | Port ID  |
        +-----------+------------+----------+

            Fibre Channel Address Format

Such addresses are volatile and subject to change based on modifications in the fabric configuration.

In a Fibre Channel fabric, each switch element has a unique Domain ID assigned by a master switch. The value of the Domain ID ranges from 1 to 239 (0xEF). Each switch in turn controls a block of 65,536 addresses (16 bits) divided into area and port IDs. N_PORTs logging into the fabric receive a unique fabric address consisting of the switch's Domain ID concatenated with switch-assigned area and port IDs.

These N_PORT addresses are carried in the Fibre Channel frame as shown in the following diagram.

   Bit       31    24 23                                0
            +--------+-----------------------------------+
   Word 0   |        | Destination N_PORT Address (D_ID) |
            +--------+-----------------------------------+
   Word 1   |        | Source N_PORT Address (S_ID)      |
            +--------+-----------------------------------+
      .     |                                            |
      .     |            Control information             |
      .     |                and Payload                 |
   Word 527 +--------------------------------------------+
    (Max)

         Fibre Channel Address Fields within a Frame

The D_ID and S_ID fields represent the fabric addresses of the destination and source N_PORTs respectively.

In an iFCP storage fabric, the iFCP gateway replaces the FC switch element as the device responsible for N_PORT address assignment and frame routing. Unlike an FC switch, however, an iFCP gateway must route frames between N_PORTs within the gateway region or to external devices attached to remote gateways on the IP network.
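The address layout and header fields described above can be illustrated with a short sketch. The Python fragment below is illustrative only and is not part of the protocol. The names FcAddress, split_fc_address and frame_addresses are hypothetical, and the fragment assumes that words 0 and 1 of the frame are supplied as eight bytes in big-endian order, with D_ID and S_ID in the low-order 24 bits of each word.

```python
from typing import NamedTuple, Tuple


class FcAddress(NamedTuple):
    """Hypothetical container for the three fields of a 24-bit N_PORT address."""
    domain_id: int  # bits 23..16
    area_id: int    # bits 15..8
    port_id: int    # bits 7..0


def split_fc_address(address: int) -> FcAddress:
    """Split a 24-bit fabric address into Domain ID, Area ID and Port ID."""
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("N_PORT fabric address must fit in 24 bits")
    return FcAddress((address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF)


def frame_addresses(header: bytes) -> Tuple[int, int]:
    """Return (D_ID, S_ID) taken from words 0 and 1 of a frame."""
    if len(header) < 8:
        raise ValueError("need at least words 0 and 1 of the frame")
    d_id = int.from_bytes(header[1:4], "big")  # word 0, bits 23..0
    s_id = int.from_bytes(header[5:8], "big")  # word 1, bits 23..0
    return d_id, s_id
```

For example, split_fc_address(0x0A1B2C) yields Domain ID 0x0A, Area ID 0x1B and Port ID 0x2C.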
In order to be FC-compatible, the gateway must route such frames using only the embedded 24-bit address. By exploiting its control of address allocation and its access to frame traffic entering or leaving the gateway region, the gateway is able to achieve the necessary transparency.

The gateway may allocate device addresses in one of two ways:

a) Address Transparent Mode - A mode of address assignment in which several gateways collaborate to form a 'logical fabric'. Each gateway in control of a region is responsible for obtaining and distributing unique Domain IDs from the address assignment authority as described in section 3.3.1.1. Consequently, within the scope of the logical fabric, the address of each N_PORT is unique. For that reason, gateway-assigned aliases are not required to represent remote N_PORTs.

b) Address Translation Mode - A mode of address assignment in which the scope of all N_PORT device addresses, including remote devices, is local to each gateway region. The address of a remote device is represented by a gateway-assigned N_PORT alias.

All iFCP implementations MUST support operation in address translation mode. Support for address transparent mode is optional.

The choice of addressing mode involves the tradeoffs between scalability and transparency discussed below.

The scalability constraints are a consequence of the Fibre Channel address allocation policy described above. As noted, an IP fabric using this address allocation scheme is limited to a combined total of 239 gateways and Fibre Channel switch elements. As the system expands, an IP fabric may consist of many switch elements distributed throughout the enterprise, each of which controls a small number of devices. In this case, the limitation in switch count may become a barrier to extending and fully integrating the storage network.

Address translation mode avoids this limitation by decoupling N_PORT fabric addresses from the constraints of fabric-wide address space management.
Consequently, a virtually unlimited number of iFCP gateways, Fibre Channel devices and switch elements may be internetworked. This mode of address allocation also simplifies management of the IP storage fabric configuration by eliminating the need for a centralized address-assignment authority.

A consequence of address translation mode is that the 24-bit N_PORT address is no longer unique across the storage network. As a result, when processing frame traffic to or from remote N_PORTs, the gateway must intervene to translate the 24-bit N_PORT addresses between the sending and receiving gateways. These address operations involve:

a) Translating the N_PORT IDs in the frame header and

b) Translating N_PORT IDs carried in the payload of certain extended link service messages.

The process of N_PORT ID translation for the frame header is described in section 3.3.2. The processing for link services with frame addresses in the payload is described in section 7.1.

The details of the address transparent and address translation operational modes are discussed in the following sections.

3.3.1 Operation in Address Transparent Mode

Address transparent mode is an alternative for configurations in which address transparency is desired. In addition to the scalability limits discussed above, the following considerations and requirements pertain to this mode of operation:

a) There is increased dependency on the services of a central address assignment authority, such as iSNS. If connectivity with the server is lost, new DOMAIN_ID values cannot be automatically allocated as gateways and Fibre Channel switch elements are added to the logical fabric. As a result, new gateways and switch elements cannot be automatically added to the IP fabric. It remains possible to add and manage such additional components manually.

b) Coordination of iSNS servers is required. Multiple iFCP gateways set up with independently administered address servers must be completely torn down and slaved under a single iSNS name server before they can be configured into the same logical fabric. In contrast, operation in address translation mode requires only that the independent iSNS servers import client attributes from other iSNS servers before clients under different iSNS authorities can be made to interoperate.

c) iFCP gateways in transparent mode will not interoperate with iFCP gateways that are not in transparent mode.

d) When interoperating with locally attached Fibre Channel fabrics, the iFCP gateway MUST assume control of DOMAIN_ID assignments in accordance with the appropriate Fibre Channel standard or specification. As described in section 3.3.1.1, DOMAIN_ID values assigned to FC switches in attached fabrics must be issued by the iSNS server or manually assigned.

e) When operating in address transparent mode, no Fibre Channel address translation SHALL take place, and no link service messages shall be augmented with additional information by the iFCP layer.

The process for establishing the TCP/IP context associated with an N_PORT login session in this mode is similar to that specified for address translation mode (section 3.3.2).

3.3.1.1 Transparent Mode Domain ID Management

As described above, each gateway and Fibre Channel switch in a logical fabric must have a unique Domain ID. In a gateway region containing Fibre Channel switch elements, each element obtains a Domain ID by querying a master switch element as described in [FC-SW]; in this case, the master is the iFCP gateway itself. The gateway in turn may obtain Domain IDs on demand from a central address allocation authority, such as an iSNS name server, or manually from a pre-assigned block of IDs. In that sense, the address authority (e.g., iSNS) assumes the role of master switch for the logical fabric.
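The role of the address authority described above can be sketched as follows. This Python fragment is illustrative only: DomainIdAuthority is a hypothetical in-memory stand-in for a central authority such as an iSNS server, and the lowest-unused allocation policy is an assumption, not something this document specifies. The only facts taken from the text are the Domain ID range of 1 to 239 and the requirement that each gateway or switch receive a unique value.

```python
from typing import Dict

FIRST_DOMAIN_ID = 1
LAST_DOMAIN_ID = 239  # 0xEF, the highest Domain ID available to a switch


class DomainIdAuthority:
    """Hypothetical stand-in for the central address authority (e.g. iSNS)."""

    def __init__(self) -> None:
        # Maps each allocated Domain ID to the gateway or switch that holds it.
        self._owner_by_domain: Dict[int, str] = {}

    def allocate(self, owner: str) -> int:
        """Hand out the lowest unused Domain ID (policy is an assumption)."""
        for domain_id in range(FIRST_DOMAIN_ID, LAST_DOMAIN_ID + 1):
            if domain_id not in self._owner_by_domain:
                self._owner_by_domain[domain_id] = owner
                return domain_id
        raise RuntimeError("all 239 Domain IDs of the logical fabric are in use")

    def release(self, domain_id: int) -> None:
        """Return a Domain ID to the pool when its holder leaves the fabric."""
        self._owner_by_domain.pop(domain_id, None)
```

Once 239 values are outstanding, allocate() fails, which mirrors the scalability limit of address transparent mode.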
3.3.1.2 Incompatibility with Address Translation Mode

iFCP gateways in address transparent mode shall not originate or accept frames that do not have the TRN bit set to one in the iFCP flags field of the encapsulation header (see section 6.1). The iFCP gateway shall immediately terminate any N_PORT sessions with the iFCP gateway from which it receives such frames.

3.3.2 Operation in Address Translation Mode

This section summarizes the process for modifying FC frame addresses embedded in the frame header.

As described above, the iFCP gateway is responsible for assigning Fibre Channel N_PORT addresses to locally and remotely attached N_PORTs. For remotely attached N_PORTs, the gateway assigns an N_PORT alias used in place of the N_PORT address assigned by the remote gateway.

To perform this function and enable the appropriate routing, the gateway builds and maintains a table that maps N_PORT aliases to the appropriate TCP/IP connection and N_PORT ID of all external N_PORTs.

The gateway opportunistically builds the store of N_PORT addresses and TCP/IP connections for remotely attached devices in the IP fabric by:

a) Intercepting name service requests issued by locally attached N_PORTs as described below, or

b) Intercepting incoming N_PORT login requests from external Fibre Channel devices and outgoing N_PORT login requests directed to remote N_PORTs. Such requests are used to establish the N_PORT login session as described in section 5.1.

In response to name server requests, the iSNS server returns the IP address and N_PORT ID pair of the remote device. The IP address is mapped to the connection context. After saving the context and N_PORT ID, the iFCP layer creates the 24-bit N_PORT alias that is returned to the local N_PORT as the Fibre Channel address of the external device.
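The mapping described above can be sketched as a pair of dictionaries. This Python fragment is illustrative only: the names RemotePort and TranslationTable are hypothetical, the connection context is represented by a plain string, and aliases are issued sequentially from an arbitrary starting value. This document does not specify how alias values are chosen, so that policy is purely an assumption.

```python
from typing import Dict, NamedTuple


class RemotePort(NamedTuple):
    """Hypothetical record locating a remote N_PORT on the IP network."""
    connection: str  # stand-in for the TCP/IP connection context
    n_port_id: int   # 24-bit N_PORT ID assigned by the remote gateway


class TranslationTable:
    """Maps gateway-assigned N_PORT aliases to remote N_PORTs and back."""

    def __init__(self, first_alias: int = 0x010000) -> None:
        # Assumption: aliases are issued sequentially starting at first_alias.
        self._next_alias = first_alias
        self._remote_by_alias: Dict[int, RemotePort] = {}
        self._alias_by_remote: Dict[RemotePort, int] = {}

    def learn(self, connection: str, n_port_id: int) -> int:
        """Return the alias for a remote N_PORT, creating one on first sight."""
        remote = RemotePort(connection, n_port_id)
        alias = self._alias_by_remote.get(remote)
        if alias is None:
            if self._next_alias > 0xFFFFFF:
                raise RuntimeError("24-bit alias space exhausted")
            alias = self._next_alias
            self._next_alias += 1
            self._alias_by_remote[remote] = alias
            self._remote_by_alias[alias] = remote
        return alias

    def resolve(self, alias: int) -> RemotePort:
        """Map an N_PORT alias to its connection context and remote N_PORT ID."""
        return self._remote_by_alias[alias]
```

Two remote devices that carry the same N_PORT ID behind different gateways receive distinct aliases, because the table key includes the connection context as well as the N_PORT ID.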
3.3.2.1 Translation Table Maintenance

The contents of the gateway's address translation tables are updated opportunistically, in response to the name service queries and PLOGI requests described previously. There is no need to invalidate entries in response to changes in the fabric configuration, since any potentially stale entries caused by such events are self-correcting as described below.

Once a fabric has achieved steady-state operation, any event that causes a change in the Fibre Channel address of a device also causes the device to terminate all N_PORT sessions. In the process of resuming operation, the status of the device, including its new address, is reflected in the name server's database. The new state of the device is advertised using the appropriate state change notifications. These, in turn, trigger the series of port login operations described below.

For inbound PLOGI requests, the iFCP gateway simply updates the translation table, generates the N_PORT alias and forwards the request to the local N_PORT for processing as described above.

For outbound requests, a fabric-attached Fibre Channel device usually precedes the PLOGI with a name server query to obtain the device's new N_PORT address. At this point, the iFCP gateway intercepts the request, performs the necessary iSNS query, creates the translation table entry and returns the assigned N_PORT alias to the requester.

After issuing the PLOGI, the N_PORT verifies that it has logged in with the expected device by checking the device name returned in the PLOGI response. An N_PORT that attempts to execute a PLOGI without first querying the name server is still required to confirm the device name as described above.
3.3.2.2 Frame Address Translation

For outbound frames, the table of external N_PORT network addresses is referenced to map the Destination N_PORT alias and Source N_PORT ID to a TCP connection identifier and the N_PORT ID assigned by the remote gateway. The translation process for outbound frames is shown below.

              Raw Fibre Channel Frame
   +--------+-----------------------------------+    +--------------+
   |        |     Destination N_PORT Alias      |--->|  Lookup TCP  |
   +--------+-----------------------------------+    |  connection  |
   |        |         Source N_PORT ID          |--->| and N_PORT ID|
   +--------+-----------------------------------+    +------+-------+
   |                                            |           | TCP
   |            Control information             |           | Conn
   |                and Payload                 |           | &
   +--------------------------------------------+           | N_PORT
                                                            | ID
                                                            |
   After Address Translation and TCP/IP Encapsulation       |
   +--------------------------------------------+   Conn    |
   |             iFCP Encapsulation             |<----------+
   |                  Header                    |  Context  |
   +========+===================================+           |
   |        |       Destination N_PORT ID       |<----------+
   +--------+-----------------------------------+
   |        |         Source N_PORT ID          |
   +--------+-----------------------------------+
   |                                            |
   |            Control information             |
   |                and Payload                 |
   +--------------------------------------------+

For inbound frames, the store regenerates the N_PORT alias from the TCP connection context and N_PORT ID contained in the encapsulated FC frame. The translation process for inbound frames is shown below.

          Network Format of Inbound Frame
   +--------------------------------------------+ Conn. +--------+
   |         iFCP Encapsulation Header          |------>| N_PORT |
   |                                            |Context| Alias  |
   +========+===================================+       | Lookup |
   |        |       Destination N_PORT ID       |       |        |
   +--------+-----------------------------------+       |        |
   |        |         Source N_PORT ID          |------>|        |
   +--------+-----------------------------------+       +----+---+
   |                                            |            |N_PORT
   |            Control information             |            |Alias
   |                and Payload                 |            |
   +--------------------------------------------+            |
                                                             |
                                                             |
                                                             |
   Frame after Address Translation and De-encapsulation      |
   +--------+-----------------------------------+            |
   |        |       Destination N_PORT ID       |            |
   +--------+-----------------------------------+            |
   |        |        Source N_PORT Alias        |<-----------+
   +--------+-----------------------------------+
   |                                            |
   |            Control information             |
   |                and Payload                 |
   +--------------------------------------------+

3.3.2.3 Incompatibility with Address Transparent Mode

iFCP gateways in address translation mode shall not originate or accept frames that have the TRN bit set to one in the iFCP flags field of the encapsulation header. The iFCP gateway shall immediately abort any N_PORT login sessions with the iFCP gateway from which it receives such frames as described in section 5.2.2.2.

3.4 iFCP Layered Services

The following diagram shows the functional layers for host devices that support FCP. As shown, iFCP provides a set of layered services that transparently provide the transport services required by FCP devices. Using the iFCP framework, any existing host FCP implementation will execute with no modifications required.

The iFCP protocol layer consists of the data transport services and iFCP-specific Link Services. This layer provides transport services specific to Fibre Channel devices as specified in [FC-PH], [FC-PH-2], and [FC-PH-3].

This is illustrated in the following diagram, which shows the IP Fabric consisting of the TCP/IP network and the iFCP Layer.
The IP Fabric provides the transport services for FCP, and is a direct replacement for the transport services provided by a Fibre Channel fabric. Meanwhile, the components in the Fibre Channel Device Domain remain unchanged.

   +---------------------------------------+ - - - - - - -
   |     Storage & Backup Applications     |
   +---------------------------------------+
   |           Operating System            |  Application
   +--------------------+                  |     Layer
   |        SCSI        |                  |
   +--------------------+                  | - - - - - - -
   |        FCP         |                  |  FC-4 Layer
   +------------+-------+------------------+ - - - - - - -
   |            |      Link Services       |
   |            +--------------------------+  FC-2 Layer     ^
   |                                       |                 |
   |       N_PORT - F_PORT Interface       |           Fibre Channel
   |                                       |           Device Domain
   <=============================================================>
   |                                       |             IP Fabric
   |      iFCP Data Transport Service      |                 |
   |                                       |                 v
   |                       +---------------+
   |                       |iFCP Specific  |  iFCP Layer
   |                       |Link Services  |
   +-----------------------+---------------+ - - - - - -
   |                                       |
   |                  TCP                  |  Transport
   |                                       |  Layer
   +---------------------------------------+ - - - - - -
   |                                       |
   |                  IP                   |  Network
   |                                       |  Layer
   +---------------------------------------+ - - - - - -
   |                                       |
   |          Physical Transport           |  Link Layer
   |                                       |
   +---------------------------------------+ - - - - - -

In the figure shown above, each layer leverages the services of the layer below it.

3.4.1 Application Layer

This includes the operating system, storage and backup applications, and the SCSI driver. This layer interfaces with FCP and Link Services in the FC-2 and FC-4 layers.

3.4.2 FC-4 Layer (FCP)

FCP is the Fibre Channel FC-4 layer application protocol used to communicate with devices implementing the SCSI-3 command set and architectural model. FCP divides each SCSI I/O operation into a series of information units to be transferred between the initiator and target.

3.4.3 FC-2 Layer

The FC-2 Layer provides the facilities for Link Services and transfer of Fibre Channel information units as described below.
3.4.3.1 Link Service Messages

Fibre Channel defines a series of link services in the Fibre Channel Physical and Signaling Interface specifications (FC-PH, FC-PH-2, FC-PH-3). These Link Service Messages provide a set of defined functions that allow a Fibre Channel port to send control information, or to request another port to perform a specific function. Some Link Service messages reference services provided internally within the Fibre Channel fabric.

3.4.3.2 N_PORT Interface

This is an interface which provides access to Fibre Channel device functionality. The N_PORT interface is responsible for segmenting information units into Fibre Channel frames and reassembling information units from received frames.

3.4.3.3 F_PORT Interface

This is the interface through which the N_PORT accesses the Fibre Channel fabric.

3.4.4 iFCP Layer

The iFCP layer provides three essential services for FCP-based storage products:

a) Transport of Fibre Channel frames and Link Service messages between N_PORTs.

b) Support for special Link Service messages needed by iFCP to manage the transmission of storage data on an IP network.

c) Augmentation of some Link Service messages with additional data needed in the iFCP environment.

The iFCP layer maps Fibre Channel frames to a predetermined TCP connection for transport. Additionally, many link service messages can similarly be transported without modification over a TCP connection.

4. iFCP Protocol

4.1 Overview

4.1.1 iFCP Transport Services

The iFCP transport services map the Fibre Channel frames comprising each FCP IU and Link Service message to a predetermined TCP connection for transport across an IP network.

When receiving FCP-based storage data from the network, the iFCP layer transports and delivers each resulting frame to the appropriate N_PORT.

Except for the augmented ELS requests in section 7.1, the iFCP layer never interprets the contents of the frame payload.
For incoming iFCP frames with control data, iFCP interprets the augmented information, modifies the frame content accordingly, and may forward the resulting frame to the N_PORT for further processing. For outbound Fibre Channel frames that require control data, the iFCP layer creates the augmented information based on frame content, modifies the frame content, then transmits the resulting Fibre Channel frame with augmented data through the appropriate TCP connection.

4.1.2 iFCP Support for Link Services

Some Link Service messages contain N_PORT addresses in the payload. When a gateway operating in address translation mode encounters such a message, it augments the payload with additional information. The receiving gateway references the augmented information in order to reconstruct the original Link Service message. The reconstructed frames are then forwarded to the receiving N_PORT for further processing.

Section 7.1 describes augmented Link Services in detail.

4.2 Mandatory FC-2 Functionality

[To be specified]

4.3 FC-2 Functionality Not Supported

[To be specified]

4.4 Optional FC-2 Functionality

[To be specified]

5. TCP Stream Transport of iFCP Frames

TCP connections MAY be established between FCP_Portals that have discovered each other through a naming service or through manual configuration. If a TCP connection is not maintained between the FCP_Portals, then a change in the status of remote N_PORTs must be discovered through a central name server authority.

Multiple TCP connections may exist between pairs of FCP Portals. Such connections are either "bound" or "unbound". An unbound connection is a TCP connection that is not actively supporting an N_PORT login session. Pre-existing TCP connections between FCP Portals remain unbound and uncommitted until a CBIND message (see section 7.2.2) has been transmitted through them.

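The bound/unbound distinction can be illustrated with a small bookkeeping sketch. This is not part of the protocol definition: the class names `TcpConnection` and `PortalPair` and the method names are hypothetical, no sockets are opened, and the CBIND exchange is reduced to a comment. It only shows how a gateway might track which connection carries which N_PORT login session.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class ConnState(Enum):
    UNBOUND = "unbound"
    BOUND = "bound"


@dataclass
class TcpConnection:
    """Stand-in for one TCP connection between two FCP Portals (hypothetical)."""
    conn_id: int
    state: ConnState = ConnState.UNBOUND
    session: Optional[Tuple[int, int]] = None  # (local N_PORT ID, remote N_PORT ID)


@dataclass
class PortalPair:
    """All TCP connections between one pair of FCP Portals (hypothetical)."""
    connections: List[TcpConnection] = field(default_factory=list)

    def bind(self, local_id: int, remote_id: int) -> TcpConnection:
        session = (local_id, remote_id)
        # An N_PORT pair has at most one login session, so reuse it if present.
        for conn in self.connections:
            if conn.session == session:
                return conn
        # Prefer an existing unbound connection; otherwise create a new one.
        conn = next(
            (c for c in self.connections if c.state is ConnState.UNBOUND), None
        )
        if conn is None:
            conn = TcpConnection(conn_id=len(self.connections))
            self.connections.append(conn)
        # A real gateway would send CBIND here and bind only if it succeeds.
        conn.state = ConnState.BOUND
        conn.session = session
        return conn

    def unbind(self, conn: TcpConnection) -> None:
        # Return the connection to the pool of unbound connections.
        conn.state = ConnState.UNBOUND
        conn.session = None
```

Because a connection is bound to exactly one session, a second login between a different N_PORT pair always lands on a different connection, even when both pairs sit behind the same two portals.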
When the iFCP layer detects a Port Login (PLOGI) message creating a login session between a pair of N_PORTs, it will select an existing unbound TCP connection or establish a new TCP connection, and send the CBIND message down that TCP connection. This allocates the TCP connection to that PLOGI login session. A TCP connection may not be bound to more than one N_PORT login session.

5.1 TCP Session Model

iFCP uses a single TCP connection to transport all Fibre Channel frames between each unique pair of N_PORTs. A TCP connection may be used by one and only one N_PORT login session.

5.2 IFCP Session Management

This section describes the protocols for establishing and terminating an N_PORT login session. One and only one N_PORT login session SHALL exist between an N_PORT pair.

5.2.1 Creating an N_PORT Login Session

The gateway SHALL initiate the creation of an N_PORT login session in response to a PLOGI ELS directed to a remote N_PORT from a locally attached N_PORT as described in the following steps.

a) Allocate a TCP connection to the remote gateway. An existing connection in the Unbound state may be used or a new connection may be created and placed in the Unbound state.

b) If a connection cannot be allocated or created, the gateway SHALL terminate the PLOGI with an LS_RJT response. The Reason Code field in the LS_RJT message shall be set to 0x09 (Unable to Perform Command Request) and the Reason Explanation SHALL be set to 0x29 (Insufficient Resources to Support Login).

c) If an N_PORT login session already exists to the remote N_PORT, the gateway SHALL forward the PLOGI ELS using the existing session.

d) If the N_PORT login session does not exist, the gateway SHALL issue a CBIND session control message (see section 0) to allocate the connection and SHALL place the connection in the Bound state if successful.

e) In the event that the CBIND message fails, the PLOGI shall be terminated with an LS_RJT message.
   Depending on the CBIND failure status, the Reason Code and Reason Explanation SHALL be set as shown in the following table.

   CBIND Failure       LS_RJT Reason        LS_RJT Reason Code
   Status              Code                 Explanation
   -------------       -------------        ------------------
   Unspecified         Unable to Perform    No additional
   Reason (0x10)       Command Request      explanation (0x00)
                       (0x0D)

   No Such Device      Unable to Perform    Invalid N_PORT Name
   (0x11)              Command Request      (0x0D)
                       (0x0D)

   N_PORT Login        Unable to Perform    Invalid N_PORT Name
   Session Already     Command Request      (0x0D)
   Exists (0x12)       (0x0D)

   Lack of             Unable to Perform    Insufficient
   Resources (0x13)    Command Request      Resources to Support
                       (0x0D)               Login (0x29)

5.2.2 Terminating an N_PORT Login Session

An N_PORT login session SHALL be terminated or aborted in response to one of the following events:

a) An LS_RJT response is returned to the gateway that issued the PLOGI ELS. The gateway shall forward the LS_RJT to the local N_PORT and complete the session as described in section 5.2.2.1.

b) An ACC received from a remote device in response to a LOGO. The gateway shall forward the ACC to the local N_PORT and complete the session as described in section 5.2.2.1.

c) For an FC frame received from the IP network, a gateway detects a CRC error in the encapsulation header. The gateway shall abort the session as described in section 5.2.2.2.

d) The TCP connection associated with the login session fails for any reason. The gateway detecting the failed connection shall abort the session as described in section 5.2.2.2.

5.2.2.1 N_PORT Login Session Completion

An N_PORT login session is completed in response to a rejected PLOGI request or a successful LOGO ELS.

The gateway receiving one of the above responses shall issue an Unbind session control ELS as described in section 8.2. An Unbind error shall be considered a fatal gateway error.
In response to the Unbind message, either gateway may choose to close the connection or return it to the pool of unbound connections.

5.2.2.2 Aborting an N_PORT Login Session

An N_PORT login session SHALL be aborted if the TCP connection is spontaneously terminated or for any other reason described in this specification. In any event, the TCP connection shall be closed. If the local N_PORT has logged in to the remote N_PORT, the gateway SHALL send a LOGO to the local N_PORT.

5.3 TCP Port Numbers

An FCP Portal uses a single port number to receive TCP connection requests for iFCP over TCP. All TCP connections established between FCP Portals must be directed to the registered well known port number assigned by the IANA.

An FCP Portal may use any TCP port number consistent with its implementation of the TCP/IP stack to initiate a TCP connection, but each port number must be unique.

6. Encapsulation of Fibre Channel Frames

This section describes the iFCP encapsulation of Fibre Channel frames. The encapsulation is based on the common encapsulation format defined in [ENCAP].

The format of an encapsulated frame is shown below:

+--------------------+
|       Header       |
+--------------------+-----+
|        SOF         |   f |
+--------------------+ F r |
|  FC frame content  | C a |
+--------------------+   m |
|        EOF         |   e |
+--------------------+-----+

     Encapsulation Format

As shown, the encapsulation consists of a 7-word header, an SOF delimiter word, the FC frame (including the fibre channel CRC), and an EOF delimiter word. The header and delimiter formats are described in the following sections.

6.1 Encapsulation Header Format

W|------------------------------Bit------------------------------|
o|                                                               |
r|3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1                    |
d|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
 +---------------+---------------+---------------+---------------+
0|   Protocol#   |    Version    |  -Protocol#   |   -Version    |
 +---------------+---------------+---------------+---------------+
1|                    Reserved (must be zero)                    |
 +---------------+---------------+---------------+---------------+
2|  LS_COMMAND   |  iFCP Flags   |      SOF      |      EOF      |
 +-----------+---+---------------+-----------+---+---------------+
3|   Flags   |   Frame Length    |  -Flags   |   -Frame Length   |
 +-----------+-------------------+-----------+-------------------+
4|                     Time Stamp [integer]                      |
 +---------------------------------------------------------------+
5|                     Time Stamp [fraction]                     |
 +---------------------------------------------------------------+
6|                              CRC                              |
 +---------------------------------------------------------------+

Common Encapsulation Fields:

Protocol#              IANA-assigned protocol number identifying
                       the protocol using the encapsulation. For
                       iFCP the value is (/TBD/).

Version                Encapsulation version

-Protocol#             Ones complement of the protocol#

-Version               Ones complement of the version

Flags                  Encapsulation flags (see 6.1.1)

Frame Length           Contains the length of the entire FC
                       Encapsulated frame including the FC
                       Encapsulation Header and the FC frame
                       (including SOF and EOF words) in units of
                       32-bit words.

-Flags                 Ones complement of the Flags field.

-Frame Length          Ones complement of the Frame Length field.

Time Stamp [integer]   Integer component of the frame time stamp
                       in SNTP format [SNTP].

Time Stamp [fraction]  Fractional component of the time stamp in
                       SNTP format [SNTP].

CRC                    Header CRC. MUST be valid for iFCP.

The time stamp fields are used to enforce the limit on the lifetime of a fibre channel frame as described in section 9.2.2.1.

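The redundancy built into header words 0 and 3 can be illustrated with a short sketch. It assumes the bit positions drawn in the header diagram (8-bit Protocol# and Version fields; a 6-bit Flags field in bits 31-26 and a 10-bit Frame Length in bits 25-16, each followed by its ones complement). The function names are hypothetical, and because the iFCP protocol number is still /TBD/, the value used in the usage note is only a placeholder.

```python
HEADER_WORDS = 7      # encapsulation header length in 32-bit words
DELIMITER_WORDS = 2   # one SOF word plus one EOF word


def ones_complement(value: int, bits: int) -> int:
    """Return the ones complement of value within a field of the given width."""
    return ~value & ((1 << bits) - 1)


def pack_word0(protocol: int, version: int) -> int:
    """Build header word 0: Protocol#, Version, and their ones complements."""
    if not (0 <= protocol <= 0xFF and 0 <= version <= 0xFF):
        raise ValueError("Protocol# and Version are 8-bit fields")
    return (
        (protocol << 24)
        | (version << 16)
        | (ones_complement(protocol, 8) << 8)
        | ones_complement(version, 8)
    )


def frame_length_words(fc_frame_bytes: int) -> int:
    """Frame Length: header + SOF word + FC frame + EOF word, in 32-bit words."""
    if fc_frame_bytes % 4:
        raise ValueError("FC frame content must be a whole number of words")
    return HEADER_WORDS + DELIMITER_WORDS + fc_frame_bytes // 4


def pack_word3(flags: int, length_words: int) -> int:
    """Build header word 3: Flags, Frame Length, and their ones complements."""
    if not (0 <= flags <= 0x3F and 0 <= length_words <= 0x3FF):
        raise ValueError("Flags is a 6-bit field, Frame Length a 10-bit field")
    return (
        (flags << 26)
        | (length_words << 16)
        | (ones_complement(flags, 6) << 10)
        | ones_complement(length_words, 10)
    )


def unpack_word3(word: int) -> tuple:
    """Split word 3 and verify each field against its complement copy."""
    flags = (word >> 26) & 0x3F
    length_words = (word >> 16) & 0x3FF
    if ((word >> 10) & 0x3F) != ones_complement(flags, 6):
        raise ValueError("-Flags does not match Flags")
    if (word & 0x3FF) != ones_complement(length_words, 10):
        raise ValueError("-Frame Length does not match Frame Length")
    return flags, length_words
```

For example, `pack_word0(0x02, 0x01)` (placeholder protocol number 2, version 1) yields `0x0201FDFE`, and a 28-byte FC frame gives `frame_length_words(28) == 16`. These complement checks are separate from the header CRC, which this sketch does not compute.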
iFCP-specific fields:

LS_COMMAND   For an augmented ELS ACC response, the LS_COMMAND
             field SHALL contain bits 31 through 24 of the
             LS_COMMAND to which the ACC applies. Otherwise the
             LS_COMMAND field shall be set to zero.

iFCP Flags   iFCP-specific flags (see below)

SOF          Copy of the SOF delimiter encoding (see section 6.2)

EOF          Copy of the EOF delimiter encoding (see section 6.2)

The iFCP flags word has the following format:

|------------------------Bit----------------------------|
|                                                       |
|  23     22     21     20     19     18     17     16  |
+------+------+------+------+------+------+------+------+
| CPL  |         Reserved          | SES  | TRN  | AUG  |
+------+------+------+------+------+------+------+------+

iFCP Flags:

CPL   Compliance level:
      1 = Encapsulation complies with draft standard or RFC of
          iFCP or the encapsulation specification
      0 = Encapsulation complies with standards track version of
          iFCP or the encapsulation specification

SES   1 = Session control frame (TRN and AUG MUST be 0)

TRN   1 = Address transparent mode enabled
      0 = Address translation mode enabled

AUG   1 = Augmented frame.

6.1.1 Common Encapsulation Flags

The iFCP usage of the common encapsulation flags is shown below:

|------------------------Bit--------------------------|
|                                                     |
|   31       30       29       28       27       26   |
+--------------------------------------------+--------+
|                  Reserved                  |  CRCV  |
+--------------------------------------------+--------+

For iFCP, the CRC field MUST be valid and CRCV MUST be set to one.

6.2 SOF and EOF Delimiter Fields

The format of the delimiter fields is shown below.

W|------------------------------Bit------------------------------|
o|                                                               |
r|3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1                    |
d|1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0|
 +---------------+---------------+-------------------------------+
0|      SOF      |      SOF      |     -SOF      |     -SOF      |
 +---------------+---------------+-------------------------------+
1|                                                               |
 +-----                   FC frame content                  -----+
 |                                                               |
 +---------------+---------------+-------------------------------+
n|      EOF      |      EOF      |     -EOF      |     -EOF      |
 +---------------+---------------+-------------------------------+

                  FC Frame Encapsulation Format

SOF (bits 31-24 and bits 23-16 in word 0): The SOF fields contain the encoded SOF value selected from the table below.

+-------+----------+
|  FC   |          |
|  SOF  | SOF Code |
+-------+----------+
| SOFf  |   0x28   |
| SOFi2 |   0x2D   |
| SOFn2 |   0x35   |
| SOFi3 |   0x2E   |
| SOFn3 |   0x36   |
+-------+----------+

Translation of FC SOF values to SOF field contents

-SOF (bits 15-8 and 7-0 in word 0): The -SOF fields contain the ones complement of the value in the SOF fields.

EOF (bits 31-24 and 23-16 in word n): The EOF fields contain the encoded EOF value selected from the table below.

+-------+----------+
|  FC   |          |
|  EOF  | EOF Code |
+-------+----------+
| EOFn  |   0x41   |
| EOFt  |   0x42   |
| EOFni |   0x49   |
| EOFa  |   0x50   |
+-------+----------+

Translation of FC EOF values to EOF field contents

-EOF (bits 15-8 and 7-0 in word n): The -EOF fields contain the ones complement of the value in the EOF fields.

iFCP implementations shall place a copy of the SOF and EOF delimiter codes in the appropriate header fields.

6.3 Frame Encapsulation and De-encapsulation

When encapsulating a frame, the frame originator MUST fill in the header and the SOF and EOF delimiter words as specified above.

The receiving gateway SHALL perform de-encapsulation as follows:

Upon receiving the encapsulated frame, the gateway SHALL check the header CRC.
If the CRC is invalid, the gateway SHALL terminate the N_PORT login session as described in section 5.2.2.2.

If the CRC is valid, any additional header validity checks are optional. If a header check is unsuccessful, the N_PORT login session SHALL be terminated as described in section 5.2.2.2.

After header validation, the receiving gateway MAY generate the FC frame delimiters by:

a) Using the EOF and SOF codes in the encapsulation header, or

b) Referencing the SOF and EOF delimiter words.

If the EOF and SOF delimiter words are used, the gateway must validate the delimiter contents by verifying that the code is legal and that the delimiter contents conform to the formats shown above. If an invalid delimiter is detected, the gateway SHALL terminate the N_PORT login session as specified in section 5.2.2.2.

If the EOF and SOF in the header are used, the gateway MAY ignore the contents of the delimiter words. The gateway SHOULD verify that the SOF and EOF codes in the header are legal. If an illegal code is detected, the gateway shall terminate the N_PORT login session as specified in section 5.2.2.2.

After validating the encapsulation, the receiving gateway MAY verify the frame propagation delay as described in section 9.2.2.1.

7. Link Services

Link services provide a set of functions that allow a port to send control information or request another port to perform a specific function. Each Link Service message (request and reply) is carried by a Fibre Channel sequence, and can be segmented into multiple frames.

The iFCP Layer is responsible for transporting Link Service messages across the IP fabric. This includes mapping Link Service messages appropriately from the domain of the Fibre Channel transport to that of the IP network. This process may involve manipulation of field values as the Link Service message travels to and from the IP and Fibre Channel fabrics.
It may also require the inclusion of augmented data by the iFCP layer.

Each link service or extended link service is processed according to one of the following rules:

a) Transparent - The link service message and reply MUST be transported to the receiving N_PORT by the iFCP gateway without altering the message payload. The link service message and reply are not processed by the iFCP implementation.

b) Augmented - Applies to an extended link service reply or request containing fibre channel addresses in the payload or requiring other special processing by the iFCP implementation. The processing for augmented link services is described in this section.

c) Rejected - When issued by a directly attached N_PORT, the specified link service request MUST be rejected by the iFCP implementation. The gateway SHALL respond to a rejected link service message by returning an LS_RJT response with a Reason Code of 0x0B (Command Not Supported) and a Reason Code Explanation of 0x0 (No Additional Explanation).

This section describes the processing for augmented link services, including the manner in which augmentation data is transmitted over the IP network.

Appendix A enumerates all link services and the iFCP processing policy that applies to each.

7.1 Augmented Link Service Messages

Augmentation applies to extended link service requests that require the intervention of the iFCP layer. Such intervention is required in order to:

a) Service any ELS that requires special handling, such as a PLOGI.

b) In address translation mode only, service any ELS which has an N_PORT address in the payload.

Such ELS messages are transmitted in a fibre channel frame having the following format:

Word 31<-bit>24 23<------------------Bit---------------------->0
    +----------+------------------------------------------------+
   0|  R_CTL   |                      D_ID                      |
    |   [22]   | [Destination of extended link Service request] |
    +----------+------------------------------------------------+
   1|  CS_CTL  |                      S_ID                      |
    |          |   [Source of extended link service request]    |
    +----------+------------------------------------------------+
   2|   TYPE   |                     F_CTL                      |
    +----------+------------------+-----------------------------+
   3|  SEQ_ID  |      DF_CTL      |           SEQ_CNT           |
    +----------+------------------+-----------------------------+
   4|            OX_ID            |            RX_ID            |
    +-----------------------------+-----------------------------+
   5|                         Parameter                         |
    |                      [ 00 00 00 00 ]                      |
    +-----------------------------------------------------------+
   6|                        LS_COMMAND                         |
    |           [Extended Link Service Command Code]            |
    +-----------------------------------------------------------+
   7|                                                           |
   .|           Additional Service Request Parameters           |
   .|                        ( if any )                         |
   n|                                                           |
    +-----------------------------------------------------------+

                       Format of ELS Frame

7.2 Augmented Link Services Requiring Payload Address Translation

This section describes the handling for ELS frames containing N_PORT addresses in the ELS payload. Such addresses SHALL only be translated when the gateway is operating in address translation mode. When operating in address transparent mode, these addresses SHALL NOT be translated and such ELS messages SHALL NOT be sent as augmented frames unless other special processing is required.

Supplemental data includes information required by the receiving gateway to convert an N_PORT address in the payload to an N_PORT address in the receiving gateway's address space. The following rules define the manner in which such supplemental data is packaged and referenced.

For an N_PORT address field, the gateway originating the frame MUST set the value in the payload to identify the address translation type as follows:

0x00 00 00 - The gateway receiving the frame from the IP network MUST reference the augmentation data to set the field contents as described below. The augmentation information is the 64-bit world wide identifier of the N_PORT as set forth in the fibre channel specification. If not otherwise part of the ELS, this information MUST be appended as described below. This translation type SHALL NOT be used when the address to be converted corresponds to that of the frame originator or recipient.

0x00 00 01 - The gateway receiving the frame from the IP network MUST replace the contents of the field with the N_PORT alias of the frame originator. This translation type MUST be used when the address to be converted is that of the source N_PORT.

0x00 00 02 - The gateway receiving the frame from the IP network MUST replace the contents of the field with the N_PORT I/D of the destination N_PORT. This translation type MUST be used when the address to be converted is that of the destination N_PORT.

Since fibre channel addressing rules prohibit the assignment of fabric addresses with a domain I/D of 0, the above codes will never correspond to valid N_PORT fabric IDs.

For translation type 0, the receiving gateway SHALL obtain the information needed to fill in the ELS field by converting the specified N_PORT world-wide identifier to a gateway IP address and N_PORT ID. This information MUST be obtained through a name server query. If the N_PORT is locally attached, the gateway MUST fill in the field with the N_PORT ID. If the N_PORT is remotely attached, the gateway MUST assign and fill in the field with an N_PORT alias. If an N_PORT alias has already been assigned, it MUST be reused.

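The receiving gateway's handling of the three translation types can be sketched as follows. This is an illustration under stated assumptions, not a normative procedure: the `ReceivingGateway` class, its in-memory `name_server` dictionary, the alias numbering, and the choice to key aliases by world wide identifier are all hypothetical stand-ins for the name server query and alias assignment described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Placeholder values written into the payload by the originating gateway.
TYPE_WWN_LOOKUP = 0x000000   # resolve using the appended world wide identifier
TYPE_SOURCE = 0x000001       # replace with the N_PORT alias of the originator
TYPE_DESTINATION = 0x000002  # replace with the N_PORT ID of the destination


@dataclass
class ReceivingGateway:
    """Hypothetical model of a gateway receiving an augmented ELS."""
    local_ip: str
    # Stand-in for a name server query: WWN -> (gateway IP address, N_PORT ID).
    name_server: Dict[int, Tuple[str, int]]
    aliases: Dict[int, int] = field(default_factory=dict)  # WWN -> N_PORT alias
    next_alias: int = 0x010001  # arbitrary start; domain byte is non-zero

    def resolve(self, placeholder: int, source_alias: int,
                destination_id: int, wwn: Optional[int] = None) -> int:
        """Return the 24-bit address to write into the ELS payload field."""
        if placeholder == TYPE_SOURCE:
            return source_alias
        if placeholder == TYPE_DESTINATION:
            return destination_id
        if placeholder != TYPE_WWN_LOOKUP:
            raise ValueError("not a translation-type placeholder")
        if wwn is None or wwn not in self.name_server:
            # A real gateway would terminate the request with an LS_RJT here.
            raise LookupError("cannot resolve world wide identifier")
        gateway_ip, n_port_id = self.name_server[wwn]
        if gateway_ip == self.local_ip:
            return n_port_id              # locally attached: use the N_PORT ID
        if wwn not in self.aliases:       # remote: assign an alias once
            self.aliases[wwn] = self.next_alias
            self.next_alias += 1
        return self.aliases[wwn]          # reuse the alias on later lookups
```

Resolving the same remote world wide identifier twice returns the same alias, which mirrors the requirement that an already assigned alias be reused.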
In the event that the sending gateway cannot obtain the world wide identifier of an N_PORT, or a receiving gateway cannot obtain the IP address and N_PORT ID, the gateway detecting the error SHALL terminate the request with an LS_RJT message as described in [FCS]. The Reason Code SHALL be set to 0x07 (protocol error) and the Reason Explanation SHALL be set to 0x1F (Invalid N_PORT identifier).

[Editor's note: Such errors, when detected by the receiving gateway, may be indicative of a serious problem requiring a more drastic response. Therefore, this section should be regarded as tentative.]

Supplemental data is sent with the ELS request or ACC frames in one of the following ways:

a) By appending the necessary data to the end of the ELS frame.

b) By extending the sequence with additional frames.

In the first case, a new frame SHALL be created whose length includes the supplemental data. The procedure for extending the ELS sequence with additional frames is /TBS/.

After applying the supplemental data, the receiving gateway SHALL forward the resulting ELS to the destination N_PORT with the supplemental information removed.

When the ACC response must be augmented, the receiving gateway must act as a proxy for the originator, retaining the state needed to process the response from the N_PORT to which the request was directed.

7.3 Augmented Link Services

The following Link Service Messages must receive special processing or be supplemented with additional control data. When the iFCP header encapsulates one of these Extended Link Service messages in the iFCP payload, the AUG bit must be set to one in the iFCP FLAGS field as specified in section 6.1 and the supplemental data (if any) must be appended as described in the following section. An ELS ACC frame that is augmented must be similarly formatted.

Link Service Message               LS_COMMAND      Mnemonic
--------------------               ----------      --------
Abort Exchange                     0x06 00 00 00   ABTX
Discover Address                   0x52 00 00 00   ADISC
Discover Address Accept            0x02 00 00 00   ADISC ACC
FC Address Resolution Protocol     0x55 00 00 00   FARP-REPLY
Reply
FC Address Resolution Protocol     0x54 00 00 00   FARP-REQ
Request
Logout                             0x05 00 00 00   LOGO
Port Login                         0x03 00 00 00   PLOGI
Read Exchange Concise              0x13 00 00 00   REC
Read Exchange Concise Accept       0x02 00 00 00   REC ACC
Read Exchange Status Block         0x08 00 00 00   RES
Read Exchange Status Block         0x02 00 00 00   RES ACC
Accept
Read Link Error Status Block       0x0F 00 00 00   RLS
Read Sequence Status Block         0x09 00 00 00   RSS
Reinstate Recovery Qualifier       0x12 00 00 00   RRQ
Request Sequence Initiative        0x0A 00 00 00   RSI
Third Party Process Logout         0x24 00 00 00   TPRLO

The formats of each augmented ELS, including supplemental data where applicable, are shown in the following sections. Each ELS diagram shows the basic format, as specified in the applicable FC standard, followed by supplemental data as shown below.

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | LS_COMMAND                                     |
+------+------------+------------+-----------+----------+
| 1    |                                                |
| .    |                                                |
| .    | ELS Payload                                    |
|      |                                                |
| n    |                                                |
+======+============+============+===========+==========+
| n+1  |                                                |
| .    | Supplemental Data                              |
| .    | (if any)                                       |
| n+k  |                                                |
+======+================================================+

ELS Diagram (single FC Frame Format)

7.3.1 Abort Exchange (ABTX)

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x6  |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | RRQ Status | Exchange Originator S_ID          |
+------+------------+------------+-----------+----------+
| 2    | OX_ID of Tgt exchange   | RX_ID of tgt exchange|
+------+------------+------------+-----------+----------+
| 3-10 | Optional association header (32 bytes)         |
+======+============+============+===========+==========+

Fields Requiring      Translation Type    Supplemental Data
Address Translation   (see section 7.2)   (type 0 only)
-------------------   -----------------   -----------------
Exchange Originator   1, 2                N/A
S_ID

Other Special Processing:

None

7.3.2 Discover Address (ADISC)

Format of ADISC ELS:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x52 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Reserved   | Hard address of ELS Originator    |
+------+------------+------------+-----------+----------+
| 2-3  | Port Name of Originator                        |
+------+------------+------------+-----------+----------+
| 4-5  | Node Name of originator                        |
+------+------------+------------+-----------+----------+
| 6    | Rsvd       | N_PORT I/D of ELS Originator      |
+======+============+============+===========+==========+

Fields Requiring      Translation Type    Supplemental Data
Address Translation   (see section 7.2)   (type 0 only)
-------------------   -----------------   -----------------
N_PORT I/D of ELS     1                   N/A
Originator

Other Special Processing:

The Hard Address of the ELS originator SHALL be set to 0.

7.3.3 Discover Address Accept (ADISC ACC)

Format of ADISC ACC ELS:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x02 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Reserved   | Hard address of ELS Originator    |
+------+------------+------------+-----------+----------+
| 2-3  | Port Name of Originator                        |
+------+------------+------------+-----------+----------+
| 4-5  | Node Name of originator                        |
+------+------------+------------+-----------+----------+
| 6    | Rsvd       | N_PORT I/D of ELS Originator      |
+======+============+============+===========+==========+

Fields Requiring      Translation Type    Supplemental Data
Address Translation   (see section 7.2)   (type 0 only)
-------------------   -----------------   -----------------
N_PORT I/D of ELS     1                   N/A
Originator

Other Special Processing:

The Hard Address of the ELS originator SHALL be set to 0.

7.3.4 FC Address Resolution Protocol Reply (FARP-REPLY)

The FARP-REPLY ELS is used in conjunction with the FARP-REQ ELS (see section 7.3.5) to perform the address resolution services required by the FC-VI protocol [FC-VI].

Format of FARP-REPLY ELS:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x55 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Match Addr | Requesting N_PORT Identifier      |
|      | Code Points|                                   |
+------+------------+------------+-----------+----------+
| 2    | Responder  | Responding N_PORT Identifier      |
|      | Action     |                                   |
+------+------------+------------+-----------+----------+
| 3-4  | Requesting N_PORT Port_Name                    |
+------+------------+------------+-----------+----------+
| 5-6  | Requesting N_PORT Node_Name                    |
+------+------------+------------+-----------+----------+
| 7-8  | Responding N_PORT Port_Name                    |
+------+------------+------------+-----------+----------+
| 9-10 | Responding N_PORT Node_Name                    |
+------+------------+------------+-----------+----------+
| 11-14| Requesting N_PORT IP Address                   |
+------+------------+------------+-----------+----------+
| 15-18| Responding N_PORT IP Address                   |
+======+============+============+===========+==========+

Fields Requiring      Translation Type    Supplemental Data
Address Translation   (see section 7.2)   (type 0 only)
-------------------   -----------------   -----------------
Requesting N_PORT     2                   N/A
Identifier

Responding N_PORT     1                   N/A
Identifier

Other Special Processing:

None.

7.3.5 FC Address Resolution Protocol Request (FARP-REQ)

The FARP-REQ ELS is used in conjunction with the FC-VI protocol [FC-VI] to perform IP and FC address resolution in an FC fabric. The FARP-REQ ELS is usually directed to the fabric broadcast server at well-known address 0xFF FF FF for retransmission to all attached N_PORTs.

Section 10.1 describes the iFCP implementation of FC broadcast server functionality in an iFCP fabric.

Format of FARP-REQ ELS:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x54 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Match Addr | Requesting N_PORT Identifier      |
|      | Code Points|                                   |
+------+------------+------------+-----------+----------+
| 2    | Responder  | Responding N_PORT Identifier      |
|      | Action     |                                   |
+------+------------+------------+-----------+----------+
| 3-4  | Requesting N_PORT Port_Name                    |
+------+------------+------------+-----------+----------+
| 5-6  | Requesting N_PORT Node_Name                    |
+------+------------+------------+-----------+----------+
| 7-8  | Responding N_PORT Port_Name                    |
+------+------------+------------+-----------+----------+
| 9-10 | Responding N_PORT Node_Name                    |
+------+------------+------------+-----------+----------+
| 11-14| Requesting N_PORT IP Address                   |
+------+------------+------------+-----------+----------+
| 15-18| Responding N_PORT IP Address                   |
+======+============+============+===========+==========+

Fields Requiring      Translation Type    Supplemental Data
Address Translation   (see section 7.2)   (type 0 only)
-------------------   -----------------   -----------------
Requesting N_PORT     0                   Requesting N_PORT
Identifier                                Port Name

Responding N_PORT     1                   N/A
Identifier

Other Special Processing:

None.

7.3.6 Logout (LOGO)

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x05 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Rsvd       | N_PORT I/D being logged out      |
+------+------------+------------+-----------+----------+
| 2-3  | Port name of the LOGO originator (8 bytes)     |
+======+============+============+===========+==========+

This ELS shall always be sent as an augmented ELS regardless of the
translation mode in effect.

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -----------------
N_PORT I/D Being        0, 1 or 2            Port Name of LOGO
Logged Out                                   Originator

Other Special Processing:

See section 5.2.2.1.

7.3.7 Port Login (PLOGI)

PLOGI provides the mechanism for establishing a login session between
two N_PORTs. The PLOGI request carries information identifying the
originating N_PORT, including specification of its capabilities and
limitations. If the destination N_PORT accepts the login request, it
sends an accept (an ACC frame with PLOGI payload), specifying its
capabilities and limitations. This exchange establishes the operating
environment for the two N_PORTs.

The following figure is duplicated from FC-PH, and shows the PLOGI
message format for both request and accept (ACC) response. A port
will reject a PLOGI request by transmitting an LS_RJT message, which
contains no payload.
Byte
Offset
      +----------------------------------+
  0   |            LS_COMMAND            |  4 Bytes
      +----------------------------------+
  4   |    COMMON SERVICE PARAMETERS     | 16 Bytes
      +----------------------------------+
 20   |            PORT NAME             |  8 Bytes
      +----------------------------------+
 28   |            NODE NAME             |  8 Bytes
      +----------------------------------+
 36   |    CLASS 1 SERVICE PARAMETERS    | 16 Bytes
      +----------------------------------+
 52   |    CLASS 2 SERVICE PARAMETERS    | 16 Bytes
      +----------------------------------+
 68   |    CLASS 3 SERVICE PARAMETERS    | 16 Bytes
      +----------------------------------+
 84   |    CLASS 4 SERVICE PARAMETERS    | 16 Bytes
      +----------------------------------+
100   |       VENDOR VERSION LEVEL       | 16 Bytes
      +----------------------------------+
      Total Length = 116 bytes

Details on the above fields, including common and class-based service
parameters, can be found in [FC-PH].

The above PLOGI message is transported by the iFCP layer without
modification.

[Editor's note: The service parameter details that apply to an iFCP
environment are /TBS/.]
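The byte offsets in the PLOGI figure follow directly from the field sizes. The sketch below derives them by running sum and confirms the stated total of 116 bytes. The names PLOGI_FIELDS, plogi_offsets and extract_port_name are hypothetical helpers introduced for this example; the field names and sizes are taken from the figure.

```python
# Field names and sizes (in bytes) as listed in the PLOGI figure.
PLOGI_FIELDS = [
    ("LS_COMMAND", 4),
    ("COMMON SERVICE PARAMETERS", 16),
    ("PORT NAME", 8),
    ("NODE NAME", 8),
    ("CLASS 1 SERVICE PARAMETERS", 16),
    ("CLASS 2 SERVICE PARAMETERS", 16),
    ("CLASS 3 SERVICE PARAMETERS", 16),
    ("CLASS 4 SERVICE PARAMETERS", 16),
    ("VENDOR VERSION LEVEL", 16),
]


def plogi_offsets(fields=PLOGI_FIELDS):
    """Return ({field name: byte offset}, total payload length)."""
    offsets = {}
    position = 0
    for name, size in fields:
        offsets[name] = position
        position += size
    return offsets, position


def extract_port_name(payload):
    """Return the 8-byte PORT NAME from a PLOGI request or ACC payload."""
    offsets, total = plogi_offsets()
    if len(payload) != total:
        raise ValueError("PLOGI payload must be %d bytes" % total)
    start = offsets["PORT NAME"]
    return payload[start:start + 8]
```

Because the iFCP layer transports PLOGI without modification, a helper like extract_port_name would only ever read the payload, for example to learn the port name of a newly logged-in N_PORT.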
7.3.8 Read Exchange Concise

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x13 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Rsvd       | Exchange Originator S_ID         |
+------+------------+------------+-----------+----------+
| 2    |          OX_ID          |        RX_ID         |
+======+============+============+===========+==========+
| 3-4  | Port name of the exchange originator (8 bytes) |
|      | (present only for translation type 0)          |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -------------------
Exchange Originator     0, 1 or 2            Port Name of the
S_ID                                         Exchange Originator

Other Special Processing:

None.

7.3.9 Read Exchange Concise Accept

Format of ACC Response:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Acc = 0x02 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    |          OX_ID          |        RX_ID         |
+------+------------+------------+-----------+----------+
| 2    | Rsvd       | Exchange Originator N_PORT ID    |
+------+------------+------------+-----------+----------+
| 3    | Rsvd       | Exchange Responder N_PORT ID     |
+------+------------+------------+-----------+----------+
| 4    | Data Transfer Count                            |
+------+------------+------------+-----------+----------+
| 5    | Exchange Status                                |
+======+============+============+===========+==========+
| 6-7  | Port name of the Exchange Originator (8 bytes) |
+======+============+============+===========+==========+
| 8-9  | Port name of the Exchange Responder (8 bytes)  |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -------------------
Exchange Originator     0, 1 or 2            Port Name of the
N_PORT I/D                                   Exchange Originator

Exchange Responder      0, 1 or 2            Port Name of the
N_PORT I/D                                   Exchange Responder

When supplemental data is required, the ELS SHALL always be extended
by 4 words as shown above. Unused words in the extended fields SHALL
be set to 0.

Other Special Processing:

None.

7.3.10 Read Exchange Status Block (RES)

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x08 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Rsvd       | Exchange Originator S_ID         |
+------+------------+------------+-----------+----------+
| 2    |          OX_ID          |        RX_ID         |
+------+------------+------------+-----------+----------+
| 3-10 | Association header (may be optionally req'd)   |
+======+============+============+===========+==========+
| 11-12| Port name of the Exchange Originator (8 bytes) |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -------------------
Exchange Originator     0, 1 or 2            Port Name of the
S_ID                                         Exchange Originator

Other Special Processing:

None.
7.3.11 Read Exchange Status Block Accept

Format of ELS Accept Response:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Acc = 0x02 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    |          OX_ID          |        RX_ID         |
+------+------------+------------+-----------+----------+
| 2    | Rsvd       | Exchange Originator N_PORT ID    |
+------+------------+------------+-----------+----------+
| 3    | Rsvd       | Exchange Responder N_PORT ID     |
+------+------------+------------+-----------+----------+
| 4    | Exchange Status Bits                           |
+------+------------+------------+-----------+----------+
| 5    | Reserved                                       |
+------+------------+------------+-----------+----------+
| 6-n  | Service Parameters and Sequence Statuses       |
|      | as described in [FCS]                          |
+======+============+============+===========+==========+
|n+1-  | Port name of the Exchange Originator (8 bytes) |
|n+2   |                                                |
+======+============+============+===========+==========+
|n+3-  | Port name of the Exchange Responder (8 bytes)  |
|n+4   |                                                |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -------------------
Exchange Originator     0, 1 or 2            Port Name of the
N_PORT I/D                                   Exchange Originator

Exchange Responder      0, 1 or 2            Port Name of the
N_PORT I/D                                   Exchange Responder

When supplemental data is required, the ELS SHALL be extended by 4
words as shown above. Unused words in the extended fields SHALL be
set to 0.

Other Special Processing:

None.
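The four-word extension rule for the accept payloads can be sketched as below. This is an illustrative assumption about how a gateway might apply the rule, not a normative procedure: the helper name append_supplemental is hypothetical, and the ordering (originator name first, then responder name) is read from the tables above.

```python
WORD = 4          # bytes per Fibre Channel word
NAME_LEN = 8      # a port name occupies 8 bytes (2 words)


def append_supplemental(payload, originator_name=None, responder_name=None):
    """Extend an ACC payload by exactly 4 words of supplemental data.

    A name that is not needed is passed as None and its two words are
    filled with zeros, matching "unused words SHALL be set to 0".
    """
    if len(payload) % WORD != 0:
        raise ValueError("payload must be a whole number of words")
    extension = b""
    for name in (originator_name, responder_name):
        if name is None:
            name = bytes(NAME_LEN)          # unused field: all zeros
        elif len(name) != NAME_LEN:
            raise ValueError("port names are 8 bytes long")
        extension += name
    return payload + extension
```

For example, a 6-word REC accept that needs supplemental data only for the originator grows to 10 words, with words 8-9 set to zero.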
7.3.12 Read Link Error Status (RLS)

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x0F |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Rsvd       | N_PORT Identifier                |
+======+============+============+===========+==========+
| 2-3  | Port name of the N_PORT (8 bytes)              |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -------------------
N_PORT Identifier       0, 1 or 2            Port Name of the
                                             N_PORT

Other Special Processing:

None.

7.3.13 Read Sequence Status Block (RSS)

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x09 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | SEQ_ID     | Exchange Originator S_ID         |
+------+------------+------------+-----------+----------+
| 2    |          OX_ID          |        RX_ID         |
+======+============+============+===========+==========+
| 3-4  | Port name of the Exchange Originator (8 bytes) |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -------------------
Exchange Originator     0, 1 or 2            Port Name of the
S_ID                                         Exchange Originator

Other Special Processing:

None.
7.3.14 Reinstate Recovery Qualifier (RRQ)

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x12 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Rsvd       | Exchange Originator S_ID         |
+------+------------+------------+-----------+----------+
| 2    |          OX_ID          |        RX_ID         |
+------+------------+------------+-----------+----------+
| 3-10 | Association header (may be optionally req'd)   |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -----------------
Exchange Originator     1 or 2               N/A
S_ID

Other Special Processing:

None.

7.3.15 Request Sequence Initiative (RSI)

ELS Format:

+------+------------+------------+-----------+----------+
| Word | Bits 31-24 | Bits 23-16 | Bits 15-8 | Bits 7-0 |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0x0A |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    | Rsvd       | Exchange Originator S_ID         |
+------+------------+------------+-----------+----------+
| 2    |          OX_ID          |        RX_ID         |
+------+------------+------------+-----------+----------+
| 3-10 | Association header (may be optionally req'd)   |
+======+============+============+===========+==========+

Fields Requiring        Translation Type     Supplemental Data
Address Translation     (see section 7.2)    (type 0 only)
-------------------     -----------------    -----------------
Exchange Originator     1 or 2               N/A
S_ID

Other Special Processing:

None.

7.3.16 Third Party Process Logout (TPRLO)

TPRLO provides a mechanism for an N_PORT (third party) to remove one
or more login sessions that exist between the destination N_PORT and
other N_PORTs specified in the command.
This command includes one or more TPRLO LOGOUT PARAMETER PAGEs, each
of which, when combined with the destination N_PORT, identifies a
SCSI login session which shall be terminated by the command.

Byte
Offset
      +----------------------------------+
  0   |            LS_COMMAND            |  1 Byte
      +----------------------------------+
  1   |        PAGE LENGTH (0x10)        |  1 Byte
      +----------------------------------+
  2   |      PAYLOAD LENGTH (0x14)       |  2 Bytes
      +----------------------------------+
  4   |  TPRLO LOGOUT PARAMETER PAGE 1   |  2-4 Bytes
      +----------------------------------+
      |             . . . .              |  M Bytes
      +----------------------------------+
      |  TPRLO LOGOUT PARAMETER PAGE N   |  2-4 Bytes
      +----------------------------------+
      Total Length = Variable

Each TPRLO LOGOUT PARAMETER PAGE identifies a remote N_PORT which,
when combined with the destination N_PORT, identifies a SCSI session
to be terminated. The TPRLO LOGOUT PARAMETER PAGE is of the following
format:

Byte
Offset
      +----------------------------------+
  0   |            TYPE CODE             |  1 Byte
      +----------------------------------+
  1   |       TYPE CODE EXTENSION        |  1 Byte
      +----------------------------------+
  2   |           TPRLO FLAGS            |  2 Bytes
      +----------------------------------+
  4   | ORIG PROCESS ASSOC (if present)  |  4 Bytes
      +----------------------------------+
  8   | RESP PROCESS ASSOC (if present)  |  4 Bytes
      +----------------------------------+
 12   |             RESERVED             |  1 Byte
      +----------------------------------+
 13   | THIRD PARTY ORIGINATOR N_PORT ID |  3 Bytes
      +----------------------------------+

When the iFCP header contains a TPRLO message (including the ACC
response), the iFCP supplemental data field will contain the
PORT_NAME(s) (WWPN) identifying the N_PORT described by the
equivalent TPRLO LOGOUT PARAMETER PAGE(s). If more than one TPRLO
LOGOUT PARAMETER PAGE is contained in the Link Service message, the
corresponding PORT_NAME shall also be included for each page.
PORT_NAMEs shall be listed in the same order as the equivalent TPRLO
LOGOUT PARAMETER PAGEs in the original Link Service message.
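The 16-byte TPRLO LOGOUT PARAMETER PAGE and the ordering rule for PORT_NAMEs can be illustrated with the following sketch. The names TprloPage, parse_tprlo_page and pair_pages_with_port_names are hypothetical, big-endian byte order is assumed, and the page is parsed with both process associator fields present. The sketch deliberately does not define a wire format for the supplemental data, since that format is not specified here.

```python
import struct
from typing import List, NamedTuple, Tuple

# One page: type code, type code extension, flags, two process
# associators, one reserved byte, 3-byte N_PORT ID (16 bytes total).
PAGE = struct.Struct(">BBHIIB3s")


class TprloPage(NamedTuple):
    type_code: int
    type_code_ext: int
    flags: int
    orig_process_assoc: int
    resp_process_assoc: int
    third_party_n_port_id: int


def parse_tprlo_page(data: bytes) -> TprloPage:
    """Decode one TPRLO LOGOUT PARAMETER PAGE laid out as in the figure."""
    if len(data) != PAGE.size:
        raise ValueError("a parameter page is %d bytes" % PAGE.size)
    type_code, ext, flags, orig, resp, _reserved, n_port = PAGE.unpack(data)
    return TprloPage(type_code, ext, flags, orig, resp,
                     int.from_bytes(n_port, "big"))


def pair_pages_with_port_names(
        pages: List[TprloPage],
        port_names: List[bytes]) -> List[Tuple[TprloPage, bytes]]:
    """Match each page with its PORT_NAME, relying on identical order."""
    if len(pages) != len(port_names):
        raise ValueError("one PORT_NAME is required per parameter page")
    return list(zip(pages, port_names))
```

Pairing by position works only because the PORT_NAMEs are required to appear in the same order as their pages; a receiver has no other key with which to associate them.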
[The format for passing supplemental data is /TBS/]

Additionally, the THIRD PARTY ORIGINATOR N_PORT ID field in each
TPRLO LOGOUT PARAMETER PAGE shall be cleared when it is sent by the
originating gateway. This applies to both the original Link Service
message and the ACC response. When the iFCP layer receives a TPRLO
message, it shall use the supplemental PORT_NAME data to replace the
THIRD PARTY ORIGINATOR N_PORT ID in the original Link Service message
before forwarding it to the upper Fibre Channel layers.

Additional information on TPRLO can be found in [FC-PH-2].

8. TCP Session Control Messages

TCP session control messages are used to create and manage N_PORT
login sessions. They are passed between peer FCP Portals and are
processed only within the iFCP layer. The response to a TCP session
control message (if any) will echo the original request.

The message format is based on the extended link service message
template shown below.

Word  31..24 23<-------------------Bits--------------------->0
     +----------+------------------------------------------------+
    0|  R_CTL   |               D_ID [0x00 00 00]                |
     |[Req = 22]| [Destination of extended link Service request] |
     |[Rep = 23]|                                                |
     +----------+------------------------------------------------+
    1|  CS_CTL  |               S_ID [0x00 00 00]                |
     |  [0x0]   | [Source of extended link service request]      |
     +----------+------------------------------------------------+
    2|TYPE [0x1]|                   F_CTL [0]                    |
     +----------+------------------+-----------------------------+
    3|SEQ_ID    |  DF_CTL [0x00]   |       SEQ_CNT [0x00]        |
     |[0x0]     |                  |                             |
     +----------+------------------+-----------------------------+
    4|       OX_ID [0x0000]        |       RX_ID [0x0000]        |
     +-----------------------------+-----------------------------+
    5|                         Parameter                         |
     |                      [ 00 00 00 00 ]                      |
     +-----------------------------------------------------------+
    6|                        LS_COMMAND                         |
     |              [Session Control Command Code]               |
     +-----------------------------------------------------------+
    7|                                                           |
    .|           Additional Session Control Parameters           |
    .|                        ( if any )                         |
    n|                                                           |
     +===========================================================+
  n+1|                     Fibre Channel CRC                     |
     +===========================================================+

             Format of Session Control Message

The LS_COMMAND value for the response remains the same as that used
for the request.

The session control ELS frame is terminated with a Fibre Channel CRC.

[Editor's note: Since these messages are never passed to the Fibre
Channel device, the use of the FC ELS format is not required.
However, leveraging the format may benefit a gateway implementation.
Depending on the tradeoffs, therefore, the format may be modified to
eliminate use of the ELS as a message template.]

The encapsulation header for the link service frame carrying a TCP
ELS message SHALL be set as follows:

Encapsulation Header Fields:

   LS_COMMAND              0

   iFCP Flags              SES = 1
                           TRN = 0
                           AUG = 0

   SOF code                SOFi3 encoding (0x2E)

   EOF code                EOFn encoding (0x50)

   Time Stamp Integer      0,0
   and Fraction fields

The SOF and EOF delimiter words SHALL be set based on the SOF and EOF
codes specified above.

The following lists the TCP Link Service messages and their
corresponding LS_COMMAND values.

Request                   LS_COMMAND   Short Name   iFCP Support
-------                   ----------   ----------   ------------
Control Connection Bind   0xE0         CBIND        REQUIRED
Unbind Connection         0xE4         UNBIND       REQUIRED

8.1 Connection Bind (CBIND)

The CBIND message binds an N_PORT login session to a specific TCP
connection in preparation for establishing that session. In the CBIND
request message, the source and destination N_PORTs are identified by
the N_PORT network address (iFCP portal address and N_PORT ID). The
following shows the format of the CBIND request.
+------+------------+------------+-----------+----------+
| Word |   Byte 0   |   Byte 1   |  Byte 2   |  Byte 3  |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0xE0 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    |                   User Info                    |
+------+------------+------------+-----------+----------+
| 2    |                Interface Speed                 |
+------+------------+------------+-----------+----------+
| 3    |                                                |
+------+                SOURCE PORT NAME                |
| 4    |                                                |
+------+------------------------------------------------+
| 5    |                                                |
+------+             DESTINATION PORT NAME              |
| 6    |                                                |
+------+------------------------------------------------+

USER INFO - Contains any data desired by the requester. This info
MUST be echoed by the recipient in the CBIND response message.

SOURCE PORT NAME - Contains the originating N_PORT's World Wide Port
Name (WWPN).

DESTINATION PORT NAME - Contains the destination N_PORT's World Wide
Port Name (WWPN).

The following shows the format of the CBIND response.

+------+------------+------------+-----------+----------+
| Word |   Byte 0   |   Byte 1   |  Byte 2   |  Byte 3  |
+------+------------+------------+-----------+----------+
| 0    | Cmd = 0xE0 |    0x00    |   0x00    |   0x00   |
+------+------------+------------+-----------+----------+
| 1    |                   User Info                    |
+------+------------+------------+-----------+----------+
| 2    |                                                |
+------+                SOURCE PORT NAME                |
| 3    |                                                |
+------+------------------------------------------------+
| 4    |                                                |
+------+             DESTINATION PORT NAME              |
| 5    |                                                |
+------+-------------------------+----------------------+
| 6    |        Reserved         |     CBIND Status     |
+------+-------------------------+----------------------+
| 7    |        Reserved         |  CONNECTION HANDLE   |
+------+-------------------------+----------------------+
Total Length = 26

USER INFO - Contains the same value received in the USER INFO field
of the CBIND request message.
DESTINATION PORT NAME - Contains the destination N_PORT's World Wide
Port Name (WWPN).

CBIND STATUS - Indicates success or failure of the CBIND request.
CBIND values are shown below.

Value      Description
-----      -----------
0          Successful - No other status
1 - 15     Reserved
16         Failed - Unspecified Reason
17         Failed - No such device
18         Failed - N_PORT session already exists
19         Failed - Lack of resources
Others     Reserved

CONNECTION HANDLE (CHANDLE) - Contains a value assigned by the FCP
Portal to identify the control connection.

8.2 Unbind Connection (UNBIND)

UNBIND is used to release a bound TCP connection and return it to the
pool of unbound TCP connections. This message is transmitted in the
connection that is to be unbound.

The following is the format of the UNBIND request message.

Byte   MSb                            LSb
Offset  7    6    5    4   3   2   1   0
       +----------------------------------+
  0    |     LS_COMMAND (0xE4000000)      |  4 Bytes
       +----------------------------------+
  4    |            USER INFO             |  4 Bytes
       +----------------------------------+
  8    |        CONNECTION HANDLE         |  4 Bytes
       +----------------------------------+
 12    |             RESERVED             |  8 Bytes
       +----------------------------------+
       Total Length = 20

CONNECTION HANDLE (CHANDLE) - Contains a value assigned by the FCP
Portal to identify the connection.

The following shows the format of the UNBIND response message.

Byte   MSb                            LSb
Offset  7    6    5    4   3   2   1   0
       +----------------------------------+
  0    |     LS_COMMAND (0xE4000000)      |  4 Bytes
       +----------------------------------+
  4    |            USER INFO             |  4 Bytes
       +----------------------------------+
  8    |        CONNECTION HANDLE         |  4 Bytes
       +----------------------------------+
 16    |             RESERVED             | 10 Bytes
       +----------------------------------+
 26    |          UNBIND STATUS           |  2 Bytes
       +----------------------------------+
 28    |             RESERVED             |  2 Bytes
       +----------------------------------+
       Total Length = 26

UNBIND STATUS - Indicates the success or failure of the UNBIND
request.
Value      Description
-----      -----------
0          Successful - No other status
1 - 15     Reserved
16         Failed - Unspecified Reason
17         Failed - No such device
18         Failed - Connection ID Invalid
Others     Reserved

CONNECTION HANDLE (CHANDLE) - Contains a value assigned by the FCP
Portal to identify the unbound connection.

9. iFCP Error Detection

9.1 Overview

[FCP-2], [FC-PH], and [FC-PH-2] define error detection and recovery
procedures. These Fibre Channel-defined mechanisms continue to be
available in the iFCP environment.

9.2 Timer Definitions and Stale Frame Detection

9.2.1 Error_Detect_Timeout (E_D_TOV)

E_D_TOV is "a reasonable timeout value for detection of a response to
a timed event". The default value of 2 seconds specified by FC-FS
will also be used as the iFCP default value.

E_D_TOV is the maximum time allowed between the transmission of
consecutive data frames within a sequence. For Class 2 service,
E_D_TOV specifies the maximum time interval between transmission of a
frame and receipt of the ACK for that frame.

E_D_TOV MAY be specified individually for each gateway. If a
gateway-specific value is not set, the gateway SHALL obtain the value
from the iSNS name server.

9.2.2 Resource Allocation Timeout (R_A_TOV)

R_A_TOV is defined in FC-PH-2 as "the maximum transit time within a
fabric to guarantee that a lost frame will never emerge from the
fabric". A value of 2 x R_A_TOV is the minimum time that the
originator of an ELS request or FC-4 ELS request shall wait for the
response to that request. The Fibre Channel default value for R_A_TOV
is 10 seconds.

R_A_TOV MAY be specified individually for each gateway. If a
gateway-specific value is not set, the gateway SHALL obtain the value
from the iSNS name server.

The iFCP fabric MAY actively enforce limits on R_A_TOV as described
in section 9.2.2.1.
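The timer rules above, together with the stale frame test detailed in section 9.2.2.1, can be summarized in a short sketch. It is illustrative only. All function names are hypothetical, and the time stamp is assumed to be a pair of 32-bit integer-seconds and binary-fraction fields in the style of SNTP; the actual field layout is defined in section 6.1.

```python
E_D_TOV_DEFAULT = 2.0    # seconds, default stated in section 9.2.1
R_A_TOV_DEFAULT = 10.0   # seconds, default stated in section 9.2.2


def effective_timeout(gateway_value, isns_value):
    """Prefer the gateway-specific value, else the one from iSNS."""
    return gateway_value if gateway_value is not None else isns_value


def min_els_response_wait(r_a_tov):
    """Minimum wait for an ELS response: 2 x R_A_TOV."""
    return 2 * r_a_tov


def max_prop_delay(r_a_tov, fraction=0.5):
    """Rule of thumb from 9.2.2.1: MAX_PROP_DELAY = 50% of R_A_TOV."""
    return fraction * r_a_tov


def is_frame_accepted(synchronized, stamp, now, limit):
    """Apply the stale frame test to one incoming frame.

    stamp is an (integer, fraction) pair; now and limit are seconds.
    The 2**32 scaling of the fraction is an assumption (SNTP style).
    """
    if not synchronized:
        return True                 # unsynchronized: ignore the stamp
    if stamp == (0, 0):
        return True                 # sender did not stamp: do not test
    sent = stamp[0] + stamp[1] / 2**32
    time_in_flight = now - sent
    return time_in_flight <= limit  # discard only if limit is exceeded
```

With the default R_A_TOV of 10 seconds, this gives a minimum ELS response wait of 20 seconds and a rule-of-thumb MAX_PROP_DELAY of 5 seconds.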
9.2.2.1 Enforcing R_A_TOV Limits

The R_A_TOV limit on frame lifetimes MAY be enforced by means of the
time stamp in the encapsulation header (see section 6.1) as described
in this section.

If enforced by a gateway, the propagation delay time limit
(MAX_PROP_DELAY) SHOULD be set well below the value of R_A_TOV
specified for the iFCP fabric and SHOULD be stored in the iSNS
server. A rule of thumb is to set MAX_PROP_DELAY to 50 percent of
R_A_TOV.

The following paragraphs describe the requirements for synchronizing
gateway time bases and the rules for measuring and enforcing
propagation delay limits.

The protocol for synchronizing a gateway time base is SNTP. In order
to ensure that all gateways are time-aligned, a gateway SHOULD obtain
the address of an SNTP server via an iSNS query. If multiple SNTP
server addresses are returned by the query, the servers must be
synchronized and the gateway may use any server in the list.
Alternatively, the server may return a multicast group address in
support of operation in Anycast mode. Implementation of Anycast mode
is as specified in RFC 2030, including the precautions defined in
that document. Multicast mode SHOULD NOT be used.

An SNTP server may use any one of the time reference sources listed
in RFC 2030. The resolution of the time reference MUST be 125
milliseconds or better.

With regard to the time base, the gateway is in either the
synchronized or unsynchronized state. When in the unsynchronized
state, the gateway SHALL:

a) Set the time stamp field to 0,0 for all outgoing frames.
b) Ignore the time stamp field for all incoming frames.

When in the synchronized state, the gateway SHALL:

a) Set the time stamp field for each outgoing frame in accordance
   with the gateway's internal time base.
b) Check the time stamp field of each incoming frame.
c) If the incoming frame has a time stamp of 0,0, the receiving
   gateway SHALL NOT test the frame to determine if it is stale.
d) If the incoming frame has a non-zero time stamp, the receiving
   gateway shall compute the time in flight and compare it against
   the value of MAX_PROP_DELAY specified for the IP fabric.
e) If the result in step (d) exceeds MAX_PROP_DELAY, the frame shall
   be discarded. Otherwise, the frame shall be accepted.

A gateway SHALL enter the synchronized state upon receiving a
successful response to an SNTP query.

A gateway shall enter the unsynchronized state:

a) Upon power up and before successful completion of an SNTP query.
b) Whenever the gateway loses contact with the SNTP server.

If synchronization is lost, the gateway MAY choose to abort all
N_PORT login sessions with all remote gateways.

10. Fabric Services Supported by an iFCP Implementation

An iFCP gateway implementation MUST support the following fabric
services:

N_PORT ID Value    Description              Section
---------------    -----------              -------
0xFF FF FE         F_PORT Server            /TBS/
0xFF FF FD         Fabric Controller        /TBS/
0xFF FF FC         Directory/Name Server    /TBS/

In addition, an iFCP gateway MAY support the FC broadcast server
functionality described in section 10.1.

10.1 iFCP Support for the FC Broadcast Service

In Fibre Channel, frames are broadcast by addressing them to the
broadcast server at well-known address 0xFF FF FF. The broadcast
server then replicates and delivers the frame to each attached N_PORT
in all zones to which the originating device belongs. Only class 3
(datagram) service is supported.

In an iFCP/iSNS system, the broadcast functionality MAY be
implemented within each gateway by an iFCP broadcast server. The
broadcast server has an N_PORT I/D of 0xFF FF FF.

Outgoing frames to be broadcast are directed to the broadcast server
by locally attached N_PORTs. The broadcast server then redistributes
such frames as follows:

a) One copy is sent to each locally attached N_PORT in the same zone
   as the originator.
b) One copy is sent to the broadcast server in each remote gateway
   via a UDP datagram. The D_ID field is set to the well-known
   address of the broadcast server. The datagram encapsulation format
   is identical to the iFCP encapsulation format described in
   section 6.

On receiving an iFCP broadcast datagram via UDP, the broadcast server
SHALL:

a) Validate the header as described in section 6.3. If the header is
   invalid, the frame SHALL be discarded.
b) Convert the S_ID N_PORT address in the frame to an N_PORT alias as
   described in section 3.3.2, if address translation mode is in
   effect.
c) If the AUG bit is set in the iFCP flags field, perform any special
   processing required by the ELS, including translation of any
   addresses in the payload.
d) Replicate and redistribute the frame to all locally attached
   N_PORTs in the discovery domain of the sender.

If no broadcast server is implemented, the receiving gateway SHALL
discard an incoming broadcast frame from a remote gateway. Frames
received from locally attached N_PORTs shall be processed as
specified in [FC-SW].

11. Security

11.1 Overview

As with any other IP-based network, an iFCP storage network has
security issues which must be addressed with the appropriate security
policies and enforcement resources. Several levels of security
mechanisms, when applied appropriately to an iFCP network, can
provide sufficient data integrity, authentication, and privacy,
depending on user needs.

11.2 Physical Security

Most existing SCSI and Fibre Channel interconnections are deployed in
private, physically isolated environments where hostile entities are
not provided access to the SCSI and Fibre Channel interconnects. This
is the most basic security mechanism, and may be a sufficient model
in some cases for an iFCP network.

11.3 Controlling Access

A second level of security is the use of zoning.
Zoning specifies which devices are allowed to communicate, and is
similar in concept to VLAN (Virtual Local Area Network) technology.
Zoning information is maintained in a Name Server.

11.4 Authentication and Encryption

Where additional levels of data integrity and privacy are required
for iFCP, existing IPSec specifications can be applied to iFCP.
Because IPSec is a layer-3 technology and has no knowledge of TCP,
UDP, or higher-level protocols such as iFCP and FCP, it can be
applied transparently to iFCP. The following IETF documents describe
the operational framework and automatic keying mechanisms for IPSec.

RFC2401   Security Architecture for the Internet Protocol
RFC2402   IP Authentication Header
RFC2406   IP Encapsulating Security Payload
RFC2407   The Internet IP Security Domain of Interpretation for
          ISAKMP
RFC2408   Internet Security Association and Key Management Protocol
          (ISAKMP)
RFC2409   The Internet Key Exchange (IKE)

11.5 Storage Firewalls

Firewalls are a common and proven methodology for securing access to
IP-based networks, and they can be appropriate for use in IP-based
storage networks as well. A firewall is a choke point through which
all traffic must pass in order to move between two separate networks.
Since all iFCP traffic uses a well-known IANA-assigned TCP port
number, it can easily be recognized and inspected.

Access to storage resources can be secured by setting up a single
gateway through which all outside non-secured traffic must pass in
order to access resources in the storage network. Such a firewall can
be a proxy host operating at the session or application layer,
requiring authentication before allowing traffic to pass. It can also
be a stateful inspection gateway which understands the iFCP protocol,
and can passively inspect and discover security threats as they
transit the gateway.
A third option is to use a standard router access control list to
filter authorized traffic based upon static parameters such as IP
addresses and TCP port numbers.

12. Quality of Service Considerations

12.1 Minimal requirements

Conforming iFCP protocol implementations SHALL correctly communicate
gateway-to-gateway even across one or more intervening best-effort IP
regions. The timings with which such gateway-to-gateway communication
is performed, however, will greatly depend upon the BER, packet
losses, latency, and jitter experienced throughout the best-effort IP
regions. The higher these parameters, the larger the gap between
observed iFCP behavior and baseline iFCP behavior (i.e., that
produced by two iFCP gateways directly attached to one another).

12.2 High-assurance

It is expected that many iFCP deployments will benefit from a high
degree of assurance on the behaviors of the intervening IP regions,
with resulting high assurance on the overall end-to-end path, as
directly experienced by Fibre Channel applications. Such assurance on
the IP behaviors stems from the intervening IP regions supporting
standard Quality-of-Service (QoS) techniques, fully complementary to
iFCP, such as:

a) Congestion avoidance by over-provisioning of the network
b) Integrated Services [IntServ] QoS
c) Differentiated Services [DiffServ] QoS
d) Multi-Protocol Label Switching [MPLS]

In the most general definition, two iFCP gateways are separated by
one or more independently managed IP regions, some of which implement
some of the QoS solutions mentioned above. The IP regions with these
QoS solutions are said to support Service Level Agreements (SLAs).
Such agreements finalize requirements on network parameters such as
bandwidth, loss, latency, jitter, and burst length. The requirements
may be expressed in absolute or relative terms, and apply to a
unidirectional flow of packets.
Depending on the QoS techniques available, the dynamic stipulation of an SLA may require the iFCP gateway to interact with network ancillary functions such as admission control and bandwidth brokers (with RSVP or other signalling protocols that an IP region may accept).

Because Fibre Channel Class 2 and Class 3 do not support fractional bandwidth guarantees, and because iFCP is committed to supporting current Fibre Channel semantics, it is impossible for an iFCP gateway to autonomously infer bandwidth requirements from streaming Fibre Channel traffic. Rather, the requirements on bandwidth or other network parameters need to be injected out-of-band into an iFCP gateway (or the node that will actually negotiate the SLA on the gateway's behalf) through mechanisms outside the scope of this specification (e.g., through a management interface into the iFCP gateway). The administrator of an iFCP gateway MAY thus stipulate a Service Level Agreement with the local IP region for one, several, or all of an iFCP gateway's TCP sessions used by iFCP. Alternately, this responsibility may be delegated to a node downstream.

Should an iFCP implementation support multiple tuples over the same TCP connection, and should such a connection be subject to an SLA, then all these tuples will share in the same SLA and the resulting treatment by the network. For finer granularity of QoS behaviors, iFCP implementations MAY elect to dedicate a distinct TCP connection to each active tuple. This is the way an individual tuple can enjoy a customized SLA.

To render the best emulation of Fibre Channel possible over IP, it is anticipated that typical SLAs will specify a fixed amount of bandwidth, null losses, and, to a lesser degree of relevance, low latency and low jitter. For example, an IP region using DiffServ QoS may support SLAs of this nature by applying EF DSCPs to the iFCP traffic.
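As an illustration of the DiffServ example above, the following Python sketch shows one way a sending host might mark a TCP connection with the Expedited Forwarding code point (DSCP 101110 binary, decimal 46). It is a minimal sketch under stated assumptions, not part of iFCP: the helper names are hypothetical, the host operating system is assumed to expose the IP_TOS socket option, and whether the marking is honored depends entirely on the SLA with the IP region.

```python
import socket

# Expedited Forwarding code point (DSCP 46); the choice of EF follows the
# example in the text, other regions may use a different DSCP.
EF_DSCP = 0b101110


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the IPv4 TOS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_connection(host: str, port: int,
                           dscp: int = EF_DSCP) -> socket.socket:
    """Open a TCP connection whose outgoing packets carry the given DSCP.

    Hypothetical helper: 'host' and 'port' stand for a peer gateway
    address and TCP port supplied by the caller.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Set the marking before connecting so the SYN is marked as well.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock
```

Dedicating one such connection to each tuple, as permitted above, would allow each tuple to carry its own marking.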
For the same SLA, another IP region might equally use a different DSCP or different QoS techniques altogether. The way different QoS techniques are re-mapped at the edge of different intervening IP regions is beyond the scope of this specification.

[T11/00-603V0] describes a proposal to add fractional bandwidth guarantees to Class 2 and 3 (migrating it from Class 4). In such a proposal, the bandwidth parameters would surface in the FLOGI request and accept, and in the PLOGI request and accept. In this case, it will become possible for an iFCP gateway to trap this information and autonomously remap it onto the SLA negotiation mechanism required by the local IP region, without resorting to out-of-band QoS management. Such an in-band QoS mechanism would result in true end-to-end provisioning of network resources. Forthcoming revisions of this iFCP specification will build upon this new opportunity.

13. References

13.1 Relevant SCSI (T10) Specifications

The following documents are available from: Global Engineering, 15 Inverness Way East, Englewood, CO 80112-5704. Telephone (800) 854-7179 or (303) 792-2181, Fax: (303) 792-2192

   [SAM]    SCSI-3 Architecture Model (SAM), ANSI X3.270-1996
   [SAM-2]  SCSI Architecture Model-2 (SAM-2), Project 1157-D, revision 11
   [SPC]    SCSI Primary Commands (SPC), ANSI X3.301-1997
   [SPC-2]  SCSI Primary Commands-2 (SPC-2), Project 1236-D, revision 16
   [FCP]    Fibre Channel Protocol for SCSI (FCP), ANSI X3.269-1996
   [FCP-2]  Fibre Channel Protocol for SCSI, Second Revision (FCP-2), Project 1144D, revision 04

13.2 Relevant Fibre Channel (T11) Specifications

The following documents are available from: Global Engineering, 15 Inverness Way East, Englewood, CO 80112-5704.
Telephone (800) 854-7179 or (303) 792-2181, Fax: (303) 792-2192

   [FC-PH]    Fibre Channel Physical and Signaling Interface (FC-PH) Rev 4.3, ANSI X3.230:1994
   [FC-PH-2]  Fibre Channel Physical and Signaling Interface (FC-PH-2) Rev 7.4, ANSI X3.297:1997
   [FC-PH-3]  Fibre Channel Physical and Signaling Interface (FC-PH-3) Rev 9.4, ANSI X3.303:1998
   [FC-FG]    Fibre Channel Generic Requirements (FC-FG) Rev 3.5, ANSI X3.289:1996
   [FC-GS-2]  Fibre Channel Generic Services (FC-GS-2) Rev 5.2, ANSI NCITS 288
   [FC-AL]    Fibre Channel Arbitrated Loop (FC-AL) Rev 4.5, ANSI X3.272:1996
   [FC-AL-2]  Fibre Channel Arbitrated Loop (FC-AL-2) Rev 7.0, NCITS 332:1999
   [FC-PLDA]  Fibre Channel Private Loop SCSI Direct Attachment (FC-PLDA), NCITS TR-19:1998
   [FC-FLA]   Fibre Channel Fabric Loop Attachment (FC-FLA), NCITS TR-20:1998
   [FC-TAPE]  Fibre Channel Tape and Tape Medium Changers (FC-TAPE), NCITS TR-24:1999

13.3 Relevant RFC Documents

   [RFC768]   User Datagram Protocol
   [RFC791]   Internet Protocol, DARPA Internet Program Protocol Specification
   [RFC1146]  TCP Alternate Checksum Options
   [RFC2401]  Security Architecture for the Internet Protocol
   [RFC2402]  IP Authentication Header
   [RFC2406]  IP Encapsulating Security Payload (ESP)
   [RFC2407]  The Internet IP Security Domain of Interpretation for ISAKMP
   [RFC2408]  Internet Security Association and Key Management Protocol (ISAKMP)
   [RFC2409]  The Internet Key Exchange (IKE)
   [RFC2460]  Internet Protocol, Version 6 (IPv6) Specification

13.4 Other Reference Documents

   Fibre Channel, Gigabit Communications and I/O for Computer Networks, Alan F. Benner, McGraw-Hill, ISBN 0-07-005669-2

   The Fibre Channel Consultant, A Comprehensive Introduction, Robert W. Kembel, Northwest Learning Associates, ISBN 0-931836-82-6

   The Fibre Channel Consultant, Arbitrated Loop, Robert W. Kembel, Connectivity Solutions, a division of Northwest Learning Associates, ISBN 0-931836-84-0

14.
Author's Addresses

   Charles Monia
   Rod Mullendore
   Josh Tseng
   Nishan Systems
   3850 North First Street
   San Jose, CA 95134
   Phone: 408-519-3986
   Email: cmonia@nishansystems.com

   Franco Travostino
   Director, Content Internetworking Lab,
   Victor Firoiu
   Nortel Networks
   3 Federal Street
   Billerica, MA 01821
   Phone: 978-288-7708
   Email: travos@nortelnetworks.com

   David Robinson
   Sun Microsystems
   Senior Staff Engineer
   M/S UNWK16-301
   901 San Antonio Road
   Palo Alto, CA 94303-4900
   Phone: 510-936-2337
   Email: David.Robinson@sun.com

   Wayland Jeong
   Troika Networks
   Vice President, Hardware Engineering
   2829 Townsgate Road Suite 200
   Westlake Village, CA 91361
   Phone: 805-370-2614
   Email: wayland@troikanetworks.com

   Rory Bolt
   Quantum/ATL
   Director, System Design
   101 Innovation Drive
   Irvine, CA 92612
   Phone: 949-856-7760
   Email: rbolt@atlp.com

   Paul Rutherford
   ADIC
   Vice President, Technology & Software
   1143 Willows Road N.E.
   P.O. Box 97057
   Redmond, WA 98073-9757
   Phone: 425-881-8004
   Email: paul.rutherford@adic.com

   Mark Edwards
   Senior Systems Architect
   Eurologic Development, Ltd.
   4th Floor, Howard House
   Queens Ave, UK. BS8 1SD
   Phone: +44 (0)117 930 9600
   Email: medwards@eurologic.com

Appendix A

A. iFCP Support for Fibre Channel Link Services

For reference purposes, this appendix enumerates all the Fibre Channel link services and the manner in which each shall be processed by an iFCP implementation. The iFCP processing policies are defined in section 7.

A.1 Basic Link Services

The basic link services are shown in the following table.
                   Basic Link Services

   Name     Description         iFCP Policy
   ----     -----------         -----------
   ABTS     Abort Sequence      Transparent
   BA_ACC   Basic Accept        Transparent
   BA_RJT   Basic Reject        Transparent
   NOP      No Operation        Transparent
   PRMT     Preempted           Rejected (Applies to Class 1 only)
   RMC      Remove Connection   Rejected (Applies to Class 1 only)

A.2 Link Services Processed Transparently

The following link service requests and responses MUST be processed transparently as defined in section 7.

                   ELSs Processed Transparently

   Name     Description
   ----     -----------
   ACC      Accept
   ADVC     Advise Credit
   CSR      Clock Synchronization Request
   CSU      Clock Synchronization Update
   ECHO     Echo
   ESTC     Estimate Credit
   ESTS     Establish Streaming
   FACT     Fabric Activate Alias_ID
   FAN      Fabric Address Notification
   FDACT    Fabric Deactivate Alias_ID
   FDISC    Discover F_Port Service Parameters
   FLOGI    F_Port Login
   GAID     Get Alias_ID
   LCLM     Login Control List Management
   LINIT    Loop Initialize
   LIRR     Link Incident Record Registration
   LPC      Loop Port Control
   LS_RJT   Link Service Reject
   LSTS     Loop Status
   NACT     N_Port Activate Alias_ID
   NDACT    N_Port Deactivate Alias_ID
   PDISC    Discover N_Port Service Parameters
   PRLI     Process Login
   PRLO     Process Logout
   QoSR     Quality of Service Request
   RCS      Read Connection Status
   RLIR     Registered Link Incident Report
   RNC      Report Node Capability
   RNFT     Report Node FC-4 Types
   RNID     Request Node Identification Data
   RPL      Read Port List
   RPS      Read Port Status Block
   RPSC     Report Port Speed Capabilities
   RSCN     Registered State Change Notification
   RTIN     Request Topology Information
   RTV      Read Timeout Value
   RVCS     Read Virtual Circuit Status
   SBRP     Set Bit-error Reporting Parameters
   SCL      Scan Remote Loop
   SCN      State Change Notification
   SCR      State Change Registration
   TEST     Test
   TPLS     Test Process Login State

A.3 Augmented Link Services

The following extended link services are augmented with additional data and processed by the iFCP implementation as described in the referenced section listed in the table.
                   Augmented Link Services

   Name         Description                                         Section
   ----         -----------                                         -------
   ABTX         Abort Exchange                                      7.3.1
   ADISC        Discover Address                                    7.3.2
   ADISC ACC    Discover Address Accept                             7.3.3
   FARP-REPLY   Fibre Channel Address Resolution Protocol Reply     7.3.4
   FARP-REQ     Fibre Channel Address Resolution Protocol Request   7.3.5
   LOGO         N_PORT Logout                                       7.3.6
   PLOGI        Port Login                                          7.3.7
   REC          Read Exchange Concise                               7.3.8
   REC ACC      Read Exchange Concise Accept                        7.3.9
   RES          Read Exchange Status Block                          7.3.10
   RES ACC      Read Exchange Status Block Accept                   7.3.11
   RLS          Read Link Error Status Block                        7.3.12
   RRQ          Reinstate Recovery Qualifier                        7.3.14
   RSI          Request Sequence Initiative                         7.3.15
   RSS          Read Sequence Status Block                          7.3.13
   TPRLO        Third Party Process Logout                          7.3.16

Appendix B

B. Performance of The Multi-Connection iFCP Session Model

This appendix provides a quantitative analysis of the claim that N TCP connections carrying the traffic of all the sessions active between gateways provide significantly higher aggregate average throughput than a single TCP connection carrying the same sessions. The analysis shows that the difference is proportional to the square of the number of TCP sessions, N.

This analysis is based on three fundamental assumptions: (i) all the available bandwidth in a link is available to iFCP traffic, (ii) the sender always has data ready to send (as is most likely the case with a backup application), and (iii) the maximum window size at the two TCP ends (i.e., the iFCP gateways) is set to the link nominal capacity multiplied by the round-trip time (so as to have the highest chance of saturating the link without unduly raising buffering requirements at the end nodes).

The N^2 factor that emerges from this analysis is essentially due to the way TCP congestion control reacts to packet losses.
B.1 Relationship of Throughput to Packet Losses

There are several reasons for packet losses: network congestion, link errors, and network errors. Network congestion is pervasive in current IP networks, where the only way to control congestion is through dropping packets. Techniques for loss prevention, such as traffic engineering, admission control, and bandwidth reservation, are not widely deployed and hence are not a factor in the behavior of existing networks.

Even in a perfectly engineered network, link errors occur. Assuming a link bit error rate equal to that specified for Fibre Channel (10^-12) and a 10Gb/s link, there is one error every 100 seconds. Network errors also occur with significant frequency in IP networks. Jonathan Stone and Craig Partridge reported at Sigcomm 2000 that between one packet in 1100 and one packet in 32000 has errors that get past the link CRC and are detected by the TCP/IP checksum.

TCP throughput is impacted by each packet loss. Following TCP's congestion control algorithm (supported by the Tahoe, Reno, New-Reno, and SACK implementations), each packet loss results in the TCP sender's congestion window being reduced to half of its current value, and therefore (assuming constant Round Trip Time) TCP's throughput is halved. After that, the window increases by roughly one packet every two Round Trip Times (assuming the widely-used Delayed-Acknowledgement algorithm). The temporary decrease in TCP's rate translates into a missed opportunity to transmit a given amount of data. As we show in the following Background section, for N storage connections sharing an IP "pipe" of rate E, the amount of data missing the opportunity to be transmitted due to a packet loss is:

   D(N) = E^2/(N^2)*RTT^2/(256*M)

where RTT = Round Trip Time, M = packet size.
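The expression for D(N) is straightforward to evaluate numerically. The following Python sketch is illustrative only: the function and parameter names are assumptions and not part of iFCP, rates are taken in bits per second, packet size in bytes, and the recovery-time expression I = RTT^2*C/(8*M) is the one derived in the Background section (B.2).

```python
def missed_bytes(pipe_bps: float, n_connections: int,
                 rtt_s: float, packet_bytes: int) -> float:
    """D(N) = (E/N)^2 * RTT^2 / (256 * M), in bytes, for one packet loss."""
    per_connection_bps = pipe_bps / n_connections
    return per_connection_bps ** 2 * rtt_s ** 2 / (256 * packet_bytes)


def recovery_time_s(pipe_bps: float, n_connections: int,
                    rtt_s: float, packet_bytes: int) -> float:
    """I(N) = RTT^2 * (E/N) / (8 * M), in seconds, for the affected connection."""
    per_connection_bps = pipe_bps / n_connections
    return rtt_s ** 2 * per_connection_bps / (8 * packet_bytes)


def seconds_between_bit_errors(link_bps: float, bit_error_rate: float) -> float:
    """Mean time between bit errors on a fully utilized link."""
    return 1.0 / (link_bps * bit_error_rate)


if __name__ == "__main__":
    E, RTT, M = 10e9, 0.010, 1500  # 10Gb/s pipe, 10ms RTT, 1500-byte packets
    for n in (100, 1):
        print(f"N={n}: D={missed_bytes(E, n, RTT, M):.3e} bytes, "
              f"I={recovery_time_s(E, n, RTT, M):.3f} s")
    print(f"link errors every {seconds_between_bit_errors(E, 1e-12):.0f} s")
```

Because the per-connection rate E/N enters D(N) squared, moving from one connection to N connections reduces the data lost per packet loss by a factor of N^2.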
For example, for a set of N=100 connections totaling E=10Gb/s, with RTT=10ms and M=1500B, the data not transmitted in time due to a packet loss is D(N)=2.6MB. For the same set transported over one TCP session, the data not sent in time is D(1)=26GB, a 10,000-fold increase. The time interval for TCP to recover its sending rate to its initial value after a packet loss is I(N)=0.833 seconds in the case of N TCP connections, and I(1)=83.3 seconds in the case of a single TCP connection.

Observe that in the latter case, the time to recover its rate, I(1)=83.3s, is of the same order of magnitude as the time between two packet losses due exclusively to a link Bit Error Rate of 10^-12. In other words, a packet loss occurs almost immediately after TCP has recovered its rate. This means that a single TCP connection delivers on average about 3/4 of the required 10Gb/s rate, since 1/4 of the rate is lost during the time the TCP rate is increasing linearly from 1/2 to full rate. (More precisely, the effective rate is 8.27Gb/s because 1/4 of the rate is lost during 83.3s, and the time between two errors is now 120.825s due to the decreased sending rate.) By comparison, N TCP connections deliver approximately 9.99979Gb/s (i.e., they lose 1/4 of one TCP connection's full rate of 100Mb/s during 0.833s out of a 100s interval).

If the impact of TCP checksum errors is also considered, the TCP sending rate is limited to an average of (8*M/RTT)*sqrt(3/(4*p)) bits per second, where p is the probability of packet loss (see [1] for details). For M=1500B, RTT=10ms, and p=1/32000, TCP throughput is about 186Mb/s. For p=1/1100, maximum TCP throughput is 34.4Mb/s. Therefore, to fill a 10Gb/s line, about 54 simultaneous TCP flows are required (in the case where p=1/32000) or 291 TCP flows (in the case where p=1/1100).

Practically, for these reasons the iFCP protocol supports combinations of M tuples using N TCP connections (here M denotes the number of tuples, not the packet size), with M, N >= 1, and with an individual tuple using at most one TCP connection (thus M >= N).

B.2 Background.
For a TCP session to sustain a rate of C bits/second, TCP's maximum congestion window W (measured in number of packets) has to be at least

   W0 = RTT*C/(8*M)

where RTT = Round Trip Time in seconds, M = packet size in Bytes. The following analysis assumes W=W0. Later, the problems with the alternative W>W0 are discussed.

The time needed by the TCP sender to recover from a single packet loss and have its sending rate reach the previous C value is

   I = 2*RTT*W/2 = RTT*W = RTT^2*C/(8*M).

The total amount of data (in Bytes) missing the opportunity to be transmitted in this time interval I is:

   D = C/8*I/4 = C^2*RTT^2/(256*M)

Consider a set of tuples sharing an IP "pipe" of rate E to be transported in N TCP sessions. Assuming all connections are processed equally, each TCP session sends at a rate of E/N. One packet loss impacts only one TCP session, and thus the total amount of data missing the opportunity to be transmitted due to a packet loss is

   D(N) = E^2/(N^2)*RTT^2/(256*M).

On the other hand, if the same set of tuples sharing an IP "pipe" of rate E is transported in one TCP session only, the total amount of data losing the opportunity to be transmitted due to a packet loss is

   D(1) = E^2*RTT^2/(256*M) = D(N)*N^2.

The impact of packet losses on the single-TCP solution can be reduced by configuring the maximum congestion window to be larger than the bandwidth*delay product, W>W0. But in this case, only W0 packets can be in transit on the line, while the rest (up to the current window size) need to be stored in a queue at the line's ingress. In order to provide full line rate utilization assuming periodic losses, the maximum congestion window should be at least 2*W0, due to TCP's congestion

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."

   1  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

   2  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.