RDMA Connection Manager Private Messages For RPC-Over-RDMA Version One

Oracle Corporation
1015 Granger Avenue
Ann Arbor, MI 48104
USA
Phone: +1 734 274 2396
Email: chuck.lever@oracle.com
Transport
Network File System Version 4
NFS-Over-RDMA

This document specifies the format of RDMA-CM Private Data exchanged between RPC-over-RDMA Version One peers. Such messages indicate peer support for Remote Invalidation and larger-than-default inline thresholds, but can be extended. The Private Data message format defined in this document is experimental only.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

RPC-over-RDMA Version One, specified in "Remote Direct Memory Access Transport for Remote Procedure Call, Version One", enables the use of direct data placement for upper layer protocols based on RPC (RFC 5531). However, there are some recognized shortcomings of the RPC-over-RDMA Version One protocol. The two most immediate shortcomings are:

   o  Setting up an explicit RDMA operation (RDMA Read or Write) can be costly. The small default size of inline thresholds requires the use of explicit RDMA operations even for relatively small messages and data payloads.

   o  Unlike most other contemporary RDMA-enabled storage protocols, there is no facility in RPC-over-RDMA Version One that enables the use of Remote Invalidation.

The original specification of RPC-over-RDMA Version One provided an out-of-band protocol for passing inline threshold settings between connected peers. However, the revised specification deprecates this protocol because it was not fully specified and thus it was never implemented.

Work on has demonstrated that the RPC-over-RDMA Version One protocol as it stands is challenging to extend while maintaining interoperability. Therefore, another out-of-band mechanism is required to help relieve these limitations for RPC-over-RDMA Version One implementations.

This document specifies a simple, non-XDR-based message format designed to pass between RPC-over-RDMA Version One peers when an RDMA transport connection is first established.
The purpose of this message format is to enable experimentation with parameters of the base transport layer over which RPC-over-RDMA runs. Future versions of RPC-over-RDMA may make use of these experimental results, providing similar information exchange as part of the XDR-defined base transport protocol.

Section 4.3.2 of the RPC-over-RDMA Version One specification defines the term "inline threshold." There is a pair of inline thresholds per transport connection, one for each direction of message flow, which limit the size of messages conveyed using RDMA Send. If an incoming message exceeds the size of a receiver's inline threshold, the receive operation fails and the connection is typically terminated. To send a message larger than a receiver's inline threshold, an NFS client uses explicit RDMA operations, which are typically more costly than RDMA Send.

The default value of this threshold for RPC-over-RDMA Version One connections is 1024 bytes (see Section 4.3.3 of the RPC-over-RDMA Version One specification). This is adequate for nearly all NFS Version 3 procedures. NFS Version 4 COMPOUNDs are larger, on average, forcing clients to use explicit RDMA operations for frequently issued requests such as LOOKUP and GETATTR.

If a sender and receiver can agree on a larger inline threshold, a greater portion of frequently issued NFS Version 4 operations can avoid the use of explicit RDMA operations. Explicit RDMA can be avoided for smaller I/O requests as well.

Thus each peer advertises the largest message size it can send and the largest size it can receive. The requester MUST use the smaller of its maximum send size and the responder's maximum receive size as the requester-to-responder inline threshold. The responder MUST use the smaller of its maximum send size and the requester's maximum receive size as the responder-to-requester inline threshold.

A description of Remote Invalidation and a full discussion of the design issues can be found in "Using Remote Invalidation With RPC-Over-RDMA Transports".
Without altering the XDR definition of RPC-over-RDMA Version One messages that carry chunk lists, it is not possible to provide fully generic support for Remote Invalidation. However, it is possible to provide a simple signaling mechanism for a requester to indicate it can deal with Responder's Choice (see Section 2.3 of the Remote Invalidation design document). In this case, the responder is allowed to invalidate any STag in an RPC-over-RDMA request.

Thus each peer advertises its ability to support Responder's Choice Remote Invalidation. If both peers support it, then the responder MAY use RDMA Send With Invalidate rather than RDMA Send to convey RPC-over-RDMA reply messages.

When an RPC-over-RDMA Version One transport connection is established, a requester and responder MAY populate the CM Private Data field exchanged as part of CM connection establishment (refer to Section 12.7.35 of the InfiniBand Architecture Specification).

For RPC-over-RDMA Version One, the CM Private Data field is formatted as described in this section. Requesters and responders use the same format. The first 8 octets of the CM Private Data field MUST be formatted as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Magic Number                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     Flags     |   Send Size   | Receive Size  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Magic Number:  This field contains a fixed 32-bit value that identifies the content of the Private Data field as an RPC-over-RDMA Version One CM Private Data message. The value of this field MUST be 0xf6ab0e18, in big-endian order.

   Version:  This 8-bit field contains a message format version number. The value "1" in this field means that only the first eight octets are present, that they appear in the order described in this section, and that each has the meaning defined in this section.

   Flags:  This 8-bit field contains eight boolean flags that indicate the support status of optional features, such as Remote Invalidation. The meaning of these flags is defined later in this section.

   Send Size:  This 8-bit field contains an encoded value corresponding to the largest message size this peer can send using RDMA Send. The value is encoded as described later in this section.

   Receive Size:  This 8-bit field contains an encoded value corresponding to the largest message size this peer can receive via posted receive buffers. The value is encoded as described later in this section.

The bits in the Flags field are labeled from bit 8 to bit 15, as shown in the diagram above. When the Version field contains the value "1", the bits in the Flags field have the following meaning:

   Bit 8:  When this bit is asserted (one), the sender supports the use of Remote Invalidation, as described earlier in this document. When this bit is clear (zero), the sender does not support Remote Invalidation.

   Bits 9 - 15:  These bits are reserved and must be clear (zero).

Inline threshold sizes from 1KB to 256KB, in 1KB increments, can be represented in the Send Size and Receive Size fields. A sender computes the encoded value by dividing the actual value by 1024 and subtracting one from the result. A receiver decodes this value by performing the complementary operations: it adds one to the encoded value and multiplies the result by 1024. An encoded value of zero therefore represents the 1024-byte default.

The Private Data format described above can be extended by adding optional fields that follow the first eight octets, or by making use of one of the reserved bits in the Flags field. To introduce such changes while preserving interoperability, a new Version number is allocated, and new fields and bit flags are defined. A description of how receivers should behave if they do not recognize the new format must also be provided. If this document is still a personal draft in the Experimental category, it must be updated to document the new Private Data message format as above.

This extension is intended to interoperate with other RPC-over-RDMA Version One implementations which do not support the exchange of CM Private Data. When a peer does not receive a CM Private Data message which conforms to the format defined in this document, it MUST assume the remote peer supports only the default RPC-over-RDMA Version One settings as defined in the RPC-over-RDMA Version One specification. In other words, the peer behaves as if a Private Data message was received in which bit 8 of the Flags field is clear (zero), and both Size fields contain the value zero.
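The message format, size encoding, threshold selection, and fallback behavior described above can be illustrated with a minimal, non-normative Python sketch. The function names are hypothetical and are not part of any implementation. The sketch makes three assumptions that the text does not settle: that "bit 8" is the most significant bit of the Flags octet (network bit order), that sizes which are not a multiple of 1024 are rejected rather than rounded, and that a message carrying an unrecognized Version is treated as non-conforming.

```python
import struct

MAGIC = 0xF6AB0E18         # fixed identifier, carried in big-endian order
VERSION = 1                # the only format version defined here
FLAG_REMOTE_INVAL = 0x80   # ASSUMPTION: "bit 8" is the MSB of the Flags octet
DEFAULT_THRESHOLD = 1024   # default inline threshold, in bytes


def encode_size(nbytes):
    """Encode a size as (size / 1024) - 1, which fits in one octet."""
    # ASSUMPTION: sizes that are not a multiple of 1024 are rejected.
    if nbytes % 1024 != 0 or not 1024 <= nbytes <= 256 * 1024:
        raise ValueError("size must be a multiple of 1024 from 1KB to 256KB")
    return nbytes // 1024 - 1


def decode_size(code):
    """Complementary operation: add one, then multiply by 1024."""
    return (code + 1) * 1024


def pack_private_data(remote_inval, send_size, recv_size):
    """Build the first eight octets of the CM Private Data field."""
    flags = FLAG_REMOTE_INVAL if remote_inval else 0
    return struct.pack("!IBBBB", MAGIC, VERSION, flags,
                       encode_size(send_size), encode_size(recv_size))


def parse_private_data(data):
    """Return (remote_inval, send_size, recv_size) advertised by the peer.

    Falls back to the defaults when no conforming message was received.
    """
    defaults = (False, DEFAULT_THRESHOLD, DEFAULT_THRESHOLD)
    if data is None or len(data) < 8:
        return defaults
    magic, version, flags, snd, rcv = struct.unpack("!IBBBB", data[:8])
    # ASSUMPTION: an unrecognized Version is treated as non-conforming.
    if magic != MAGIC or version != VERSION:
        return defaults
    return (bool(flags & FLAG_REMOTE_INVAL), decode_size(snd), decode_size(rcv))


def negotiate(local, remote):
    """Derive connection settings from both peers' advertised values.

    Each argument is a (remote_inval, send_size, recv_size) tuple.
    """
    return {
        # Threshold for messages this peer sends to the remote peer.
        "send_threshold": min(local[1], remote[2]),
        # Threshold for messages this peer receives from the remote peer.
        "recv_threshold": min(local[2], remote[1]),
        # Send With Invalidate is permitted only if both peers support it.
        "remote_inval": local[0] and remote[0],
    }
```

For example, a peer that advertises Remote Invalidation support, a 4096-byte send size, and an 8192-byte receive size would, under these assumptions, emit the octets f6 ab 0e 18 01 80 03 07. Parsing an empty or unrecognized Private Data field yields the 1024-byte defaults with Remote Invalidation disabled, which matches the interoperability rule above.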
There are no IANA considerations for this document.

RDMA-CM Private Data typically traverses the link layer in the clear. The same considerations apply here that are described in the Security Considerations section of .

References

   "Remote Direct Memory Access Transport for Remote Procedure Call, Version One".
      This document specifies a protocol for conveying Remote Procedure Call (RPC) messages on physical transports capable of Remote Direct Memory Access (RDMA). It requires no revision to application RPC protocols or the RPC protocol itself. This document obsoletes RFC 5666.

   "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119.
      In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.

   "Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security", RFC 5042.
      This document analyzes security issues around implementation and use of the Direct Data Placement Protocol (DDP) and Remote Direct Memory Access Protocol (RDMAP). It first defines an architectural model for an RDMA Network Interface Card (RNIC), which can implement DDP or RDMAP and DDP. The document reviews various attacks against the resources defined in the architectural model and the countermeasures that can be used to protect the system. Attacks are grouped into those that can be mitigated by using secure communication channels across the network, attacks from Remote Peers, and attacks from Local Peers. Attack categories include spoofing, tampering, information disclosure, denial of service, and elevation of privilege. [STANDARDS-TRACK]

   "Using Remote Invalidation With RPC-Over-RDMA Transports".
      Remote Invalidation relieves RDMA requesters/initiators of some of the burden of preparing memory to be accessed remotely, thus reducing the latency of transactions that require the use of RDMA Read and Write operations. This document considers how to introduce Remote Invalidation to RPC-over-RDMA transport protocols.

   InfiniBand Trade Association, "InfiniBand(TM) Architecture Specification Volume 1 Release 1.2".

   "RPC: Remote Procedure Call Protocol Specification Version 2", RFC 5531.
      This document describes the Open Network Computing (ONC) Remote Procedure Call (RPC) version 2 protocol as it is currently deployed and accepted. This document obsoletes RFC 1831. [STANDARDS-TRACK]

Acknowledgments

Thanks to Christoph Hellwig of HGST and Devesh Sharma of Broadcom for suggesting this approach. Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 Working Group Chair Spencer Shepler, and nfsv4 Working Group Secretary Thomas Haynes for their support.