Network File System Version 4 W. Adamson
Internet-Draft NetApp
Intended status: Standards Track C. Lever, Ed.
Expires: August 13, 2017 Oracle
February 9, 2017

Trunking Discovery For Network File System Version 4.1


Connection trunking is the use of multiple transport connections to increase data and request throughput between one NFS client and server pair. This document describes a means for an NFS version 4.1 client to discover NFS version 4.1 server multipath addresses that may be used for connection trunking.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 13, 2017.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

Multiple transport connections can be established between an NFS client and server pair to improve the throughput of RPC operations or data transfer. These connections leverage the bandwidth of multiple network paths, potentially making use of more than one network interface or execution engine on both the client and server.

NFS version 4.1 defines two mechanisms for managing multiple transport connections between a single client-server pair. Section 2.10.5 of [RFC5661] defines "trunking" as the use of multiple transport connections to increase the speed of data transfer. Chapters 12 and 13 of that document introduce Parallel NFS (pNFS), wherein multiple transport connections may be established to pNFS Data Servers (DSs). This document refers to these multiple DS connections as "multipathing".

The NFSv4.1 GETDEVICEINFO operation enables multipathing among multiple pNFS Data Server (DS) network addresses. As noted in Section 13.5 of [RFC5661], if multiple network addresses appear in a multipath list, they designate the same Data Server. Given such a list of multipath addresses, a client tests further for trunking support by sending an EXCHANGE_ID operation to each address in the multipath list and comparing the results.

The NFS version 4.1 protocol does not specify a similar means for an NFS version 4.1 client to discover multipath addresses to enable trunking for a pNFS Meta Data Server (MDS), nor for an NFS version 4.1 server where pNFS is not in use.

This document describes a mechanism for an NFS version 4.1 server to advertise multipath addresses that may be used for "connection trunking": establishing multiple transport connections outside the auspices of pNFS. This document does not discuss how an NFS client utilizes connection trunking to achieve better performance.

1.1. Clientid And Session Trunking

The initial interaction between an NFSv4.1 client and server is an exchange of the unique identities of both peers. During that exchange, the server presents the client with a token which the client uses as a shorthand for its identity during subsequent interactions with the server. This token is known as a client ID, which is returned to the client as a result in the NFSv4.1 EXCHANGE_ID operation.

The NFS version 4.1 protocol introduces the concept of a session. A session enables a server to manage state associated with each client independent of that client's transport connections, which are transient. Section 2.10.1 of [RFC5661] provides a detailed overview of sessions.

Each NFSv4.1 client is typically associated with one client ID. A client is allowed to instantiate multiple sessions, which are all associated with its client ID. This is referred to as client ID trunking.

An NFS version 4.1 client associates an otherwise unbound transport connection to an existing session by sending a BIND_CONN_TO_SESSION operation on that connection. It might do this if, for instance, a network partition caused the original transport connection associated with a session to be lost. Using BIND_CONN_TO_SESSION operations, more than one transport connection can be associated with, or trunked to, the same session. This is referred to as session trunking.

An NFS client can employ either client ID trunking or session trunking to trunk connections to a pNFS Meta Data Server or non-pNFS server.
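The relationship between client IDs, sessions, and transport connections described above can be pictured with a small data model. The following Python sketch is illustrative only: the class names, field names, and connection labels are hypothetical and do not correspond to any NFS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    # Transport connections bound to this session, as BIND_CONN_TO_SESSION
    # would do on the wire (labels here are hypothetical).
    connections: list = field(default_factory=list)


@dataclass
class ClientState:
    client_id: int
    sessions: dict = field(default_factory=dict)

    def create_session(self, session_id):
        """Client ID trunking: one more session under the same client ID."""
        session = Session(session_id)
        self.sessions[session_id] = session
        return session

    def bind_connection(self, session_id, connection):
        """Session trunking: one more connection bound to an existing session."""
        self.sessions[session_id].connections.append(connection)


state = ClientState(client_id=0x1234)
state.create_session("s1")
state.create_session("s2")            # two sessions share one client ID
state.bind_connection("s1", "tcp-a")
state.bind_connection("s1", "tcp-b")  # two connections share one session
```

In this model, client ID trunking adds entries to the sessions map, while session trunking adds entries to a single session's connection list.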

2. Terminology

Client ID trunking

The association of multiple sessions to the same client ID.

Connection trunking

The use of multiple transport connections between a single NFS client and server pair, outside the context of pNFS. Includes client ID and session trunking.

fs_locations and fs_locations_info

File system attributes, retrieved via a GETATTR operation, that describe NFS server locations where a file system may be found.

Multipath address

A network address of an NFS version 4.1 server that may be used for connection trunking.

Multipathing

The use of multiple transport connections between a single NFS client and server pair, in the context of a pNFS layout.

pNFS Data Server

A storage service that stores only file data.

pNFS Meta Data Server

A storage service that manages pNFS layouts, which direct clients to pNFS Data Servers.

Pseudo file system

A read-only file system that bridges the non-accessible portions of a server's externally accessible file system namespace.

Replica

Alternative locations to be used to access data in place of, or in addition to, the current file system instance.

Session trunking

The association of multiple transport connections to the same session.

3. Discovering Multipath Addresses

3.1. Querying Locations

The fs_locations attribute (Section 11.9 of [RFC5661]) and the fs_locations_info attribute (Section 11.10 of [RFC5661]) provide a list of replica servers for an externally accessible file system. Section 11.4 of [RFC5661] describes replication in terms of alternative locations to be used to access data in place of, or in addition to, the current file system instance.

3.2. Pseudo File Systems

Section 7.3 of [RFC5661] describes the "pseudo file system" as a framework to present all exports for an NFS version 4.1 server in a single local namespace. The pseudo file system bridges the unexported portions of a server's local file system namespace providing a view of only externally accessible exported directories.

Because a pseudo file system holds a dynamically constructed, read-only local traversal path to all externally accessible file systems specific to that server, it is not normally a candidate for any fs_locations or fs_locations_info query. This includes queries for replication or migration information, as a server's pseudo file system is never replicated or migrated because it is unique to that server.

3.3. Obtaining Multipath Information For Connection Trunking

Multipath addresses suitable for connection trunking are a server-wide resource, as they provide a means to reach all exported file systems on a server. The pseudo file system is a server-wide file system in the sense that it provides a traversal path to all exported file systems on a server.

Thus we define an fs_locations and fs_locations_info replica list on the pseudo file system as a list of multipath addresses for the server to be tested for connection trunking.

This scheme relies on a new restriction on the pseudo file system. The NFSv4.1 server exported pseudo file system root "/", as seen by clients, MUST NOT be migrated or replicated in a way that NFS clients can be aware of.

To guarantee a client is getting the location information from a server's pseudo file system, and not from a real file system on that server, the client MUST probe the root directory of the pseudo file system using GETATTR with the fs_locations or fs_locations_info attribute.
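The probe described above can be sketched as a COMPOUND that selects the root filehandle before issuing GETATTR. The Python below is a minimal sketch, not an RPC implementation: the tuple representation of operations and the helper names are hypothetical, and the attribute numbers (24 for fs_locations, 67 for fs_locations_info) are quoted from memory of [RFC5661] and should be verified against that document.

```python
# Attribute numbers as understood from RFC 5661 (assumption: verify there).
FATTR4_FS_LOCATIONS = 24
FATTR4_FS_LOCATIONS_INFO = 67


def attr_bitmap(*attr_ids):
    """Encode attribute numbers as a bitmap4: a list of 32-bit words."""
    words = [0] * (max(attr_ids) // 32 + 1)
    for attr_id in attr_ids:
        words[attr_id // 32] |= 1 << (attr_id % 32)
    return words


def build_probe(prefer_info=True):
    """Return a hypothetical operation list for the multipath query.

    PUTROOTFH makes the root of the pseudo file system the current
    filehandle, so the GETATTR that follows cannot land on a real
    exported file system.
    """
    attr = FATTR4_FS_LOCATIONS_INFO if prefer_info else FATTR4_FS_LOCATIONS
    return [
        ("SEQUENCE", None),   # NFSv4.1 session operations start with SEQUENCE
        ("PUTROOTFH", None),
        ("GETATTR", attr_bitmap(attr)),
    ]


probe = build_probe()
```

A real client would encode these operations in XDR and send them over an established session; only the operation order and the requested attribute are the point here.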

Clients can make good use of information about what transport type to use (e.g., RDMA or TCP) for each multipath address, and some idea of the relative performance of each multipath address (e.g., 10GbE, 40GbE, FDR RDMA, and so on). This class of information can be encoded in an fs_locations_info attribute, but is not conveyed in fs_locations.

The text in Section 11.10 of [RFC5661] suggests that the fs_locations attribute may be deprecated in favor of fs_locations_info.

Therefore, the use of fs_locations_info rather than fs_locations to convey the list of multipath addresses is RECOMMENDED.

3.3.1. Constructing The Multipath List

A multi-homed server knows neither the connectivity nor the performance characteristics of the network path between a client and each of its network interfaces. As such, the server SHOULD enumerate all of its network interfaces in constructing the connection trunking multipath address list for the pseudo file system. This allows each client to test each multipath address and make a connectivity and performance determination.

Mixing slow and fast transports in connection trunking can be problematic if the client algorithm for choosing which trunked transport to use does not take transport characteristics into account. Indeed, Section 13.5 of [RFC5661] notes that, for DS multipath addresses, the MDS SHOULD NOT mix slow and fast transports. For connection trunking multipath address list construction, the server should take the transport speed into consideration. An fs_locations_info multipath list can use fls_info flags [fsli4bx_flags] to communicate transport characteristics. An fs_locations multipath list depends on the ordering of interfaces to convey some notion of transport characteristics.

Constructing An fs_locations Multipath List

When creating an fs_locations pseudofs multipath replica list, the server fs_locations4 locations list SHOULD be ordered as described above in Section 3.3.1.

An entry in the fs_location4 server array is formed as defined in Section 11.9 of [RFC5661].

The fs_locations4 fs_root and each fs_location4 rootpath MUST be set to "/" to indicate that this fs_locations replica list is on the pseudo file system.

Use of fs_locations_info FSLI4BX Flags With Connection Trunking

As noted in Section 3.1, both the fs_locations and fs_locations_info attributes are designed to describe alternative locations for exported file systems. The pseudo file system replica list describes a server-wide resource, so file system specific information encoded in the fs_locations_info attribute has no meaning.

When creating an fs_locations_info pseudofs multipath replica list, the server SHOULD NOT set the FSLI4BX_GFLAGS, FSLI4BX_CLSIMUL, FSLI4BX_CLHANDLE, FSLI4BX_CLFILEID, FSLI4BX_CLWRITEVER, FSLI4BX_CLCHANGE, or FSLI4BX_CLREADDIR fs_locations_server4 fls_info flag fields. The client MUST ignore these flags.

File system specific information, such as the usual meaning of the FSLI4BX RANK and ORDER values and the distinction between read-only and writable file systems, has no meaning for the connection trunking fs_locations_info multipath list. However, the RANK and ORDER values can carry information beyond the multipath address that is useful to the client. This document arbitrarily chooses the FSLI4BX_READRANK and FSLI4BX_READORDER values for that purpose and redefines their meaning, when used for connection trunking, below.

The server SHOULD NOT set the fields at byte index FSLI4BX_WRITERANK or FSLI4BX_WRITEORDER. The client MUST ignore these byte fields when interpreting the fs_locations_info multipath list.
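A client can honor the "MUST ignore" rules above by reading only the fls_info bytes that carry meaning for connection trunking. The Python sketch below assumes the byte indices and the FSLI4TF_RDMA flag value defined in Section 11.10.1 of [RFC5661] (quoted here from memory and to be verified against that document), and it assumes that a byte missing from a short fls_info array is treated as zero. The function name and the returned dictionary keys are hypothetical.

```python
# Byte indices into fls_info, as understood from RFC 5661 Section 11.10.1
# (assumption: verify the values against the RFC).
FSLI4BX_TFLAGS = 1
FSLI4BX_READRANK = 8
FSLI4BX_READORDER = 10

FSLI4TF_RDMA = 0x01  # transport flag within the TFLAGS byte (same caveat)


def interpret_fls_info(fls_info):
    """Extract only the fields that are meaningful for connection trunking.

    Every other byte (GFLAGS, the CL* class bytes, WRITERANK, WRITEORDER)
    is deliberately never read.
    """
    def byte_at(index):
        # Assumption: an absent byte in a short array counts as zero.
        return fls_info[index] if index < len(fls_info) else 0

    return {
        "rdma": bool(byte_at(FSLI4BX_TFLAGS) & FSLI4TF_RDMA),
        "read_rank": byte_at(FSLI4BX_READRANK),
        "read_order": byte_at(FSLI4BX_READORDER),
    }


# A server that wrongly set unrelated bytes does not affect the result.
noisy = bytes([0xFF, 0x01, 1, 1, 1, 1, 1, 1, 3, 9, 40, 9])
parsed = interpret_fls_info(noisy)
```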

Section 11.10.1 of [RFC5661] describes the use of the server-imposed rank and order file system values, which override client preferences. The client connectivity characteristics of a multipath address are typically not visible to the server, so connection trunking multipath lists do not interpret the FSLI4BX_READRANK or FSLI4BX_READORDER values as overriding client preferences, but rather as additional information that the client can use to set up connection trunking. The server SHOULD set the FSLI4BX_READRANK and FSLI4BX_READORDER fs_locations_server4 fls_info flag fields for each entry as follows.

The FSLI4BX_READRANK value is redefined as the server "interface index" with a unique value for each server interface. Two connection trunking fs_locations_server4 fls_info FSLI4BX_READRANK values that are the same indicate that the fs_locations_server4 entries refer to the same server interface. This can occur, for example, if a server interface has multiple IPv4 addresses, or an IPv4 and an IPv6 address, assigned and entered in the connection trunking multipath list.
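Because equal FSLI4BX_READRANK values identify the same server interface, a client can group multipath addresses by that value, for example to avoid opening several connections that all terminate on one interface. The following Python sketch shows one way to do so; the function name, the tuple layout, and the addresses (taken from documentation address ranges) are hypothetical.

```python
from collections import defaultdict


def group_by_interface(entries):
    """Group server addresses by their FSLI4BX_READRANK interface index.

    `entries` is a list of (server_address, read_rank) tuples, already
    extracted from the fs_locations_server4 entries by the caller.
    """
    groups = defaultdict(list)
    for server, read_rank in entries:
        groups[read_rank].append(server)
    return dict(groups)


# Hypothetical list: interface 1 has an IPv4 and an IPv6 address.
groups = group_by_interface([
    ("192.0.2.1", 1),
    ("192.0.2.2", 2),
    ("2001:db8::1", 1),
])
```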

The FSLI4BX_READORDER value is redefined as the "relative interface performance". For connection trunking, FSLI4BX_READORDER is no longer used for ordering within the FSLI4BX_READRANK value but instead orders the fs_locations_server4 entries in the fli_entries list. FSLI4BX_READORDER orders the server interfaces by relative performance, with higher performing interfaces having a larger FSLI4BX_READORDER value. This value MAY equal the nominal speed of the Network Interface Card (NIC), e.g., a value of 40 for a 40G NIC.

Constructing An fs_locations_info Multipath List

When creating an fs_locations_info connection trunking multipath list, the server fs_locations_item4 fli_entries list SHOULD be ordered as described above in Section 3.3.1 with the appropriate FSLI4BX_READRANK and FSLI4BX_READORDER fls_info values.

There is no FSLI4BX_TFLAGS flag for Ethernet, so for Ethernet fs_locations_server4 entries no FSLI4BX_TFLAGS flag is set. The server MUST set the FSLI4BX_TFLAGS fls_info byte value to FSLI4TF_RDMA on an RDMA fs_locations_server4 entry.

The fs_locations_server4 fls_currency field has no meaning for a multipath list, and so SHOULD be set to zero. The client MUST ignore the fls_currency field.

The fs_locations_info4 fli_flags and fli_valid_for fields have no meaning for a multipath list, and so SHOULD be set to zero. The client MUST ignore the fli_flags and fli_valid_for fields.

The fs_locations_server4 fls_server is formed as described in Section 11.10.1 of [RFC5661].

The fs_locations_info connection trunking multipath list will consist of a single fs_locations_info4 fli_items entry, as all entries share a common rootpath, that of the pseudo file system. The fs_locations_info4 fli_fs_root and the fs_locations_item4 fli_rootpath MUST be set to "/" to confirm this fs_locations_info replica list is on the pseudo file system.
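On the client side, the "/" requirement gives a simple test for whether a returned fs_locations_info value is a connection trunking multipath list. The Python sketch below assumes the decoded attribute is represented as nested dictionaries keyed by the XDR field names and that pathnames are rendered as strings; both choices, and the function name, are hypothetical.

```python
def is_pseudofs_multipath_list(info):
    """Return True if `info` looks like a connection trunking multipath list.

    Checks the two MUST conditions: a single fli_items entry, and "/" as
    both fli_fs_root and fli_rootpath. fli_flags and fli_valid_for are
    not consulted, since the client ignores them.
    """
    items = info.get("fli_items", [])
    return (
        info.get("fli_fs_root") == "/"
        and len(items) == 1
        and items[0].get("fli_rootpath") == "/"
    )


multipath_reply = {
    "fli_flags": 0,
    "fli_valid_for": 0,
    "fli_fs_root": "/",
    "fli_items": [{"fli_rootpath": "/", "fli_entries": []}],
}
# A replica list for an ordinary exported file system (hypothetical path).
export_reply = {
    "fli_flags": 0,
    "fli_valid_for": 0,
    "fli_fs_root": "/export/home",
    "fli_items": [{"fli_rootpath": "/export/home", "fli_entries": []}],
}
```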

3.3.2. Querying for Multipath Information

Unlike the DS multipath list provided by GETDEVICEINFO, neither the fs_locations nor the fs_locations_info attribute has a client cache coherency feature. The client SHOULD query for multipath information on mount and reboot. The client SHOULD refresh the connection trunking multipath information whenever the connection is lost on one or more addresses without a reboot. The client MAY query about every two hours to discover new multipath addresses.

When a multipath list is not present, the client MAY query about every hour to detect a newly instantiated list.
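The query triggers above can be combined into a single decision function. The Python sketch below is one possible policy, not a requirement of this document: the event names are hypothetical, and the two-hour and one-hour intervals are simply the approximate figures mentioned in the text, exposed as parameters.

```python
def should_query(event, seconds_since_last_query, have_list,
                 present_interval=2 * 3600, absent_interval=3600):
    """Decide whether to re-query the multipath list.

    Mount, reboot, and connection loss always trigger a query (the SHOULD
    cases). Otherwise a timer applies (the MAY cases), with a shorter
    interval when no multipath list has been seen yet.
    """
    if event in ("mount", "reboot", "connection_lost"):
        return True
    interval = present_interval if have_list else absent_interval
    return seconds_since_last_query >= interval
```

For example, a periodic timer firing 90 minutes after the last query triggers a new query only if the client currently holds no multipath list.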

A multipath-capable client can query a (legacy) server that supports the fs_locations or fs_locations_info attribute but does not support the connection trunking multipath list on the pseudo file system. In this case, the server SHOULD behave as Section 11.9 of [RFC5661] describes: on an fs_locations attribute query, the server SHOULD return an fs_locations4 data type with a zero-length locations array and the fs_root set to "/". For an fs_locations_info attribute query, the server SHOULD return an fs_locations_info4 structure with a zero-length fli_items array, the fli_fs_root set to "/", and the fli_flags and fli_valid_for fields both set to zero.
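A client can treat the legacy reply and the multipath reply uniformly by extracting whatever addresses are present. The Python sketch below handles the fs_locations case; it assumes the decoded attribute is a dictionary keyed by the XDR field names with pathnames rendered as strings, and the function name and addresses are hypothetical.

```python
def multipath_addresses(fs_locations):
    """Return the multipath addresses in an fs_locations reply, if any.

    An empty result means the server advertised no multipath list, which
    is what a legacy server's zero-length locations array produces.
    """
    if fs_locations["fs_root"] != "/":
        return []  # not a reply for the pseudo file system root
    addresses = []
    for location in fs_locations["locations"]:
        if location["rootpath"] == "/":
            addresses.extend(location["server"])
    return addresses


legacy_reply = {"fs_root": "/", "locations": []}
trunking_reply = {
    "fs_root": "/",
    "locations": [
        {"server": ["192.0.2.1"], "rootpath": "/"},
        {"server": ["2001:db8::1"], "rootpath": "/"},
    ],
}
```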

3.3.3. Resolving Server Identity

Section 2.10.5 of [RFC5661] describes how a client uses EXCHANGE_ID to resolve server identity ambiguity, and to test for session and/or client ID trunking. Connection trunking uses these methods.
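The comparison of EXCHANGE_ID results can be sketched as follows. This Python example reflects one reading of Section 2.10.5 of [RFC5661]: client ID trunking requires matching eir_clientid, so_major_id, and eir_server_scope values, and session trunking additionally requires matching so_minor_id values. The class and function names are hypothetical, and the sketch omits the server verification steps that [RFC5661] also calls for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    clientid: int         # eir_clientid
    so_major_id: bytes    # eir_server_owner.so_major_id
    so_minor_id: int      # eir_server_owner.so_minor_id
    server_scope: bytes   # eir_server_scope


def trunking_type(a, b):
    """Classify the trunking available between two probed addresses.

    Returns "session" when session trunking (and therefore client ID
    trunking as well) is indicated, "client_id" when only client ID
    trunking is indicated, and "none" otherwise.
    """
    same_server = (
        a.clientid == b.clientid
        and a.so_major_id == b.so_major_id
        and a.server_scope == b.server_scope
    )
    if not same_server:
        return "none"
    if a.so_minor_id == b.so_minor_id:
        return "session"
    return "client_id"


first = ExchangeIdResult(1, b"major", 7, b"scope")
same_minor = ExchangeIdResult(1, b"major", 7, b"scope")
other_minor = ExchangeIdResult(1, b"major", 8, b"scope")
other_scope = ExchangeIdResult(1, b"major", 7, b"elsewhere")
```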

3.3.4. Connection Trunking Example

Here we provide an example exchange between a client and a multi-homed server. The example server has two 10G interfaces, a 1G interface, and a 40G RDMA interface. All interfaces have both IPv4 and IPv6 addresses assigned to them.

Following the rules in Section 3.3.1, the server orders its interfaces and associated addresses to construct the connection trunking multipath address list as follows: the first 10G(IPv4) address, the second 10G(IPv4) address, the first 10G(IPv6) address, the second 10G(IPv6) address, the 1G(IPv4) address, the 1G(IPv6) address, the RDMA(IPv4) address, and finally, the RDMA(IPv6) address. This example server interface ordering is used for both the fs_locations and the fs_locations_info lists.

The fs_locations list consists of an fs_locations4 structure with the fs_root set to "/" and a locations list where each fs_location4 entry has a rootpath value set to "/" and a server string representation of the interface addresses in the above example server interface list order.
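The fs_locations list for the example server could be assembled as in the following Python sketch. The example does not name concrete addresses, so the ones below are placeholders from the documentation address ranges; the dictionary layout (keyed by XDR field names, with pathnames as strings) and the function name are likewise hypothetical.

```python
def build_fs_locations(addresses):
    """Build an fs_locations4-like value for the pseudo file system root.

    Each address becomes one fs_location4 entry whose rootpath is "/",
    preserving the order in which the server listed its interfaces.
    """
    return {
        "fs_root": "/",
        "locations": [
            {"server": [address], "rootpath": "/"} for address in addresses
        ],
    }


# Placeholder addresses in the example's interface order.
example_addresses = [
    "192.0.2.1",    # first 10G, IPv4
    "192.0.2.2",    # second 10G, IPv4
    "2001:db8::1",  # first 10G, IPv6
    "2001:db8::2",  # second 10G, IPv6
    "192.0.2.3",    # 1G, IPv4
    "2001:db8::3",  # 1G, IPv6
    "192.0.2.4",    # RDMA, IPv4
    "2001:db8::4",  # RDMA, IPv6
]
fs_locations = build_fs_locations(example_addresses)
```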

The fs_locations_info list consists of an fs_locations_info4 struct with the fli_flags and fli_valid_for fields set to zero, the fli_fs_root set to "/", and an fli_items list with one entry. The single fli_items fs_locations_item4 struct has the fli_rootpath set to "/" and an fs_locations_server4 struct for each item in the above example server interface list order. Each fs_locations_server4 structure in the list has the fls_currency set to zero, the fls_server set to the same value as the fs_locations server string, and the fls_info array set as described in Section 3.3.1 under "Use of fs_locations_info FSLI4BX Flags With Connection Trunking".

Note that the fs_locations_info list provides more information than the fs_locations list: the FSLI4BX_READRANK value identifies the interface, and the FSLI4BX_READORDER value conveys the relative performance of the network interface card.
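One way the fls_info values for the example server might be populated is sketched below in Python. The specific values are assumptions chosen to follow the rules of Section 3.3.1 (interface indices 1 through 4, and FSLI4BX_READORDER equal to the NIC speed in gigabits); they are not taken from a table in this document. The byte indices and the FSLI4TF_RDMA value are quoted from memory of Section 11.10.1 of [RFC5661] and should be verified there, and the addresses are placeholders.

```python
# Byte indices and flag value as understood from RFC 5661 Section 11.10.1
# (assumption: verify against the RFC).
FSLI4BX_TFLAGS = 1
FSLI4BX_READRANK = 8
FSLI4BX_READORDER = 10
FSLI4BX_WRITEORDER = 11   # highest index, used here to size the array
FSLI4TF_RDMA = 0x01


def make_fls_info(interface_index, speed_gbps, rdma=False):
    """Build an fls_info array with only the trunking-relevant bytes set."""
    info = bytearray(FSLI4BX_WRITEORDER + 1)  # all other bytes stay zero
    info[FSLI4BX_READRANK] = interface_index
    info[FSLI4BX_READORDER] = speed_gbps
    if rdma:
        info[FSLI4BX_TFLAGS] = FSLI4TF_RDMA
    return bytes(info)


# (address, interface index, speed in Gb/s, RDMA) in the example's order.
example = [
    ("192.0.2.1", 1, 10, False), ("192.0.2.2", 2, 10, False),
    ("2001:db8::1", 1, 10, False), ("2001:db8::2", 2, 10, False),
    ("192.0.2.3", 3, 1, False), ("2001:db8::3", 3, 1, False),
    ("192.0.2.4", 4, 40, True), ("2001:db8::4", 4, 40, True),
]

fs_locations_info = {
    "fli_flags": 0,
    "fli_valid_for": 0,
    "fli_fs_root": "/",
    "fli_items": [{
        "fli_rootpath": "/",
        "fli_entries": [
            {"fls_currency": 0,
             "fls_info": make_fls_info(index, speed, rdma),
             "fls_server": address}
            for address, index, speed, rdma in example
        ],
    }],
}
```

In this sketch the IPv4 and IPv6 addresses of one interface share an FSLI4BX_READRANK value, which is how the client learns that they reach the same interface.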

The client queries the server as described in Section 3.3.2 and parses the returned fs_locations or fs_locations_info multipath address list. The client may decide to ping a multipath address with a NULLPROC RPC to determine connectivity and round trip performance. An EXCHANGE_ID is then sent to each address that the client wants to test for connection trunking as described in Section 3.3.3.
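The client-side sequence just described (optional reachability check, then EXCHANGE_ID) can be sketched as a loop over the multipath list. In the Python below, `ping` and `exchange_id` are hypothetical callables supplied by the caller, standing in for a NULL procedure call and an EXCHANGE_ID operation respectively; the identity values are opaque placeholders for the fields compared in Section 3.3.3.

```python
def find_trunkable(addresses, ping, exchange_id, reference_identity):
    """Return the addresses that appear usable for connection trunking.

    `ping(address)` returns True if the address answered; unreachable
    addresses are skipped without sending EXCHANGE_ID.
    `exchange_id(address)` returns the server identity seen at that
    address, which must equal `reference_identity` for trunking.
    """
    trunkable = []
    for address in addresses:
        if not ping(address):
            continue
        if exchange_id(address) == reference_identity:
            trunkable.append(address)
    return trunkable


# Stand-in probes for illustration only.
reachable = {"192.0.2.1", "192.0.2.2", "192.0.2.9"}
identities = {
    "192.0.2.1": ("clientid-1", "major-A", "scope-A"),
    "192.0.2.2": ("clientid-1", "major-A", "scope-A"),
    "192.0.2.9": ("clientid-7", "major-B", "scope-B"),
}
result = find_trunkable(
    ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.9"],
    ping=lambda address: address in reachable,
    exchange_id=lambda address: identities[address],
    reference_identity=("clientid-1", "major-A", "scope-A"),
)
```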

4. Trunking Support For Other NFS Versions

NFS versions other than NFSv4.1 can also support trunking if they provide the following protocol features:

For example, NFSv4.2 can directly use the NFSv4.1 trunking support described in this document.

NFSv4.0 can provide client ID trunking by pinning the multipath list on the server's pseudo file system and using an fs_locations query as a retrieval mechanism, as described for NFSv4.1 in this document. NFSv4.0 can then use SETCLIENTID and SETCLIENTID_CONFIRM calls as described in Section 5.8 of [RFC7931] to determine whether trunking is supported on a multipath address.

5. Security Considerations

The traditional NFS security model controls access to shared file systems based on a client's IP address. When multiple transport connections are in play, a client request can appear from any one of its network interfaces. Therefore, clients should rely on authentication of individual users to ensure share access is controlled appropriately. The client's IP address becomes ever less meaningful as a mode of access control.

An injection of the IP address of a man-in-the-middle system is easily done by replacing an IP address in a multipath list as a GETATTR(fs_locations) reply is conveyed back to a client. Recommendations to protect GETATTR(fs_locations) [RFC5661] and SETCLIENTID [RFC7530] (and EXCHANGE_ID for NFSv4.1) with an integrity-protecting security service are key to preventing such an attack.

As an additional step, [RFC5661] recommends that clients reliably verify a server's claims of trunking support for a session or client ID using strong authentication of the server that responds on each IP address in a multipath list.

6. IANA Considerations

There are no IANA considerations for this document.

7. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC5661] Shepler, S., Eisler, M. and D. Noveck, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.
[RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015.
[RFC7931] Noveck, D., Shivam, P., Lever, C. and B. Baker, "NFSv4.0 Migration: Specification Update", RFC 7931, DOI 10.17487/RFC7931, July 2016.

Appendix A. Acknowledgments

Andy Adamson would like to thank NetApp, Inc. for its funding of his time on this project.

Authors' Addresses

William A. (Andy) Adamson NetApp 3629 Wagner Ridge Ct Ann Arbor, MI 48103 USA EMail:
Charles Lever (editor) Oracle Corporation 1015 Granger Avenue Ann Arbor, MI 48104 USA Phone: +1 248 816 6463 EMail: