NFSv4 D. Noveck, Ed.
Internet-Draft NetApp
Intended status: Informational P. Shivam
Expires: February 23, 2019 IBM
C. Lever
B. Baker
ORACLE
August 22, 2018

NFSv4 Migration and Trunking: Implementation and Specification Issues
draft-ietf-nfsv4-migration-issues-16

Abstract

This document discusses a range of implementation and specification issues concerning features related to the use of location-related attributes in NFSv4. These include migration, which transfers responsibility for a file system from one server to another, and trunking which deals with the discovery and control of the set of server endpoints to use to access a file system. The focus of the discussion, which relates to multiple minor versions, is on defining the appropriate clarifications and corrections for existing specifications.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on February 23, 2019.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

This is an informational document that discusses a number of related issues in multiple minor versions of NFSv4 (Network File System Version 4).

Many of these relate to the migration feature of NFSv4, which provides for moving responsibility for a single filesystem from one server to another, without disruption to clients. A number of problems in the specification of this feature in NFSv4.0 were resolved by the publication of [RFC7931], which added trunking detection to NFSV4.0. However, NFSv4.0 remains without an appropriate discussion of trunking discovery, which has many important connections with migration. As a result, NFSv4.0 requires clarification of how the client is to respond to changes in the trunking arrangements to use, both when migration occurs and when it does not.

In addition, there are specification issues to be resolved with regard to the NFSv4.1 version of these features which are discussed in this document.

All of the issues discussed relate to the handling and interpretation of the location-related attributes fs_locations and fs_locations_info and to the proper client and server handling of changes in the values of these attributes

These issues are all related to the protocol features for effecting file system migration, or to trunking discovery but it is not possible to treat each of these features in isolation. These features are inherently linked because migration needs to deal with the possibility of multiple server addresses in location attributes and because location attributes, which provide trunking-related information, may change, which might or might not involve migration.

2. Language

2.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2.2. Use of Normative Terms

This document, which deals with existing issues/problems in standards-track documents, is in the informational category, and while the facts it reports may have normative implications, any such normative significance is left for the readers to determine. For example, we may report that the existing definition of migration for NFSv4.1 does not properly describe how migrating state is to be merged with existing state for the destination server. While it is to be expected that client and server implementers will judge this to be a situation that it would be appropriate to resolve, the judgment as to how pressing this issue should be considered is a judgment for the reader, and eventually the nfsv4 working group to make.

We do explore possible ways in which such issues can be dealt with, with minimal negative effects, given that the working group has decided to address these issues, but the choice of exactly how to address these is best given effect in one or more standards-track documents and/or errata.

In the context of this informational document, these normative keywords will generally occur in the context of a quotation, most often direct but sometimes indirect. The context will make it clear whether the quotation is from:

An additional possibility is that these terms may appear in a proposed or possible text to serve as a replacement for some part of a current protocol specification. Sometimes, a number of possible alternative texts may be listed and benefits and detriments of each examined in turn.

2.3. Terminology Used in this Document

In this document the phrase "client ID" always refers to the 64-bit shorthand identifier assigned by the server (a clientid4) and never to the structure which the client uses to identify itself to the server (called an nfs_client_id4 or client_owner in NFSv4.0 and NFSv4.1 respectively). The opaque identifier within those structures is referred to as a client id string".

Regarding the discussion of potential network endpoints, we use the following terminology:

Regarding trunking of connections to server network endpoints, we use the following terminology:

Regarding terminology relating to attributes used in trunking discovery and other multi-server namespace features:

Each set of server-trunkable location elements defines the available access paths to a particular file system. When there are multiple such file systems, each of these, which contains the same data, is a replica of the others. Logically, such replication is symmetric, since the fs currently in use and an alternate fs are replicas of each other. Often, in other documents, the term "replica" is not applied to the fs currently in use, despite the fact that the replication relation is inherently symmetric.

3. Issues that Apply to Multiple Minor Versions and their Resolution

Although there are a common set of issues that need to be addressed, the differences between NFSv4.0 and NFSv4.1 means that the detailed handling of these issues will be significantly different in each protocol.

In order to accommodate this situation, this section will deal with the commonalities across protocol minor versions while the specifics appropriate to each minor version are dealt with in Sections 4 and 5 respectively.

3.1. Issue Summary

Many of these issues arise from a lack of clarity regarding the meaning of and proper handling for location attributes that specify more than a single server address. Such situations can arise as a result of multiple entries in the same attribute or because a single entry has a server name which, when processed by DNS, is mapped to multiple server addresses.

Another set of issues arises from the fact that many of the facilities that must deal with multiple network addresses assume there is only a single connection type shared by all of the addresses. It is necessary to deal with a mixture of connection types.

Both [RFC7530] and [RFC5661] indicate that multiple addresses may be present and that these addresses may be different paths to the same server as well as different copies of the same data. However, the following issues have, for both protocols, interfered with the recognition of the existing location attributes as a way of providing a trunking discovery function:

In addition, there are factors regarding trunking that relate to specific protocol versions and documents:

The issues that need to be addressed for both versions are:

Note that although these issues need to be addressed for both protocols, the resolutions need not be the same and the protocol facilities within each protocol may limit the completeness of the resolutions provided.

3.2. Resolution of Multi-Version Issues

Although the specifics of addressing these issues will be different for different versions, there are some common aspects discussed in the subsections below:

3.2.1. Providing Trunking Discovery

A client can discover a set network addresses to use to access a file system using an NFSv4 server in a number of ways:

Note that the last two of these are usable in situations in which NFS4ERR_MOVED was returned. Note that this does not necessarily mean that migration has occurred since there may be a shift in the set of network addresses to be used without changing to a different server. See Section 3.2.3 for further discussion.

Which of the above means of providing trunking information is appropriate to use in a given environment will depend on security considerations, the possible need for the server to direct different clients to different sets of addresses, and the availability of trunking detection facilities on the clients.

With regard to security, the possibility that requests to determine the set of network addresses corresponding to a given server might be interfered with or have their responses corrupted needs to be taken account of. As a result, when use of DNSSEC is not available, it might not be advisable to present server names in location attributes and present the network addresses directly, eliminating the need to use DNS to effect this translation. Fetching of location attributes should be done with integrity protection.

In many cases, the server will provide all the network addresses to be used to access a given server, allowing the client to select the address or set of addresses most suited to its purposes. However, in some situations, the server will want to direct clients to use specific sets of network addresses to effect load balancing, to meet quality-of-service goals, or to optimize use of clustered servers by directing traffic to the cluster element most able to handle it efficiently. In such environments, presentation of network addresses directly in the location attribute can help give the server the necessary control over the paths to be used when accessing particular file systems. When such techniques are used, servers typically present their own network addresses in the location attribute while adding the names of other servers, such as those used to access replicas.

Trunking detection allows the client to determine whether two network addresses can be used to access the same server. The availability of trunking detection depends on the protocol version, and, in some case, on client implementation choices:

As a result, direct presentation of network addresses in location entries may be problematic for NFSv4.0, since some clients might not have the trunking detection facilities that allow them to take advantage of this information. For further discussion of issues related to NFSv4.0, see Section 4.4.

3.2.2. Addressing Changes in Trunking Configuration

When the client is capable of finding out a set of network addresses to use in accessing a server, it is always possible for that set to change.

This sometimes requires that a network address previously used to access a server becomes invalid for that purpose. This requires a way of notifying the client and a way for the client to adapt to this change by using a new set of network addresses to access the server. This will involve recovery much like that for migration although the same server and file system is used throughout.

3.2.3. Interaction of Trunking and Migration

When the set of network addresses designated by a location attribute changes, NFS4ERR_MOVED may or may not result, and in some of the cases in which it is returned migration will occur, while in others there a shift in the network addresses used to access a particular file system with no migration.

When migration does occur, multiple addresses may be in use on the server previous to migration and multiple addresses may be available for use on the destination server.

With regard to the server in use, it may be that return of NFS4ERR_MOVED indicates that a particular network address is no longer to be used, without implying that migration of the file system to a different server is needed In light of this possibility, clients are best off not concluding that migration has occurred until concluding that all the network addresses known to be associated with the server are not usable.

It should be noted that the need to defer this determination is not absolute. If a client is not aware of all network addresses for any reason, if may conclude that migration has occurred when it has not and treat a switch to a different server address as if it were a migration event. This is generally harmless since the use of the same server via a new address will appear as a successful Transparent State Migration.

While significant harm will not arise from this misapprehension, it can give rise to disconcerting situations. For example, if a lock has been revoked during the address shift, it will appear to the client as if the lock has been lost during migration, normally calling for it to be recoverable via an fs-specific grace period associated with the migration event.

With regard to the destination server, it is desirable for the client to be aware of all the valid network addresses that can be used to access the destination server. However, there is no need for this to be done immediately. Implementations can process the additional location elements in parallel with normal use of the first valid location entry found to access the destination.

Because a location attribute may include entries relating to the current server, the migration destination and possible replicas to use, scanning for available network addresses could potentially be a long process. The following list of helpful practices, here presented as suggestions, could become RECOMMENDATIONs or REQUIREMENTs in future standards-track documents

3.2.4. Dealing with Multiple Connection Types

Because of the use of RPC-over-RDMA [RFC8166] as an underlying transport for NFSV4, as described in [RFC8267], a client may have multiple connection types to the same server network address. This gives rise to a number of issues with regard to NFSv4 multi-server namespace features.

Although the situation is similar for both protocol versions, differences in the attributes supported may result in important differences in how connection types are selected.

In addition to the selection of an appropriate connection type to use when multiple connection types are available, the simultaneous availability of multiple connection types raises issues related to trunking, in the same way as the availability of multiple network addresses connected to the same server. These issues, including the relationship of such trunking to migration, might could potentially be dealt differently within NFSv4.0 and NFSv4.1, although similar treatment is desirable. The treatment of these issues is discussed in Sections 4.4.1 and 5.2.5 respectively.

Note that the handling of trunking for NFSv4.0 and for an NFSv4.1 metadata server differs from that for an NFSv4.1 data server. In that latter case, specification of trunking patterns including the connection type of endpoints is under the control of the metadata server and the client simply uses the information presented by the metadata server to guide selection of the endpoints to be accessed.

One potential difference between the versions that needs to be resolved concerns the issue of the trunking of multiple connections directed to endpoints that share a network address while differing as to connection type. While NFSv4.1 is specified in [RFC5661] as requiring that such connections be trunkable, neither [RFC7530] nor [RFC7931] contains a corresponding statement.

4. NFSv4.0 Issues

4.1. Core NFSv4.0 Migration Issues

Many of the problems seen with Transparent State Migration derived from the inability of NFSv4.0servers to determine whether two client IDs, issued on different servers, corresponded to the same client. This difficulty derived in turn from the common practice, recommended by [RFC7530], in which each client presented different client identification strings to different servers, rather than presenting the same identification string to all servers.

This practice, later referred to as the "non-uniform" client id string approach, derived from concern that, since NFSv4.0 provided no means to determine whether two IP addresses correspond to the server, a single client connected to both might be confused by the fact that state changes made via one IP address might unexpectedly affect the state maintained with respect to the second IP address, thought of as a separate server

To avoid this unexpected behavior, clients used the non-uniform client id string approach. By doing so, a client connected to two different servers (or to two IP addresses connected to the same server) appeared to be two different servers. Since the server is under the impression that two different clients are involved, state changes made on each distinct IP address cannot be reflected on another.

However, by doing things in that way, state migrated from server to server cannot be referred to the actual client which generated it, leading to confusion.

In addition to this core problem, the following issues with regard to Transparent State Migration needed to be addressed:

4.2. Resolution of Core Migration Protocol Difficulties in NFSv4.0

The client string identification issue was addressed in [RFC7931] as follows:

Since all of the other issues noted in Section 4.1 were also addressed by [RFC7931], publication of that document updating [RFC7530] addressed all issues with Transparent State Migration in NFSv4.0 known at that time.

4.3. Additional NFSv4.0 Issues

In light of the fact that a large set of migration-specific issues were addressed by the publication of [RFC7931], the remaining issues derive from those mentioned in Section 3.1. These include:

4.4. Resolution of Additional NFSv4.0 Issues

One possible approach to addressing these issues would entail publication of an additional standards-track document updating [RFC7530].

Fortunately, it appears that all of the material to be updated appears in Section 8 of that document, whether it concerns the provision of trunking discovery or the interaction of trunking and migration. It also appears that none of the material to be updated is in sections updated by [RFC7931].

A review of the existing Section 8 of [RFC7530], shows the following sections as requiring significant attention:

Overall, it appears that, in addition to the revision of Sections 8.1 and 8.5, Section 8.4 need to be reorganized. One possible approach is to divide the material into sub-sections as follows:

4.4.1. Resolution of NFSv4.0 Issues with Multiple Connection Types

The existence of multiple connection types raises issues regarding how the connection type to be used is determined by the client. Such issues need to be addressed when a new server is accessed and also when NFS4ERR_MOVED is returned and a server endpoint is to be selected to access the current file system.

The absence of explicit support for multiple connection types within NFSv4.0 means that the client has a great deal of freedom in making this determination, although some implementation guidance could be provided. A client could attempt to establish a connection of each connection type and the connection type (or types) that it chooses. To make this an efficient process, servers which do not provide support for a particular connection type should promptly indicate that non-support. It should be the case that all server endpoints sharing a particular network address are to be considered trunkable,, even though currently neither [RFC7530] nor [RFC7931] explicitly states that.

The approach mentioned above should, in general, be usable in the cases of migration and referral, as well as for initial mount. Clients might well treat these situations differently, for example by using the type of the current connection as the initial type to try in the migration case, while not doing in other cases.

Situations in which NFS4ERR_MOVED is returned without requiring any shift in target network address require special attention, in order to allow a shift in the network endpoint to be used to be indicated even if there is no corresponding shift in network address. In the absence of multiple connection types, receiving NFS4ERR_MOVED when accessing one file system serves as an indication that that address is not to be used to access that file system subsequently, making it necessary to use other network addresses to access the file system, after migration or a shift in trunking patterns without migration.

Since NFSv4.0 does not provide any way for the server to specify the use of particular connection types, it might seem that there is no way for the server to direct such a shift. However, when NFS4ERR_MOVED is returned and the network address on which it was returned is still present in the location entries returned, a client may reasonably conclude that:

This gives the client a set of server endpoints to test for access to the filesystem. In cases in which there is already a connection established to that endpoint, file system access can be tested using a PUTFH within the target file system followed by a GETFH, which will either succeed or return NFS4ERR_MOVED depending on whether the endpoint used can validly access the file system. In other cases a connection will need to be established before such a test can be performed.

5. Issues for NFSv4.1 and Beyond

5.1. Trunking-focused Issues to Address for NFSv4.1

While the addition of trunking discovery will be addressed in the same way for both protocols, there are a number of cases in which there are issues where the specifics of v4.1 require special treatment:

5.1.1. Handling of Additional Information in fs_locations_info

The more extensive structure of the fs_locations_info attribute, as compared with fs_locations, means that a number of areas may need clarification, when fs_locations_info is used in connection with trunking discovery:

5.1.2. Addressing Server_owner Changes in NFSv4.1

This issue is addressed in [RFC5661], although it does not provide a clear description of the needed handling.

Section 2.10.5 of [RFC5661] states the following.

While this paragraph is literally true in that such reconfiguration events can happen and clients have to deal with them, it is confusing in that it can be read as suggesting that clients have to deal with them without disruption, which in general is impossible.

A clearer alternative would be:

5.2. Migration-related Issues to Address for NFSv4.1

Because NFSv4.1 embraces the uniform client-string approach, as advised by section 2.4 of [RFC5661], addressing migration issues is simpler, in that a shift in client id string models is not required. Instead, NFSv4 returns information in the EXCHANGE_ID response to enable trunking relationships to be determined by the client.

Despite this simplification, there are substantial issues that need to be dealt with:

Addressing migration in NFSv4.1 will also require adaptation of the approaches used in [RFC7931] to the NFSv4.1 environment including:

In addition, there are a number of new features within NFSv4.1 whose relationship with migration needs to be clarified. Some examples:

Discussion of how to resolve these issues will appear in the sections below.

5.2.1. Addressing State Merger in NFSv4.1

The existing treatment of state transfer in [RFC5661], has similar problems to that in [RFC7530] in that it assumes that the state for multiple filesystems formerly on different servers will not be merged so that it appears under a single common client ID. We've already seen the reasons that this is a problem with regard to NFSv4.0.

Although we don't have the problems stemming from the non-uniform client-string approach, there are a number of complexities in the existing treatment of state management in the section entitled "Lock State and File System Transitions" in [RFC5661] that make this non-trivial to address:

5.2.2. Addressing pNFS Relationship with Migration

This is made difficult because, within the pNFS framework, migration might mean any of several things:

Migration needs to support both the first and last of these models.

5.2.3. Addressing Confirmation Status of Migrated Client IDs in NFSv4.1

When a client ID is transferred between systems as a part of migration, it has never been clear whether it should be considered confirmed or unconfirmed on the target server. In the case in which an associated session is transferred together with the client ID, it is clear that the transferred client ID needs to be considered confirmed, as the existence of an associated session is incompatible with an unconfirmed client ID.

The case in which a client ID is transferred without an associated session is less clear-cut, particularly since the treatment of EXCHANGE_ID in [RFC5661] assumes that CREATE_SESSION is the only means by which a client id may be confirmed. While this assumption is valid in the absence of Transparent State Migration, implementation of migration means that if this assumption is maintained, it is not clear how migrated client IDs can be a accommodated. If this assumption were maintained, we would have to choose between the following two alternatives, regarding whether the client ID to be reported as confirmed when EXCHANGE_ID is used to register an already-known client_owner with the server.

Although the first approach makes it simpler for the client to determine whether there is an associated session transferred at the same time, it makes it more difficult to determine whether Transparent State Migration has occurred. Section 5.2.4.

In any case, adjustments will be required to deal with the fact that [RFC5661] currently assumes that a client id can only be confirmed by issuing a CREATE_SESSION. In order to properly deal with the status of migrated client ids, we have to distinguish among:

In [RFC5661] as it currently stands all of these are tied together and it is not obvious how migrated client IDs could be accommodated in this structure, and what changes are necessary to make this possible. For more discussion of this issue, see Section 5.3.1.

5.2.4. Addressing Session Migration in NFSv4.1

Some issues that need to be addressed regard the migration of sessions, in addition to client IDs and stateids

5.2.5. Dealing with Multiple Connection Types in NFSv4.1

The existence of multiple connection types raises issues regarding how the connection type to be used is determined by the client. Such issues need to be addressed when a new server is accessed and also when NFS4ERR_MOVED is returned and a server endpoint is to be selected to access the current file system.

The limited support for multiple connection types within NFSv4.1 means that a client can make this determination by first establishing a non-RDMA connection and then using the FSLI4TF_RDMA flag in the fs_locations_info attribute for the root file system to determine if an RDMA connection should be established. Such a connection can then, at the client's option, replace or remain trunked with the original connection.

The approach mentioned above should, in general, be usable in the cases of migration and referral, as well as for initial mount.

Situations in which NFS4ERR_MOVED is returned without requiring any shift in target network address require special attention, in order to allow a shift in the network endpoint to be used to be indicated even if there is no corresponding shift in network address. In the absence of multiple connection types, receiving NFS4ERR_MOVED when accessing one file system serves as an indication that that address is not to be used to access that file system subsequently, making it necessary to use other network addresses to access the file system, after migration or a shift in trunking patterns without migration.

Since NFSv4.1 only limited facilities for the server to specify the use of particular connection types, there are difficulties in directing such a shift. When NFS4ERR_MOVED is returned and the network address on which it was returned is still present in the location entries returned, a client may reasonably conclude that:

This generally allows client to determine set of server endpoints to be used to access the filesystem. In cases in which there is some ambiguity file system access can be tested by establishing a connection if not already present and then using a PUTFH within the target file system followed by a GETFH, which will either succeed or return NFS4ERR_MOVED depending on whether the endpoint used can validly access the file system.

5.3. Possible Resolutions for NFSv4.1 Protocol Issues

The subsections below explore some ways of dealing with clarifying the protocol to address issues discussed in Section 5.2

5.3.1. Client ID Confirmation Issues

As mentioned previously [RFC5661], makes no provision for client IDs that are confirmed other than through the use of CREATE_SESSION. For example Section 18.35 of [RFC5661] states:

In deciding how to address the status of migrated client IDs in the case of Transparent State Migration, we should avoid giving undue weight to the last sentence of the above simply because it is stated in the form of a normative requirement. We should instead focus on the reasons such terms (i.e. those defined by [RFC2119]) are to be used, to state interoperability constraints. In this case, the "MUST" applies to a conclusion based on the premise that a CREATE_SESSION must have been done to assure that the client ID is reliably known to the server.

In that light, let us consider a possible replacement, that treats confirmation by means of CREATE_SESSION as one of a number of possible means and avoids some the undesirable consequences of adherence to the current approach, originally conceived without taking state migration into account.

5.3.2. Dealing with Multiple Location Entries

The possibility that more than one server address may be present in location attributes requires further clarification. This is particularly the case, given the potential role of trunking for NFSv4.1, whose connection to migration needs to be clarified.

The description of the location attributes in [RFC5661], while it indicates that multiple address entries in these attributes may be used to indicate alternate paths to the file system, does so mainly in the context of replication and does so without mentioning trunking. The discussion of migration does not discuss the possibility of multiple location entries or trunking, which we will explore here.

We will cover cases in which multiple addresses appear directly in the attributes as well as those in which the multiple addresses result because a single location entry is expanded into multiple location elements using addresses provided by DNS.

When the set of valid location elements by which a file system may be accessed changes, migration need not be involved. Some cases to consider:

When a specific server address being used becomes unavailable to service a particular file system, NFS4ERR_MOVED will be returned, and the client will respond based on the available locations. Whether continuity of locking state will be available depends on a number of factors:

Whether migration has occurred or not, the client can use the procedure described in Section 5.5.3 to recover access to existing locking state and, in some cases, sessions.

One should note the following differences between migration with Transparent State Migration and the similar cases in which there is a continuity of locking state with no change in the server.

5.3.3. Migration and pNFS

When pNFS is involved, the protocol is capable of supporting:

Migration of the MDS function is directly supported by Transparent State Migration. Layout state will normally be transparently transferred, just as other state is. As a result, Transparent State Migration provides a framework in which, given appropriate inter-MDS data transfer, one MDS can be substituted for another.

Migration of the file system function as a whole can be accomplished by recalling all layouts as part of the initial phase of the migration process. As a result, IO will be done through the MDS during the migration process, and new layouts can be granted once the client is interacting with the new MDS. An MDS can also effect this sort of transition by revoking all layouts as part of Transparent State Migration, as long as the client is notified about the loss of state.

In order to allow migration to a file system on which pNFS is not supported, clients need to be prepared for a situation in which layouts are not available or supported on the destination file system and so direct IO requests to the destination server, rather than depending on layouts being available.

Replacement of one DS by another is not addressed by migration as such but can be effected by an MDS recalling layouts for the DS to be replaced and issuing new ones to be served by the successor DS.

Migration may transfer a file system from a server which does not support pNFS to one which does. In order to properly adapt to this situation, clients which support pNFS, but function adequately in its absence, should check for pNFS support when a file system is migrated and be prepared to use pNFS when support is available.

5.4. Defining Server Responsibilities for NFSv4.1 Transitions

The subsections below discuss server responsibilities in providing for the propagation of locking state when a file system is migrated.

Sections 5.4.1 and 5.4.2 discuss the responsibilities of source and destination servers in effecting the necessary transfer of information to support Transparent State Migration.

Section 5.4.3 discusses issues relating to the handling of state recovery using client-directed reclaim of existing locks, used when Transparent State Migration is not available

5.4.1. Server Responsibilities in Effecting Transparent State Migration

The basic responsibility of the source server in effecting Transparent State Migration is to make available to the destination server a description of each piece of locking state associated with the file system being migrated. In addition to client id string and verifier, the source server needs to provide, for each stateid:

A further server responsibility concerns locks that are revoked or otherwise lost during the process of file system migration. Because locks that appear to be lost during the process of migration will be reclaimed by the client, the servers have to take steps to ensure that locks revoked soon before or soon after migration are not inadvertently allowed to be reclaimed in situations in which the continuity of lock possession cannot be assured.

An additional responsibility of the cooperating servers concerns situations in which a stateid cannot be transferred transparently because it conflicts with an existing stateid held by the client and associated with a different file system. In this case there are two valid choices:

5.4.2. Synchronizing Session Transfer

When transferring state between the source and destination, the issues discussed in Section 7.2 of [RFC7931] must still be attended to. In this case, the use of NFS4ERR_DELAY is still necessary in NFSv4.1, as it was in NFSv4.0, to prevent locking state changing while it is being transferred.

There are a number of important differences in the NFS4.1 context:

As a result, when sessions are not transferred, the techniques discussed in [RFC7931] are adequate and will not be further discussed.

When sessions are transferred, there are a number of issues that pose challenges since,

As a result, when the filesystem state might otherwise be considered unmodifiable, the client might have any number of in-flight requests, each of which is capable of changing session state, which may be of a number of types:

  1. Those requests that were processed on the migrating file system, before migration began.
  2. Those requests which got the error NFS4ERR_DELAY because the file system being accessed was in the process of being migrated.
  3. Those requests which got the error NFS4ERR_MOVED because the file system being accessed had been migrated.
  4. Those requests that accessed the migrating file system, in order to obtain location or status information.
  5. Those requests that did not reference the migrating file system.

It should be noted that the history of any particular slot is likely to include a number of these request classes. In the case in which a session which is migrated is used by filesystems other than the one migrated, requests of class 5 may be common and be the last request processed, for many slots.

Since session state can change even after the locking state has been fixed as part of the migration process, the session state known to the client could be different from that on the destination server, which necessarily reflects the session state on the source server, at an earlier time. In deciding how to deal with this situation, it is helpful to distinguish between two sorts of behavioral consequences of the choice of initial sequence ID values.

One part of adapting to these sorts of issues would restrict enforcement of normal slot sequence enforcement semantics until the client itself, by issuing a request using a particular slot on the destination server, established the new starting sequence for that slot on the migrated session.

An important issue is that the specification needs to take note of all potential COMPOUNDs, even if they might be unlikely in practice. For example, a COMPOUND is allowed to access multiple file systems and might perform non-idempotent operations in some of them before accessing a file system being migrated. Also, a COMPOUND may return considerable data in the response, before being rejected with NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as sa_cachethis.

Some possibilities that need to be considered to address the issues:

5.4.3. Server Issues Dealing with RECLAIM_COMPLETE

When Transparent State Migration is not available, servers can provide a grace-period limited to a single file system, giving clients the opportunity to reestablish their locks, originally held on the source server, on the destination server, using the same reclaim options normally used to recover from a server restart.

As part of that process, clients need to signal the end of their contribution to the lock recovery process for a particular file system transition by using the RECLAIM_COMPLETE operation described in [RFC5661] specifying an rca_one_fs value of TRUE.

Since the publication of that document there have been a number of developments regarding the handling of this form of RECLAIM_COMPLETE that create issues that need to be addressed:

These developments, while troubling, do not raise any substantive difficulty, if the servers do not support fs migration. However, to enable file system migration to be implemented, some work must be done to make the rca_one_fs useful, while maintaining necessary compatibility with existing implementations.

5.5. Defining Client Responsibilities for NFSv4.1 Transitions

The subsections below discuss the responsibilities of the client in dealing with transition to a new server (migration) and to use of new network addresses in accessing existing servers.

5.5.1. Client Recovery from Migration Events

When a file system is migrated, there a number of migration-related status indications with which clients need to deal:

Unlike the case of NFSv4.0 in which the corresponding conditions are distinct errors and thus mutually exclusive, in NFSv4.1 the client can, and often will, receive both indications on the same request. As a result, implementations need to address the question of how to co-ordinate the necessary recovery actions when both indications arrive simultaneously. It should be noted that when the server decides whether SEQ4_STATUS_LEASE_MOVED is to be set, it has no way of knowing which file system will be referenced or whether NFS4ERR_MOVED will be returned.

While it is true that, when only a single migrated file system is involved, a single set of actions will clear both indications, the possibility of multiple migrated file systems calls for an approach in which there are separate recovery actions for each indication. In general, the response to neither indication can be subsumed within the other since:

Similar considerations apply to other arrangements in which one of the indications, while not ignored per se, is subsumed within a single recovery process focused on recovery for the other indication.

Although clients are free to decide on their own approaches to recovery, we will explore below an approach with the following characteristics:

5.5.2. The Migration Discovery Process

As noted above, LEASE_MOVED indications are best dealt with in a migration discovery thread. Because of this structure,

This leaves a potential difficulty in situations in which the migration discovery thread is near to completion but is still operating. One should not ignore a LEASE_MOVED indication if the discovery thread is not able to respond to migrated file system without additional aid. A further difficulty in addressing such situation is that a LEASE_MOVED indication may reflect the server's state at the time the SEQUENCE operation was processed, which may be different from that in effect at the time the response is received.

A useful approach to this issue involves the use of separate externally-visible discovery thread states representing non-operation, normal operation, and completion/verification of migration discovery processing.

Within that framework, discovery thread processing would proceed as follows.

When the request used in the completion/verification state completes:

5.5.3. Overview of Client Response to NFS4ERR_MOVED

This section outlines a way in which a client that receives NFS4ERR_MOVED can respond by using a new server or network address if one is available. As part of that process, it will determine:

During the first phase of this process, the client proceeds to examine location entries to find the initial network address it will use to continue access to the file system or its replacement. For each location entry that the client examines, the process consists of five steps:

  1. Performing an EXCHANGE_ID is directed at the location address. This operation is used to register the client-owner with the server, to obtain a client ID to be use subsequently to communicate with it, to obtain tat client ID's confirmation status and, to determine server_owner and scope for the purpose of determining if the entry is trunkable with that previously being used to access the file system (i.e. that it represents another path to the same file system and can share locking state with it).
  2. Making an initial determination of whether migration has occurred. The initial determination will be based on whether the EXCHANGE_ID results indicate that the current location element is server-trunkable with that used to access the file system when access was terminated by receiving NFS4ERR_MOVED. If it is, then migration has not occurred and the transition is dealt with, at least initially, as one involving continued access to the same file system on the same server through a new network address.
  3. Obtaining access to existing session state or creating new sessions. How this is done depends on the initial determination of whether migration has occurred and can be done as described in Section 5.5.4 in the case of migration or as described in Section 5.5.5 in the case of a network address transfer without migration.
  4. Verification of the trunking relationship assumed in step 2 as discussed in Section 2.10.5.1 of [RFC5661]. Although this step will generally confirm the initial determination, it is possible for verification to fail with the result that an initial determination that a network address shift (without migration) has occurred may be invalidated and migration determined to have occurred. There is no need to redo step 3 above, since it will be possible to continue use of the session established already.
  5. Obtaining access to existing locking state and/or reobtaining it. How this is done depends on the final determination of whether migration has occurred and can be done as described in Section 5.5.4 in the case of migration or as described in Section 5.5.5 in the case of a network address transfer without migration.

Once the initial address has been determined, clients are free to apply an abbreviated process to find additional addresses trunkable with it (clients may seek session-trunkable or server trunkable addresses depending on whether they support clientid trunking). During this later phase of the process, further location entries are examined using the abbreviated procedure specified below:

  1. Before the EXCHANGE_ID, the fs_name field is examined and if it does not match that currently being used, the entry is ignored. otherwise, one proceeds as specified by step 1 above,.
  2. In the case that the network address is session-trunkable with one used previously a BIND_CONN_TO_SESSION is used to access that session using new network address. Otherwise, or if the bind operation fails, a CREATE_SESSION is done.
  3. The verification procedure referred to in step 4 above is used. However, if it fails, the entry is ignored and the next available entry is used.

5.5.4. Obtaining Access to Sessions and State after Migration

In the event that migration has occurred, the determination of whether Transparent State Migration has occurred is driven by the client ID returned by the EXCHANGE_ID and the reported confirmation status.

Once the client ID has been obtained, it is necessary to obtain access to sessions to continue communication with the new server. In any of the cases in which Transparent State Migration has occurred, it is possible that a session was transferred as well. To deal with that possibility, clients can, after doing the EXCHANGE_ID, issue a BIND_CONN_TO_SESSION to connect the transferred session to a connection to the new server. If that fails, it is an indication that the session was not transferred and that a new session needs to be created to take its place.

In some situations, it is possible for a BIND_CONN_TO_SESSION to succeed without session migration having occurred. If state merger has taken place then the associated client ID may have already had a set of existing sessions, with it being possible that the sessionid of a given session is the same as one that might have been migrated. In that event, a BIND_CONN_TO_SESSION might succeed, even though there could have been no migration of the session with that sessionid.

Once the client has determined the initial migration status, and determined that there was a shift to a new server, it needs to re-establish its lock state, if possible. To enable this to happen without loss of the guarantees normally provided by locking, the destination server needs to implement a per-fs grace period in all cases in which lock state was lost, including those in which Transparent State Migration was not implemented.

Clients need to be deal with the following cases:

For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value of true should be done before normal use of the file system including obtaining new locks for the file system. This applies even if no locks were lost and needed to be reclaimed.

5.5.5. Obtaining Access to Sessions and State after Network Address Transfer

The case in which there is a transfer to a new network address without migration is similar to that described in Section 5.5.4 in that there is a need to obtain access to needed sessions and locking state. However, the details are simpler and will vary depending on the type of trunking between the address receiving NFS4ERR_MOVED and that to which the transfer is to be made

To make a session available for use, a BIND_CONN_TO_SESSION should be used to obtain access to the session previously in use. Only if this fails, should a CREATE_SESSION be done. While this procedure mirrors that in Section 5.5.4, there is an important difference in that preservation of the session is not purely optional but depends on the type of trunking.

Access to appropriate locking state should need no actions beyond access to the session. However. the SEQ4_STATUS bits should be checked for lost locking state, including the need to reclaim locks after a server reboot.

5.6. Resolution of NFSv4.1 Issues

One possibility is that addressing all of the NFSv4.1 issues would entail publication of a standards-track document updating [RFC5661].

Such a document would have three major elements:

In addition, there is a set of smaller changes necessary

The replacement for the existing section 11.7 would maintain most sections essentially as they are, only making minor changes to include server-trunking in the discussion. However, in some cases involving more significant changes to existing sub-sections, and potential new sub-sections are listed below:

The new section would present the NFSv4.1-equivalent of Transparent State Migration as described in [RFC7931]. This would address the issues presented in Section 5.2 along the lines suggested in Sections 5.3, 5.4, and 5.5.

5.7. Potential Protocol Extensions

There are a number possibilities to provides additional facilities related to issues discussed in this document using the protocol extension mechanisms described in [RFC8178]. These facilities relate to the handling of multiple connection types.

The possibility of additional connection types was not addressed in NFSv4.0, either in [RFC3530] or [RFC7530]. While the use of multiple connection types is allowed, facilities to determine the connection type to be used are sub-optimal and are expected to remain so.

In the case of NFSv4.1, there are facilities to aid in the determination of connection types that can be used. However, such facilities are limited to the two connection types already defined and may have weaknesses in dealing with changes in the set of connection types to be used and in selecting connections to be used, particularly in clustered server environments, in which the set of potential trunked server endpoints can be large.

In light of this situation, it appears that a number of potential extensions to NFSv4 might be considered, as provided for by [RFC8178]. Such extensions could take the form of additional OPTIONAL attributes. While these attributes would be part of NFSv4.2, the fact that there is no change in the set of REQUIRED features between NFSv4.1 and NFSv4.2 means that the upgrade path for clients and servers can be made relatively simple.

The additional attributes sketched out below would provide a more complete way of addressing the possibility of trunking of a large set of server endpoints, of multiple connection types:

A fuller elaboration of these proposals would require the writing of one or more standards-track documents, assuming sufficient interest in proceeding along this route. Any such work would be separate from other work suggested to resolve existing protocol issues and will not be mentioned in Section 6.2

6. Evolution of Issue Handling

6.1. History of this Document

The contents of successive versions of this document have changed because new issues have been discovered, because there have been changes in our understanding of how these features should interact, and because some of the issues have been adequately addressed with regard to certain protocol versions.

As a result, it may be helpful to understand the history of these issues, which is complicated because multiple NFSv4 protocols have been involved.

This history can be summarized as follows

Although there is a need for further working group discussion and review, it appears that the issues to be dealt with have been identified and that most work to address these issues need to take place as part of the construction of one or more standards-track documents. See Section 6.2 for further information about possible approaches to providing the necessary documents.

6.2. Further Work Needed

The following table classifies issues in this area and indicates which are currently adequately addressed and where the protocol specifications need further correction or clarification. Where the topic is adequately addressed, a reference is given to the RFC providing support for the issue. In other cases, an area name (explained below) is given.

Vers. Trunking Detection Trunking Disc. State Migration Multiple Conn. Types Interaction of Trunking and Migration
v4.0 [RFC7931] TrDisc-0 [RFC7931] Mct-0 Int-0
v4.1+ [RFC5661] TrDisc-1 SM-1 Mct-1 Int-1

The following table explains the work that needs to be done corresponding to each area name above.

Area Name Description
TrDisc-0 Although it is possible for there to be multiple location entries for a given file system, the possibility of using these to enable trunking discovery was not addressed in [RFC7530], most likely because trunking was considered a problem to be avoided (rather than a helpful feature) at that time. This situation could have been addressed by the publication of [RFC7931] but unfortunately that did not happen.
TrDisc-1 Despite the fact that [RFC5661] provides a means of trunking detection, trunking discovery was not addressed. This problem was compounded by confusion regarding multiple file system replicas arising from the fact that multiple network addresses connected to the same server were treated as if they were referring to distinct sets of replicas.
SM-1 Unlike [RFC7530], which mishandled Transparent State Migration because of confusion arising from the lack of appropriate trunking support, [RFC5661] simply neglected to provide any description of this feature. It appears likely that confusion between the needs of migration and those of dealing with shifts in responsibility for clustered file system access had significant role in allowing this issue to be ignored. Rectifying this situation along the lines of [RFC7931] is complicated by the need to rewrite significant pieces of the section about multi-server namespace to address this confusion. Beyond this, the necessary treatment will need to reflect changes required by the use of the sessions model and related changes in NFSv.1 and also address migration-related issues raised by optional features such as pNFS and the fs_locations_info attribute. In addition to correcting the handling of Transparent State migration, work also needs to be done to address migration-related issues in the handling of RECLAIM_COMPLETE.
Mct-0 Even though protocol support for multiple connection types is quite limited in NFSv4.0, there still are multiple connection types specified and implemented. As a result, some guidance has to be given to allow interoperable implementations to be developed, and used, without extensive user configuration effort. This should include some treatment of situations in which the set of connection types to be used to access a given file system changes, requiring appropriate recovery from an NFS4ERR_MOVED error.
Mct-1 Even though protocol support for multiple connection types is more limited than one might like, there are helpful facilities that can be used to simplify the process of determining the connection type(s) to be used. The proper use of the available facilities needs to be clarified including examination of cases in which the set of connection types to be used to access a given file system changes, requiring appropriate recovery from an NFS4ERR_MOVED error.
Int-0 The need to provide trunking-related information puts additional focus on the issue of dealing with changes in the value of location-related attributes. This applies when trunking configurations change and at other times as well. In addition, the existence of multiple network addresses connected to the same server requires clarification when migration and replication features are used.
Int-1 This requires similar handling to the case above. However, further work is made necessary by the fact that shifts between different sets of network addresses are erroneously treated as instances of migration in [RFC5661].

There are number of possible ways of packaging the necessary changes into RFCs. Some of these are impractical for various reasons:

The alternative, of organizing the changes by minor version, is being actively pursued by work on following Standards Track working group documents:

These two documents will require additional review and discussion before proceeding to publication as Proposed standards, updating [RFC7530] and [RFC5661] respectively.

If the working group decides to continue along this path, it may be desirable to consolidate the changes currently specified in these documents. Currently, these document replace individual sub-sections of Section 8 (of [RFC7530]) or Section 11 (of [RFC5661]). While this is helpful in explaining what is changing and why, things might be different when the eventual RFC is published. At that point, it is could be judged more important to have simply understood specifications of NFS versions 4.0 and 4.1. At that point, a full replacement section of the affected section might be more desirable as the basis of the RFC to be published. Alternatively, that consolidation might be delayed and done later as part of publication of rfc7530bis and rfc5661bis documents.

7. Security Considerations

In general, the Security Considerations sections of existing specifications for NFS versions 4.0 and 4.1 provide recommendations for appropriate handling of requests obtaining location-related information. In particular, it is recommended that integrity protection be used when fetching location-related attributes:

Despite this however, there is a need for further changes in the Security Considerations with regard to both minor versions dealt with here. The following issues need to be addressed:

8. IANA Considerations

This document does not require actions by IANA.

9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC2203] Eisler, M., Chiu, A. and L. Ling, "RPCSEC_GSS Protocol Specification", RFC 2203, DOI 10.17487/RFC2203, September 1997.
[RFC4033] Arends, R., Austein, R., Larson, M., Massey, D. and S. Rose, "DNS Security Introduction and Requirements", RFC 4033, DOI 10.17487/RFC4033, March 2005.
[RFC5403] Eisler, M., "RPCSEC_GSS Version 2", RFC 5403, DOI 10.17487/RFC5403, February 2009.
[RFC5531] Thurlow, R., "RPC: Remote Procedure Call Protocol Specification Version 2", RFC 5531, DOI 10.17487/RFC5531, May 2009.
[RFC5661] Shepler, S., Eisler, M. and D. Noveck, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.
[RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015.
[RFC7861] Adamson, A. and N. Williams, "Remote Procedure Call (RPC) Security Version 3", RFC 7861, DOI 10.17487/RFC7861, November 2016.
[RFC7931] Noveck, D., Shivam, P., Lever, C. and B. Baker, "NFSv4.0 Migration: Specification Update", RFC 7931, DOI 10.17487/RFC7931, July 2016.
[RFC8166] Lever, C., Simpson, W. and T. Talpey, "Remote Direct Memory Access Transport for Remote Procedure Call Version 1", RFC 8166, DOI 10.17487/RFC8166, June 2017.
[RFC8178] Noveck, D., "Rules for NFSv4 Extensions and Minor Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017.
[RFC8267] Lever, C., "Network File System (NFS) Upper-Layer Binding to RPC-over-RDMA Version 1", RFC 8267, DOI 10.17487/RFC8267, October 2017.

9.2. Informative References

[I-D.ietf-nfsv4-mv0-trunking-update] Lever, C. and D. Noveck, "NFS version 4.0 Trunking Update", Internet-Draft draft-ietf-nfsv4-mv0-trunking-update-01, July 2018.
[I-D.ietf-nfsv4-mv1-msns-update] Noveck, D. and C. Lever, "NFSv4.1 Update for Multi-Server Namespace", Internet-Draft draft-ietf-nfsv4-mv1-msns-update-01, June 2018.
[RFC3530] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M. and D. Noveck, "Network File System (NFS) version 4 Protocol", RFC 3530, DOI 10.17487/RFC3530, April 2003.

Acknowledgments

The editor and authors of this document gratefully acknowledge the contributions of Trond Myklebust of Primary Data, Robert Thurlow of Oracle, and Andy Adamson of NetApp. We also thank Tom Haynes of Primary Data and Spencer Shepler of Microsoft for their guidance and suggestions.

Rick Macklem provided an analysis of the current description of RECLAIM_COMPLETE and information about its implemenation for which we are grateful.

Special thanks go to members of the Oracle Solaris NFS team, especially Rick Mesta and James Wahlig who were then part of that team, for their work implementing an NFSv4.0 migration prototype and identifying many of the issues documented here. Also, the work of Xuan Qi for Oracle using NFSv4.1 client and server prototypes was helpful.

Authors' Addresses

David Noveck (editor) NetApp 1601 Trapelo Road Waltham, MA 02451 US Phone: +1 781 572 8038 EMail: davenoveck@gmail.com
Piyush Shivam IBM Corporation 11501 Burnet Road Austin, TX 78758 US EMail: piyush.shivam@ibm.com
Charles Lever Oracle Corporation 1015 Granger Avenue Ann Arbor, MI 48104 US Phone: +1 248 614 5091 EMail: chuck.lever@oracle.com
Bill Baker Oracle Corporation 5300 Riata Park Ct. Austin, TX 78727 US Phone: +1 512 401 1081 EMail: bill.baker@oracle.com