NFSv4 D. Noveck, Ed.
Internet-Draft NetApp
Intended status: Informational P. Shivam
Expires: October 1, 2017 C. Lever
B. Baker
ORACLE
March 30, 2017

NFSv4 migration: Implementation Experience and Specification Issues
draft-ietf-nfsv4-migration-issues-12

Abstract

The migration feature of NFSv4 provides for moving responsibility for a single filesystem from one server to another, without disruption to clients. A number of problems in the specification of this feature in NFSv4.0 were resolved by the publication of RFC 7931. In addition, there are specification issues to be resolved with regard to the NFSv4.1 version of this feature which are discussed in this document.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on October 1, 2017.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

This document. which deals with existing issues/problems in standards-track documents, is in the informational category, and while the facts it reports may have normative implications, any such normative significance reflects the readers' preferences. For example, we may report that the existing definition of migration for NFSv4.1 does not properly describe how migrating state is to be merged with existing state for the destination server. While it is to be expected that client and server implementers will judge this to be a situation that is best avoided, the judgment as to how pressing this issue should be considered is a judgment for the reader, and eventually the nfsv4 working group to make.

We do explore possible ways in which such issues can be avoided, with minimal negative effects, given that the working group has decided to address these issues, but the choice of exactly how to address these is best given effect in one or more standards-track documents and/or errata.

This document focuses on NFSv4.1, since the analogous issues for NFSv4.0 have already been addressed by the publication of [RFC7931]. Nevertheless, the history of these issues in NFSv4.0 is presented, since understanding the similarities and differences between these protocols may be helpful in deciding how best to address remaining issues.

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

In the context of this informational document, these normative keywords will always occur in the context of a quotation, most often direct but sometimes indirect. The context will make it clear whether the quotation is from:

3. NFSv4.0 Issues and Their Resolution

3.1. NFSv4.0 Issues

Many of the problems seen with Transparent State Migration derived from the inability of servers to determine whether two client IDs, issued on different servers, corresponded to the same client. This difficulty derived in turn from the common practice, recommended by [RFC7530], in which each client presented different client identification strings to different servers, rather than presenting the same identification string to all servers.

This practice, later referred to as the "non-uniform" client string approach, derived from concern that, since NFSv4.0 provided no means to determine whether two IP addresses correspond to the server, a single client connected to both might be confused by the fact that state changes made via one IP address might unexpectedly affect the state maintained with respect to the second IP address, thought of as a separate server

To avoid this unexpected behavior, clients used the non-uniform client id string approach. By doing so, a client connected to two different servers (or to two IP addresses connected to the same server) appeared to be two different servers. Since the server is under the impression that two different clients are involved, state changes made on each distinct IP address cannot be reflected on another.

However, by doing things this way, state migrated from server to server cannot be referred to the actual client which generated it, leading to confusion.

In addition to this core problem, the following issues with regard to Transparent State Migration needed to be addressed:

3.2. Resolution of NFSv4.0 Protocol Difficulties

The client string identification issue was addressed in [RFC7931] as follows:

Since all of the other issues noted in Section 3.1 were also addressed, publication of [RFC7931] updating [RFC7530] addressed all known issues with Transparent State Migration in NFSv4.0.

4. Issues for NFSv4.1

4.1. Issues to Address for NFSv4.1

Because NFSv4.1 embraces the uniform client-string approach, as advised by section 2.4 of [RFC5661], addressing migration issues is simpler, in that a shift in client id string models is not required. Instead, NFSv4 returns information in the EXCHANGE_ID response to enable trunking relationships to be determined by the client.

The other necessary part of addressing migration issues, providing for the server's merger of leases that relate to the same client, is not currently addressed by [RFC5661] and changes need to be made to make it clear that state needs to be appropriately merged as part of migration, to avoid multiple client IDs between a client-server pair.

In addition, there are a number of new features within NFSv4.1 whose relationship with migration needs to be clarified. Some examples:

Discussion of how to resolve these issues will appear in the sections below.

4.1.1. Addressing state merger in NFSv4.1

The existing treatment of state transfer in [RFC5661], has similar problems to that in [RFC7530] in that it assumes that the state for multiple filesystems formerly on different servers will not be merged so that it appears under a single common client ID. We've already seen the reasons that this is a problem with regard to NFSv4.0.

Although we don't have the problems stemming from the non-uniform client-string approach, there are a number of complexities in the existing treatment of state management in the section entitled "Lock State and File System Transitions" in [RFC5661] that make this non-trivial to address:

To summarize, there is a need for an NFSv4.1 treatment of Transparent State Migration that is an extension of that in [RFC7931] and that includes appropriate handling for NFSv4.1 features such as trunking.

4.1.2. Addressing pNFS relationship with migration

This is made difficult because, within the pNFS framework, migration might mean any of several things:

Migration needs to support both the first and last of these models.

4.1.3. Addressing server_owner changes in NFSv4.1

Section 2.10.5 of [RFC5661] states the following.

While this paragraph is literally true in that such reconfiguration events can happen and clients have to deal with them, it is confusing in that it can be read as suggesting that clients have to deal with them without disruption, which in general is impossible.

A clearer alternative would be:

4.1.4. Addressing Confirmation Status of Migrated Client IDs in NFSv4.1

When a client ID is transferred between systems as a part of migration, it is not always clear whether it should be considered confirmed or unconfirmed on the target server. In the case in which an associated session is transferred together with the client ID, it is clear that the transferred client ID needs to be considered confirmed, as the existence of an associated session is incompatible with an unconfirmed client ID.

The case in which a client ID is transferred without an associated session is less clear-cut and there needs to be a choice between two possibilities:

A related issue concerns the potential use the SEQ4_STATUS flags to determine whether all or some of the state present on the source has been transferred the destination server. This could be done using either of the alternatives above but it is more in the spirit of the second alternative. One potential use of these flags is discussed in more detail in Section 4.2.2.

4.1.5. Addressing Session Migration in NFSv4.1

Some issues that need to be addressed regard the migration of sessions, in addition to client IDs and stateids

4.2. Possible Resolutions for NFSv4.1 Issues

The subsections below explore some ways of dealing with the issues discussed in Section 4.1

First we introduce some terminology we will be using in these sections:

4.2.1. Server Responsibilities in Effecting Transparent State Migration

The basic responsibility of the source server in effecting Transparent State Migration is to make available to the destination server a description of each piece of locking state associated with the file system being migrated. In addition to client id string and verifier, the source server needs to provide. for each stateid:

A further server responsibility concerns locks that are revoked or otherwise lost during the process of file system migration. Because locks that appear to be lost during the process of migration will be reclaimed by the client, the servers have to take steps to ensure that locks revoked soon before or soon after migration are not inadvertently allowed to be reclaimed in situations in which the continuity of lock possession cannot be assured.

A further responsibility of the servers concerns situations in which stateid cannot be transferred transparently because it conflicts with an existing stateid held by the client and associated with a different file systems. In this case there are two valid choices:

4.2.2. Determining Initial Migration Status in NFSv4.1

This section proposes a way in which a client which receives NFS4ERR_MOVED can determine:

This is written assuming that the second option regarding client ID confirmation status after migration (as discussed in Section 4.1.4) is adopted. However that choice is not essential to the procedure and could be changed.

The process begins by the client examining the location entries using either of the location attributes. For those whose fs name matches that currently being used, an EXCHANGE_ID is directed at the location address and the server_owner and scope used to determine if the entry is trunkable with that previously being used to access the file system (i.e. that it represents another path to the same file system and can share locking state with it). If it is, then this should be treated as a transition from one set of paths to another, as described in Section 4.2.4, rather than a migration event.

Otherwuse, if one or more of the EXCHANGE_ID operations above has encountered a distinct server, then migration has occurred and the procedure continues. If there were no location entries with a matching fs name, then one with another fs name is selected, an EXCHANGE_ID is done, and the procedure continues using the result of that operation.

The determination of whether Transparent State Migration has occurred is driven by the client ID returned and its confirmation status.

In the state merger case, it is possible that the server has not attempted Transparent State Migration, in which case state may have been lost without it being reflected in the SEQ4_STATUS bits. To determine whether this has happened, the client can use TEST_STATEID to check whether the stateids created on the source server are still accessible on the destination server. Once a single stateid is found to have been successfully transferred, the client can conclude that Transparent State Migration was begun and any failure to transport all of the stateids will be reflected in the SEQ4_STATUS bits.

In any of the cases in which Transparent State Migration has occurred, it is possible that a session was transferred as well. To deal with that possibility, clients can, after doing the EXCHANGE_ID, issue a BIND_CONN_TO_SESSION to connect the transferred session to a connection to the new server. If that fails, it is an indication that the session was not transferred and that a new session needs to be created to take its place.

4.2.3. Client Response to Migration in NFSv4.1

Once the client has determined the initial migration status, it needs to re-establish its lock state, if possible. To enable this to happen without loss of the guarantees normally provided by locking, the destination server needs to implement a per-fs grace period in all cases in which lock state was lost, including those in which Transparent State Migration was not implemented.

The following cases need to be dealt with:

For all of the cases above, RECLAIM_COMPLETE with an rca_one_fs value of true should be done before normal use of the file system including obtaining new locks for the file system. This applies even if no locks were lost and needed to be reclaimed.

4.2.4. Dealing with Multiple Location Entries

The possibility that more than one server address may be present in location attributes requires further clarification. This is particularly the case, given the potential role of trunking for NFSv4.1, whose connection to migration needs to be clarified.

The description of the location attributes in [RFC5661], while it indicates that multiple address entries in these attributes may be used to indicate alternate paths to the file system, does so mainly in the context of replication and does so without mentioning trunking. The discussion of migration does not discuss the possibility of multiple location entries or trunking, which we will explore here.

We will cover cases in which multiple addresses appear directly in the attributes as well as those in which the multiple addresses result because a single location entry is expanded into multiple location elements using addresses provided by DNS.

When the set of valid location elements by which a file system may be accessed changes, migration need not be involved. Some cases to consider:

When a specific server address being used becomes unavailable to service a particular file system, NF4ERR_MOVED will be returned, and the client will respond based on the available locations. Whether continuity of locking state will be available depends on a number of factors:

One should note the following differences between migration with Transparent State Migration and the similar cases in which there is a continuity of locking state with no change in the server.

4.2.5. Client Recovery from Migration Events

When a file system is migrated, there a number of migration-related status indications with which clients need to deal:

Unlike the case of NFSv4.0 in which the corresponding conditions are both errors, in NFSv4.1 the client can, and often will, receive both indications on the same request. As a result, the question of how to co-ordinate the necessary recovery actions when both indications arrive simultaneously must be resolved. It should be noted that when the server decides whether SEQ4_STATUS_LEASE_MOVED is ti be set, it has no way of knowing which file system will be referenced or whether NFS4ERR_MOVED will be returned.

While it is true that, when only a single migrated file system is involved, a single set of actions will clear both indications, the possibility of multiple migrated file systems calls for an approach in which there are separate recovery actions for each indication. In general, the response to neither indication can be subsumed within the other since:

Similar considerations apply to other arrangements in which one of the indications, while not ignored per se, is subsumed within a single recovery process focused on recovery for the other indication.

Generally speaking, client recovery for these indications should have the following characteristics:

4.2.6. The Migration Discovery Process

As noted above, LEASE_MOVED indications are best dealt with in a migration discovery thread. Because of this structure,

This leaves a potential difficulty in situations in which the migration discovery thread is near to completion but is still operating. One should not ignore a LEASE_MOVED indication if the discovery thread is not able to respond to migrated file system without additional aid. A further difficulty in addressing such situation is that a LEASE_MOVED indication may reflect the server's state at the time the SEQUENCE operation was processed, which may be different from that in effect at the time the response is received.

A useful approach to this issue involves the use of separate externally-visible discovery thread states representing non-operation, normal operation, and completion/verification of migration discovery processing.

Within that framework, discovery thread processing would proceed as follows.

When the request used in the completion/verification state completes:

4.2.7. Synchronzing Session Transfer

When transferring state between the source and destination, the issues discussed in Section 7.2 of [RFC7931] must still be attended to. In this case, the use of NFS4ERR_DELAY is still necessary in NFSv4.1, as it was in NFSv4.0, to prevent locking state changing while it is being transferred.

There are a number of important differences in the NFS4.1 context:

As a result, when sessions are not transferred, the techniques discussed in [RFC7931] are adequate and will not be further discussed.

When sessions are transferred, there are a number of issues that pose challenges since,

As a result, when the filesystem state might otherwise be considered unmodifiable, the client might have any number of in-flight requests, each of which is capable of changing session state, which may be of a number of types:

  1. Those requests that were processed on the migrating file system, before migration began.
  2. Those requests which got the error NFS4ERR_DELAY because the file system being accessed was in the process of being migrated.
  3. Those requests which got the error NFS4ERR_MOVED because the file system being accessed had been migrated.
  4. Those requests that accessed the migrating file system, in order to obtain location or status information.
  5. Those requests that did not reference the migrating file system.

It should be noted that the history of any particular slot is likely to include a number of these request classes. In the case in which a session which is migrated is used by filesystems other than the one migrated, requests of class 5 may be common and be the last request processed, for many slots.

Since session state can change even after the locking state has been fixed as part of the migration process, the session state known to the client could be different from that on the destination server, which necessarily reflects the session state on the source server, at an earlier time. In deciding how to deal with this situation, it is helpful to distinguish between two sorts of behavioral consequences of the choice of initial sequence ID values.

One part of adapting to these sorts of issues would restrict enforcement of normal slot sequence enforcement semantics until the client itself, by issuing a request using a particular slot on the destination server, established the new starting sequence for that slot on the migrated session.

An important issue is that the specification needs to take note of all potential COMPOUNDs, even if they might be unlikely in practice. For example, a COMPOUND is allow to access multiple file systems and might perform non-idempotent operations in some of them before accessing a file system being migrated. Also, a COMPOUND may return considerable data in the response, before being rejected with NFS4ERR_DELAY or NFS4ERR_MOVED, and may in addition be marked as sa_cachethis.

Some possibilities that need to be considered to address the issues:

4.2.8. Migration and pNFS

When pNFS is involved, migration is capable of supporting:

Migration of the MDS function is directly supported by Transparent State Migration. Layout state will normally be transparently transferred, just as other state is. As a result, Transparent State Migration provides a framework in which, given appropriate inter-MDS data transfer, one MDS can be substituted for another.

Migration of the file system function can be accomplished by recalling all layouts as part of the initial phase of the migration process. As a result, IO will be done through the MDS during the migration process, and new layouts can be granted once the client is interacting with the new MDS. An MDS can also effect this sort of transition by revoking all layouts as part of Transparent State Migration, as long as the client is notified about the loss of state.

In order to allow migration to a file system on which pNFS is not supported, clients need to be prepared for a situation in layouts are not available or supported on the destination file system and be prepared to direct IO request to the destination server, rather than depending on layouts being available.

Replacement of one DS by another is not addressed by migration as such but can be effected by an MDS recalling layouts for the DS to be replaced and issuing new ones to be served by the successor DS.

Migration may transfer a file system from a server which does not support pNFS to one which does. In order to properly adapt to this situation, clients which support pNFS, but function adequately in its absence, should check for pNFS support when a file system is migrated and be prepared to use pNFS when support is available.

5. Security Considerations

With regard to NFSv4.0, the Security Considerations section of [RFC7530] encourages clients to protect the integrity of the SECINFO operation, any GETATTR operation for the fs_locations attribute. A needed change is to include the operations SETCLIENTID/SETCLIENTID_CONFIRM as among those for which integrity protection is recommended. A migration recovery event can use any or all of these operations.

With regard to NFSv4.1, the Security Considerations section of [RFC5661] takes proper care of migration-related issues. No change is needed.

6. IANA Considerations

This document does not require actions by IANA.

7. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC5661] Shepler, S., Eisler, M. and D. Noveck, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.
[RFC7530] Haynes, T. and D. Noveck, "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015.
[RFC7931] Noveck, D., Shivam, P., Lever, C. and B. Baker, "NFSv4.0 Migration: Specification Update", RFC 7931, DOI 10.17487/RFC7931, July 2016.

Appendix A. Acknowledgements

The editor and authors of this document gratefully acknowledge the contributions of Trond Myklebust of NetApp and Robert Thurlow of Oracle. We also thank Tom Haynes of Primary Data and Spencer Shepler of Microsoft for their guidance and suggestions.

Special thanks go to members of the Oracle Solaris NFS team, especially Rick Mesta and James Wahlig, for their work implementing an NFSv4.0 migration prototype and identifying many of the issues documented here.

Authors' Addresses

David Noveck (editor) NetApp 26 Locust Avenue Lexington, MA 02421 US Phone: +1 781 572 8038 EMail: davenoveck@gmail.com
Piyush Shivam Oracle Corporation 5300 Riata Park Ct. Austin, TX 78727 US Phone: +1 512 401 1019 EMail: piyush.shivam@oracle.com
Charles Lever Oracle Corporation 1015 Granger Avenue Ann Arbor, MI 48104 US Phone: +1 248 614 5091 EMail: chuck.lever@oracle.com
Bill Baker Oracle Corporation 5300 Riata Park Ct. Austin, TX 78727 US Phone: +1 512 401 1081 EMail: bill.baker@oracle.com