Internet Engineering Task Force R. Cole Internet-Draft Johns Hopkins University Intended status: Informational D. Romascanu Expires: December 26, 2009 Avaya A. Bierman Netconf Central June 24, 2009 Robust Configuration Management within NETCONF draft-cole-netconf-robust-config-01 Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on December 18, 2009. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Cole, et al. Expires December 18, 2009 [Page 1] Internet-Draft Robust Management June 2009 Abstract This document extends the capabilities of the NETCONF configuration management protocol to validate the configuration on servers and to perform a set of active tests (i.e., verification) against the server's running configuration over a period of time to afford the client and server a more robust and resilient configuration management capability. 
This is of value to commercial enterprise and public networks as well as wireless emergency and military networks. We propose an initial new NETCONF capability. We also explore the future alternatives for developing these capabilities within the context of the existing NETCONF protocol, the YANG modeling language and existing related IETF, IEEE and ITU-T standards. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Benefits of This Work . . . . . . . . . . . . . . . . . . 5 1.2. Requirements Language . . . . . . . . . . . . . . . . . . 5 1.3. Outline . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 6 3. The Verified Commit Procedure . . . . . . . . . . . . . . . . 6 4. Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.1. Phases . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5. Next Steps . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 12 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 8. Security Considerations . . . . . . . . . . . . . . . . . . . 12 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 9.1. Normative References . . . . . . . . . . . . . . . . . . . 12 9.2. Informative References . . . . . . . . . . . . . . . . . . 12 Appendix A. Appendix A: Motivational Use Cases . . . . . . . . . 14 A.1. Use Case A: MANET . . . . . . . . . . . . . . . . . . . . 14 A.2. Use Case B: IpTables . . . . . . . . . . . . . . . . . . . 17 A.3. Use Case C: DTN . . . . . . . . . . . . . . . . . . . . . 19 A.4. Use Case D: Dual Homing . . . . . . . . . . . . . . . . . 21 Appendix B. Appendix B: Network-wide Upgrades . . . . . . . . . . 22 Appendix C. Appendix C: verify-commit.yang Module . . . . . . . . 24 Appendix D. Appendix D: Example ping.yang Module . . . . . . . . 29 Appendix E. Appendix E: Existing Capabilities . . . . . . . . . . 
32 E.1. NETCONF Capabilities . . . . . . . . . . . . . . . . . . . 32 E.2. YANG Capabilities . . . . . . . . . . . . . . . . . . . . 34 E.3. RMON Capabilities . . . . . . . . . . . . . . . . . . . . 35 E.4. OAM for Carrier Class Ethernet . . . . . . . . . . . . . . 36 E.5. OAM for MPLS Services . . . . . . . . . . . . . . . . . . 36 E.6. Active Tests for Performance Monitoring . . . . . . . . . 37 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 37 Cole, et al. Expires December 18, 2009 [Page 2] Internet-Draft Robust Management June 2009 1. Introduction This document identifies enhancements to NETCONF and NETMOD capabilities to achieve a more robust model of configuration management for IETF systems. Most network management systems which are required to provide a highly robust network service rely upon some form of out-of-band access for configuration management. This provides an alternative management entry into devices in the event that in-band access is unavailable due to mis-configuration. However, not all network deployments can afford the luxury of alternative networks for management access to all networking devices, nor should this be necessary. Examples include Mobile Ad-Hoc Wireless Networks (MANETs) and other forms of Disruption Tolerant Networks (DTNs). All managed networks, as well, would benefit from a more robust configuration management capability from the IETF, e.g., to provide equivalent network reliability at reduced infrastructure costs. To accomplish this, the NETCONF protocol RFC 4741 [RFC4741], and its associated modeling language, i.e., YANG [YANG], need to more fully define and extend their capabilities to a) perform rules checking, i.e., Validation, against proposed configuration changes to be placed into running configuration and b) define active tests and success criteria, i.e., Verification, (from both the client and the servers) involving server-side running configuration. 
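The distinction between a) Validation and b) Verification can be sketched in a few lines of Python. This is a minimal illustration only and is not part of NETCONF or YANG: the functions validate and verify, the rule freq_allowed, and the simulated probe neighbors_reachable are all hypothetical names introduced here, and the probe is a stand-in for a real active test such as a ping. The sketch shows a configuration that passes every rule check yet still fails an active test once it is running.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]
Check = Callable[[Config], bool]


def validate(candidate: Config, rules: List[Check]) -> bool:
    """Validation: rule checks applied to configuration that is not yet running."""
    return all(rule(candidate) for rule in rules)


def verify(running: Config, tests: List[Check]) -> bool:
    """Verification: active tests applied to configuration that is running."""
    return all(test(running) for test in tests)


def freq_allowed(cfg: Config) -> bool:
    # Hypothetical rule: the frequency must be one the device supports.
    return cfg.get("frequency") in {"freq_1", "freq_2"}


def neighbors_reachable(cfg: Config) -> bool:
    # Hypothetical stand-in for an active probe: in this simulated network
    # the critical neighbors can only be reached on freq_2.
    return cfg.get("frequency") == "freq_2"


proposed = {"frequency": "freq_1"}

# The proposed change is well-formed, so Validation passes ...
validation_ok = validate(proposed, [freq_allowed])

# ... but once it is running, the active test fails.
verification_ok = verify(proposed, [neighbors_reachable])

print(validation_ok, verification_ok)
```

In this simulated case the rule check alone would not prevent a disconnect, which is the gap that server-side Verification is meant to close.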
As an example, we envision a NETCONF client-server interaction model shown in the below figure. Here, as part of a new operation, the client passes a reference to the server indicating specification of network tests that the server executes as part of the NETCONF Verification testing process. Simultaneously, the client may also run a set of tests to gain confidence in the proposed configuration changes to the server. Once the server completes its test execution, it indicates success through a notification message. Once the client is comfortable with its own tests and those of the server, it issues the to the server which forces the server to commit to the proposed configuration change. The server indicates this in its reply to the client. Cole, et al. Expires December 18, 2009 [Page 3] Internet-Draft Robust Management June 2009 Client Server ------ ------ +------------------------------> Sets up config +------------------------------> Sets up test control --- +------------------------------> | Sends (set - timeout timeout) - test-template:instanceID | <-----------------------------+ | reply(OK) (running (run server-side tests) client-side tests) | (server-side test success) | <-----------------------------+ | notification | | | +-----------------------------> | Sends | | <-----------------------------+ | reply(OK) | --- Figure 1 NETCONF defines the term 'Validation' as the set of checks performed on proposed configuration code up to the point that the server places it into its running-configuration. We use the term 'Verification' as the act of performing active tests against configuration code in the running-configuration on the server. Verification tests can be executed from either the NETCONF client or the NETCONF server, or from a NETCONF server(a) against running configuration code on a NETCONF server(b), or all combinations. Cole, et al. Expires December 18, 2009 [Page 4] Internet-Draft Robust Management June 2009 1.1. 
Benefits of This Work

Our objective is to further develop a robust and resilient network configuration capability, building upon the improvements afforded by the NETCONF protocol and its associated modeling language, YANG. The envisioned benefits of expanded specification of Validation checking, Verification testing and extension of the Verification tests to the server include:

o Minimize faulty configuration,

o Minimize disconnects in networks with no 'out-of-band' access, e.g., wired networks, wireless MANETs or DTNs. For example, control information can flow over paths for which data transport is not possible. This can occur due to asymmetric links, mis-configuration of control and data protocols, mis-configured security filters allowing control but not data traffic, etc. The best way to test correctness of configuration is from the perspective of the server itself, the actual data or control path followed and the specific configuration objects affected. Appendix A presents a set of example use cases which illustrate benefits of enhanced NETCONF capabilities.

o Provide opportunity for device modelers to associate/recommend tests tied to specific configuration items, and

o Improve efficiency of coordinated network upgrades. (See the discussion in Appendix B.)

1.2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.3. Outline

In the remainder of this document we first give a set of definitions used throughout the discussion. We then provide a new Verified Commit procedure which achieves initial aspects of the Robust-NETCONF capabilities. We then examine in the Framework section the options for Validation and Verification, the associated tests, test types, and test success definitions within existing standards.
We identify areas within NETCONF and YANG where enhancements are required to support these new capabilities, e.g., new operations, error messages and definitions. We also discuss potential security issues associated with the development of more automated Validation checking and Verification testing. We conclude with a section discussing a set of next steps.

2. Definitions

In this section we provide a few definitions strictly adhered to throughout this document. The NETCONF specification maintains the following terms:

o NETCONF Client (or Client) - this is the management application responsible for the configuration management of network devices.

o NETCONF Server (or Server) - this is the device being managed in the network.

We maintain the following distinction between Validation checks and Verification tests:

o Validation checks - checking non-running configuration code against a set of rules, constraints or other requirements. This addresses the total set of checks performed prior to the Server placing the code into its running-configuration.

o Verification tests - measuring behavior of running configuration code against a set of expectations or success criteria. This is generally performed through active testing and comparison of results against expectations.

o Active measurements perform Verification while rule-based checks perform Validation.

3. The Verified Commit Procedure

In this section we describe a protocol development which we refer to as the Verified Commit procedure. The YANG module for this procedure, i.e., verify-commit.yang, is listed in Appendix C below. This provides an initial example of the Robust-NETCONF capabilities. Basically, there would be a main YANG module (e.g., robust-config.yang) with three RPC operations and one or more notifications:

o <start-verified-commit>: start the verified commit on the server.

o : cancel a verified commit in progress on the server.
Cole, et al. Expires December 18, 2009 [Page 6] Internet-Draft Robust Management June 2009 o : serves as the second 'commit' in the confirmed commit procedures. o : notification eventType returning the current or termination status for the verification test(s). o : notification eventType returning the termination status for the start-verified-commit operation. Note: this capability has several prerequisites, including support for configuration and notifications. Additionally, there will be secondary modules for specific verification tests. Note: ultimately we need a way to run multiple tests, e.g., by making test-template a leaf-list instead of a leaf, but for now our examples specify a single test associated with each operation. We present our examples in terms of a ping.yang module sketched out in Appendix D below. So, the following client/server interaction model would hold: 1. Client sets up the configuration on all relevant agents. 2. Client sets up all the relevant test control configuration needed for the verification tests on all relevant agents. 3. Client sends to all agents with parameters (timeout:seconds, test-template:instance-identifier), i.e., 1800 /at:ping/at:pingControlEntry[at:pingControlIndex=42] Figure 2 4. Server returns . 5. The Server starts and runs the (e.g., ping) test with the specified (e.g., pingControlEntry) configuration subtree. Cole, et al. Expires December 18, 2009 [Page 7] Internet-Draft Robust Management June 2009 6. If success: The client has received good notifications (or by some means) and decides to complete the verified-commit. Every server is sent this PDU: Figure 3 7. If failure: The operator has received bad notifications (or by some means) and decides to cancel the verified-commit. Every agent is sent this PDU: Figure 4 8. Servers return . 9. Once the tests are complete the agent will send the notification (If the manager fails to do anything before the timeout occurs, then this notification will be sent as well.) 4. 
Framework

Here we discuss the more general development of an enhanced Validation and Verification capability within NETCONF. Enhancements are probably not confined to NETCONF but may also include enhancements to YANG, or at least the development of new device models within YANG. Further, other working group activities and capabilities may be impacted or leveraged. Capabilities exist in current standards which are relevant to the discussion in this section. These existing standards are identified in Appendix E of this document.

First, the NETCONF capability termed Verify-Commit is to be developed to provide an initial Verification capability operating on the NETCONF server which can:

o Specify a set of tests to run on the server associated with specific configuration modifications. (Multiple methods are possible and under consideration, see below.)

o One specific class of tests would be network tests (network tests imply a set of active measurement probes injected into the network). This class of tests was considered above in the form of our ping.yang module example. This class would provide our first set of work tasks. Clearly, other test classes are in play associated with other types of managed objects, e.g., CPU monitoring associated with a managed host.

o Define mechanisms to specify pass/fail criteria for Verification tests.

There exist several options for the method to specify tests and their associated pass/fail criteria depending upon the network technologies or configuration objects in question. Potential specification options include:

o Local or remote script specifications, i.e., the NETCONF operation could pass to the server a URL pointing to the script together with a specification of 'success'. However, this carries a set of security concerns and this option should be deferred until more experience is gained with other methods.
o Tests can be separately specified via a data model, similar to SSPM-MIB (for network test specification) but using YANG, and invoked on the server through the <start-verified-commit> operation. This is the option we have opted for as our first protocol development activity.

o Tests can be associated with specific configuration objects within the device's (YANG) model. The module developer passes on their expertise on the network configuration process by "recommending" specific tests tied to specific configuration objects within the YANG device model. Success criteria, but not specific values, are defined in the module. Specific values for success criteria could be passed through, e.g., operations to the server.

Second, the NETCONF ':validate' Validation capability is to be enhanced to provide for a more general specification of rules checking prior to placing configuration into the Server's running-configuration:

o Allow for general checking through reference to a document defining the additional Validation checks. This is similar to defining Verification tests through reference, only here the 'Validation' document defines additional checks on the non-running configuration code prior to storage on the server or moving into running configuration.

Several objectives will be followed to help direct decisions as this work progresses. We list a set here, but expect these to evolve over the future revisions of the framework draft. These include:

o Improve the resiliency of NETCONF by extending it with an automated test capability local to the server.

o Leverage existing capabilities within NETCONF, YANG and other IETF, IEEE and ITU-T standards where possible.

o Maintain the strong security model between the NETCONF client and server.
o Consider enhancements which potentially simplify network-wide configuration upgrades as outlined in Appendix B of the NETCONF specification and discussed in Section 1.1 above.

4.1. Phases

We fully expect this work to progress in a phased approach, starting from a simple set of enhancements and evolving towards more complex yet beneficial phases. As experience is gained in development of the earlier phases, this can be applied to the development of the capabilities associated with the later phases. The following set of phases is identified for this work plan:

o Phase 1 - Allow the server to autonomously run connectivity tests and to have the server execute these during the 'start-verified-commit-timeout' period associated with a <start-verified-commit> operation. If the server decides to fail the Verification testing it is performing, then it needs to indicate this to the client prior to the timeout.

o Phase 2 - Build more extensible YANG models of active test measurements and pass references to these tests and their success criteria within NETCONF operations.

o Phase 3 - Build YANG device models which embed recommended tests associated explicitly with configuration objects. Here, the NETCONF operations would indicate whether the server should execute recommended tests associated with the suggested changes to the values of configuration objects.

o Phase 4 - Network-wide configuration changes or upgrades. This phase's work defines methods to leverage the improved Validation and Verification capabilities to perform simultaneous upgrades of multiple devices, representing a network-wide upgrade (or back-out upon defined network-wide failure conditions).

There is nothing requiring a strict adherence to the above phases; they are merely initial thoughts on simple steps evolving to more complex. Certainly some of the later phases could be under development prior to completing the earlier phases.

5.
Next Steps We conclude this document with a brief discussion of the some of the challenges to performing and completing this work program. These include: o Identifying methods to specify specific Verification tests and Validation checks in a simple yet extensible fashion. o Can (and should) specific tests be tied to specific configuration parameters within the server's data models? o What are the security implications of this work and what security mechanisms need development? We conclude with a set of proposed first steps to move forward towards our objectives. E.g., o Continue to flesh out this draft, continuing the development of NETCONF extensions to support server-side active tests as part of a 'Verify-Commit' procedure as identified above. o Investigate YANG model to define active tests in an extensible manner. o Investigate YANG models which embed active tests within a YANG device model. o Other steps to be determined as this work evolves. Cole, et al. Expires December 18, 2009 [Page 11] Internet-Draft Robust Management June 2009 6. Acknowledgements 7. IANA Considerations This memo includes no request to IANA. All drafts are required to have an IANA considerations section (see the update of RFC 2434 [I-D.narten-iana-considerations-rfc2434bis] for a guide). If the draft does not require IANA to do anything, the section contains an explicit statement that this is the case (as above). If there are no requirements for IANA, the section will be removed during conversion into an RFC by the RFC Editor. 8. Security Considerations All drafts are required to have a security considerations section. See RFC 3552 [RFC3552] for a guide. This section addresses the security concerns and objectives for the development of a more robust ':confirmed-commit' capability within NETCONF. This section is currently TBD. 9. References 9.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 9.2. 
Informative References [802.1ag] IEEE 802.1, "IEEE 802.1ag - Connectivity Fault Management", September 2007. [802.3ah] IEEE 802.3, "IEEE 8023ah - Ethernet in the First Mile", December 2005. [I-D.narten-iana-considerations-rfc2434bis] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", draft-narten-iana-considerations-rfc2434bis-09 (work in progress), March 2008. [RFC2021] Waldbusser, S., "Remote Network Monitoring Management Information Base Version 2 using SMIv2", RFC 2021, Cole, et al. Expires December 18, 2009 [Page 12] Internet-Draft Robust Management June 2009 January 1997. [RFC2074] Bierman, A. and R. Iddon, "Remote Network Monitoring MIB Protocol Identifiers", RFC 2074, January 1997. [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003. [RFC3577] Waldbusser, S., Cole, R., Kalbfleisch, C., and D. Romascanu, "Introduction to the Remote Monitoring (RMON) Family of MIB Modules", RFC 3577, August 2003. [RFC3729] Waldbusser, S., "Application Performance Measurement MIB", RFC 3729, March 2004. [RFC4149] Kalbfleisch, C., Cole, R., and D. Romascanu, "Definition of Managed Objects for Synthetic Sources for Performance Monitoring Algorithms", RFC 4149, August 2005. [RFC4150] Dietz, R. and R. Cole, "Transport Performance Metrics MIB", RFC 4150, August 2005. [RFC4377] Nadeau, T., Morrow, M., Swallow, G., Allan, D., and S. Matsushima, "Operations and Management (OAM) Requirements for Multi-Protocol Label Switched (MPLS) Networks", RFC 4377, February 2006. [RFC4378] Allan, D. and T. Nadeau, "A Framework for Multi-Protocol Label Switching (MPLS) Operations and Management (OAM)", RFC 4378, February 2006. [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. Zekauskas, "A One-way Active Measurement Protocol (OWAMP)", RFC 4656, September 2006. [RFC4687] Yasukawa, S., Farrel, A., King, D., and T. 
Nadeau, "Operations and Management (OAM) Requirements for Point- to-Multipoint MPLS Networks", RFC 4687, September 2006. [RFC4741] Enns, R., "NETCONF Configuration Protocol", RFC 4741, December 2006. [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, October 2008. [VIGO] Vigoureux, M., "Requirements for Operations and Management Cole, et al. Expires December 18, 2009 [Page 13] Internet-Draft Robust Management June 2009 (OAM) in MPLS Transport Network", March 2009. [Y.1710] ITU-T Study Group 13, "ITU-T Y.1710 - Requirements for OAM Functionality in MPLS Networks", 2002. [Y.1730] ITU-T Study Group 13, "ITU-T Y.1730 - Requirements for OAM Functions in Ethernet-based Networks and Ethernet Services", January 2004. [Y.1731] ITU-T Study Group 13, "ITU-T Y.1731 - OAM Functions and Mechanisms for Ethernet-based Networks", May 2006. [YANG] Bjorklund, M., "YANG - A data modeling language for NETCONF", January 2009. Appendix A. Appendix A: Motivational Use Cases In this appendix we motivate the need for more robust configuration management through a set of example use cases and failure situations. We solicit other use cases from readers. One note, not all of these use cases currently apply to the application of NETCONF configuration management for various reasons not of interest here. But we do believe that future implementations and versions of NETCONF will be applied to all these use cases; so we include them here. A.1. Use Case A: MANET This section discusses a potential failure in configuration management in the case of a multi-frequency, multi-domain wireless Mobile Ad-hoc Network (MANET) scenario. Here there is a single NETCONF client connected to both MANET domains. The MANET domains are operating on different wireless frequencies. MANET_1 operates on freq_1 while MANET_2 operates on freq_2. In MANET_2 is the Server in question, which is indicated with an 'X'. 
Other nodes in the MANETs are indicated with an 'O'. The following sequence of events occurs. The Client issues a <commit> operation with the :confirmed-commit capability. Part of the new configuration pushed to the Server, i.e., 'X', inadvertently changes its operating frequency from freq_2 to freq_1. However, the Server maintains connectivity back to the Client through the MANET node indicated as '@', which sits on the border of MANET_1 within radio range of the Server. This allows the Client to complete its connectivity tests to the Server and then finally issue a confirming commit. The Server then moves deeper into MANET_2 and becomes disconnected from the Client and all other nodes within MANET_2 due to the erroneous change in its operating frequency. The Client has no means at this point to reconnect to the Server and fix its configuration.

   NETWORK DIAGRAM:
   ----------------

               +---Client---+
               |            |
        freq_1 V            V freq_2
   +---------------+    +----------------+
   |  O    O    O  |    |       O        |
   |   O          @|    |X--->      O    |
   |               |    |    O      O    |
   | O   O   O   O |    |   O      O     |
   |       O       |    |        O       |
   |   O       O   |    |  O     O      O|
   |      O        |    |      O         |
   |      OO       |    |         O      |
   +---------------+    +----------------+
        MANET_1               MANET_2

   CLIENT/SERVER INTERACTIONS:
   ---------------------------

   Client                        Server(X)
   ------                        ---------
   <commit> w/confirm
   (changing to freq_1)
   +------------------------->
                                 +Changes configuration.
                                 +Sets timer.
                                 |
   +Executes                     |
    ping tests.                  |
                                 |
   +Connectivity                 |
    confirmed.                   |
                                 |
                                ---
   +------------------------->
                                 +Verifies configuration.
                                 +Stops timer.
                                 +Wanders off into MANET_2
                                  and loses connectivity
                                  to client.

   Figure 5

Our proposed solution is to have the Server perform its own connectivity tests to a set of critical neighbor or peer nodes. This would allow the Server to detect the incorrect frequency setting.
It would then need a means to indicate back to the Client that a configuration error has occurred. Then the Client would not issue the confirming commit operation and the Server would back out into its previous configuration. A.2. Use Case B: IpTables This section is TBD. Cole, et al. Expires December 18, 2009 [Page 17] Internet-Draft Robust Management June 2009 NETWORK DIAGRAM: ---------------- +---------------+ | O O O | Client----->|X---> O | | | | O O O O | | O | | O O | | O | | OO | +---------------+ MANET_1 CLIENT/SERVER INTERACTIONS: --------------------------- Client Server(X) ------ --------- w/confirm (changes to ipTables) +-------------------------> +Changes configuration (looses connectivity to all neighbors but Client). +Sets timer. | +Executes | connectivity tests. | | +Connectivity | confirmed. | | --- +-------------------------> +Verifies configuration. +Stops timer. +Wanders off into MANET_1 and looses connectivity to client. Cole, et al. Expires December 18, 2009 [Page 18] Internet-Draft Robust Management June 2009 Figure 6 A.3. Use Case C: DTN This is a rather extreme use case, but one which is of interest to address within the Disruption Tolerant Network (DTN) development community. DTNs are characterized by large and/or intermittent delays between network systems. Clearly there are numerous issues to be worked in order to achieve NETCONF configuration management over DTNs. This use case illustrates just one example issue. Here, the NETCONF Client issues a commit with the confirm capability to the DTN's Bundle delivery protocol. By the time the configuration change request reaches the distant, remote Server the Client and Server have no immediate connectivity. Hence, any testing performed by the Client to Verify the proposed configuration changes on the Server are bound to fail. If this is the only means to perform Verification of running configurations then this form of management over DTNs is bound to always fail. Cole, et al. 
Expires December 18, 2009 [Page 19] Internet-Draft Robust Management June 2009

   NETWORK DIAGRAM:
   ----------------

                +---------------+
                | O     O    O  |
   Client----->|O          O   |
                |               |
                | O   O   O   O |
                |       O       |
                |    O      O   |
                |  O           O|------>Server
                |      OO       |
                +---------------+
                       DTN

   CLIENT/SERVER INTERACTIONS:
   ---------------------------

   Client           Bundle Delivery          Server(X)
   ------           ---------------          ---------
   <commit> w/confirm
                    (long delivery delay)
   +------------------------->
                                             +Changes configuration
                                              (but has no current
                                              communication to Client).
                                             +Sets timer.
                                             |
   +Cannot execute                           |
    connectivity tests.                      |
                                             |
   +Cannot confirm changes,                  |
    will always fail.                        |
                                             |
                                            ---
                                             +Stops timer.
                                             +Backs out of
                                              configuration change.

   Figure 7

A.4. Use Case D: Dual Homing

In this use case, the Server is dual homed over two different ISPs, A and B. The link to ISP B is currently the primary router path between the Server and Client. The two ISPs are very protective of the specifics of their internal networks and block all attempts by external devices to probe the internals of their networks, e.g., pings and traceroutes are blocked. The Client issues a configuration change to the Server via a commit with the confirm capability. The new configuration is flawed and causes the Server to lose connectivity over the backup link_a path. The Client performs connectivity tests to the Server, which succeed due to the presence of the primary path over link_b. The Client issues the confirming commit and the Server commits to the current configuration. Sometime later, link_b fails; the Server becomes totally disconnected and the Client cannot access the Server to fix it.
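The dual-homing failure can be illustrated with a short Python sketch. It is a simulation under stated assumptions, not an implementation of any NETCONF operation: the names verify_links and decide, the link_up table, and the "complete"/"cancel" strings are hypothetical and introduced only for this example. The sketch contrasts what the Client can observe (the Server is reachable over some path) with what the Server can observe by probing each of its own links.

```python
from typing import Callable, Dict


def verify_links(probes: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run one probe per link from the Server's side and record each result."""
    return {link: probe() for link, probe in probes.items()}


def decide(results: Dict[str, bool]) -> str:
    """Hypothetical policy: every link must pass, otherwise cancel the commit."""
    return "complete" if all(results.values()) else "cancel"


# Simulated link state after the flawed configuration has been applied:
# the backup path (link_a) is broken, the primary path (link_b) still works.
link_up = {"link_a": False, "link_b": True}

# Each probe is a stand-in for an active test run by the Server on that link.
probes = {name: (lambda n=name: link_up[n]) for name in link_up}

# The Client only learns that the Server is reachable over some path.
client_view_ok = any(link_up.values())

# The Server tests each link individually and sees the broken backup path.
results = verify_links(probes)
decision = decide(results)

print(client_view_ok, results, decision)
```

In this simulation the Client-side test passes while the Server-side per-link test fails, so a server-side Verification step would lead to the change being backed out before link_b ever fails.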
   NETWORK DIAGRAM:
   ----------------

                Client
                  |
                  |
           +--------------+
           |              |
           |    ISP_C*    |
           |              |
           +--------------+
             |          |
             |          |
   +-----------+      +-----------+
   |           |      |           |
   |  ISP_A*   |      |  ISP_B*   |
   |           |      |           |
   +-----------+      +-----------+
              \        /
         link_a\      /link_b
        (backup)\    /(primary)
                 \  /
                Server
         (enterprise router)

   * ISPs hide/block path information, e.g.,
     hide traceroute information.

   CLIENT/SERVER INTERACTIONS:
   ---------------------------

   Client                                   Server(X)
   ------                                   ---------

   <commit> w/confirm
   (changes cause Server
   to lose connectivity
   over backup link_a)
   +------------------------->              +Changes configuration
                                             (loses connectivity
                                             over link_a, but not
                                             link_b).
                                            +Sets timer.
                                            |
   +Executes                                |
    connectivity tests                      |
    (running over link_b)                   |
   +Connectivity                            |
    confirmed.                              |
                                            |
                                           ---
   <commit>
   +------------------------->              +Verifies configuration.
                                            +Stops timer.

                                            +Link_b fails and Server
                                             loses all connectivity.

                                 Figure 8

Appendix B.  Appendix B: Network-wide Upgrades

   One further point concerns network versus device management and the
   utility of an extensive Validation and Verification capability
   within NETCONF and YANG.  The NETCONF protocol is currently defined
   to provide a set of operations and optional capabilities which
   afford management applications a configuration framework that
   improves on previous capabilities.  Specifically, as described in
   Appendix D of NETCONF RFC 4741 [RFC4741], the following client to
   server procedure is possible within NETCONF:

   1.  Acquire a configuration 'lock' - prevent other applications from
       simultaneously modifying the same sections of the device
       configuration.

   2.  Load configuration update - move the desired new configuration
       to the managed device.

   3.  Verify the configuration (syntax) - perform a syntax check on
       the new configuration code.

   4.  Checkpoint the configuration - save the old configuration in
       case the device needs to back out of the desired changes.

   5.  Change the configuration - move the proposed configuration
       changes over to the running configuration using, e.g., the
       ':confirmed-commit' capability.

   6.  Validate the new configuration - within the time limits set in
       the ':confirmed-commit', the application can perform a set of
       tests, e.g., 'ping', or inferential checks, e.g., pulling
       routing information from the device or its peers, to build some
       confidence in the proposed configuration changes.  If the
       application is not satisfied with the tests and checks
       available to it, it can withhold the 'confirming-commit',
       forcing the device to back out of the desired configuration
       changes.

   7.  Make the changes permanent (if desired).

   8.  Release the configuration 'lock'.

   This represents a significant step forward from a reliance upon
   SNMP for configuration management.  However, further improvements
   are desirable, specifically in the definition and automation of
   tests associated with Step 6 above.  Herein lie our interests and
   the focus of the framework discussion outlined in this document.

   With respect to the above procedure, extensions to network-wide
   configuration changes are limited to a serial repetition of the
   above procedure for each network device.  This may prove awkward
   for large numbers of devices; if one device fails to upgrade its
   configuration, the client has to back out of all previous device
   upgrades serially.  In contrast, an enhanced Validation capability
   and, specifically, an enhanced Verification capability may result
   in improved methods and procedures for network-wide configuration
   updates.  As an example, the following network upgrade procedure
   may be feasible.

   1.  Configure N devices with appropriate configuration changes in
       candidate configuration files.
       Regular YANG 'static' file checking used to make sure first
       will work on each device.

   2.  Issue (#1) to all devices with extra parameters identifying the
       master test template to run on each device (if needed).

   3.  Run all the tests according to the template(s) and report the
       results to the client with internal code.

   4.  Servers will issue a pass/fail notification and save a detailed
       report as well.

   5.  Client issuing all these tests waits for notifications or polls
       the agents for the pass/fail (i.e., done) flag.  The NMS can
       let all tests finish or cancel all tests/commits on the first
       failure reported (with a new RPC operation).

   6.  All agents report OK; issue all (#2) to finish robust
       configuration change, or

   7.  Analyze detailed reports from agents that failed to see what
       network/device/bug/other conditions are preventing the test(s)
       from passing.

   This is an opportunity to do some network management, not just
   device management.  Clearly this is an area for further study.

Appendix C.  Appendix C: verify-commit.yang Module

   In this appendix we list the verify-commit.yang model for use in
   conjunction with the robust-netconf capabilities.

   Note: this capability has several prerequisites, including support
   for configuration and notifications.

   module verified-commit {

     namespace "file:///draft-cole-netconf-robust-config-01.txt";

     prefix "vc";

     organization "IETF";

     contact "[add contact info here].";

     description "NETCONF verified commit procedure.";

     revision 2009-06-09 {
       description "Initial version.";
     }

     rpc start-verified-commit {
       description
         "The verified commit procedure is started by invoking
          this operation.

          The NETCONF procedure for the :confirmed-commit
          capability is followed, with the additional semantics:

          * the agent will cancel the verified-commit procedure
            if the <cancel-verified-commit> operation is invoked.
          * the agent will start, monitor, and report the
            verification test(s) during the time interval after
            this operation, and before the 'timeout' interval
            has expired.

          * the agent will complete the verified commit procedure
            if the <complete-verified-commit> operation is invoked
            before the timeout interval has expired.

          * the agent will generate the notification for each
            verification test specified in the 'test-template'
            leaf-list, indicating the result of each
            verification test.

          * the agent will generate the notification at the
            completion of the entire verified commit procedure,
            indicating the final verified commit procedure
            status.";

       input {
         leaf timeout {
           description
             "The time interval the agent must wait before
              reverting the configuration and automatically
              canceling the verified commit procedure.  (Note
              that the verified commit procedure will also be
              automatically canceled if the session that invokes
              this operation is terminated for any reason.)";
           type uint32;
           units seconds;
           default 600;
         }

         leaf-list test-template {
           description
             "Identifies a verification test control entry for
              the agent to use for the verification portion of
              the verified commit procedure.

              The verification test control entry must conform
              to the requirements specified in section X.X, and
              the agent must be capable of starting, monitoring,
              and reporting the results of the verification
              test, as required.

              The agent will also generate the notification, as
              specified for the verification test control entry
              indicated by this parameter.";
           type instance-identifier;
           min-elements 1;
         }
       }
     }

     rpc cancel-verified-commit {
       description
         "Cancel a verified commit procedure already in progress.

          If no verified commit procedure is currently in
          progress, then an 'operation-failed' error is
          generated, and the value 'no-verified-commit' is used
          for the error-app-tag field.

          If the verified commit procedure in progress cannot be
          canceled for any reason, then an 'operation-failed'
          error is returned, and the value 'cancel-failed' is
          used in the error-app-tag field.

          If any verification tests associated with this
          verified commit procedure are still in progress, they
          will be canceled by this operation.

          If the verified commit procedure in progress is
          canceled, then the agent will return <ok/>.";
     }

     rpc complete-verified-commit {
       description
         "Complete a verified commit procedure already in
          progress.

          If no verified commit procedure is currently in
          progress, then an 'operation-failed' error is
          generated, and the value 'no-verified-commit' is used
          for the error-app-tag field.

          If the verified commit procedure in progress cannot be
          completed for any reason, then an 'operation-failed'
          error is returned, and the value 'complete-failed' is
          used in the error-app-tag field.

          If any verification tests associated with this
          verified commit procedure are still in progress, they
          will be canceled by this operation.

          If the verified commit procedure in progress is
          completed, then the agent will return <ok/>.";
     }

     notification verifiedCommitStatus {
       description
         "Contains the current or final status of a verification
          test being invoked on behalf of the current verified
          commit procedure.";

       leaf testIdentifier {
         description
           "Indicates which verification test this status report
            is associated with.  This value will identify the
            same node as specified in a 'test-template'
            parameter instance provided in the
            <start-verified-commit> operation.";
         type instance-identifier;
         mandatory true;
       }

       leaf statusType {
         description
           "Indicates the type of status report that this
            notification contains.";
         type enumeration {
           enum partial {
             description
               "Indicates this is a partial status result and
                the verification test is still in progress.";
           }
           enum final {
             description
               "Indicates this is the final status result and
                the verification test is completed or canceled.";
           }
         }
         mandatory true;
       }

       leaf status {
         description
           "Indicates the NETCONF error-tag value most closely
            associated with the test status.  The string 'ok' is
            used to indicate that no errors have been detected.";
         type string;
         reference "RFC 4741bis, Appendix A";
         mandatory true;
       }

       anyxml extendedStatus {
         description
           "Indicates test-specific status data.  The
            requirements for verification tests (section X.X)
            describe how the semantics of this structure are
            determined.";
       }
     }

     notification verificationTestComplete {
       description
         "Contains the final status of the current verified
          commit test procedure.";

       leaf status {
         description
           "Indicates the NETCONF error-tag value most closely
            associated with the test status.  The string 'ok' is
            used to indicate that no errors have been detected.";
         type string;
         reference "RFC 4741bis, Appendix A";
         mandatory true;
       }
     }
   }

                                 Figure 9

Appendix D.  Appendix D: Example ping.yang Module

   In this appendix we list an example ping.yang model for use in
   conjunction with the robust-netconf capabilities.

   Specifically, the verified-commit operation passes the
   instance-identifier parameter.  That leaf identifies the entry of
   the specific ping test being requested.

     3600
     /at:ping/at:pingEntry[at:pingControlIndex=42]
     true

                                 Figure 10

   =========Contents of "ping.yang"==================

   module ping {

     namespace "unassigned";
     prefix "at";

     import ietf-yang-types {
       prefix yang;
     }
     import ietf-inet-types {
       prefix inet;
     }

     organization "IETF";

     contact
       "Andy Bierman
        Netconf Central, Inc.
        Email: andy@netconfcentral.com

        Robert G. Cole
        Johns Hopkins University/Applied Physics Lab
        Email: rgcole01@comcast.net

        Dan Romascanu
        Avaya
        Email: dromasca@avaya.com";

     description
       "The module for entities implementing the ping test.";

     revision 2009-05-26 {
       description "Initial revision.";
     }

     leaf test-reference {
       type string;
       config false;
       description
         "URL for the definition of this test";
     }

     list pingEntry {
       key "pingControlIndex";
       config true;

       leaf pingControlIndex {
         type uint32;
         description
           "Identifies the specific control table row of the
            ping test template to be executed, which represents
            the verification tests to be performed on the device
            as part of the verified commit operation.";
       }

       leaf dstAddr {
         type inet:ip-address;
         description
           "Identifies the destination address in the packet
            header of the ping message.";
       }

       leaf srcAddr {
         type inet:ip-address;
         description
           "Identifies the source address in the packet header
            of the ping message.";
       }

       leaf number {
         type uint32;
         description
           "The number of ping packets to be sent.";
       }

       leaf spacing {
         type uint32;
         description
           "The number of seconds between sending subsequent
            ping packets.";
       }

       leaf threshold {
         type uint32;
         description
           "The minimum number of successful ping packets
            required to consider the test a success.";
       }

       leaf startTime {
         type yang:date-and-time;
         config false;
         description
           "The time the first ping packet was sent for the
            previous test.  This is set each time the test is
            initiated from a client.  When this value is reset,
            the value of the 'result' node is set to
            'indeterminant'.";
       }

       leaf result {
         type enumeration {
           enum indeterminant {
             description
               "Set to 'indeterminant' upon the initiation of a
                test.";
           }
           enum success {
             description
               "Set to 'success' if the number of successful
                pings exceeded the 'threshold'.";
           }
           enum failure {
             description
               "Set to 'failure' if the number of successful
                pings is less than or equal to the 'threshold'.";
           }
         }
         config false;
         description
           "The result of the previous test.";
       }
     }
   }

                                 Figure 11

Appendix E.  Appendix E: Existing Capabilities

   In this appendix we identify existing protocol capabilities which
   may play a role in extending NETCONF Validation and Verification
   capabilities and specifications for improved configuration
   management.  This is by no means meant to be an exhaustive,
   all-inclusive list.  It is merely intended to reinforce this
   proposal and to give an appreciation of potential mechanisms
   currently available in other contexts.

E.1.  NETCONF Capabilities

   Here we highlight existing NETCONF mechanisms associated with
   Validation checking and Verification testing of configuration
   changes prior to committing to those changes.  We conclude this
   section with a list of potential extensions to NETCONF which may be
   necessary to accomplish improved configuration management.

   The NETCONF protocol is a new tool for configuration management
   over IP networks.  The NETCONF protocol currently supports a set of
   configuration operations.

   NETCONF servers can advertise capabilities upon initial session
   establishment.  One capability is the ':validate' capability.  When
   implementing the ':validate' capability, the server "checks at
   least for syntax error ..." (reference NETCONF).
   This level of checking can be tied directly to the <edit-config>
   operation through the <edit-config> test-option 'test-then-set',
   if the server advertises the :validate capability (NETCONF sect
   8.6).  This forces the server to perform syntax checking during
   the <edit-config> operation.  We describe this as Validation
   checking made against non-running configuration code.

   However, NETCONF and YANG do not fully define this 'Validation'
   capability.  Currently only limited syntax checking is defined.
   YANG proposes to extend this capability by adding 'constraints'
   checking through the definition of XPath relationships within the
   server management model.  We propose that further useful
   extensions be included to cover more general
   cross-management-model relationships, a.k.a. 'Validation'
   statements.

   The ':writable-running' capability allows the <edit-config>
   operation to define the <running> configuration to be the target.
   However, in this case, we believe that the checks are to be
   performed prior to copying the proposed configuration to the
   <running> configuration.  Hence, we still maintain that this is
   Validation.

   A further NETCONF capability is the ':confirmed-commit'
   capability.  This allows the client to instruct the server,
   through the optional <commit> operation's parameters 'confirmed'
   and 'confirm-timeout', to run the desired configuration changes
   for a period of time, until it either receives a 'confirming
   commit' from the client and commits, or times out and reverts back
   to the prior configuration.  This gives the client time to perform
   an unspecified set of Verification tests to build confidence in
   the desired changes prior to instructing the server to commit.
   However, NETCONF does not specify or recommend the tests to be
   performed, nor the success criteria for the tests, nor does it
   specify how the server can actively participate in the test phase
   of the 'commit' and 'confirmed-commit' procedure.
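
   As an informal illustration of the ':confirmed-commit' behavior
   described above, the following Python sketch models the
   server-side timer.  It is a toy model under our own assumptions,
   not part of NETCONF or of this proposal: the class
   ConfirmedCommitServer, its commit() and tick() methods, and the
   dictionary-based configurations are hypothetical names chosen for
   the example.  Only the 600 second default mirrors the
   confirm-timeout default in RFC 4741.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfirmedCommitServer:
    """Toy model (hypothetical) of a ':confirmed-commit' server."""
    running: dict
    candidate: dict = field(default_factory=dict)
    _saved: Optional[dict] = None       # checkpoint of the prior running config
    _deadline: Optional[float] = None   # time at which the rollback fires

    def commit(self, now: float, confirmed: bool = False,
               confirm_timeout: float = 600.0) -> None:
        if confirmed:
            # Checkpoint only once, so a repeated confirmed commit
            # still reverts to the original configuration.
            if self._saved is None:
                self._saved = dict(self.running)
            self.running = dict(self.candidate)
            self._deadline = now + confirm_timeout
        else:
            # A plain (or confirming) commit makes the change permanent.
            self.running = dict(self.candidate)
            self._saved = None
            self._deadline = None

    def tick(self, now: float) -> bool:
        """Advance the clock; return True if a rollback happened."""
        if self._deadline is not None and now >= self._deadline:
            self.running = self._saved
            self._saved = None
            self._deadline = None
            return True
        return False


# Usage: the client never sends the confirming commit, so the
# server reverts once the timeout has elapsed.
server = ConfirmedCommitServer(running={"mtu": 1500})
server.candidate = {"mtu": 9000}
server.commit(now=0.0, confirmed=True, confirm_timeout=120.0)
# ... the client's Verification tests would run in this window ...
server.tick(now=121.0)
print(server.running)  # prints {'mtu': 1500}
```

   In this sketch the server only keeps a timer; the choice of tests
   and the judgment of their outcome remain entirely with the client,
   which is the limitation noted above.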
   The following enhancements are under consideration for improved
   Verification testing and Validation checking of proposed
   configuration code:

   o  Enhance the operation to include a greater set of Validation
      checks on the proposed configuration.  These may include
      specifying tests through reference, i.e., URL, or through
      explicit device models, e.g., constraint checks defined through
      YANG.  This would allow for improved Validation.

   o  Define a new operation.

E.5.  OAM for MPLS Services

   Carriers are actively deploying new metropolitan data services
   based upon 'MPLS Services'.  As with Carrier Class Ethernet
   deployments, new OAM capabilities need to be defined.  Work to
   date primarily involves the definition of requirements for these
   capabilities.  These are discussed in RFC 4377 [RFC4377], RFC 4378
   [RFC4378], RFC 4687 [RFC4687], Y.1710 [Y.1710], and VIGO [VIGO].

   Once defined, we can envision exercising active tests to Verify
   proposed configuration changes to these MPLS-based carrier
   services.  Automatically coupling proposed configuration changes
   to Verification tests relying upon defined OAM active measurements
   of the resulting MPLS service instance will provide a robust
   configuration management capability for carriers while simplifying
   their configuration management Manual Methods and Procedures
   (MMPs).

E.6.  Active Tests for Performance Monitoring

   The IPPM Working Group has developed several measurement protocols
   for active measurements of metrics defined in various IPPM WG
   documents.  Specifically, the One-way Active Measurement Protocol
   (OWAMP) and the Two-way Active Measurement Protocol (TWAMP) are
   defined in RFC 4656 [RFC4656] and RFC 5357 [RFC5357].  These allow
   for the generation of active test measurements for precise
   performance measurements across IP networks.
   These specify the nature of the traffic generation, the collection
   process and the data reduction methods to achieve precise
   performance metrics.  The measurement protocols define their own
   packet formats; hence these protocols are not intended for broad
   continuity tests such as those obtainable through the SSPM-MIB.
   Instead they are developed for precise performance measurements.
   In applications where the impact of configuration changes on
   fine-grained network performance is a concern, methods to
   automatically invoke these types of tests through the NETCONF
   protocol and YANG models become interesting.

Authors' Addresses

   Robert G. Cole
   Johns Hopkins University
   11100 Johns Hopkins Road
   Laurel, MD  20723
   USA

   Phone: +1.443.778.6951
   Email: rgcole01@comcast.net
   URI:   http://www.cs.jhu/~rgcole/


   Dan Romascanu
   Avaya
   Atidim Technology Park, Bldg. #3
   Tel Aviv  61131
   Israel

   Email: dromasca@avaya.com


   Andy Bierman
   Netconf Central
   Simi Valley, CA
   USA

   Email: andy@netconfcentral.com