Internet-Draft ASA Guidelines December 2021
Carpenter, et al. Expires 22 June 2022
Workgroup: Network Working Group
Internet-Draft: draft-ietf-anima-asa-guidelines-05
Published: December 2021
Intended Status: Informational
Expires: 22 June 2022
Authors:
  B. E. Carpenter, Univ. of Auckland
  L. Ciavaglia, Rakuten Mobile
  S. Jiang, Huawei Technologies Co., Ltd
  P. Peloso, Nokia

Guidelines for Autonomic Service Agents

Abstract

This document proposes guidelines for the design of Autonomic Service Agents for autonomic networks. Autonomic Service Agents, together with the Autonomic Network Infrastructure, the Autonomic Control Plane and the Generic Autonomic Signaling Protocol constitute base elements of a so-called autonomic networking ecosystem.

Discussion Venue

This note is to be removed before publishing as an RFC.

Discussion of this document takes place on the ANIMA mailing list (anima@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/anima/.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 22 June 2022.

1. Introduction

This document proposes guidelines for the design of Autonomic Service Agents (ASAs) in the context of an Autonomic Network (AN) based on the Autonomic Network Infrastructure (ANI) outlined in the ANIMA reference model [RFC8993]. This infrastructure makes use of the Autonomic Control Plane (ACP) [RFC8994] and the Generic Autonomic Signaling Protocol (GRASP) [RFC8990]. A general introduction to this environment may be found at [IPJ], which also includes explanatory diagrams, and a summary of terminology is in Appendix B.

This document is a contribution to the description of an autonomic networking ecosystem, recognizing that a deployable autonomic network needs more than just ACP and GRASP implementations. Such an autonomic network must achieve management tasks that a Network Operations Center (NOC) cannot readily achieve manually, such as continuous resource optimization or automated fault detection and repair. These tasks, and other management automation goals, are described at length in [RFC7575]. The net result should be significant improvement of operational metrics. To achieve this, the autonomic networking ecosystem must include at least a library of ASAs and corresponding GRASP technical objective definitions. A GRASP objective [RFC8990] is a data structure whose main contents are a name and a value. The value consists of a single configurable parameter or a set of parameters of some kind.

There must also be tools to deploy and oversee ASAs, and integration with existing operational mechanisms [RFC8368]. However, this document focuses on the design of ASAs, with some reference to implementation and operational aspects.

There is a considerable literature about autonomic agents with a variety of proposals about how they should be characterized. Some examples are [DeMola06], [Huebscher08], [Movahedi12] and [GANA13]. However, for the present document, the basic definitions and goals for autonomic networking given in [RFC7575] apply. According to RFC 7575, an Autonomic Service Agent is "An agent implemented on an autonomic node that implements an autonomic function, either in part (in the case of a distributed function) or whole."

ASAs must be distinguished from other forms of software component. They are components of network or service management; they do not in themselves provide services to end users. They do however provide management services to network operators and administrators. For example, the services envisaged for network function virtualisation [RFC8568] or for service function chaining [RFC7665] might be managed by an ASA rather than by traditional configuration tools.

Another example is that an existing script running within a router to locally monitor or configure functions or services could be upgraded to an ASA that could communicate with peer scripts on neighboring or remote routers. A high-level API will allow such upgraded scripts to take full advantage of the secure ACP and the discovery, negotiation and synchronization features of GRASP. Familiar tasks such as configuring an Interior Gateway Protocol (IGP) on neighboring routers or even exchanging IGP security keys could be performed securely in this way. This document mainly addresses issues affecting quite complex ASAs, but the most useful ones may in fact be rather simple developments from existing scripts.

The reference model [RFC8993] for autonomic networks explains further the functionality of ASAs by adding "[An ASA is] a process that makes use of the features provided by the ANI to achieve its own goals, usually including interaction with other ASAs via the GRASP protocol [RFC8990] or otherwise. Of course, it also interacts with the specific targets of its function, using any suitable mechanism. Unless its function is very simple, the ASA will need to handle overlapping asynchronous operations. It may therefore be a quite complex piece of software in its own right, forming part of the application layer above the ANI."

As mentioned, there will certainly be simple ASAs that manage a single objective in a straightforward way and do not need asynchronous operations. In nodes where computing power and memory space are limited, ASAs should run at a much lower frequency than the primary workload, so CPU load should not be a big issue, but memory footprint in a constrained node is certainly a concern. ASAs installed in constrained devices will have limited functionality. In such cases, many aspects of the current document do not apply. However, in the general case, an ASA may be a relatively complex software component that will in many cases control and monitor simpler entities in the same or remote host(s). For example, a device controller that manages tens or hundreds of simple devices might contain a single ASA.

The remainder of this document offers guidance on the design of complex ASAs. Some of the material may be familiar to those experienced in distributed fault-tolerant and real-time control systems.

2. Logical Structure of an Autonomic Service Agent

As mentioned above, all but the simplest ASAs will need to support asynchronous operations. Different programming environments support asynchronicity in different ways. In this document, we use an explicit multi-threading model to describe operations. Alternatives are discussed in connection with the GRASP API in Section 3.3.

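A typical ASA will have a main thread that performs various initial housekeeping actions such as:

  • Registering the ASA with GRASP.
  • Acquiring relevant policy parameters.
  • Registering with GRASP the objectives that it will manage.
  • Launching a self-monitoring thread.
  • Entering its main loop.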

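The logic of the main loop will depend on the details of the autonomic function concerned. Whenever asynchronous operations are required, extra threads may be launched. Examples of such threads include:

  • Repeatedly flooding an objective to the AN, so that any ASA can receive the objective's latest value.
  • Accepting incoming synchronization requests for an objective managed by this ASA.
  • Accepting incoming negotiation requests for an objective managed by this ASA, and then conducting the resulting negotiation with the counterpart ASA.
  • Managing subsidiary non-autonomic devices directly.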

These threads should all either exit after their job is done, or enter a wait state for new work, to avoid wasting system resources.

According to the degree of parallelism needed by the application, some of these threads might be launched in multiple instances. In particular, if negotiation sessions with other ASAs are expected to be long or to involve wait states, the ASA designer might allow for multiple simultaneous negotiating threads, with appropriate use of queues and locks to maintain consistency.

The main loop itself could act as the initiator of synchronization requests or negotiation requests, when the ASA needs data or resources from other ASAs. In particular, the main loop should watch for changes in policy parameters that affect its operation. It should also do whatever is required to avoid unnecessary resource consumption, for example by limiting its frequency of execution.

The self-monitoring thread is of considerable importance. Failure of an ASA is highly undesirable. Avoiding failure depends largely on careful coding and testing, with no unhandled error returns or exceptions; if some sort of failure nevertheless occurs, the self-monitoring thread should detect it, fix it if possible, and in the worst case restart the entire ASA.

Appendix C presents some example logic flows in informal pseudocode.
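
As a complement to the pseudocode in Appendix C, the thread structure described in this section might be sketched in Python as follows. This is a minimal illustration only: the class name SketchASA, its methods, and the heartbeat-based stall check are assumptions made for this example and are not part of GRASP or its API. The housekeeping step is a placeholder for registration and policy acquisition.

```python
import threading
import time


class SketchASA:
    """Hypothetical ASA skeleton; none of these names come from the GRASP API."""

    def __init__(self, max_cycles=3, period=0.01):
        self.max_cycles = max_cycles        # main-loop iterations (demo only)
        self.period = period                # pause limiting frequency of execution
        self.stop = threading.Event()
        self.heartbeat = time.monotonic()   # written by main loop, read by monitor
        self.log = []

    def housekeeping(self):
        # Placeholder for: register with GRASP, acquire policy parameters,
        # register the objectives that this ASA manages.
        self.log.append("housekeeping done")

    def monitor(self):
        # Self-monitoring thread: report a main loop that has stopped updating
        # its heartbeat. A real ASA would attempt repair or restart here.
        while not self.stop.wait(self.period):
            if time.monotonic() - self.heartbeat > 50 * self.period:
                self.log.append("main loop stalled")

    def run(self):
        self.housekeeping()
        watcher = threading.Thread(target=self.monitor, daemon=True)
        watcher.start()
        for cycle in range(self.max_cycles):
            self.heartbeat = time.monotonic()
            self.log.append(f"cycle {cycle}")
            time.sleep(self.period)         # avoid unnecessary resource consumption
        self.stop.set()                     # ask the monitor thread to exit
        watcher.join()
        return self.log
```

Calling SketchASA().run() performs the housekeeping step, runs three main-loop cycles under the watch of the monitor thread, and returns the log. The monitor thread exits once its job is done, in line with the guidance above.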

3. Interaction with the Autonomic Networking Infrastructure

3.1. Interaction with the security mechanisms

An ASA by definition runs in an autonomic node. Before any normal ASAs are started, such nodes must be bootstrapped into the autonomic network's secure key infrastructure, typically in accordance with [RFC8995]. This key infrastructure will be used to secure the ACP (next section) and may be used by ASAs to set up additional secure interactions with their peers, if needed.

Note that the secure bootstrap process itself may include special-purpose ASAs that run in a constrained insecure mode.

3.2. Interaction with the Autonomic Control Plane

In a normal autonomic network, ASAs will run as clients of the ACP, which will provide a fully secured network environment for all communication with other ASAs, in most cases mediated by GRASP (next section).

Note that the ACP formation process itself may include special-purpose ASAs that run in a constrained insecure mode.

3.3. Interaction with GRASP and its API

GRASP [RFC8990] is likely to run as a separate process with its API [RFC8991] available in user space. Thus, ASAs may operate without special privilege, unless they need it for other reasons. The ASA's view of GRASP is built around GRASP objectives (Section 5), defined as data structures containing administrative information such as the objective's unique name, and its current value. The format and size of the value is not restricted by the protocol, except that it must be possible to serialise it for transmission in Concise Binary Object Representation (CBOR) [RFC8949], subject only to GRASP's maximum message size as discussed in Section 5.

As discussed in Section 2, GRASP is an asynchronous protocol, and this document uses a multi-threading model to describe operations. In many programming environments, an 'event loop' model is used instead, in which case each thread would be implemented as an event handler called in turn by the main loop. For this case, the GRASP API must provide non-blocking calls and possibly support callbacks. This topic is discussed in more detail in [RFC8991], and other asynchronicity models are also possible. Whenever necessary, the GRASP session identifier will be used to distinguish simultaneous operations.
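
The event loop alternative can be illustrated with Python's asyncio module. In this sketch, each coroutine stands in for one of the threads of Section 2 and yields control back to the loop instead of blocking. The handler names and the recorded tuples are assumptions made for this example; no GRASP calls are made.

```python
import asyncio


async def flood_handler(results, rounds):
    # Stands in for a thread that repeatedly floods an objective.
    for i in range(rounds):
        results.append(("flood", i))
        await asyncio.sleep(0)      # yield to the event loop (non-blocking)


async def sync_handler(results, rounds):
    # Stands in for a thread that answers synchronization requests.
    for i in range(rounds):
        results.append(("sync", i))
        await asyncio.sleep(0)


async def main_loop():
    results = []
    # Both handlers make progress concurrently within a single OS thread.
    await asyncio.gather(flood_handler(results, 2), sync_handler(results, 2))
    return results


events = asyncio.run(main_loop())
```

With this design, no locks are needed around the shared results list, because only one handler runs at any instant. The cost is that every handler must avoid blocking calls, which is why the GRASP API needs non-blocking variants in this model.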

The GRASP API should offer the following features:

  • Registration functions, so that an ASA can register itself and the objectives that it manages.
  • A discovery function, by which an ASA can discover other ASAs supporting a given objective.
  • A negotiation request function, by which an ASA can start negotiation of an objective with a counterpart ASA. With this, there is a corresponding listening function for an ASA that wishes to respond to negotiation requests, and a set of functions to support negotiating steps. Once a negotiation starts, it is a symmetric process with both sides sending successive objective values to each other until agreement is reached (or the negotiation fails).
  • A synchronization function, by which an ASA can request the current value of an objective from a counterpart ASA. With this, there is a corresponding listening function for an ASA that wishes to respond to synchronization requests. Unlike negotiation, synchronization is an asymmetric process in which the listener sends a single objective value to the requester.
  • A flood function, by which an ASA can cause the current value of an objective to be flooded throughout the AN so that any ASA can receive it.

For further details and some additional housekeeping functions, see [RFC8991].

The GRASP API is intended to support the various interactions expected between most ASAs, such as the interactions outlined in Section 2. However, if ASAs require additional communication between themselves, they can do so using any desired protocol, such as a TLS session over the ACP if that meets their needs. One option is to use GRASP discovery and synchronization as a rendezvous mechanism between two ASAs, passing communication parameters such as a TCP port number via GRASP. As noted above, the ACP should be used to secure such communications.

3.4. Interaction with policy mechanisms

At the time of writing, the policy mechanisms for the ANI are undefined. In particular, the use of declarative policies (also known as Intents) for the definition and management of ASA behaviors remains a research topic [I-D.irtf-nmrg-ibn-concepts-definitions].

In the cases where ASAs are defined as closed control loops, the specifications defined in [ZSM009-1] regarding imperative and declarative goal statements may be applicable.

In the ANI, policy dissemination is expected to operate by an information distribution mechanism (e.g. via GRASP [RFC8990]) that can reach all autonomic nodes, and therefore every ASA. However, each ASA must be capable of operating "out of the box" in the absence of locally defined policy, so every ASA implementation must include carefully chosen default values and settings for all policy parameters.
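
One possible way to meet the "out of the box" requirement is to keep all policy parameters in a single structure whose fields carry defaults, and to overlay any policy values that arrive later. The sketch below assumes three hypothetical parameters (poll_interval_s, max_prefix_len, retry_pause_s); they are illustrative only and do not come from any specification.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy parameters with built-in default values.
    poll_interval_s: float = 60.0
    max_prefix_len: int = 64
    retry_pause_s: float = 5.0


def apply_policy(current, received):
    """Overlay received values on `current`.

    Unknown keys and values of the wrong type are ignored, so a malformed
    policy update can never remove a working default.
    """
    names = {f.name for f in fields(current)}
    updates = {}
    for key, value in received.items():
        if key not in names:
            continue                                  # unknown parameter
        expected = type(getattr(current, key))
        if isinstance(value, expected) and not isinstance(value, bool):
            updates[key] = value
    return replace(current, **updates)
```

For example, apply_policy(Policy(), {"poll_interval_s": 10.0, "unknown": 1}) yields a policy with a 10-second poll interval and the default values for everything else.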

4. Interaction with Non-Autonomic Components

An ASA, to have any external effects, must also interact with non-autonomic components of the node where it is installed. For example, an ASA whose purpose is to manage a resource must interact with that resource. An ASA whose purpose is to manage an entity that is already managed by local software must interact with that software. For example, if such management is performed by NETCONF [RFC6241], the ASA must interact with the NETCONF server as an independent NETCONF client in the same node to avoid any inconsistency between configuration changes delivered via NETCONF and configuration changes made by the ASA.

In an environment where systems are virtualized and specialized using techniques such as network function virtualization or network slicing, there will be a design choice whether ASAs are deployed once per physical node or once per virtual context. A related issue is whether the ANI as a whole is deployed once on a physical network, or whether several virtual ANIs are deployed. This aspect needs to be considered by the ASA designer.

5. Design of GRASP Objectives

The general rules for the format of GRASP objectives, their names, and IANA registration are given in [RFC8990]. Additionally, that document discusses various general considerations for the design of objectives, which are not repeated here. However, note that the GRASP protocol, like HTTP, does not provide transactional integrity. In particular, steps in a GRASP negotiation are not idempotent. The design of a GRASP objective and the logic flow of the ASA should take this into account. One approach, which should be used when possible, is to design objectives with idempotent semantics. If this is not possible, typically if an ASA is allocating part of a shared resource to other ASAs, it needs to ensure that the same part of the resource is not allocated twice. The easiest way is to run only one negotiation at a time. If an ASA is capable of overlapping several negotiations, it must avoid interference between these negotiations.
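
As an illustration of idempotent semantics for a shared resource, the following sketch keys each grant by a request identifier, so that a repeated request returns the same block rather than a second one. The class PrefixPool and the use of a string identifier are assumptions made for this example; a lock prevents interference between overlapping negotiations.

```python
import threading


class PrefixPool:
    """Hypothetical shared resource: a pool of numbered blocks."""

    def __init__(self, blocks):
        self._free = list(blocks)
        self._granted = {}              # request id -> block already granted
        self._lock = threading.Lock()   # serializes overlapping negotiations

    def allocate(self, request_id):
        with self._lock:
            if request_id in self._granted:
                # Repeated request: return the earlier grant (idempotent).
                return self._granted[request_id]
            if not self._free:
                return None             # nothing left to allocate
            block = self._free.pop(0)
            self._granted[request_id] = block
            return block
```

With a pool of blocks 10 and 11, two calls to allocate("s1") both return 10, a call to allocate("s2") returns 11, and a further request from a new identifier returns None.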

Negotiations will always end, normally because one end or the other declares success or failure. If this does not happen, either a timeout or exhaustion of the loop count will occur. The definition of a GRASP objective should describe a specific negotiation policy if it is not self-evident.
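
The termination conditions can be illustrated by a toy negotiation in which both sides are simulated in one function. The negotiation policy used here (the requester accepts a counter-offer of at least half its request) and the default loop count are assumptions chosen for the example, not values taken from GRASP.

```python
def negotiate(requested, available, loop_count=6):
    """Toy negotiation over a quantity of some resource.

    The requester asks for `requested` units; the responder holds
    `available` units. Returns a (result, value) tuple.
    """
    offer = requested
    for _ in range(loop_count):
        # Responder's step: accept if the offer can be satisfied,
        # otherwise counter-offer everything it has.
        if offer <= available:
            return ("success", offer)
        counter = available
        # Requester's step: accept the counter-offer only if it is at
        # least half of the original request, else decline.
        if counter * 2 >= requested:
            offer = counter
        else:
            return ("declined", None)
    # Neither side declared success or failure in time.
    return ("loop count exhausted", None)
```

Here negotiate(8, 5) ends in success with 5 units after one counter-offer, negotiate(8, 2) is declined, and negotiate(8, 5, loop_count=1) ends by exhaustion of the loop count.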

GRASP allows a 'dry run' mode of negotiation, where a negotiation session follows its normal course but is not committed at either end until a subsequent live negotiation session. If 'dry run' mode is defined for the objective, its specification, and every implementation, must consider what state needs to be saved following a dry run negotiation, such that a subsequent live negotiation can be expected to succeed. It must be clear how long this state is kept, and what happens if the live negotiation occurs after this state is deleted. An ASA that requests a dry run negotiation must take account of the possibility that a successful dry run is followed by a failed live negotiation. Because of these complexities, the dry run mechanism should only be supported by objectives and ASAs where there is a significant benefit from it.
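
The state-keeping issues raised by dry run mode might be handled as in the following sketch, in which a successful dry run places a time-limited hold on capacity. The class DryRunResponder, the hold duration, and the injectable clock are all assumptions made for this example.

```python
import time


class DryRunResponder:
    """Hypothetical responder that holds capacity after a dry run."""

    def __init__(self, capacity, hold_seconds=30.0, clock=time.monotonic):
        self.capacity = capacity
        self.hold_seconds = hold_seconds    # how long dry-run state is kept
        self.clock = clock
        self.holds = {}                     # peer -> (amount, expiry time)

    def _held_for_others(self, peer):
        now = self.clock()
        # Delete expired holds, then total what is reserved for other peers.
        self.holds = {p: h for p, h in self.holds.items() if h[1] > now}
        return sum(h[0] for p, h in self.holds.items() if p != peer)

    def negotiate(self, peer, amount, dry_run):
        free = self.capacity - self._held_for_others(peer)
        if amount > free:
            return False
        if dry_run:
            # Save state so that a subsequent live negotiation can succeed.
            self.holds[peer] = (amount, self.clock() + self.hold_seconds)
        else:
            self.holds.pop(peer, None)
            self.capacity -= amount         # commit the allocation
        return True
```

If a peer's hold expires before its live negotiation begins, another peer may take the capacity, and the live negotiation then fails even though the dry run succeeded. The requesting ASA must be prepared for this outcome.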

The actual value field of an objective is limited by the GRASP protocol definition to any data structure that can be expressed in Concise Binary Object Representation (CBOR) [RFC8949]. For some objectives, a single data item will suffice; for example an integer, a floating point number or a UTF-8 string. For more complex cases, a simple tuple structure such as [item1, item2, item3] could be used. Since CBOR is closely linked to JSON, it is also rather easy to define an objective whose value is a JSON structure. The formats acceptable by the GRASP API will limit the options in practice. A generic solution is for the API to accept and deliver the value field in raw CBOR, with the ASA itself encoding and decoding it via a CBOR library.
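
To show what the raw CBOR form of a simple value looks like, the sketch below hand-encodes unsigned integers, UTF-8 strings and arrays following the encoding rules of [RFC8949]. It is deliberately incomplete (no negative integers, maps, floating point or tags); a real ASA would be expected to use a full CBOR library. The function names are assumptions made for this example.

```python
def cbor_head(major, n):
    """Encode the initial byte and argument for CBOR major type `major`."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for this sketch")


def cbor_encode(value):
    """Encode an unsigned int, a str, or a list/tuple of those."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int) and value >= 0:
        return cbor_head(0, value)              # major type 0: unsigned integer
    if isinstance(value, str):
        data = value.encode("utf-8")
        return cbor_head(3, len(data)) + data   # major type 3: text string
    if isinstance(value, (list, tuple)):
        items = b"".join(cbor_encode(v) for v in value)
        return cbor_head(4, len(value)) + items  # major type 4: array
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For instance, cbor_encode([1, "a", 500]).hex() gives "830161611901f4": an array header for three items, the integer 1, the one-character string "a", and the integer 500 in two-byte form.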

The maximum size of the value field of an objective is limited by the GRASP maximum message size. If the default maximum size specified by [RFC8990] is not enough, the specification of the objective must indicate the required maximum message size, both for unicast and multicast messages.

A mapping from YANG to CBOR is defined by [I-D.ietf-core-yang-cbor]. Subject to the size limit defined for GRASP messages, nothing prevents objectives using YANG in this way.

It is expected that the value field of many objectives will be extended in service, to add additional information. This has consequences for the robustness of ASAs, as discussed in Section 8.

6. Life Cycle

The ASA life cycle was discussed in [I-D.peloso-anima-autonomic-function], from which the following text was derived.

In simple cases, Autonomic functions could be permanent, in the sense that ASAs are shipped as part of a product and persist throughout the product's life. However, in complex cases, a more likely situation is that ASAs need to be installed or updated dynamically, because of new requirements or bugs. This section describes one approach to the resulting life cycle.

Because continuity of service is fundamental to autonomic networking, the process of seamlessly replacing a running instance of an ASA with a new version needs to be part of the ASA's design. The implication of service continuity on the design of ASAs can be illustrated along the three main phases of the ASA life cycle, namely Installation, Instantiation and Operation.


                  +--------------+
Undeployed ------>|              |------> Undeployed
                  |  Installed   |
              +-->|              |---+
     Mandate  |   +--------------+   | Receives a
   is revoked |   +--------------+   |  Mandate
              +---|              |<--+
                  | Instantiated |
              +-->|              |---+
          set |   +--------------+   | set
         down |   +--------------+   | up
              +---|              |<--+
                  |  Operational |
                  |              |
                  +--------------+

Figure 1: Life Cycle of an Autonomic Service Agent
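
The transitions in Figure 1 can also be expressed as a small state machine. In the sketch below, the state names follow the figure, while the event names (install, uninstall, receive_mandate, revoke_mandate, set_up, set_down) are labels chosen for this example.

```python
from enum import Enum


class State(Enum):
    UNDEPLOYED = "undeployed"
    INSTALLED = "installed"
    INSTANTIATED = "instantiated"
    OPERATIONAL = "operational"


# Allowed transitions, one entry per arrow in Figure 1.
TRANSITIONS = {
    (State.UNDEPLOYED, "install"): State.INSTALLED,
    (State.INSTALLED, "uninstall"): State.UNDEPLOYED,
    (State.INSTALLED, "receive_mandate"): State.INSTANTIATED,
    (State.INSTANTIATED, "revoke_mandate"): State.INSTALLED,
    (State.INSTANTIATED, "set_up"): State.OPERATIONAL,
    (State.OPERATIONAL, "set_down"): State.INSTANTIATED,
}


def step(state, event):
    """Return the next state, or raise ValueError for a disallowed event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None
```

Applying install, receive_mandate and set_up in turn moves an ASA from UNDEPLOYED to OPERATIONAL. An attempt to skip a phase, such as set_up from INSTALLED, is rejected.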

6.1. Installation phase

We define "installation" to mean that a piece of software is loaded into a device, along with any necessary libraries, but is not yet activated.

Before being able to instantiate and run ASAs, the operator will first provision the infrastructure with the sets of ASA software corresponding to its needs and objectives. The provisioning of the infrastructure is realized in the installation phase and consists of installing (or checking the availability of) the pieces of software of the different ASAs in a set of Installation Hosts. Installation Hosts may be nodes of an autonomic network, or servers dedicated to storing the software images of the different ASAs.

There are three properties applicable to the installation of ASAs:

  • The dynamic installation property allows installing an ASA on demand, on any hosts compatible with the ASA.
  • The decoupling property allows controlling resources of an autonomic node from a remote ASA, i.e. an ASA installed on a host other than the autonomic node whose resources it controls.
  • The multiplicity property allows controlling multiple sets of resources from a single ASA.

These three properties are very important in the context of the installation phase as their variations condition how the ASA could be installed on the infrastructure.

6.1.1. Installation phase inputs and outputs

Inputs are:

  • [ASA of a given type] specifies which ASAs to install.
  • [Installation_target_Infrastructure] specifies the candidate Installation Hosts.
  • [ASA placement function] specifies how the installation phase will meet the operator's needs and objectives for the provision of the infrastructure. This function is only required in the decoupled mode. It can be as simple as an explicit list of Installation Hosts, or it could consist of operator-defined criteria and constraints.

The main output of the installation phase is a [list of ASAs] installed on [list of Installation Hosts]. This output is also useful for the coordination function where it acts as a static interaction map (see Section 7.1).

The condition for passing to the next phase is that the [list of ASAs] has been correctly installed on the [list of Installation Hosts]. The state of the ASAs at the end of the installation phase is installed (but not instantiated). A minimum set of primitives to support the installation of ASAs could be: install(list of ASAs, Installation_target_Infrastructure, ASA placement function) and uninstall(list of ASAs).

6.2. Instantiation phase

We define "instantiation" as the operation of creating a single ASA instance from the corresponding piece of installed software.

Once the ASAs are installed on the appropriate hosts in the network, these ASAs may start to operate. From the operator viewpoint, an operating ASA means the ASA manages the network resources as per the objectives given. At the ASA local level, operating means executing their control loop algorithm.

But right before that, there are two things to take into consideration. First, there is a difference between (1) having a piece of code available to run on a host and (2) having an agent based on this piece of code running inside the host. Second, in a coupled case, determining which resources are controlled by an ASA is straightforward (the ASA runs on the same autonomic node as the resources it is controlling); in a decoupled mode determining this is a bit more complex: a starting agent will have to either discover the set of resources it ought to control, or such information has to be communicated to the ASA.

The instantiation phase of an ASA covers both these aspects: starting the agent code (when this does not start automatically) and determining which resources have to be controlled (when this is not straightforward).

6.2.1. Operator's goal

Through this phase, the operator wants to control its autonomic network regarding at least two aspects:

  1. determine the scope of autonomic functions by instructing which network resources have to be managed by which autonomic function (and more precisely by which release of the ASA software code, e.g., version number or provider),
  2. determine how the autonomic functions are organized by instantiating a set of ASAs across one or more autonomic nodes and instructing them accordingly about the other ASAs in the set as necessary.

In this phase, the operator may also want to set goals for autonomic functions, e.g., by configuring GRASP objectives.

The operator's goal can be summarized in an instruction to the ANIMA ecosystem matching the following format, explained in detail in the next sub-section:

  • [instances of ASAs of a given type] ready to control [Instantiation_target_Infrastructure] with [Instantiation_target_parameters]

6.2.2. Instantiation phase inputs and outputs

Inputs are:

  • [instances of ASAs of a given type] specifies which ASAs to instantiate.
  • [Instantiation_target_Infrastructure] specifies which resources are to be managed by the autonomic function; this can be the whole network or a subset of it, such as a domain, a physical segment or even a specific list of resources.
  • [Instantiation_target_parameters] specifies which GRASP objectives are to be sent to ASAs (e.g., an optimization target).

Outputs are:

  • [Set of ASAs - Resources relations] describes which resources are managed by which ASA instances. This is not a formal message, but the resulting configuration of a set of ASAs.

6.2.3. Instantiation phase requirements

The instructions described in Section 6.2 could be either:

  • Sent to a targeted ASA. In this case, the receiving Agent will have to manage the specified list of [Instantiation_target_Infrastructure], with the [Instantiation_target_parameters].
  • Broadcast to all ASAs. In this case, the ASAs would collectively determine from the list which Agent(s) would handle which [Instantiation_target_Infrastructure], with the [Instantiation_target_parameters].

These instructions may be grouped as a specific data structure, referred to as an ASA Instance Mandate. The specification of such an ASA Instance Mandate is beyond the scope of this document.

The conclusion of this instantiation phase is a set of ASA instances ready to operate. These ASA instances are characterized by the resources they manage, the metrics being monitored and the actions that can be executed (like modifying certain parameters values). The description of the ASA instance may be defined in an ASA Instance Manifest data structure. The specification of such an ASA Instance Manifest is beyond the scope of this document.

The ASA Instance Manifest does not only serve informational purposes such as acknowledgement of successful instantiation to the operator, but is also necessary for further autonomic operations with:

  • coordinated entities (see Section 7.1)
  • collaborative entities with purposes such as to establish knowledge exchange (some ASAs may produce knowledge or monitor metrics that would be useful for other ASAs)

6.3. Operation phase

During the Operation phase, the operator can:

  • Activate/Deactivate ASAs: enable/disable their autonomic loops.
  • Modify ASA targets: set different technical objectives.
  • Modify the resources managed by ASAs: update the instance mandate to specify a different set of resources to manage (only applicable to decoupled ASAs).

During the Operation phase, running ASAs can interact with other ASAs:

  • in order to exchange knowledge (e.g. an ASA providing traffic predictions to a load balancing ASA)
  • in order to collaboratively reach an objective (e.g., ASAs pertaining to the same autonomic function will collaborate; in the case of a load balancing function, they do so by modifying link metrics according to neighboring resource loads)

During the Operation phase, running ASAs are expected to apply coordination schemes as per Section 7.1.

7. Coordination and Data Models

7.1. Coordination between Autonomic Functions

Some autonomic functions will be completely independent of each other. However, others are at risk of interfering with each other - for example, two different optimization functions might both attempt to modify the same underlying parameter in different ways. In a complete system, a method is needed of identifying ASAs that might interfere with each other and coordinating their actions when necessary. This issue is considered in detail in [I-D.ciavaglia-anima-coordination].

7.2. Coordination with Traditional Management Functions

Some ASAs will have functions that overlap with existing configuration tools and network management mechanisms such as command line interfaces, DHCP, DHCPv6, SNMP, NETCONF, and RESTCONF. This is of course an existing problem whenever multiple configuration tools are in use by the NOC. Each ASA designer will need to consider this issue and how to avoid clashes and inconsistencies. Some specific considerations for interaction with OAM tools are given in [RFC8368]. As another example, [RFC8992] describes how autonomic management of IPv6 prefixes can interact with prefix delegation via DHCPv6. The description of a GRASP objective and of an ASA using it should include a discussion of any such interactions.

7.3. Data Models

Management functions often include a shared data model, quite likely to be expressed in a formal notation such as YANG. This aspect should not be an afterthought in the design of an ASA. To the contrary, the design of the ASA and of its GRASP objectives should match the data model; as noted in Section 5, YANG serialized as CBOR may be used directly as the value of a GRASP objective.

8. Robustness

It is of great importance that all components of an autonomic system are highly robust. Although ASA designers should aim for their component to never fail, it is more important to design the ASA to assume that failures will happen and to gracefully recover from those failures when they occur. Hence, this section lists various aspects of robustness that ASA designers should consider:

  1. If despite all precautions, an ASA does encounter a fatal error, it should in any case restart automatically and try again. To mitigate a loop in case of persistent failure, a suitable pause should be inserted before such a restart. The length of the pause depends on the use case.
  2. If a newly received or calculated value for a parameter falls out of bounds, the corresponding parameter should be either left unchanged or restored to a safe value.
  3. If a GRASP synchronization or negotiation session fails for any reason, it may be repeated after a suitable pause. The length of the pause depends on the use case.
  4. If a session fails repeatedly, the ASA should consider that its peer has failed, and cause GRASP to flush its discovery cache and repeat peer discovery.
  5. In any case, it may be prudent to repeat discovery periodically, depending on the use case.
  6. Any received GRASP message should be checked. If it is wrongly formatted, it should be ignored. Within a unicast session, an Invalid message (M_INVALID) may be sent. This function may be provided by the GRASP implementation itself.
  7. Any received GRASP objective should be checked. Basic formatting errors like invalid CBOR will likely be detected by GRASP itself, but the ASA is responsible for checking the precise syntax and semantics of a received objective. If it is wrongly formatted, it should be ignored. Within a negotiation session, a Negotiation End message (M_END) with a Decline option (O_DECLINE) should be sent. An ASA may log such events for diagnostic purposes.
  8. On the other hand, the definitions of GRASP objectives are very likely to be extended, using the flexibility of CBOR or JSON. Therefore, ASAs should be able to deal gracefully with unknown components within the values of objectives. The specification of an objective should describe how unknown components are to be handled (ignored, logged and ignored, or rejected as an error).
  9. If an ASA receives either an Invalid message (M_INVALID) or a Negotiation End message (M_END) with a Decline option (O_DECLINE), one possible reason is that the peer ASA does not support a new feature of either GRASP or of the objective in question. In such a case the ASA may choose to repeat the operation concerned without using that new feature.
  10. All other possible exceptions should be handled in an orderly way. There should be no such thing as an unhandled exception (but see point 1 above).
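
Points 3, 4, and 10 above can be sketched as a small retry helper. This is only an illustration under stated assumptions: `run_with_retries`, `attempt`, and `discover` are hypothetical names that do not come from GRASP or its API, and a real ASA would choose pause lengths and failure limits to suit its use case.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

def run_with_retries(
    attempt: Callable[[str], Optional[T]],   # hypothetical: runs one session with a peer
    discover: Callable[[], Sequence[str]],   # hypothetical: fresh peer discovery
    max_failures: int = 3,
    pause: float = 1.0,
    max_rounds: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Repeat a failed session after a pause; rediscover peers after repeated failure."""
    for _ in range(max_rounds):
        peers = discover()              # point 4: repeat discovery each round
        if not peers:
            sleep(pause)
            continue
        peer = peers[0]
        for _ in range(max_failures):
            try:
                result = attempt(peer)
            except Exception:           # point 10: handle every exception in an orderly way
                result = None
            if result is not None:
                return result
            sleep(pause)                # point 3: pause before repeating the session
    return None
```

Passing `sleep` as a parameter is a design choice that lets the helper be exercised without real delays.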

At a slightly more general level, ASAs are not services in themselves, but they automate services. This has a fundamental impact on how to design robust ASAs. In general, when an ASA observes a particular state (1) of operations of the services/resources it controls, it typically aims to improve this state to a better state, say (2). Ideally, the ASA is built so that any error it encounters leads back to state (1) rather than to a state (3) that is worse than (1). One example of this principle is "make-before-break", used when routing protocols are reconfigured manually. The same principle can be coded into the operation of an ASA. The GRASP dry run option mentioned in Section 5 is another tool that helps with this design goal, in the form of "test-before-make".
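
One possible way to code the "return to state (1) on error" principle is a snapshot taken before the change and restored on failure. The sketch below assumes, purely for illustration, that the managed state is a Python dictionary; `revert_on_error` is a hypothetical helper and not part of any ASA or GRASP interface.

```python
import copy
from contextlib import contextmanager

@contextmanager
def revert_on_error(config: dict):
    """Restore config to its prior state (1) if moving to state (2) fails."""
    snapshot = copy.deepcopy(config)    # remember state (1)
    try:
        yield config                    # caller attempts the change to state (2)
    except Exception:
        config.clear()                  # discard any partial change, avoiding state (3)
        config.update(snapshot)
        raise                           # the error is still reported to the caller
```

A caller would wrap the change in `with revert_on_error(cfg) as c:` and modify `c` inside the block.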

9. Security Considerations

ASAs are intended to run in an environment that is protected by the Autonomic Control Plane [RFC8994], admission to which depends on an initial secure bootstrap process such as BRSKI [RFC8995]. Such an ACP can provide keying material for mutual authentication between ASAs as well as confidential communication channels for messages between ASAs. In some deployments, a secure partition of the link layer might be used instead. However, this does not relieve ASAs of responsibility for security. When ASAs configure or manage network elements outside the ACP, potentially in a different physical node, they must interact with other non-autonomic software components to perform their management functions. The details are specific to each case, but this has an important security implication. An ASA might act as a loophole by which the managed entity could penetrate the security boundary of the ANI. Thus, ASAs must be designed to avoid loopholes such as passing on executable code, and should if possible operate in an unprivileged mode. In particular, they must use secure coding practices, e.g., carefully validate all incoming information and avoid unnecessary elevation of privilege. This will apply in particular when an ASA interacts with a management component such as a NETCONF server.
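
The advice to validate all incoming information carefully might look like the following in practice. This is a minimal sketch: the parameter names, types, and bounds in `SCHEMA` are invented for illustration, and ignoring unknown components is only one of the policies that the specification of an objective could choose.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema: parameter name -> (type, minimum, maximum)
SCHEMA: Dict[str, Tuple[type, int, int]] = {
    "prefix_length": (int, 32, 64),
    "lifetime": (int, 60, 86400),
}

def validate_params(received: Any, current: Dict[str, int]) -> Dict[str, int]:
    """Return updated parameters, leaving any invalid value unchanged."""
    if not isinstance(received, dict):
        return dict(current)            # wrongly formatted: ignore entirely
    updated = dict(current)
    for name, value in received.items():
        if name not in SCHEMA:
            continue                    # unknown component: ignored in this sketch
        kind, low, high = SCHEMA[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, kind):
            continue
        if low <= value <= high:        # out-of-bounds values are not applied
            updated[name] = value
    return updated
```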

A similar situation will arise if an ASA acts as a gateway between two separate autonomic networks, i.e., it has access to two separate ACPs. Such an ASA must also be designed to avoid loopholes and to validate incoming information from both sides.

As appropriate to their specific functions, ASAs should take account of relevant privacy considerations [RFC6973].

The initial version of the autonomic infrastructure assumes that all autonomic nodes are trusted by virtue of their admission to the ACP. ASAs are therefore trusted to manipulate any GRASP objective, simply because they are installed on a node that has successfully joined the ACP. In the general case, a node may have multiple roles and a role may use multiple ASAs, each using multiple GRASP objectives. Additional mechanisms for the fine-grained authorization of nodes and ASAs to manipulate specific GRASP objectives could be designed. Independently of this, interfaces between ASAs and the router configuration and monitoring services of the node can be subject to authentication that provides more fine-grained authorization for specific services. These additional authentication parameters could be passed to an ASA during its instantiation phase.

10. IANA Considerations

This document makes no request of the IANA.

11. Acknowledgements

Valuable comments were received from Michael Behringer, Menachem Dodge, Martin Dürst, Toerless Eckert, Thomas Fossati, Alex Galis, Bing Liu, Michael Richardson, and Rob Wilton.

12. References

12.1. Normative References

[RFC8949]
Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, <https://www.rfc-editor.org/info/rfc8949>.
[RFC8990]
Bormann, C., Carpenter, B., Ed., and B. Liu, Ed., "GeneRic Autonomic Signaling Protocol (GRASP)", RFC 8990, DOI 10.17487/RFC8990, <https://www.rfc-editor.org/info/rfc8990>.
[RFC8994]
Eckert, T., Ed., Behringer, M., Ed., and S. Bjarnason, "An Autonomic Control Plane (ACP)", RFC 8994, DOI 10.17487/RFC8994, <https://www.rfc-editor.org/info/rfc8994>.
[RFC8995]
Pritikin, M., Richardson, M., Eckert, T., Behringer, M., and K. Watsen, "Bootstrapping Remote Secure Key Infrastructure (BRSKI)", RFC 8995, DOI 10.17487/RFC8995, <https://www.rfc-editor.org/info/rfc8995>.

12.2. Informative References

[DeMola06]
De Mola, F. and R. Quitadamo, "An Agent Model for Future Autonomic Communications", Proceedings of the 7th WOA 2006 Workshop From Objects to Agents 51-59.
[GANA13]
"Autonomic network engineering for the self-managing Future Internet (AFI): GANA Architectural Reference Model for Autonomic Networking, Cognitive Networking and Self-Management.", <http://www.etsi.org/deliver/etsi_gs/AFI/001_099/002/01.01.01_60/gs_afi002v010101p.pdf>.
[Huebscher08]
Huebscher, M. C. and J. A. McCann, "A survey of autonomic computing - degrees, models, and applications", ACM Computing Surveys (CSUR) Volume 40 Issue 3 DOI: 10.1145/1380584.1380585.
[I-D.ciavaglia-anima-coordination]
Ciavaglia, L. and P. Pierre, "Autonomic Functions Coordination", Work in Progress, Internet-Draft, draft-ciavaglia-anima-coordination-01, <https://datatracker.ietf.org/doc/html/draft-ciavaglia-anima-coordination-01>.
[I-D.ietf-core-yang-cbor]
Veillette, M., Petrov, I., Pelov, A., Bormann, C., and M. Richardson, "CBOR Encoding of Data Modeled with YANG", Work in Progress, Internet-Draft, draft-ietf-core-yang-cbor-17, <https://datatracker.ietf.org/doc/html/draft-ietf-core-yang-cbor-17>.
[I-D.irtf-nmrg-ibn-concepts-definitions]
Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", Work in Progress, Internet-Draft, draft-irtf-nmrg-ibn-concepts-definitions-06, <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-ibn-concepts-definitions-06>.
[I-D.peloso-anima-autonomic-function]
Pierre, P. and L. Ciavaglia, "A Day in the Life of an Autonomic Function", Work in Progress, Internet-Draft, draft-peloso-anima-autonomic-function-01, <https://datatracker.ietf.org/doc/html/draft-peloso-anima-autonomic-function-01>.
[IPJ]
Behringer, M., Bormann, C., Carpenter, B. E., Eckert, T., Campos Nobre, J., Jiang, S., Li, Y., and M. C. Richardson, "Autonomic Networking Gets Serious", The Internet Protocol Journal Volume: 24 , Issue: 3, ISSN 1944-1134, Page(s): 2 - 18, <https://ipj.dreamhosters.com/wp-content/uploads/2021/10/243-ipj.pdf>.
[Movahedi12]
Movahedi, Z., Ayari, M., Langar, R., and G. Pujolle, "A Survey of Autonomic Network Architectures and Evaluation Criteria", IEEE Communications Surveys & Tutorials Volume: 14 , Issue: 2 DOI: 10.1109/SURV.2011.042711.00078, Page(s): 464 - 490.
[RFC6241]
Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, <https://www.rfc-editor.org/info/rfc6241>.
[RFC6973]
Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., Morris, J., Hansen, M., and R. Smith, "Privacy Considerations for Internet Protocols", RFC 6973, DOI 10.17487/RFC6973, <https://www.rfc-editor.org/info/rfc6973>.
[RFC7575]
Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A., Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic Networking: Definitions and Design Goals", RFC 7575, DOI 10.17487/RFC7575, <https://www.rfc-editor.org/info/rfc7575>.
[RFC7665]
Halpern, J., Ed. and C. Pignataro, Ed., "Service Function Chaining (SFC) Architecture", RFC 7665, DOI 10.17487/RFC7665, <https://www.rfc-editor.org/info/rfc7665>.
[RFC8368]
Eckert, T., Ed. and M. Behringer, "Using an Autonomic Control Plane for Stable Connectivity of Network Operations, Administration, and Maintenance (OAM)", RFC 8368, DOI 10.17487/RFC8368, <https://www.rfc-editor.org/info/rfc8368>.
[RFC8568]
Bernardos, CJ., Rahman, A., Zuniga, JC., Contreras, LM., Aranda, P., and P. Lynch, "Network Virtualization Research Challenges", RFC 8568, DOI 10.17487/RFC8568, <https://www.rfc-editor.org/info/rfc8568>.
[RFC8991]
Carpenter, B., Liu, B., Ed., Wang, W., and X. Gong, "GeneRic Autonomic Signaling Protocol Application Program Interface (GRASP API)", RFC 8991, DOI 10.17487/RFC8991, <https://www.rfc-editor.org/info/rfc8991>.
[RFC8992]
Jiang, S., Ed., Du, Z., Carpenter, B., and Q. Sun, "Autonomic IPv6 Edge Prefix Management in Large-Scale Networks", RFC 8992, DOI 10.17487/RFC8992, <https://www.rfc-editor.org/info/rfc8992>.
[RFC8993]
Behringer, M., Ed., Carpenter, B., Eckert, T., Ciavaglia, L., and J. Nobre, "A Reference Model for Autonomic Networking", RFC 8993, DOI 10.17487/RFC8993, <https://www.rfc-editor.org/info/rfc8993>.
[ZSM009-1]
"Zero-touch network and Service Management (ZSM); Closed-Loop Automation; Part 1: Enablers", <https://www.etsi.org/deliver/etsi_gs/ZSM/001_099/00901/01.01.01_60/gs_ZSM00901v010101p.pdf>.

Appendix A. Change log

This section is to be removed before publishing as an RFC.

draft-ietf-anima-asa-guidelines-05, 2021-12-20:

draft-ietf-anima-asa-guidelines-04, 2021-11-20:

draft-ietf-anima-asa-guidelines-03, 2021-11-07:

draft-ietf-anima-asa-guidelines-02, 2021-09-13:

draft-ietf-anima-asa-guidelines-01, 2021-06-27:

draft-ietf-anima-asa-guidelines-00, 2020-11-14:

draft-carpenter-anima-asa-guidelines-09, 2020-07-25:

draft-carpenter-anima-asa-guidelines-08, 2020-01-10:

draft-carpenter-anima-asa-guidelines-07, 2019-07-17:

draft-carpenter-anima-asa-guidelines-06, 2018-01-07:

draft-carpenter-anima-asa-guidelines-05, 2018-06-30:

draft-carpenter-anima-asa-guidelines-04, 2018-03-03:

draft-carpenter-anima-asa-guidelines-03, 2017-10-25:

draft-carpenter-anima-asa-guidelines-02, 2017-07-01:

draft-carpenter-anima-asa-guidelines-01, 2017-01-06:

draft-carpenter-anima-asa-guidelines-00, 2016-09-30:

Appendix B. Terminology

This appendix summarises various acronyms and terminology used in the document. Where no other reference is given, please consult [RFC8993] or [RFC7575].

Appendix C. Example Logic Flows

This appendix describes generic logic flows that combine to act as an Autonomic Service Agent (ASA) for resource management. Note that these are illustrative examples, and in no sense requirements. As long as the rules of GRASP are followed, a real implementation could be different. The reader is assumed to be familiar with GRASP [RFC8990] and its conceptual API [RFC8991].

A complete autonomic function for a distributed resource will consist of a number of instances of the ASA placed at relevant points in a network. Specific details will of course depend on the resource concerned. One example is IP address prefix management, as specified in [RFC8992]. In this case, an instance of the ASA will exist in each delegating router.

An underlying assumption is that there is an initial source of the resource in question, referred to here as an origin ASA. The other ASAs, known as delegators, obtain supplies of the resource from the origin, and then delegate quantities of the resource to consumers that request it, and recover it when no longer needed.

Another assumption is that there is a set of network-wide policy parameters, which the origin will provide to the delegators. These parameters will control how the delegators decide how much resource to provide to consumers. Thus, the ASA logic has two operating modes: origin and delegator. When running as an origin, it starts by obtaining a quantity of the resource from the NOC, and it acts as a source of policy parameters, via both GRASP flooding and GRASP synchronization. (In some scenarios, flooding or synchronization alone might be sufficient, but this example includes both.)

When running as a delegator, it starts with an empty resource pool, it acquires the policy parameters by GRASP synchronization, and it delegates quantities of the resource to consumers that request it. Both as an origin and as a delegator, when its pool is low it seeks quantities of the resource by requesting GRASP negotiation with peer ASAs. When its pool is sufficient, it hands out resource to peer ASAs in response to negotiation requests. Thus, over time, the initial resource pool held by the origin will be shared among all the delegators according to demand.
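
The pool behaviour described above (detecting a low pool, handing out resource, and taking it back) could be modelled as follows. This is an illustrative sketch only: it assumes the resource is a simple integer count, and `ResourcePool`, `low_mark`, `take`, and `give_back` are hypothetical names chosen for this example.

```python
import threading

class ResourcePool:
    """A lock-protected counter standing in for a real resource pool."""

    def __init__(self, amount: int, low_mark: int) -> None:
        self._amount = amount
        self._low_mark = low_mark       # assumed threshold for "pool is low"
        self._lock = threading.Lock()   # shared by all threads of the ASA

    def is_low(self) -> bool:
        with self._lock:
            return self._amount < self._low_mark

    def take(self, wanted: int, minimum: int = 1) -> int:
        """Take up to `wanted` units, but never fewer than `minimum`; 0 on failure."""
        with self._lock:
            granted = min(wanted, self._amount)
            if granted < minimum:
                return 0
            self._amount -= granted
            return granted

    def give_back(self, amount: int) -> None:
        with self._lock:
            self._amount += amount
```

Here `take` grants the largest available amount in one step, which has the same effect as reducing the requested amount unit by unit until the request succeeds or a minimum is reached.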

In theory a network could include any number of origins and any number of delegators, with the only condition being that each origin's initial resource pool is unique. A realistic scenario is to have exactly one origin and as many delegators as you like. A scenario with no origin is useless.

An implementation requirement is that resource pools are kept in stable storage. Otherwise, if a delegator exits for any reason, all the resources it has obtained or delegated are lost. If an origin exits, its entire spare pool is lost. The logic for using stable storage and for crash recovery is not included in the pseudocode below, which focuses on communication between ASAs. Since GRASP operations are not intrinsically idempotent, data integrity during failure scenarios is the responsibility of the ASA designer. This is a complex topic in its own right that is not discussed in the present document.
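
As a narrow illustration of keeping a pool in stable storage, the sketch below writes the pool to a temporary file and then renames it over the previous copy, so that a crash leaves either the old or the new version on disk. It assumes the pool can be serialized as JSON; `save_pool` and `load_pool` are hypothetical names. It does not address the wider crash recovery and data integrity questions mentioned above.

```python
import json
import os
import tempfile

def save_pool(path: str, pool: list) -> None:
    """Write the pool to a temporary file, then atomically replace the old copy."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # same directory, so rename stays atomic
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(pool, f)
            f.flush()
            os.fsync(f.fileno())                # push the data to the storage device
        os.replace(tmp, path)                   # atomic replacement of the old file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)                      # do not leave partial files behind
        raise

def load_pool(path: str) -> list:
    """Read the pool back; an absent file is treated as an empty pool."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```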

The description below does not implement GRASP's 'dry run' function. That would require temporarily marking any resource handed out in a dry run negotiation as reserved, until either the peer obtains it in a live run, or a suitable timeout occurs.
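
For readers who want to see what such reservation logic might involve, here is one possible sketch. It assumes an integer resource count and a single-threaded caller (a threaded ASA would also need the pool lock); `ReservingPool`, `dry_run`, and `live_run` are hypothetical names, not GRASP API functions.

```python
import time
from typing import Callable, Dict, Tuple

class ReservingPool:
    def __init__(self, available: int, timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.available = available
        self.timeout = timeout
        self.clock = clock
        self.reserved: Dict[str, Tuple[int, float]] = {}  # peer -> (amount, expiry)

    def _expire(self) -> None:
        """Return timed-out reservations to the available pool."""
        now = self.clock()
        for peer, (amount, expiry) in list(self.reserved.items()):
            if expiry <= now:
                self.available += amount
                del self.reserved[peer]

    def dry_run(self, peer: str, amount: int) -> bool:
        """Mark `amount` as reserved for `peer` until a live run or a timeout."""
        self._expire()
        if peer in self.reserved:       # a new dry run replaces the earlier one
            self.available += self.reserved.pop(peer)[0]
        if amount > self.available:
            return False
        self.available -= amount
        self.reserved[peer] = (amount, self.clock() + self.timeout)
        return True

    def live_run(self, peer: str, amount: int) -> bool:
        """Hand out `amount` for real, using the peer's reservation if any."""
        self._expire()
        held = self.reserved.pop(peer, (0, 0.0))[0]
        self.available += held          # release the reservation, then take the amount
        if amount > self.available:
            return False
        self.available -= amount
        return True
```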

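The main data structures used in each instance of the ASA are the resource_pool, which holds the resources currently available and is protected by a lock, and the delegated_list, which records the resources delegated to consumers.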

Possible main logic flows are below, using a threaded implementation model. The transformation to an event loop model should be apparent: each thread would correspond to one event in the event loop.

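The GRASP objectives are EX1.Resource, which is used to negotiate quantities of the resource, and EX1.Params, which carries the policy parameters by flooding and synchronization.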

In the outline logic flows below, these objectives are represented simply by their names.

MAIN PROGRAM:

Create empty resource_pool (and an associated lock)
Create empty delegated_list
Determine whether to act as origin
if origin:
    Obtain initial resource_pool contents from NOC
    Obtain value of EX1.Params from NOC
Register ASA with GRASP
Register GRASP objectives EX1.Resource and EX1.Params
if origin:
    Start FLOODER thread to flood EX1.Params
    Start SYNCHRONIZER listener for EX1.Params
Start MAIN_NEGOTIATOR thread for EX1.Resource
if not origin:
    Obtain value of EX1.Params from GRASP flood or synchronization
    Start DELEGATOR thread
Start GARBAGE_COLLECTOR thread
good_peer = none
do forever:
    if resource_pool is low:
        Calculate amount A of resource needed
        Discover peers using GRASP M_DISCOVER / M_RESPONSE
        if good_peer in peers:
            peer = good_peer
        else:
            peer = #any choice among peers
        grasp.request_negotiate("EX1.Resource", peer)
        #i.e., send negotiation request
        Wait for response (M_NEGOTIATE, M_END or M_WAIT)
        if OK:
            if offered amount of resource sufficient:
                Send M_END + O_ACCEPT #negotiation succeeded
                Add resource to pool
                good_peer = peer      #remember this choice
            else:
                Send M_END + O_DECLINE #negotiation failed
                good_peer = none      #forget this choice
    sleep() #periodic timer suitable for application scenario

MAIN_NEGOTIATOR thread:

do forever:
    grasp.listen_negotiate("EX1.Resource")
    #i.e., wait for negotiation request
    Start a separate new NEGOTIATOR thread for requested amount A

NEGOTIATOR thread:

Request resource amount A from resource_pool
if not OK:
    while not OK and A > Amin:
        A = A-1
        Request resource amount A from resource_pool
if OK:
    Offer resource amount A to peer by GRASP M_NEGOTIATE
    if received M_END + O_ACCEPT:
        #negotiation succeeded
    elif received M_END + O_DECLINE or other error:
        #negotiation failed
        Return resource amount A to resource_pool
else:
    Send M_END + O_DECLINE #negotiation failed
#thread exits

DELEGATOR thread:

do forever:
    Wait for request or release for resource amount A
    if request:
        Get resource amount A from resource_pool
        if OK:
            Delegate resource to consumer #atomic
            Record in delegated_list      #operation
        else:
            Signal failure to consumer
            Signal main thread that resource_pool is low
    else:
        Delete resource from delegated_list
        Return resource amount A to resource_pool

SYNCHRONIZER thread:

do forever:
    Wait for M_REQ_SYN message for EX1.Params
    Reply with M_SYNCH message for EX1.Params

FLOODER thread:

do forever:
    Send M_FLOOD message for EX1.Params
    sleep() #periodic timer suitable for application scenario

GARBAGE_COLLECTOR thread:

do forever:
    Search resource_pool for adjacent resources
    Merge adjacent resources
    sleep() #periodic timer suitable for application scenario
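
The merge step of the GARBAGE_COLLECTOR thread can be sketched as follows, assuming (purely for illustration) that each resource is a contiguous block written as a `(start, length)` pair of integers. `Block` and `merge_adjacent` are hypothetical names. Real resources such as IP prefixes have alignment rules that this simple version does not model.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, length): an assumed representation of a resource

def merge_adjacent(pool: List[Block]) -> List[Block]:
    """Combine blocks that touch end-to-start into single larger blocks."""
    merged: List[Block] = []
    for start, length in sorted(pool):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_length = merged[-1]
            merged[-1] = (prev_start, prev_length + length)   # extend previous block
        else:
            merged.append((start, length))                    # gap: start a new block
    return merged
```

In a threaded implementation this function would be called while holding the lock associated with resource_pool.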

Authors' Addresses

Brian Carpenter
School of Computer Science
University of Auckland
PB 92019
Auckland 1142
New Zealand
Laurent Ciavaglia
Rakuten Mobile
Paris
France
Sheng Jiang
Huawei Technologies Co., Ltd
Q14 Huawei Campus
156 Beiqing Road
Hai-Dian District
Beijing
100095
China
Pierre Peloso
Nokia
Villarceaux
91460 Nozay
France