Network Working Group                                   S. Randriamasy
Internet-Draft                                          Nokia Bell Labs
Intended status: Informational                          L. M. Contreras
Expires: 5 September 2024                                    Telefonica
                                                           J. Ros-Giralt
                                                   Qualcomm Europe, Inc.
                                                               R. Schott
                                                        Deutsche Telekom
                                                            4 March 2024


       Joint Exposure of Network and Compute Information for
               Infrastructure-Aware Service Deployment
          draft-rcr-opsawg-operational-compute-metrics-02

Abstract

   Service providers are starting to deploy computing capabilities
   across the network for hosting applications such as distributed AI
   workloads, AR/VR, vehicle networks, and IoT, among others.  In this
   network-compute environment, knowing information about the
   availability and state of the underlying communication and compute
   resources is necessary to determine both the proper deployment
   location of the applications and the most suitable servers on which
   to run them.  Further, this information is used by numerous use
   cases with different interpretations.  This document proposes an
   initial approach towards a common understanding and exposure scheme
   for metrics reflecting compute and communication capabilities.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://giralt.github.io/draft-rcr-opsawg-operational-compute-
   metrics/draft-rcr-opsawg-operational-compute-metrics.html.  Status
   information for this document may be found at
   https://datatracker.ietf.org/doc/draft-rcr-opsawg-operational-
   compute-metrics/.

   Source for this draft and an issue tracker can be found at
   https://github.com/giralt/draft-rcr-opsawg-operational-compute-
   metrics.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 September 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Problem Space and Needs
   4.  Use Cases
     4.1.  Distributed AI Workloads
     4.2.  Open Abstraction for Edge Computing
     4.3.  Optimized Placement of Microservice Components
   5.  Production and Consumption Scenarios of Compute-related
       Information
     5.1.  Producers of Compute-Related Information
     5.2.  Consumers of Compute-Related Information
   6.  Metrics Exposure
     6.1.  Edge Resources
     6.2.  Network Resources
     6.3.  Cloud Resources
   7.  Related Work
   8.  Guiding Principles
   9.  Gap Analysis
   10. Security Considerations
   11. IANA Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   Operators are starting to deploy distributed computing environments
   in different parts of the network with the objective of addressing
   different service needs including latency, bandwidth, processing
   capabilities, storage, etc.  This translates into the emergence of a
   number of data centers (both in the cloud and at the edge) of
   different sizes (e.g., large, medium, small) characterized by
   distinct dimensions of CPU, memory, and storage capabilities, as
   well as bandwidth capacity for forwarding the traffic generated in
   and out of the corresponding data center.

   The proliferation of the edge computing paradigm further increases
   the potential footprint and heterogeneity of the environments where
   a function or application can be deployed, resulting in different
   unitary costs per CPU, memory, and storage.  This increases the
   complexity of deciding where a given function or application is best
   deployed or executed.  This decision should be jointly influenced,
   on the one hand, by the available resources in a given computing
   environment and, on the other hand, by the capabilities of the
   network path connecting the traffic source with the destination.

   Network and compute aware function placement and selection has
   become of utmost importance in the last decade.  The availability of
   such information is taken for granted by the numerous service
   providers and bodies that are specifying these mechanisms.  However,
   deployments may reach out to data centers running different
   implementations with different understandings and representations of
   compute capabilities, and smooth operation is a challenge.  While
   standardization efforts on the representation and exposure of
   network capabilities are well advanced, similar efforts on compute
   capabilities are in their infancy.

   This document proposes an initial approach towards a common
   understanding and exposure scheme for metrics reflecting compute
   capabilities.  It aims to leverage existing work in the IETF on
   compute metrics definitions to build synergies.  It also aims to
   reach out to working or research groups in the IETF that would
   consume such information and have particular requirements.
2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Problem Space and Needs

   Visibility and exposure of both (1) network and (2) compute
   resources to the application is critical to enable the proper
   functioning of the new class of services arising at the edge (e.g.,
   distributed AI, driverless vehicles, AR/VR, etc.).  To understand
   the problem space and the capabilities lacking in today's protocol
   interfaces that are needed to enable these new services, we focus on
   the life cycle of a service.

   At the edge, compute nodes are deployed near communication nodes
   (e.g., co-located in a 5G base station) to provide computing
   services close to users, with the goal of (1) reducing latency, (2)
   increasing communication bandwidth, (3) enabling privacy and
   personalization (e.g., federated AI learning), and (4) reducing
   cloud costs and energy consumption.  Services are deployed on the
   communication and compute infrastructure through a two-phase life
   cycle that involves first a _service deployment_ stage and then a
   _service selection_ stage (Figure 1).

   +-------------+      +--------------+      +-------------+
   |             |      |              |      |             |
   |     New     +------>   Service    +------>   Service   |
   |   Service   |      |  Deployment  |      |  Selection  |
   |             |      |              |      |             |
   +-------------+      +--------------+      +-------------+

                    Figure 1: Service life cycle.

   *Service deployment.*  This phase is carried out by the service
   provider and consists of the deployment of a new service (e.g.,
   distributed AI training/inference, an XR/AR service, etc.) on the
   communication and compute infrastructure.  The service provider
   needs to properly size the amount of communication and compute
   resources assigned to this new service to meet the expected user
   demand.  The decision on where the service is deployed and how many
   resources are requested from the infrastructure depends on the
   levels of QoE that the provider wants to guarantee to its user base.
   To make a proper deployment decision, the provider must have
   visibility into the resources available from the infrastructure,
   including communication resources (e.g., latency and bandwidth) and
   compute resources (e.g., CPU, GPU, memory, storage).  For instance,
   to run a Large Language Model (LLM) with 175 billion parameters, a
   total aggregated memory of 400 GB and 8 GPUs are needed.  The
   service provider needs an interface to query the infrastructure,
   extract the available compute and communication resources, and
   decide which subset of resources is needed to run the service.
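
   As a toy illustration of such a query, the following sketch filters
   candidate deployment sites against the LLM requirements above.  The
   inventory structure and its field names are hypothetical assumptions
   made for illustration; they are not an interface defined by this
   document.

      # Hypothetical inventory, as it might be returned by an
      # exposure interface; field names are assumptions.
      sites = [
          {"name": "edge-1",  "free_gpus": 2,  "free_memory_gb": 128},
          {"name": "metro-1", "free_gpus": 8,  "free_memory_gb": 512},
          {"name": "cloud-1", "free_gpus": 64, "free_memory_gb": 4096},
      ]

      llm_req = {"gpus": 8, "memory_gb": 400}

      def feasible_sites(sites, req):
          """Sites whose available resources can host the service."""
          return [s["name"] for s in sites
                  if s["free_gpus"] >= req["gpus"]
                  and s["free_memory_gb"] >= req["memory_gb"]]

      print(feasible_sites(sites, llm_req))  # ['metro-1', 'cloud-1']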

   *Service selection.*  This phase is initiated by the user, through a
   client application that connects to the deployed service.  There are
   two main decisions that must be made in the service selection stage:
   compute node selection and path selection.  In the compute node
   selection step, as the service is generally replicated in N
   locations (e.g., by leveraging a microservices architecture), the
   application must decide which of the service replicas it connects
   to.  As in the service deployment stage, this decision requires
   knowledge about the communication and compute resources available at
   each replica.  In the path selection step, on the other hand, the
   application must decide which path it uses to connect to the
   service.  This decision depends on the communication properties
   (e.g., bandwidth and latency) of the available paths.  As in the
   service deployment case, the service provider needs an interface to
   query the infrastructure and extract the available compute and
   communication resources, with the goal of making informed node and
   path selection decisions.

   It is also important to note that, ideally, the node and path
   selection decisions should be jointly optimized, since in general
   the best end-to-end performance is achieved by taking both decisions
   into account together (a sketch of such a joint decision follows
   Table 1).  In some cases, however, these decisions may be owned by
   different players.  For instance, in some network environments, path
   selection may be decided by the network operator, whereas node
   selection may be decided by the application.  Even in these cases,
   it is crucial to have a proper interface (for both the network
   operator and the service provider) to query the available compute
   and communication resources from the system.

   Table 1 summarizes the problem space, the information that needs to
   be exposed, and the stakeholders that need this information.

   +====================+===============+==========================+
   | Action to take     | Information   | Who needs it             |
   |                    | needed        |                          |
   +====================+===============+==========================+
   | Service placement  | Compute and   | Service provider         |
   |                    | communication |                          |
   +--------------------+---------------+--------------------------+
   | Service selection/ | Compute       | Network/service provider |
   | node selection     |               | and/or application       |
   +--------------------+---------------+--------------------------+
   | Service selection/ | Communication | Network/service provider |
   | path selection     |               | and/or application       |
   +--------------------+---------------+--------------------------+

          Table 1: Problem space, needs, and stakeholders.
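
   The following minimal sketch illustrates the joint decision: it
   enumerates (replica, path) pairs and picks the one minimizing the
   sum of network latency and compute latency.  The metric names and
   candidate sets are assumptions made for illustration, not a
   proposed interface.

      def select_node_and_path(replicas, paths_to):
          """replicas: list of {"name", "compute_latency_ms"}.
          paths_to: {replica_name: [(path_id, net_latency_ms), ...]}.
          Returns (total_ms, replica_name, path_id)."""
          best = None
          for r in replicas:
              for path_id, net_ms in paths_to[r["name"]]:
                  total_ms = net_ms + r["compute_latency_ms"]
                  if best is None or total_ms < best[0]:
                      best = (total_ms, r["name"], path_id)
          return best

      replicas = [{"name": "edge-1", "compute_latency_ms": 20.0},
                  {"name": "cloud-1", "compute_latency_ms": 5.0}]
      paths_to = {"edge-1": [("p1", 5.0)], "cloud-1": [("p2", 40.0)]}
      print(select_node_and_path(replicas, paths_to))
      # (25.0, 'edge-1', 'p1'): the edge replica wins despite its
      # slower compute, because its network path is much shorter.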

4.  Use Cases

4.1.  Distributed AI Workloads

   Generative AI is a technological feat that opens up many
   applications, such as holding conversations, generating art,
   developing a research paper, or writing software, among many others.
   Yet this innovation comes with a high cost in terms of processing
   and power consumption.  While data centers are already running at
   capacity, it is projected that transitioning current search engine
   queries to leverage generative AI will increase costs by 10 times
   compared to traditional search methods [DC-AI-COST].  As (1)
   computing nodes (CPUs and GPUs) are deployed to build the edge cloud
   through technologies like 5G and (2) billions of mobile user devices
   globally provide a large untapped computational platform, shifting
   part of the processing from the cloud to the edge becomes a viable
   and necessary step towards enabling the AI transition.  There are at
   least four drivers supporting this trend:

   *  Computational and energy savings: Due to savings from not needing
      large-scale cooling systems and the high performance-per-watt
      efficiency of edge devices, some workloads can run at the edge at
      a lower computational and energy cost [EDGE-ENERGY], especially
      when considering not only processing but also data transport.

   *  Latency: For applications such as driverless vehicles, which
      require real-time inference at very low latency, running at the
      edge is necessary.

   *  Reliability and performance: Peaks in cloud demand for generative
      AI queries can create large queues and latency, and in some cases
      even lead to denial of service.  In some cases, limited or no
      connectivity requires running the workloads at the edge.

   *  Privacy, security, and personalization: A "private mode" allows
      users to strictly utilize on-device (or near-the-device) AI to
      enter sensitive prompts to chatbots, such as health questions or
      confidential ideas.

   These drivers lead to a distributed computational model that is
   hybrid: some AI workloads will fully run in the cloud, some will
   fully run at the edge, and some will run both at the edge and in the
   cloud.  Being able to efficiently run these workloads in this
   hybrid, distributed, cloud-edge environment is necessary given the
   aforementioned massive energy and computational costs.  To make
   optimized service and workload placement decisions, information
   about both the compute and communication resources available in the
   network is also necessary.

   Consider as an example a large language model (LLM) used to generate
   text and hold intelligent conversations.  LLMs produce a single
   token per inference, where a token is almost equivalent to a word.
   Pipelining and parallelization techniques are used to optimize
   inference, but this means that a model like GPT-3 could potentially
   go through all 175 billion parameters that are part of it to
   generate a single word.  To efficiently run these compute-intensive
   workloads, it is necessary to know the availability of compute
   resources in the distributed system.  Suppose that a user is driving
   a car while conversing with an AI model.  The model can run
   inference on a variety of compute nodes, ordered from lower to
   higher compute power as follows: (1) the user's phone, (2) the
   computer in the car, (3) the 5G edge cloud, and (4) the data center
   cloud.  Correspondingly, the system can deploy four different models
   with different levels of precision and compute requirements.  The
   simplest model, with the fewest parameters, can run on the phone,
   requiring less compute power but yielding lower accuracy.  Three
   other models, ordered by increasing accuracy and computational
   complexity, can run in the car, at the edge, and in the cloud.  The
   application can identify the right trade-off between accuracy and
   computational cost, combined with metrics of communication bandwidth
   and latency, to make the right decision on which of the four models
   to use for every inference request.  Note that this is similar to
   the resolution/bandwidth trade-off commonly found in the image
   encoding problem, where an image can be encoded and transmitted at
   different levels of resolution depending on the available bandwidth
   in the communication channel.  In the case of AI inference, however,
   not only bandwidth is a scarce resource, but also compute.  ALTO
   extensions to support the exposure of compute resources would allow
   applications to make optimized decisions on selecting the right
   computational resource, supporting the efficient execution of hybrid
   AI workloads.
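
   The following sketch illustrates such a model-tier decision: it
   selects the most accurate model whose hosting node is reachable and
   whose estimated end-to-end latency fits the request's budget.  The
   tiers, accuracy scores, and latency figures are invented for
   illustration only.

      # Tiers ordered from most to least capable; latency_ms is an
      # end-to-end estimate combining network and compute latency as
      # derived from exposed metrics (all values are assumptions).
      MODELS = [
          {"tier": "cloud", "accuracy": 0.95, "latency_ms": 180.0},
          {"tier": "edge",  "accuracy": 0.90, "latency_ms": 60.0},
          {"tier": "car",   "accuracy": 0.80, "latency_ms": 25.0},
          {"tier": "phone", "accuracy": 0.70, "latency_ms": 10.0},
      ]

      def pick_model(budget_ms, reachable):
          """Most accurate model that is reachable within budget."""
          for m in MODELS:
              if m["tier"] in reachable and m["latency_ms"] <= budget_ms:
                  return m["tier"]
          return "phone"  # fall back to on-device inference

      print(pick_model(100.0, {"phone", "car", "edge"}))  # 'edge'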

4.2.  Open Abstraction for Edge Computing

   Modern applications such as AR/VR, V2X, or IoT require bringing
   compute closer to the edge in order to meet strict bandwidth,
   latency, and jitter requirements.  While this deployment process
   resembles the path taken by the main cloud providers (notably, AWS,
   Facebook, Google, and Microsoft) to deploy their large-scale data
   centers, the edge presents a key difference: datacenter clouds (both
   in terms of their infrastructure and the applications they run) are
   owned and managed by a single organization, whereas edge clouds
   involve a complex ecosystem of operators, vendors, and application
   providers, all striving to provide a quality end-to-end solution to
   the user.  This implies that, while the traditional cloud has been
   implemented for the most part by using vertically optimized and
   closed architectures, the edge will necessarily need to rely on a
   complete ecosystem of carefully designed open standards to enable
   horizontal interoperability across all the involved parties.  This
   document envisions ALTO playing a role as part of the ecosystem of
   open standards that are necessary to deploy and operate the edge
   cloud.

   As an example, consider a user of an XR application who arrives home
   by car.  The application runs by leveraging compute capabilities
   from both the car and the public 5G edge cloud.  As the user parks
   the car, 5G coverage may diminish (due to building interference),
   making the home's local Wi-Fi connectivity a better choice.
   Further, instead of relying on computational resources from the car
   and the 5G edge cloud, latency can be reduced by leveraging
   computing devices (PCs, laptops, tablets) available from the home
   edge cloud.  The application's decision to switch from one domain to
   another, however, demands knowledge about the compute and
   communication resources available both in the 5G and the Wi-Fi
   domains, therefore requiring interoperability across multiple
   industry standards (for instance, IETF and 3GPP on the public side,
   and IETF and LF Edge [LF-EDGE] on the private home side).  ALTO can
   be positioned to act as an abstraction layer supporting the exposure
   of communication and compute information independently of the type
   of domain the application currently resides in.  Future versions of
   this document will elaborate further on this use case.

4.3.  Optimized Placement of Microservice Components

   Current applications are transitioning from a monolithic service
   architecture towards the composition of microservice components,
   following cloud-native trends.  The set of microservices can have
   associated SLOs, which impose constraints not only in terms of
   required compute resources (CPU, storage, ...) dependent on the
   compute facilities available, but also in terms of performance
   indicators such as latency, bandwidth, etc., which impose
   restrictions on the networking capabilities connecting the computing
   facilities.  Even more complex constraints, such as affinity among
   certain microservice components, can require complex calculations
   for selecting the most appropriate compute nodes, taking into
   consideration both network and compute information, as sketched
   below.  Thus, service/application orchestrators can benefit from the
   information exposed by ALTO at the time of deciding the placement of
   the microservices in the network.
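
   As a toy illustration of such a calculation, the sketch below
   enumerates candidate placements for a pair of microservice
   components and keeps those meeting a CPU requirement and an
   inter-component latency SLO.  The data shapes, the affinity rule,
   and the SLO value are assumptions made for illustration.

      import itertools

      def pair_latency(latency_ms, a, b):
          # Co-located components communicate with negligible latency.
          return 0.0 if a == b else latency_ms.get((a, b), float("inf"))

      def valid_placements(components, nodes, latency_ms, slo_ms):
          """Yield node assignments meeting compute and latency SLOs.
          Simplification: per-component CPU checks only; a real placer
          would also aggregate demands of components sharing a node."""
          for combo in itertools.product(nodes, repeat=len(components)):
              if any(c["cpu_req"] > n["free_cpu"]
                     for c, n in zip(components, combo)):
                  continue
              # Affinity: the first two components must meet the SLO.
              if pair_latency(latency_ms, combo[0]["name"],
                              combo[1]["name"]) > slo_ms:
                  continue
              yield tuple(n["name"] for n in combo)

      components = [{"cpu_req": 2}, {"cpu_req": 4}]
      nodes = [{"name": "edge-1", "free_cpu": 4},
               {"name": "edge-2", "free_cpu": 8}]
      latency_ms = {("edge-1", "edge-2"): 3.0, ("edge-2", "edge-1"): 3.0}
      print(list(valid_placements(components, nodes, latency_ms, 5.0)))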

5.  Production and Consumption Scenarios of Compute-related Information

   It is important to understand the scenarios of production and
   consumption of compute-related information in combination with
   information related to communication.  Leveraging such a combination
   enables resource and workload placement optimization, leading both
   to operational cost reductions for the operator and service provider
   and to an improvement in the service level experienced by the users.

5.1.  Producers of Compute-Related Information

   The information related to compute (i.e., processing capabilities,
   memory, and storage capacity) can be structured in two ways: on one
   hand, the information corresponding to the raw compute resources;
   on the other hand, the information on resources allocated to or in
   use by a specific application or service function.

   The former is typically provided by the management systems enabling
   the virtualization of the physical resources for later assignment to
   processes running on top.  Cloud Managers or Virtual Infrastructure
   Managers are the entities that manage those resources.  These
   management systems offer APIs from which to retrieve the available
   resources in a compute facility.  Thus, it can be expected that
   these APIs can be used for the consumption of such information.
   Once the raw resources are retrieved from the various compute
   facilities, it could be possible to generate topological network
   views of them, as proposed in [I-D.llc-teas-dc-aware-topo-model].

   Regarding the resources allocated to or in use by a specific
   application or service function, two situations apply: the total
   allocation, and the allocation per service or application.  In the
   first case, the information can be supplied by the virtualization
   management systems described before.  For the specific allocation
   per service, it can be expected that the specific management systems
   of the service or application are capable of providing the resources
   being used at run time, typically as part of the allocated ones.  In
   this last scenario, it is also reasonable to expect the availability
   of APIs offering such information, even though they can be
   particular to a service or application.

5.2.  Consumers of Compute-Related Information

   The consumption of compute-related information relates to the
   different phases of the service life cycle.  This means that such
   information can be consumed at different points in time and for
   different purposes.  The expected consumers can be either external
   or internal to the network.

   As external consumers, it is possible to consider external
   application management systems requiring resource availability
   information for service function placement decisions or workload
   migration in the case of consuming raw resources, or requiring
   information on the usage of resources for service assurance or
   service scaling, among others.

   As internal consumers, it is possible to consider network management
   entities requiring a view of the level of resource usage for traffic
   steering (such as the Path Selector in [I-D.ldbc-cats-framework]),
   load balancing, or analytics, among others.
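
   As a simple sketch of how these two kinds of compute information
   could be combined by a consumer, the following derives the available
   headroom per site from the raw capacity reported by a virtualization
   manager and the allocations reported per service.  Both input shapes
   are assumptions made for illustration; actual Cloud Manager and VIM
   APIs differ.

      def available_resources(raw, allocations):
          """raw: {site: {"cpu": cores, "memory_gb": gb}}.
          allocations: list of {"site", "cpu", "memory_gb"}."""
          avail = {site: dict(caps) for site, caps in raw.items()}
          for a in allocations:
              avail[a["site"]]["cpu"] -= a["cpu"]
              avail[a["site"]]["memory_gb"] -= a["memory_gb"]
          return avail

      raw = {"edge-1": {"cpu": 64, "memory_gb": 256}}
      allocations = [{"site": "edge-1", "cpu": 16, "memory_gb": 64}]
      print(available_resources(raw, allocations))
      # {'edge-1': {'cpu': 48, 'memory_gb': 192}}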

6.  Metrics Exposure

   Regarding metrics exposure, one can distinguish the topics of (1)
   how the metrics are exposed and (2) which kinds of metrics need to
   be exposed.  The infrastructure resources can be divided into
   network-related and compute-related resources.  Network-based
   resources can roughly be subdivided, according to the network
   structure, into edge, backbone, and cloud resources.  This section
   intends to give a brief outlook on these resources to stimulate
   additional discussion with related work ongoing in other IETF
   working groups or standardization bodies.

6.1.  Edge Resources

   Edge resources refer to metrics such as latency, bandwidth, compute
   latency, or traffic breakout.

6.2.  Network Resources

   Network resources relate to the traditional network infrastructure.
   The next table provides an overview of some commonly used metrics.

          +=========+==================+
          | Network | Kind of Resource |
          +=========+==================+
          | Path #1 | QoS              |
          +---------+------------------+
          |         | Latency          |
          +---------+------------------+
          |         | Bandwidth        |
          +---------+------------------+
          |         | RTT              |
          +---------+------------------+
          |         | Packet Loss      |
          +---------+------------------+
          |         | Jitter           |
          +---------+------------------+

                      Table 2

6.3.  Cloud Resources

   The next table provides an example of parameters that could be
   exposed:

   +============+=========+================================+
   | CPU        | Compute | Sum of available CPU resources |
   +============+=========+================================+
   | Memory     | Compute | Sum of available memory        |
   +------------+---------+--------------------------------+
   | Storage    | Storage | Sum of available storage       |
   +------------+---------+--------------------------------+
   | Configmaps | Object  | Sum of config maps             |
   +------------+---------+--------------------------------+
   | Secrets    | Object  | Sum of possible secrets        |
   +------------+---------+--------------------------------+
   | Pods       | Object  | Sum of possible pods           |
   +------------+---------+--------------------------------+
   | Jobs       | Object  | Sum of all parallel jobs       |
   +------------+---------+--------------------------------+
   | Services   | Object  | Sum of parallel services       |
   +------------+---------+--------------------------------+

                         Table 3
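
   Several of the parameters in Table 3 map onto quantities that
   cluster managers already expose.  As one hedged example, the sketch
   below reads per-node allocatable capacity with the official
   Kubernetes Python client, assuming a reachable cluster and a local
   kubeconfig; summing these values across nodes would yield
   cluster-level figures like those in the table.

      from kubernetes import client, config

      config.load_kube_config()  # or load_incluster_config() in-cluster
      v1 = client.CoreV1Api()

      for node in v1.list_node().items:
          alloc = node.status.allocatable
          # alloc is a map of strings, e.g.
          # {'cpu': '4', 'memory': '16265456Ki', 'pods': '110', ...}
          print(node.metadata.name, alloc["cpu"], alloc["memory"],
                alloc["pods"])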

7.  Related Work

   Some existing work has explored compute-related metrics.  It can be
   categorized as follows:

   *  References providing raw compute infrastructure metrics:
      [I-D.contreras-alto-service-edge] includes references to cloud
      management solutions (e.g., OpenStack, Kubernetes) which
      administer the virtualization infrastructure, providing
      information about raw compute infrastructure metrics.
      Furthermore, [NFV-TST] describes processor, memory, and network
      interface usage metrics.

   *  References providing compute virtualization metrics: [RFC7666]
      provides several metrics as part of the Management Information
      Base (MIB) definition for managing virtual machines controlled by
      a hypervisor.  The objects defined there make reference to the
      resources consumed by a particular virtual machine serving as
      host for services or applications.  Moreover, [NFV-INF] provides
      metrics associated with virtualized network functions.

   *  References providing service metrics including compute-related
      information: [I-D.dunbar-cats-edge-service-metrics] proposes
      metrics associated with services running on compute
      infrastructures.  Some of these metrics do not depend on the
      infrastructure behavior itself but on where such compute
      infrastructure is topologically located.

8.  Guiding Principles

   The driving principles for designing an interface to jointly extract
   network and compute information are as follows:

   P1.  Leverage metrics across working groups to avoid reinventing the
   wheel.  For instance:

   *  RFC 9439 [I-D.ietf-alto-performance-metrics] leverages IPPM
      metrics from RFC 7679.

   *  Section 5.2 of [I-D.du-cats-computing-modeling-description]
      considers delay a good metric, since it is easy to use in both
      the compute and communication domains.  RFC 9439 also defines
      delay as part of its performance metrics.

   *  Section 6 of [I-D.du-cats-computing-modeling-description]
      proposes to represent the network structure as graphs, which is
      similar to the ALTO map services in [RFC7285] (a toy illustration
      follows this list).
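
   The sketch below combines these two observations: a graph whose
   edges carry network delay and whose nodes carry compute delay, so
   that one shared metric covers both domains.  The structure and the
   numbers are illustrative assumptions, not a proposed encoding.

      # node: (compute_delay_ms, {neighbor: link_delay_ms})
      graph = {
          "client":  (0.0,  {"edge-1": 5.0, "metro-1": 12.0}),
          "edge-1":  (20.0, {"metro-1": 8.0}),
          "metro-1": (6.0,  {}),
      }

      def path_delay(path):
          """Sum link delays along the path, plus the compute delay
          at the serving (final) node."""
          total = sum(graph[a][1][b] for a, b in zip(path, path[1:]))
          return total + graph[path[-1]][0]

      print(path_delay(["client", "edge-1"]))   # 25.0
      print(path_delay(["client", "metro-1"]))  # 18.0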

   P2.  Aim for simplicity, while ensuring the combined efforts do not
   leave technical gaps in supporting the full life cycle of service
   deployment and selection.  For instance, the CATS working group is
   covering path selection from a network standpoint, while ALTO (e.g.,
   [RFC7285]) covers the exposure of network information to the service
   provider and the client application.  However, there is currently no
   effort being pursued to expose compute information to the service
   provider and the client application for service placement or
   selection.

9.  Gap Analysis

   From this related work it is evident that compute-related metrics
   can serve several purposes, ranging from service instance
   instantiation to service instance behavior, and then to service
   instance selection.  Some of the metrics could refer to the same
   object (e.g., CPU) but with a particular usage and scope.  In
   contrast, the network metrics are more uniform and straightforward.

   It is then necessary to consistently define a set of metrics that
   could assist the operation in the different concerns identified so
   far, so that networks and systems can have a common understanding of
   the perceived compute performance.  When combined with network
   metrics, the resulting network plus compute performance behavior
   will assist informed decisions particular to each of the operational
   concerns related to the different parts of a service life cycle.

10.  Security Considerations

   TODO Security

11.  IANA Considerations

   This document has no IANA actions.

12.  References

12.1.  Normative References

   [I-D.du-cats-computing-modeling-description]
              Du, Z., Fu, Y., Li, C., Huang, D., and Z. Fu, "Computing
              Information Description in Computing-Aware Traffic
              Steering", Work in Progress, Internet-Draft,
              draft-du-cats-computing-modeling-description-02,
              23 October 2023.

   [I-D.ietf-alto-performance-metrics]
              Wu, Q., Yang, Y. R., Lee, Y., Dhody, D., Randriamasy, S.,
              and L. M. Contreras, "Application-Layer Traffic
              Optimization (ALTO) Performance Cost Metrics", Work in
              Progress, Internet-Draft,
              draft-ietf-alto-performance-metrics-28, 21 March 2022.

   [I-D.ldbc-cats-framework]
              Li, C., Du, Z., Boucadair, M., Contreras, L. M., and J.
              Drake, "A Framework for Computing-Aware Traffic Steering
              (CATS)", Work in Progress, Internet-Draft,
              draft-ldbc-cats-framework-06, 8 February 2024.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7285]  Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel,
              S., Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
              "Application-Layer Traffic Optimization (ALTO) Protocol",
              RFC 7285, DOI 10.17487/RFC7285, September 2014.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

12.2.  Informative References

   [DC-AI-COST]
              "Generative AI Breaks The Data Center - Data Center
              Infrastructure And Operating Costs Projected To Increase
              To Over $76 Billion By 2028", Forbes, Tirias Research
              Report, 2023.

   [EDGE-ENERGY]
              "Estimating energy consumption of cloud, fog, and edge
              computing infrastructures", IEEE Transactions on
              Sustainable Computing, 2019.

   [I-D.contreras-alto-service-edge]
              Contreras, L. M., Randriamasy, S., Ros-Giralt, J., Perez,
              D. A. L., and C. E. Rothenberg, "Use of ALTO for
              Determining Service Edge", Work in Progress,
              Internet-Draft, draft-contreras-alto-service-edge-10,
              13 October 2023.

   [I-D.dunbar-cats-edge-service-metrics]
              Dunbar, L., Majumdar, K., Mishra, G. S., Wang, H., and H.
              Song, "5G Edge Services Use Cases", Work in Progress,
              Internet-Draft,
              draft-dunbar-cats-edge-service-metrics-01, 6 July 2023.

   [I-D.llc-teas-dc-aware-topo-model]
              Lee, Y., Liu, X., and L. M. Contreras, "DC aware TE
              topology model", Work in Progress, Internet-Draft,
              draft-llc-teas-dc-aware-topo-model-03, 10 July 2023.

   [LF-EDGE]  "Linux Foundation Edge", https://www.lfedge.org/,
              March 2023.

   [NFV-INF]  "ETSI GS NFV-INF 010, v1.1.1, Service Quality Metrics",
              1 December 2014.

   [NFV-TST]  "ETSI GS NFV-TST 008 V3.3.1, NFVI Compute and Network
              Metrics Specification", 1 June 2020.

Acknowledgments

   TODO acknowledge.

Authors' Addresses

   S. Randriamasy
   Nokia Bell Labs
   Email: sabine.randriamasy@nokia-bell-labs.com

   L. M. Contreras
   Telefonica
   Email: luismiguel.contrerasmurillo@telefonica.com

   Jordi Ros-Giralt
   Qualcomm Europe, Inc.
   Email: jros@qti.qualcomm.com

   Roland Schott
   Deutsche Telekom
   Email: Roland.Schott@telekom.de