Joint Exposure of Network and Compute Information for Infrastructure-Aware Service Deployment

Internet-Draft	TODO - Abbreviation	October 2023
Randriamasy, et al.	Expires 25 April 2024

Abstract

Service providers are starting to deploy computing capabilities across the network for hosting applications such as AR/VR, vehicle networks, IoT, and AI training, among others. In these distributed computing environments, information about computing and communication resources is necessary to determine both the proper deployment location of each application and the best server location on which to run it. This information is used by numerous different implementations with different interpretations. This document proposes an initial approach towards a common understanding and exposure scheme for metrics reflecting compute capabilities.

1. Introduction

Operators are starting to deploy distributed computing environments in different parts of the network with the objective of addressing different service needs, including latency, bandwidth, processing capabilities, and storage. This translates into the emergence of a number of data centers (both in the cloud and at the edge) of different sizes (e.g., large, medium, small), characterized by distinct dimensioning of CPU, memory, and storage capabilities, as well as of the bandwidth capacity for forwarding traffic into and out of the corresponding data center.

The proliferation of the edge computing paradigm further increases the potential footprint and heterogeneity of the environments where a function or application can be deployed, resulting in different unit costs for CPU, memory, and storage. This increases the complexity of deciding where a given function or application is best deployed or executed. This decision should be jointly influenced, on the one hand, by the resources available in a given computing environment and, on the other hand, by the capabilities of the network path connecting the traffic source with the destination.

Network- and compute-aware function placement and selection has become of utmost importance in the last decade. The availability of network and compute information is taken for granted by the numerous service providers and bodies that are specifying such placement and selection mechanisms. However, deployments may reach out to data centers running different implementations with different understandings and representations of compute capabilities, which makes smooth operation a challenge. While standardization efforts on the representation and exposure of network capabilities are well advanced, similar efforts on compute capabilities are in their infancy.

This document proposes an initial approach towards a common understanding and exposure scheme for metrics reflecting compute capabilities. It aims to leverage existing IETF work on compute metric definitions to build synergies. It also aims to reach out to IETF working or research groups that would consume such information and have particular requirements.

3. Problem Space and Needs

Visibility and exposure of both (1) network and (2) compute resources to the application is critical to enable the proper functioning of the new class of services arising at the edge (e.g., distributed AI, driverless vehicles, and AR/VR). To understand the problem space and the capabilities that today's protocol interfaces lack for enabling these new services, we focus on the life cycle of a service.

At the edge, compute nodes are deployed near communication nodes (e.g., co-located in a 5G base station) to provide computing services close to users, with the goal of (1) reducing latency, (2) increasing communication bandwidth, (3) enabling privacy/personalization (e.g., federated AI learning), and (4) reducing cloud costs and energy consumption. Services are deployed on the communication and compute infrastructure through a two-phase life cycle that involves first a service deployment stage and then a service selection stage (Figure 1).

 +-------------+      +--------------+      +-------------+
 |             |      |              |      |             |
 |  New        +------>  Service     +------>  Service    |
 |  Service    |      |  Deployment  |      |  Selection  |
 |             |      |              |      |             |
 +-------------+      +--------------+      +-------------+

Figure 1: Service life cycle.

Service deployment. This phase is carried out by the service provider and consists of deploying a new service (e.g., distributed AI training/inference or an XR/AR service) on the communication and compute infrastructure. The service provider needs to properly size the communication and compute resources assigned to this new service to meet the expected user demand. The decision on where the service is deployed and how many resources are requested from the infrastructure depends on the level of QoE that the provider wants to guarantee to the user base. To make a proper deployment decision, the provider must have visibility into the resources available from the infrastructure, including communication resources (e.g., latency and bandwidth) and compute resources (e.g., CPU, GPU, memory, and storage). For instance, running a Large Language Model (LLM) with 175 billion parameters requires a total aggregated memory of 400 GB and 8 GPUs. The service provider needs an interface to query the infrastructure, extract the available compute and communication resources, and decide which subset of resources is needed to run the service.

Service selection. This phase is initiated by the user through a client application that connects to the deployed service. Two main decisions must be made in the service selection stage: compute node selection and path selection. In the compute node selection step, as the service is generally replicated in N locations (e.g., by leveraging a microservices architecture), the application must decide which of the service replicas it connects to. As in the service deployment stage, this decision requires knowledge of the communication and compute resources available at each replica. In the path selection step, the application must decide which path it uses to connect to the service. This decision depends on the communication properties (e.g., bandwidth and latency) of the available paths. As in the service deployment case, the service provider needs an interface to query the infrastructure and extract the available compute and communication resources in order to make informed node and path selection decisions. Ideally, the node and path selection decisions should be jointly optimized, since in general the best end-to-end performance is achieved by taking both decisions into account together. In some cases, however, these decisions may be owned by different players. For instance, in some network environments, the path selection may be decided by the network operator, whereas the node selection may be decided by the application. Even in these cases, it is crucial to have a proper interface (for both the network operator and the service provider) to query the available compute and communication resources from the system.
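
The benefit of deciding node and path together can be illustrated with a minimal Python sketch. It is not part of any specified interface: the `Replica` and `Path` types, the `select_jointly` function, and the idea of summing a per-replica compute delay with a per-path latency are all assumptions made for illustration, using delay as a metric common to both domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of a service replica as exposed to the client."""
    name: str
    compute_delay_ms: float  # assumed: expected processing delay at this replica


@dataclass(frozen=True)
class Path:
    """Hypothetical view of a network path towards one replica."""
    replica: str  # name of the replica this path leads to
    name: str
    latency_ms: float
    bandwidth_mbps: float


def select_jointly(replicas, paths, min_bandwidth_mbps):
    """Return the (replica, path) pair with the lowest end-to-end delay.

    Paths below the bandwidth floor are discarded first. Returns None
    when no candidate pair remains.
    """
    by_name = {r.name: r for r in replicas}
    candidates = [
        (by_name[p.replica], p)
        for p in paths
        if p.replica in by_name and p.bandwidth_mbps >= min_bandwidth_mbps
    ]
    if not candidates:
        return None
    # End-to-end delay is modeled as compute delay plus path latency.
    return min(candidates, key=lambda c: c[0].compute_delay_ms + c[1].latency_ms)


# Illustrative values only: the edge replica is close but heavily loaded.
replicas = [
    Replica(name="edge", compute_delay_ms=40.0),
    Replica(name="cloud", compute_delay_ms=5.0),
]
paths = [
    Path(replica="edge", name="p1", latency_ms=5.0, bandwidth_mbps=100.0),
    Path(replica="cloud", name="p2", latency_ms=30.0, bandwidth_mbps=100.0),
]

replica, path = select_jointly(replicas, paths, min_bandwidth_mbps=50.0)
print(replica.name, path.name)  # cloud p2 (35 ms total versus 45 ms via the edge)
```

With these made-up numbers, choosing the lowest-latency path alone would pick `p1` towards the loaded edge replica, while the joint decision prefers the cloud replica. When path and node selection are owned by different players, each would need the other domain's information exposed to reach the same outcome.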

Table 1 summarizes the problem space, the information that needs to be exposed, and the stakeholders that need this information.

Table 1: Problem space, needs, and stakeholders.
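+----------------------------------+---------------------------+---------------------------------------------+
| Action to take                   | Information needed        | Who needs it                                |
+----------------------------------+---------------------------+---------------------------------------------+
| Service placement                | Compute and communication | Service provider                            |
| Service selection/node selection | Compute                   | Network/service provider and/or application |
| Service selection/path selection | Communication             | Network/service provider and/or application |
+----------------------------------+---------------------------+---------------------------------------------+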
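
The service placement row of Table 1 can be sketched as a simple feasibility check over exposed site information. The following Python fragment is only an illustration under stated assumptions: the `Site` and `Demand` types, their fields, and the `feasible_sites` function are hypothetical, and the site figures are invented. The demand reuses the LLM example from the service deployment discussion (8 GPUs, 400 GB of aggregated memory); the latency bound is an assumed QoE target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical compute and communication view of a candidate site."""
    name: str
    free_gpus: int
    free_memory_gb: int
    latency_ms: float  # assumed: latency between the site and the user base


@dataclass(frozen=True)
class Demand:
    """Hypothetical resource demand of the service to be placed."""
    gpus: int
    memory_gb: int
    max_latency_ms: float


def feasible_sites(sites, demand):
    """Return the sites that satisfy the demand, lowest latency first."""
    fitting = [
        s for s in sites
        if s.free_gpus >= demand.gpus
        and s.free_memory_gb >= demand.memory_gb
        and s.latency_ms <= demand.max_latency_ms
    ]
    return sorted(fitting, key=lambda s: s.latency_ms)


# Demand taken from the LLM example; the 50 ms bound is an assumption.
demand = Demand(gpus=8, memory_gb=400, max_latency_ms=50.0)

# Invented site data for illustration.
sites = [
    Site(name="small-edge", free_gpus=2, free_memory_gb=64, latency_ms=5.0),
    Site(name="metro", free_gpus=8, free_memory_gb=512, latency_ms=20.0),
    Site(name="cloud", free_gpus=64, free_memory_gb=4096, latency_ms=80.0),
]

print([s.name for s in feasible_sites(sites, demand)])  # ['metro']
```

The small edge site fails on compute and the cloud site fails on latency, so neither compute nor communication information alone would have identified the metro site as the only feasible placement.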

4. Guiding Principles

The driving principles for designing an interface to jointly extract network and compute information are as follows:

P1. Leverage metrics across working groups to avoid reinventing the wheel. For instance:

RFC 9439 [I-D.ietf-alto-performance-metrics] leverages IPPM metrics from RFC 7679.
Section 5.2 of [I-D.du-cats-computing-modeling-description] considers delay as a good metric, since it is easy to use in both compute and communication domains. RFC 9439 also defines delay as part of the performance metrics.
Section 6 of [I-D.du-cats-computing-modeling-description] proposes to represent the network structure as graphs, which is similar to the ALTO map services in [RFC7285].

P2. Aim for simplicity, while ensuring the combined efforts don’t leave technical gaps in supporting the full life cycle of service deployment and selection. For instance, the CATS working group is covering path selection from a network standpoint, while ALTO (e.g., [RFC7285]) covers the exposure of network information to the service provider and the client application. However, no effort is currently being pursued to expose compute information to the service provider and the client application for service placement or selection.

9. References

9.1. Normative References

[I-D.du-cats-computing-modeling-description]: Du, Z., Fu, Y., Li, C., Huang, D., and Z. Fu, "Computing Information Description in Computing-Aware Traffic Steering", Work in Progress, Internet-Draft, draft-du-cats-computing-modeling-description-02, 23 October 2023, <https://datatracker.ietf.org/doc/html/draft-du-cats-computing-modeling-description-02>.
[I-D.ietf-alto-performance-metrics]: Wu, Q., Yang, Y. R., Lee, Y., Dhody, D., Randriamasy, S., and L. M. Contreras, "Application-Layer Traffic Optimization (ALTO) Performance Cost Metrics", Work in Progress, Internet-Draft, draft-ietf-alto-performance-metrics-28, 21 March 2022, <https://datatracker.ietf.org/doc/html/draft-ietf-alto-performance-metrics-28>.
[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC7285]: Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S., Previdi, S., Roome, W., Shalunov, S., and R. Woundy, "Application-Layer Traffic Optimization (ALTO) Protocol", RFC 7285, DOI 10.17487/RFC7285, September 2014, <https://www.rfc-editor.org/rfc/rfc7285>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

9.2. Informative References

[I-D.contreras-alto-service-edge]: Contreras, L. M., Randriamasy, S., Ros-Giralt, J., Perez, D. A. L., and C. E. Rothenberg, "Use of ALTO for Determining Service Edge", Work in Progress, Internet-Draft, draft-contreras-alto-service-edge-10, 13 October 2023, <https://datatracker.ietf.org/doc/html/draft-contreras-alto-service-edge-10>.
[I-D.dunbar-cats-edge-service-metrics]: Dunbar, L., Majumdar, K., Mishra, G. S., Wang, H., and H. Song, "5G Edge Services Use Cases", Work in Progress, Internet-Draft, draft-dunbar-cats-edge-service-metrics-01, 6 July 2023, <https://datatracker.ietf.org/doc/html/draft-dunbar-cats-edge-service-metrics-01>.
[NFV-INF]: "ETSI GS NFV-INF 010, v1.1.1, Service Quality Metrics", 1 December 2014, <https://www.etsi.org/deliver/etsi_gs/NFV-INF/001_099/010/01.01.01_60/gs_NFV-INF010v010101p.pdf>.
[NFV-TST]: "ETSI GS NFV-TST 008 V3.3.1, NFVI Compute and Network Metrics Specification", 1 June 2020, <https://www.etsi.org/deliver/etsi_gs/NFV-TST/001_099/008/03.03.01_60/gs_NFV-TST008v030301p.pdf>.
[RFC7666]: Asai, H., MacFaden, M., Schoenwaelder, J., Shima, K., and T. Tsou, "Management Information Base for Virtual Machines Controlled by a Hypervisor", RFC 7666, DOI 10.17487/RFC7666, October 2015, <https://www.rfc-editor.org/rfc/rfc7666>.
