CATS                                                             Q. An
Internet-Draft                                            Alibaba Group
Intended status: Standards Track                            6 July 2023
Expires: 7 January 2024


              Use Case of Computing-Aware AI large model
                      draft-an-cats-usecase-ai-00

Abstract

   AI models, especially AI large models, have been developed rapidly
   and deployed widely to serve the needs of users and multiple
   industries.  Because AI large models involve mega-scale data and
   parameters and consume large amounts of computing and network
   resources, distributed computing is a natural choice for deploying
   them.  This document describes the key concepts and deployment
   scenarios of AI large models, to demonstrate the necessity of
   jointly considering computing and network resources to meet the
   requirements of AI tasks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 7 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Computing-Aware AI large model
     3.1.  AI large model deployment
       3.1.1.  All-in-cloud deployment
       3.1.2.  Cloud-device co-inference deployment
       3.1.3.  Cloud-edge co-inference deployment
       3.1.4.  Cloud-edge-device co-inference deployment
     3.2.  Why traffic steering is needed
   4.  IANA Considerations
   5.  Security Considerations
   6.  Normative References
   Author's Address

1.  Introduction

   An AI large model is a type of artificial intelligence model that is
   trained on massive amounts of data using deep learning techniques.
   These models are characterized by their large size, high complexity,
   and high computational requirements.  AI large models have become
   increasingly important in various fields, such as natural language
   processing, computer vision, and speech recognition.  Many AI large
   models have been widely adopted in industry applications such as
   search engines and virtual assistants.
   There are usually two types of AI large models: general models and
   customized models.  A general large model can handle multiple tasks
   and domains and has wide applicability and flexibility, but it may
   not perform as well as a customized model on domain-specific tasks.
   A customized model is trained for a specific industry or domain and
   is more focused on solving specific problems, but it may not be
   applicable to other domains.  General models usually involve mega-
   scale parameters, while customized models involve large- or middle-
   scale parameters.

   An AI large model also involves two key phases: training and
   inference.  Training refers to the process of developing an AI model
   by feeding it large amounts of data and optimizing it to learn and
   improve its performance.  During this process, the model is adjusted
   and refined until it achieves high levels of accuracy and predictive
   ability.  Therefore, training has a high demand for accuracy,
   computing, and memory resources.  On the other hand, inference is
   the process of using the trained AI model to make predictions or
   decisions based on new input data.  The model is deployed in a
   production environment where it is given real-world data and makes
   predictions based on that data.  Inference therefore focuses more on
   the balance between computing resources, latency, and power cost.

   There are mainly four types of AI tasks:

   *  Text: text-to-text (conversation), text classification (e.g.,
      sentiment analysis)

   *  Vision: image classification (labeling images), object detection

   *  Audio: speech-to-text, text-to-speech

   *  Multimodal: text-to-image, image-to-text, text-to-video, image-
      to-image, image-to-video, etc.

   Vision, audio, and multimodal tasks often place high demands on both
   network and computing resources.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Computing-Aware AI large model

   This section presents different use cases of AI large model
   deployment, some of which will benefit from the dynamic selection of
   service instances and from traffic steering.

3.1.  AI large model deployment

3.1.1.  All-in-cloud deployment

   Figure 1 shows the all-in-cloud AI model deployment.  The cloud is
   highly suitable for training, but it may have issues for inference.
   The first is latency, especially for delay-sensitive AI
   applications; even if real-time interaction is not needed, high
   latency will affect the user experience.  The second is the high
   cost of ensuring privacy protection in cloud-based inference.

                       Training + Inference
    +------------------------------------------------------+
    |                                                      |
    |                        Cloud                         |
    |                                                      |
    |    +---------------+       +-------------------+     |
    |    | General Model |       | Customized Models |     |
    |    +---------------+       +-------------------+     |
    +--------+-----------------+---------------+-----------+
             |                 |               |
             |                 |               |
        +----+---+        +----+---+       +---+----+
        | Device |        | Device |  ...  | Device |
        +--------+        +--------+       +--------+

                      Figure 1: All-in-Cloud

3.1.2.  Cloud-device co-inference deployment

   Figure 2 shows the cloud-device co-inference deployment.  It can
   achieve low latency because AI inference runs locally on the device.
   However, it may support only a limited set of AI tasks, because in
   most cases only a compressed, pruned model can be deployed on the
   device.  A minimal sketch of this pattern follows Figure 2.

                       Training + Inference
    +------------------------------------------------------+
    |                                                      |
    |                        Cloud                         |
    |                                                      |
    |    +---------------+       +-------------------+     |
    |    | General Model |       | Customized Models |     |
    |    +---------------+       +-------------------+     |
    +--------+-----------------+---------------+-----------+
             |                 |               |
             |                 |               |
        +----+-----+      +----+-----+     +---+------+
        |  Device  |      |  Device  | ... |  Device  |
        | +------+ |      | +------+ |     | +------+ |
        | |Pruned| |      | |Pruned| |     | |Pruned| |
        | |Model | |      | |Model | |     | |Model | |
        | +------+ |      | +------+ |     | +------+ |
        +----------+      +----------+     +----------+
         Inference         Inference        Inference

               Figure 2: Cloud-device co-inference
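   The following is a minimal, hypothetical sketch (in Python, and not
   part of any specification) of the co-inference pattern above: the
   pruned on-device model answers queries it is confident about, and
   everything else is escalated to the full model in the cloud.  The
   names "pruned_model" and "cloud_infer" and the 0.8 threshold are
   illustrative assumptions.

      # Hypothetical sketch of cloud-device co-inference.  The pruned
      # on-device model handles confident predictions locally; hard
      # cases are escalated to the full model in the cloud.

      CONFIDENCE_THRESHOLD = 0.8  # illustrative cut-off

      def co_infer(input_data, pruned_model, cloud_infer):
          # pruned_model: callable returning (label, confidence),
          #               runs locally on the device.
          # cloud_infer:  callable invoking the full cloud model.
          label, confidence = pruned_model(input_data)
          if confidence >= CONFIDENCE_THRESHOLD:
              # Answered on the device: low latency, no round trip.
              return label
          # Hard case: pay the network cost, use the full model.
          return cloud_infer(input_data)

   The design trade-off is the one described above: a higher threshold
   preserves accuracy at the cost of more traffic to the cloud, while a
   lower threshold keeps more inference local.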
3.1.3.  Cloud-edge co-inference deployment

   Figure 3 shows the cloud-edge co-inference AI model deployment.  It
   can achieve low latency because AI inference is deployed close to
   the device, and it places low demands on device resources.  However,
   when handling AI inference tasks, if the traffic load between the
   device and the edge is high, or if the edge computing resource is
   overloaded, traffic steering is needed to ensure the QoS.

                       Training + Inference
    +------------------------------------------------------+
    |                                                      |
    |                        Cloud                         |
    |                                                      |
    |                 +---------------+                    |
    |                 | General Model |                    |
    |                 +---------------+                    |
    +--------------------------+---------------------------+
                               |
                               |
                       Training + Inference
  +----------------------------+-----------------------------+
  | +--------------+    +--------------+    +--------------+ |
  | |     Edge     |    |     Edge     |    |     Edge     | |
  | | +----------+ |    | +----------+ |    | +----------+ | |
  | | |Customized| |    | |Customized| |    | |Customized| | |
  | | |  Models  | |    | |  Models  | |    | |  Models  | | |
  | | +----------+ |    | +----------+ |    | +----------+ | |
  | +--------------+    +--------------+    +--------------+ |
  +----------+-----------------+---------------+-------------+
             |                 |               |
             |                 |               |
        +----+---+        +----+---+       +---+----+
        | Device |        | Device |  ...  | Device |
        +--------+        +--------+       +--------+

                Figure 3: Cloud-edge co-inference

3.1.4.  Cloud-edge-device co-inference deployment

   Figure 4 shows the cloud-edge-device co-inference AI model
   deployment.  It is a more flexible (though also more complex)
   deployment.  It can achieve low latency because AI inference is
   deployed locally or close to the device, and the device can keep
   working when the edge is not available.  Careful consideration is
   needed to ensure that the edge is only used when the trade-offs are
   right.  As in the cloud-edge co-inference deployment, traffic
   steering is needed; a sketch of such a tier-selection decision
   follows Figure 4.

                       Training + Inference
    +------------------------------------------------------+
    |                                                      |
    |                        Cloud                         |
    |                                                      |
    |                 +---------------+                    |
    |                 | General Model |                    |
    |                 +---------------+                    |
    +--------------------------+---------------------------+
                               |
                               |
                       Training + Inference
  +----------------------------+-----------------------------+
  | +--------------+    +--------------+    +--------------+ |
  | |     Edge     |    |     Edge     |    |     Edge     | |
  | | +----------+ |    | +----------+ |    | +----------+ | |
  | | |Customized| |    | |Customized| |    | |Customized| | |
  | | |  Models  | |    | |  Models  | |    | |  Models  | | |
  | | +----------+ |    | +----------+ |    | +----------+ | |
  | +--------------+    +--------------+    +--------------+ |
  +----------+-----------------+---------------+-------------+
             |                 |               |
             |                 |               |
        +----+-----+      +----+-----+     +---+------+
        |  Device  |      |  Device  | ... |  Device  |
        | +------+ |      | +------+ |     | +------+ |
        | |Pruned| |      | |Pruned| |     | |Pruned| |
        | |Model | |      | |Model | |     | |Model | |
        | +------+ |      | +------+ |     | +------+ |
        +----------+      +----------+     +----------+
         Inference         Inference        Inference

             Figure 4: Cloud-edge-device co-inference
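   The following is a minimal, hypothetical sketch (in Python, and not
   part of any specification) of the tier-selection decision referenced
   above.  It assumes each edge site advertises its network latency and
   available computing resource; the names "Site", "latency_ms", and
   "available_compute", and the thresholds, are illustrative
   assumptions.

      # Hypothetical sketch: choosing where to run an inference task
      # in a cloud-edge-device deployment.

      from dataclasses import dataclass
      from typing import List

      @dataclass
      class Site:
          name: str
          latency_ms: float         # network latency from the device
          available_compute: float  # free capacity, 0.0 busy .. 1.0 idle

      def select_tier(latency_budget_ms: float,
                      device_can_run: bool,
                      edges: List[Site],
                      min_compute: float = 0.2) -> str:
          # Prefer an edge site that is within the latency budget and
          # not overloaded; this is where traffic steering applies.
          candidates = [e for e in edges
                        if e.latency_ms <= latency_budget_ms
                        and e.available_compute >= min_compute]
          if candidates:
              # Steer to the least-loaded in-budget edge site.
              return max(candidates,
                         key=lambda e: e.available_compute).name
          if device_can_run:
              # Fall back to the pruned on-device model.
              return "device"
          # Otherwise send the task to the cloud.
          return "cloud"

   For example, with one edge site at 8 ms latency but only 10% free
   compute and another at 15 ms latency with 60% free compute, a task
   with a 20 ms latency budget would be steered to the second, less
   loaded site.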
3.2.  Why traffic steering is needed

   Many AI tasks, such as vision, audio, and multimodal tasks, place
   high demands on both network and computing resources.  Also, it is
   common for the same customized model to be deployed in multiple edge
   sites to achieve load balancing and high reliability.

   The edge sites' computing resources and network information should
   be considered together to make suitable traffic steering decisions.
   For example, if the available computing resource in the nearest edge
   site is low, the traffic of AI tasks should be steered to another
   edge site with more available resources.  Also, if multiple AI tasks
   arrive at an edge site, such as a delay-sensitive task (live
   streaming with an AI-generated avatar) and a delay-tolerant task
   (text-to-image), the delay-tolerant task should be steered to
   another edge site if the nearest edge site's resources are limited.

4.  IANA Considerations

   This document makes no request of IANA.

5.  Security Considerations

   TBD

6.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Author's Address

   Qing An
   Alibaba Group
   China
   Email: anqing.aq@alibaba-inc.com