Internet Research Task Force (IRTF) S. Lee
Internet-Draft ETRI
Intended status: Informational S. Pack
Expires: April 21, 2016 KU
M-K. Shin
ETRI
E. Paik
KT
R. Browne
Intel
October 19, 2015

Resource Management in Service Chaining
draft-irtf-nfvrg-resource-management-service-chain-02

Abstract

This document specifies problem definition and use cases of NFV resource management in service chaining for path optimization, traffic optimization, failover, load balancing, etc. It further describes design considerations and relevant framework for the resource management capability that dynamically creates and updates network forwarding paths (NFPs) considering resource constraints of NFV infrastructure.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 21, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


Table of Contents

1. Introduction

Network Functions Virtualisation (NFV) [ETSI-NFV-WHITE] offers a new way to design, deploy and manage network services. The network service can be composed of one or more network functions and NFV relocates the network functions from dedicated hardware appliances to generic servers, so they can run in software. Using these virtualized network functions (VNFs), one or more VNF forwarding graphs (VNF-FGs; a.k.a. service chains) can be associated to the network service, each of which describes a network connectivity topology, by referencing VNFs and Virtual Links that connect them. One or more network forwarding paths (NFPs) can be built on top of such a topology, each defining an ordered sequence of VNFs and Virtual Links to be traversed by traffic flows matching certain criteria.

The network service is instantiated by allocating NFVI resources for VNFs and VLs which constitute the VNF-FGs. Thus, the capacity and performance of the network service depends on the state and attributes of the resources used for the VNF instances. While this brings a similar problem to the VM placement optimization in a cloud computing environment, it differs as one or more VNF instances are interconnected for a single network service. For example, if one of the VNF instances in the VNF-FG gets failed or overloaded, the whole network service also gets affected. Thus, the VNF instances need to be carefully placed during NS instantiation considering their connectivity within NFPs. They also need to be monitored and dynamically migrated or scaled at run-time to adapt to changes in the resources.

The resource placement problem in VNF-FGs matters not only to the performance and capacity of network services but also to the optimized use of NFVI resources. For example, if processing and bandwidth burden converges on the VNF instances placed in a specific NFVI-PoP, it may result in scalability problem of the NFV infrastructure. Thus care is encouraged to be taken in distributing load across local and external VNF instances at run-time.

This document addresses resource management problem in service chaining to optimize the NS performance and NFVI resource usage. It provides the relevant use cases of the resource management such as traffic optimization, failover, load balancing and further describes design considerations and relevant framework for the resource management capability that dynamically creates and updates NFP instances considering NFVI resource states for VNF instances and VL instances.

Note that this document mainly focuses on the resource management capability based on the ETSI NFV framework [ETSI-NFV-ARCH] but also studies contribution points to the work for control plane of SFC architecture [I-D.ietf-sfc-architecture] [I-D.ietf-sfc-architecture] [I-D.ietf-sfc-control-plane].

2. Terminology

This document uses the following terms and most of them were reproduced from [ETSI-NFV-TERM].

3. Resource management in service chain

3.1. Resource scheduling among network services

In the NFV framework, network services are realized with NS instantiation procedures at which virtualized NFVI resources are assigned to the VNFs and VLs which constitute VNF-FGs of the network service. The NFVI resources are placed and located along the VNF-FG by NFV Orchestrator (NFVO) dynamically according to:

In order to satisfy the deployment templates and resource policies, VNF-FGs of the network services need to be built by considering the state of NFVI resources for VNF instances (e.g., availability, throughput, load, disk usage) and VL instances (e.g., bandwidth, delay, delay variation, packet loss).

However, since the NFVI resources are shared by different network services and their deployment constraints are very different from each other, it is required to carefully schedule the NFVI resources for multiple network services to optimize their KPIs.

3.2. Performance coupling within a service chain

In NFV, a network service is composed of one or more virtualized network functions which are connected via virtual links along NFPs specified for a traffic flow for the network service. Thus, the performance of a network service is determined by the performance and capability of a coupling of VNF instances and VL instances. For example, if one of the VNF instances or VL instances of an NFP gets failed or overloaded, the whole network service also gets affected. Thus, the VNF instances need to be carefully placed during NS instantiation considering their connectivity within NFPs.

This performance coupling can be handled by considering deployment rules for affinity/anti-affinity, geography, or topological locations of VNFs; and QoS of virtual links.

Another important factor for virtual links is the inter-connectivity between different NFVI-PoPs, which is an enabler of resource sharing among different NFVI-PoPs. When the VNF instances of a network service are allocated at different NFVI-PoPs, the NFVI-PoP interconnect may be a bottleneck point which needs to be monitored to support KPIs of the service chain.

3.3. Multiple policies and conflicts

The NFVI resources for a network service should be allocated and managed according to a NS policy given in the network service descriptor (NSD), which describes how to govern NFVI resources for VNF instances and VL instances to support KPIs of the network service. The examples of NS policy are affinity/anti-affinity, scaling, fault and performance, geography, regulatory rules, NS topology, etc. Since network services may have different NS policies for their own deployment and performance, this may cause resource management difficult within the shared NFVI resources.

For network-wide (or NS-wide) resource management, NFVI policy (or network policy) can be also provided. It may describe the resource management policy for optimized use of infrastructure resources rather than the performance of a single network service. The examples of NFVI policy are NFVI resource access control, reservation and/or allocation policies, placement optimization based on affinity and/or anti-affinity rules, geography and/or regulatory rules, resource usage, etc.

Multiple administrative domains or subsystems may have different NFVI policies so that it may bring some conflicts when enforcing them in a global infrastructure. There could be a similar problem among NS policies and NFVI policies.

Note that the similar topics are being studied in [I-D.irtf-nfvrg-nfv-policy-arch]

3.4. Dynamic adaptation of service chains

The performance and capability of NFVI resources may vary in time due to different uses and management policies of the resources. If some changes in the resources make the service quality unacceptable, the VNF instances can be scaled according to the given auto-scaling policies. But it's only for local quality of the VNF.

In order to provide optimized KPIs to network services, the NFP instances need to dynamically adapt to the changes of the resource state at run-time. The performance of the whole NFP instance should be measured by monitoring the resource state of VNF instances and VL instances. Based on the monitoring results, some VNF instances may be determined and relocated at different virtualized resources with better performance and capabilities.

4. Use cases

In this section, several (but not exhausted) use cases for resource management in service chaining are provided: fail-over, load balancing, path optimization, traffic optimization, and energy efficiency.

4.1. Fail-over

For service continuity, failure of a VNF instance needs to be detected and the failed one needs to be replaced with the other one which is available to use as per redundancy policy. Figure 1 presents an example of the fail-over use case. A network service is defined as a chain of VNF-A and VNF-B; and the service chain is instantiated with VNF-A1 and VNF-B1 which are instances of VNF-A and VNF-B, respectively. In the meantime, failure of VNF-B1 is detected so that VNF-B2 replaces the failed one for fail-over of the NFP.

                                                             
                +--------+                        +--------+   
                | VNF-B2 |                       #| VNF-B2 |###
   +--------+   +--------+           +--------+ # +--------+   
###| VNF-A1 |       _|_           ###| VNF-A1 |#      _|_      
   +--------+      (___)             +--------+      (___)     
  ___/    #       /     \      \    ___/            /     \    
 (___)+---#------+       +   ===}  (___)+----------+       +   
          #       \ ___ /      /                    \ ___ /    
          #        (___)                             (___)     
          #          |                                 |
          #     +--------+                        +--------+   
          ######| VNF-B1 |###        (failure)--> | VNF-B1 |   
                +--------+                        +--------+   

### NFP

Figure 1: A fail-over use case

The above is in the case where there is a 1+1 or 1:N redundancy scheme. In event that VNF instance overloads before NFVO has time to scale out, or when resources do not permit a scale-out then we can route the service chain deterministically to a remote VNF instance. This adaptation may be revertive or non-revertive dependent on service provider policy and resource availability.

4.2. Load balancing

A single VNF instance may be a bottleneck point of a service chain due to its overload. It may affect the performance of the whole service chain consequently so that an NFP instance needs to be built to avoid bottleneck points or maintained to distribute workloads of overloaded VNF instances.

With NFVI-PoP Interconnect, service chains can be balanced between NFVI-PoPs in a way that best utilises NFV infrastructure and ensures service integrity. The wide area conditions can be monitored in real-time to provide KPIs, such as BW, delay, delay variation and packet loss per QoS class to the service chaining application which may enable use of external VNF instances when there is an overload or failure condition in the local NFVI-PoP. In this way the service chaining application can make a service chain reroute decision (in the event of failure and/or overload) that is network and platform-aware. The service chaining application understands the state of external VNFs and WAN conditions per QoS class between the local NFVI-PoP and remote NFVI-PoP in real-time.

4.3. Path optimization

Traffic for a network service traverses all of the VNF instances and the connecting VL instances given by a NFP instance to reach a target end point. Thus, quality of the network service depends on the resource constraints (e.g., processing power, bandwidth, topological locations, latency) of VNF instances, VL instances including NFVI-PoP interconnects. In order to optimize the path of the network service, the resource constraints of VNF instances and VLs need to be considered at constructing NFPs. Since the resource state may vary in time during the service, NFP instances also need to adapt to the changes of resource constraints of the VNF instances and VL instances by monitoring and replacing them at run-time.

4.4. Traffic optimization

A network operator may provide multiple network services with different VNF-FGs and different flows of traffic traverse between source and destination end-points along the VNF-FGs. For efficiency of resource usage, the NFP instances need to be built by default to localize the traffic flows and to avoid processing and network bottlenecks. It is only in the case of local failure or overload (whereby the NFVO is unable or has not completed a scale-out of on-site resources) that NFP instances would be constructed between NFVI-PoPs. In this case, multiple VNF instances of different NFVI-PoPs need to be considered together at constructing a new NFP instance or adapting one.

4.5. Energy efficiency

Energy efficiency in the network is getting important to reduce impact on the environment so that energy consumption of VNF instances using NFVI resources (e.g., compute, storage, I/O) needs to be considered at NFP instantiation or adaptation. For example, a NFP can be instantiated as to make traffic flows aggregated into a limited number of VNF instances as much as its performance is preserved in a certain level. Policy may vary between centralized or distributed NFV applications, and could include policies for even energy distribution between sites, time-of-day etc.

5. Evaluation Model

To derive specific algorithms for use cases discussed in Section 4, an evaluation model needs to be developed, which can address two problems for a given network topology and input parameters (e.g., VL/VNF capacity, incoming traffic flows, etc.) : 1) how much traffic flows pass on each VL instance and 2) how much processing capacity is needed for the installed VNF instance. This section first describes the system model and then presents main objectives for the evaluation model.

5.1. System Model

The system model considers the follwing network topoloy. The network topology under consideration is composed of start/end points and multiple NFVI-PoPs where multiple VNF instances locate. On the other hand, VL instances inter-connect VNF instances in NFVI-PoPs.

Start and end points are incoming and outgoing points of traffic flow for a given network service, respectively. Specifically, the amount of incoming traffic flows for a network service (i.e., a VNF-FG) at the start point is given as an input parameter in the model.

Since VNF instances process their incoming traffic flows, their processing capacities should be specified. Therefore, the VNF processing capacity is defined as the maximum amount of traffic flows that a VNF instance can process. Each VNF instance can be with different processing capacities depending on the class of the running virtual machine (VM). In the model, it is assumed that there are multiple VM classes and each VM class has a pre-defined processing capacity. Also, VL instances have different capacities for supporting traffic flows, which is defined as the maximum amount of traffic flows that can pass on a VL instance.

Under the network topology, multiple NFs (e.g., firewall and NAT) can be implemented as VNF instances. Therefore, a network service (i.e., a VNF-FG) refers to an order set in which several NFs are deployed depending on the requirements of traffic flows. That is, traffic flows for a VNF-FG should be processed according to the NF order described in the given VNF-FG. Accoridngly, traffic flows at the start point should not be processed by any NF. Meanchile, traffic flows at the end point should be processed by all NFs specified in the given VNF-FG.

In a given VNF-FG, VNFs should be individually placed on multiple NFVI-PoPs. Therefore, a decision variable, VNF placement indicator function (VPIF), is defined as:

Intuitively, the amount of traffic flows that pass a VL instance should not exceed the VL capacity to avoid any overload at the VL instance. Likewise, the amount of incoming traffic flows to a VNF instance should not exceed the VNF processing capacity. Therefore, traffic flows for a network service (i.e., a VNF-FG) can be distributed to multiple NFPs depending on resource and capacity constraints for VNF and VL instances. Moreover, multiple network services can be supported by distributing traffic flows for each network service. Therefore, another decision variable, traffic flow ratio (TFR), is defined as:

In addition, traffic flows should be conserved during the network service. Therefore, the amount of incoming traffic flows to a VNF or a NFVI-PoP should be equal to the amount of outgoing traffic flows from the VNF or the NFVI-PoP.

Consequently, in this evaluation model, the path/traffic optimization and VNF instance placement are jointly considered by determining two decision variables, i.e., TFR and VPIF. For the path/traffic optimization, how much traffic flows pass on each VL instance is determined by obtaining TFR. Meanwhile, for the given VNF instance placement, which VNF instances with different processing capacities (i.e., VM class) are installed in each NFVI-PoP is addressed, i.e., VPIF is obtained.

5.2. Main objectives

In the evaluation model, three objectives are considered including, but not limited to, 1) load balancing, 2) flow throughput maximization, and 3) energy efficiency.

5.2.1. Load balancing

For load balancing for a network service, the remaining capacity for VNF instances and VL instances should be balanced to avoid any bottlenecks. To this end, the minimum remaining processing capacity for VNF instances and the minimum remaining link capacity for VL instances should be maximized.

5.2.2. Throuhgput optimization

On the other hand, the flow throughput considers both throughputs for VNF processing and for VL instance. Then, the throughput of an NFP can be calculated as the product of TFR and the sum of capacities, and the total throughput is the sum of computed throughputs for all NFPs. By maximizing the total flow throughput, it is possible to reduce the network service time.

5.2.3. Energy efficiency

Since each VNF instance consumes an amount of energy for processing its function and tranmitting/receiving traffic flows across VL instances, the energy consumption for each VNF instance should be minimized for energy efficiency of network services. Detailed model is under construction.

6. Framework

To support the aforementioned use cases, it is required to support resource management capability which provides service chain (or NFP) construction and adaptation by considering resource state or constraints of VNF instances and VL instances which connect them. The resource management operations for service chain construction and adaptation can be divided into several sub-actions:

As listed above, VNF instances are relocated according to monitoring or evaluation results of performance metrics of the VNF instances and VL instances. Studies about evaluation methodologies and performance metrics for VNF instances and NFVI resources can be found at [ETSI-NFV-PER001] [I-D.liu-bmwg-virtual-network-benchmark] [I-D.ietf-bmwg-virtual-net]. The performance metrics of VNF instances and VL instances specific to service chain construction and adaptation can be defined as follows:

The resource management functionality for dynamic service chain adaptation takes role of NFV orchestration with support of VNF manager (VNFM) and Virtualised Infrastructure Manager (VIM) in the NFV framework [ETSI-NFV-ARCH]. Detailed functional building block and interfaces are still under study.

7. Applicability to SFC

7.1. Related works in IETF SFC WG

IETF SFC WG provides a new service deployment model that delivers the traffic along the predefined logical paths of service functions (SFs), called service function chains (SFCs) with no regard of network topologies or transport mechanisms. Basic concept of the service function chaining is similar to VNF-FG where a network service is composed of SFs and deployed by making traffic flows traversed instances of the SFs in a pre-defined order.

There are several works in progress in IETF SFC WG for resource management of service chaining. [I-D.ietf-sfc-architecture] defines SFC control plane that selects specific SFs for a requested SFC, either statically or dynamically but details are currently outside the scope of the document. There are other works [I-D.ietf-sfc-control-plane] [I-D.lee-sfc-dynamic-instantiation] [I-D.krishnan-sfc-oam-req-framework] [I-D.ietf-sfc-oam-framework] which define the control plane functionality for service function chain construction and adaptation but details are still under study. While [I-D.dunbar-sfc-fun-instances-restoration] and [I-D.meng-sfc-chain-redundancy] provide detailed mechanisms of service chain adaptation, they focus only on resilience or fail-over of service function chains.

7.2. Integration in SFC control-plane architecture

In SFC WG, [I-D.ietf-sfc-control-plane] describes requirements for conveying information between Service Function Chaining (SFC) control elements (including management components) and SFC functional elements. It also identifies a set of control interfaces to interact with SFC-aware elements to establish, maintain or recover service function chains.


                +----------------------------------------------+ 
                |                                              |
                |       SFC  Control & Management Planes       |
        +-------|                                              |
        |       |                                              |
        C1      +------^-----------^-------------^-------------+
 +---------------------|C3---------|-------------|-------------+
 |      |            +----+        |             |             |
 |      |            | SF |        |C2           |C2           |
 |      |            +----+        |             |             |
 | +----V--- --+       |           |             |             |
 | |   SFC     |     +----+      +-|--+        +----+          |
 | |Classifier |---->|SFF |----->|SFF |------->|SFF |          |
 | |   Node    |<----|    |<-----|    |<-------|    |          |
 | +-----------+     +----+      +----+        +----+          |
 |                     |           |              |            |
 |                     |C2      -------           |            |
 |                     |       |       |     +-----------+ C4  |
 |                     V     +----+ +----+   | SFC Proxy |-->  |
 |                           | SF | |SF  |   +-----------+     |
 |                           +----+ +----+                     |
 |                             |C3    |C3                      |
 |  SFC Data Plane Components  V      V                        |
 +-------------------------------------------------------------+

Figure 2: SFC control plane overview

The service chain adaptation addressed in this document may be integrated into the SFC Control & Management Planes and may use the C2 and C4 interfaces for monitoring or collecting the resource constraints of VNF instances, NFVI-PoP interconnects and VL instances.

To prevent constant integration between the application and probing functions we would propose a 3-tier architecture per NFVI-PoP.

  • Top level application control at the SFC Control & Management Planes
  • An abstraction layer between the application layer and the probing layer. This would decouple NFVI and link monitoring methods from the application layer
  • A probing layer that monitors VNF, physical and virtual link resources

Note that SFC does not assume that Service Functions are virtualized. Thus, the parameters of resource constraints may differ, and it needs further study for integration.

8. Security Considerations

TBD.

9. IANA Considerations

TBD.

10. References

10.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

10.2. Informative References

[ETSI-NFV-ARCH] ETSI, "ETSI NFV Architectural Framework v1.1.1", October 2013.
[ETSI-NFV-MANO] ETSI, "Network Function Virtualization (NFV) Management and Orchestration V0.6.3", October 2014.
[ETSI-NFV-PER001] ETSI, "Network Function Virtualization: Performance and Portability Best Practices v1.1.1", June 2014.
[ETSI-NFV-TERM] ETSI, "NFV Terminology for Main Concepts in NFV", October 2013.
[ETSI-NFV-WHITE] ETSI, "NFV Whitepaper 2", October 2013.
[I-D.dunbar-sfc-fun-instances-restoration] Dunbar, L. and A. Malis, "Framework for Service Function Instances Restoration", Internet-Draft draft-dunbar-sfc-fun-instances-restoration-00, April 2014.
[I-D.ietf-bmwg-virtual-net] Morton, A., "Considerations for Benchmarking Virtual Network Functions and Their Infrastructure", Internet-Draft draft-ietf-bmwg-virtual-net-01, September 2015.
[I-D.ietf-sfc-architecture] Halpern, J. and C. Pignataro, "Service Function Chaining (SFC) Architecture", Internet-Draft draft-ietf-sfc-architecture-11, July 2015.
[I-D.ietf-sfc-control-plane] Li, H., Wu, Q., Huang, O., Boucadair, M., Jacquenet, C., Haeffner, W., Lee, S., Parker, R., Dunbar, L., Malis, A., Halpern, J., Reddy, T. and P. Patil, "Service Function Chaining (SFC) Control Plane Components & Requirements", Internet-Draft draft-ietf-sfc-control-plane-00, August 2015.
[I-D.ietf-sfc-oam-framework] Aldrin, S., Krishnan, R., Akiya, N., Pignataro, C. and A. Ghanwani, "Service Function Chaining Operation, Administration and Maintenance Framework", Internet-Draft draft-ietf-sfc-oam-framework-00, August 2015.
[I-D.irtf-nfvrg-nfv-policy-arch] Figueira, N., Krishnan, R., Lopez, D., Wright, S. and D. Krishnaswamy, "Policy Architecture and Framework for NFV Infrastructures", Internet-Draft draft-irtf-nfvrg-nfv-policy-arch-02, October 2015.
[I-D.krishnan-sfc-oam-req-framework] Krishnan, R., Ghanwani, A., Gutierrez, P., Lopez, D., Halpern, J., Kini, S. and A. Reid, "SFC OAM Requirements and Framework", Internet-Draft draft-krishnan-sfc-oam-req-framework-00, July 2014.
[I-D.lee-sfc-dynamic-instantiation] Lee, S., Pack, S., Shin, M. and E. Paik, "SFC dynamic instantiation", Internet-Draft draft-lee-sfc-dynamic-instantiation-01, October 2014.
[I-D.liu-bmwg-virtual-network-benchmark] Liu, V., Liu, D., Mandeville, B., Hickman, B. and G. Zhang, "Benchmarking Methodology for Virtualization Network Performance", Internet-Draft draft-liu-bmwg-virtual-network-benchmark-00, July 2014.
[I-D.meng-sfc-chain-redundancy] Meng, W. and C. Wang, "Redundancy Mechanism for Service Function Chains", Internet-Draft draft-meng-sfc-chain-redundancy-02, October 2015.

Acknowledgements

The authors would like to thank Insun Jang, Sukjin Choo, Myeongsu Kim for the review and comments.

Authors' Addresses

Seungik Lee ETRI 218 Gajeong-ro Yuseung-Gu Daejeon, 305-700 Korea Phone: +82 42 860 1483 EMail: seungiklee@etri.re.kr
Sangheon Pack Korea University 145 Anam-ro, Seongbuk-gu Seoul, 136-701 Korea Phone: +82 2 3290 4825 EMail: shpack@korea.ac.kr
Myung-Ki Shin ETRI 218 Gajeong-ro Yuseung-Gu Daejeon, 305-700 Korea Phone: +82 42 860 4847 EMail: mkshin@etri.re.kr
EunKyoung Paik KT 17 Woomyeon-dong, Seocho-gu Seoul, 137-792 Korea Phone: +82 2 526 5233 EMail: eun.paik@kt.com
Rory Browne Intel Dromore House, East Park Shannon, Co. Clare Ireland Phone: +353 61 477400 EMail: rory.browne@intel.com