RTGWG                                                              S. Gu
INTERNET-DRAFT                                                 G. Zhuang
Intended Status: Informational                       Huawei Technologies
                                                                  H. Yao
                                                                   X. Li
                                                            China Mobile
Expires: June 4, 2020                                   December 2, 2019


         A Report on Compute First Networking (CFN) Field Trial
                   draft-gu-rtgwg-cfn-field-trial-01

Abstract

   Compute First Networking (CFN) routes a service request to an
   optimal edge site to improve overall system load balancing and
   efficiency. In particular, when an edge site is overloaded, other
   edges with service equivalency can dynamically serve the request.
   This document describes a CFN field trial and the effect that CFN
   can achieve. It introduces the edge-to-edge interaction by which CFN
   nodes exchange the computing resources available for services and
   the network status, and it illustrates a data plane that supports
   dynamic anycast based on late binding. The field trial shows that
   CFN can greatly improve the overall queries per second served for a
   service hosted on multiple edges, in a more balanced way.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Gu, et al                                                       [Page 1]
INTERNET DRAFT              CFN Field Trial                     Dec 2019

Copyright and License Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1 Terminology
   2. Testbed overview
   3. Procedures
      3.1 Control Plane
      3.2 Data Plane
   4. Preliminary Tests
      4.1 Requests rush to an edge (no system background load)
      4.2 Requests rush to an edge (system background load exists)
      4.3 Mixed requests rush to an edge (no system background load)
      4.4 Impact from update frequency
   5. Summary
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
      9.1 Normative References
      9.2 Informative References
   Authors' Addresses

1. Introduction

   Compute First Networking (CFN) Scenarios and Requirements [CFN-req]
   describes the usage scenarios and requirements for dynamically
   dispatching service requests to multiple edge sites in order to
   overcome the computing resource overloading problem in edge
   computing. The Compute First Networking (CFN) framework document
   [CFN-fmwk] presents the basic system framework for dynamically
   routing a service request to a selected edge in real time, based on
   the computing load status and network conditions. This approach
   improves the load balancing among multiple edges with service
   equivalency in a distributed manner. This document describes a
   concrete CFN field trial and its performance.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2. Testbed overview

   We deployed CFN nodes at three edge sites in Hangzhou. The sites are
   approximately 30 kilometers apart. Figure 1 shows the topology and
   configuration we used for this CFN testbed.

      +-----+          edge site 1
     +-----+|                                           +---+
    +-----+|+      +----------+                         |   |
   +------+|+ ---- |CFN node 1| ------------------------|   |
   |client|+       +----------+                         |   |
   +------+    inter-edge itf:10.11.103.1               |   |
               service ID:SID_S                         |   |
               binding IP BIP1:10.11.102.1              |   |
                                                        |   |
                                                        | n |
      +-----+          edge site 2                      | e |
     +-----+|                                           | t |
    +-----+|+      +----------+                         | w |
   +------+|+ ---- |CFN node 2| ------------------------| o |
   |client|+       +----------+                         | r |
   +------+    inter-edge itf:10.12.103.1               | k |
               service ID:SID_S                         |   |
               binding IP BIP2:10.12.102.1              |   |
                                                        |   |
                                                        |   |
      +-----+          edge site 3                      |   |
     +-----+|                                           |   |
    +-----+|+      +----------+                         |   |
   +------+|+ ---- |CFN node 3| ------------------------|   |
   |client|+       +----------+                         |   |
   +------+    inter-edge itf:10.13.103.1               +---+
               service ID:SID_S
               binding IP BIP3:10.13.102.1

                     Figure 1. CFN testbed overview

   A matrix multiplication service S is provided by all three edge
   sites (or "edges" for simplicity in this document). The CFN nodes
   use a unique service ID, SID_S, to announce reachability of service
   S. In our test, SID_S is 200.200.200.201 and should be regarded as
   an anycast IP address. Although the service is reachable through the
   single address SID_S in the network, the three edges actually serve
   SID_S using three different binding IP (BIP) addresses: BIP1, BIP2
   and BIP3 (10.11.102.1, 10.12.102.1 and 10.13.102.1), via CFN nodes
   1, 2 and 3 respectively. A service node hosted on or attached to a
   CFN node knows only that it uses its BIP to serve service S; it has
   no knowledge of SID_S. Each CFN node has an inter-edge interface IP
   address for exchanging computing load information with the other CFN
   nodes. About 200 simulated clients connect to each CFN node in the
   test.

3. Procedures

   The procedures are introduced in [CFN-fmwk]. For ease of reference,
   the control plane and data plane timeline diagrams are also shown
   here.

3.1 Control Plane

   When a service node is started for service S, the edge platform
   manager sends registration information, consisting of the service ID
   SID_S and the binding IP (BIP) used to access SID_S, to the CFN node
   that the service node attaches to. Each CFN node regularly collects
   the computing load information for SID_S from the service node
   attached to it. The computing load information can be the CPU
   consumption for SID_S, the number of current connections, the
   queries per second processed, the total capacity, or other
   performance metrics. In our test, we give each type of metric a
   weight. CFN nodes distribute this information to each other by BGP
   extensions. Figure 2 shows the CFN control plane procedures.

    CFN      CFN                    CFN            Edge Platform
   Node 1   Node 2                 Node 3             Manager
     |        |                      |                   |
     |        |                      |                   |
     |        |                      |<------------------|
     |        |                      | 1.Service info    |
     |        |                      | registration/     |
     |        |                      | update/withdraw   |
     |        |                      | (SID_S, BIP3)     |
     |        |                      |                   |
     |        |                      |                   |
     |        |                      |<------------------|
     |        |                      | 2.Computing load  |
     |        |                      | update triggering |
     |        |                      | (SID_S,computing  |
     |        |                      | load information) |
     |        |                      |                   |
     |        |                      |                   |
     |        |<---------------------|                   |
     |        |                      |                   |
     |<------------------------------|                   |
     |        | 3.BGP update for     |                   |
     |        | computing load       |                   |
     |        | (SID_S, CFN node 3,  |                   |
     |        | computing load info) |                   |
     |        |                      |                   |

                      Figure 2. CFN control plane

3.2 Data Plane

   When a client sends a service request for service S, it uses SID_S
   as the destination IP address. In the test, SID_S is an anycast
   address. A client can obtain the SID_S for a service in various
   ways, such as DNS or static configuration. When the CFN ingress,
   which is CFN node 1 in Figure 3, receives the request, it
   dynamically selects the most appropriate CFN egress based on the
   computing load information received. As Figure 3 shows, CFN node 3
   is selected as the CFN egress in this case. The CFN ingress then
   tunnels the data packet to the CFN egress.

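   As an illustration of the selection step described above, the
   following Python sketch combines weighted load metrics with a
   network cost to pick an egress and saves the choice per flow. It is
   a minimal sketch under stated assumptions, not the implementation
   used in the trial: the metric names, the weight values, the scoring
   formula, the flow key, and the names EdgeLoad, load_score and
   select_egress are all hypothetical. The draft states only that each
   metric type is given a weight and that the ingress selects an egress
   from the received computing load information.

```python
from dataclasses import dataclass

# Hypothetical weights; the draft does not publish the values used.
WEIGHTS = {"cpu": 0.5, "connections": 0.3, "qps": 0.2}
NETWORK_WEIGHT = 0.2  # hypothetical weight of network cost vs. load


@dataclass
class EdgeLoad:
    """Load report for one service ID, learned from one CFN node."""
    node: str            # CFN node that announced the report
    cpu: float           # CPU consumption for the service, 0.0 to 1.0
    connections: float   # current connections / maximum connections
    qps: float           # current qps / qps capacity
    network_cost: float  # normalized cost from the ingress, 0.0 to 1.0


def load_score(report: EdgeLoad) -> float:
    """Weighted sum of normalized metrics; lower means more headroom."""
    load = (WEIGHTS["cpu"] * report.cpu
            + WEIGHTS["connections"] * report.connections
            + WEIGHTS["qps"] * report.qps)
    return load + NETWORK_WEIGHT * report.network_cost


def select_egress(reports, flow_table, flow_key):
    """Return the egress for a flow, reusing a saved choice if any."""
    if flow_key in flow_table:
        return flow_table[flow_key]
    best = min(reports, key=load_score)
    flow_table[flow_key] = best.node  # "select CFN egress & save it"
    return best.node


# Made-up load values, as seen from CFN node 1 acting as ingress.
reports = [
    EdgeLoad("CFN node 1", 0.9, 0.8, 0.9, network_cost=0.0),
    EdgeLoad("CFN node 2", 0.5, 0.4, 0.5, network_cost=0.5),
    EdgeLoad("CFN node 3", 0.1, 0.1, 0.1, network_cost=0.5),
]
flows = {}
flow = ("192.0.2.10", 40000, "200.200.200.201", 80)
print(select_egress(reports, flows, flow))
```

   Keying the saved choice on the flow keeps later packets of the same
   flow on the same egress even if newer load reports would favor a
   different edge.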
   When the CFN egress receives the packet, it decapsulates the packet
   and maps the destination address from SID_S to the binding IP BIP3.
   The service node for service S receives the packet and processes it.
   The service response is returned to CFN node 3, which is
   conceptually the gateway of the attached service nodes for CFN
   services. CFN node 3 maps the source IP from BIP3 back to SID_S and
   then tunnels the packet to CFN node 1. CFN node 1 decapsulates the
   packet and sends it to the client.

   For subsequent service request packets of the same flow sent to CFN
   node 1, CFN node 1 always uses CFN node 3 as the egress to ensure
   flow affinity.

             CFN node 1              CFN node 3            Service
 client     (CFN ingress)           (CFN egress)         Node for S
   |              |                       |                   |
   |1.service req |                       |                   |
   |------------->|                       |                   |
   |dst=SID_S     |                       |                   |
   |src=client_IP |                       |                   |
   |              |                       |                   |
   |      +----------------+              |                   |
   |      |2.Select CFN    |              |                   |
   |      |egress & save it|              |                   |
   |      +----------------+              |                   |
   |              |                       |                   |
   |              |3. forward service req |                   |
   |              |with encapsulation     |                   |
   |              |---------------------->|                   |
   |              |outer: dst=CFN_Node_3  |                   |
   |              |       src=CFN_Node_1  |                   |
   |              |inner: dst=SID_S       |                   |
   |              |       src=client_IP   |                   |
   |              |                       |                   |
   |              |               +----------------+          |
   |              |               |4.decap & map   |          |
   |              |               |SID_S to binding|          |
   |              |               |IP              |          |
   |              |               +----------------+          |
   |              |                       |                   |
   |              |                       |5. forward pkt     |
   |              |                       |------------------>|
   |              |                       |dst=BIP3           |
   |              |                       |                   |
   |              |                       |6. service rsp     |
   |              |                       |<------------------|
   |              |                       |src=BIP3           |
   |              |                       |                   |
   |              |               +----------------+          |
   |              |               |7.map binding IP|          |
   |              |               |back to SID_S & |          |
   |              |               |encap           |          |
   |              |               +----------------+          |
   |              |                       |                   |
   |              |8. forward service rsp |                   |
   |              |with encapsulation     |                   |
   |              |<----------------------|                   |
   |              |outer: dst=CFN_Node_1  |                   |
   |              |       src=CFN_Node_3  |                   |
   |              |inner: dst=client_IP   |                   |
   |              |       src=SID_S       |                   |
   |              |                       |                   |
   |         +----------+                 |                   |
   |         |9. decap  |                 |                   |
   |         +----------+                 |                   |
   |              |                       |                   |
   |10. forward   |                       |                   |
   |<-------------|                       |                   |
   |dst=client_IP |                       |                   |
   |src=SID_S     |                       |                   |
   |              |                       |                   |

       Figure 3. CFN data plane for the first request of a flow

4. Preliminary Tests

4.1 Requests rush to an edge (no system background load)

   In this test, we assume that the service nodes attached to all three
   edges have the same capacity and that no background computing tasks
   are running. Together, the service nodes can handle about 670
   queries per second (qps). Each client attached to edge 1 sends
   service requests to it at about 40 qps, and the number of clients
   sending requests simultaneously varies. When 10 clients send
   requests, the computing power consumed reaches approximately 60% of
   the overall system maximum. The requests are all short-processing
   tasks; based on observation, each request takes roughly 4 ms to
   complete at the server side.

   CFN uses the computing load reported by the different edges,
   together with the network status, to spread the service requests.
   For comparison, we also use a purely random selection among the
   edges to handle each request.

   We tested with 5, 10 and 15 clients attached to one edge, which
   consume medium low, medium high and high computing resources of the
   whole system respectively. Note that the load exceeds the capacity
   of a single edge in every case. In the 15-client case, it almost
   reaches the maximum system capacity. Figure 4 shows the system qps
   and the average delay between a client sending a request and
   receiving the response.

   +-------------+--------+----------------+---------+
   | number of   | system | average delay  |   qps   |
   | clients     |        |      (ms)      |         |
   +-------------+--------+----------------+---------+
   |             | CFN    |     3.954      |  208.5  |
   |      5      +--------+----------------+---------+
   | (medium low)| random |     5.316      |  197.7  |
   +-------------+--------+----------------+---------+
   |             | CFN    |     4.700      |  402.3  |
   |     10      +--------+----------------+---------+
   |(medium high)| random |     5.595      |  302.1  |
   +-------------+--------+----------------+---------+
   |             | CFN    |     5.506      |  559.3  |
   |     15      +--------+----------------+---------+
   |   (high)    | random |     5.718      |  546.0  |
   +-------------+--------+----------------+---------+

     Figure 4. Test results when service requests rush to a single
               edge with no system background load

   CFN achieves better results than application-layer service dispatch
   based on random selection. Average delay decreases by 25.62% and
   16.00%, and total qps increases by 5.5% and 33.17%, at medium low
   and medium high computing load respectively. The unbalanced incoming
   traffic is spread across all edges. Unlike random selection, CFN
   dispatches more requests to the local edge because its network cost
   is the lowest. CFN weighs the greater computing resources available
   at the remote sites against the lower network cost at the local site
   when making a choice, and hence outperforms random selection. In the
   high case, where the maximum system capacity is almost reached, CFN
   and random selection perform similarly.

4.2 Requests rush to an edge (system background load exists)

   In this test, different edges have different background computing
   tasks to handle. We randomly select an edge and subject it to a
   computing-intensive burst that consumes almost 90% of its capacity
   for about 4 seconds. The computing load then returns to zero for 2
   seconds. This creates a scenario with a busy edge and idle edges.
   The other settings are the same as in Section 4.1.

   Figure 5 shows the system qps and the average delay between a client
   sending a request and receiving the response for this case.

   +-------------+--------+----------------+---------+
   | number of   | system | average delay  |   qps   |
   | clients     |        |      (ms)      |         |
   +-------------+--------+----------------+---------+
   |             | CFN    |     6.291      |  185.6  |
   |      5      +--------+----------------+---------+
   | (medium low)| random |     9.630      |  165.3  |
   +-------------+--------+----------------+---------+
   |             | CFN    |     6.854      |  360.9  |
   |     10      +--------+----------------+---------+
   |(medium high)| random |     10.592     |  316.3  |
   +-------------+--------+----------------+---------+
   |             | CFN    |     7.987      |  512.4  |
   |     15      +--------+----------------+---------+
   |   (high)    | random |     12.156     |  441.7  |
   +-------------+--------+----------------+---------+

     Figure 5. Test results when service requests rush to a single
               edge with system background load

   The results show that, compared with random selection, CFN decreases
   average delay by 34.67%, 35.29% and 34.30% at medium low, medium
   high and high computing load respectively, and increases total qps
   by 12.28%, 14.10% and 16.01% at the same load levels. The
   performance gain of CFN in this test case is much higher than that
   in Section 4.1. The reason is that random service dispatching has a
   more than 20% chance of sending a request to an edge whose service
   node carries a very high background computing load, while CFN can
   greatly reduce that possibility.

   In addition, compared with the results in Section 4.1, delay
   increases by 59.10%, 45.83% and 45.06% at the three computing load
   levels with CFN, and by 81.15%, 89.31% and 112.60% with random
   selection. This shows that CFN adapts much better to dynamic changes
   in computing load, especially when the system background load is
   high.

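   The "more than 20%" figure and the percentages above can be
   reproduced with a short calculation. The Python sketch below assumes
   one reading of the setup that the draft does not spell out: exactly
   one of the three edges is bursting at any time, the burst repeats as
   4 seconds busy followed by 2 seconds idle, and random dispatch picks
   each edge with equal probability. The helper name pct_change is
   hypothetical; the delay and qps values are copied from Figure 5.

```python
from fractions import Fraction

# Assumed reading of the setup: one of three edges is busy for 4 s of
# every 6 s cycle, and random dispatch picks each edge with equal odds.
p_busy_edge = Fraction(1, 3) * Fraction(4, 4 + 2)
print(f"chance of hitting the busy edge: {float(p_busy_edge):.1%}")


def pct_change(new, old):
    """Relative change of new against old, in percent."""
    return (new - old) / old * 100


# (CFN delay ms, random delay ms, CFN qps, random qps) from Figure 5.
figure5 = {
    "medium low":  (6.291, 9.630, 185.6, 165.3),
    "medium high": (6.854, 10.592, 360.9, 316.3),
    "high":        (7.987, 12.156, 512.4, 441.7),
}
for level, (d_cfn, d_rnd, q_cfn, q_rnd) in figure5.items():
    print(f"{level}: delay {pct_change(d_cfn, d_rnd):+.2f}%, "
          f"qps {pct_change(q_cfn, q_rnd):+.2f}%")
```

   Under these assumptions the chance is 2/9, which is consistent with
   the "more than 20%" stated in the text.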
4.3 Mixed requests rush to an edge (no system background load)

   We changed the characteristics of the service requests to reflect
   the coexistence of long-processing and short-processing tasks. A
   short-processing task takes roughly 4 ms to complete and a
   long-processing task takes roughly 400 ms. The ratio of long to
   short tasks is approximately 1:100. Figure 6 shows the system qps
   and the average delay between a client sending a request and
   receiving the response for this case.

   +-------------+--------+----------------+---------+
   | number of   | system | average delay  |   qps   |
   | clients     |        |      (ms)      |         |
   +-------------+--------+----------------+---------+
   |             | CFN    |     5.205      |  193.5  |
   |      5      +--------+----------------+---------+
   | (medium low)| random |     5.398      |  193.5  |
   +-------------+--------+----------------+---------+
   |             | CFN    |     5.201      |  393.4  |
   |     10      +--------+----------------+---------+
   |(medium high)| random |     5.985      |   385   |
   +-------------+--------+----------------+---------+
   |             | CFN    |     6.147      |  559.4  |
   |     15      +--------+----------------+---------+
   |   (high)    | random |     8.499      |  559.4  |
   +-------------+--------+----------------+---------+

     Figure 6. Test results when mixed service requests rush to a
               single edge with no system background load

   The results show that CFN decreases average delay by 3.58%, 13.10%
   and 27.67% at medium low, medium high and high computing load
   respectively. The qps differs little between CFN and random
   selection at any computing load level; it is identical in the medium
   low and high cases.

4.4 Impact from update frequency

   The computing load information is updated and distributed when a
   metric changes by more than some threshold relative to the last
   distributed information. In the test, we used 10% of the maximum
   number of connections allowed and 5% CPU consumption as the
   thresholds.

   The frequency of updates affects the system performance. We tested
   different update intervals to see their impact.

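   The threshold rule at the start of this section can be sketched as
   follows. This is a minimal sketch rather than the code used in the
   trial. It assumes that exceeding either threshold alone triggers an
   update and that the comparison is against the last distributed
   values; the names Load and UpdateTrigger and the sample numbers are
   hypothetical. Only the two threshold values come from the text.

```python
from dataclasses import dataclass

CONN_THRESHOLD = 0.10  # fraction of the maximum connections allowed
CPU_THRESHOLD = 0.05   # CPU consumption, as a fraction of full load


@dataclass
class Load:
    connections: int
    cpu: float  # 0.0 to 1.0


class UpdateTrigger:
    """Decides when a new load report should be distributed."""

    def __init__(self, max_connections: int, initial: Load):
        self.max_connections = max_connections
        self.last_sent = initial

    def should_send(self, current: Load) -> bool:
        """True if either metric moved past its threshold (assumed OR)."""
        conn_delta = abs(current.connections - self.last_sent.connections)
        cpu_delta = abs(current.cpu - self.last_sent.cpu)
        return (conn_delta > CONN_THRESHOLD * self.max_connections
                or cpu_delta > CPU_THRESHOLD)

    def observe(self, current: Load) -> bool:
        """Record a sample; return True if an update should go out."""
        if self.should_send(current):
            self.last_sent = current
            return True
        return False


# Made-up samples: only the second and third cross a threshold.
trigger = UpdateTrigger(max_connections=100, initial=Load(10, 0.30))
samples = [Load(15, 0.32), Load(25, 0.32), Load(26, 0.40), Load(27, 0.41)]
decisions = [trigger.observe(sample) for sample in samples]
print(decisions)
```

   Comparing against the last distributed values, rather than the
   previous sample, keeps slow drifts from going unreported.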
   The clients keep sending requests so that the computing resource
   consumption on each edge stays at medium low, which is about 5
   connections. The update interval was set to 10s, 5s, 1s, 100ms, 10ms
   and 1ms. Figure 7 shows the average delay between a client sending a
   request and receiving the response under the different update
   intervals, and the improvement in delay compared with the 10-second
   interval. The results show that the more frequently updates are
   distributed, the lower the delay.

   +-------------+--------------+----+----+----+-----+----+----+
   |# of clients | Interval     | 10s| 5s | 1s |100ms|10ms| 1ms|
   +-------------+--------------+----+----+----+-----+----+----+
   |      5      | Delay(us)    |6445|6255|5741|5312 |4883|4058|
   |(medium low) +--------------+----+----+----+-----+----+----+
   |             |Improvement(%)| 0  |3.5 |12.3|21.3 |32.3|58.8|
   +-------------+--------------+----+----+----+-----+----+----+

         Figure 7. Test results under different update intervals

5. Summary

   This draft presents a field trial of a CFN system with three edge
   sites in different locations. CFN enables a network-based,
   fast-reacting system that serves multi-edge computing services in a
   more balanced way. Computing load information is exchanged regularly
   between CFN nodes. The CFN egress that serves a particular service
   request is determined in real time and maintained to ensure flow
   affinity. The tests show that the overall client request delay
   decreases substantially and that the system qps also improves to
   some extent. CFN is a feasible and efficient way to provide
   multi-edge service balancing in edge computing.

6. Security Considerations

   The security risks mentioned in [CFN-fmwk] apply to the tests. As
   these are preliminary tests, no extra security risk control is
   currently implemented. Mechanisms such as authentication of edge
   nodes and fluctuation avoidance should be considered in deployment.

7. IANA Considerations

   No IANA action is required.

8. Acknowledgements

   The authors would like to thank Xunwen Li's team members for their
   help in setting up the testbed in Hangzhou.

9. References

9.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2 Informative References

   [CFN-req]  Geng, L., et al, "Compute First Networking (CFN)
              Scenarios and Requirements", draft-geng-cfn-req-00,
              November 2019.

   [CFN-fmwk] Li, Y., et al, "Framework of Compute First Networking
              (CFN)", draft-li-cfn-framework-00, November 2019.

Authors' Addresses

   Shuheng Gu
   Huawei Technologies
   EMail: gushuheng@huawei.com

   Guanhua Zhuang
   Huawei Technologies
   EMail: zhuangguanhua@huawei.com

   Huijuan Yao
   China Mobile
   EMail: yaohuijuan@chinamobile.com

   Xunwen Li
   China Mobile
   EMail: lixunwen@zj.chinamobile.com