T2TRG                                                         M. McBride
Internet-Draft                                               D. Kutscher
Intended status: Standards Track                                  Huawei
Expires: April 25, 2019                                      E. Schooler
                                                           CJ. Bernardos
                                                        October 22, 2018

Overview of Edge Data Discovery


Abstract

This document describes the problem of distributed data discovery in edge computing. Increasing numbers of IoT devices and sensors are generating a torrent of data that originates at the very edges of the network and that flows upstream, if it flows at all. Sometimes that data must be processed or transformed (transcoded, subsampled, compressed, analyzed, annotated, combined, aggregated, etc.) on edge equipment along the way, particularly in places where multiple high-bandwidth streams converge and where resources are limited. Support for edge data analysis is critical to make local, low-latency decisions (e.g., regarding predictive maintenance, the dispatch of emergency services, identity, authorization, etc.). In addition, (transformed) data may be cached, copied and/or stored at multiple locations in the network en route to its final destination. Although the data might originate at the edge, for example in factories, automobiles, video cameras, wind farms, etc., as more and more distributed data is created, processed and stored, it becomes increasingly dispersed throughout the network, and there needs to be a standard way to find it. New and existing protocols will need to be identified, developed, or enhanced for distributed data discovery at the network edge and beyond.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 25, 2019.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

Edge computing is an architectural shift that migrates cloud functionality (compute, storage, networking, control, data management, etc.) out of the back-end data center to be more proximate to the IoT data being generated at the edges of the network. Edge computing provides local compute, storage and connectivity services, often required for latency- and bandwidth-sensitive applications. Thus, edge computing plays a key role in verticals such as Energy, Manufacturing, Automotive, Video Analytics, Gaming, Healthcare, Mining, Buildings and Smart Cities.

Edge computing is motivated at least in part by the sheer volume of data that is being created by IoT devices (sensors, cameras, lights, vehicles, drones, wearables, etc.) at the very network edge and that flows upstream, in a direction for which the network was not originally provisioned. In dense IoT deployments (e.g., many video cameras streaming high-definition video), where multiple data flows collect or converge at edge nodes, data is likely to need transformation (transcoding, subsampling, compression, analysis, annotation, combination, aggregation, etc.) to fit over the next-hop link, or even to fit in memory or storage. Note also that performing computation on the data creates yet another new data stream. In addition, (transformed) data may be cached, copied and/or stored at multiple locations in the network en route to its final destination. With an increasing percentage of the devices connecting to the Internet being mobile, support for in-network caching and replication is critical for continuous data availability, not to mention efficient network and battery usage for endpoint devices. Additionally, as the memory and storage of mobile devices fill up, in an edge context they may be able to offload their data to other proximate devices or resources, leaving a bread crumb trail of data in their wake.

Therefore, although data might originate at edge devices, as more and more data is continuously created, processed and stored, it becomes increasingly dispersed throughout the physical world (outside of or scattered across managed local data centers) and increasingly isolated in separate local edge clouds or data silos. Thus there needs to be a standard way to find it. New and existing protocols will need to be identified, developed, or enhanced for these purposes. Being able to discover distributed data, at the edge or in the middle of the network, will be an important component of edge computing.

An IRTF T2TRG edge discussion was held, and a comparative study on the definition of edge computing was presented in multiple T2TRG sessions over the past year. An IETF BEC (beyond edge computing) effort has been evaluating potential gaps in existing edge computing architectures. Edge data discovery is one potential gap that needs evaluation and a solution.

Businesses, such as industrial companies, are also starting to understand how valuable the data that they have kept in silos is. Once this data can be aggregated on edge computing platforms, they will be able to monetize it. But this will happen only if data can be discovered and searched among equipment in a standard way. Discovering the data that is most useful to a given market segment will be extremely valuable in building business revenues. Providing a mechanism for this granular discovery is the problem that needs solving, with either existing or new protocols.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Terminology

2. The Edge Data Discovery Scope

Edge computing data will typically be found at the device or infrastructure edges. This is where we are focusing our efforts in defining the edge data discovery problem space. Edge data will also be sent to the cloud as needed. Discovering data that has been sent to the cloud is out of scope for this document.

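   +-------------------------------+
   |       Core Data Center        |
   +-------------------------------+
                  ***   Backbone
                 *   *  Network
   +-------------------------------+
   |     Regional Data Center      |
   +-------------------------------+
                  ***   Metropolitan
                 *   *  Network
        +---------------------+
        | Infrastructure Edge |
        +---------------------+
                  ***   Access
                 *   *  Network
             +----------+
             |          | Device Edge
             +----------+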

Figure 1: Edge Data Discovery Scope

2.1. Types of Discovery

There are many aspects of discovery.

Discovery of new devices: this covers discovering new devices added to an environment (ideally automatically), discovering their capabilities and services in client/server environments, and then synchronizing the device inventory and configuration for edge services. Many existing protocols help with this type of discovery: UPnP, mDNS, DNS-SD, SSDP, NFC, XMPP, W3C network service discovery, etc.

Discovery among edge devices: edge devices discover each other in a standard way. DHCP, SNMP, SMS, CoAP, LLDP, and routing protocols such as OSPF can be used for devices to discover one another.

Discovery of link state and traffic engineering data/services by external devices: BGP-LS is one solution.

Discovery of aggregated data on edge compute devices, which is the focus of this draft: how can we discover aggregated data on the edge and make use of it?

Besides sensor data being aggregated on the edge computing infrastructure, there will also be streaming data (e.g., from a camera), metadata (about the data, about the device that generated the data, about the context, etc.), control data regarding an event that was triggered, or an executable that embodies a function, method, service, or other piece of code or algorithm. There could also be new data that is created after (multiple) streams converge at the edge node and are processed or transformed in some manner.

Discovery of functions in an SFC environment: Service function chaining (SFC) allows the instantiation of an ordered set of service functions and the subsequent "steering" of traffic through them. Service functions (SFs) provide a specific treatment of received packets; therefore, they need to be known so that they can be used in a given service composition via SFC. So far, how SFs are discovered and composed has been out of the scope of discussions in the IETF. While there are some mechanisms that can be used and/or extended to provide this functionality, work needs to be done. An example of this can be found in "I-D.bernardos-sfc-discovery".

Discovery of resources in an NFV environment: virtualized resources do not need to be limited to those available in traditional data centers, where the infrastructure is stable, static, typically homogeneous and managed by a single administrative entity. Computational capabilities are becoming more and more ubiquitous, with terminal devices getting extremely powerful, as well as other types of devices that are close to the end users at the edge (e.g., vehicular onboard devices for infotainment, micro data centers deployed at the edge, etc.). It is envisioned that these devices would be able to offer storage, computing and networking resources to nearby network infrastructure, devices and things (the fog paradigm). These resources can be used to host functions, for example to offload or complement other resources available at traditional data centers, but also to reduce the end-to-end latency or to provide access to specialized information (e.g., context available at the edge) or hardware. As with the discovery of functions, while there are mechanisms that can be reused or extended, there is no complete solution yet defined. An example of work in this area is "I-D.bernardos-intarea-vim-discovery".

3. Protocols for Discovering Resources

Mainly two types of situations need to be covered:

  1. A set of resources appears (e.g., by a mobile node hosting them joining a network) and they have to be discovered by an existing virtualization infrastructure.
  2. A mobile device wants to discover virtualization resources available at the current location.

Several protocol alternatives can be used for this, ranging from approaches coupled with the access technology in use, to over-the-top solutions such as UPnP, mDNS, DNS-SD and SSDP, to solutions embedded into IP discovery/autoconfiguration, such as Neighbor Discovery or DHCP.
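
As one illustration of the second situation, a device could ask which service types are advertised on the local link by using DNS-SD over mDNS. The following is a minimal Python sketch under stated assumptions, not a mechanism defined by this document: it only builds the query message and does not send it, and the function names encode_name and build_ptr_query are illustrative. The message layout follows the standard DNS format (RFC 1035), the multicast address and port are those of mDNS (RFC 6762), and the service enumeration name "_services._dns-sd._udp.local" comes from DNS-SD (RFC 6763).

```python
import struct

MDNS_GROUP = "224.0.0.251"  # IPv4 mDNS multicast address (RFC 6762)
MDNS_PORT = 5353            # mDNS UDP port
TYPE_PTR = 12               # DNS PTR record type
CLASS_IN = 1                # DNS Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels."""
    out = b""
    for label in name.strip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("DNS labels must be 1 to 63 bytes long")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # the zero-length root label terminates the name


def build_ptr_query(service: str) -> bytes:
    """Build a one-question DNS query asking for PTR records of `service`."""
    # Header fields: ID=0 (as used by multicast mDNS queries), flags=0
    # (standard query), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    question = encode_name(service) + struct.pack("!2H", TYPE_PTR, CLASS_IN)
    return header + question


# DNS-SD service type enumeration query for the local link.
query = build_ptr_query("_services._dns-sd._udp.local")
```

To actually perform discovery, the resulting bytes would be sent in a UDP datagram to MDNS_GROUP on MDNS_PORT and the PTR answers parsed; that part is omitted here to keep the sketch self-contained.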

4. Protocols for Discovering Functions

In an SFC environment deployed at the edge, the discovery protocol may need to make available the following information per SF:

5. Naming the Data

Named Data Networking (NDN) is one of five research projects funded by the U.S. National Science Foundation under its Future Internet Architecture Program. NDN has its roots in an earlier project, Content-Centric Networking (CCN), which Van Jacobson started at Xerox PARC around the time of his Google talk, to turn his architecture vision into a running prototype (see also his CoNEXT 2009 paper and especially Jacobson's ACM Queue interview). The motivation is the mismatch between today's Internet architecture and its usage. Today we build, support, and use Internet applications and services on top of an extremely capable architecture that was not designed to support them. What if we had an architecture designed to support them? Specifically, today's IP packets can name only the endpoints of conversations (IP addresses) at the network layer. What if we generalized this layer to name any information (or content), not just endpoints? Doing so would make it easier to develop, manage, secure, and use our networks. NDN can be applied to edge data discovery to make it much easier to extract data by naming it: if data were named, we would be able to discover the appropriate data simply by its name.
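
To make the idea of discovery by name concrete, the following minimal Python sketch shows a toy catalog in which data is published under hierarchical names and found by name prefix. It is an illustration under stated assumptions only: it does not use or reproduce any NDN library or the NDN forwarding machinery, and the class NameCatalog, the example names (such as "/plant1/elevator7/vibration/2018-10-22") and the node labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Name = Tuple[str, ...]


def parse_name(uri: str) -> Name:
    """Split a hierarchical name such as '/a/b/c' into its components."""
    return tuple(part for part in uri.split("/") if part)


class NameCatalog:
    """Toy catalog mapping hierarchical data names to the node holding them."""

    def __init__(self) -> None:
        self._entries: Dict[Name, str] = {}

    def publish(self, uri: str, holder: str) -> None:
        """Record that the data named `uri` is held by `holder`."""
        self._entries[parse_name(uri)] = holder

    def discover(self, prefix_uri: str) -> List[Tuple[str, str]]:
        """Return sorted (name, holder) pairs whose name starts with the prefix."""
        prefix = parse_name(prefix_uri)
        matches = [
            ("/" + "/".join(name), holder)
            for name, holder in self._entries.items()
            if name[: len(prefix)] == prefix  # component-wise prefix match
        ]
        return sorted(matches)


catalog = NameCatalog()
catalog.publish("/plant1/elevator7/vibration/2018-10-22", "edge-node-a")
catalog.publish("/plant1/elevator7/temperature/2018-10-22", "edge-node-a")
catalog.publish("/plant1/camera3/video/2018-10-22", "edge-node-b")

# Find everything known about one elevator, wherever it is stored.
elevator_data = catalog.discover("/plant1/elevator7")
```

Matching is done per name component rather than per character, so a prefix of "/plant1/elevator7" does not accidentally match a hypothetical "/plant1/elevator70".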

6. Edge Data Discovery

How can we discover aggregated data on the edge and make use of it? There are proprietary implementations that collect data from various databases and consolidate it for evaluation. We need a standard protocol set for doing this data discovery, on the device or infrastructure edge, in order to meet the requirements of many use cases. We will have terabytes of data on the edge and need a way to identify its existence and find the desired data. A user needs to search for specific data in a data set and evaluate it using their own tools. The tools are outside the scope of this document, but the discovery of that data is in scope.
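
The kind of search described above can be sketched as a query over metadata records that describe what data each edge node holds. The following Python sketch is illustrative only and does not represent a proposed protocol or data model: the DataDescriptor fields (node, kind, source, start, end), the find_data function and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DataDescriptor:
    """Hypothetical metadata record describing one data set at the edge."""
    node: str    # edge node that stores the data
    kind: str    # type of data, e.g. "vibration" or "video"
    source: str  # device that produced the data
    start: int   # time of first sample (seconds, arbitrary epoch)
    end: int     # time of last sample (seconds, arbitrary epoch)


def find_data(
    descriptors: Iterable[DataDescriptor],
    kind: Optional[str] = None,
    source: Optional[str] = None,
    overlaps: Optional[Tuple[int, int]] = None,
) -> List[DataDescriptor]:
    """Return the descriptors that match every criterion that was supplied."""
    results = []
    for d in descriptors:
        if kind is not None and d.kind != kind:
            continue
        if source is not None and d.source != source:
            continue
        if overlaps is not None:
            lo, hi = overlaps
            if d.end < lo or d.start > hi:  # no overlap with [lo, hi]
                continue
        results.append(d)
    return results


# Descriptors as they might be gathered from several edge nodes.
index = [
    DataDescriptor("edge-a", "vibration", "elevator-7", 1000, 2000),
    DataDescriptor("edge-b", "vibration", "elevator-9", 1500, 2500),
    DataDescriptor("edge-b", "video", "camera-3", 1000, 3000),
]

# Which nodes hold vibration data covering the interval 2100..2200?
hits = find_data(index, kind="vibration", overlaps=(2100, 2200))
```

The sketch returns only descriptors, not the data itself, which mirrors the split made in this section: locating the data is the discovery problem, while evaluating it is left to the user's own tools.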

7. Use Cases of Edge Data Discovery

1. Autonomous Vehicles

Description: Autonomous vehicles rely on the processing of huge amounts of complex data in real time for fast and accurate decisions. These vehicles will rely on high-performance compute, storage and network resources to process the volumes of data they produce with low latency. Various systems will need a standard way to discover the pertinent data for decision making.

2. Video Surveillance

Description: The majority of video surveillance footage will remain at the edge infrastructure (not sent to the cloud data center). This footage comes from vehicles, factories, hotels, universities, farms, etc. Much of the video footage will not be interesting to those evaluating the data. A mechanism, perhaps a set of protocols, is needed to identify the interesting data at the edge. The data will be in storage systems or in flight in networking equipment.

3. Elevator Networks

Description: Elevators are one of many industrial applications of edge computing. Edge equipment receives data from hundreds of elevator sensors. The data coming into the edge equipment includes vibration, temperature, speed, level, video, etc. We need the ability to identify where the data we need to evaluate is located.

8. IANA Considerations


9. Security Considerations

Security considerations will be a critical component of edge data discovery, particularly as intelligence is moved to the extreme edge where data is to be extracted.

10. Acknowledgement

11. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

Authors' Addresses

Mike McBride
Huawei

EMail: michael.mcbride@huawei.com


Dirk Kutscher
Huawei

EMail: dirk.kutscher@huawei.com


Eve Schooler
Intel

EMail: eve.m.schooler@intel.com


Carlos J. Bernardos
Universidad Carlos III de Madrid
Av. Universidad, 30
Leganes, Madrid, 28911
Spain

Phone: +34 91624 6236
EMail: cjbc@it.uc3m.es
URI:   http://www.it.uc3m.es/cjbc/