A Data Manifest for Contextualized Telemetry Data
Huawei, benoit.claise@huawei.com
Huawei, jean.quilbeuf@huawei.com
Telefonica I+D, Don Ramon de la Cruz, 82, Madrid 28006, Spain, diego.r.lopez@telefonica.com
Universidad Politecnica de Madrid, Avenida Complutense 30, Madrid 28040, Spain, i.dominguezm@upm.es
Swisscom, Binzring 17, Zurich 8045, Switzerland, thomas.graf@swisscom.com
OPS
OPSAWG
Most network equipment features telemetry as a means of monitoring its status.
Several protocols exist to this end, for example, model-driven telemetry
governed by YANG models.
Some of these protocols provide the data itself, without any contextual information about the
collection method. This can render the data unusable if that context is lost, for
instance when the data is stored without the relevant information. This document proposes a data manifest, composed of two YANG data models, to store that contextual information along with the collected data, in order to keep the collected data exploitable in the future.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
Data Manifest: all the necessary data required to interpret the telemetry information.
Platform Manifest: part of the Data Manifest that completely characterizes the platform producing the data.
Data Collection Manifest: part of the Data Manifest that completely characterizes how and when the telemetry
information was metered.
Network elements use Model-driven Telemetry (MDT) to continuously stream information, including both counters and state information.
This streamed information is used for network monitoring or closed-loop automation.
This streamed data can also be stored in a database (sometimes called a big data lake) for further analysis.
When streaming YANG-structured data with YANG-Push, the semantics of the data are defined in the corresponding YANG module. On top of that definition, it is also important to maintain contextual information about the collection environment.
As an example, a database could store a time series representing the evolution of a specific counter.
When analyzing the data, it is important to understand that this counter was requested from the network element at a specific cadence, as this exact cadence might not be observed in the time series, potentially implying that the network element was under stress.
The same time series might report some values as 0 or might even omit some values.
This might be explained by an observation period that is too small compared to the minimum-observed-period.
Again, knowing the conditions under which the counter was collected and streamed is crucial.
Indeed, taking into account the value of 0 might lead to a wrong conclusion that the counter dropped to zero.
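As an illustration of why the requested cadence matters, the following minimal Python sketch flags intervals in a stored time series where consecutive samples are further apart than the requested period allows. The function name, the tolerance parameter, and the sample values are assumptions made for this example; they are not defined by this document.

```python
from typing import List, Tuple


def find_cadence_gaps(timestamps: List[float],
                      requested_period: float,
                      tolerance: float = 0.5) -> List[Tuple[float, float]]:
    """Return (previous, current) timestamp pairs whose spacing exceeds
    requested_period * (1 + tolerance).

    'tolerance' is a hypothetical slack factor, not a manifest field.
    """
    limit = requested_period * (1.0 + tolerance)
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > limit:
            gaps.append((previous, current))
    return gaps


# Hypothetical series: the counter was requested every 10 seconds,
# but no sample was stored between t=20 and t=50.
timestamps = [0.0, 10.0, 20.0, 50.0, 60.0]
print(find_cadence_gaps(timestamps, requested_period=10.0))
```

Without the requested period, which only the data collection manifest records, such a check cannot be performed on the stored series alone.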
This document specifies the data collection manifest, which contains the required information to characterize how and when the telemetry information was metered.
Precisely characterizing the source used for producing the data (that is the platform manifest) may also be useful to complete the data collection context.
As an example, knowing the exact data source software specification might reveal a particularity in the observed data, explained by a specific bug, or a specific bug fix.
This is also necessary to ensure the reliability of the collected data.
On top of that, in particular for MDT, it is crucial to know the set of YANG modules supported by the device, along with their deviations.
In some cases, there might even be backwards-incompatible changes in native modules from one OS version to the next.
This information must be compiled in a platform manifest.
Some related YANG modules have been specified to retrieve the device capabilities:
which models the device capabilities regarding the production and export of telemetry data.
, which is based on the previous draft to define the optimal settings to stream specific items (i.e., per path).
While these related YANG modules are important to discover the capabilities before applying the telemetry configuration (such as on-change), some of their content is part of the context for the streamed data. The goal of this specification is not to expose new information via YANG objects but rather to define what needs to be kept as metadata (the data manifest) to ensure that the collected data can still be interpreted correctly, even if the source device is not accessible (from the collection system), or if the device has been updated (new operating system or new configuration).
This manifest contains two parts, the platform manifest and the data collection manifest.
The platform manifest is fairly stable and should change only when the device is updated or patched.
On the other hand, the data collection manifest is likely to change each time a new MDT subscription is requested and might even change if the device load increases and collection periods are updated.
To separate these two parts, we enclose each of them in its own module.
We first present the module for the platform manifest in
and then the module for the data collection manifest in .
The full data manifest is obtained by combining these two modules.
We explain in how the data-manifest can be collected and how collected data is mapped to the data manifest.
contains the YANG tree diagram of the ietf-collected-data-platform-manifest module.
The platform manifest contains a comprehensive set of information that characterizes a data source.
The platform is identified by a set of parameters ('name', 'software-version', 'software-flavor', 'os-version', 'os-type') that are aligned with the YANG Catalog www.yangcatalog.org so that the YANG catalog could be used to retrieve the YANG modules a posteriori.
The platform manifest also includes the contents of the YANG Library .
That module set is particularly useful to define the paths, as they are based on module names.
Similarly, this module defines the available datastores, which can be referred to from the data-manifest, if necessary.
If supported by the device, fetching metrics from a specific datastore could enable some specific use cases, such as monitoring configuration before it is committed or comparing the configuration and operational datastores.
Alternatively, the set of available YANG modules on the device can be described via packages-set, which
contains a list of references to YANG Packages.
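The identifying parameters listed above can be treated as a single key, for instance to decide whether a stored platform manifest still describes the device. The sketch below is only an illustration: it assumes the manifest has been decoded into a flat Python dictionary keyed by the parameter names, and the helper names and sample values are hypothetical.

```python
from typing import Dict, Tuple

# Identifying parameters named in the text above.
IDENTITY_FIELDS = ("name", "software-version", "software-flavor",
                   "os-version", "os-type")


def platform_identity(platform: Dict[str, str]) -> Tuple[str, ...]:
    """Build a comparable key from the identifying parameters."""
    missing = [field for field in IDENTITY_FIELDS if field not in platform]
    if missing:
        raise ValueError("platform manifest lacks: " + ", ".join(missing))
    return tuple(platform[field] for field in IDENTITY_FIELDS)


def platform_changed(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """True when any identifying parameter differs between two manifests."""
    return platform_identity(before) != platform_identity(after)


# Hypothetical values, for illustration only.
old = {"name": "example-router", "software-version": "1.0",
       "software-flavor": "base", "os-version": "1.0", "os-type": "example-os"}
new = dict(old, **{"software-version": "1.1", "os-version": "1.1"})
print(platform_changed(old, new))
```

The same key could also serve as the lookup input when retrieving YANG modules from a catalog a posteriori, although the exact query interface is outside the scope of this sketch.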
<CODE BEGINS> file "ietf-collected-data-platform-manifest@2021-10-15.yang"
<CODE ENDS>
contains the YANG tree diagram of the ietf-collected-data-manifest module.
The data-collection container contains the information related to individual items collection.
This subtree currently contains only information about MDT collection.
It could be extended to represent other kinds of data collection.
MDT collection is organized in subscriptions.
A given collector can subscribe to one or more subscriptions that usually contain a list of paths.
Such a collector only needs the data manifest for subscriptions it subscribed to.
The data manifest for MDT is organized by subscriptions as well so that a collector can select only its subscriptions.
This raises a chicken-and-egg issue if the collector collects the
data-manifest via MDT and wants the data-manifest
for the data-manifest subscription. First, the
collector collects the actual paths that it needs in
subscription A. Once it has the subscription id
for A, it needs an additional subscription B for the data
manifest of paths in A. Then, it would need
another subscription C to fetch the data manifest for
subscription B, and so on. A possible solution
would be to add to the "mdt" container an
additional list that contains the data manifest for every
path that is itself a data manifest.
By including that list in subscription B, the collector would obtain
the information about subscription B there.
The "datastore" leaf of the subscription container specifies from which datastore the YANG paths are streamed.
Within a given collection subscription, the granularity of the collection is defined by the path.
Note that not all devices support an arbitrary granularity down to the leaf, usually for performance reasons.
Each path currently collected by the device should show up in the mdt-path-data-manifest list.
For each path, the collection context must be specified including:
'on-change': when set to true, an update is sent as soon as and only when a value changes. This is also known as Event-Driven Telemetry (EDT). When set to false, the values are sent regularly.
'suppress-redundancy' (only when 'on-change' is false): reduces bandwidth usage by sending a regular update only if the value is different from the previous update.
'requested-period' (only when 'on-change' is false): period between two updates requested by the client for this path.
'actual-period' (only when 'on-change' is false): actual period retained by the platform between two updates. That period could be larger than the requested one, as the router can adjust it for performance reasons.
This information is crucial to understand the collected values. For instance, the 'on-change' and 'suppress-redundancy' options, if set, might remove a lot of messages from the database because values are sent only when there is a change.
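The effect of these options on interpretation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the PathContext class and the value_at function are hypothetical names, periods and timestamps share an arbitrary common unit, and samples are (timestamp, value) pairs sorted by time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PathContext:
    """Hypothetical in-memory view of the per-path collection context."""
    on_change: bool
    suppress_redundancy: bool = False
    requested_period: Optional[int] = None
    actual_period: Optional[int] = None


def value_at(samples: List[Tuple[int, int]],
             context: PathContext, t: int) -> Optional[int]:
    """Return the value implied by the stored samples at time t,
    or None when the samples do not allow a conclusion."""
    last = None
    for timestamp, value in samples:  # samples are sorted by time
        if timestamp > t:
            break
        last = (timestamp, value)
    if last is None:
        return None
    timestamp, value = last
    if context.on_change or context.suppress_redundancy:
        # No message means "unchanged": the last reported value holds.
        return value
    # Plain periodic collection: a sample older than one period
    # indicates a missed update, so the value at t is unknown.
    period = context.actual_period or context.requested_period
    if period is not None and t - timestamp < period:
        return value
    return None


samples = [(0, 5), (10, 5), (50, 7)]
print(value_at(samples, PathContext(on_change=True), 30))
print(value_at(samples, PathContext(on_change=False, actual_period=10), 30))
```

The same stored samples yield a known value in the first case and an unknown one in the second, which is why the collection context has to be kept alongside the data.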
<CODE BEGINS> file "ietf-collected-data-manifest@2021-10-15.yang"<CODE ENDS>
The data manifest MUST be streamed and stored along with the collected data.
In case the collected data are moved to a different place (typically a database), the data manifest MUST follow the collected data.
Otherwise, the data can become unusable because its context is lost, for instance when the data is stored without the relevant information.
The data manifest MUST be updated when the data manifest information changes (for example, when a router is upgraded), when a new telemetry subscription is configured, or when the telemetry subscription parameters change.
The data should be mapped to the data manifest. Since the data manifest will not change as frequently as the data itself, it makes sense to map several data points to the same data manifest. The collected data must therefore include metadata pointing to the corresponding data manifest.
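One possible way to let several data points share a single manifest is to store each distinct manifest once and have every data point carry a short reference to it. The Python sketch below assumes a content hash as that reference; the ManifestStore class, its methods, and the sample path are hypothetical and are not mandated by this document.

```python
import hashlib
import json
from typing import Dict, List, Tuple


class ManifestStore:
    """Stores each distinct manifest once; data points refer to it by id."""

    def __init__(self) -> None:
        self.manifests: Dict[str, dict] = {}
        self.datapoints: List[Tuple[float, str, object, str]] = []

    def register(self, manifest: dict) -> str:
        """Return a stable id derived from the manifest content."""
        canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
        manifest_id = hashlib.sha256(canonical).hexdigest()[:16]
        self.manifests.setdefault(manifest_id, manifest)
        return manifest_id

    def add_datapoint(self, timestamp: float, path: str,
                      value: object, manifest_id: str) -> None:
        """Store a data point together with its manifest reference."""
        if manifest_id not in self.manifests:
            raise KeyError("unknown manifest id: " + manifest_id)
        self.datapoints.append((timestamp, path, value, manifest_id))

    def manifest_for(self, index: int) -> dict:
        """Resolve the manifest of the data point at the given position."""
        return self.manifests[self.datapoints[index][3]]


store = ManifestStore()
first = store.register({"on-change": False, "requested-period": 100})
store.add_datapoint(0.0, "/hypothetical/counter", 1, first)
store.add_datapoint(0.1, "/hypothetical/counter", 2, first)
# The subscription parameters change: a new manifest version is registered.
second = store.register({"on-change": False, "requested-period": 1000})
store.add_datapoint(1.1, "/hypothetical/counter", 3, second)
print(len(store.manifests), len(store.datapoints))
```

With this layout, moving the collected data to another database only requires moving the small set of referenced manifests with it.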
The platform manifest is likely to remain the same until the device is updated. So, the platform manifest only needs to be collected once per streaming session and updated after a device reboot.
As this draft specifically focuses on giving context on data collected via streamed telemetry, we can assume that a streaming telemetry system is available.
Collecting the data and platform manifests can be done either by reusing that streaming telemetry system (in-band) or by using another system (out-of-band), for instance by adding headers or saving manifests into a YANG instance file.
We propose to reuse the existing telemetry system (in-band approach) in order to lower the efforts for implementing this draft.
To enable a platform supporting streaming telemetry to also support data collection manifests, it is sufficient that this device supports
the models from and .
Recall that each type of manifest has its own approximate update frequency, i.e., at reboot for the platform manifest and at each new subscription or CPU load variation for the data collection manifest.
The data manifest MUST be streamed with the YANG-Push on-change feature (also called event-driven telemetry).
With MDT, a particular datapoint is always associated to a path that is itself part of a subscription.
In order to enable a posteriori retrieval of the data manifest associated to a datapoint, the collector must:
keep the path in the metadata of the collected values
collect as well the data-manifest for the subscription and path associated to the datapoint.
With this information, to retrieve the data manifest from the datapoint, the following happens:
the path is retrieved from the datapoint metadata
the data-manifest for that path is retrieved by looking it up in the collected data-manifest.
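The retrieval steps above can be sketched as a lookup keyed by subscription and path. Because the data manifest is itself streamed on change, the sketch also keeps the time at which each manifest update was received and returns the latest update at or before the datapoint. The ManifestHistory class, the datapoint keys ("subscription-id", "path", "timestamp"), and the sample values are assumptions made for this illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]  # (subscription id, path)


class ManifestHistory:
    """Keeps the successive data-manifest entries received for each path."""

    def __init__(self) -> None:
        self._times: Dict[Key, List[float]] = {}
        self._entries: Dict[Key, List[dict]] = {}

    def record(self, subscription_id: int, path: str,
               received_at: float, entry: dict) -> None:
        """Store a manifest update, keeping updates sorted by time."""
        key = (subscription_id, path)
        times = self._times.setdefault(key, [])
        entries = self._entries.setdefault(key, [])
        position = bisect.bisect_right(times, received_at)
        times.insert(position, received_at)
        entries.insert(position, entry)

    def lookup(self, datapoint: dict) -> Optional[dict]:
        """Return the manifest entry in force when the datapoint was taken."""
        key = (datapoint["subscription-id"], datapoint["path"])
        times = self._times.get(key, [])
        position = bisect.bisect_right(times, datapoint["timestamp"])
        if position == 0:
            return None  # no manifest known for that path at that time
        return self._entries[key][position - 1]


history = ManifestHistory()
history.record(4242, "/hypothetical/path", 0.0, {"actual-period": 100})
history.record(4242, "/hypothetical/path", 100.0, {"actual-period": 1000})
point = {"subscription-id": 4242, "path": "/hypothetical/path",
         "timestamp": 150.0, "value": 7}
print(history.lookup(point))
```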
In that scenario, the reliability of the collection of the data manifest is the same as the reliability of the data collection itself, since the data manifest is like any other data.
For telemetry based on gRPC, for instance, a disconnection from the server would be detected because the HTTP connection would fail.
Below is an example of a data-manifest file:
<CODE BEGINS> file "ietf-collected-data-manifest@2021-10-15.yang"
<CODE ENDS>
The file above contains the data manifest for paths collected in the subscription with id 4242.
The requested period for both paths in this subscription was 100 ms; however, the status of the interface could only be collected every 10 s.
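A consumer of the manifest could detect this situation by comparing the requested and actual periods of each path. The following Python sketch assumes the manifest has been decoded into a dictionary, that periods are expressed in milliseconds, and that the paths shown are placeholders; none of these details are taken from the example file.

```python
from typing import List, Tuple


def degraded_paths(subscription: dict) -> List[Tuple[str, int, int]]:
    """List (path, requested, actual) for paths collected more slowly
    than requested."""
    result = []
    for entry in subscription["mdt-path-data-manifest"]:
        requested = entry.get("requested-period")
        actual = entry.get("actual-period")
        if requested is not None and actual is not None and actual > requested:
            result.append((entry["path"], requested, actual))
    return result


# Hypothetical decoded manifest; paths and the millisecond unit are assumptions.
subscription = {
    "subscription-id": 4242,
    "mdt-path-data-manifest": [
        {"path": "/hypothetical/interface/in-octets",
         "requested-period": 100, "actual-period": 100},
        {"path": "/hypothetical/interface/oper-status",
         "requested-period": 100, "actual-period": 10000},
    ],
}
print(degraded_paths(subscription))
```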
As we are reusing an existing telemetry system, the security considerations lie with the new content divulged in the new manifests.
Appropriate access control must be associated to the corresponding leafs and containers.
This document includes no request to IANA.
Do we want to include the hardware specifications, next to the OS information? How to fully characterize a virtual device? Do we need to include the vendor (as PEN for instance https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers)?
Do we want to handle the absence of values, i.e., add information about missed collection or errors in the collection context? It could also explain why some values are missing. On the other hand, this might also be out of scope.
How do we handle kinds of collection other than MDT, such as NetFlow, SNMP, or CLI? How do we map the collected data to the data-manifest?
Align the terms with the YANG Push specifications. Ex: path to subscription (TBC)
Better explain the on-change example.
Regarding the inclusion of ietf-yang-library in our module, do we want to include as well the changes from ietf-yang-library-revisions? What if other information is present in the yang-library from the platform? Should we use a YANG mount to capture it as well (it would not be captured with our use of the main yang-library grouping)?
Henk: how does this interact with SBOM effort?
Eliot: important to give integrity of the information a lot of thought. Threat model to be considered.
Version 3
Add when clause in YANG model
Fix validation errors on YANG modules
Augment YANG library to handle semantic versioning
Version 2
Alignment with YANGCatalog YANG module: name, vendor
Clarify the use of YANG instance file
Editorial improvements
Version 1
Adding more into data platform: yang packages, whole yanglib module to specify datastores
Setting the right type for periods: int64 -> uint64
Specify the origin datastore for mdt subscription
Set both models to config false
Applying text comments from Mohamed Boucadair
Adding an example of data-manifest file
Adding rationale for reusing telemetry system for collection of the manifests
Export manifest with on change telemetry as opposed to YANG instance file
Version 0
Initial version
Thanks to Mohamed Boucadair and Tianran Zhou for their reviews and comments.