A feature freezer for the Concise Data Definition Language (CDDL)
Universität Bremen TZI
Postfach 330440
Bremen
D-28359
Germany
+49-421-218-63921
cabo@tzi.org
Internet-Draft
In defining the Concise Data Definition Language (CDDL), some features
have turned up that would be nice to have. In the interest of
completing this specification in a timely manner, the present document
was started to collect nice-to-have features that did not make it into the first RFC
for CDDL, RFC 8610.
It is now time to discuss thawing some of the concepts discussed here.
A number of additional proposals have been added.
Introduction
In defining the Concise Data Definition Language (CDDL), some features
have turned up that would be nice to have. In the interest of
completing this specification in a timely manner, the present document
was started to collect nice-to-have features that did not make it into the first RFC
for CDDL, RFC 8610.
It is now time to discuss thawing some of the concepts discussed here.
A number of additional proposals have been added.
There is always a danger that a document like this becomes a shopping
list; the intention is to develop this document further based on
real-world experience with the first CDDL standard.
Base language features
Cuts
Section 3.5.4 of RFC 8610 alludes to a new language feature, cuts,
and defines it in a fashion that is rather focused on a single
application in the context of maps and generating better diagnostic
information about them.
The present document is expected to grow a more complete definition of
cuts, with the expectation that it will be upwards-compatible to the
existing one in RFC 8610, before this possibly becomes a mainline
language feature in a future version of CDDL.
Literal syntax
Tag-oriented Literals
Some CBOR tags would be most natural to use in a CDDL spec with a literal
syntax that is tailored to their semantics rather than to their
serialization in CBOR. There is currently no way to add such syntaxes, and
no defined extension point either.
The text form of CoRAL defines literals of
the form
for datetime items. (Similar advances should then probably be made in
diagnostic notation.)
Regular Expression Literals
Regular expressions currently are notated as strings in CDDL, with all
the string escaping rules applied once. It might be convenient to
have a more conventional literal format for regular expressions,
possibly also providing a place to add modifiers such as /i.
This might also imply text .regexp ..., which, together with the .pcre
proposal in the "Control operator .pcre" section, then raises the
question of how to indicate the regular expression flavor.
Clarifications
A number of errata reports have been made around some details of text
string and byte string literal syntax: Errata Report 6527 and Errata
Report 6543.
These need to be addressed by re-examining the details of these
literal syntaxes.
Also, Errata Report 6526 needs to be applied.
Err6527
The ABNF used in RFC 8610 for the content of text string literals
is rather permissive:
text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC
SESC = "\" (%x20-7E / %x80-10FFFD)
This allows almost any non-C0 character to be escaped by a backslash,
but critically misses out on the \uXXXX and \uHHHH\uLLLL forms
that JSON provides for specifying characters in hex. Both problems can
be solved by updating the SESC production to:
SESC = "\" ( %x22 / %x2F / %x5C / %x62 / %x66 / %x6E / %x72 / %x74 /
(%x75 hexchar) )
hexchar = non-surrogate / (high-surrogate "\" %x75 low-surrogate)
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
("D" %x30-37 2HEXDIG )
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG
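As a rough way to experiment with the proposed SESC production, it can
be transcribed into a regular expression. The following Python sketch
is illustrative only; the names SESC and is_sesc are not part of any
specification, and the transcription assumes the usual ABNF rule that
quoted strings such as "D" match case-insensitively, while %x75
matches the lowercase letter u only.

```python
import re

# Regex transcription of the proposed SESC production. ABNF string
# literals such as "D" are case-insensitive, so both cases are accepted
# for hex digits; %x75 is the lowercase letter u only.
HEX = r"[0-9A-Fa-f]"
NON_SURROGATE = rf"(?:[0-9A-CEFa-cef]{HEX}{{3}}|[Dd][0-7]{HEX}{{2}})"
HIGH_SURROGATE = rf"[Dd][89ABab]{HEX}{{2}}"
LOW_SURROGATE = rf"[Dd][C-Fc-f]{HEX}{{2}}"
HEXCHAR = rf"(?:{NON_SURROGATE}|{HIGH_SURROGATE}\\u{LOW_SURROGATE})"
SESC = re.compile(rf'\\(?:["/\\bfnrt]|u{HEXCHAR})')


def is_sesc(s: str) -> bool:
    """Return True if s is exactly one escape allowed by the proposed SESC."""
    return SESC.fullmatch(s) is not None
```

Under this transcription, a surrogate pair such as \uD83D\uDE00 is
accepted as a single escape, while a lone surrogate such as \uD83D and
the escape \' are rejected.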
Now that SESC is more restrictively formulated, this also requires an
update to the BCHAR production used in the ABNF syntax for byte string
literals:
bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
bsqual = "h" / "b64"
The updated version explicitly allows \', which is no longer allowed
by the updated SESC:
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / "\'" / CRLF
Err6543
The ABNF used in RFC 8610 for the content of byte string literals
lumps together byte strings notated as text with byte strings notated
in base16 (hex) or base64 (but see also the updated BCHAR production above):
bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
Errata report 6543 proposes to handle the two cases in separate
productions (where, with an updated SESC, BCHAR obviously needs to be
updated as above):
bytes = %x27 *BCHAR %x27
/ bsqual %x27 *QCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
QCHAR = DIGIT / ALPHA / "+" / "/" / "-" / "_" / "=" / WS
This potentially causes a subtle change, which is hidden in the WS production:
WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / %x80-10FFFD
CRLF = %x0A / %x0D.0A
This allows any non-C0 character in a comment, so this fragment
becomes possible:
foo = h'
43424F52 ; 'CBOR'
0A ; LF, but don't use CR!
'
The current text does not say unambiguously whether the three apostrophes
in the comments need to be escaped with a \ or not, as in:
foo = h'
43424F52 ; \'CBOR\'
0A ; LF, but don\'t use CR!
'
... which would be supported by the existing ABNF in RFC 8610.
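The following Python sketch illustrates the intended reading of such a
literal, under the assumption that the body between the outer
apostrophes has already been isolated. It therefore does not resolve
the open question (how a parser finds the closing apostrophe); it only
shows that, once the body is known, both spellings of the comments
yield the same bytes. The function name decode_hex_body and the two
constants are hypothetical.

```python
import re


def decode_hex_body(body: str) -> bytes:
    """Decode the text between the apostrophes of an h'...' literal."""
    # Drop comments: from ";" up to (not including) the end of the line.
    without_comments = re.sub(r";[^\n]*", "", body)
    # Drop all remaining whitespace, then decode the hex digits.
    return bytes.fromhex(re.sub(r"\s+", "", without_comments))


# The two fragments above, with and without escaped apostrophes.
UNESCAPED = "\n  43424F52 ; 'CBOR'\n  0A ; LF, but don't use CR!\n"
ESCAPED = "\n  43424F52 ; \\'CBOR\\'\n  0A ; LF, but don\\'t use CR!\n"
```

Both decode_hex_body(UNESCAPED) and decode_hex_body(ESCAPED) produce
the five bytes of "CBOR" followed by a line feed.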
Controls
Controls are the main extension point of the CDDL language.
It is relatively painless to add controls to CDDL.
Several candidates have been identified that aren't quite ready for
adoption, of which one shall be listed here.
Control operator .pcre
There are many variants of regular expression languages.
Section 3.8.3 of RFC 8610 defines the .regexp control, which is
based on XSD regular expressions.
As discussed in that section, the most desirable form of regular
expressions in many cases is the family called "Perl-Compatible
Regular Expressions" (PCRE); however, no formally stable
definition of PCRE is available at this time for normatively
referencing it from an RFC.
The present document defines the control operator .pcre, which is
similar to .regexp, but uses PCRE2 regular expressions.
More specifically, a .pcre control indicates that the text string
given as a target needs to match the PCRE regular expression given as
a value in the control type, where that regular expression is anchored
on both sides.
(If anchoring is not desired for a side, .* needs to be inserted
there.)
Similarly, .es2018re could be defined for ECMAScript 2018 regular
expressions with anchors added.
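The effect of anchoring on both sides can be sketched as follows.
Python's re module is used here purely as a stand-in; it is not a
PCRE2 implementation, and the helper name matches_anchored is
hypothetical.

```python
import re


def matches_anchored(pattern: str, text: str) -> bool:
    """Match text against pattern, anchored on both sides."""
    # re.fullmatch succeeds only if the whole string matches the pattern.
    return re.fullmatch(pattern, text) is not None
```

With this helper, the pattern "foo" matches the text "foo" but not
"foobar"; removing the anchoring on the right-hand side requires
writing "foo.*" instead.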
Endianness in .bits
How useful would it be to have another variant of .bits that counts
bits as in RFC box notation? (Or at least per byte? 32-bit words
don't always perfectly mesh with byte strings.)
.bitfield control
Provide a way to specify bitfields in byte strings and uints to a
higher level of detail than is possible with .bits. Strawman:
Field = uint .bitfield Fieldbits
Fieldbits = [
flag1: [1, bool],
val: [4, Vals],
flag2: [1, bool],
]
Vals = &(A: 0, B: 1, C: 2, D: 3)
Note that the group within the controlling array can have choices,
enabling the whole power of a context-free grammar (but not much more).
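To make the strawman more concrete, the following Python sketch
decodes a uint according to the field list above. The strawman does
not define a bit order; this sketch ASSUMES that the first field
occupies the least significant bits. The names FIELDBITS, VALS,
to_vals and decode_bitfield are hypothetical and only mirror the
strawman for illustration.

```python
from typing import Callable

# Hypothetical rendering of Vals = &(A: 0, B: 1, C: 2, D: 3).
VALS = {0: "A", 1: "B", 2: "C", 3: "D"}


def to_vals(n: int) -> str:
    """Map a field value to a member of Vals, rejecting other values."""
    if n not in VALS:
        raise ValueError(f"{n} is not a member of Vals")
    return VALS[n]


# Hypothetical rendering of Fieldbits: (name, width in bits, converter).
FIELDBITS: list[tuple[str, int, Callable[[int], object]]] = [
    ("flag1", 1, bool),
    ("val", 4, to_vals),
    ("flag2", 1, bool),
]


def decode_bitfield(value: int, fields=FIELDBITS) -> dict[str, object]:
    """Split a uint into fields, ASSUMING the first field is least significant."""
    total = sum(width for _, width, _ in fields)
    if value < 0 or value >= 1 << total:
        raise ValueError("value does not fit the declared fields")
    result = {}
    for name, width, convert in fields:
        # Take the low `width` bits for this field, then shift them out.
        result[name] = convert(value & ((1 << width) - 1))
        value >>= width
    return result
```

Under the stated bit order assumption, the value 37 (binary 100101)
decodes to flag1 set, val C, and flag2 set; a val field of 4 or more
is rejected because it is not a member of Vals.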
Co-occurrence Constraints
While there are no co-occurrence constraints in CDDL, many actual use
cases can be addressed by using the fact that a group is a grammar:
postal = {
( street: text,
housenumber: text) //
( pobox: text .regexp "[0-9]+" )
}
However, constraints that are not just structural/tree-based but are
predicates combining parts of the structure cannot be expressed:
session = {
timeout: uint,
}
other-session = {
timeout: uint .lt [somehow refer to session.timeout],
}
As a minimum, this requires the ability to reach over to other parts
of the tree in a control. Compare JSON Pointer and Relative JSON
Pointers.
Stefan Goessner's jsonpath is a JSON variant of XPath that has not
been formally standardized.
More generally, something akin to what Schematron is to Relax-NG may
be needed.
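What such a predicate would have to do can be sketched outside of
CDDL. The following Python sketch resolves JSON Pointers (RFC 6901)
against decoded data and checks the timeout relation from the example
above. It assumes, purely for illustration, that both maps are members
of one enclosing map under the keys "session" and "other-session"; the
function names are hypothetical.

```python
def resolve_pointer(doc, pointer: str):
    """Resolve an RFC 6901 JSON Pointer against nested dicts and lists."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer starts with '/'")
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc


def timeout_is_consistent(data: dict) -> bool:
    """Check other-session.timeout < session.timeout (hypothetical layout)."""
    limit = resolve_pointer(data, "/session/timeout")
    return resolve_pointer(data, "/other-session/timeout") < limit
```

A check of this kind runs after structural validation; it is exactly
the part that a purely grammar-based CDDL specification cannot express
today.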
Module superstructure
CDDL rules could be packaged as modules and referenced from other
modules. There could be some control of namespace pollution, as well
as unambiguous referencing ("versioning").
This is probably best achieved by a pragma-like syntax which could be
carried in CDDL comments, leaving each module to be valid CDDL (if
missing some rule definitions to be imported).
Namespacing
A convention for mapping CDDL-internal names to external ones could be
developed, possibly steered by some pragma-like constructs. External
names would likely be URI-based, with some conventions as they are
used in RDF or Curies. Internal names might look similar to XML
QNames. Note that the identifier character set for CDDL deliberately
includes $ and @, which could be used in such a convention.
Alternative Representations
For CDDL, alternative representations e.g. in JSON (and thus in YAML)
could be defined, similar to the way YANG defines an XML-based
serialization called YIN in Section 11 of RFC 6020.
One proposal for such a syntax is provided by the cddlc tool; this
could be written up and agreed upon.
cddlj = ["cddl", +rule]
rule = ["=" / "/=" / "//=", namep, type]
namep = ["name", id] / ["gen", id, +id]
id = text .regexp "[A-Za-z@_$](([-.])*[A-Za-z0-9@_$])*"
op = ".." / "..." /
text .regexp "\\.[A-Za-z@_$](([-.])*[A-Za-z0-9@_$])*"
namea = ["name", id] / ["gen", id, +type]
type = value / namea / ["op", op, type, type] /
["map", group] / ["ary", group] / ["tcho", 2*type] /
["unwrap", namea] / ["enum", group / namea] /
["prim", ?(0..7, ?uint)]
group = ["mem", null/type, type] /
["rep", uint, uint/false, group] /
["seq", 2*group] / ["gcho", 2*group]
value = ["number"/"text"/"bytes", text]
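As an illustration of the grammar above, the following Python sketch
builds the JSON form of a one-rule specification, timeout = uint. The
structure is derived by hand from the cddlj rules and has not been
checked against actual cddlc output; the helper names name and rule
are hypothetical.

```python
import json
import re

# The id pattern from the cddlj grammar above.
ID = re.compile(r"[A-Za-z@_$](([-.])*[A-Za-z0-9@_$])*")


def name(ident: str) -> list:
    """Build a ["name", id] node, checking the identifier syntax."""
    if ID.fullmatch(ident) is None:
        raise ValueError(f"not a CDDL identifier: {ident!r}")
    return ["name", ident]


def rule(ident: str, type_node: list, op: str = "=") -> list:
    """Build a [op, namep, type] rule node."""
    if op not in ("=", "/=", "//="):
        raise ValueError(f"unknown assignment operator: {op!r}")
    return [op, name(ident), type_node]


# Hand-written rendering of the one-rule spec: timeout = uint
SPEC = ["cddl", rule("timeout", name("uint"))]
SPEC_JSON = json.dumps(SPEC)
```

The resulting SPEC_JSON is a nested JSON array whose first element is
the text string "cddl", followed by one array per rule.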
IANA Considerations
This document makes no requests of IANA.
Security considerations
The security considerations of RFC 8610 apply.
References
Normative References
Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures
This document proposes a notational convention to express Concise Binary Object Representation (CBOR) data structures (RFC 7049). Its main goal is to provide an easy and unambiguous way to express structures for protocol messages and data formats that use CBOR or JSON.
Informative References
XML Schema Part 2: Datatypes Second Edition
Perl-compatible Regular Expressions (revised API: PCRE2)
n.d.
CDDL conversion utilities
n.d.
jsonpath online evaluator
n.d.
Errata Report 6526
Errata Report 6527
Errata Report 6543
The Constrained RESTful Application Language (CoRAL)
Ericsson
The Constrained RESTful Application Language (CoRAL) defines a data
model and interaction model as well as two specialized serialization
formats for the description of typed connections between resources on
the Web ("links"), possible operations on such resources ("forms"),
and simple resource metadata.
JavaScript Object Notation (JSON) Pointer
JSON Pointer defines a string syntax for identifying a specific value within a JavaScript Object Notation (JSON) document.
Relative JSON Pointers
JSON Pointer is a syntax for specifying locations in a JSON document,
starting from the document root. This document defines an extension
to the JSON Pointer syntax, allowing relative locations from within
the document.
YANG - A Data Modeling Language for the Network Configuration Protocol (NETCONF)
YANG is a data modeling language used to model configuration and state data manipulated by the Network Configuration Protocol (NETCONF), NETCONF remote procedure calls, and NETCONF notifications. [STANDARDS-TRACK]
Acknowledgements
Many people have asked for CDDL to be completed, soon.
These are usually also the people who have brought up observations
that led to the proposals discussed here.
Sean Leonard has campaigned for a regexp literal syntax.