Internet-Draft                                            Norbert Bollow
Expires: June 2, 2005                                     DotGNU Project
                                                        December 3, 2004

SXDF - Simple Extensible Data Format

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of section 3 of RFC 3667. By submitting this Internet-Draft, the author represents that he is not aware of any applicable patent or other IPR claims, and any of which he becomes aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on June 2, 2005.

Copyright Notice

Copyright (C) The Internet Society (2004).

Abstract

The Simple Extensible Data Format (SXDF) defined in this document aims to combine the useful properties of XML (a universal, text-based data format which allows additional data fields to be added without breaking existing application programs) with a simple syntax which can be parsed efficiently by computer programs.

This data format is intended for over-the-wire use in webservice protocols, where there is generally no interest in being able to modify the representation of the data directly with a standard text editor.

1. Introduction

1.1. Overview

Over the past few years, the Extensible Markup Language (XML) [W3C.REC-xml] has become a widely used method for data markup.
The Simple Extensible Data Format (SXDF) defined in this document aims to combine the useful properties of XML with a simple syntax which can be parsed efficiently by computer programs. SXDF shares the following good properties of XML:

o  It is a universal data format which can be used for expressing arbitrarily complex data.

o  It is a text-based format, which makes it more convenient to debug protocol interactions which use the data format.

o  Data can be validated in an automated manner to ensure that it adheres to a specified data structure.

o  There is great flexibility in how the data format used by a given protocol can be extended without breaking existing implementations of the protocol.

SXDF differs from XML in that its main design goals are simplicity and efficient parsing by computer programs. SXDF is not a "markup language"; it is not intended for data which will be edited with a text editor.

A sequence of bytes (eight-bit octets) which satisfies the requirements of this specification is called a "SXDF resource". Here is an example:

458:// here is some data in SXDF format
1%
8:Booklist=3@
 5%
  5:Title=16:Hardware Hacking
  6:Author=19:Kevin Mitnick (Ed.)
  4:Year=4:2004
  4:ISBN=13:1-932-26683-6
  9:Publisher=8:Syngress
 5%
  5:Title=12:We the Media
  6:Author=11:Dan Gillmor
  4:Year=4:2004
  4:ISBN=13:0-596-00733-7
  9:Publisher=8:O'Reilly
 5%
  5:Title=22:Matrix Decision Making
  6:Author=21:Alex Lowy & Phil Hood
  4:Year=4:2004
  4:ISBN=13:0-787-97292-4
  9:Publisher=11:Jossey-Bass
;

Here is the same data expressed in XML format:

<Booklist>
  <Book>
    <Title>Hardware Hacking</Title>
    <Author>Kevin Mitnick (Ed.)</Author>
    <Year>2004</Year>
    <ISBN>1-932-26683-6</ISBN>
    <Publisher>Syngress</Publisher>
  </Book>
  <Book>
    <Title>We the Media</Title>
    <Author>Dan Gillmor</Author>
    <Year>2004</Year>
    <ISBN>0-596-00733-7</ISBN>
    <Publisher>O'Reilly</Publisher>
  </Book>
  <Book>
    <Title>Matrix Decision Making</Title>
    <Author>Alex Lowy &amp; Phil Hood</Author>
    <Year>2004</Year>
    <ISBN>0-787-97292-4</ISBN>
    <Publisher>Jossey-Bass</Publisher>
  </Book>
</Booklist>

Parsers for the SXDF format are generally less complicated and faster than parsers for the XML format: every string in a SXDF resource is prefixed with its length in bytes, and every dictionary and sequence with its number of elements, so a parser does not have to search for closing delimiters or process escape sequences.

1.2. Pronunciation

The acronym "SXDF" is pronounced like "sixdaf".

1.3. Notational conventions

1.3.1. Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [KEYWORDS].

1.3.2. Syntactic notation

The syntax specification of SXDF in Section 2 and the specification of SXDF Data Structure Descriptions in Section 4 use the Augmented Backus-Naur Form (ABNF) notation specified in [RFC2234].

2. Syntax specification

The term "SXDF resource" denotes any sequence of bytes which matches the production labeled "resource", below. A "byte" is an octet (eight bits).

resource = nonnegint ":" *comment dictionary ";"
nonnegint = 1*digit
digit = %x30-39

The non-negative integer at the beginning of the resource MUST be equal to the number of bytes between the ":" which follows it and the ";" which ends the resource. In this way, each SXDF resource has the structure of a "netstring" as described in [Netstrings], except that it is terminated by ";" instead of ",".

The SXDF resource MAY contain comments; if it does, the comments MUST follow immediately after the initial colon.

comment = "//" *( %x00-09 / %x0B-FF ) %x0A

The fundamental SXDF data container is the dictionary. It contains "key = value" pairs, which are called the elements of the dictionary.

dictionary = nonnegint "%" line-end *( string "=" value line-end )

The initial non-negative integer of a dictionary MUST be equal to the number of ( string "=" value line-end ) elements which the dictionary contains. The strings which precede the "=" in these elements are known as "keys". Any given key MUST NOT occur more than once in the same dictionary.

line-end = %x0A *" "

In the line-end production, any number of space characters MAY follow the newline character. It is RECOMMENDED to use a number of space characters equal to the number of dictionary and sequence elements which contain the following line, as this improves readability for humans.
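The framing rules above (the outer length prefix, the terminating ";" and the optional leading comments) can be checked before any dictionary is parsed. The following Python sketch is illustrative only: it assumes that the complete resource is already in memory as a bytes object and that the input holds exactly one resource, and the function name split_resource is hypothetical, not part of this specification.

```python
from __future__ import annotations


def split_resource(data: bytes) -> tuple[list[bytes], bytes]:
    """Check the outer framing of one SXDF resource (hypothetical helper).

    Returns (comments, dictionary_bytes) and raises ValueError if the
    framing is malformed. Assumes `data` holds exactly one resource.
    """
    colon = data.find(b":")
    prefix = data[:colon] if colon > 0 else b""
    # nonnegint = 1*digit; bytes.isdigit() accepts ASCII digits only.
    if not prefix.isdigit():
        raise ValueError("missing or invalid length prefix")
    length = int(prefix)
    end = colon + 1 + length  # index where the terminating ";" must be
    # Check the sender-supplied length against the real input size first.
    if end >= len(data) or data[end:end + 1] != b";":
        raise ValueError("declared length does not match the terminating ';'")
    if end + 1 != len(data):
        raise ValueError("unexpected bytes after the terminating ';'")
    body = data[colon + 1:end]
    comments = []
    # Comments may only appear immediately after the initial colon;
    # each runs from "//" up to and including the next newline.
    while body.startswith(b"//"):
        newline = body.find(b"\n")
        if newline < 0:
            raise ValueError("unterminated comment")
        comments.append(body[2:newline])
        body = body[newline + 1:]
    return comments, body
```

For example, split_resource(b"17:// hi\n1%\n1:a=1:b\n;") returns ([b" hi"], b"1%\n1:a=1:b\n"); the returned bytes would then be parsed according to the dictionary production. Validating the declared length before slicing reflects the concern, raised in the Security Considerations, that length prefixes come from an untrusted sender.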
string = nonnegint ":" *%x00-FF

The initial non-negative integer of a string MUST be equal to the number of bytes which follow the ":" and make up the content of the string. A string has no terminating delimiter; its end is determined solely by this byte count.

value = ( string / dictionary / sequence / isequence / fsequence )

sequence = nonnegint "@" line-end *( value line-end )
isequence = nonnegint "i" line-end *( int line-end )
fsequence = nonnegint "f" line-end *( float line-end )

In the sequence, isequence and fsequence productions, the initial non-negative integer MUST be equal to the number of elements which follow. The isequence and fsequence productions are like sequence, except that their elements are restricted to integer or floating-point numeric constants, respectively.

int = "0" / ( *1"-" %x31-39 *digit )
float = "0" / ( *1"-" ( "0" / %x31-39 *digit ) "." 1*digit )

3. Reserved Keywords

The keywords "resource" and "0" have a special meaning in SXDF Data Structure Descriptions (Section 4); therefore they SHOULD NOT be used as dictionary keys in application-specific SXDF formats. It is RECOMMENDED to use capitalized words as keys when defining SXDF formats, as this reliably prevents the accidental use of a reserved keyword. However, SXDF parsers MUST NOT generate an error merely because a reserved keyword is used as a dictionary key outside of a SXDF DSD.

4. SXDF Data Structure Descriptions (DSD)

4.1. Overview

As mentioned in the introduction, SXDF data can be verified to adhere to a specific data structure, similar to how this is possible with XML.

A SXDF resource MAY contain a "DSD" element which, if present, asserts that the SXDF resource conforms to a particular SXDF Data Structure Description (DSD). The value of the DSD element SHOULD be either a string which references the DSD by means of a URL, or a dictionary which contains the DSD explicitly.

A DSD consists of a dictionary in which one key is "resource" and the other keys are the names of the various types of elements in the data format which the DSD describes. The value for "resource" describes the top-level dictionary of the SXDF resource.

4.2. Elements of type isequence or fsequence

For elements which are required to be of type isequence, the SXDF DSD dictionary contains a value of the following form:

dsd-i = num "*" num %x69 ; %x69 is ASCII lowercase 'i'
num = "" / ( %x31-39 *digit )

The 'num "*" num' construct has the same semantics as repetition in ABNF: the first number specifies a minimum, with a default of zero, and the second number specifies a maximum, with a default of infinity. For dsd-i, these bounds apply to the number of integers in the isequence.

Elements of type fsequence are described similarly:

dsd-f = num "*" num %x66 ; %x66 is ASCII lowercase 'f'

4.3. Elements with String Values

A similar syntax is available for strings, with the minimum and maximum referring to the length of the string in bytes:

dsd-s = num "*" num %x73 ; %x73 is ASCII lowercase 's'

While it may be desirable to specify the set of valid string values more precisely, that is outside the scope of SXDF Data Structure Descriptions.

Because num cannot be "0", this syntax cannot express an upper limit of zero; instead, the reserved keyword "0" is defined to represent a value which must always be the empty string.

4.4. Elements with Dictionary Values

For elements of type dictionary there are two possibilities, corresponding to two different uses of dictionaries:

If the dictionary is used to encapsulate structured data, it is described in the SXDF DSD by a dictionary which has as its keys the possible keys of the dictionary that it describes, and as its values strings of the following form:

dsd-a = num "*" num "@" *%x00-FF

The bytes following the "@" name the element type, defined as a key of the DSD, which describes the corresponding value. For required non-array elements, the first and the second number are both "1". For optional non-array elements, the first number is "" (meaning zero) and the second number is "1". For optional array elements, the first number is "" and the second number is either "" or greater than zero.
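The num "*" num bounds shared by dsd-i, dsd-f, dsd-s and dsd-a above (and by the size-spec defined below) can be decoded mechanically. The following Python sketch is one possible approach, not a normative algorithm: the names RangeSpec and parse_range_spec are hypothetical, and the sketch assumes the DSD string value has already been extracted as bytes. It maps an empty minimum to zero and an empty maximum to "no limit", and it rejects "0" as a bound because the num production does not allow it.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# num = "" / ( %x31-39 *digit ), so "0" and leading zeros are rejected.
# After "*num" comes "i", "f" or "s", or "@" plus a type name (dsd-a),
# or nothing at all (a bare size-spec such as "*").
_RANGE = re.compile(rb"([1-9][0-9]*)?\*([1-9][0-9]*)?(?:([ifs])|@(.*))?", re.DOTALL)


@dataclass
class RangeSpec:
    minimum: int                      # an empty num means zero
    maximum: Optional[int]            # an empty num means no upper limit
    kind: str                         # "i", "f", "s", "@", or "" for size-spec
    type_ref: Optional[bytes] = None  # element type named after "@"

    def allows(self, n: int) -> bool:
        """True if a count (or, for "s", a length in bytes) is within bounds."""
        return n >= self.minimum and (self.maximum is None or n <= self.maximum)


def parse_range_spec(value: bytes) -> RangeSpec:
    m = _RANGE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a DSD range specification: {value!r}")
    low, high, letter, ref = m.groups()
    if letter is not None:
        kind = letter.decode("ascii")
    elif ref is not None:
        kind = "@"
    else:
        kind = ""
    return RangeSpec(
        minimum=int(low) if low else 0,
        maximum=int(high) if high else None,
        kind=kind,
        type_ref=ref,
    )
```

For instance, the ISBN entry "1*1@s13" in the Booklist DSD of Section 4.7 decodes to minimum 1, maximum 1 and type reference "s13", and the "s13" type itself ("13*13s") allows only strings of exactly 13 bytes.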
If the keys and values of the dictionary are both part of the data (as opposed to using the dictionary construct for structuring the data), this is indicated by a dictionary with the keys "size", "keys" and "values". The values for "size" and "keys" are always strings of the following forms:

size-spec = num "*" num
keys-spec = dsd-s

The value for "values" is a string which references some element type defined in the DSD.

4.5. Elements with Multiple Possible Value Types

Elements which may be of different types are described by an array (sequence) which lists the various possibilities.

4.6. Nested Arrays

Nested arrays can be specified by means of a dsd-a which refers to a key which again has a dsd-a as its value.

4.7. DSD for the Booklist Example

Here is a SXDF Data Structure Description for the Booklist example from Section 1.1:

202:6%
8:resource=1%
 8:Booklist=12:1*1@Booklist
8:Booklist=6:*@Book
4:Book=5%
 5:Title=5:1*1@s
 6:Author=5:1*1@s
 4:Year=6:1*1@s4
 4:ISBN=7:1*1@s13
 9:Publisher=5:1*1@s
1:s=2:*s
2:s4=4:4*4s
3:s13=6:13*13s
;

If this DSD is published at http://SXDF.org/Booklist-DSD.sxdf, this URL can be added to the Booklist example, for purposes of validation, as follows:

501:// here is some data in SXDF format
2%
3:DSD=33:http://SXDF.org/Booklist-DSD.sxdf
8:Booklist=3@
 5%
  5:Title=16:Hardware Hacking
  6:Author=19:Kevin Mitnick (Ed.)
  4:Year=4:2004
  4:ISBN=13:1-932-26683-6
  9:Publisher=8:Syngress
 5%
  5:Title=12:We the Media
  6:Author=11:Dan Gillmor
  4:Year=4:2004
  4:ISBN=13:0-596-00733-7
  9:Publisher=8:O'Reilly
 5%
  5:Title=22:Matrix Decision Making
  6:Author=21:Alex Lowy & Phil Hood
  4:Year=4:2004
  4:ISBN=13:0-787-97292-4
  9:Publisher=11:Jossey-Bass
;

Alternatively, the DSD can be included in the SXDF resource, as follows:

678:// here is some data in SXDF format
2%
3:DSD=6%
 8:resource=1%
  8:Booklist=12:1*1@Booklist
 8:Booklist=6:*@Book
 4:Book=5%
  5:Title=5:1*1@s
  6:Author=5:1*1@s
  4:Year=6:1*1@s4
  4:ISBN=7:1*1@s13
  9:Publisher=5:1*1@s
 1:s=2:*s
 2:s4=4:4*4s
 3:s13=6:13*13s
8:Booklist=3@
 5%
  5:Title=16:Hardware Hacking
  6:Author=19:Kevin Mitnick (Ed.)
  4:Year=4:2004
  4:ISBN=13:1-932-26683-6
  9:Publisher=8:Syngress
 5%
  5:Title=12:We the Media
  6:Author=11:Dan Gillmor
  4:Year=4:2004
  4:ISBN=13:0-596-00733-7
  9:Publisher=8:O'Reilly
 5%
  5:Title=22:Matrix Decision Making
  6:Author=21:Alex Lowy & Phil Hood
  4:Year=4:2004
  4:ISBN=13:0-787-97292-4
  9:Publisher=11:Jossey-Bass
;

5. The DSD for SXDF Data Structure Descriptions

Since most of the complexity of SXDF Data Structure Descriptions lies in the string values rather than in the structure of the DSDs, the DSD for SXDF Data Structure Descriptions is very simple:

83:2%
8:resource=3%
 4:size=1:*
 4:keys=2:*s
 6:values=2:v1
2:v1=2@
 2:*s
 8:resource
;

This is read as follows: the DSD resource is a dictionary of any size, with keys which are strings of any length, and values which are either strings of any length or arrays of elements of type "v1". Type "v1" is defined as being either a string of any length or an element of type "resource".

6. Use of SXDF for storing persistent data

Besides its use as an over-the-wire format for webservice protocols, the SXDF data format MAY also be used as an on-disk format for storing persistent data.
However, programs which use SXDF for this purpose SHOULD also support the EXDF data format [EXDF], which is designed to allow data resources to be edited conveniently in a text editor.

7. Security Considerations

Webservices typically act on untrusted data; SXDF implementations therefore need to be carefully designed and reviewed to prevent security breaches caused by improper handling of malformed SXDF resources. In particular, the length prefixes and element counts in a SXDF resource are supplied by the sender and may be arbitrarily large or inconsistent with the data which actually follows; an implementation should check them against the remaining input before reading data or allocating memory based on them.

8. IANA Considerations

This document has no actions for IANA.

References

Normative References

[RFC2234]  Crocker, D., Ed., "Augmented BNF for Syntax Specifications: ABNF", RFC 2234.

[KEYWORDS]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119.

Informative References

[Netstrings]  Bernstein, D. J., "Netstrings".

[EXDF]  Bollow, N., "EXDF - Editable Extensible Data Format", work in progress.

[QQP]  Bollow, N., "QQP - Quick Queues Protocol", work in progress.

[QRPC]  Bollow, N., "QRPC - Queueable Remote Procedure Calls", work in progress.

[W3C.REC-xml]  Bray, T., Paoli, J., Sperberg-McQueen, C. and E. Maler, "Extensible Markup Language (XML) 1.0 (2nd ed)", W3C REC-xml, October 2000.

Authors' Address

Norbert Bollow
Weidlistrasse 18
CH-8624 Gruet

Phone: +41 1 972 2059
EMail: nb@bollow.ch

Full Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and at www.rfc-editor.org, and except as set forth therein, the authors retain all their rights.

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.