Data At Rest Encryption: DARE Container
Comodo Group Inc.
philliph@comodo.com
This document describes DARE Container, a message and file syntax that allows a sequence of data frames to be represented with cryptographic integrity, signature, and encryption enhancements in an append-only format. The format supports data integrity checks using digest chains and Merkle trees. In its simplest form, the format supports efficient append-only write operations and efficient read operations in either the forward or reverse direction. Support for efficient random-access reads may be provided through the use of binary trees or index records appended to the end of the file.
This document is also available online at http://mathmesh.com/Documents/draft-hallambaker-dare-container.html.
DARE Container is a message and file syntax that allows a sequence of data frames to be represented with cryptographic integrity, signature, and encryption enhancements in an append-only format. DARE Container was developed in response to needs that arose out of the design of the Mathematical Mesh. It is built on the binary encodings of JSON data objects, JSON-B and JSON-C, and the DARE Message format.
The high level requirements supported include:
Recording Mesh transactions in persistent storage.
Synchronizing transaction logs between hosts.
Representing message archives (aka mail spool).
Signing and encrypting single data items.
The features supported by DARE Container include:
The format is append only, thus providing for rapid write operations and enabling the use of technologies that provide atomic transactions.
All length and index values support the use of integers of at least 64 bits.
Data frames may be of variable length.
Data frames may be read in either direction. This allows the last n frames to be read as efficiently as the first n frames.
Appending a data frame to an existing file is efficient, taking no more than log2 (n) operations.
A binary tree index MAY be constructed on an incremental basis, allowing random access to the nth record in the file in log2 (n) operations.
An index MAY be appended to an existing container to allow random access to the nth record in the file in log2 (n) operations.
Permits the use of modern data encodings (e.g. JSON).
Supports digital signature and public key operations on the payloads of individual data frames.
Data frame content (i.e. payload data) may be overwritten without invalidating the integrity of any other frame. This allows content to be expunged in exigent circumstances (court order, regulatory, confidentiality breach, etc.) without compromising the integrity of the rest of the data in the container.
Many proprietary file formats are in use that support some or all of these capabilities but only a handful have public, let alone open, standards. DARE Container is designed to provide a superset of the capabilities of existing message and file syntaxes, including:
Cryptographic Message Syntax, which defines a syntax used to digitally sign, digest, authenticate, or encrypt arbitrary message content.
The .ZIP File Format specification developed by Phil Katz.
The Bitcoin blockchain.
JSON Web Encryption and JSON Web Signature
Attempting to make use of these specifications in a layered fashion would require at least three separate encoders and introduce unnecessary complexity.
Every data format represents a compromise between different concerns, in particular:
The space required to record data in the encoding.
The additional volatile storage (RAM) required to maintain indexes etc. to support efficient retrieval operations.
The number of operations required to retrieve data from or append data to an existing encoded sequence.
Optimizing the response time of magnetic storage media to random access read requests has traditionally been one of the central concerns of database design. The DARE Container format is designed on the assumption that this will cease to be a concern as solid state media replaces magnetic.
While the cost of storage of all types has declined rapidly over the past decades, the amount of data to be stored has grown just as rapidly. DARE Container represents a pragmatic balance of these considerations for current technology. In particular, since payload volumes are likely to be very large, memory and operational efficiency are considered higher priorities than data volume.
DARE Container makes use of the following related standards and specifications.
Content frame headers are encoded using JavaScript Object Notation (JSON), JSON-B or JSON-C.
The encryption and signature schemes used are based on JSON Web Signature and JSON Web Encryption.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
A DARE Container consists of a sequence of JBCD Frames containing up to three ordered JBCD records as follows:
Metadata of any type including container metadata, payload metadata and DARE message headers.
The frame data. Payload records are either complete or incremental. A complete payload contains a complete unit of whatever data is being written to the log. An incremental payload contains a part of a unit that has been split across multiple records.
An opaque data record with a known value written to the end of a frame to allow a writer to avoid corruption of the container framing data by detecting an incomplete append state.
A DARE frame consists of a forward length indicator, the framed data and a reverse length indicator. The reverse length indicator is written out backwards to allow the frame to be read in the reverse direction.
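The bidirectional framing idea can be illustrated with a minimal Python sketch. It is not the JBCD wire format: the fixed 8-byte big-endian length field and the function names (write_frame, read_forward, read_backward) are assumptions chosen only to show how a byte-reversed trailing length lets a reader walk the file from the end.

```python
import struct

# ASSUMPTION: a fixed 8-byte big-endian length field. The real JBCD
# encoding uses its own tagged, variable-length representation.
LEN_SIZE = 8


def write_frame(payload: bytes) -> bytes:
    """Wrap a payload in a forward length and a byte-reversed trailing length."""
    forward = struct.pack(">Q", len(payload))
    reverse = forward[::-1]  # the same bytes, written out backwards
    return forward + payload + reverse


def read_forward(buf: bytes, pos: int = 0):
    """Read the frame starting at pos; return (payload, start of next frame)."""
    (length,) = struct.unpack_from(">Q", buf, pos)
    start = pos + LEN_SIZE
    return buf[start:start + length], start + length + LEN_SIZE


def read_backward(buf: bytes, end: int):
    """Read the frame ending at end; return (payload, start of that frame)."""
    reverse = buf[end - LEN_SIZE:end]
    (length,) = struct.unpack(">Q", reverse[::-1])  # undo the reversal
    stop = end - LEN_SIZE
    start = stop - length
    return buf[start:stop], start - LEN_SIZE


container = write_frame(b"first") + write_frame(b"second frame")
first, next_pos = read_forward(container)
last, prev_end = read_backward(container, len(container))
```

Reading the last frame costs the same as reading the first: one length read followed by one payload read, with no scan of the intervening frames.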
When first reading an existing file, an application will typically read the first frame and the last frame (if the container has more than one frame). This allows the reader to quickly determine the format(s) used by the container, the number of frames in the container and the location of any index frames (if present).
The container format is designed to support creation of write-once and append-only file formats. Each frame SHOULD be written as an atomic operation.
The first frame in a container and the first record in a frame have special roles that are described in this document.
The first frame in a container describes the container format options and defaults. These include the range of encoding options for frame metadata supported and the container profiles to which the container conforms.
The first record in a frame MUST NOT contain payload data.
A key objective of the DARE Container format is that the simplest possible reader be capable of reading any container file albeit with possibly reduced performance.
A Container MAY conform to one or more profiles. Conforming to a profile typically requires a writer to provide additional information when writing a file but does not require a reader to interpret it unless use of a feature (e.g. authentication) that depends on the additional information is required.
The following profiles are currently defined:
Frame headers contain IndexPosition entries that specify the start position of previous frames. This enables efficient random access to any frame in the file.
Frame headers contain PayloadDigest entries that specify the digest value of the corresponding payload data in that frame.
Frame headers contain ChainDigest entries that link each frame to the preceding frame.
Frame headers contain TreeDigestPartial and TreeDigestFinal entries linking all the frames in the container in a binary Merkle Tree.
The use of Chain and Merkle Trees for integrity checks is described below.
The use of Tree and Index frames is described below.
The following profiles are currently defined:
A container with exactly one content frame. A container declared as a singleton frame cannot have additional content frames appended (but metadata frames may be).
A container whose payload data is limited to content frames. A container declared as a multi container may contain 0, 1 or more content frames.
A multi-container whose payload data is limited to content frames whose last frame contains a metadata index for the content frames in the container.
A multi-container in which each frame represents exactly one payload object.
A multi-container in which payload objects MAY be split across multiple consecutive frames.
A multi-container in which payload objects MAY be split across multiple frames which may in turn be interleaved with frames containing other payload objects in complete or partial form.
DARE container payloads MAY be encrypted as DARE Message Enhanced Data Sequences. This specification builds on JSON Web Encryption (JWE) to provide a flexible framework allowing a single key exchange to be applied to encrypt multiple data sequences.
The DARE Container and DARE Message formats are designed to complement each other:
DARE Message Format allows a Master Key established in a single key exchange to be applied to multiple DARE related messages.
DARE Container Format allows sets of DARE related messages to be organized so that individual messages and the keying material required to interpret them can be efficiently retrieved.
An index may be appended to an existing file at any time. Since the use of bidirectional frames makes reading the last record as efficient as reading the first, the last record in an indexed file is usually either the index itself or a pointer to the last index.
An index frame consists of a frame header
Use of index frames provides read access to any record in the file in O(1) operations, but attempting to compile a complete index with every write incurs an O(n) penalty on write for both operations and storage. Accordingly, random read access to a file while it is being written is better supported using an index tree.
Binary search is supported by means of the TreePosition parameter specified in the FrameHeader. This parameter specifies the value of the immediately preceding apex.
Calculation of the immediately preceding apex is most easily described by representing the array index in binary with a base of 1 (rather than 0). An array index that is a power of 2 (2, 4, 8, 16, etc.) will be the apex of a complete tree. Every other array index is the sum of a set of distinct powers of 2, and the immediately preceding apex is the value that remains when the smallest power of 2 is removed from that sum.
For example, to find the immediately preceding apex for frame 5, we add 1 to get 6. Since 6 = 4 + 2, we ignore the 2 and the preceding frame is 4.
The values of Tree Position are shown for the first 8 frames in figure xx below:
An algorithm for efficiently calculating the immediately preceding apex is provided in Appendix C.
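As an illustration only, and not a reproduction of the Appendix C algorithm, the rule in the worked example can be sketched in Python. The function name preceding_apex is hypothetical and it works on the base-1 index used in the description. The text does not say what the preceding apex of an exact power of 2 is; the sketch assumes it is the apex of the previous complete tree (half the index), with 0 meaning "no preceding apex".

```python
def preceding_apex(x: int) -> int:
    """Return the base-1 index of the immediately preceding apex (0 = none)."""
    if x < 1:
        raise ValueError("index is base 1")
    lowest = x & -x  # smallest power of 2 in the binary expansion of x
    if lowest == x:
        # x is itself a power of 2, the apex of a complete tree.
        # ASSUMPTION: its predecessor is the apex of the previous
        # complete tree, x // 2 (which is 0 for x == 1).
        return x // 2
    # Otherwise drop the smallest power of 2 from the sum.
    return x - lowest


# Print the base-1 index and its preceding apex for the first 8 indexes.
for index in range(1, 9):
    print(index, preceding_apex(index))
```

For the worked example, frame 5 has base-1 index 6, and preceding_apex(6) gives 4.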
Contains a table of index, position pairs pointing to prior locations in the file.
Contains a list of IndexMeta entries. Each entry contains a metadata description and a list of frame indexes (not positions) of frames that match the description.
Frame sequences in a DARE container MAY be protected against a frame insertion attack by means of a digest chain, a binary Merkle tree or both.
A digest chain is simple to implement but can only be verified if the full chain of values is known. Appending a frame to the chain has O(1) complexity but verification has O(n) complexity:
The value of the chain digest for the first frame (frame 0) is H(IV + H(Payload_0)), where IV is an initialization vector consisting of a string of zero bytes and Payload_n is the sequence of payload data bytes for frame n.
The value of the chain digest for frame n is H(H(Payload_{n-1}) + H(Payload_n)), where A+B stands for concatenation of the byte sequences A and B.
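The two chain digest rules can be sketched as follows, reading the second rule as the digest of the previous frame's payload digest concatenated with the current frame's payload digest. The choice of SHA-256 for H and an IV of one digest length of zero bytes are assumptions made for the sake of a runnable example; the text above fixes neither.

```python
import hashlib


def H(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 stands in for the container's digest algorithm.
    return hashlib.sha256(data).digest()


# ASSUMPTION: the IV is as long as one digest; the text only says
# "a string of zero bytes".
IV = bytes(hashlib.sha256().digest_size)


def chain_digests(payloads):
    """Return the chain digest for each frame, in frame order."""
    digests = []
    previous = IV  # frame 0 chains from the IV
    for payload in payloads:
        current = H(payload)                   # the frame's payload digest
        digests.append(H(previous + current))  # chain digest for this frame
        previous = current                     # next frame links to H(Payload_n)
    return digests


chain = chain_digests([b"frame zero", b"frame one", b"frame two"])
```

Appending a frame needs only the preceding payload digest, which is the O(1) append cost noted above; checking the whole chain means recomputing every entry.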
The tree index mechanism described earlier may be used to implement a binary Merkle tree. The value TreeDigest specifies the apex value of the tree for that node.
Appending a frame to the tree has O(log2 n) complexity provided that the container format supports at least the binary tree index. Verifying a frame against the tree has O(log2 n) complexity, provided that the set of necessary digest inputs is known.
To calculate the value of the tree digest for a node, we first calculate the values of all the sub trees that have their apex at that node and then calculate the digest of that value and the immediately preceding local apex.
TBS stuff
TBS stuff
Specifies the data encoding for the header section of the following frames. This value is ONLY valid in Frame 0 which MUST have a header encoded in JSON.
Describes a container header. A container header MAY contain any DARE Message header.
The record index within the file. This MUST be unique and satisfy any additional requirements determined by the ContainerType.
Specifies the container type for the following records.
If true, the current frame is a meta frame and does not contain a payload.
Note: Meta frames MAY be present in any container. Applications MUST accept containers that contain meta frames at any position in the file. Applications MUST NOT interpret a meta frame as a data frame with an empty payload.
Unique object identifier
Content meta data.
Position of the frame containing the apex of the preceding sub-tree.
Specifies the position in the file at which the last index entry is to be found
Specifies the position in the file at which the key exchange data is to be found
An index of records in the current container up to but not including this one.
If present, contains the digest of the Payload.
If present, contains the digest of the PayloadDigest values of this frame and the frame immediately preceding.
If present, contains the Binary Merkle Tree digest value.
Unique object identifier
List of labels that are applied to the payload of the frame.
List of key/value pairs describing the payload of the frame.
Frame number of the first object instance value.
Frame number of the immediately prior object instance value
TBS stuff
Information describing the object instance
The content type field as specified in JWE
List of filename paths for the payload of the frame.
Unique object identifier
Initial creation date.
Date of last modification.
TBS stuff
A container index
If true, the index is complete and contains position entries for all the frames in the file. If absent or false, the index is incremental and only contains position entries for records added since the last frame containing a ContainerIndex.
List of container position entries
List of container position entries
Specifies the position in a file at which a specified record index is found
The record index within the file.
The record position within the file relative to the index base.
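The relationship between complete and incremental indexes can be sketched as below. The class and field names (ContainerIndex, IndexPosition, full, positions, index, position) are hypothetical stand-ins, since only the field descriptions appear here, and the sketch assumes an index base of 0 so that positions are absolute.

```python
from dataclasses import dataclass, field


@dataclass
class IndexPosition:
    index: int     # record index within the file
    position: int  # record position (ASSUMPTION: index base is 0)


@dataclass
class ContainerIndex:
    positions: list = field(default_factory=list)
    full: bool = False  # hypothetical name for the "index is complete" flag


def build_position_map(indexes):
    """Combine the index frames of a container, given in file order."""
    table = {}
    for container_index in indexes:
        if container_index.full:
            table = {}  # a complete index supersedes everything before it
        for entry in container_index.positions:
            table[entry.index] = entry.position
    return table


complete = ContainerIndex([IndexPosition(0, 0), IndexPosition(1, 100)], full=True)
incremental = ContainerIndex([IndexPosition(2, 250)])
positions = build_position_map([complete, incremental])
```

An incremental index only adds the entries written since the previous index frame, so a reader accumulates entries back to the most recent complete index.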
Specifies a key/value entry
The key
The value corresponding to the key
Specifies the list of index entries at which a record with the specified metadata occurs.
List of record indices within the file where frames matching the specified criteria are found.
Content type parameter
List of filename paths for the current frame.
List of labels that are applied to the current frame.
Payload data MAY be signed using a JWS as applied in the DARE Message format.
Signatures are specified by the Signatures parameter in the content header. The data that the signature is calculated over is defined by the typ parameter of the Signature as follows.
The frame payload data.
The value of the PayloadDigest parameter
The value of the ChainDigest parameter
The value of the TreeDigestFinal parameter
If the typ parameter is absent, the value Payload is implied.
A frame MAY contain multiple signatures created with the same signing key and different typ values.
The use of signatures over chain and tree digest values permits multiple frames to be validated using a single signature verification operation.
The container format is intended to be the basis of future work to support:
Very large container sizes (larger than the size of the host's memory).
Partitioning of very large data sets across multiple hosts with parallel append.
Fault tolerance
The container format is designed to be capable of supporting efficient random access to frames in containers considerably larger than the processing memory of the host computer without the need to pre-load indexes.
A combination of the following strategies is being considered:
Use memory mapped file views to container data to optimize random access times while controlling memory use and time taken to construct memory views.
When the container is first bound, use the binary tree index data in TreePosition parameters to support random access operations until index building is complete.
Perform Index building operations as a non-blocking background task.
While storage devices capable of storing tens of TB of data with RAID redundancy are commonplace, it is generally desirable that there be at least as many CPU cores as disks. Thus, partitioning of data sets across multiple hosts becomes desirable for throughput even if a single host could handle the storage requirement.
In the types of applications envisaged in the Mesh, almost every data set may be reduced to collections that are bound to a single account. While it is obviously desirable that a user's mail messages (for example) be replicated across multiple machines to provide fault tolerance, fragmenting the copies of this data set across multiple machines should be avoided unless the data volumes are so large as to require it.
The encoding scheme is 64-bit clean throughout and thus supports containers and frames as large as 16 exbibytes (approximately 18 exabytes). Larger data volumes could be supported through use of 128-bit integer pointers but even if the technology to support such data volumes were developed, it is highly unlikely anyone would want to represent data sets anywhere near this size in a serial format.
Due to limitations in the design of the encryption schemes that may be used (e.g. AES-GCM), the maximum encrypted frame size is 64 GB. While this is not currently a major concern for encryption of individual data files, it is easy to see situations in which an archive of encrypted files could exceed that amount. One possibility would be to define a modification to AES-GCM which caused the encryption key to be incremented by a fixed amount after encrypting a certain amount of data, though this might well present implementation challenges unless the maximum data block size was chosen to be deliberately small so as to force code paths to be exercised. Another possibility would be to limit the size of encrypted data frames by requiring the frame pointer to be no larger than 32 bits and require larger data items to be represented as a sequence of frames.
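The 64 GB figure follows from the AES-GCM counter width. As a short worked calculation (the variable names are illustrative only): GCM allows at most 2^32 - 2 plaintext blocks of 16 bytes under a single key and nonce, which comes to just under 2^36 bytes.

```python
BLOCK_BYTES = 16        # AES block size in bytes
MAX_BLOCKS = 2**32 - 2  # GCM limit on plaintext blocks per key and nonce

max_plaintext_bytes = MAX_BLOCKS * BLOCK_BYTES
max_plaintext_gib = max_plaintext_bytes / 2**30

print(max_plaintext_bytes)  # 68719476704
print(max_plaintext_gib)    # just under 64
```

A frame pointer limited to 32 bits, as suggested above, would cap frames at 4 GiB and so stay comfortably inside this bound.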
The container format deliberately avoids support for concurrent write operations. Should this be desirable, some mechanism must be provided to cache write fragments to an intermediate file and then consolidate them for writing to the master log.
The data payloads in all the following examples are identical; only the authentication and/or encryption is different.
Each of frames 1..n consists of 300 bytes, being the byte sequence 00, 01, 02, etc., repeating after 256 bytes.
For conciseness, the wire format is omitted for examples after the first, except where the data payload has been transformed (i.e. encrypted).
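For readers who want to reproduce the examples, the 300-byte test payload described above can be generated with a one-line Python helper. The function name example_payload is illustrative and not part of the specification.

```python
def example_payload(length: int = 300) -> bytes:
    """Bytes 00, 01, 02, ... wrapping back to 00 after FF."""
    return bytes(i % 256 for i in range(length))


payload = example_payload()
```

The last 44 bytes of the payload repeat the values 00 through 2B.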
Here is the simple container:
The header values are:
Frame 0
Frame 0
Frame 0
The following example shows a container in which all the frame payloads are encrypted under the same master secret established in a key agreement specified in the first frame.
Frame 0
Here are the container bytes. Note that the content is now encrypted and has expanded by 25 bytes. These are the salt (16 bytes), the AES padding (4 bytes) and the JSON-B framing (5 bytes).
The following example shows a container in which all the frame payloads are encrypted under separate key agreements specified in the payload frames.
References:
The JavaScript Object Notation (JSON) Data Interchange Format
JSON Web Signature (JWS)
JSON Web Encryption (JWE)
Key words for use in RFCs to Indicate Requirement Levels
Binary Encodings for JavaScript Object Notation: JSON-B, JSON-C, JSON-D
Data At Rest Encryption Part 1: DARE Message
APPNOTE.TXT - .ZIP File Format Specification, PKWARE Inc
Blockchain Specification, Chain.com
Cryptographic Message Syntax (CMS)
Mathematical Mesh: Architecture