JSON Canonicalization Scheme (JCS)
Independent
Montpellier
France
anders.rundgren.net@gmail.com
https://www.linkedin.com/in/andersrundgren/
Symantec Corporation
350 Ellis Street
CA
94043
Mountain View
USA
bret_jordan@symantec.com
Spotify AB
Birger Jarlsgatan 61, 4tr
113 56
Stockholm
Sweden
erdtman@spotify.com
Security
JSON, ECMAScript, Signatures, Cryptography, Canonicalization
Cryptographic operations like hashing and signing need the data to be
expressed in an invariant format so that the operations are reliably
repeatable.
One way to address this is to create a canonical representation of
the data. Canonicalization also permits data to be exchanged in its
original form on the "wire", while cryptographic operations
performed on the canonicalized counterpart of the data in the
producer and consumer endpoints generate consistent results.
This document describes the JSON Canonicalization Scheme (JCS).
The JCS specification defines how to create a canonical representation
of JSON data by building on the strict serialization methods for
JSON primitives defined by ECMAScript, constraining JSON data to
the I-JSON subset, and by using deterministic property sorting.
Introduction
Cryptographic operations like hashing and signing need the data to be
expressed in an invariant format so that the operations are reliably
repeatable.
One way to accomplish this is to convert the data into
a format that has a simple and fixed representation,
like Base64Url.
This is how JWS addressed this issue.
Another solution is to create a canonical version of the data,
similar to what was done for the XML Signature standard.
The primary advantage with a canonicalizing scheme is that data
can be kept in its original form. This is the core rationale behind JCS.
Put another way, using canonicalization enables a JSON Object to remain a JSON Object
even after being signed. This can simplify system design, documentation, and logging.
To avoid "reinventing the wheel", JCS relies on the serialization of JSON primitives
(strings, numbers and literals), as defined by
ECMAScript (aka JavaScript) beginning with version 6,
hereafter referred to as "ES6".
Seasoned XML developers may recall difficulties getting XML signatures
to validate. This was usually due to different interpretations of the quite intricate
XML canonicalization rules as well as of the equally complex
Web Services security standards.
The reasons why JCS should not suffer from similar issues are:
- The absence of a namespace concept and default values.
- Constraining data to the I-JSON subset.
  This eliminates the need for specific parsers for dealing with canonicalization.
- JCS compatible serialization of JSON primitives is currently supported
  by most Web browsers as well as by Node.js.
- The full JCS specification is currently supported by multiple
  Open Source implementations (see "Open Source Implementations").
  See also "Implementation Guidelines".
In summary, the JCS specification defines how to create a canonical
representation of JSON data by building on the strict serialization
methods for JSON primitives defined by ECMAScript,
constraining JSON data to the I-JSON subset,
and by using deterministic property sorting. The output from JCS is a
"Hashable" representation of JSON data that can be used by cryptographic methods.
JCS is compatible with some existing systems relying on JSON canonicalization
such as JWK Thumbprint and Keybase.
For potential uses outside of cryptography, see the "Comparable" JSON work
listed in the Informative References.
The intended audiences of this document are JSON tool vendors, as
well as designers of JSON based cryptographic solutions.
The reader is assumed to have a basic knowledge of ECMAScript including the "JSON" object.
Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.
Detailed Operation
This section describes different issues related to creating
a canonical JSON representation, and how they are addressed by JCS.
Creation of Input Data
Data to be serialized is usually created by:
- Parsing previously generated JSON data.
- Programmatically creating data.
Irrespective of the method used, the data to be serialized MUST be adapted
for I-JSON formatting, which implies the following:
- JSON Objects MUST NOT exhibit duplicate property names.
- JSON String data MUST be expressible as Unicode.
- JSON Number data MUST be expressible as IEEE-754 double precision values.
  For applications needing higher precision or longer integers than
  offered by IEEE-754 double precision, "Dealing with Big Numbers" outlines how
  such requirements can be supported in an interoperable and extensible way.
An additional constraint is that parsed JSON String data MUST NOT be altered during subsequent serializations.
For more information see "String Subtype Handling".
Note: although the Unicode standard offers the possibility of combining
certain characters into one, referred to as "Unicode Normalization",
JCS' string processing does not take this into consideration.
That is, all components involved in a scheme depending on JCS
MUST preserve Unicode string data "as is".
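The effect of this rule can be illustrated with a minimal sketch. The two strings below are chosen for illustration only: both render as "é", but one uses the precomposed code point and the other a base letter followed by a combining accent. Since no normalization is applied, they serialize to different byte sequences and would therefore hash differently. The helper name toHex is an assumption introduced for this example.

```javascript
// Two visually identical strings (illustrative values):
// a precomposed "é" versus "e" followed by a combining acute accent.
const precomposed = '\u00e9';
const decomposed = 'e\u0301';

// Hypothetical helper: UTF-8 encode a string and show it as hexadecimal.
const toHex = (text) =>
  Array.from(new TextEncoder().encode(text), (b) => b.toString(16).padStart(2, '0')).join('');

// JSON.stringify() performs no Unicode normalization, so the forms stay distinct.
const a = JSON.stringify({ name: precomposed });
const b = JSON.stringify({ name: decomposed });

console.log(a === b);   // false
console.log(toHex(a));  // contains c3a9 for the precomposed form
console.log(toHex(b));  // contains 65cc81 for the decomposed form
```

An application that normalizes strings between parsing and canonicalization would thus produce a different result than the party that created the data.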
Note: how structured objects like sets are represented in JSON is out of scope
for JCS. See also "Subtypes in Arrays".
Generation of Canonical JSON Data
The following subsections describe the steps required to create a canonical
JSON representation of the data elaborated on in the previous section.
"ES6 Sample Canonicalizer" shows sample code for an ES6 based canonicalizer
matching the JCS specification.
Whitespace
Whitespace between JSON tokens MUST NOT be emitted.
Serialization of Primitive Data Types
Assume a JSON object as follows is parsed:
If the parsed data is subsequently serialized
using a serializer compliant with ES6's JSON.stringify(),
the result would (with a line wrap added for display purposes only),
be rather divergent with respect to the original data:
The difference between the parsed data and its
serialized counterpart is due to a wide tolerance on input data (as defined
by JSON), while output data (as defined by ES6)
has a fixed representation. As can be seen in the example,
numbers are subject to rounding as well.
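The round trip described above can be reproduced with a short sketch. The input below is an illustrative assumption, not the sample object of this document; it merely uses several equivalent notations that JSON permits on input.

```javascript
// Illustrative input: extra whitespace, "4.50", "1E30", "2e-3",
// and escape sequences that have shorter equivalents.
const input = '{ "b": 4.50, "a": [1E30, 2e-3], "s": "\\u0041\\/\\u000a" }';

// Parse and serialize again using the ES6 JSON object.
const output = JSON.stringify(JSON.parse(input));

console.log(output);
// {"b":4.5,"a":[1e+30,0.002],"s":"A/\n"}
```

Note that the property order is still that of the input; sorting is a separate step.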
The following subsections describe the serialization of primitive JSON data types
according to JCS. This part is identical to that of ES6.
In the (unlikely) event that a future version of ECMAScript would
invalidate any of the following serialization methods, it will be
up to the developer community to
either stick to this specification or create a new specification.
Serialization of Literals
In accordance with JSON,
the literals "null", "true", and
"false" are serialized as null, true, and false respectively.
Serialization of Strings
For JSON String data (which includes
JSON Object property names as well), each Unicode code point MUST be serialized as
described below (see section 24.3.2.2 of the ES6 specification):
- If the Unicode value falls within the traditional ASCII control
  character range (U+0000 through U+001F), it MUST
  be serialized using lowercase hexadecimal Unicode notation (\uhhhh) unless it is in the
  set of predefined JSON control characters U+0008, U+0009, U+000A, U+000C or U+000D
  which MUST be serialized as \b, \t, \n, \f and \r respectively.
- If the Unicode value is outside of the ASCII control character range, it MUST
  be serialized "as is" unless it is equivalent to
  U+005C (\) or U+0022 (") which MUST be serialized as \\ and \" respectively.
Finally, the resulting sequence of Unicode code points MUST be enclosed in double quotes (").
Note: some JSON systems permit the use of invalid Unicode data
like "lone surrogates" (e.g. U+DEAD).
Since this may lead to interoperability issues including broken signatures,
occurrences of such data MUST cause a compliant JCS implementation to terminate
with an appropriate error.
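The string rules above, including the treatment of lone surrogates, could be sketched as follows. The function name serializeString is hypothetical. The sketch relies on JSON.stringify() for the escaping itself and adds an explicit surrogate check, because later ECMAScript versions escape lone surrogates rather than rejecting them.

```javascript
// Hypothetical helper: serialize one JSON String value (or property name).
function serializeString(value) {
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be directly followed by a low surrogate.
      const next = value.charCodeAt(i + 1); // NaN when past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) {
        throw new Error('Lone high surrogate at index ' + i);
      }
      i++; // skip the low surrogate that completes the pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate without a preceding high surrogate.
      throw new Error('Lone low surrogate at index ' + i);
    }
  }
  // Control characters, backslash and double quote are escaped by ES6;
  // everything else is emitted "as is".
  return JSON.stringify(value);
}

console.log(serializeString('a\u000fb'));      // "a\u000fb"
console.log(serializeString('line\nbreak'));   // "line\nbreak"
console.log(serializeString('\ud83d\ude00'));  // valid surrogate pair, emitted as is
```

Calling serializeString('\udead') throws, which corresponds to terminating with an error.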
Serialization of Numbers
JSON Number data MUST be serialized according to
section 7.1.12.1 of the ES6 specification,
including the "Note 2" enhancement.
Due to the relative complexity of this part, the algorithm
itself is not included in this document.
For implementers of JCS compliant number serialization,
Google's V8 may serve as a reference.
Another compatible number serialization reference implementation
is Ryu,
which is used by the JCS open source Java implementation
mentioned in "Open Source Implementations".
ES6 builds on the IEEE-754 double precision
standard for representing JSON Number data.
"Number Serialization Samples" holds a set of IEEE-754 sample values and their
corresponding JSON serialization.
Note: since NaN (Not a Number) and Infinity are not permitted in JSON,
occurrences of such values MUST cause a compliant JCS implementation to terminate
with an appropriate error.
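On an ES6 platform the number rules can be sketched by delegating to the built-in serializer and rejecting the values JSON cannot express. The function name serializeNumber is hypothetical, and the sample values are illustrative.

```javascript
// Hypothetical helper: serialize one JSON Number value.
function serializeNumber(value) {
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    // NaN and +/-Infinity are not permitted in JSON.
    throw new Error('Not a serializable JSON Number: ' + String(value));
  }
  // ES6 Number-to-String conversion, as used by JSON.stringify().
  return JSON.stringify(value);
}

console.log(serializeNumber(4.50));                 // 4.5
console.log(serializeNumber(2e-3));                 // 0.002
console.log(serializeNumber(1e30));                 // 1e+30
console.log(serializeNumber(1e21));                 // 1e+21
console.log(serializeNumber(1e20));                 // 100000000000000000000
console.log(serializeNumber(-0));                   // 0
console.log(serializeNumber(1424953923781206.25));  // 1424953923781206.2 ("Note 2")
```

Platforms without an ES6 compatible number formatter need a dedicated implementation of the algorithm; simply printing with a fixed number of digits does not give the same results.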
Sorting of Object Properties
Although the previous step normalized the representation of primitive
JSON data types, the result would not yet qualify as "canonical" since JSON
Object properties are not in lexicographic (alphabetical) order.
Applied to the sample in "Serialization of Primitive Data Types",
a properly canonicalized version should (with a
line wrap added for display purposes only) read as:
The rules for lexicographic sorting of JSON Object
properties according to JCS are as follows:
- JSON Object properties MUST be sorted recursively,
  which means that JSON child Objects
  MUST have their properties sorted as well.
- JSON Array data MUST also be scanned for the
  presence of JSON Objects (if an object is found then its properties MUST be sorted),
  but array element order MUST NOT be changed.
When a JSON Object is about to have its properties
sorted, the following measures MUST be adhered to:
- The sorting process is applied to property name strings in their "raw" (unescaped) form.
  That is, a newline character is treated as U+000A.
- Property name strings to be sorted are formatted
  as arrays of UTF-16 code units.
  The sorting is based on pure value comparisons, where code units are treated as
  unsigned integers, independent of locale settings.
- Property name strings either have different values at some index that is
  a valid index for both strings, or their lengths are different, or both.
  If they have different values at one or more index
  positions, let k be the smallest such index; then the string whose
  value at position k has the smaller value, as determined by using
  the < operator, lexicographically precedes the other string.
  If there is no index position at which they differ,
  then the shorter string lexicographically precedes the longer string.
In plain English this means that property names are sorted in ascending order like the following:
The rationale for basing the sorting algorithm on UTF-16 code units is that
it maps directly to the string type in ECMAScript (featured in Web browsers
and Node.js), Java and .NET. In addition, JSON only supports escape sequences
expressed as UTF-16 code units making knowledge and handling of such data
a necessity anyway.
Systems using another internal representation of string data will need to convert
JSON property name strings into arrays of UTF-16 code units before sorting.
The conversion from UTF-8 or UTF-32 to UTF-16 is defined by the
Unicode standard.
The following test data can be used for verifying the correctness of the sorting
scheme in a JCS implementation. JSON test data:
Expected argument order after sorting property strings:
Note: for the purpose of obtaining a deterministic property order, sorting on
UTF-8 or UTF-32 encoded data would also work, but the outcome for JSON data
like above would differ and thus be incompatible with this specification.
However, in practice, property names are rarely defined outside of 7-bit ASCII making
it possible to sort on string data in UTF-8 or UTF-32 format without conversions
to UTF-16 and still be compatible with JCS. If this is a viable option or not
depends on the environment JCS is used in.
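The difference between UTF-16 code unit ordering and code point ordering can be shown with a small sketch. The property names below are chosen for illustration, and the function names compareUtf16 and compareCodePoints are hypothetical. In ECMAScript, the < operator on strings already compares UTF-16 code units, so the JCS order needs no conversion there.

```javascript
// JCS ordering: compare strings as arrays of UTF-16 code units.
function compareUtf16(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// For contrast only: ordering by Unicode code points (as UTF-8 or UTF-32 sorting would give).
function compareCodePoints(a, b) {
  const x = Array.from(a); // Array.from() splits a string into code points
  const y = Array.from(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    const d = x[i].codePointAt(0) - y[i].codePointAt(0);
    if (d !== 0) return d;
  }
  return x.length - y.length;
}

// Illustrative property names, including one outside the Basic Multilingual Plane.
const names = ['\u20ac', '\r', '\ufb33', '1', '\ud83d\ude00', '\u0080', '\u00f6'];

const jcsOrder = names.slice().sort(compareUtf16);
const codePointOrder = names.slice().sort(compareCodePoints);

// U+1F600 is stored as the code units D83D DE00, which sort before U+FB33.
console.log(jcsOrder.indexOf('\ud83d\ude00') < jcsOrder.indexOf('\ufb33'));             // true
// As a code point, U+1F600 is larger than U+FB33 and sorts after it.
console.log(codePointOrder.indexOf('\ud83d\ude00') > codePointOrder.indexOf('\ufb33')); // true
```

For property names restricted to 7-bit ASCII both comparators give the same order, which is the situation the note above refers to.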
UTF-8 Generation
Finally, in order to create a platform independent representation,
the result of the preceding step MUST be encoded in UTF-8.
Applied to the sample in "Sorting of Object Properties", this
should yield the following bytes, here shown in hexadecimal notation:
This data is intended to be usable as input to cryptographic methods.
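The final encoding step can be sketched as below. The canonical string is an illustrative assumption rather than the sample of this document; TextEncoder always produces UTF-8.

```javascript
// Illustrative canonical JSON text (already sorted and without whitespace).
const canonical = '{"a":"\u20ac"}';

// Encode as UTF-8; these bytes are what a hash or signature algorithm consumes.
const bytes = new TextEncoder().encode(canonical);

// Hexadecimal rendering for display purposes only.
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(hex);
// 7b 22 61 22 3a 22 e2 82 ac 22 7d
```

The Euro sign occupies one UTF-16 code unit but three UTF-8 bytes (e2 82 ac), so the byte length differs from the string length.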
IANA Considerations
This document has no IANA actions.
Security Considerations
It is vital to perform "sanity" checks
on input data to avoid overflowing buffers and similar things that
could affect the integrity of the system.
When JCS is applied to signature schemes like the one described
in "Implementation Guidelines",
applications MUST perform the following operations before acting
upon received data:
- Parse the JSON data and verify that it adheres to I-JSON.
- Verify the data for correctness according to the conventions defined by the
  ecosystem where it is to be used. This also includes locating the
  property holding the signature data.
- Verify the signature.
If any of these steps fail, the operation in progress MUST be aborted.
Acknowledgements
Building on ES6 Number serialization was
originally proposed by James Manger. This ultimately led to the
adoption of the entire ES6 serialization scheme for JSON primitives.
Other people who have contributed with valuable input to this specification include
Scott Ananian,
Tim Bray,
Ben Campbell,
Adrian Farrel,
Richard Gibson,
Bron Gondwana,
John-Mark Gurney,
John Levine,
Mark Miller,
Matt Miller,
Mike Jones,
Mark Nottingham,
Mike Samuel,
Jim Schaad,
Robert Tupelo-Schneck
and Michal Wadas.
For carrying out real world concept verification, the software and
support for number serialization provided by
Ulf Adams,
Tanner Gooding
and Remy Oudompheng
was very helpful.
References
Normative References
Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify
the requirements in the specification. These words are often capitalized.
This document defines these words as they should be interpreted in IETF
documents. This document specifies an Internet Best Current Practices for
the Internet Community, and requests discussion and suggestions for improvements.
The JavaScript Object Notation (JSON) Data Interchange Format
JavaScript Object Notation (JSON) is a lightweight, text-based,
language-independent data interchange format. It was derived from the
ECMAScript Programming Language Standard. JSON defines a small set of
formatting rules for the portable representation of structured data.
This document removes inconsistencies with other specifications of
JSON, repairs specification errors, and offers experience-based
interoperability guidance.
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
RFC 2119 specifies common key words that may be used in protocol
specifications. This document aims to reduce the ambiguity by
clarifying that only UPPERCASE usage of the key words have the
defined special meanings.
The I-JSON Message Format
I-JSON (short for "Internet JSON") is a restricted profile of
JSON designed to maximize interoperability and increase confidence
that software can process it successfully with predictable results.
ECMAScript 2015 Language Specification
Ecma International
IEEE Standard for Floating-Point Arithmetic
IEEE
The Unicode Standard, Version 12.1.0
The Unicode Consortium
Informative References
JSON Web Key (JWK) Thumbprint
This specification defines a method for computing a hash value
over a JSON Web Key (JWK). It defines which fields in a JWK are used
in the hash computation, the method of creating a canonical form for
those fields, and how to convert the resulting Unicode string into a
byte sequence to be hashed. The resulting hash value can be used for
identifying or selecting the key represented by the JWK that is the
subject of the thumbprint.
The Base16, Base32, and Base64 Data Encodings
This document describes the commonly used base 64, base 32, and base 16
encoding schemes. It also discusses the use of line-feeds in encoded data,
use of padding in encoded data, use of non-alphabet characters in encoded data,
use of different encoding alphabets, and canonical encodings. [STANDARDS-TRACK]
JSON Web Signature (JWS)
JSON Web Signature (JWS) represents content secured with digital
signatures or Message Authentication Codes (MACs) using JSON-based
data structures. Cryptographic algorithms and identifiers for use
with this specification are described in the separate
JSON Web Algorithms (JWA) specification and an IANA registry defined
by that specification. Related encryption capabilities are described
in the separate JSON Web Encryption (JWE) specification.
"Comparable" JSON - Work in progress
A. Rundgren
Chrome V8 Open Source JavaScript Engine
Google LLC
Ryu floating point number serializing algorithm
Ulf Adams
Node.js
Keybase
The OpenAPI Initiative
XML Signature Syntax and Processing Version 1.1
W3C
ES6 Sample Canonicalizer
Below is an example of a JCS canonicalizer for usage with ES6 based systems:
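var canonicalize = function(object) {

    var buffer = '';
    serialize(object);
    return buffer;

    function serialize(object) {
        if (object === null || typeof object !== 'object' ||
            object.toJSON != null) {
            /////////////////////////////////////////////////
            // Primitive type or toJSON - Use ES6/JSON     //
            /////////////////////////////////////////////////
            buffer += JSON.stringify(object);
        } else if (Array.isArray(object)) {
            /////////////////////////////////////////////////
            // Array - Maintain element order              //
            /////////////////////////////////////////////////
            buffer += '[';
            let next = false;
            object.forEach((element) => {
                if (next) {
                    buffer += ',';
                }
                next = true;
                /////////////////////////////////////////
                // Array element - Recursive expansion //
                /////////////////////////////////////////
                serialize(element);
            });
            buffer += ']';
        } else {
            /////////////////////////////////////////////////
            // Object - Sort properties before serializing //
            /////////////////////////////////////////////////
            buffer += '{';
            let next = false;
            Object.keys(object).sort().forEach((property) => {
                if (next) {
                    buffer += ',';
                }
                next = true;
                ///////////////////////////////////////////////
                // Property names are strings - Use ES6/JSON //
                ///////////////////////////////////////////////
                buffer += JSON.stringify(property);
                buffer += ':';
                //////////////////////////////////////////
                // Property value - Recursive expansion //
                //////////////////////////////////////////
                serialize(object[property]);
            });
            buffer += '}';
        }
    }
};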
Number Serialization Samples
The following table holds a set of ES6 compatible Number serialization samples,
including some edge cases. The column
"IEEE‑754" refers to the internal
ES6 representation of the Number data type which is based on the
IEEE-754 standard using 64-bit (double precision) values,
here expressed in hexadecimal.
Notes:
- For maximum compliance with the ES6 JSON object,
  values that are to be interpreted as true integers
  SHOULD be in the range -9007199254740991 to 9007199254740991.
  However, how numbers are used in applications does not affect the JCS algorithm.
- Although a set of specific integers like 2**68 could be regarded as having
  extended precision, the JCS/ES6 number serialization
  algorithm does not take this into consideration.
- Invalid. See "Serialization of Numbers".
- This number is exactly 1424953923781206.25 but will, after the "Note 2" rule
  mentioned in "Serialization of Numbers", be truncated and
  rounded to the closest even value.
Canonicalized JSON as "Wire Format"
Since the result from the canonicalization process (see "Generation of Canonical JSON Data")
is fully valid JSON, it can also be used as "Wire Format".
However, this is just an option since cryptographic schemes
based on JCS in most cases do not depend on externally
supplied JSON data already being canonicalized.
In fact, the ES6 standard way of serializing objects using
JSON.stringify() produces a
more "logical" format, where properties are
kept in the order they were created or received. The
example below shows an address record which could benefit from
ES6 standard serialization:
Using canonicalization, the properties above would be output in the order
"address", "city", "name", "state" and "zip", which adds fuzziness
to the data from a human (developer or technical support) perspective.
Canonicalization also converts JSON data into a single line of text, which may
be less than ideal for debugging and logging.
Dealing with Big Numbers
There are several issues associated with the
JSON Number type, here illustrated by the following
sample object:
Although the sample above conforms to JSON,
applications would normally use different native data types for storing
"giantNumber" and "int64Max". In addition, monetary data like "payMeThis" would
presumably not rely on floating point data types due to rounding issues with respect
to decimal arithmetic.
The established way of handling this kind of "overloading" of the
JSON Number type (at least in an extensible manner) is through
mapping mechanisms, instructing parsers what to do with different properties
based on their name. However, this greatly limits the value of using the
JSON Number type outside of its original, somewhat constrained, JavaScript context.
The ES6 JSON object does not support mappings to JSON Number either.
Due to the above, numbers that do not have a natural place in the current
JSON ecosystem MUST be wrapped using the JSON String type. This is close to
a de-facto standard for open systems. This is also applicable for
other data types that do not have direct support in JSON, like "DateTime"
objects as described in "String Subtype Handling".
Aided by a system using the JSON String type, be it programmatic like
the approach in "String Subtype Handling" or declarative schemes like OpenAPI,
JCS imposes no limits on applications, including when using ES6.
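The following sketch illustrates why wrapping matters, using the maximum signed 64-bit integer as the value of "int64Max". The BigInt type used for the conversion is an assumption about the platform: it is not part of ES6 but is available in current ECMAScript engines.

```javascript
// As a JSON Number, the value is silently rounded to the nearest IEEE-754 double.
const asNumber = JSON.parse('{"int64Max": 9223372036854775807}');
console.log(JSON.stringify(asNumber));
// {"int64Max":9223372036854776000}

// Wrapped in a JSON String, the digits survive parsing and serialization unchanged.
const asString = JSON.parse('{"int64Max": "9223372036854775807"}');
console.log(JSON.stringify(asString));
// {"int64Max":"9223372036854775807"}

// The application converts to a suitable native type in a separate step.
const exact = BigInt(asString.int64Max);
console.log(exact === 9223372036854775807n);  // true
```

Because the canonical form only ever sees the quoted string, the precision of the native type chosen by the application has no influence on the signature.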
String Subtype Handling
Due to the limited set of data types featured in JSON,
the JSON String type is commonly used for holding subtypes.
Depending on the JSON parsing method, this can lead to
interoperability problems which MUST be dealt with by
JCS compliant applications targeting a wider audience.
Assume you want to parse a JSON object where the schema
designer assigned the property "big" for holding a "BigInteger" subtype and
"time" for holding a "DateTime" subtype, while "val" is supposed to be a JSON Number
compliant with JCS. The following example shows such an object:
Parsing of this object can be accomplished by the following ES6 statement:
After parsing, the actual data can be extracted, which for subtypes also involves a conversion
step using the result of the parsing process (an ECMAScript object) as input:
Canonicalization of "object" using the sample code in "ES6 Sample Canonicalizer" would return the
following string:
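Although this is (with respect to JCS) technically correct, there is another way of parsing JSON data
which can also be used with ECMAScript, as shown below:
// jsonText is assumed to hold the JSON object above as a string
var object = JSON.parse(jsonText, (k, v) =>
    k == 'time' ? new Date(v) : k == 'big' ? BigInt(v) : v
);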
If you now apply the canonicalizer in "ES6 Sample Canonicalizer" to "object", the
following string would be generated:
In this case the string arguments for "big" and "time" have changed with respect to the original,
presumably making an application depending on JCS fail.
The reason for the deviation is that in stream and schema based JSON parsers,
the original "string" argument is typically replaced on-the-fly
by the native subtype which when serialized, may exhibit a different
and platform dependent pattern.
That is, stream and schema based parsing MUST treat subtypes as "pure" (immutable) JSON String types,
and perform the actual conversion to the designated native type in a subsequent step.
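The deviation can be reproduced with a reduced sketch that only covers the "time" subtype. The input text and the compact canonicalize function are assumptions made to keep the example self-contained; the function follows the same approach as the sample canonicalizer but is not identical to it.

```javascript
// Compact canonicalizer sketch: ES6 primitives, sorted properties, no whitespace.
function canonicalize(value) {
  if (value === null || typeof value !== 'object' || value.toJSON != null) {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  return '{' + Object.keys(value).sort().map(
    (key) => JSON.stringify(key) + ':' + canonicalize(value[key])
  ).join(',') + '}';
}

// Illustrative input with a "DateTime" subtype held in a JSON String.
const jsonText = '{"val":3.5,"time":"2019-01-28T07:45:10Z"}';

// Recommended: keep the string, convert in a separate step.
const plain = JSON.parse(jsonText);
const time = new Date(plain.time);   // native value for application use
console.log(canonicalize(plain));
// {"time":"2019-01-28T07:45:10Z","val":3.5}

// Problematic: a reviver replaces the string by a native Date on-the-fly.
const revived = JSON.parse(jsonText, (k, v) => (k === 'time' ? new Date(v) : v));
console.log(canonicalize(revived));
// {"time":"2019-01-28T07:45:10.000Z","val":3.5}
```

The second result carries added milliseconds, so a signature computed over the original text would no longer verify.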
In modern programming platforms like Go, Java and C# this can be achieved with
moderate efforts by combining annotations, getters and setters.
Below is an example in C#/Json.NET showing a part of a class that is serializable
as a JSON Object:
In an application "Amount" can be accessed as any other property
while it is actually represented by a quoted string in JSON contexts.
Note: the example above also addresses the constraints on numeric data
implied by I-JSON (the C# "decimal" data type has quite different
characteristics compared to IEEE-754 double precision).
Subtypes in Arrays
Since the JSON Array construct permits mixing arbitrary JSON data types,
custom parsing and serialization code may be required
to cope with subtypes anyway.
Implementation Guidelines
The optimal solution is integrating support for JCS directly
in JSON serializers (parsers need no changes).
That is, canonicalization would just be an additional "mode"
for a JSON serializer. However, this is currently not the case.
Fortunately JCS support can be performed through externally supplied
canonicalizer software, enabling signature creation schemes like the following:
- Create the data to be signed.
- Serialize the data using existing JSON tools.
- Let the external canonicalizer process the serialized data and return canonicalized result data.
- Sign the canonicalized data.
- Add the resulting signature value to the original JSON data through a designated signature property.
- Serialize the completed (now signed) JSON object using existing JSON tools.
A compatible signature verification scheme would then be as follows:
- Parse the signed JSON data using existing JSON tools.
- Read and save the signature value from the designated signature property.
- Remove the signature property from the parsed JSON object.
- Serialize the remaining JSON data using existing JSON tools.
- Let the external canonicalizer process the serialized data and return canonicalized result data.
- Verify that the canonicalized data matches the saved signature value
  using the algorithm and key used for creating the signature.
A canonicalizer like above is effectively only a "filter", potentially usable with
a multitude of quite different cryptographic schemes.
Using a JSON serializer with integrated JCS support, the serialization performed
before the canonicalization step could be eliminated for both processes.
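The two schemes above could be sketched as follows for Node.js. Several details are assumptions made for illustration only: the property name "signature", the use of HMAC-SHA256 with a shared key as a stand-in for a real signature algorithm, hexadecimal encoding of the result, and the function names sign and verify. The canonicalizer here operates on the parsed object, which corresponds to the integrated variant without the intermediate serialization step.

```javascript
const crypto = require('crypto');

// Compact canonicalizer sketch: ES6 primitives, sorted properties, no whitespace.
function canonicalize(value) {
  if (value === null || typeof value !== 'object' || value.toJSON != null) {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  return '{' + Object.keys(value).sort().map(
    (key) => JSON.stringify(key) + ':' + canonicalize(value[key])
  ).join(',') + '}';
}

// Hypothetical signing step: MAC over the UTF-8 encoded canonical form.
function sign(data, key) {
  const mac = crypto.createHmac('sha256', key)
    .update(canonicalize(data), 'utf8')
    .digest('hex');
  // The signature is added to the original data, which keeps its property order.
  return JSON.stringify(Object.assign({}, data, { signature: mac }));
}

// Hypothetical verification step: remove the signature, canonicalize, compare.
function verify(signedJson, key) {
  const parsed = JSON.parse(signedJson);
  const received = parsed.signature;
  if (typeof received !== 'string') {
    return false;
  }
  delete parsed.signature;
  const expected = crypto.createHmac('sha256', key)
    .update(canonicalize(parsed), 'utf8')
    .digest();
  const receivedBytes = Buffer.from(received, 'hex');
  return receivedBytes.length === expected.length &&
    crypto.timingSafeEqual(receivedBytes, expected);
}

const signed = sign({ name: 'Joe', amount: '25.00' }, 'shared-secret');
console.log(verify(signed, 'shared-secret'));                        // true
console.log(verify(signed.replace('25.00', '95.00'), 'shared-secret')); // false
```

Since verification works on the canonical form, a receiver that re-orders properties or re-indents the signed JSON before forwarding it does not invalidate the signature.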
Open Source Implementations
The following Open Source implementations have been verified to be
compatible with JCS:
-
JavaScript:
-
Java:
-
Go:
-
.NET/C#:
-
Python:
Other JSON Canonicalization Efforts
There are (and have been) other efforts creating "Canonical JSON".
Below is a list of URLs to some of them:
The listed efforts all build
on text level JSON to JSON transformations. The primary feature
of text level canonicalization is that it can be made neutral to
the flavor of JSON used. However, such schemes also
imply major changes to the JSON parsing process which is a likely
hurdle for adoption. Albeit at the expense of certain JSON and
application constraints,
JCS was designed to be compatible with existing JSON tools.
Development Portal
The JCS specification is currently developed at:
.
The most recent "editors' copy" can be found at:
.
JCS source code and extensive test data is available at:
Document History
[[ to be removed by the RFC Editor before publication as an RFC ]]
Version 00-06:
Version 07:
-
Initial conversion to XML RFC version 3.
-
Changed intended status to "Informational".
-
Added UTF-16 test data and explanations.
Version 08:
-
Updated Abstract.
-
Added a "Note 2" number serialization sample.
-
Updated Security Considerations.
-
Tried to clear up the JSON input data section.
-
Added a line about Unicode normalization.
-
Added a line about serialization of structured data.
-
Added a missing fact about "BigInt" (V8 not ES6).
Version 09:
-
Updated initial line of Abstract and Introduction.
-
Added note about breaking ECMAScript changes.
-
Minor language nit fixes.
Version 10:
Version 11: