Network Working Group P. Hallam-Baker Internet-Draft Comodo Group Inc. Intended status: Informational April 11, 2018 Expires: October 13, 2018 Uniform Data Fingerprint (UDF) draft-hallambaker-udf-10 Abstract This document describes means of generating Uniform Data Fingerprint (UDF) values and their presentation as text sequences and as URIs. Uses of UDF fingerprints include but are not limited to creating Strong Internet Names (SINs). Cryptographic digests provide a means of uniquely identifying static data without the need for a registration authority. A fingerprint is a form of presenting a cryptographic digest that makes it suitable for use in applications where human readability is required. The UDF fingerprint format improves over existing formats through the introduction of a compact algorithm identifier affording an intentionally limited choice of digest algorithm and the inclusion of an IANA registered MIME Content-Type identifier within the scope of the digest input to allow the use of a single fingerprint format in multiple application domains. Alternative means of rendering fingerprint values are considered including machine-readable codes, word and image lists. This document is also available online at http://mathmesh.com/Documents/draft-hallambaker-udf.html [1] . Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Hallam-Baker Expires October 13, 2018 [Page 1] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 This Internet-Draft will expire on October 13, 2018. Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Algorithm Identifier . . . . . . . . . . . . . . . . . . 4 1.2. Content Type Identifier . . . . . . . . . . . . . . . . . 4 1.3. Representation . . . . . . . . . . . . . . . . . . . . . 5 1.4. Truncation . . . . . . . . . . . . . . . . . . . . . . . 5 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 2.2. Defined Terms . . . . . . . . . . . . . . . . . . . . . . 6 2.3. Related Specifications . . . . . . . . . . . . . . . . . 7 2.4. Implementation Status . . . . . . . . . . . . . . . . . . 7 3. Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.1. Binary Fingerprint Value . . . . . . . . . . . . . . . . 7 3.1.1. Version ID . . . . . . . . . . . . . . . . . . . . . 8 3.2. Truncation . . . . . . . . . . . . . . . . . . . . . . . 8 3.3. Base32 Representation . . . . . . . . . . . . . . . . . . 8 3.4. Example Encoding . . . . . . . . . . . . . . . . . . . . 9 3.4.1. Using SHA-2-512 Digest . . . . . . . . . . . . . . . 9 3.4.2. Using SHA-3-512 Digest . . . . . . . . . . . . . . . 10 3.5. Fingerprint Improvement . . . . . . . . . . . . . . . . . 10 3.6. Compressed Presentation . . . . . . . . . . . . . . . . . 11 3.7. Example of Compressed Encoding. . . . . . . . . . . . . . 11 3.7.1. Example . . . . . . . . . . . . . . . . . . . . . . . 12 4. Content Types . . . . . . . . . . . . . . . . . . . . . . . . 12 4.1. PKIX Certificates and Keys . . . . . . . . . . . . . . . 12 4.2. OpenPGP Key . . . . . . . . . . . . . . . . . . . . . . . 13 4.3. DNSSEC . . . . . . . . . . . . . . . . . . . . . . . . . 13 5. Additional UDF Renderings . . . . . . . . . . . . . . . . . . 13 5.1. Machine Readable Rendering . . . . . . . . . . . . . . . 13 5.2. Word Lists . . . . . . . . . . . . . . . . . . . . . . . 13 Hallam-Baker Expires October 13, 2018 [Page 2] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 5.3. Image List . . . . . . . . . . . . . . . . . . . . . . . 14 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14 6.1. Work Factor and Precision . . . . . . . . . . . . . . . . 14 6.2. Semantic Substitution . . . . . . . . . . . . . . . . . . 15 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 7.1. URI Registration . . . . . . . . . . . . . . . . . . . . 16 7.2. Content Type Registration . . . . . . . . . . . . . . . . 16 7.3. Version Registry . . . . . . . . . . . . . . . . . . . . 16 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 8.1. Normative References . . . . . . . . . . . . . . . . . . 16 8.2. Informative References . . . . . . . . . . . . . . . . . 16 8.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 17 1. Introduction The use of cryptographic digest functions to produce identifiers is well established as a means of generating a unique identifier for fixed data without the need for a registration authority. While the use of fingerprints of public keys was popularized by PGP, they are employed in many other applications including OpenPGP, SSH, BitCoin and PKIX. A cryptographic digest is a particular form of hash function that has the properties: o It is easy to compute the digest value for any given message o It is infeasible to generate a message from its digest value o It is infeasible to modify a message without changing the digest value o It is infeasible to find two different messages with the same digest value. If these properties are met, the only way that two data objects that map to the same digest value is by random chance. If the number of possible digest values is sufficiently large (i.e. is a sufficiently large number of bits in length), this chance is reduced to an arbitrarily infinitesimal probability. Such values are described as being probabilistically unique. A fingerprint is a representation of a cryptographic digest value optimized for purposes of verification and in some cases data entry. Hallam-Baker Expires October 13, 2018 [Page 3] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 1.1. Algorithm Identifier Although a secure cryptographic digest algorithm has properties that make it ideal for certain types of identifier use, several cryptographic digest algorithms have found widespread use, some of which have been demonstrated to be insecure. For example the MD5 message digest algorithm [RFC1321] , was widely used in IETF protocols until it was demonstrated to be vulnerable to collision attacks [Dobertin95] . The secure use of a fingerprint scheme therefore requires the digest algorithm to either be fixed or otherwise determined by the fingerprint value itself. Otherwise an attacker may be able to use a weak, broken digest algorithm to generate a data object matching a fingerprint value generated using a strong digest algorithm. The two digest algorithms currently used in the UDF scheme are both believed to be strong. These are SHA-2-512 [SHA-2] and SHA-3-512 [SHA-3] . The most secure, 512 bit version of the algorithm is used in both cases although the output is almost invariably truncated to a shorter length. Use of the strongest version of the algorithm in every circumstance eliminates the need to negotiate the algorithm strength. 1.2. Content Type Identifier A secure cryptographic digest algorithm provides a unique digest value that is probabilistically unique for a particular byte sequence but does not fix the context in which a byte sequence is interpreted. While such ambiguity may be tolerated in a fingerprint format designed for a single specific field of use, it is not acceptable in a general purpose format. For example, the SSH and OpenPGP applications both make use of fingerprints as identifiers for the public keys used but using different digest algorithms and data formats for representing the public key data. While no such vulnerability has been demonstrated to date, it is certainly conceivable that a crafty attacker might construct an SSH key in such a fashion that OpenPGP interprets the data in an insecure fashion. If the number of applications making use of fingerprint format that permits such substitutions is sufficiently large, the probability of a semantic substitution vulnerability being possible becomes unacceptably large. A simple control that defeats such attacks is to incorporate a content type identifier within the scope of the data input to the hash function. Hallam-Baker Expires October 13, 2018 [Page 4] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 1.3. Representation The representation of a fingerprint is the format in which it is presented to either an application or the user. Base32 encoding is used to produce the preferred text representation of a UDF fingerprint. This encoding uses only the letters of the Latin alphabet with numbers chosen to minimize the risk of ambiguity between numbers and letters (2, 3, 4, 5, 6 and 7). To enhance readability and improve data entry, characters are grouped into groups of five. 1.4. Truncation Different applications of fingerprints demand different tradeoffs between compactness of the representation and the number of significant bits. A larger the number of significant bits reduces the risk of collision but at a cost to convenience. Modern cryptographic digest functions such as SHA-2 produce output values of at least 256 bits in length. This is considerably larger than most uses of fingerprints require and certainly greater than can be represented in human readable form on a business card. Since a strong cryptographic digest function produces an output value in which every bit in the input value affects every bit in the output value with equal probability, it follows that truncating the digest value to produce a finger print is at least as strong as any other mechanism if digest algorithm used is strong. Using truncation to reduce the precision of the digest function has the advantage that a lower precision fingerprint of some data content is always a prefix of a higher prefix of the same content. This allows higher precision fingerprints to be converted to a lower precision without the need for special tools. 2. Definitions This section presents the related specifications and standard, the terms that are used as terms of art within the documents and the terms used as requirements language. 2.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Hallam-Baker Expires October 13, 2018 [Page 5] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 2.2. Defined Terms Cryptographic Digest Function A hash function that has the properties required for use as a cryptographic hash function. These include collision resistance, first pre-image resistance and second pre-image resistance. Content Type An identifier indicating how a Data Value is to be interpreted as specified in the IANA registry Media Types. Data Value The binary octet stream that is the input to the digest function used to calculate a digest value. Data Object A Data Value and its associated Content Type Digest Algorithm A synonym for Cryptographic Digest Function Digest Value The output of a Cryptographic Digest Function Data Digest Value The output of a Cryptographic Digest Function for a given Data Value input. Fingerprint A presentation of the digest value of a data value or data object. Fingerprint Presentation The representation of at least some part of a fingerprint value in human or machine readable form. Fingerprint Improvement The practice of recording a higher precision presentation of a fingerprint on successful validation. Fingerprint Work Hardening The practice of generating a sequence of fingerprints until one is found that matches criteria that permit a compressed presentation form to be used. The compressed fingerprint thus being shorter than but presenting the same work factor as an uncompressed one. Hash A function which takes an input and returns a fixed-size output. Ideally, the output of a hash function is unbiased and not correlated to the outputs returned to similar inputs in any predictable fashion. Precision The number of significant bits provided by a Fingerprint Presentation. Work Factor A measure of the computational effort required to perform an attack against some security property. Hallam-Baker Expires October 13, 2018 [Page 6] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 2.3. Related Specifications This specification makes use of Base32 [RFC4648] encoding, SHA-2 [SHA-2] and SHA-3 [SHA-3] digest functions. UDFs are used in the definition of Strong Internet Names [hallambaker-sin] . 2.4. Implementation Status The implementation status of the reference code base is described in the companion document [draft-hallambaker-mesh-developer] . 3. Encoding A UDF fingerprint for a given data object is generated by calculating the Binary Fingerprint Value for the given data object and type identifier, truncating it to obtain the desired degree of precision and then converting the truncated value to a representation. 3.1. Binary Fingerprint Value The binary encoding of a fingerprint is calculated using the formula: Fingerprint = & + H (& + ?:? + H(&)) Figure 1 Where H(x) is the cryptographic digest function & is the fingerprint version and algorithm identifier. & is the MIME Content-Type of the data. & is the binary data. Figure 2 The use of the nested hash function permits a fingerprint to be taken of data for which a digest value is already known without the need to calculate a new digest over the data. The inclusion of a MIME content type prevents message substitution attacks in which one content type is substituted for another. Hallam-Baker Expires October 13, 2018 [Page 7] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 3.1.1. Version ID A Version Identifier consists of a single byte. The following digest algorithm identifiers are specified in this document: +-----------------+------------------------+-----------------+ | Version ID | Algorithm | Reference | +-----------------+------------------------+-----------------+ | 96 | SHA-2-512 | | | 97, 98, 99, 100 | SHA-2-512 (compressed) | | | 144 | SHA-3-512 | | +-----------------+------------------------+-----------------+ Table 1 These algorithm identifiers have been chosen so that the first character in a SHA-2-512 fingerprint will always be ?M? and the first character in a SHA-3-512 fingerprint will always be ?S?. These provide mnemonics for ?Merkle-Damgard? and ?Sponge? respectively. 3.2. Truncation The Binary Fingerprint Value is truncated to an integer multiple of 25 bits regardless of the intended output presentation. The output of the hash function is truncated to a sequence of n bits by first selecting the first n/8 bytes of the output function. If n is an integer multiple of 8, no additional bits are required and this is the result. Otherwise the remaining bits are taken from the most significant bits of the next byte and any unused bits set to 0. For example, to truncate the byte sequence [a0, b1, c2, d3, e4] to 25 bits. 25/8 = 3 bytes with 1 bit remaining, the first three bytes of the truncated sequence is [a0, b1, c2] and the final byte is e4 AND 80 = 80 which we add to the previous result to obtain the final truncated sequence of [a0, b1, c2, 80] 3.3. Base32 Representation A modified version of Base32 [RFC4648] encoding is used to present the fingerprint in text form grouping the output text into groups of five characters separated by a dash ?-?. This representation improves the accuracy of both data entry and verification. Hallam-Baker Expires October 13, 2018 [Page 8] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 3.4. Example Encoding In the following examples, is the UTF8 encoding of the string "text/plain" and is the UTF8 encoding of the string "UDF Data Value" Data = 55 44 46 20 44 61 74 61 20 56 61 6C 75 65 ContentType = 74 65 78 74 2F 70 6C 61 69 6E Figure 3 3.4.1. Using SHA-2-512 Digest H( ) = 48 DA 47 CC AB FE A4 5C 76 61 D3 21 BA 34 3E 58 10 87 2A 03 B4 02 9D AB 84 7C CE D2 22 B6 9C AB 02 38 D4 E9 1E 2F 6B 36 A0 9E ED 11 09 8A EA AC 99 D9 E0 BD EA 47 93 15 BD 7A E9 E1 2E AD C4 15 H ( + ':' + H())= 74 65 78 74 2F 70 6C 61 69 6E 3A 48 DA 47 CC AB FE A4 5C 76 61 D3 21 BA 34 3E 58 10 87 2A 03 B4 02 9D AB 84 7C CE D2 22 B6 9C AB 02 38 D4 E9 1E 2F 6B 36 A0 9E ED 11 09 8A EA AC 99 D9 E0 BD EA 47 93 15 BD 7A E9 E1 2E AD C4 15 H ( + ':' + H())= C6 AF B7 C0 FE BE 04 E5 AE 94 E3 7B AA 5F 1A 40 5B A3 CE CC 97 4D 55 C0 9E 61 E4 B0 EF 9C AE F9 EB 83 BB 9D 5F 0F 39 F6 5F AA 06 DC 67 2A 67 71 4F FF 8F 83 C4 55 38 36 38 AE 42 7A 82 9C 85 BB Figure 4 Text Presentation (100 bit) MDDK7-N6A72-7AJZN-OSTR3 Text Presentation (125 bit) MDDK7-N6A72-7AJZN-OSTRX-XKS72 Text Presentation (150 bit) MDDK7-N6A72-7AJZN-OSTRX-XKS7D-JAFXD Text Presentation (250 bit) MDDK7-N6A72-7AJZN-OSTRX-XKS7D-JAFXI- 6OZSL-U2VOA-TZQ6J-MHPTS Hallam-Baker Expires October 13, 2018 [Page 9] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 3.4.2. Using SHA-3-512 Digest H( ) = 6D 2E CF E6 93 5A 0C FC F2 A9 1A 49 E0 0C D8 07 A1 4E 70 AB 72 94 6E CC BB 47 48 F1 8E 41 49 95 07 1D F3 6E 0D 0C 8B 60 39 C1 8E B4 0F 6E C8 08 65 B4 C4 45 9B A2 7E 97 74 7B BE 68 BC A8 C2 17 H ( + ':' + H())= 74 65 78 74 2F 70 6C 61 69 6E 3A 6D 2E CF E6 93 5A 0C FC F2 A9 1A 49 E0 0C D8 07 A1 4E 70 AB 72 94 6E CC BB 47 48 F1 8E 41 49 95 07 1D F3 6E 0D 0C 8B 60 39 C1 8E B4 0F 6E C8 08 65 B4 C4 45 9B A2 7E 97 74 7B BE 68 BC A8 C2 17 H ( + ':' + H())= 58 9B 76 70 35 B4 55 E5 41 4C 29 4D 73 C1 FD 48 F9 9A D6 29 35 A3 14 9A 32 6C EA 9E 7D 7A 8C 3F 26 B0 0F 15 84 CB BE 6F 35 C6 37 48 AF 5C F1 02 31 79 50 B1 A1 4F 97 50 97 49 5E DA A2 A0 A9 B5 Figure 5 Text Presentation (100 bit) SCFIN-CQGDR-KG47R-7OVPZ Text Presentation (125 bit) SCFIN-CQGDR-KG47R-7OVPT-TCHZ5 Text Presentation (150 bit) SCFIN-CQGDR-KG47R-7OVPT-TCHZ7-UXY4I Text Presentation (250 bit) SCFIN-CQGDR-KG47R-7OVPT-TCHZ7-UXY5S- CFSMN-YBKBP-FELHX-I56EH 3.5. Fingerprint Improvement Since an application must always calculate the full fingerprint value as part of the verification process, an application MAY accept a low precision (e.g. 100 bit) fingerprint value from the user and replace it with a higher precision fingerprint (e.g. 250 bits) after verification. Applications are encouraged to make use of the practice of fingerprint improvement wherever possible. Hallam-Baker Expires October 13, 2018 [Page 10] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 3.6. Compressed Presentation Fingerprint compression permits the use of shorter fingerprint presentation without a reduction in the attacker work factor by requiring the fingerprint value to match a particular pattern. UDF fingerprints MUST use compression if possible. A compressed fingerprint uses a version identifier that specifies the form of compression used as follows: +------------+-------------------------+ | Version ID | Compression | +------------+-------------------------+ | 96 | None | | 97 | First 25 bits are zeros | | 98 | First 35 bits are zeros | | 99 | First 40 bits are zeros | | 100 | First 45 bits are zeros | | 101 | First 50 bits are zeros | +------------+-------------------------+ Table 2 Support for compression may introduce perverse incentives such as performing key generation on machines that less secure but offer fast (or cheap) processing power. An attacker might even offer to generate public key pairs for free using their 'ultra fast' machine. For this reason, it is probably desirable to at least support if not mandate the use of some sort of salting scheme when compression is in use. This allows the key to be generated in secure, trusted hardware and only the discovery of a salt providing the desired compression being performed on less trusted or untrusted devices. Currently, 25 bit compression may be achieved on commodity machines with minimal impact on key generation if salting is used. Use of 35 bit compression has a noticeable impact, but can still be achieved within hours without the use of special purpose hardware (e.g. use of a GPU unit). Use of 40 bit compression is feasible with a GPU and use of 50 bit compression which would allow a fingerprint to be shortened by ten significant characters is on the outer edge of practicality. While support for even higher levels of compression is conceivable, it is probably not very sensible. 3.7. Example of Compressed Encoding. Hallam-Baker Expires October 13, 2018 [Page 11] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 3.7.1. Example The string "290668103" has a SHA-2-512 UDF fingerprint with 29 leading zero bits. The inputs to the fingerprint are: Data = 32 39 30 36 36 38 31 30 33 ContentType = 74 65 78 74 2F 70 6C 61 69 6E Figure 6 The 100 bit UDF fingerprint is: MF3VV-FOFE2-CLRW (Maybe) NB: The above is not generated from code and might well be incorrect. 4. Content Types While a UDF fingerprint MAY be used to identify any form of static data, the use of a UDF fingerprint to identify a public key signature key provides a level of indirection and thus the ability to identify dynamic data. The content types used to identify public keys are thus of particular interest. As described in the security considerations section, the use of fingerprints to identify a bare public key and the use of fingerprints to identify a public key and associated security policy information are very different. 4.1. PKIX Certificates and Keys UDF fingerprints MAY be used to identify PKIX certificates, CRLs and public keys in the ASN.1 encoding used in PKIX certificates. Since PKIX certificates and CLRs contain security policy information, UDF fingerprints used to identify certificates or CRLs SHOULD be presented with a minimum of 200 bits of precision. PKIX applications MUST not accept UDF fingerprints specified with less than 200 bits of precision for purposes of identifying trust anchors. PKIX certificates, keys and related content data are identified by the following content types: application/pkix-cert A PKIX Certificate Hallam-Baker Expires October 13, 2018 [Page 12] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 application/pkix-crl A PKIX CRL application/pkix-keyinfo The KeyInfo structure defined in the PKIX certificate specification 4.2. OpenPGP Key OpenPGPv5 keys and key set content data are identified by the following content types: application/pgp-key-v5 An OpenPGP key application/pgp-keys An OpenPGP key set. 4.3. DNSSEC DNSSEC record data consists of DNS records which are identified by the following content type: application/dns A DNS resource record in binary format 5. Additional UDF Renderings By default, a UDF fingerprint is rendered in the Base32 encoding described in this document. Additional renderings MAY be employed to facilitate entry and/or verification of fingerprint values. 5.1. Machine Readable Rendering The use of a machine-readable rendering such as a QR Code allows a UDF value to be input directly using a smartphone or other device equipped with a camera. A QR code fixed to a network capable device might contain the fingerprint of a machine readable description of the device. 5.2. Word Lists The use of a Word List to encode fingerprint values was introduced by Patrick Juola and Philip Zimmerman for the PGPfone application. The PGP Word List is designed to facilitate exchange and verification of fingerprint values in a voice application. To minimize the risk of misinterpretation, two word lists of 256 values each are used to encode alternative fingerprint bytes. The compact size of the lists used allowed the compilers to curate them so as to maximize the phonetic distance of the words selected. Hallam-Baker Expires October 13, 2018 [Page 13] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 The PGP Word List is designed to achieve a balance between ease of entry and verification. Applications where only verification is required may be better served by a much larger word list, permitting shorter fingerprint encodings. For example, a word list with 16384 entries permits 14 bits of the fingerprint to be encoded at once, 65536 entries permits 16. These encodings allow a 125 bit fingerprint to be encoded in 9 and 8 words respectively. 5.3. Image List An image list is used in the same manner as a word list affording rapid visual verification of a fingerprint value. For obvious reasons, this approach is not generally suited to data entry. 6. Security Considerations 6.1. Work Factor and Precision A given UDF data object has a single fingerprint value that may be presented at different precisions. The shortest legitimate precision with which a UDF fingerprint may be presented has 96 significant bits A UDF fingerprint presents the same work factor as any other cryptographic digest function. The difficulty of finding a second data item that matches a given fingerprint is 2^n and the difficulty or finding two data items that have the same fingerprint is 2^(n/2). Where n is the precision of the fingerprint. For the algorithms specified in this document, n = 512 and thus the work factor for finding collisions is 2^256, a value that is generally considered to be computationally infeasible. Since the use of 512 bit fingerprints is impractical in the type of applications where fingerprints are generally used, truncation is a practical necessity. The longer a fingerprint is, the less likely it is that a user will check every character. It is therefore important to consider carefully whether the security of an application depends on second pre-image resistance or collision resistance. In most fingerprint applications, such as the use of fingerprints to identify public keys, the fact that a malicious party might generate two keys that have the same fingerprint value is a minor concern. Combined with a flawed protocol architecture, such a vulnerability may permit an attacker to construct a document such that the signature will be accepted as valid by some parties but not by others. Hallam-Baker Expires October 13, 2018 [Page 14] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 For example, Alice generates keypairs until two are generated that have the same 100 bit UDF presentation (typically 2^48 attempts). She registers one keypair with a merchant and the other with her bank. This allows Alice to create a payment instrument that will be accepted as valid by one and rejected by the other. The ability to generate of two PKIX certificates with the same fingerprint and different certificate attributes raises very different and more serious security concerns. For example, an attacker might generate two certificates with the same key and different use constraints. This might allow an attacker to present a highly constrained certificate that does not present a security risk to an application for purposes of gaining approval and an unconstrained certificate to request a malicious action. In general, any use of fingerprints to identify data that has security policy semantics requires the risk of collision attacks to be considered. For this reason the use of short, ?user friendly? fingerprint presentations (Less than 200 bits) SHOULD only be used for public key values. 6.2. Semantic Substitution Many applications record the fact that a data item is trusted, rather fewer record the circumstances in which the data item is trusted. This results in a semantic substitution vulnerability which an attacker may exploit by presenting the trusted data item in the wrong context. The UDF format provides protection against high level semantic substitution attacks by incorporating the content type into the input to the outermost fingerprint digest function. The work factor for generating a UDF fingerprint that is valid in both contexts is thus the same as the work factor for finding a second preimage in the digest function (2^512 for the specified digest algorithms). It is thus infeasible to generate a data item such that some applications will interpret it as a PKIX key and others will accept as an OpenPGP key. While attempting to parse a PKIX key as an OpenPGP key is virtually certain to fail to return the correct key parameters it cannot be assumed that the attempt is guaranteed to fail with an error message. The UDF format does not provide protection against semantic substitution attacks that do not affect the content type. Hallam-Baker Expires October 13, 2018 [Page 15] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 7. IANA Considerations [This will be extended later] 7.1. URI Registration [Here a URI registration for the udf: scheme] 7.2. Content Type Registration [application/pkix-keyinfo] [application/pgp-key] 7.3. Version Registry 96 = SHA-2-512 97 = SHA-2-512 with 25 leading zeros 98 = SHA-2-512 with 40 leading zeros 99 = SHA-2-512 with 50 leading zeros 100 = SHA-2-512 with 55 leading zeros 144 = SHA-3-512 Figure 7 8. References 8.1. Normative References [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006. [SHA-2] NIST, "Secure Hash Standard", August 2015. [SHA-3] Dworkin, M., "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions", August 2015. 8.2. Informative References [Dobertin95] Eurocrypt 1996, "Cryptanalysis of MD5 Compress". [draft-hallambaker-mesh-developer] Hallam-Baker, P., "Mathematical Mesh: Reference Implementation", draft-hallambaker-mesh-developer-06 (work in progress), April 2018. Hallam-Baker Expires October 13, 2018 [Page 16] Internet-Draft Uniform Data Fingerprint (UDF) April 2018 [hallambaker-sin] "[Reference Not Found!]". [RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, DOI 10.17487/RFC1321, April 1992. 8.3. URIs [1] http://mathmesh.com/Documents/draft-hallambaker-udf.html Author's Address Phillip Hallam-Baker Comodo Group Inc. Email: philliph@comodo.com Hallam-Baker Expires October 13, 2018 [Page 17]