Network Working Group P. Saint-Andre
Internet-Draft &yet
Intended status: Informational March 31, 2014
Expires: October 2, 2014
An Interoperable Subset of Characters for Internationalized Usernames
draft-saintandre-username-interop-03
Abstract
Various Internet protocols define constructs for usernames, i.e., the
localpart of an address such as "localpart@example.com". This
document describes a subset of Unicode characters to allow in
internationalized usernames for the sake of maximal interoperability
across Internet protocols.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 2, 2014.
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Saint-Andre Expires October 2, 2014 [Page 1]
Internet-Draft Username Interoperability March 2014
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 2
3. Subset . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
5. Security Considerations . . . . . . . . . . . . . . . . . . . 5
6. References . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.1. Normative References . . . . . . . . . . . . . . . . . . 6
6.2. Informative References . . . . . . . . . . . . . . . . . 6
Appendix A. Analysis . . . . . . . . . . . . . . . . . . . . . . 7
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 12
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 12
1. Introduction
Various Internet protocols define constructs for usernames, i.e., the
localpart of an address such as "localpart@example.com". As further
described under Appendix A), examples include the localparts of email
addresses, Kerberos Principal Names, Network Access Identifiers, SIP
URIs, instant messaging URIs and presence URIs, XMPP addresses, and
account URIs, as well as certain forms of SASL simple user names (see
[I-D.ietf-precis-saslprepbis]). This document describes a subset of
Unicode characters [UNICODE] to allow in internationalized usernames
for the sake of maximal interoperability across Internet protocols.
This subset might prove useful in cases where a provider offers
multiple services (say, email and instant messaging) using the same
underlying identifier, or where the same identifier (e.g., an account
URI) is used when interacting with multiple providers.
2. Terminology
Many important terms used in this document are defined in
[I-D.ietf-precis-framework], [RFC6365], and [UNICODE].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119].
3. Subset
The interoperable subset of characters provided here is defined as a
profile of the PRECIS IdentifierClass specified in
[I-D.ietf-precis-framework]. In essence, the IdentifierClass
restricts the allowable characters to letters and digits from all the
scripts of Unicode [UNICODE] while grandfathering all the characters
from the ASCII range [RFC20]. The profile defined here,
Saint-Andre Expires October 2, 2014 [Page 2]
Internet-Draft Username Interoperability March 2014
"LocalpartIdentifierClass", further restricts the characters from the
ASCII range to those known to work across existing application
protocols (as described under Appendix A).
The syntax is defined as follows using the Augmented Backus-Naur Form
(ABNF) as specified in [RFC5234].
localpart = 1*1023(localpoint)
;
; a "localpoint" is a UTF-8 encoded Unicode code point
; that conforms to the "LocalpartIdentifierClass"
; profile of the PRECIS IdentifierClass
A "localpart" MUST consist only of Unicode code points that conform
to the "LocalpartIdentifierClass" profile of the "IdentifierClass"
base string class defined in [I-D.ietf-precis-framework]. The
LocalpartIdentifierClass profile includes all code points allowed by
the IdentifierClass base class, with the exception of the following
characters, which are disallowed (again, see Appendix A for the
reasoning behind these restrictions):
U+0022 (QUOTATION MARK), i.e., '"'
U+0023 (NUMBER SIGN), i.e., '#'
U+0025 (PERCENT SIGN), i.e., '%'
U+0026 (AMPERSAND), i.e., '&'
U+0027 (APOSTROPHE), i.e., "'"
U+0028 (LEFT PARENTHESIS), i.e., '('
U+0029 (RIGHT PARENTHESIS), i.e., ')'
U+002C (COMMA), i.e., ','
U+002E (FULL STOP), i.e., '.'
U+002F (SOLIDUS), i.e., '/'
U+003A (COLON), i.e., ':'
U+003B (SEMICOLON), i.e., ';'
U+003C (LESS-THAN SIGN), i.e., '<'
U+003E (GREATER-THAN SIGN), i.e., '>'
Saint-Andre Expires October 2, 2014 [Page 3]
Internet-Draft Username Interoperability March 2014
U+003F (QUESTION MARK), i.e., '?'
U+0040 (COMMERCIAL AT), i.e., '@'
U+005B (LEFT SQUARE BRACKET), i.e., '['
U+005C (REVERSE SOLIDUS), i.e., '\'
U+005D (RIGHT SQUARE BRACKET), i.e., ']'
U+005E (CIRCUMFLEX ACCENT), i.e., '^'
U+0060 (GRAVE ACCENT), i.e., '`'
U+007B (LEFT CURLY BRACKET), i.e., '{'
U+007C (VERTICAL), i.e., '|'
U+007D (RIGHT CURLY BRACKET), i.e., '}'
The normalization and mapping rules for the LocalpartIdentifierClass
are as follows, where the operations specified MUST be completed in
the order shown:
1. Fullwidth and halfwidth characters MUST be mapped to their
decomposition mappings.
2. So-called additional mappings MAY be applied, such as mapping of
characters that are similar to common delimiters (such as '@',
':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
STOP (U+3002) to FULL STOP (U+002E)) and special handling of
certain characters or classes of characters (e.g., mapping of
non-ASCII spaces to ASCII space); the PRECIS mappings document
[I-D.ietf-precis-mappings] describes such mappings in more
detail.
3. Uppercase and titlecase characters MUST be mapped to their
lowercase equivalents.
4. All characters MUST be mapped using Unicode Normalization Form C
(NFC).
With regard to directionality, applications MUST apply the "Bidi
Rule" defined in [RFC5893] (i.e., each of the six conditions of the
Bidi Rule must be satisfied).
Saint-Andre Expires October 2, 2014 [Page 4]
Internet-Draft Username Interoperability March 2014
A localpart MUST NOT be zero octets in length and MUST NOT be more
than 1023 octets in length. This rule is to be enforced after any
normalization and mapping of code points.
4. IANA Considerations
The IANA shall add the following entry to the PRECIS Profiles
Registry:
Name: LocalpartIdentifierClass.
Applicability: Usernames that are intended to be interoperable
across multiple application protocols.
Base Class: IdentifierClass.
Replaces: None.
Width Mapping: Map fullwidth and halfwidth characters to their
decomposition mappings.
Additional Mappings: None required or recommended.
Case Mapping: Map uppercase and titlecase characters to lowercase.
Normalization: NFC.
Directionality: The "Bidi Rule" defined in RFC 5893 applies.
Exclusions: 24 non-alphanumeric characters in the ASCII range.
Enforcement: Up to the application protocol or deployment.
Specification: this document. [Note to RFC Editor: please change
"this document" to the RFC number issued for this specification.]
5. Security Considerations
Deploying usernames that are interoperable across multiple protocols
could potentially give malicious entities multiple ways to attack an
account or user.
The security considerations described in [I-D.ietf-precis-framework]
apply to the "IdentifierClass" base string class used in this
document.
The security considerations described in [UTS39] apply to the use of
Unicode characters.
Saint-Andre Expires October 2, 2014 [Page 5]
Internet-Draft Username Interoperability March 2014
6. References
6.1. Normative References
[I-D.ietf-precis-framework]
Saint-Andre, P. and M. Blanchet, "Precis Framework:
Handling Internationalized Strings in Protocols", draft-
ietf-precis-framework-15 (work in progress), March 2014.
[RFC20] Cerf, V., "ASCII format for network interchange", RFC 20,
October 1969.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234, January 2008.
[RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
Internationalized Domain Names for Applications (IDNA)",
RFC 5893, August 2010.
[UNICODE] The Unicode Consortium, "The Unicode Standard, Version
6.3", 2013,
.
6.2. Informative References
[I-D.ietf-appsawg-acct-uri]
Saint-Andre, P., "The 'acct' URI Scheme", draft-ietf-
appsawg-acct-uri-07 (work in progress), January 2014.
[I-D.ietf-precis-mappings]
Yoneya, Y. and T. NEMOTO, "Mapping characters for PRECIS
classes", draft-ietf-precis-mappings-07 (work in
progress), February 2014.
[I-D.ietf-precis-saslprepbis]
Saint-Andre, P. and A. Melnikov, "Preparation and
Comparison of Internationalized Strings Representing
Usernames and Passwords", draft-ietf-precis-saslprepbis-07
(work in progress), March 2014.
[RFC821] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC
821, August 1982.
[RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April
2001.
Saint-Andre Expires October 2, 2014 [Page 6]
Internet-Draft Username Interoperability March 2014
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
A., Peterson, J., Sparks, R., Handley, M., and E.
Schooler, "SIP: Session Initiation Protocol", RFC 3261,
June 2002.
[RFC3856] Rosenberg, J., "A Presence Event Package for the Session
Initiation Protocol (SIP)", RFC 3856, August 2004.
[RFC3860] Peterson, J., "Common Profile for Instant Messaging
(CPIM)", RFC 3860, August 2004.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, RFC
3986, January 2005.
[RFC4120] Neuman, C., Yu, T., Hartman, S., and K. Raeburn, "The
Kerberos Network Authentication Service (V5)", RFC 4120,
July 2005.
[RFC4282] Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The
Network Access Identifier", RFC 4282, December 2005.
[RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
October 2008.
[RFC6120] Saint-Andre, P., "Extensible Messaging and Presence
Protocol (XMPP): Core", RFC 6120, March 2011.
[RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in
Internationalization in the IETF", BCP 166, RFC 6365,
September 2011.
[UTS39] The Unicode Consortium, "Unicode Technical Standard #39:
Unicode Security Mechanisms", July 2012,
.
Appendix A. Analysis
This document takes the following username constructs into
consideration:
o Email addresses [RFC5322]
o Kerberos Principal Names [RFC4120]
o Network Access Identifiers [RFC4282]
o SIP URIs [RFC3261]
Saint-Andre Expires October 2, 2014 [Page 7]
Internet-Draft Username Interoperability March 2014
o Instant messaging URIs [RFC3860] and presence URIs [RFC3856]
o XMPP addresses (a.k.a. Jabber Identifiers) [RFC6120]
o Account URIs [I-D.ietf-appsawg-acct-uri]
Each of those address formats defines something that can be used as
the "localpart" of an address.
The localpart of an email address uses either the "local-part" or the
"dot-atom-text" rule in [RFC5322]. Here we make the simplifying
assumption that the "dot-atom-text" rule applies:
dot-atom-text = 1*atext *("." 1*atext)
atext = ALPHA / DIGIT / ; Any character except
"!" / "#" / "$" / ; controls, SP, and
"%" / "&" / "'" / ; specials. Used for
"*" / "+" / "-" / ; atoms.
"/" / "=" / "?" /
"^" / "_" / "`" /
"{" / "|" / "}" /
"~"
We make the same simplifying assumption for im: and pres: URIs
(although their specifications reference [RFC2822]).
A Kerberos Principal Name is a sequence of strings of type
KerberosString, where each KerberosString is a GeneralString that is
constrained to contain only characters in IA5String.
PrincipalName ::= SEQUENCE {
name-type [0] Int32,
name-string [1] SEQUENCE OF KerberosString
}
KerberosString ::= GeneralString (IA5String)
A Network Address Identifier inherits from [RFC821]. Here we care
only about the "username" rule:
Saint-Andre Expires October 2, 2014 [Page 8]
Internet-Draft Username Interoperability March 2014
username = dot-string
dot-string = string
dot-string =/ dot-string "." string
string = char
string =/ string char
char = c
char =/ "\" x
c = %x21 ; '!' allowed
; '"' not allowed
c =/ %x23 ; '#' allowed
c =/ %x24 ; '$' allowed
c =/ %x25 ; '%' allowed
c =/ %x26 ; '&' allowed
c =/ %x27 ; ''' allowed
; '(', ')' not allowed
c =/ %x2A ; '*' allowed
c =/ %x2B ; '+' allowed
; ',' not allowed
c =/ %x2D ; '-' allowed
; '.' not allowed
c =/ %x2F ; '/' allowed
c =/ %x30-39 ; '0'-'9' allowed
; ';', ':', '<' not allowed
c =/ %x3D ; '=' allowed
; '>' not allowed
c =/ %x3F ; '?' allowed
; '@' not allowed
c =/ %x41-5a ; 'A'-'Z' allowed
; '[', '\', ']' not allowed
c =/ %x5E ; '^' allowed
c =/ %x5F ; '_' allowed
c =/ %x60 ; '`' allowed
c =/ %x61-7A ; 'a'-'z' allowed
c =/ %x7B ; '{' allowed
c =/ %x7C ; '|' allowed
c =/ %x7D ; '}' allowed
c =/ %x7E ; '~' allowed
; DEL not allowed
c =/ %x80-FF ; UTF-8-Octet allowed
x = %x00-FF ; all 128 ASCII characters
The localpart of a sip:/sips: URI inherits from the "userinfo" rule
in [RFC3986] with several changes; here we discuss the SIP "user"
rule only:
Saint-Andre Expires October 2, 2014 [Page 9]
Internet-Draft Username Interoperability March 2014
user = 1*( unreserved / escaped / user-unreserved )
user-unreserved = "&" / "=" / "+" / "$" / "," / ";" / "?" / "/"
unreserved = alphanum / mark
mark = "-" / "_" / "." / "!" / "~" / "*" / "'"
/ "(" / ")"
The localpart of an XMPP address allows any ASCII character except
space, controls, and the " & ' / : < > @ characters.
The 'acct' URI syntax borrows the 'host', 'pct-encoded', 'sub-
delims', 'unreserved' rules from [RFC3986]:
acctURI = "acct" ":" userpart "@" host
userpart = unreserved / sub-delims
0*( unreserved / pct-encoded / sub-delims )
To summarize the foregoing information, the following table lists the
allowed and disallowed characters in the localpart of identifiers for
each protocol (aside from the alphanumeric, space, and control
characters), in order by hexadecimal character number (where each "A"
row shows the allowed characters and each "D" row shows the
disallowed characters).
Saint-Andre Expires October 2, 2014 [Page 10]
Internet-Draft Username Interoperability March 2014
Table 1: Allowed and Disallowed Characters (Non-Alphanumeric)
+---+----------------------------------+
| EMAIL ADDRESSES, IM/PRES URIs |
+---+----------------------------------+
| A | ! #$%&' *+ - / = ? ^_`{|}~ |
| D | " () , . :;< > @[\] |
+---+----------------------------------+
| KERBEROS PRINCIPAL NAMES |
+---+----------------------------------+
| A | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
| D | |
+---+----------------------------------+
| NETWORK ADDRESS IDENTIFIERS |
+---+----------------------------------+
| A | ! #$%&' *+ - / = ? ^_`{|}~ |
| D | " () , . :;< > @[\] |
+---+----------------------------------+
| SIP/SIPS URIs |
+---+----------------------------------+
| A | ! $ &'()*+,-./ ; = ? _ ~ |
| D | "# % : < > @[\]^ `{|} |
+---+----------------------------------+
| XMPP ADDRESSES |
+---+----------------------------------+
| A | ! #$% ()*+,-. ; = ? [\]^_`{|}~ |
| D | " &' /: < > @ |
+---+----------------------------------+
| ACCT URIs |
+---+----------------------------------+
| A | ! $%&'()*+,-. ; = \ ^_`{|}~ |
| D | "# /: < >?@[ ] |
+---+----------------------------------+
The interoperable subset allows only characters that are allowed in
all of the foregoing formats, as shown in the following table.
Table 2: Subset Characters (Non-Alphanumeric)
+---+----------------------------------+
| INTEROPERABLE SUBSET |
+---+----------------------------------+
| A | ! $ *+ - = _ ~ |
| D | "# %&'() , ./:;< >?@[\]^ `{|} |
+---+----------------------------------+
Saint-Andre Expires October 2, 2014 [Page 11]
Internet-Draft Username Interoperability March 2014
Appendix B. Acknowledgements
Thanks to Sean Turner for inspiring the work on this document.
Thanks also to Paul Hoffman, John Klensin, and Glen Zorn for their
comments.
Author's Address
Peter Saint-Andre
&yet
P.O. Box 787
Parker, CO 80134
USA
Email: ietf@stpeter.im
Saint-Andre Expires October 2, 2014 [Page 12]