Internet Draft Paul Hoffman
draft-hoffman-imaa-01.txt IMC & VPNC
April 18, 2003 Adam M. Costello
Expires in six months UC Berkeley
Internationalizing Mail Addresses in Applications (IMAA)
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note
that other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
The Internationalizing Domain Names in Applications (IDNA)
specification describes how to process domain names that have
characters outside the ASCII repertoire. A user who has an
internationalized domain name may want to have their full Internet
mail address internationalized, including the local part (that
is, the part to the left of the "@"). This document describes
how to use non-ASCII characters in local parts, by defining
internationalized local parts (ILPs), internationalized mail
addresses (IMAs), and a mechanism called IMAA for handling them in a
standard fashion.
1. Introduction
A mail address consists of a local part, an at-sign (@), and a domain
name. The IDNA specification [IDNA] describes how to handle domain
names that have non-ASCII characters. This document describes how
to handle non-ASCII characters in the rest of the mail address.
This document explicitly does not discuss internationalization of
display names and comments in mail addresses that appear in message
headers [RFC2822]. MIME part three [RFC2047] describes how to use an
extended set of characters in message headers, and this document
does not alter that specification.
This document is being discussed on the ietf-imaa mailing list. See
for information about subscribing
and the list's archive.
1.1 Relationship to IDNA
This document relies heavily on IDNA for both its concepts and
its justification. This document omits a great deal of the
justification and design information that might otherwise be found
here because it is identical to that in IDNA. Anyone reading this
document needs to have first read [IDNA], [PUNYCODE], [NAMEPREP],
and [STRINGPREP].
The main differences between how IMAA treats local parts of mail
addresses and how IDNA treats domain names are:
- The ACE prefix for internationalized local parts is different
from the ACE prefix for internationalized domain labels.
[[ OPEN ISSUE: Should it be the same? ]]
- Domain names have an intrinsic segmentation into labels, and
are already segmented before transformations are performed.
Local parts, on the other hand, have no intrinsic segmentation.
The transformations on local parts perform a segmentation
internally, but it has no external significance.
- There is no UseSTD3ASCIIRules flag for local parts.
One apparent difference that is not really a difference is the
handling of quoting mechanisms. IDNA did not discuss quoting
because the phrase "domain label" is presumed to refer to a simple
literal string. [STD13] defines domain labels in terms of their
literal form (which is used in DNS protocol messages), and later
introduces a quoting syntax for representing domain labels in master
files, but there is never any doubt that the domain label itself is
a simple unstructured sequence. It goes without saying that domain
labels obtained from contexts that use quoting (like master files)
need to be reduced to their literal form before any processing is
done on them.
Local parts, on the other hand, are defined in [RFC2822] and
[RFC2821] in terms of their quoted form, as they appear in message
headers and SMTP commands. Later it is stated that the quotation
characters are not really part of the local part. To avoid any
ambiguity, IMAA explicitly discusses the process of dequoting and
requoting local parts.
1.2 Open issues
This section describes the issues that are known to be unresolved.
There may also be other issues we haven't thought of yet. This
section might be easier to follow after the rest of the draft has
been read. This section will be removed before the document is
passed to the IESG or RFC Editor for publication.
Throughout the draft, comments related to these open issues appear
inside brackets like this: [[ OPEN ISSUE: comments ]].
The IMAA model in this draft is incompatible with case-sensitive
mail exchangers, and therefore IMAs cannot be created in domains
whose mail exchangers are case-sensitive. Case-sensitivity in
mail exchangers is allowed but discouraged by [RFC2821], and
is thought to be very rare. It would be possible for IMAA to
support case-sensitive mail exchangers, but it would entail
complications to the model. Non-traditional local parts would not
always be case-insensitive, but could be either case-insensitive
or lowestcase-only (the concept of lowestcase would need to be
defined). Instead of the symmetric notion of "equivalence"
between local parts, there would be an asymmetric notion of
"substitutability" (whose definition would depend on the concept
of lowestcase). The ToASCII and ToUnicode operations would be
constrained to preserve the lowestcase property (that is, the output
must be lowestcase if the input is lowestcase). The details have
all been worked out, but perhaps it is not worth the trouble, and
better to just let case-sensitive mail exchangers go unsupported.
Currently hyphen is not a protected character, because it is used by
both Punycode and the ACE prefix. It is possible, however, to avoid
the use of hyphen for those purposes, which would allow hyphen to
be protected, for better compatibility with structured local part
conventions that use hyphen as a delimiter. Here is how it could be
done: After applying the Punycode encoder, instead of prepending
the ACE prefix, insert the ACE infix in place of the hyphen (or
prepend the infix if there is no hyphen). On the decoding side,
instead of looking for the ACE prefix and removing it, look for the
ACE infix and change it to a hyphen (or just delete it if it occurs
at the beginning), then apply the Punycode decoder.
If we decide to stick with a prefix containing hyphens, we might
want to consider reusing the IDNA ACE prefix (this was not
considered in draft 00 because in that draft IMAA used a different
stringprep profile from IDNA). The disadvantage of using a
different prefix is that humans cannot, without computational
assistance, copy local parts into domain labels (as in SOA records)
or copy domain names into local parts, because copying the non-ASCII
form and then converting to ASCII would give a different result
versus converting to ASCII and then copying, and it's the latter
procedure that must be considered correct (for compatibility with
IMA-unaware and IDN-unaware software that might try to do the same
sort of copying). Furthermore, once the copying has happened,
the result will display unintelligibly (the ACE will be visible),
because the different ACE prefix won't be recognized on the other
side of the at-sign. It is impossible to fully solve this problem,
because encoded strings don't mark their own endings, only their
own beginnings. Even if the same ACE prefix is used on both sides
of the at-sign, if local parts are segmented then a multi-segment
local part copied into a domain label will not display intelligibly,
while if local parts are not segmented then a multi-label domain
name copied into a local part will not display intelligibly.
However, using the same ACE prefix would allow the common cases to
work intuitively: Local parts containing only LDH characters and
non-ASCII characters could be copied (by humans, in non-ACE form)
into domain labels (where they would display correctly), and domain
names obeying the STD3 ASCII rules could be copied (by humans, in
non-ACE form) into local parts (where they would display correctly).
One concern with using the same prefix is that in the uncommon cases
where it doesn't work nicely, the unintelligible display will not be
an ACE, but will be non-ASCII gobbledygook (which will still work if
copied back to the other side of the at-sign, but might be even less
user-friendly than an ACE).
Should we keep the requirement about recognizing fullwidth at-signs?
It seems needed for consistency with IDNA's requirement about
recognizing fullwidth dots.
If we were to drop the at-sign requirement, it would become possible
to narrow our focus from "mail address slots" to "local part slots".
But would we want to do that? If we keep the at-sign requirement,
it's a moot point, because then we're talking about the whole
address.
When converting mail addresses to ASCII, should ideographic full
stop be converted to ASCII full stop in local parts, as is done in
domain names? This was desirable in domain names because all domain
names contain dots, so we wanted them to be easy to type. But local
parts need not contain dots, and most don't, so that motivation is
not nearly as compelling in local parts. Also, the conversion in
IDNA makes it difficult or impossible to include ideographic full
stop inside domain labels. If the conversion were done in local
parts, the same difficulty would arise. Users might prefer the
ability to use honest-to-goodness ideographic full stops in local
parts, rather than reserve them as a typing shortcut for ASCII full
stops. For example, one of the most well-known pop groups in Japan,
Morning Musume, has an ideographic full stop in their name.
In the dequoting step, fullwidth versions of nonliteral ASCII
characters (like quote marks and backslashes) are required to be
recognized as equivalent to the regular ASCII versions. Should we
keep this requirement?
In the requoting step, the original quoted local part is recommended
when ToASCII/ToUnicode had no effect and the original quoting style
is compatible with the destination context. Should we keep that
recommendation? It adds complexity, and should not be necessary,
but it makes IMAA less likely to trigger quotation-related bugs,
and is motivated by the principle of not altering local parts
unnecessarily (for example, when converting an already-ASCII local
part to ASCII, don't gratuitously change the way it's quoted).
The 59-character limit on the Punycode encoder output is aimed
at making it easier to reuse Punycode implementations that were
written for IDNA (and which might use fixed-sized buffers). Should
this limit be relaxed for IMAA? Unlike domain labels, which have
a hard size limit imposed by the syntax of DNS messages, local
parts have no hard limit (SMTP must support local parts up to 64
characters, but may support arbitrarily large local parts). A
Punycode implementation using 31-bit unsigned integers (or 32-bit
signed integers) ought to be able to handle Unicode strings in
excess of 2000 code points (I have not calculated the exact limit).
For very long strings, the O(n^2) running time of Punycode might
become an issue.
What more should we say about stored strings versus query strings?
1.3 Closed issues that could be reopened
Rather than transform the local part as multiple segments, another
approach is to transform it as a single unit. The tradeoff is
complexity versus compatibility with various unofficial conventions
for structured local parts, like owner-listname, user+tag,
sublocal.local, path!user, etc. Breaking a local part into segments
is about as complex as breaking a domain name into labels.
If segmentation were abandoned, we would lose a major reason to
avoid punctuation in the ACE prefix. By using punctuation
other than hyphens, we could use the same letters as IDNA. For
example, the IDNA ACE prefix is xn--, and the IMAA ACE prefix could
be xn__.
2. Terminology
The key words "MUST", "MUST NOT", "SHALL", "REQUIRED", "SHOULD",
"SHOULD NOT", "RECOMMENDED", and "MAY" in this document are to be
interpreted as described in RFC 2119 [RFC2119].
Code point, Unicode, and ASCII are defined in [IDNA].
Each ASCII character whose code point is in the range 21..7E has a
corresponding "fullwidth version" whose code point is in the range
FF01..FF5E, respectively.
[[ OPEN ISSUE: The above definition is not needed if the requirement
about fullwidth versions of nonliteral ASCII characters is removed.
]]
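Because the fullwidth block mirrors ASCII 21..7E one-for-one, the
mapping is a constant offset of FEE0. A minimal Python sketch (the
function names are illustrative, not part of this specification):

```python
# FF01 - 21 == FEE0: every fullwidth form sits at a fixed offset from ASCII.
FULLWIDTH_OFFSET = 0xFF01 - 0x21


def fullwidth_version(c: str) -> str:
    """Return the fullwidth version of an ASCII character in 21..7E."""
    cp = ord(c)
    if not 0x21 <= cp <= 0x7E:
        raise ValueError(f"U+{cp:04X} has no fullwidth version")
    return chr(cp + FULLWIDTH_OFFSET)


def ascii_version(c: str) -> str:
    """Map FF01..FF5E back to ASCII 21..7E; leave any other character alone."""
    cp = ord(c)
    if 0xFF01 <= cp <= 0xFF5E:
        return chr(cp - FULLWIDTH_OFFSET)
    return c
```

For example, fullwidth_version("@") returns U+FF20, the fullwidth
at-sign that requirement 1 in section 3.1 also recognizes. These
characters have <wide> compatibility decompositions, so NFKC
normalization (and therefore Nameprep) also maps them back to ASCII.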
The "protected code points" are 0..2C, 2E..2F, 3A..40, 5B..60, and
7B..7F (in other words, those corresponding to ASCII characters other
than letters, digits, and hyphen-minus).
[[ OPEN ISSUE: We might want to add hyphen-minus to the set of
protected characters, but we'd need to deal with the use of
hyphen-minus by Punycode and the ACE prefix. ]]
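The protected set is what ToASCII and ToUnicode use to split a local
part into segments (section 4.1, step 4). A Python sketch of the
membership test and that segmentation, with hypothetical helper names:

```python
from itertools import groupby


def is_protected(c: str) -> bool:
    """True for ASCII characters other than letters, digits and hyphen-minus."""
    cp = ord(c)
    if cp > 0x7F:
        return False  # non-ASCII code points are never protected
    is_letter_or_digit = "0" <= c <= "9" or "A" <= c <= "Z" or "a" <= c <= "z"
    return not (is_letter_or_digit or c == "-")


def split_segments(s: str) -> list:
    """Split s into maximal runs of protected or non-protected code points.

    groupby() starts a new group exactly where is_protected() changes value,
    so segments are never empty and alternate between the two kinds.
    """
    return ["".join(run) for _, run in groupby(s, key=is_protected)]
```

split_segments("user+tag") yields ["user", "+", "tag"], whereas
"owner-list" stays a single segment because hyphen-minus is not
protected.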
A "mail address" consists of a local part, an at-sign, and a domain
name, in that order. The exact details of the syntax depend on
the context; for example, a "mailbox" in [RFC2821] (SMTP) and an
"addr-spec" in [RFC2822] (message format) are both mail addresses,
but they define slightly different syntaxes for local parts and
domain names.
A "dequoted local part" is the simple literal text string that
is the intended "meaning" of a local part after it has undergone
lexical interpretation. A dequoted local part excludes optional
white space, comments, and lexical metacharacters (like backslashes
and quotation marks used to quote other characters). Dequoted local
parts are generally not allowed in protocols (like SMTP commands and
message headers), but they are needed by IMAA as an intermediate
form. The dequoted form of X is sometimes written dequote(X).
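A deliberately reduced Python sketch of dequoting, assuming the local
part is written either as an RFC 2822 dot-atom or as a single
quoted-string with no comments or folding white space (full RFC 2822
lexing is more involved). It also accepts the fullwidth quotation mark
and backslash, following the fullwidth rule in section 4, step 3,
which is itself an open issue. The function name is hypothetical.

```python
QUOTE_MARKS = {'"', "\uff02"}   # QUOTATION MARK and its fullwidth version
BACKSLASHES = {"\\", "\uff3c"}  # REVERSE SOLIDUS and its fullwidth version


def dequote(local_part: str) -> str:
    """Return the literal text of a dot-atom or single quoted-string.

    Not handled in this sketch: comments, folding white space, and
    obsolete mixed forms such as "a".b from RFC 2822.
    """
    if not local_part or local_part[0] not in QUOTE_MARKS:
        return local_part  # dot-atom: every character is already literal
    if len(local_part) < 2 or local_part[-1] not in QUOTE_MARKS:
        raise ValueError("unterminated quoted-string")
    body = local_part[1:-1]
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c in BACKSLASHES:
            if i + 1 == len(body):
                raise ValueError("backslash quotes the closing quotation mark")
            out.append(body[i + 1])  # quoted-pair: keep only the quoted character
            i += 2
            continue
        if c in QUOTE_MARKS:
            raise ValueError("unescaped quotation mark inside quoted-string")
        out.append(c)
        i += 1
    return "".join(out)
```

For instance, dequote('"john doe"') gives john doe, and a quoted
backslash-quotation-mark pair yields a bare quotation mark.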
An "internationalized local part" (ILP) is anything that satisfies
both of the following conditions: (1) It conforms to the same
syntax as a non-internationalized local part except that (a)
non-ASCII Unicode characters are allowed wherever ASCII letters are
allowed, and (b) for every ASCII character that has a nonliteral
meaning (like quotation or comment delimitation), the fullwidth
version (if there is one) has the same meaning. (2) After it has
been dequoted, the ToASCII operation can be applied to it without
failing (see section 4). The term "internationalized local part"
is a generalization, embracing both old ASCII local parts and
new non-ASCII local parts. Although most Unicode characters can
appear in internationalized local parts, ToASCII will fail for some
inputs. Anything that fails to satisfy condition 2 is not a valid
internationalized local part.
[[ OPEN ISSUE: Should we keep (1)(b)? ]]
A "traditional local part" is a local part that contains only ASCII
characters and whose dequoted form would be left unchanged by the
ToUnicode operation (see section 4).
An "internationalized mail address" (IMA) consists of an
internationalized local part, an at-sign, and an internationalized
domain name [IDNA], in that order.
Equivalence of local parts is defined in terms of the dequoted form
(see above) and the ToASCII operation, which constructs an ASCII
form for a given dequoted local part (whether or not the local part
was already an ASCII local part). Two traditional local parts X
and Y are equivalent if and only if dequote(X) and dequote(Y) are
exactly identical. (That is not a new rule, it is inferred from
[RFC2821] and [RFC2822].) For internationalized local parts X and
Y that are not both traditional, they are defined to be equivalent
if and only if ToASCII(dequote(X)) matches ToASCII(dequote(Y)) using
a case-insensitive ASCII comparison. Unlike traditional local
parts, non-traditional internationalized local parts are always
case-insensitive.
Two internationalized mail addresses are equivalent if and only
if their local parts are equivalent (according to the previous
definition) and their domain parts are equivalent (according to
IDNA).
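A Python sketch of the local part rule above. To keep it short,
ToASCII and ToUnicode (section 4) are passed in as functions, so the
function names and the Converter type are illustrative assumptions. A
ToASCII failure is allowed to propagate, because such an input is not
an internationalized local part. Domain parts would be compared
separately according to IDNA.

```python
from typing import Callable

Converter = Callable[[str], str]  # hypothetical: any ToASCII/ToUnicode implementation


def is_traditional(lp: str, to_unicode: Converter) -> bool:
    """ASCII-only and left unchanged by ToUnicode (the definition above)."""
    return lp.isascii() and to_unicode(lp) == lp


def local_parts_equivalent(x: str, y: str,
                           to_ascii: Converter, to_unicode: Converter) -> bool:
    """Compare two dequoted local parts."""
    if is_traditional(x, to_unicode) and is_traditional(y, to_unicode):
        # Both traditional: exact, case-sensitive comparison.
        return x == y
    # Otherwise compare the ASCII forms case-insensitively. ToASCII output is
    # pure ASCII, so str.lower() only folds A-Z here.
    return to_ascii(x).lower() == to_ascii(y).lower()
```

With identity functions standing in for both conversions,
local_parts_equivalent("Joe", "joe", ...) is False, since both local
parts are traditional and are therefore compared case-sensitively.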
To allow internationalized local parts to be handled by existing
applications, IMAA uses an "ACE local part" (ACE stands for ASCII
Compatible Encoding). An ACE local part is an internationalized
local part that can be rendered in ASCII and is equivalent to an
internationalized local part that cannot be rendered in ASCII.
Given any internationalized local part (in dequoted form) that
cannot be rendered in ASCII, the ToASCII operation will convert it
to an equivalent ACE local part (whereas an ASCII local part will
be left unaltered by ToASCII). ACE local parts are unsuitable for
display to users. The ToUnicode operation will convert any local
part (in dequoted form) to an equivalent non-ACE local part. In
fact, an ACE local part is formally defined to be any local part
that the ToUnicode operation would alter (whereas non-ACE local
parts are left unaltered by ToUnicode). The ToASCII and ToUnicode
operations are specified in section 4.
The "ACE prefix for local parts" (or simply the "ACE prefix" when
the context is clear) is defined in this document to be a string of
ASCII characters that begins every encoded segment within a dequoted
ACE local part. It is specified in section 5.
[[ OPEN ISSUE: It might be preferable to use an infix rather than a
prefix. ]]
A "mail address slot" is defined in this document to be a protocol
element or a function argument or a return value (and so on)
explicitly designated for carrying a mail address. Mail address
slots exist, for example, in the MAIL and RCPT commands of the SMTP
protocol, in the To: and Received: fields of message headers, and
in a mailto: URI in the href attribute of an HTML tag. General
text that just happens to contain a mail address is not a mail
address slot; for example, a mail address appearing in the plain
text body of a message is not occupying a mail address slot.
An "IMA-aware mail address slot" is defined in this document to
be a mail address slot explicitly designated for carrying an
internationalized mail address as defined in this document. The
designation may be static (for example, in the specification of
the protocol or interface) or dynamic (for example, as a result of
negotiation in an interactive session).
An "IMA-unaware mail address slot" is defined in this document to be
any mail address slot that is not an IMA-aware mail address slot.
Obviously, this includes any mail address slot whose specification
predates this document.
3. Requirements and applicability
3.1 Requirements
IMAA conformance means adherence to the following four requirements:
1) In an internationalized mail address, the following characters
MUST be recognized as at-signs for separating the local part
from the domain name: U+0040 (commercial at), U+FF20 (fullwidth
commercial at).
[[ OPEN ISSUE: Keep that requirement? ]]
2) Whenever a mail address is put into an IMA-unaware mail address
slot (see section 2), it MUST contain only ASCII characters.
Given an internationalized mail address, an equivalent mail
address satisfying this requirement can be obtained by applying
ToASCII to the local part as specified in section 4, changing
the at-sign to U+0040, and processing the domain name as
specified in [IDNA].
3) ACE local parts obtained from mail address slots SHOULD be
hidden from users when it is known that the environment
can handle the non-ACE form, except when the ACE form is
explicitly requested. When it is not known whether or not the
environment can handle the non-ACE form, the application MAY
use the non-ACE form (which might fail, such as by not being
displayed properly), or it MAY use the ACE form (which will
look unintelligible to the user). Given an internationalized
local part, an equivalent non-ACE local part can be obtained
by applying the ToUnicode operation as specified in section
4. When requirements 2 and 3 both apply, requirement 2 takes
precedence.
4) If two mail addresses are equivalent and either one refers to a
mailbox, then both MUST refer to the same mailbox, regardless of
whether they use the same form of at-sign.
Discussion: This implies that non-ASCII local parts cannot be
deployed in domains whose mail exchangers are case-sensitive.
IMAA is designed to work without upgrading mail exchangers,
but it works only for mail exchangers that treat ASCII local
parts as case-insensitive (which is the common and preferred
behavior). All local parts received by an IMA-unaware
mail exchanger are ASCII, either traditional or ACE, and a
case-insensitive exchanger will automatically obey requirement 4
without being aware of it. Case-sensitive exchangers will not
correctly handle ACE local parts, but administrators can simply
refrain from creating ACE local parts in those domains. This is
necessary because a round-trip through ToUnicode and ToASCII is
not case-preserving, and therefore the result might refer to a
different mailbox (in violation of requirement 4) if interpreted
by a case-sensitive mail exchanger.
[[ OPEN ISSUE: IMAA could work with case-sensitive mail
exchangers if we added some complexity to the model. ]]
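A Python sketch of requirements 1 and 2 at the level of the whole
address, assuming the local part has already been converted as in
section 4. The domain is converted with Python's built-in "idna"
codec, which implements the IDNA ToASCII operation of [IDNA]
(RFC 3490). Function names are hypothetical.

```python
from typing import Tuple

# Requirement 1: both of these separate the local part from the domain name.
AT_SIGNS = ("\u0040", "\uff20")  # COMMERCIAL AT, FULLWIDTH COMMERCIAL AT


def split_address(address: str) -> Tuple[str, str]:
    """Split at the last at-sign of either form.

    A domain name cannot contain an at-sign but a quoted local part can,
    so the last at-sign is the separator.
    """
    i = max(address.rfind(at) for at in AT_SIGNS)
    if i < 0:
        raise ValueError("no at-sign in mail address")
    return address[:i], address[i + 1:]


def ascii_address(ascii_local_part: str, domain: str) -> str:
    """Requirement 2: join an ASCII local part and the IDNA-converted
    domain name with U+0040."""
    return ascii_local_part + "@" + domain.encode("idna").decode("ascii")
```

For example, split_address("user\uff20example.com") returns
("user", "example.com"), and a quoted local part such as "a@b" keeps
its embedded at-sign.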
3.2 Applicability
IMAA is applicable to all mail addresses in all mail address slots
except where it is explicitly excluded.
This implies that IMAA is applicable to protocols that predate IMAA.
Note that mail addresses occupying mail address slots in those
protocols MUST be in ASCII form (see section 3.1, requirement 2).
3.2.1 Case-sensitive local parts
IMAA does not apply to local parts that are interpreted
case-sensitively (see section 3.1 requirement 4).
4. Conversion operations
An application converts a local part whenever it puts the local part
into an IMA-unaware mail address slot or displays it to a user. This
section specifies the steps to perform in the conversion, and the
ToASCII and ToUnicode operations.
The input to ToASCII or ToUnicode is a dequoted local part that is a
sequence of Unicode code points (remember that all ASCII code points
are also Unicode code points). If a local part is represented using
a character set other than Unicode or US-ASCII, it will first need
to be transcoded to Unicode.
Starting from a local part, the steps that an application takes to
do the conversions are:
1) Decide whether the local part is a "stored string" or a "query
string" as described in [STRINGPREP]. If this conversion
follows the "queries" rule from [STRINGPREP], set the flag
called "AllowUnassigned".
[[ OPEN ISSUE: We need more here, possibly pointing to a
different section where we specify exactly which kinds of local
parts are stored strings and which are query strings. ]]
2) Save a copy of the local part.
3) Dequote the local part; that is, perform lexical interpretation
and remove all nonliteral characters. For example, for local
parts that use the lexical syntax of [RFC2821] (SMTP) or
[RFC2822] (message format), unfold it, remove comments and
unquoted white space, and remove backslashes and quotation marks
used to quote other characters. The result is a simple literal
text string. Fullwidth versions of nonliteral ASCII characters
MUST be accepted as equivalent to the ASCII versions.
4) Process the string with either the ToASCII or the ToUnicode
operation as appropriate. Typically, you use the ToASCII
operation if you are about to put the local part into an
IMA-unaware slot, and you use the ToUnicode operation if you are
displaying the local part to a user.
5) Apply whatever quoting is needed in the destination context
(if any). For "mailbox" slots [RFC2821] and "addr-spec" slots
[RFC2822] the following action suffices: If the string contains
any control characters, spaces, or specials [RFC2822], or if it
begins or ends with a dot, or contains two consecutive dots,
then convert it to a quoted-string: insert a backslash before
every quotation mark and backslash, then enclose the string with
quotation marks. If step 4 had no effect on the string, and if
the saved local part from step 2 is a valid representation of
the string in the destination context, then the saved local part
SHOULD be used, even if it uses more quoting than necessary.
[[ OPEN ISSUE: Keep that last sentence and step 2? ]]
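A Python sketch of the quoting rule in step 5 for "mailbox" and
"addr-spec" slots. The dot is left out of the specials set because
dot-atom syntax permits single interior dots; the dot cases are tested
separately, as the rule describes. The sketch does not implement the
recommendation to reuse the saved local part from step 2, and its
names are illustrative.

```python
# RFC 2822 "specials" except ".", which dot-atom allows between atoms.
SPECIALS_EXCEPT_DOT = set('()<>[]:;@\\,"')


def needs_quoting(s: str) -> bool:
    """Control characters, spaces, specials, or a leading, trailing or doubled dot."""
    has_bad_char = any(
        ord(c) < 0x20 or c == " " or c == "\x7f" or c in SPECIALS_EXCEPT_DOT
        for c in s
    )
    return has_bad_char or s.startswith(".") or s.endswith(".") or ".." in s


def requote(s: str) -> str:
    """Return s unchanged if it can stand as a dot-atom, else as a quoted-string."""
    if not needs_quoting(s):
        return s
    # Escape backslashes first, so the backslashes added in front of
    # quotation marks are not themselves doubled.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

Non-ASCII characters are not quoted, since an internationalized local
part allows them wherever ASCII letters are allowed.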
The destination context might also impose a length restriction.
Depending on whether the restriction applies to the quoted form or
the dequoted form, the application might want to check the length
just before or after step 5.
The following two subsections define the ToASCII and ToUnicode
operations that are used in step 4.
This description of the protocol uses specific procedure names,
names of flags, and so on, in order to facilitate the specification
of the protocol. These names, as well as the actual steps of the
procedures, are not required of an implementation. In fact, any
implementation which has the same external behavior as specified in
this document conforms to this specification.
4.1 ToASCII
The ToASCII operation takes a sequence of Unicode code points that
make up a dequoted local part and transforms it into a sequence of
code points in the ASCII range (0..7F). If ToASCII succeeds, the
original sequence and the resulting sequence are equivalent dequoted
local parts.
It is important to note that the ToASCII operation can fail.
ToASCII fails if any step of it fails. If any step of the
ToASCII operation fails, that string MUST NOT be used as an
internationalized local part. The method for dealing with this
failure is application-specific.
The inputs to ToASCII are a sequence of code points, and the
AllowUnassigned flag. The output of ToASCII is either a sequence of
ASCII code points or a failure condition.
ToASCII never alters a sequence of code points that are all in the
ASCII range to begin with. Applying the ToASCII operation multiple
times has exactly the same effect as applying it just once.
ToASCII consists of the following steps:
1. If the sequence contains any code points outside the ASCII range
(0..7F) then proceed to step 2, otherwise stop, leaving the
sequence unchanged.
2. Perform the steps specified in [NAMEPREP] and fail if there is
an error. The AllowUnassigned flag is used in [NAMEPREP].
3. If the sequence is empty then stop, leaving an empty result.
4. Divide the sequence into segments. Segment boundaries occur
wherever a protected code point is adjacent to a non-protected
code point, and nowhere else. (Therefore segments are never
empty, and they alternate between segments containing only
protected code points and segments containing only non-protected
code points.)
5. For each segment perform the following substeps:
(a) If the segment contains any code points outside the ASCII
range (0..7F) then proceed to substep b, otherwise leave the
segment unchanged.
(b) Verify that the segment does NOT begin with the ACE prefix.
(c) Encode the segment using the encoding algorithm in
[PUNYCODE] and fail if there is an error.
(d) Verify that the result contains no more than 59 code points.
[[ OPEN ISSUE: Relax this restriction? ]]
(e) Prepend the ACE prefix.
6. Rejoin the segments into a single sequence.
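The steps above map fairly directly onto Python's standard library.
This sketch assumes CPython's encodings.idna.nameprep (an undocumented
helper inside the built-in "idna" codec, which never rejects
unassigned code points and so behaves as if AllowUnassigned were set)
for step 2, the built-in "punycode" codec for step 5(c), and the
placeholder prefix "iesg--" from section 5. It is illustrative, not a
reference implementation.

```python
from encodings.idna import nameprep  # CPython internal; behaves as AllowUnassigned
from itertools import groupby

ACE_PREFIX = "iesg--"     # placeholder from section 5, not a real assignment
MAX_ENCODED_LENGTH = 59   # step 5(d)


def is_protected(c: str) -> bool:
    """ASCII characters other than letters, digits and hyphen-minus."""
    return ord(c) <= 0x7F and not (c.isalnum() or c == "-")


def to_ascii(s: str) -> str:
    """ToASCII for a dequoted local part; raises UnicodeError on failure."""
    # Step 1: an all-ASCII sequence is returned unchanged (case preserved).
    if s.isascii():
        return s
    # Step 2: Nameprep (case folding, NFKC, prohibited characters, bidi).
    s = nameprep(s)
    # Step 3: an empty result stays empty.
    if not s:
        return s
    # Step 4: maximal runs of protected or non-protected code points.
    segments = ["".join(run) for _, run in groupby(s, key=is_protected)]
    out = []
    for seg in segments:
        # Step 5(a): ASCII-only segments pass through untouched.
        if seg.isascii():
            out.append(seg)
            continue
        # Step 5(b): the segment must not already begin with the ACE prefix.
        if seg[:len(ACE_PREFIX)].lower() == ACE_PREFIX:
            raise UnicodeError("segment already begins with the ACE prefix")
        # Step 5(c): Punycode-encode the segment.
        encoded = seg.encode("punycode").decode("ascii")
        # Step 5(d): length limit carried over from IDNA.
        if len(encoded) > MAX_ENCODED_LENGTH:
            raise UnicodeError("encoded segment longer than 59 code points")
        # Step 5(e): prepend the ACE prefix.
        out.append(ACE_PREFIX + encoded)
    # Step 6: rejoin the segments.
    return "".join(out)
```

Because step 1 returns ASCII input untouched, applying to_ascii a
second time to its own output changes nothing, matching the
idempotence property stated above.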
4.2 ToUnicode
The ToUnicode operation takes a sequence of Unicode code points that
make up a dequoted local part and returns a sequence of Unicode code
points. If the input sequence is a dequoted local part in ACE form,
then the result is an equivalent dequoted internationalized local
part that is not in ACE form, otherwise the original sequence is
returned unaltered.
ToUnicode never fails. If any step fails, then the original input
sequence is returned immediately in that step.
The ToUnicode output never contains more code points than its input.
Note that the number of octets needed to represent a sequence of code
points depends on the particular character encoding used.
The inputs to ToUnicode are a sequence of code points, and the
AllowUnassigned flag. The output of ToUnicode is a sequence of code
points.
ToUnicode consists of the following steps:
1. If the sequence contains any code points outside the ASCII range
(0..7F) then proceed to step 2, otherwise skip to step 3.
2. Perform the steps specified in [NAMEPREP] and fail if there is
an error. The AllowUnassigned flag is used in [NAMEPREP].
3. Verify that the sequence is nonempty, and save a copy of the
sequence.
4. Divide the sequence into segments (same as step 4 of ToASCII).
5. For each segment perform the following substeps:
(a) If the segment does not begin with the ACE prefix then leave
the segment unchanged, otherwise save a copy of the segment
and proceed to substep b.
(b) Remove the ACE prefix.
(c) Decode the segment using the decoding algorithm in
[PUNYCODE] and catch any error. If there was an error then
restore the saved copy from substep a.
6. Verify that at least one segment was altered in step 5.
7. Rejoin the segments into a single sequence, and save a copy of
the result.
8. Apply ToASCII to the current sequence and to the saved copy from
step 3.
9. Verify that the two results of step 8 match using a
case-insensitive ASCII comparison.
10. Return the saved copy from step 7.
5. ACE prefix
[[ Note to the IESG and Internet Draft readers: The two uses of the
string "iesg--" below are to be changed at time of publication to a
prefix which fulfills the requirements in the first paragraph. IANA
will assign this value. ]]
The ACE prefix, used in the conversion operations (section 4), is
two ASCII letters followed by two hyphen-minuses. It cannot be the
same as the prefix assigned to IDNA. The ToASCII and ToUnicode
operations MUST recognize the ACE prefix in a case-insensitive
manner.
[[ OPEN ISSUE: We might want to consider a prefix that uses
different punctuation, or an infix that uses no punctuation. ]]
[[ OPEN ISSUE: We might want to consider using the same prefix as
IDNA. ]]
The ACE prefix for IMAA is "iesg--" or any capitalization thereof.
This means that an ACE local part might be
"foobar!iesg--de-jg4avhby1noc0d!iesg--d9juau41awczczp", where
"de-jg4avhby1noc0d" and "d9juau41awczczp" are the parts of the ACE
local part that are generated by the encoding steps in [PUNYCODE].
While every encoded segment (segment that would be altered by
ToUnicode) within an ACE local part begins with the ACE prefix, not
every segment beginning with the ACE prefix is an encoded segment.
Segments that begin with the ACE prefix but are not encoded segments
will confuse users, and local parts containing such segments SHOULD
NOT be used as mailbox names.
6. References
6.1 Normative references
[IDNA] Faltstrom, P., Hoffman, P. and A. Costello,
"Internationalizing Domain Names in Applications
(IDNA)", RFC 3490, March 2003.
[NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)",
RFC 3491, March 2003.
[PUNYCODE] Costello, A., "Punycode: A Bootstring encoding of
Unicode for use with Internationalized Domain Names in
Applications (IDNA)", RFC 3492, March 2003.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
April 2001.
[RFC2822] Resnick, P., "Internet Message Format", RFC 2822,
April 2001.
[STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454,
December 2002.
6.2 Informative references
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
Extensions) Part Three: Message Header Extensions for
Non-ASCII Text", RFC 2047, November 1996.
7. Security considerations
Because this document normatively refers to [IDNA], [NAMEPREP],
[PUNYCODE], and [STRINGPREP], it includes the security
considerations from those documents as well.
Internationalized local parts can make mail addresses longer, which
can make it harder to keep header lines under 78 characters. RFC 2822
states that limit as a SHOULD, not a MUST, so longer lines are
permitted, but they could cause some mail user agents to fail in
ways that affect security.
8. IANA considerations
IANA will assign the ACE prefix in consultation with the IESG,
possibly following the same process used for [IDNA].
9. Authors' addresses
Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org
Adam M. Costello
University of California, Berkeley
http://www.nicemice.net/amc/