Internet-Draft                                             H. Alvestrand
draft-alvestrand-i18n-howto-01.txt                         Cisco Systems
Target Category: Informational                             November 2001
Expires: May 2002

Protocol Redesigner's Handbook - volume i18n
Guidelines for internationalization of protocols

Status of this Memo

The file name of this memo is draft-alvestrand-i18n-howto-01.txt

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Discussion on this draft should be directed to the mailing list
intloc-discuss@ops.ietf.org. This is NOT an open mailing list.

Guidelines for protocol internationalization           Harald Alvestrand
draft-alvestrand-i18n-howto-01.txt                      Expires May 2002

Abstract

This document attempts to give guidelines for the people who have to
deal with existing protocols where issues of languages and character
sets were not considered from the beginning, and tries to help them a
little along the way.

Some of the advice might also be useful for people designing new
protocols. With new protocols, the document might help in getting the
internationalization right in the first attempt; at this stage, we all
know that protocols MUST be internationalized.
Protocol Redesigner's Handbook - volume i18n
Guidelines for internationalization of protocols
Status of this Memo
Abstract
1. Introduction
2. Classes of information
3. Designing Internet internationalization
   3.1 Basic concepts for the Internet
   3.2 Internationalization components outside IETF scope
   3.3 Operations likely to be impacted by internationalization
   3.4 How to tell whether to identify a script or a language
4. Specific choices in sorting, matching and canonicalization options
   4.1 Internationalized encodings
   4.2 Normalization
5. Tricks to shoehorn stuff into older protocols
6. Security Considerations
7. Acknowledgements
8. Author's Address
9. References

1. Introduction

Human beings on our planet have, past and present, used a number of
languages. These have been represented in a number of media using a
variety of encoding systems, most commonly in scripts using some kinds
of characters.
These days, humans want to use the Internet to communicate among
themselves and to interact with information stores on the Internet, and
they see no reason to learn a new language in order to do so.

This means that they have to use Internet protocols to communicate.
They will want to represent the scripts they are used to from off the
Internet when they use the Internet protocols, and they expect the
Right Thing to happen.

This document talks about what doing the Right Thing means.

2. Classes of information

Most protocols are designed with pieces that belong in various
categories:

A. Protocol elements, defined by the protocol designer, which should
   never be shown to the user, and are never changed.
   Examples: Verbs in the SMTP protocol [RFC2822], SNMP object
   identifiers [SNMP]

B. Managed-namespace identifiers, defined by some orderly process,
   intended to be used by any protocol user anywhere, often through
   interfaces that hide the actual values, but sometimes directly.
   Examples: Language tags [RFC3066], URI schemes [URLREG]

C. Global-scope identifiers, intended for visibility to any user who
   has a use for them anywhere, but not completely managed by a
   central authority.
   Examples: DNS names, URLs, IP addresses, user@domain email addresses

D. Local-scope identifiers, intended for visibility to a small set of
   users, but possibly visible in several contexts. Reasonable usage of
   such identifiers means that it is possible to appeal to some shared
   context in order to decide what an identifier "means".
   Examples: login account names, filenames within a directory, port
   numbers on a host

E. Data elements, intended for visibility within a certain context
   only.
   Examples: Text of email messages, Web page content, instant
   messages, subject lines in mail

Internationalizing an identifier or a data element in this context
means making it capable of representing information relevant to any
user, no matter which script or language this user uses. This may
involve dealing with character representation, processing rules,
language tagging, language negotiation or other functions as
appropriate.

For each element to be considered, there are 3 alternatives:

1. State that the element is immutable, invisible and inviolable, and
   therefore internationalization is irrelevant (and the
   protocol/product designer should try REALLY hard to make sure the
   user never knows or needs to know the value).

2. State that the element has to be in a very limited representation
   (such as the A-Z 0-9 character repertoire) so that it can be
   globally recognized and entered (RFC 822 headers, language tags).
   The protocol designer might reasonably want the user to get at the
   value of the element, but should not depend on the user associating
   anything meaningful with the identifier.

3. State that the element is a textual element for which the user
   decides the appropriate content. Basically, it has to be
   internationalized.

Internationalization requirements started out with data content (MIME
for email, for instance), and are working their way up the chain. For a
long time (see [IAB-ARCH], for instance), we thought that global-scope
identifiers like DNS names should be kept in alternative 2 (limited
repertoire), but increasing pressure from the community of people who
do not use ASCII in their daily lives has led to a reconsideration here
(IDN).
The current thinking of the group discussing this document, which is
suggested as IETF policy, is that protocol elements (A) and most if not
all managed-namespace identifiers (B) should be treated according to
alternative 2 above; their values should be binary, numeric or
invariant-subset ASCII. This makes testing and debugging easier, and
does not limit the expressive power of any protocol.

Note: Experience in the IETF is that implementers are lousy at hiding
things from the users, and users are often very fond of finding the
things implementers think should be hidden. That most people now know
that http:// means "you can look it up in a browser" is unsurprising;
the colloquial use of "404" (the HTTP error code for "document does not
exist") as a synonym for "not where he should be" is perhaps more so.

3. Designing Internet internationalization

3.1 Basic concepts for the Internet

The fundamental difference between common
internationalization/localization and Internet protocol
internationalization is this:

ON THE INTERNET, THE TWO ENDS OF THE COMMUNICATION CANNOT BE ASSUMED TO
BE IN THE SAME PLACE.

This means, in particular, that:

- The two ends of the communication may not share a common external
  context such as a "locale"; quite commonly, the two ends are in
  different countries, and may not even know (or care) what country the
  other end of the conversation is in.

- The two ends of the communication do not necessarily have ANY common
  knowledge except for the implementation of the protocol. With
  implementations in local networks, not even Internet access can be
  assumed, so even references to Internet-accessible resources are not
  guaranteed to work.

This means that:

- ALL information required for correct operation of the protocol must
  be specified in the protocol documentation, or be carried in the
  communication between the parties.

- When user preferences are involved, and multiple values are possible,
  the specification must guarantee a least common subset of
  identifiers, and properly handle the enumeration of identifiers (for
  instance by IANA registration).

One note that has more to do with the psychology of developers than
with correct specification: It is better to fill in a field than to
specify a default in a protocol specification.

At times, protocols have stated a "default value" and added a parameter
to change this value. Sometimes, for instance with the HTTP
content-type field, which had a "charset" parameter for the "text/html"
type, implementers reinterpreted the absence of the parameter as
"anything goes", and let their implementations ship anything they
wanted without labeling it, leaving the recipient to guess at charsets.
This had predictably dire consequences, and has led some people to
believe that it is better to "waste" the bytes required to always
specify explicitly what a parameter is, instead of relying on a
default.

When discussing internationalization, it is also very important to use
common terminology. The terminology of this field is littered with
seemingly simple words that are used for different things by different
people, with "character set", "script" and "language" being high on the
list of abused terms. Refer to [Hoffman].

3.2 Internationalization components outside IETF scope

Internationalizing a program or a service involves much more than the
protocols. But these other matters are not IETF issues, and do not
impinge upon the IETF standards process except indirectly.

In particular:

- The IETF does not standardize user interfaces. This means that input
  methods, display methods and display characteristics are out of scope
  for the IETF. (However, information about such methods and
  characteristics may at times have to be communicated using parameters
  of IETF protocols.)

- The IETF does not standardize APIs, except for the rare case of an
  API to a protocol.

This also means that the presentation of data, and conversions upon
data performed in order to do presentation, are outside the scope of
IETF standards, while conversions upon data in order to do protocol
operations are in scope (and may possibly be reused for presentation
purposes).

The IETF standards are chiefly concerned with communicating the data
needed, not how the data are presented. But the separation can be
unenforceable at times; we have a long history of defining data
representations "as seen by the user". See, for instance, RFC 1685,
which talks about how to write down X.400 email addresses.

3.3 Operations likely to be impacted by internationalization

A basic level of internationalization is text representation. A
protocol where it is not possible to send an Arabic letter SAD
(U+0635), and let the recipient recognize this as such, is useless for
communication in Arabic. This was addressed in RFC 2277, "IETF Policy
on Character Sets and Languages".

This is sufficient for handling text where that text is not treated
further by the protocol endpoint entities. But there are a number of
common operations that require the protocol designer to do more
thinking and specification when dealing with an internationalized
context:

- Equality tests, for instance deciding whether a typed string is
  identical to a username, or (even worse) a password.

- Matching. If the protocol has any operation where one party gives a
  text element, and the other party performs an action based on the
  content of that text element, matching must take place. This needs
  specification. Typical sources of confusion include:

  - What characters match (does a SPACE match a NON-BREAKING SPACE?
    Does A match a? Does LATIN LETTER A match GREEK LETTER ALPHA?)

  - What, if anything, is used as "units" in a match? The concept of
    "word" can get very tricky with languages like Thai, which often
    do not use word separator characters.

  - How many characters there are. This is especially a problem when
    one uses "regular expressions", which can specify (for instance)
    "A and B, with exactly one character between them": does A
    followed by COMBINING RING ABOVE followed by B match or not?

- Collation (sometimes called sorting). If the protocol requires
  elements to appear in a consistent order, collation needs
  specification. Collation will often need far more information than
  matching in order to provide the results the user expects; a
  collation based on codepoint value ("binary sort") is useless to the
  user except for the rare case where he does not care what the order
  is, as long as it is consistent. A common example is the case
  sensitivity problem; on Unix with the "C" locale, "Bread" is sorted
  before "apples", while under Windows, "Bread" is sorted after
  "apples", because Windows disregards case when sorting file names.

- Canonical forms. If the protocol ever expects to binary compare two
  objects for equality, or compute checksums over the objects as done
  for digital signatures, the implementations will often want to
  increase the probability that if a human looking at the data in the
  object thinks that it is unchanged, it actually compares equal. The
  most common method of doing this is to define a single "canonical"
  form for the data.

- Field truncation. In single-byte encodings, one is guaranteed that a
  field value produced by truncating a longer value is at least a valid
  string. With multibyte encodings, this is not the case; with
  variable-length encodings like UTF-8, there is no way to know without
  inspecting the string where legal truncation points may be. (In UTF-8
  one can find a legal point by inspecting relatively few octets around
  the cut point; in ISO 2022 based encodings, it may require
  significantly more effort.)

- Checks for legal and illegal characters. In some cases, one wants to
  specify things like "no spaces". One then has to consider whether
  this means no SPACE (U+0020), no space characters (Unicode general
  category Zs), or no white space of any kind (a class that includes
  TAB, for instance).

- Bi-directional issues. If a protocol element (for instance a URI or a
  domain name) contains multiple elements of different directionality,
  what is the directionality of the separator elements? (This makes
  display REALLY awkward...) An example treatment of this problem can
  be found in [IRI-BIDI].

3.4 How to tell whether to identify a script or a language

In many applications, the application is well served as long as a
string can be entered, stored and displayed correctly to the end user.

In other applications, there is some degree of interaction between the
meaning of the string and the action to be applied to it; in these
cases information about language is critical to make a correct
decision.

Approaches to language identification usually fall into 3 categories:

- Guess the language (this requires a reasonably large chunk of text
  for accurate determination; with closely related languages, such as
  Norwegian and Danish, the required chunk can be in the hundreds of
  words)

- Let a recipient (human) user identify the language, and apply the
  appropriate action manually

- Make the application language independent, dealing with "words", and
  let the user define (for instance by configuration or by choice of
  words in search interfaces) what words should be considered.

Which one is appropriate depends on context.

Typical operations where language information is needed:

- Dispatching on language: Trying to route an incoming query to a
  person who can understand it.

- High quality display: due to the nature of the Han unification
  performed in Unicode, some native speakers claim that one must use
  different fonts for representing the same character codepoint in
  Japanese and in Chinese. The same problem occurs in some languages
  for the Cyrillic fonts.

- Text to speech processing

- Selecting an appropriate name: "Feuerwehr" versus "Fire station" in a
  German airport; "Bruxelles" versus "Brussels" on a map of Belgium

Things to consider when you decide what language information you need:

- How much does it matter if you don't know the language?

- How precise do you need language to be? If you mark something as "US
  English", will the Right Thing happen when the recipient understands
  only "English"? If you mark it as "Nynorsk" (language code "nn"),
  will the recipient who indicates a desire for Norwegian ("no") or
  English ("en") see the right content?

Examples of things that are really script issues:

- Displaying and on either side of a picture: as long as the correct
  shapes are generated, the user does not care which language they are
  considered to be in

- I have a business card with Arabic text on it, and the keyboard's
  keycaps have ASCII legends, and I don't know how to use it to enter
  Arabic characters

HTML 4.01 Section 8.1, "Specifying the language of content: the lang
attribute",
http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.1
gives a reasonable treatment of language tagging in the context of
HTML.

Many problem areas that turned out to have a script solution can be
regarded as solved (at protocol level) when the carrier is defined to
support ISO 10646.

4. Specific choices in sorting, matching and canonicalization options

The cardinal rule of protocol internationalization should be:

DO NOT INVENT ANYTHING IF YOU CAN AVOID IT.

There are a number of ready-made things available, and a number of
pitfalls that these things have already dealt with.

However, there is no substitute for actually understanding the tools
you are using.

(specifics: Unicode identifier definition, UTF-8, ACAP/IMAP comparator
registry, IDN nameprep, UTR-15 canonicalization, case-folding...
suggestions!)

4.1 Internationalized encodings

When you transport I18N script across the wire, you don't actually
transport the script itself. You are transporting the bits which
represent the script. How the bits are assembled from, and disassembled
into, script depends on character sets and encodings.

I18N is not just a simple "8-bit clean" problem.

ISO10646 is a character set with a very large number of characters
(more than 94,000 of which have defined meanings in Unicode 3.1) and
thus "8-bit" is technically not sufficient.

An encoding is how you transport an I18N script through your
constrained environment.
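
To make the distinction between a character and its encoding concrete,
the following Python sketch prints the octets that four encodings
produce for ARABIC LETTER SAD (U+0635), the example character from
section 3.3. The helper name wire_forms and the choice of encodings are
assumptions made for illustration only; they are not part of any
protocol.

```python
# One abstract character, several possible wire representations.
SAD = "\u0635"  # ARABIC LETTER SAD, the example character from section 3.3

# Illustrative selection of ISO 10646 encodings (an assumption, not a list
# mandated by this document).
ENCODINGS = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be"]


def wire_forms(text):
    """Map each encoding name to the octets it produces for `text`."""
    return {name: text.encode(name) for name in ENCODINGS}


forms = wire_forms(SAD)
for name, octets in forms.items():
    # Show the encoding name, the octet count and the octets in hex.
    print(name, len(octets), octets.hex())
```

The same character occupies two or four octets depending on the
encoding, and the two UTF-16 forms differ only in octet order, so a
receiver cannot interpret the octets without knowing which encoding the
sender used.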
It is STRONGLY recommended that ISO10646, and ONLY that, be used as a
reference character repertoire.

When one encoding that is easy to retrofit into an ASCII/8-bit
environment is desired, and variable length encoding is acceptable,
UTF-8 is the preferred encoding.

In other contexts, a four-octet encoding, possibly supplemented by a
compression function, might be appropriate (UCS-4/UTF-32BE). This MUST
ONLY be used in big-endian order. (Note that functions that involve
encryption almost always include a compression function.)

UTF-16 suffers from the endianness problem (UTF-16BE vs UTF-16LE), and
from the likelihood of badly implemented surrogate support; UTF-16 is
NOT RECOMMENDED.

Having two encodings defined inside a single protocol is a REALLY,
REALLY BAD IDEA. DO NOT DO THIS.

If you allow multiple encodings for a piece of text, the encoding must
be labelled. The MIME protocol has shown that, while adequate, this is
a bad idea. Sending software will use obscure encodings that the
receiving software cannot handle. Worse yet, sending software will
label something with an obscure name for which there is a more common
equivalent, and this still prevents the receiver from interpreting it.

Using a single encoding avoids this problem.

4.2 Normalization

Normalization is needed when one gets string input from multiple
sources and wants to compare the strings or show them to each other's
users, e.g. in cases when you need to do matching on "functional
equality", comparison or sorting of I18N elements.

If normalization is needed, a good starting reference would be Unicode
UTR-15.

UTR-15 specifies multiple forms of normalization; this document
recommends normalization form C when dealing with text, and form KC
(or, equivalently, restricting the character repertoire to some subset
of that which is invariant under KC normalization) when limiting
namespaces for identifiers.

ISO/IEC 10646 contains characters that look similar or identical to
each other. For example, U+0041 (LATIN CAPITAL LETTER A) looks just
like U+0391 (GREEK CAPITAL LETTER ALPHA) in most fonts; there are
literally hundreds of other examples. In some cases, characters that
have very similar meaning but different looks can be normalized with
minimal loss of functionality, but full normalization to prevent
visually-similar characters is not feasible without losing character
meaning and thus possibly confusing typical users.

Note that normalization is not enough to convert the matching problem
into a binary comparison problem; see section 3.3.

Do remember that normalization is a one-way function which will not
preserve the original form.

4.3 Choosing limited character sets for "names"

In quite a few cases, there is a need to support a limited character
set for something like "names", where more characters than ASCII are
needed, but large swaths of special things (spaces, punctuation and so
on) must be excluded.

There is a lot of work already done on this; in particular, the Unicode
"identifier" definition [REFHERE] and the limited range of characters
used in the IDN domain name definition [REFHERE] are candidates.

4.4 Comparator functions

As alluded to above, deciding how to compare two strings is a hard
task. What's more, the number of ways in which people want to compare
strings is growing, not shrinking.

This means that within a protocol that is intended to serve many
purposes, you may need a means to name ways of comparing strings.

This need has been seen before, and attempts to fill it include:

- The ACAP/IMAP comparator registry [REFHERE]

- The ISO Cultural Conventions registry [REFHERE]

The target of the latter is far wider than the issue of string
comparators, but it also includes this.

5. Tricks to shoehorn stuff into older protocols

Very rarely is the protocol redesigner given a clean slate, upon which
he can deploy properly targeted internationalization. Most of his
effort must be spent in figuring out how to create modifications to the
protocol that allow the protocol to offer the features requested by the
international user community, while still not causing undue disruption
for users who use older versions of the protocol.

5.1 Redefining "text" as UTF-8

Most protocols with text in them defined without thought for
internationalization have one of three definitions of text:

- ASCII

- Latin1 (ISO 8859-1); this is common for protocols developed in
  Western Europe

- Unspecified octets said to carry text

The last category may in practice be like the first, because nothing
but ASCII was ever used, or the first may be like the last, because
people were quietly ignoring the "ASCII only" requirement.

In theory, one can shoehorn internationalized text into the first case
by defining that any non-ASCII byte be considered part of a UTF-8
character (an extension), and into the last case by defining that only
UTF-8 is legal to carry (a restriction).

In practice, the issue is fuzzier.

- What will be the reaction of old implementations on seeing extended
  characters? Ignore, barf or crash?

- To what degree will old implementations send non-ASCII, non-UTF-8
  data to new implementations? What will happen when they do?

In protocols that do version negotiation, there is a theoretical answer
that says that you "just" move to a new version of the protocol, and
negotiation will take care of it. However, this is not trivial:

- When version upgrade has never been done before, the negotiation
  machinery is untested.

- When version upgrade has been common, implementations may choose to
  ignore a "minor" version number difference.

- When the strings involved are identifiers, communication between old
  and new versions is troublesome: what should one do when an
  identifier cannot be represented in the old version of the protocol,
  yet needs to be referred to?

- When protocol violations, such as putting Latin-1 in an ASCII-only
  field, have been common in an old version, how should the new version
  behave when faced with such violations?

The problems are endemic to any protocol with versions, but are often
brought to the fore by internationalization.

This has tempted many to go the route of "just" declaring a different
interpretation of strings, without changing the protocol version number
or doing option negotiation to enable the feature.

The case of Latin-1 (or, similarly, Shift-JIS) is especially
troublesome, because there are byte sequences that can be interpreted
either as UTF-8 or as Latin-1. This means that even implementations
ready to tackle both encodings can be "fooled" into displaying
incorrect text to their users. This is worrying.

In protocols with "feature negotiation", such as SMTP or LDAP, the
problem of versioning grows more complex: Any extension must be
considered for its interaction with any other extension. Does the
"character set" option interact with the "regexp search" option? With
the "return results later" option? With the "foobar" option? The effort
of evaluating, and implementing, an option can quickly turn into a
function of the square of the number of options.

5.2 Example retrofits

More examples of protocol internationalization can be found in
[I18N-CASES].
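
The Latin-1 versus UTF-8 ambiguity described in section 5.1 can be
illustrated with a short Python sketch. The helper decode_guess is
hypothetical; it implements one common heuristic (try UTF-8 first, fall
back to Latin-1) and is not taken from any particular implementation.

```python
def decode_guess(octets):
    """Hypothetical receiver heuristic: try UTF-8, fall back to Latin-1.

    Returns the decoded text and the name of the encoding that was assumed.
    """
    try:
        return octets.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return octets.decode("latin-1"), "latin-1"


# Case 1: the fallback works. A lone 0xE9 octet is not valid UTF-8,
# so the Latin-1 interpretation is chosen.
cafe_octets = "caf\u00e9".encode("latin-1")      # b"caf\xe9"
cafe_text, cafe_guess = decode_guess(cafe_octets)

# Case 2: the receiver is fooled. A Latin-1 sender transmits
# A WITH TILDE (U+00C3) followed by COPYRIGHT SIGN (U+00A9); the two
# octets 0xC3 0xA9 are also the valid UTF-8 form of E WITH ACUTE (U+00E9).
sent_text = "\u00c3\u00a9"
sent_octets = sent_text.encode("latin-1")        # b"\xc3\xa9"
shown_text, shown_guess = decode_guess(sent_octets)

print(cafe_guess, shown_guess, shown_text != sent_text)
```

In the second case no error is raised, so neither end learns that the
text shown differs from the text sent; only an explicit label or a
single agreed encoding removes the guesswork.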
5.2.1 MIME

The Multipurpose Internet Mail Extensions are probably the most widely
deployed set of retrofits of internationalization in a preexisting
protocol.

It delivered:

- The ability to have multiple character sets in mail bodies

- The ability to have multiple character sets in parts of mail headers

It failed to deliver:

- A unique encoding of a text to a transferred string; the sender can
  make multiple encodings from the same message body. This has
  implications for attempts to use digital signatures, among other
  things.

- A language tagging ability for mail header components. A later
  attempt to add this has failed to see visible deployment.

In some areas, it seems that MIME has delivered "labeled
non-interoperability", giving senders the ability to specify what they
send, but not providing a means to fit that to a generally accepted
subset, or to limit the sending to what can usefully be understood by
the recipient.

But it has been very widely deployed, and has improved interoperability
among internationalized mail software enormously.

A more thorough analysis is given in [I18N-CASES].

5.2.2 SNMP version 3/SMI version 2

In the original Simple Network Management Protocol, a lot of fields
were labeled "text". In the US context, these were naturally considered
ASCII; in other contexts, usually ASCII was used, but on occasion,
other charsets such as Latin-1 or iso-2022-jp could be found.

In the course of moving SMI version 2 to Draft and Standard, two
considerations were added:

- A DISPLAY-HINT called "u" was added, indicating that the expected
  display format of the variable was as a UTF-8 string.

- An understanding that putting text that was neither ASCII nor UTF-8
  into a text variable was not consistent with the protocol

In the course of updating older MIBs, there was extensive discussion
about whether to add new variables with display-hint UTF-8 or to
redefine variables that had previously been understood as "text, any
charset" or "ascii" to be UTF-8.

STILL BRAIN DUMP:

Beware of third answers to what have previously been binary questions
(history: NIS yes/no hostname answer did really rotten job on TEMPFAIL)

Undisplayable characters: hieroglyphs at the user interface. Both in
names and other contexts; names are worse.

The copy/paste problem, including where the paste buffer is in the
brain of the user.

"there are things better done in protocol/servers, and things better
done in UI/client software/user brain, and the harder problem is
realizing which category they belong to".

6. Security Considerations

The security implications of improperly done internationalization can
be considerable.

For instance:

- If one does not specify whether input lengths are counted in
  characters or octets, buffer overflows are likely.

- If multiple representations of the same character are allowed,
  multiple items can appear to the user to have the same name, even
  though they are distinct. This can be used as an attack. (Note that
  this is hard to avoid; see section 4.2 for more on this.)

- Signature failures (erroneous success or erroneous failure) due to
  improper canonicalization are a security problem, too; a server
  canonicalizing a name before comparing will never be able to match on
  a certificate containing an uncanonicalized name, for instance.

- Code being forced down "interesting" code paths because a string is
  used in normalized form in part of the code and unnormalized
  elsewhere. (Example: the overlong UTF-8 code sequence, where one
  encodes leading zeroes so that (for instance) a carriage return can
  be slipped past the code that checks that a command line is just one
  line. This was the reason for outlawing overlong UTF-8 sequences in
  the Unicode Standard, version 3.1, section D.36.)

7. Acknowledgements

This document has benefited from many rounds of review and comments in
various fora of the IETF and the Internet working groups.

Any list of contributors is bound to be incomplete; please regard the
following as only a selection from the group of people who have
contributed to make this document what it is today.

In alphabetical order:

Martin Duerst (apologies for the lack of internationalization)
Patrik Faltstrom (aftloi)
Paul Hoffman
John Klensin
James Seng (aftloi)

8. Author's Address

Harald Tveit Alvestrand
Cisco Systems
Weidemanns vei 27
7043 Trondheim
NORWAY

EMail: Harald@Alvestrand.no
Phone: +47 73 50 33 52

9. References

[ISO 639]   ISO 639:1988 (E/F) - Code for the representation of names
            of languages - The International Organization for
            Standardization, 1st edition, 1988-04-01. Prepared by
            ISO/TC 37 - Terminology (principles and coordination).
            Note that a new version (ISO 639-1:2000) is in preparation
            at the time of this writing.

[ISO 639-2] ISO 639-2:1998 - Codes for the representation of names of
            languages -- Part 2: Alpha-3 code - edition 1, 1998-11-01,
            66 pages, prepared by a Joint Working Group of ISO TC46/SC4
            and ISO TC37/SC2.

[ISO 3166]  ISO 3166:1988 (E/F) - Codes for the representation of names
            of countries - The International Organization for
            Standardization, 3rd edition, 1988-08-15.

[RFC 1521]  MIME Part One: Mechanisms for Specifying and Describing the
            Format of Internet Message Bodies. N. Borenstein, N. Freed.
            September 1993.
[RFC 2026]  The Internet Standards Process -- Revision 3. S. Bradner.
            October 1996.

[RFC 2028]  The Organizations Involved in the IETF Standards Process.
            R. Hovey, S. Bradner. October 1996.

[RFC 2119]  Key words for use in RFCs to Indicate Requirement Levels.
            S. Bradner. March 1997.

[RFC 2234]  Augmented BNF for Syntax Specifications: ABNF. D. Crocker,
            Ed., P. Overell. November 1997.

[RFC 2616]  Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
            J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach,
            T. Berners-Lee. June 1999.

[RFC 2860]  Memorandum of Understanding Concerning the Technical Work
            of the Internet Assigned Numbers Authority. B. Carpenter,
            F. Baker, M. Roberts. June 2000.

[IRI-BIDI]  Internet Identifiers and Bidirectionality. Martin Duerst.
            Work In Progress (draft-duerst-iri-bidi-00.txt)