Internet Draft                                          Deuk-kul Jang
draft-dkjang-idn-00.txt                                 So-myung Ind
June 2, 2000
Expires in six months (3rd of Jan 2001)

Multilingual domain name divided by characters key

Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document describes a method for using multilingual (internationalized) domain names under the current DNS. It also describes how a multilingual domain name expressed in a native language is converted into a traditional US-ASCII domain name compatible with the current DNS.

Contents

1. Introduction
   1.1. Definitions and Conventions
   1.2. Summary
2. Multilingual key
3. Language key
4. Sequence of 36 characters
5. Composition of Character substitute
   5.1. In case the number of characters is below 36
      5.1.1. When the number of characters is 9 or fewer
      5.1.2. When the number of characters is 10 to 26
      5.1.3. When the number of characters is 27 to 35
   5.2. In case the number of characters is 36 or more
      5.2.1. Digits, alphabets and hyphen
      5.2.2. Multilingual character
         5.2.2.1. When the number of characters is 36 to 1,260
         5.2.2.2. When the number of characters is 1,261 to 1,296
         5.2.2.3. When the number of characters is 1,297 to 45,360
         5.2.2.4. When the number of characters is 45,361 to 47,952
   5.3. Characters in planes other than the BMP
6. TLD (Top level domain)
7. Conversion and display
   7.1. Converting multilingual domain names into the traditional ones
   7.2. Display of multilingual domain name
8. Different language
9. References
10. Author's address

1. Introduction

Under the current DNS (domain name system), both the IP address (which is a combination of numbers) and the domain name are used. The purpose of the domain name is to offer a more familiar and memorable name than the IP address. However, because domain names are restricted to US-ASCII characters, people who do not speak English still have to use unfamiliar English domain names. For them, such a name may not be much different from an IP address. As a result, it is difficult to find the home pages of even famous companies without knowing their English domain names in advance.

The top level domains are designated in English for international recognition. For the second level domain, English letters are also used, in the form of abbreviated English words that almost seem to be secret codes (for example, ac, co, go, or, re, nm, etc.). For example, we have to write the 2LD as `.Seoul.kr' instead of writing Seoul in Korean.

Furthermore, multilingual domain names are indispensable if computers are to be controlled and the Internet used by voice commands in the future.

1.1. Definitions and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

A multilingual domain name means a domain name expressed in the native language in the user's interface.

A traditional domain name means a domain name compatible with the current DNS.

A converted domain name is a traditional domain name that has been converted from a multilingual one in order to be compatible with the current DNS.

A character substitute is a string of ASCII characters that replaces a native character in a multilingual domain name when the name is converted into a traditional one.

A multilingual key is a group of characters located at a specific position of the converted domain name; it indicates that the domain name was converted from a multilingual one.

A language key represents the language from which the converted domain name has been converted.

1.2. Summary

According to [RFC1034], domain names in the current DNS must start with a letter (a through z), end with a letter or digit (0 through 9), and have only letters, digits, and hyphen as interior characters. Therefore, native characters in a multilingual domain name must be converted into US-ASCII characters before the name can be registered or accessed.

Hence, 'character substitutes' are defined to express the native characters of a multilingual domain name with US-ASCII characters. To distinguish converted domain names from traditional ones, a 'multilingual key' is defined and added to every converted domain name. To represent the language from which the domain name has been converted, a 'language key' is assigned to every language.

In this system, when a multilingual domain name is converted to a traditional domain name:

1) The multilingual key will be included automatically in all converted domain names.
2) The language key will be included automatically in the converted domain name.
3) All characters will be replaced by 'character substitutes'.
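
The three steps above can be sketched in Python. This is only an illustration under stated assumptions, not a definitive implementation: the multilingual key `z-` and the language key `kr` are the values assumed in the example of section 7.1.1, the Korean arrangement (Hangul syllables 0xac00-0xd7a3 placed from `100` onward, digits and alphabets at 00-0z, hyphen kept as it is) follows sections 5.2.1 and 5.2.2.3, and the function names are hypothetical. Translation of the top level domain (section 6) is left out.

```python
import string

# Assumed values, borrowed from the example in section 7.1.1.
MULTILINGUAL_KEY = "z-"
LANGUAGE_KEY = "kr"

# The 'sequence of 36 characters': digits 0-9 followed by letters a-z.
SEQ36 = string.digits + string.ascii_lowercase

# Range of the 11,172 Hangul syllables in ISO 10646/Unicode.
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def to_base36(value: int, width: int) -> str:
    """Write value in the 36-character sequence, zero-padded to width."""
    chars = []
    for _ in range(width):
        value, rem = divmod(value, 36)
        chars.append(SEQ36[rem])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(chars))


def substitute(ch: str) -> str:
    """Return the character substitute for one character of a Korean label."""
    if ch == "-":
        return "-"            # hyphen is used as it is (5.2.1)
    if ch in SEQ36:
        return "0" + ch       # digits and alphabets map to 00-0z (5.2.1)
    cp = ord(ch)
    if HANGUL_FIRST <= cp <= HANGUL_LAST:
        # The first syllable is assigned to "100", i.e. 36*36 (5.2.2.3).
        return to_base36(36 * 36 + (cp - HANGUL_FIRST), 3)
    raise ValueError(f"no substitute defined for {ch!r}")


def convert_label(label: str) -> str:
    """Steps 1-3: prepend both keys, then substitute every character."""
    body = "".join(substitute(ch) for ch in label.lower())
    return MULTILINGUAL_KEY + LANGUAGE_KEY + body
```

For instance, `convert_label("\ub300\ud55cks-23")` (the syllables 0xb300 and 0xd55c followed by "ks-23") returns `z-kr2ds9640k0s-0203`.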

The conversion from a multilingual name to a traditional one, and the reverse conversion from a converted name back to the original multilingual one, is performed by a conversion program installed on the user's computer. The conversion program runs when the user enters a domain name that includes characters of his/her own language and begins to use an Internet service. For display on the screen, the program is triggered by the multilingual key included in the domain name. Therefore, users can use all Internet services in their own language at their convenience.

2. Multilingual key

A 'multilingual key' is made by placing specific ASCII character(s) at a specific location in the domain name. When a domain name includes this multilingual key at the specific location, the key indicates that the domain name was converted from a multilingual one.

When a user enters a multilingual domain name that includes his/her native language in order to access it, the system first adds the multilingual key together with a language key and then converts the name into a traditional domain name.

For display, the system checks whether the domain name contains the multilingual key. If so, the system converts that domain name into a multilingual one according to the language key.

Note. To avoid confusion, the multilingual key should consist of character(s) that are not commonly used. Further, it is recommended that a domain name containing the same character(s) in the same location not be registered as a traditional domain name.

3. Language key

One or two US-ASCII characters are assigned as a 'language key' for every language currently defined by ISO 10646/Unicode.
This language key follows the 'multilingual key' in the converted domain name and represents the language from which the domain name has been converted. By separating domain names according to language, multilingual characters can be expressed with the minimum number of ASCII characters.

If the language key is not that of the user's native language but of another language, the system shall not convert the domain name; it is displayed on the monitor as ASCII characters.

If two ASCII characters are given to every language as its language key (like kr, jp, cn), 1,296 languages (=36x36) can theoretically be managed.

4. Sequence of 36 characters

In order to substitute all characters used in multilingual names with the smallest number of US-ASCII characters when a multilingual domain name is converted into a traditional name, the 'sequence of 36 characters' is made from the 36 letters (a-z) and digits (0-9). These 36 characters may be used in the middle or at the end of a domain name.

5. Composition of Character substitute

Character substitutes are composed as follows:

a. Separate the characters currently defined (and being added) by ISO 10646/Unicode according to native language.
b. For each language, arrange all of its native characters, as well as alphabets and digits, in the 'sequence of 36 characters'.
c. Set the position in the arranged 'sequence of 36 characters' as the 'character substitute' for each character.

Depending on the total number of characters of one language, assign as follows:

5.1. In case the number of characters is below 36

5.1.1. When the number of characters is 9 or fewer

5.1.1.1. Assign multilingual characters to 1-9 and digits (0-9) to 00-09. The alphabets (a-z) are used as they are, without any character substitutes.

5.1.1.2. As an alternative to 5.1.1.1, this arrangement may be used: assign multilingual characters to 01-09 of the 'sequence of 36 characters' and the digit 0 (zero) to 00. Digits (1-9) and the alphabets (a-z) are used as they are, without any character substitutes.

5.1.2. When the number of characters is 10 to 26

5.1.2.1. Assign multilingual characters to a-z and the alphabets a-z to 0a-0z. Digits (0-9) are used as they are, without any character substitutes.

5.1.2.2. As an alternative to 5.1.2.1, this arrangement may be used: assign multilingual characters to 0a-0z. Digits (0-9) and the alphabets (a-z) are used as they are, without any character substitutes.

5.1.3. When the number of characters is 27 to 35

5.1.3.1. Assign multilingual characters to 1-z in order, and digits (0-9) and alphabets (a-z) to 00-0z.

5.1.3.2. As an alternative to 5.1.3.1, this arrangement may be used: assign multilingual characters to 01-0z and the digit 0 to 00. Digits 1-9 and alphabets (a-z) are used as they are, without any substitution.

5.2. In case the number of characters is 36 or more

5.2.1. Digits, alphabets and hyphen

Assign digits (0-9) and alphabets (a-z) to 00-0z of the 'sequence of 36 characters'. A hyphen is used as it is, without substitution (however, in the cases of 5.2.2.2 and 5.2.2.4 below, assign it to `0-').

5.2.2. Multilingual character

Depending on the number of characters, multilingual characters are assigned as follows:

5.2.2.1. When the number of characters is 36 to 1,260

Assign multilingual characters to 10-zz (35x36=1,260) of the 'sequence of 36 characters' in order.

5.2.2.2. When the number of characters is 1,261 to 1,296

By attaching, with a hyphen, the characters `-0' to `-z' to the end of the sequence `zz', the representation range of two ASCII characters is extended to 1,296 multilingual characters.

5.2.2.3. When the number of characters is 1,297 to 45,360

Assign multilingual characters to the three-character positions 100-zzz (35x36x36=45,360) of the 'sequence of 36 characters' in order.

5.2.2.4. When the number of characters is 45,361 to 47,952

As stated in 5.2.2.2 above, three ASCII characters using a hyphen can extend the representation range to 47,952 (36x37x36) multilingual characters.

5.3. Characters in planes other than the BMP

Like characters in the BMP, characters in the other planes (in canonical form) are divided according to language and arranged in the 'sequence of 36 characters' independently. Therefore, it makes no difference which plane those characters are located in.

Table. Number of US-ASCII characters needed to represent each character in a multilingual domain name

-----------------------------------------------------------------------
Kind of                     Number of characters
character      0-9      10-26     27-35    36-1296   1297-47952
-----------------------------------------------------------------------
Native          1         1         1         2          3
Alphabets       1         2         2         2          2
Digit           2         1         2         2          2
-----------------------------------------------------------------------

6. TLD (Top level domain)

The top level domains are limited in number (.com, .org, .net, etc.). Thus, instead of being replaced by character substitutes, they are directly translated.

7. Conversion and display

7.1. Converting multilingual domain names into the traditional ones

When a user enters a multilingual domain name into an application in order to use an Internet service, the conversion program is triggered by the multilingual character(s) included in the domain name.
Then, the program converts the multilingual domain name into a traditional one by including the 'multilingual key' and the 'language key' mentioned above and by replacing each character with its 'character substitute', and hands the converted domain name over to the application handling the Internet service.

7.1.1. Example

Suppose, for example, that a multilingual domain name composed of Korean characters, alphabets and digits, "0xb300/0xd55c/0xb9c7/0xad6d/ks-23.0xd68c/0xc0ab" or "0xb300/0xd55c/0xb9c7/0xad6d/ks-23.com", is entered. The following result can then be given as a model. "0xb300/0xd55c/0xb9c7/0xad6d" are 4 Korean characters meaning Korea, and "0xd68c/0xc0ab" means company.

(Assumption):

The multilingual key is z-, and the language key for Korean is kr.

digits 0-9 ===> 00-09 of the 'sequence of 36 characters'
alphabets a-z ===> 0a-0z of the 'sequence of 36 characters'

The Korean characters (11,172 characters) correspond to 5.2.2.3 above. Therefore, the 11,172 Korean characters at positions 0xac00-0xd7a3 (hexadecimal) in ISO 10646/Unicode are arranged in 100-9mb of the 'sequence of 36 characters' in order.

0xb300 (the 1,793rd from 0xac00) ===> corresponds to "2ds" of the sequence.
0xd55c (the 10,589th from 0xac00) ===> corresponds to "964" of the sequence.
0xb9c7 (the 3,528th from 0xac00) ===> corresponds to "3pz" of the sequence.
0xad6d (the 366th from 0xac00) ===> corresponds to "1a5" of the sequence.

(Conversion)

In the user interface, "0xb300/0xd55c/0xb9c7/0xad6d/ks-23.0xd68c/0xc0ab" is entered, and it is converted to a traditional name as follows.

multilingual key: z-
language key: kr

character substitutes for Korean
   0xb300: 2ds
   0xd55c: 964
   0xb9c7: 3pz
   0xad6d: 1a5

character substitutes for alphabets and hyphen
   k: 0k
   s: 0s
   hyphen -: -

character substitutes for digits
   2: 02
   3: 03

direct translation
   .0xd68c/0xc0ab: .com

(Final result):

converted domain: z-kr2ds9643pz1a50k0s-0203.com

7.2. Display of multilingual domain name

When a domain name includes the multilingual key, and the language key in that domain name conforms to the language of the text editor on the user's computer, the program performs the reverse of 7.1: it deletes the multilingual key and the language key and replaces the rest of the ASCII characters with native characters. The multilingual domain name is then displayed on the monitor in the native language.

If the domain name does not contain a multilingual key, or the language key does not conform to the language of the text editor on the user's computer, the domain name is displayed on the monitor as it is, without any conversion.

8. Different language

When a user accesses a multilingual domain name of a different language zone (e.g., a Japanese user tries to access a Korean domain) and does not have a text editor for that language, he/she types and accesses the domain name as it is, that is, in its converted US-ASCII form.

9. References

[RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities", RFC 1034, November 1987.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

10. Author's address

Deuk-kul Jang
So-myung Ind.
Postal address: Kyunggido namyangjushi jingunmyun songnungri 178-6, Republic of Korea
Telephone number: 502-3030-308
Fax number: 346-573-6849
E-mail: dkjang@smind.co.kr