Internet Draft                                     Authors: Xiang Deng
                                                                 CNNIC
                                                       September, 2001
Expires in six months

           The Selective Module for the Conversion Between
              Traditional/Simplified Characters in DNS

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Terminology

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
   and "MAY" in this document are to be interpreted as described in
   RFC 2119 [RFC2119].

Abstract

   This document proposes a practical scheme for implementing the
   selective module of the conversion between Traditional and
   Simplified Chinese characters in DNS. It does so by setting out
   assumed conditions and analyzing their consequences.

1. The Origin of the Issue

   The IDN WG was originally established to let Internet users,
   regardless of nationality or race, locate and access Internet
   resources in the languages they are familiar with. Achieving this
   goal is the shared aim of all members of the IDN WG.

   The problem of Traditional/Simplified Chinese Conversion ("TSconv"
   for short) is not simply a matter of using Chinese characters in
   DNS. The essential point is how to truly embody the intention
   behind establishing the IDN WG.
   From the angle of character encoding, the conversion deals with how
   to build a mapping between the different glyphs of a single
   character in a given language environment (e.g. the Chinese
   language environment).

2. The Language Relativity of the Internationalized Domain Name

   Whether the solution is the ACE-Nameprep-IDNA ("ANI" for short)
   solution adopted by the IETF IDN WG, or a solution based on UTF-8 or
   UNICODE, we must properly resolve the problem of converting the
   native-language text that users type into the standard format
   (e.g. ACE, UTF-8, UNICODE, etc.).

   In the ANI solution, the flow chart is supposed to be:

   +-----------------------------------------+
   | User Typing the Code of Native Language |
   +-----------------------------------------+
                        |
                        V
   +----------------------------------------------+
   | the Code of Native Language -> UNICODE (NtoU)|
   +----------------------------------------------+
                        |
                        V
   +-----------------------------------------+
   |               +------------+            |
   |  IDNA         |  Nameprep  |            |
   |               +------------+            |
   |                     |                   |
   |                     V                   |
   |               +------------+            |
   |               |  Resolver  |            |
   |               +------------+            |
   |                     |                   |
   |                     V                   |
   |               +------------+            |
   |               |    DNS     |            |
   |               +------------+            |
   +-----------------------------------------+

   Although each of the world's languages has its own distinguishing
   specificity, this does not change the goal of the IDN WG: to bring
   all the languages of the world onto the Internet. Therefore, the WG
   shall first accomplish "NtoU" (Native Language to UNICODE
   Conversion), in both technical and strategic terms, before carrying
   out IDNA.

   Suppose that NtoU is accomplished by the user's operating system.
   Engineers must first know the coding type of the language that the
   user inputs (i.e. the language coding type); only then can they
   convert that language into UNICODE accurately. Since the language
   coding type does exist, it is possible for us to define a specific
   NtoU in accordance with the characteristics of a given language.
   It is also the point at which IDN technology can fully demonstrate
   its creativity.

   The examples presented below concern the T/S Conversion. They are
   intended to illustrate different technical solutions.

3. The Positioning of Language Specificity in the ANI Solution

   (1). Situation A: Define the language specificity in NtoU, and
        accomplish NtoU by application programs

   Assumptions:

      a. NtoU does not belong to Nameprep; NtoU is accomplished by
         application programs.

      b. If one registers a domain name in the ".xx" domain, the system
         strictly follows the TSconv rule. All the data loaded in the
         DNS server will be Simplified Chinese domain names complying
         with this rule.

      c. If one registers a domain name in the ".yy" domain, the system
         strictly follows the "yyRule". All the data loaded in the DNS
         server will be domain names complying with yyRule.

      d. The domain ".zz" does not comply with any conversion rule.

      e. Application 1 (App.1) complies with the TSconv rule in NtoU;
         App.2 complies with yyRule in NtoU; App.3 does not comply with
         any rule.

   Conclusions:

      a. Users of App.1 can access domain names in the ".xx", ".yy" and
         ".zz" domains accurately by typing Simplified Chinese
         characters ("SC" for short). They can access the ".xx" domain
         accurately by typing Traditional Chinese characters ("TC" for
         short). But TC domain names in the ".yy" and ".zz" domains can
         never be accessed by users of App.1.

      b. Users of App.2 can access domain names in the ".xx", ".yy" and
         ".zz" domains accurately by typing characters that comply with
         yyRule. They can access the ".yy" domain accurately by typing
         characters that do not comply with yyRule. But domain names in
         the ".xx" and ".zz" domains that do not comply with yyRule can
         never be accessed by users of App.2.

      c. Users of App.3 can access domain names in the ".xx", ".yy" and
         ".zz" domains by typing characters that comply with both the
         TSconv rule and yyRule.
         Users cannot access ".xx" domain names by typing characters
         that do not comply with the TSconv rule. Users cannot access
         ".yy" domain names by typing characters that do not comply
         with yyRule. Users can access domain names in the ".zz" domain
         by typing characters that do not comply with either TSconv or
         yyRule.

   Summary:

      Adopting this method, which separates NtoU from Nameprep, will
      cause applications to behave differently and create confusion
      among users.

   (2). Situation B: Define the language specificity in Nameprep

   Assumptions:

      a. NtoU belongs to Nameprep; NtoU is accomplished in Nameprep.

      b. If one registers a domain name in the ".xx" domain, the system
         strictly follows TSconv. The data loaded in the DNS server
         will be only the Simplified Chinese domain names that comply
         with TSconv.

      c. If one registers a domain name in the ".yy" domain, the system
         strictly follows yyRule. The data loaded in the DNS server
         will be only the domain names that comply with yyRule.

      d. The domain ".zz" does not comply with any rule.

   Conclusions:

      a. Users can access domain names in the ".xx", ".yy" and ".zz"
         domains accurately by typing SC that complies with yyRule.
         Users can access the ".xx" domain accurately by typing TC.
         Users can access the ".yy" domain accurately by typing
         characters that do not comply with yyRule. But domain names in
         the ".xx" and ".zz" domains that do not comply with yyRule can
         never be accessed by users, and domain names in the ".yy" and
         ".zz" domains that do not comply with the TSconv rule can
         never be accessed by users.

   Summary:

      a. Coherence across applications is achieved.

      b. Domain names that do not follow the rules cannot be accessed
         by users.

   (3). Situation C: Registration solution using a language encoding
        tag

   Assumptions:

      a. NtoU belongs to Nameprep; NtoU is accomplished in Nameprep.

      d. If one registers a Chinese domain name in the ".xx" or ".zz"
         domain, the system strictly requires the user to register two
         domain names, according to the language encoding tag and the
         two different rules (TSconv and STconv).

   Conclusions:

      a. Two records are created whenever a Chinese domain name is
         registered. In the ".xx" or ".zz" domain, one record complies
         with TSconv and the other complies with STconv.

      b. Users can access domain names in the ".xx" and ".zz" domains
         by typing SC or TC, whether or not the input complies with
         TSconv, and likewise whether or not it complies with STconv.

      c. By analogy, other language rules can be applied.

   Summary:

      a. Coherence across applications is achieved.

      b. Coherence of the access results is achieved.

4. Implementation Scheme

   (1). Registration Scheme

   +--------+      +--------+
   |  .xx   |      |  .zz   |
   +--------+      +--------+
     ^    ^          ^    ^
     |     \        /     |
     |     /        \     |
   +------------+-----------+
   |   TSconv   |  STconv   |
   +------------+-----------+
   | Register simultaneously|
   +------------------------+
          ^           ^
           \         /
   +------------+-----------+
   |   User (SC)|(TC)       |
   +------------+-----------+

   (2). Resolution Process

   +-------+    +-------+    +-------+
   |  .xx  |    |  .zz  |    |  .yy  |
   +-------+    +-------+    +-------+
       ^            ^            ^
       |            |            |
       +------------+------------+
       ^            ^            ^
       |            |            |
   +--------+   +--------+   +--------+
   | TSconv |   | STconv |   | yyRule |
   +--------+   +--------+   +--------------------+
   |user(SC)|   |user(TC)|   |user(other language)|
   +--------+   +--------+   +--------------------+

5. Authors' Address

   Xiang Deng
   China Internet Network Information Center
   NO.4 South 4th ST.
   Beijing, P.R.China, 100080, PO BOX 349
   Tel: +86-10-62619750

6. Acknowledgement

   Yang Yu
   YunGang Chen
   Yanfeng WANG
   XiaoDong Li
   GuoNian Sun

7. References

   [IDNREQ]    Zita Wenzel, James Seng, "Requirements of
               Internationalized Domain Names",
               draft-ietf-idn-requirements.

   [NAMEPREP]  Paul Hoffman, Marc Blanchet, "Preparation of
               Internationalized Host Names", draft-ietf-idn-nameprep.

   [RFC2119]   Scott Bradner, "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, March 1997.

   [STD13]     Paul Mockapetris, "Domain names - implementation and
               specification", STD 13 (RFC 1034 and 1035),
               November 1987.

   [UNAME]     Li Ming TSENG, Jan Ming HO, Hua Lin QIAN, Kenny HUANG,
               "Internationalized Domain Names and Unique
               Identifiers/Names", draft-ietf-idn-uname.

   [TSCONV]    Xiao Dong Lee, Nai Wen Hsu, Erin Chen, Guo Nian Sun,
               "Traditional and Simplified Chinese Conversion",
               draft-ietf-idn-tsconv.

   [ISO10646]  ISO/IEC 10646-1:2000. International Standard --
               Information technology -- Universal Multiple-Octet Coded
               Character Set (UCS) -- Part 1: Architecture and Basic
               Multilingual Plane.

   [Unicode3]  The Unicode Consortium, "The Unicode Standard --
               Version 3.0", ISBN 0-201-61633-5.
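
   As an informal illustration of the Situation C registration scheme,
   the following Python sketch registers one record per conversion rule
   and then looks names up by exact match. It is a sketch under stated
   assumptions, not the author's implementation: the three-entry
   character table, the function names tsconv, stconv, register and
   resolve, the dictionary used as a zone, and the address 192.0.2.1
   are all hypothetical. The table also assumes a one-to-one mapping,
   which real Traditional/Simplified tables do not guarantee.

```python
# Hypothetical three-entry Traditional -> Simplified table. A real TSconv
# table is far larger and is not one-to-one; this is for illustration only.
T2S = {"國": "国", "網": "网", "絡": "络"}
S2T = {simp: trad for trad, simp in T2S.items()}


def tsconv(label: str) -> str:
    """Map each Traditional character to its Simplified form (assumed rule)."""
    return "".join(T2S.get(ch, ch) for ch in label)


def stconv(label: str) -> str:
    """Map each Simplified character to its Traditional form (assumed rule)."""
    return "".join(S2T.get(ch, ch) for ch in label)


def register(zone: dict, label: str, tld: str, target: str) -> None:
    """Situation C: create one record per rule (TSconv and STconv)."""
    for form in (tsconv(label), stconv(label)):
        zone[f"{form}.{tld}"] = target


def resolve(zone: dict, name: str):
    """Exact-match lookup; returns None when no record exists."""
    return zone.get(name)


# The zone is modelled as a plain dict; 192.0.2.1 is a documentation address.
zone = {}
register(zone, "中國網", "xx", "192.0.2.1")

for name in ("中国网.xx", "中國網.xx"):
    print(name, resolve(zone, name))
```

   With both forms stored, a user typing all-SC or all-TC input reaches
   the same record without the client needing to know which rule the
   zone follows.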