Network Working Group K. Davies
Internet-Draft ICANN
Intended status: Informational February 29, 2012
Expires: September 1, 2012
Representing registration policy for IDNs using XML
draft-davies-idntables-00
Abstract
This memo describes a method of representing the registration policy
that a zone administrator uses for registering Internationalised
Domain Names using Extensible Markup Language (XML). These registry
policies, commonly known as "IDN tables", are used to enforce and
share policy on which specific code-points are permitted for
registrations, and which alternative code-points are considered
variants.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 1, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
Davies Expires September 1, 2012 [Page 1]
Internet-Draft IDN Table XML representation February 2012
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Design Goals . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. IDN Table XML Format . . . . . . . . . . . . . . . . . . . . . 6
3.1. Basic structure . . . . . . . . . . . . . . . . . . . . . 6
3.2. The meta element . . . . . . . . . . . . . . . . . . . . . 6
3.2.1. The version element . . . . . . . . . . . . . . . . . 6
3.2.2. The date element . . . . . . . . . . . . . . . . . . . 7
3.2.3. The language element . . . . . . . . . . . . . . . . . 7
3.2.4. The domain element . . . . . . . . . . . . . . . . . . 7
3.2.5. The description element . . . . . . . . . . . . . . . 8
3.2.6. The variant-classes element . . . . . . . . . . . . . 8
3.3. The data element . . . . . . . . . . . . . . . . . . . . . 8
3.3.1. Variants . . . . . . . . . . . . . . . . . . . . . . . 9
3.4. Example table . . . . . . . . . . . . . . . . . . . . . . 10
4. Processing a label against a table . . . . . . . . . . . . . . 12
4.1. Determining eligibility for a label . . . . . . . . . . . 12
4.2. Determining variants for a label . . . . . . . . . . . . . 12
5. Conversion between other formats . . . . . . . . . . . . . . . 13
5.1. RFC 3743 Language Variant Table . . . . . . . . . . . . . 13
5.2. RFC 4290 Model Table Format . . . . . . . . . . . . . . . 13
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
7. Security Considerations . . . . . . . . . . . . . . . . . . . 15
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Appendix A. RelaxNG Schema . . . . . . . . . . . . . . . . . . . 17
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 20
Appendix C. Editorial Notes . . . . . . . . . . . . . . . . . . . 21
C.1. Known Issues and Future Work . . . . . . . . . . . . . . . 21
C.2. Sample tables and running code . . . . . . . . . . . . . . 21
C.3. Change History . . . . . . . . . . . . . . . . . . . . . . 21
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 22
Davies Expires September 1, 2012 [Page 2]
Internet-Draft IDN Table XML representation February 2012
1. Introduction
This memo describes how to use Extensible Markup Language (XML) to
describe the list of permissible code points and variants used in a
zone administrator's policies.
Historically, zone administrators - such as top-level domain
registries - have published their policies using text and HTML based
formats loosely based around the format used to describe a Language
Variant Table in [RFC3743]. [RFC4290] further posts a "Model table
format" that describes a similar set of functionality.
Through the first decade of IDN deployment, experience has shown that
these table formats are difficult to consistently implement and
compare due to their different formats. A more universal format,
such as one using a structured XML format, will assist by improving
machine-readability, consistency, and maintainability of IDN tables.
It will also provide for more complex conditional implementation of
variants that reflects the known requirements of current zone
administrator policies.
Davies Expires September 1, 2012 [Page 3]
Internet-Draft IDN Table XML representation February 2012
2. Design Goals
The following items are explicit design goals of this format:
o MUST be in a format that can be implemented in a reasonably
straightforward manner in software;
o The format SHOULD be able to be checked for formatting errors,
such that common mistakes can be caught;
o An IDN Table MUST be able to express the set of valid code points
that are allowed for registration under a specific zone
administrator's policies;
o MUST be able to express computed alternatives to a given domain
name based on a one-to-one, or one-to-many relationship. These
computed alternatives are commonly known as "IDN variants";
o IDN Variants SHOULD be able to be tagged with specific categories,
such that the categories can be used to support registry policy
(such as whether to list the computed variant in the zone, or to
merely block it from registration);
o IDN Variants MUST be able to stipulated based on contextual
information. For example, specific variants may only be
applicable when they follow another specific code point, or when
the code point is displayed in a specific presentation form;
o The data contained within the table MUST be unambiguous, such that
independent implementations that utilise the contents will arrive
at the same results;
o IDN Tables SHOULD be suitable for comparison and re-use, such that
one could easily compare the contents of two or more to see the
differences, to merge them, and so on.
o As many existing IDN Tables are practicable SHOULD be able to be
migrated to the new format with all applicable logic retained.
It is explicitly NOT the goal of this format to:
o Stipulate what code points should be listed in an IDN Table by a
zone administrator. What registration policies are used for a
particular zone is outside the scope of this memo.
o Stipulate what a consumer of an IDN Table must do when they
determine a particular domain is valid or invalid; or arrive at a
set of computed IDN variants. IDN Tables are only used to
Davies Expires September 1, 2012 [Page 4]
Internet-Draft IDN Table XML representation February 2012
describe rules for computing code points, but does not prescribe
how registries and other parties utilise them.
Davies Expires September 1, 2012 [Page 5]
Internet-Draft IDN Table XML representation February 2012
3. IDN Table XML Format
3.1. Basic structure
The basic XML framework of the document is as follows:
...
Within the "idntable" element rests two sub-elements. Firstly is a
"meta" element that contains all meta-data associated with the IDN
table, such as its authorship, what it is used for, implementation
notes and references. This is followed by a "data" element that
contains the substantive code-point data.
...
...
A document should contain exactly one "idntable" element, and within
that optionally one "meta" element and exactly one "data" element.
3.2. The meta element
The "meta" element is used to express meta-data associated within the
IDN table. It can be used to explain the author or relevant contact
person, explain what the usage of the IDN table is, provide
implementation notes as well as references. The data contained
within is not required by software consuming the IDN table in order
to calculate valid IDN labels, or to calculate variants.
3.2.1. The version element
The "version" element is used to uniquely identify each version of
the table being represented. No specific format is required, but it
is RECOMMENDED that it be a numerical positive integer, which is
incremented with each revision of the file.
An example of a typical first edition of a document:
Davies Expires September 1, 2012 [Page 6]
Internet-Draft IDN Table XML representation February 2012
1
A common alternative is to use a major-minor number scheme, where two
decimal numbers are used to represent major and minor changes to the
table. For example, "1.0" would be the first major release, "1.1"
would be a minor update to that, and "2.0" would represent a major
revision.
3.2.2. The date element
The "date" element is used to identify the date the table was
written. The contents of this element MUST be a valid ISO 8601 date
string as described in [RFC3339].
Example of a date:
2009-11-01
3.2.3. The language element
The "language" element signals that the table is associated with a
specific language or script. The value of the language element must
be a valid language tag as described in [RFC5646]. The tag may
simply refer to a script if the table is not referring to a specific
language. There may be multiple language elements for a table if the
table spans multiple languages and/or scripts.
Example of an English language table:
en
If the table applies to a specific script, rather than a language,
the "und" language tag should be used followed by the relevant
[RFC5646] script subtag. For example, for a Cyrillic script table:
und-Cyrl
3.2.4. The domain element
This optional element refers to a domain to which this policy is
applied.
example
There may be multiple tags used to reflect a list of
domains.
Davies Expires September 1, 2012 [Page 7]
Internet-Draft IDN Table XML representation February 2012
3.2.5. The description element
The "description" element is a free-form element that contains any
additional relevant description. Typically, this field contains
authorship information, as well as additional context on how the
table was formulated (such as with references), and how it has been
applied.
The element has an optional "type" attribute, which refers to the
MIME-type of the enclosed data. If the description lacks a type
field, it will be assumed to be plain text.
The description elements describe information relating to the IDN
table that is useful for the user of the table in its interpretation.
This may explain the history, the rationale, reference sources etc.
It may also contain authorship information.
The "type" attribute may be used to specify the encoding within
description element. The attribute should be a valid MIME type. If
supplied, it will be assumed the contents is a single CDATA element
of that encoding. Typical types would be "text/plain" or "text/
html". "text/plain" will be assumed if no type attribute is
specified.
3.2.6. The variant-classes element
Consumers of the IDN table may classify the generated variants into
different classes using the class attribute, discussed below. This
class attribute allows the registry to apply different policy (for
example, whether the block or register specific generated strings).
The variant-classes block provides human-readable explanations of the
meaning of the classes used in the IDN table.
3.3. The data element
The "data" element contains the code point data the comprises the
registry policy. It describes registry policy using a series of XML
elements that either represent individual code points, or ranges of
code points.
The data may use the "char" and "range" elements to specify code
points, and code ranges.
Discrete permissable code points may be stipulated with a "char"
element, e.g.
Davies Expires September 1, 2012 [Page 8]
Internet-Draft IDN Table XML representation February 2012
Ranges of permissable code points may be stipulated with a "range"
element, e.g.
Codepoints must be expressed in hexadecimal, i.e. according to the
standard Unicode convention without the prefix "U+". The rationale
for not allowing other encoding formats, including native Unicode
encoding in XML, is explored in [UAX42]. The XML conventions used in
this format, including the element and attribute names, mirror this
document where practical and reasonable to do so.
3.3.1. Variants
While most tables typically only determine code point eligibility,
others additionally specify a mapping of code points to other code
points, known as "variants". What constitutes a variant is a matter
of policy, and varies for each implementation.
3.3.1.1. Basic variants
Variants are specified as one of more children of a "char" element.
For example, to map "v" as a variant of "u":
A sequence of multiple code points can be specified as a variant of a
single code point. For example, the sequence of "o" then "e" can be
specified as a variant for an "o with umlaut" (U+00F6) as follows:
It is not possible to specify variants for ranges.
3.3.1.2. Null variants
To specify a null variant, which is a variant string that maps to no
codepoint, use the null codepoint 0000. For example, to mark a
string with a zero width non-joiner to the same string without the
zero width non-joiner:
Davies Expires September 1, 2012 [Page 9]
Internet-Draft IDN Table XML representation February 2012
3.3.1.3. Conditional variants
At its basis, generation of variants are conditional on a specific
code point or set of code points. However, in some instances
registries perform control based on other attributes that can not
solely be determined based on simple code point comparisons. For
example, in some tables utilising the Arabic script, the Arabic
contextual form is a determinant in which variants are used. The
contextual form can not be derived solely from the code point, as the
code point is the same for the various forms.
The IDN table provides for conditioning generation variants on
specific instances as follows, using the "when" attribute.
arabic-initial Based on context, the code point would be presented
in its Arabic Initial form.
arabic-isolated Based on context, the code point would be presented
in its Arabic Isolated form.
arabic-medial Based on context, the code point would be presented in
its Arabic Medial form.
arabic-final Based on context, the code point would be presented in
its Arabic Final form.
For example, to mark U+0673 as a variant of U+0625, but only when it
appears in isolated or final forms:
3.4. Example table
A sample complete XML IDN table is as follows.
Davies Expires September 1, 2012 [Page 10]
Internet-Draft IDN Table XML representation February 2012
1
2010-01-01
sv
example
This language table was developed with the
Swedish
examples institute.
Davies Expires September 1, 2012 [Page 11]
Internet-Draft IDN Table XML representation February 2012
4. Processing a label against a table
4.1. Determining eligibility for a label
In order to use a table to test a specific IDN table for membership
in the table, a consumer of an IDN table must iterate through each
code point within a given U-label, and test that each code point is a
member of the IDN table. If any code point is not a member of the
IDN Table, it shall be deemed as not eligible in accordance with the
table.
A code point is deemed a member of the table when it is listed with
the element, and all necessary condition listed in "when"
attributes are correctly satisfied.
4.2. Determining variants for a label
For a given eligible label, the set of variants is deemed to be each
possible permutation of elements, whereby all "when" attributes
are correctly satisfied for each code point in the given permutation.
Davies Expires September 1, 2012 [Page 12]
Internet-Draft IDN Table XML representation February 2012
5. Conversion between other formats
5.1. RFC 3743 Language Variant Table
All attributes can be retained in conversion from an [RFC3743]
language variant table to this XML format.
This XML format can be converted to the format described in
[RFC3743], with the following caveats:
o Version numbers not expressed as integers will not satisfy the
ABNF formatting for [RFC3743].
o Much of the additional meta data can not be expressed in the text
format (although can be supplied as comments in the text file).
o The [RFC3743] format only allows for two variant classes, those
that are preferred and those that are regular. Other distinctions
will be lost.
o No ability to retain conditional variants.
5.2. RFC 4290 Model Table Format
All attributes can be retained in conversion from the [RFC4290] model
table format to this XML format.
Tables similarly can be converted to the format described in
[RFC4290] with the same caveats as the [RFC3743] format, and
additionally the inability to classify variants into groups such as
"preferred".
Davies Expires September 1, 2012 [Page 13]
Internet-Draft IDN Table XML representation February 2012
6. IANA Considerations
This document does not specify any IANA actions.
Davies Expires September 1, 2012 [Page 14]
Internet-Draft IDN Table XML representation February 2012
7. Security Considerations
There are no security considerations for this memo.
Davies Expires September 1, 2012 [Page 15]
Internet-Draft IDN Table XML representation February 2012
8. References
[RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the
Internet: Timestamps", RFC 3339, July 2002.
[RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
Engineering Team (JET) Guidelines for Internationalized
Domain Names (IDN) Registration and Administration for
Chinese, Japanese, and Korean", RFC 3743, April 2004.
[RFC4290] Klensin, J., "Suggested Practices for Registration of
Internationalized Domain Names (IDN)", RFC 4290,
December 2005.
[RFC5564] El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
"Linguistic Guidelines for the Use of the Arabic Language
in Internet Domains", RFC 5564, February 2010.
[RFC5646] Phillips, A. and M. Davis, "Tags for Identifying
Languages", BCP 47, RFC 5646, September 2009.
[UAX42] Unicode Consortium, "Unicode Character Database in XML".
Davies Expires September 1, 2012 [Page 16]
Internet-Draft IDN Table XML representation February 2012
Appendix A. RelaxNG Schema
Davies Expires September 1, 2012 [Page 17]
Internet-Draft IDN Table XML representation February 2012
Davies Expires September 1, 2012 [Page 18]
Internet-Draft IDN Table XML representation February 2012
Davies Expires September 1, 2012 [Page 19]
Internet-Draft IDN Table XML representation February 2012
Appendix B. Acknowledgements
This format builds upon the work on documenting IDN tables by a
number of other parties, most significantly that of the the Joint
Engineering Team published as [RFC3743], and [RFC5564] published by
the Arabic-language community.
Contributions that have helped shape this document have been
contributed by Francisco Arias, Nicholas Ostler, Steve Sheng and
Andrew Sullivan.
Davies Expires September 1, 2012 [Page 20]
Internet-Draft IDN Table XML representation February 2012
Appendix C. Editorial Notes
This appendix to be removed prior to final publication.
C.1. Known Issues and Future Work
o The text does not currently provide a mechanism for deriving
variants based on a sequence of two or more code points. Such a
mechanism would be required to perform an inverse mapping from one
provided in this document, namely, mapping the sequence of "o" and
"e" to an "o with umlaut"
o A mechanism for a specific code point only being eligible when
preceded or followed by a specific sequence of characters is not
provided. Such a mechanism is needed to support the contextual
rule required by the .CAT top-level domain, which supports the
middle dot (U+00B7) only when both preceded and followed by the
letter "l" (U+006C).
o An optional mechanism for explicitly nominating the registry
action associated with a computed variant could be added. For
example, an "action" attribute to the element could specify
one of the following: "allocate", "block", "delegate", "mirror" or
"withhold". Each of these actions would need to be formally
defined.
o The tables may benefit from a unique identifier, such as an "id"
attribute on the element.
o A more formal step-wise description of how variants are computed
needs to be supplied.
C.2. Sample tables and running code
Some sample tables using this format, as well as a basic
implementation of this specification, is posted at
https://github.com/kjd/idntables
C.3. Change History
-00 Initial draft.
Davies Expires September 1, 2012 [Page 21]
Internet-Draft IDN Table XML representation February 2012
Author's Address
Kim Davies
Internet Corporation for Assigned Names and Numbers
4676 Admiralty Way
Suite 330
Marina del Rey, CA 90292
US
Phone: +1 310 823 9358
Email: kim.davies@icann.org
URI: http://www.iana.org/
Davies Expires September 1, 2012 [Page 22]