Network Working Group K. Davies
Internet-Draft ICANN
Intended status: Informational A. Freytag
Expires: January 10, 2014 ASMUS Inc.
July 9, 2013
Representing Label Generation Rulesets using XML
draft-davies-idntables-03
Abstract
This memo describes a method of representing the domain name
registration policy for a zone administrator using Extensible Markup
Language (XML). These policies, known as "Label Generation Rulesets"
(LGRs), are particularly used for the implementation of
Internationalised Domain Names (IDNs). The rulesets are used to
implement and share policy on which specific Unicode codepoints are
permitted for registrations, which alternative codepoints are
considered variants, and what actions may be performed on those
variants.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 10, 2014.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Davies & Freytag Expires January 10, 2014 [Page 1]
Internet-Draft Label Generation Rulesets in XML July 2013
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Design Goals
3. Requirements
4. LGR Format
4.1. Namespace
4.2. Basic structure
4.3. Metadata
4.3.1. The version element
4.3.2. The date element
4.3.3. The language element
4.3.4. The domain element
4.3.5. The description element
4.3.6. The validity-start and validity-end elements
4.3.7. The unicode-version element
4.4. Codepoint Rules
4.4.1. Sequences
4.4.2. Variants
4.4.3. Result tagging
4.5. Whole Label Evaluation Rules
4.5.1. Basic concepts
4.5.2. Character Classes
4.5.3. Context rules
4.5.4. Action elements
4.6. Example table
5. Processing a label against an LGR
5.1. Determining eligibility for a label
5.2. Determining variants for a label
6. Conversion between other formats
7. IANA Considerations
8. Security Considerations
9. References
Appendix A. RelaxNG Schema
Appendix B. Acknowledgements
Appendix C. Editorial Notes
C.1. Known Issues and Future Work
C.2. Sample tables and running code
C.3. Change History
Authors' Addresses
1. Introduction
This memo describes a method of using Extensible Markup Language
(XML) to describe the algorithm used to determine whether a given
domain label is permitted, and under which circumstances. These
algorithms comprise a list of permissible codepoints, variants, and a
number of conditions under which certain relationships are applied.
These algorithms form part of a zone administrator's policies, and
can be referred to as Label Generation Rulesets (LGRs), or IDN
tables.
Administrators of the zones for top-level domain registries have
historically published their LGRs using ASCII text or HTML. The
formatting of these documents has been loosely based on the format
used for the Language Variant Table in [RFC3743]. [RFC4290] also
provides a "model table format" that describes a similar set of
functionality.
Through the first decade of IDN deployment, experience has shown that
LGRs derived from these formats are difficult to consistently
implement and compare due to their different formats. A universal
format, such as one using a structured XML format, will assist by
improving machine-readability, consistency, reusability and
maintainability of LGRs. It also provides for more complex
conditional implementation of variants that reflects the known
requirements of current zone administrator policies.
While the predominant usage of this specification is to represent IDN
label policy, the format may also be used for describing ASCII domain
name label rulesets.
2. Design Goals
The following items are explicit design goals of this format:
o The format MUST be able to be implemented in a reasonably
straightforward manner in software;
o The format SHOULD be able to be checked for formatting errors,
such that common mistakes can be caught;
o An LGR MUST be able to express the set of valid codepoints that
are allowed for registration under a specific zone administrator's
policies;
o An LGR MUST be able to express computed alternatives to a given domain
name based on a one-to-one, or one-to-many relationship. These
computed alternatives are commonly known as "variants";
o Variants SHOULD be able to be tagged with specific categories,
such that the categories can be used to support registry policy
(such as whether to list the computed variant in the zone, or to
merely block it from registration);
o Variants MUST be able to be stipulated based on contextual
information. For example, specific variants may only be
applicable when they follow another specific codepoint, or when
the codepoint is displayed in a specific presentation form;
o The data contained within an LGR MUST be unambiguous, such that
independent implementations that utilise the contents will arrive
at the same results;
o LGRs SHOULD be suitable for comparison and re-use, such that one
could easily compare the contents of two or more to see the
differences, to merge them, and so on;
o As many existing IDN tables as practicable SHOULD be able to be
migrated to the LGR format with all applicable logic retained.
It is explicitly NOT the goal of this format to stipulate what
codepoints should be listed in an LGR by a zone administrator. Which
registration policies are used for a particular zone is outside the
scope of this memo.
3. Requirements
To be able to fulfil the known utilisation of LGRs, the existing
corpus of published IDN tables was reviewed to prepare this
specification.
In addition, the requirements of ICANN's work to implement an LGR for
the DNS Root Zone [LGR-PROCEDURE] were also considered. In Section B
of that document, five specific requirements for an LGR methodology
were identified:
o The ability to identify a set of codepoints that are permitted.
o The ability to represent a list of variants, if any, for each
codepoint.
o A method of identifying codepoints that are related, using a tag.
o The ability to describe rules regarding the possible actions that
may be performed on the resulting label (such as blocked,
allocatable, etc.)
o The ability to describe rules that check for ill-formed
combinations across the whole label.
4. LGR Format
An LGR is expressed as a well-formed XML Document [XML].
4.1. Namespace
The XML Namespace URI is [TBD].
4.2. Basic structure
The basic XML framework of the document is as follows:
...
Within the "lgr" element rest several sub-elements. The first is a
"meta" element that contains all meta-data associated with the IDN
table, such as its authorship, what it is used for, implementation
notes and references. This is followed by a "data" element that
contains the substantive codepoint data. Finally, an optional
"rules" element contains information on whole-label evaluation rules,
if any, along with any specific rules regarding the disposition of
computed variants.
...
...
...
A document should contain exactly one "lgr" element, and within that
optionally one "meta" element and exactly one "data" element.
4.3. Metadata
The "meta" element is used to express meta-data associated with the
LGR. It can be used to explain the author or relevant contact
person, explain what the usage of the IDN table is, provide
implementation notes as well as references. The data contained
within is not required by software consuming the LGR in order to
calculate valid labels, or to calculate variants.
4.3.1. The version element
The "version" element is used to uniquely identify each version of
the LGR being represented. No specific format is required, but it is
RECOMMENDED that it be a positive integer, which is
incremented with each revision of the file.
An example of a typical first edition of a document:
<version>1</version>
A common alternative is to use a major-minor number scheme, where two
decimal numbers are used to represent major and minor changes to the
LGR. For example, "1.0" would be the first major release, "1.1"
would be a minor update to that, and "2.0" would represent a major
revision.
4.3.2. The date element
The "date" element is used to identify the date the LGR was written.
The contents of this element MUST be a valid ISO 8601 date string as
described in [RFC3339].
Example of a date:
<date>2009-11-01</date>
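As a rough illustration, a consumer of an LGR could check the contents of the "date" element with the Python standard library. This is only a sketch: the function name is_valid_lgr_date is hypothetical, and it assumes that the date uses the full-date form (YYYY-MM-DD) of [RFC3339].

```python
from datetime import datetime


def is_valid_lgr_date(text: str) -> bool:
    """Return True if text is a calendar date written as YYYY-MM-DD.

    Hypothetical helper, not part of the LGR format itself.
    """
    try:
        # Rejects impossible dates such as month 13 or February 30.
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime tolerates unpadded fields such as "2009-1-1", so also
    # insist on the fixed ten-character layout.
    return len(text) == 10
```

For example, is_valid_lgr_date("2009-11-01") is true, while "2009-13-01" and "2009-1-1" are both rejected.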
4.3.3. The language element
The "language" element signals that the LGR is associated with a
specific language or script. The value of the language element must
be a valid language tag as described in [RFC5646]. The tag may
simply refer to a script if the LGR is not referring to a specific
language. There may be multiple language elements for an LGR if it
spans multiple languages and/or scripts.
Example of an English language LGR:
<language>en</language>
If the LGR applies to a specific script, rather than a language, the
"und" language tag should be used followed by the relevant [RFC5646]
script subtag. For example, for a Cyrillic script LGR:
<language>und-Cyrl</language>
4.3.4. The domain element
This optional element refers to a domain to which this policy is
applied.
<domain>example.com</domain>
There may be multiple "domain" elements used to reflect a list of
domains.
4.3.5. The description element
The "description" element is a free-form element that contains any
additional relevant description. Typically, this field contains
authorship information, as well as additional context on how the LGR
was formulated (such as with references), and how it has been
applied.
The element has an optional "type" attribute, which refers to the
media type of the enclosed data. If the description lacks a type
field, it will be assumed to be plain text.
The description elements describe information relating to the LGR
that is useful for the user of the LGR in its interpretation. This
may explain the history, the rationale, reference sources etc. It
may also contain authorship information.
The "type" attribute may be used to specify the media type of the
content within the description element. The attribute should be a
valid MIME type. If supplied, the contents will be assumed to be of
that type. Typical types would be "text/plain" or "text/html".
"text/plain" will be assumed if no type attribute is specified.
4.3.6. The validity-start and validity-end elements
The "validity-start" and "validity-end" elements are optional
elements that describe, respectively, the time from which the
contents of the LGR become valid (i.e. are used in registry policy)
and the time at which the contents of the LGR cease to be used.
The times should conform to the format described in section 5.6 of
[RFC3339]. Each may consist of a date, or a date and time stamp.
4.3.7. The unicode-version element
If a given table is dependent on certain characters or functionality
from a given version of the Unicode standard, the minimum version
number MUST be listed. If any software processing the table does not
have the minimum requisite version, it MUST NOT perform any
operations relating to whole-label evaluation. This is because the
Unicode properties for the codepoints may have changed in subsequent
versions.
<unicode-version>6.2</unicode-version>
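The version requirement above can be sketched in Python. The helper names parse_version and can_evaluate_whole_label are hypothetical, and the sketch assumes that the processing software takes its Unicode data from the unicodedata module of the Python standard library, whose version is exposed as unicodedata.unidata_version.

```python
import unicodedata


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as "6.2" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def can_evaluate_whole_label(required: str,
                             available: str = unicodedata.unidata_version) -> bool:
    """Return True if the available Unicode data meets the declared minimum.

    When this returns False, whole-label evaluation must not be performed.
    """
    return parse_version(available) >= parse_version(required)
```

A table declaring "6.2" would be processable by software carrying Unicode 6.2.0 or later, and not by software carrying 6.1.0.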
4.4. Codepoint Rules
The bulk of a label generation ruleset is a description of which set
of codepoints are eligible for a given label. For rulesets that
perform operations that result in potential variants, the codepoint-
level relationships between variants need to also be described.
The codepoint data is collected within a "data" element. Within this
element, a series of "char" and "range" elements describe eligible
codepoints, or ranges of codepoints, respectively.
Discrete permissible codepoints or codepoint sequences may be
stipulated with a "char" element, e.g.
Ranges of permissible codepoints may be stipulated with a "range"
element, e.g.
The range is inclusive of the first and last codepoints.
Codepoints must be expressed in hexadecimal, i.e. according to the
standard Unicode convention without the prefix "U+". The rationale
for not allowing other encoding formats, including native Unicode
encoding in XML, is explored in [UAX42]. The XML conventions used in
this format, including the element and attribute names, mirror this
document where practical and reasonable to do so.
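The codepoint notation and the inclusive range semantics described above can be illustrated with a short Python sketch. The function names parse_cp and in_range are hypothetical, and the limit of four to six hexadecimal digits is an assumption taken from the usual Unicode convention rather than from this memo.

```python
import re

# Four to six hexadecimal digits, with no "U+" prefix (assumed convention).
_CP_PATTERN = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_cp(text: str) -> int:
    """Parse a codepoint written in hexadecimal without the "U+" prefix."""
    if _CP_PATTERN.fullmatch(text) is None:
        raise ValueError("not a bare hexadecimal codepoint: %r" % text)
    value = int(text, 16)
    if value > 0x10FFFF:
        raise ValueError("outside the Unicode codespace: %r" % text)
    return value


def in_range(cp: int, first: str, last: str) -> bool:
    """Return True if cp lies in the range; both endpoints are included."""
    return parse_cp(first) <= cp <= parse_cp(last)
```

With a range from 0030 to 0039, the codepoints for DIGIT ZERO and DIGIT NINE are both members, while 003A is not.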
4.4.1. Sequences
A sequence of two or more codepoints may be specified in an LGR, when
the exact sequence of codepoints is required to occur in order for
the constituent elements to be eligible. This approach allows
representation of policy where a specific codepoint is only eligible
when preceded or followed by another codepoint. For example, in
order to represent the eligibility of the MIDDLE DOT (U+00B7) only
when both preceded and followed by the LATIN SMALL LETTER L (U+006C):
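The effect of listing a sequence can be sketched in Python. This memo does not prescribe a matching algorithm, so the approach here (testing whether the label can be divided into listed codepoints and sequences) is an assumption made for illustration, as are the name can_segment and the small ENTRIES repertoire.

```python
def can_segment(label: str, entries: set) -> bool:
    """Return True if label can be split into listed codepoints or sequences."""
    # reachable[i] is True when the first i codepoints can be covered.
    reachable = [True] + [False] * len(label)
    for end in range(1, len(label) + 1):
        for entry in entries:
            start = end - len(entry)
            if start >= 0 and reachable[start] and label[start:end] == entry:
                reachable[end] = True
                break
    return reachable[len(label)]


# Hypothetical repertoire: MIDDLE DOT (U+00B7) is listed only as part of
# the sequence "l", MIDDLE DOT, "l", never on its own.
ENTRIES = {"a", "c", "e", "l", "l\u00b7l"}
```

Under these assumptions, a label containing the MIDDLE DOT between two "l" codepoints is accepted, while a label where the MIDDLE DOT follows "e" is not.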
4.4.2. Variants
While most LGRs typically only determine codepoint eligibility,
others additionally specify a mapping of codepoints to other
codepoints, known as "variants". What constitutes a variant is a
matter of policy, and varies for each implementation.
4.4.2.1. Basic variants
Variants are specified as one or more children of a "char" element.
For example, to map LATIN SMALL LETTER V (U+0076) as a variant of
LATIN SMALL LETTER U (U+0075):
A sequence of multiple codepoints can be specified as a variant of a
single codepoint. For example, the sequence of LATIN SMALL LETTER O
(U+006F) then LATIN SMALL LETTER E (U+0065) can be specified as a
variant for LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) as
follows:
Variants are specified in only one direction. For symmetric
variants, the inverse of the variant must be explicitly specified:
Both the source and target of a variant mapping may be sequences. It
is not possible to specify variants for ranges.
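Because variants are specified in only one direction, a tool may wish to report mappings whose inverse has not been declared. The following Python sketch assumes a hypothetical in-memory form in which each source string maps to a set of target strings; the names VARIANTS and missing_inverses are illustrative only.

```python
# Hypothetical in-memory form of variant mappings: source -> set of targets.
# "u" and "v" are declared in both directions; the mapping from
# U+00F6 to the sequence "oe" is declared in one direction only.
VARIANTS = {
    "u": {"v"},
    "v": {"u"},
    "\u00f6": {"oe"},
}


def missing_inverses(variants: dict) -> list:
    """List (source, target) pairs that would make every mapping symmetric."""
    missing = []
    for source, targets in variants.items():
        for target in targets:
            if source not in variants.get(target, set()):
                missing.append((target, source))
    return sorted(missing)
```

For the data above, the only pair reported is the mapping from "oe" back to U+00F6.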
4.4.2.2. Null variants
To specify a null variant, which is a variant string that maps to no
codepoint, use an empty cp attribute. For example, to map a string
with a ZERO WIDTH NON-JOINER (U+200C) to the same string without the
ZERO WIDTH NON-JOINER:
4.4.2.3. Conditional variants
Fundamentally, variants are mappings between two sequences of
codepoints. However, in some instances for a variant relationship to
exist, some context external to the codepoint sequence must be
considered. For example, in Arabic script the positional context may
determine whether two codepoint sequences are variants of each other,
because Arabic characters can take different forms based on their
position. This positional context cannot be derived solely from the
codepoint, as the codepoint is the same for the various forms.
To specify a conditional variant relationship the "when" attribute is
used. The variant relationship exists when the condition in the
"when" attribute is satisfied.
arabic-initial The codepoint is in a context where it would be
presented in its Arabic Initial form.
arabic-isolated The codepoint is in a context where it would be
presented in its Arabic Isolated form.
arabic-medial The codepoint is in a context where it would be
presented in its Arabic Medial form.
arabic-final The codepoint is in a context where it would be
presented in its Arabic Final form.
For example, to mark ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
(U+0673) as a variant of ARABIC LETTER ALEF WITH HAMZA BELOW
(U+0625), but only when it appears in isolated or final forms:
Only a single "when" attribute can be applied to any "var" element;
however, multiple "var" elements using the same mapping but different
"when" attributes may be specified.
4.4.3. Result tagging
Typically, LGRs are used to explicitly designate allowable
codepoints, with any label with a codepoint not explicitly listed in
the LGR being considered an ineligible label according to the
ruleset.
For more complex registry rules, there may be a need to discern
codepoints and variants of certain types. This can be accomplished
by applying a "tag" attribute, and then filtering on results based on
the tag using whole label evaluation.
A tag may be of any value, but the following tags are pre-defined to
encourage common conventions in their application. If these tags can
represent registry policy, they SHOULD be used.
4.5. Whole Label Evaluation Rules
4.5.1. Basic concepts
The codepoints in a label sometimes need to satisfy context-based
rules, in order for the label to be considered valid. Whole Label
Evaluation Rules (WLE) can be specified to support this validation.
The same validation can be applied to variants created by applying
the variant mapping.
The whole label evaluation rules are contained in a "wle" element,
which contains character class, rule and action elements. These are
described below.
A Whole Label Evaluation Rule describes a complete label. The
elements of the "rule" element are:
o character classes, which define sets of codepoints to be used for
context comparisons;
o context operators, which define when character classes may appear;
and
o actions, which define what actions to take based on the context.
4.5.2. Character Classes
Character classes are named sets of characters that share a
particular property. They can be defined in several ways.
1. Define the property via matching a tag in the codepoint data.
All characters with the same tag attribute are part of the same
class.
2. Reference one of the Unicode character properties defined in the
Unicode Character Database (UCD).
3. Explicitly list all the codepoints in the class.
4. Define a class as a combination of any number of these
definitions or other classes.
4.5.2.1. Tag-based classes
If tags are defined using the "tag" attribute, classes are defined
based upon the names of the tags used. From these classes, further
operations may be performed by context operators and actions.
4.5.2.2. Unicode property based classes
A class is defined in terms of Unicode properties by giving the
Unicode property alias and the property value or property value
alias.
The example above selects all characters for which the Unicode
canonical combining class (ccc) value is 9. This value of the ccc is
assigned in the UCD to all characters that are viramas. The string
"ccc" is the short-alias for the canonical combining class, as
defined in the file PropertyAliases.txt in the UCD. [[Possibly
change those to the labels used by the XML format of the UCD -- per
UAX42]]
Unicode properties may, in principle, change between versions of the
Unicode Standard. However, the values assigned for a given version
are fixed. If Unicode Properties are used, they MUST be declared in
the header, and the Unicode Version must be defined. (Note that some
Unicode properties are stable across versions and do not change once
assigned. Nevertheless, in order to make sure the UCD version covers
all the characters in the codepoint tables, it is necessary to give
the version number in the header.)
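A class based on the canonical combining class can be approximated with the unicodedata module of the Python standard library, whose combining() function returns the ccc value of a character. This is a sketch only: the function name class_by_ccc is hypothetical, the search is limited to the Devanagari block for brevity, and the result reflects whichever Unicode version the Python runtime bundles.

```python
import unicodedata


def class_by_ccc(value: int, codepoints) -> set:
    """Collect the codepoints whose canonical combining class equals value."""
    return {cp for cp in codepoints
            if unicodedata.combining(chr(cp)) == value}


# Restrict the search to the Devanagari block (U+0900..U+097F) as an example.
devanagari = range(0x0900, 0x0980)
viramas = class_by_ccc(9, devanagari)
```

The resulting set includes DEVANAGARI SIGN VIRAMA (U+094D) and excludes ordinary letters such as DEVANAGARI LETTER KA (U+0915).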
4.5.2.3. Explicitly declared classes
A class of codepoints may also be declared by listing the codepoints
that are members of the class. This is useful when tagging cannot be
used because the codepoints are not part of the eligible set of
codepoints for the given LGR.
To define a class in terms of an explicit list of codepoints:
This defines a class named "abc" containing the codepoints for
characters "a", "b" and "c". The ordering of the codepoints is not
material, but it is RECOMMENDED to list them in ascending order.
Range operators may also be used to represent a series of consecutive
codepoints. The same declaration can be made as follows:
4.5.2.4. Combined classes
Classes may be combined using logical operators for inversion, union,
intersection and exclusive-or.
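The four logical operators correspond directly to set operations, as the following Python sketch shows. The class names are hypothetical, and the sketch assumes that inversion is taken relative to a chosen repertoire, which this memo does not specify.

```python
# Hypothetical classes, held as sets of integer codepoints.
letters = frozenset(range(0x61, 0x7B))   # "a".."z"
vowels = frozenset(map(ord, "aeiou"))
digits = frozenset(range(0x30, 0x3A))    # "0".."9"

# Assumption: inversion is relative to this repertoire.
repertoire = letters | digits

union = letters | digits            # members of either class
intersection = letters & vowels     # members of both classes
exclusive_or = letters ^ vowels     # members of exactly one class
inversion = repertoire - vowels     # everything in the repertoire except vowels


def as_text(cls) -> str:
    """Render a class as a sorted string, for inspection."""
    return "".join(chr(cp) for cp in sorted(cls))
```

Here as_text(exclusive_or) yields the 21 consonants, because every vowel is also a letter.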
4.5.3. Context rules
Context rules consist of a series of logical conditions that must be
satisfied in order to determine that a label meets a given context.
These rules relate to the appearance of character classes defined
elsewhere in the table.
4.5.3.1. The rule element
A matching rule is defined by a "rule" element, which combines
character classes with context operators.
A simple rule to match a label where all characters are members of
the class "preferred":
To constrain how often a specific character class may appear, use the
"count" attribute. Its value should be an integer of 0 or higher. If
the integer is followed by a plus character (+), the number of
occurrences can be higher than the number stated. Therefore, "1"
would mean exactly one occurrence, whereas "1+" would indicate one or
more occurrences.
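The interpretation of the "count" value can be sketched in Python as follows. The function names parse_count and count_allows are hypothetical; the sketch represents an open upper bound as None.

```python
import re


def parse_count(text: str) -> tuple:
    """Return (minimum, maximum) for a count value; maximum None means unbounded."""
    match = re.fullmatch(r"([0-9]+)(\+?)", text)
    if match is None:
        raise ValueError("invalid count value: %r" % text)
    minimum = int(match.group(1))
    # A trailing "+" lifts the upper bound; otherwise the count is exact.
    maximum = None if match.group(2) else minimum
    return minimum, maximum


def count_allows(text: str, occurrences: int) -> bool:
    """Return True if the number of occurrences satisfies the count value."""
    minimum, maximum = parse_count(text)
    return occurrences >= minimum and (maximum is None or occurrences <= maximum)
```

Thus "1" permits exactly one occurrence, and "1+" permits one, two, three or more.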
For cases where several alternates could be chosen, the
element can encode a list of choices:
For cases when a match may occur against any codepoint, use an "any"
element:
By default Whole Label Evaluation Rules always match the entire
label. Use attribute "match" with values "start", "anywhere" and
"end" to define rules that need to match in specific positions of the
label.
Rules are named and can be nested by reference.
Here's an example of a rule requiring that all labels be letters
(optionally followed by combining marks) and possibly digits.
4.5.4. Action elements
The purpose of a rule is to trigger a specific action. Often, the
action simply results in blocking a label that does not match a rule.
blocking rule
An action may contain precisely one "match" or "not-match" attribute,
but not both. Because rules may be compound rules that contain other
rules, only a single rule may be named as the value of the "match" or
"not-match" attribute.
The precise action taken and the name of the corresponding "action"
attribute are not defined here. It is strongly RECOMMENDED to use
the following actions only with their conventional sense.
block The resulting string should be blocked from registration.
This would typically apply for a derived variant that has no
practical use, such as blocking confusingly similar but
undesirable variants.
allocate The resulting string should be reserved for use by the same
operator of the origin string, but not automatically allocated
for use.
activate The resulting string should be activated for use. (This is
the typical default action if no tagging is used, and is known
as a "preferred" variant in [RFC3743]).
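The relationship between rules and actions can be sketched in Python. Several points here are assumptions made only for illustration: rules are modelled as named predicates over a label, actions are tried in order with the first applicable one winning, and the names make_action, first_action, RULES and ACTIONS are hypothetical. This memo does not define an evaluation order or a default action.

```python
def make_action(action: str, match: str = None, not_match: str = None) -> dict:
    """Build an action that names exactly one rule via match or not-match."""
    if (match is None) == (not_match is None):
        raise ValueError('exactly one of "match" or "not-match" is required')
    return {"action": action, "match": match, "not_match": not_match}


def first_action(label: str, actions: list, rules: dict):
    """Return the name of the first action that applies, or None."""
    for entry in actions:
        if entry["match"] is not None:
            applies = rules[entry["match"]](label)
        else:
            applies = not rules[entry["not_match"]](label)
        if applies:
            return entry["action"]
    return None


# Hypothetical rule: every codepoint is a lowercase ASCII letter.
RULES = {
    "all-lowercase-ascii": lambda label: all("a" <= ch <= "z" for ch in label),
}
# Block any label that does not match the rule.
ACTIONS = [make_action("block", not_match="all-lowercase-ascii")]
```

With these definitions, the label "ab1" results in "block", while "abc" triggers no action.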
4.6. Example table
A sample complete XML LGR is as follows.
1
2010-01-01
sv
example
Swedish
examples institute.
]]>
5. Processing a label against an LGR
5.1. Determining eligibility for a label
In order to use a table to test a specific domain label for
membership in the LGR, a consumer of the LGR must iterate through
each codepoint within a given U-label, and test that each codepoint
is a member of the LGR. If any codepoint is not a member of the LGR,
the label shall be deemed not eligible in accordance with the table.
A codepoint is deemed a member of the table when it is listed with
the "char" element, and all necessary conditions listed in "when"
attributes are correctly satisfied.
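The eligibility test can be sketched in Python as follows. The data model is an assumption: the repertoire is a dictionary from codepoint to an optional condition, where the condition is a callable standing in for a "when" attribute. The names is_eligible, not_first and REPERTOIRE are hypothetical, and multi-codepoint sequences are not handled in this sketch.

```python
def is_eligible(u_label: str, repertoire: dict) -> bool:
    """Return True if every codepoint of the U-label is a member of the table."""
    for index, char in enumerate(u_label):
        cp = ord(char)
        if cp not in repertoire:
            return False  # codepoint is not listed at all
        condition = repertoire[cp]
        if condition is not None and not condition(u_label, index):
            return False  # listed, but its condition is not satisfied
    return True


def not_first(label: str, index: int) -> bool:
    """Hypothetical condition: the codepoint may not start the label."""
    return index > 0


# Hypothetical table: "a".."z" unconditionally, HYPHEN-MINUS conditionally.
REPERTOIRE = {cp: None for cp in range(0x61, 0x7B)}
REPERTOIRE[0x2D] = not_first
```

Under this table "a-b" is eligible, whereas "-ab" fails its condition and "ab1" contains an unlisted codepoint.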
5.2. Determining variants for a label
For a given eligible label, the set of variants is deemed to be each
possible permutation of "var" elements, whereby all "when" attributes
are correctly satisfied for each codepoint in the given permutation.
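The enumeration of variant labels can be sketched in Python with itertools.product. This is a simplified illustration under stated assumptions: the mapping is a dictionary from a single source codepoint to a list of target strings (an empty string standing for a null variant), "when" conditions and multi-codepoint sources are ignored, and the name label_variants is hypothetical.

```python
from itertools import product


def label_variants(label: str, variants: dict) -> list:
    """Return every variant label other than the original, in sorted order."""
    # For each position, the choices are the original codepoint plus
    # each of its declared variant targets.
    choices = [[char] + sorted(variants.get(char, ())) for char in label]
    results = {"".join(combo) for combo in product(*choices)}
    results.discard(label)
    return sorted(results)
```

For the label "uv" with "u" and "v" declared as variants of each other, the computed variant labels are "uu", "vu" and "vv". The number of results grows multiplicatively with the number of codepoints that have variants, so a practical implementation may need to bound the enumeration.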
6. Conversion between other formats
Both [RFC3743] and [RFC4290] provide different grammars for IDN
tables. These formats are unable to fully cater for the increased
requirements of contemporary IDN variant policies.
This specification is a superset of functionality provided by these
IDN table formats, thus any table expressed in those formats can be
expressed in this format. Automated conversion can be conducted
between tables conformant with the grammar specified in each
document.
7. IANA Considerations
This document does not specify any IANA actions.
8. Security Considerations
There are no security considerations for this memo.
9. References
[LGR-PROCEDURE]
Internet Corporation for Assigned Names and Numbers,
"Procedure to Develop and Maintain the Label Generation
Rules for the Root Zone in Respect of IDNA Labels".
[RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the
Internet: Timestamps", RFC 3339, July 2002.
[RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
Engineering Team (JET) Guidelines for Internationalized
Domain Names (IDN) Registration and Administration for
Chinese, Japanese, and Korean", RFC 3743, April 2004.
[RFC4290] Klensin, J., "Suggested Practices for Registration of
Internationalized Domain Names (IDN)", RFC 4290,
December 2005.
[RFC5564] El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
"Linguistic Guidelines for the Use of the Arabic Language
in Internet Domains", RFC 5564, February 2010.
[RFC5646] Phillips, A. and M. Davis, "Tags for Identifying
Languages", BCP 47, RFC 5646, September 2009.
[UAX42] Unicode Consortium, "Unicode Character Database in XML".
[XML] "Extensible Markup Language (XML) 1.0".
Appendix A. RelaxNG Schema
Appendix B. Acknowledgements
This format builds upon the work on documenting IDN tables by many
different registry operators. Notably, a comprehensive language
table for Chinese, Japanese and Korean was developed by the "Joint
Engineering Team" [RFC3743] that is the basis of many registry
policies; and a set of guidelines for Arabic script registrations
[RFC5564] was published by the Arabic-language community.
Contributions that have shaped this document have been provided by
Francisco Arias, Mark Davis, Nicholas Ostler, Thomas Roessler, Steve
Sheng and Andrew Sullivan.
Appendix C. Editorial Notes
This appendix is to be removed prior to final publication.
C.1. Known Issues and Future Work
o A default set of actions should be defined if they are not
explicitly accounted for in the table.
o A method of specifying the origin URI for a table, and an
expiration or refresh policy, as meta-data may be a useful way to
declare how the table will be updated.
C.2. Sample tables and running code
Some sample tables using this format, as well as a basic
implementation of this specification, are posted at
https://github.com/kjd/idntables
C.3. Change History
-00 Initial draft.
-01 Add an XML Namespace, and fix other XML nits. Add support for
sequences of codepoints. Improve on consistently using Unicode
nomenclature.
-02 Add support for validity periods.
-03 Incorporate requirements from the Label Generation Ruleset
Procedure for the DNS Root Zone. These requirements include a
detailed grammar for specifying whole-label variants, and the
ability to explicitly declare the actions associated with a
specific variant. The document also consistently applies the
term "Label Generation Ruleset", rather than "IDN table", to
reflect the policy term now being used to describe these.
Authors' Addresses
Kim Davies
Internet Corporation for Assigned Names and Numbers
12025 Waterfront Drive
Los Angeles, CA 90094
US
Phone: +1 310 301 5800
Email: kim.davies@icann.org
URI: http://www.iana.org/
Asmus Freytag
ASMUS Inc.
Email: asmus@unicode.org