1. Introduction

In many cases, internationalized strings entered by users are generated by an input method editor ("IME") or copied and pasted from free text. Users usually do not care about the case or width of the characters they enter, because the variants look the same to them. Further, users rarely switch IME state to enter special characters such as protocol elements. For Internationalized Domain Names ("IDNs"), IDNA Mapping [RFC5895] describes methods to address these issues. For precis strings, case mapping is defined as a process in the precis framework [I-D.ietf-precis-framework], but width mapping, delimiter mapping, and special mapping are not defined. Handling mappings other than case mapping is also important to increase the chance that strings match as users expect. This document is a guideline for authors of protocol profiles of the precis framework and describes the mappings that must be considered between receiving user input and passing permitted code points to internationalized protocols.

2. Types of mapping

This document defines two types of mapping: protocol independent mapping, which does not depend on protocol rules, and protocol dependent mapping, which does. Several mappings are defined under each type. Authors of protocol profiles of the precis framework should give careful consideration to the choice of mappings.

Each mapping type is described in the following sections.

3. Protocol independent mapping

Protocol independent mapping is a mapping that doesn't depend on protocol rules.

3.1. Width mapping

Fullwidth and halfwidth characters (those defined with Decomposition Types <wide> and <narrow>) are mapped to their decomposition mappings as shown in the Unicode character database [Unicode].
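
The rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function name width_map is hypothetical, and the sketch assumes that the Unicode character database bundled with the Python runtime is an acceptable source for the <wide> and <narrow> decomposition data.

```python
import unicodedata

def width_map(text: str) -> str:
    """Map fullwidth/halfwidth characters to their decomposition mappings."""
    out = []
    for ch in text:
        # decomposition() returns e.g. "<wide> 0041", or "" if there is none.
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            # Drop the tag and keep the hexadecimal code points after it.
            code_points = decomp.split()[1:]
            out.append("".join(chr(int(cp, 16)) for cp in code_points))
        else:
            # Characters with any other decomposition type are left untouched.
            out.append(ch)
    return "".join(out)

# FULLWIDTH LATIN CAPITAL LETTER A, B, C (U+FF21..U+FF23)
print(width_map("\uff21\uff22\uff23"))  # prints "ABC"
```

Unlike NFKC, this sketch touches only characters whose decomposition type is <wide> or <narrow>; other compatibility decompositions are left as they are.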

Width mapping increases the backward compatibility of precis framework [I-D.ietf-precis-framework] profiles with Stringprep [RFC3454] profiles, because a Stringprep profile that specifies Unicode normalization form KC (NFKC) as its normalization method maps fullwidth and halfwidth characters to their compatibility equivalents. If a precis framework profile specifies NFKC (which is not recommended), width mapping might not be useful.

4. Protocol dependent mapping

Protocol dependent mapping is a mapping that depends on protocol rules.

4.1. Delimiter mapping

Definitions of delimiters differ from protocol to protocol. Therefore, the delimiter mapping table should be based on a well-defined mapping table for each protocol.

One of the most useful cases of delimiter mapping is when the FULL STOP character (U+002E) is a delimiter, as in domain names. Some IMEs generate characters compatible with FULL STOP, such as IDEOGRAPHIC FULL STOP (U+3002), when users type FULL STOP on the keyboard.
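
As a minimal sketch of the FULL STOP case, a delimiter mapping can be written as a per-protocol translation table. The table DELIMITER_MAP and the function delimiter_map are hypothetical names used only for illustration; the table holds just the single code point mentioned above, and a real profile would take its table from the protocol's own definition.

```python
# Hypothetical table for a protocol whose delimiter is FULL STOP (U+002E).
DELIMITER_MAP = {
    0x3002: 0x002E,  # IDEOGRAPHIC FULL STOP -> FULL STOP
}

def delimiter_map(text: str, table=DELIMITER_MAP) -> str:
    """Replace delimiter look-alikes with the protocol's delimiter."""
    # str.translate accepts a dict that maps code points to code points.
    return text.translate(table)

print(delimiter_map("example\u3002com"))  # prints "example.com"
```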

4.2. Special mapping

Certain protocols have characters, other than delimiter characters, that need to be mapped to a character different from the one given by the mapping rules defined in the precis framework. In this document, these mappings are named special mapping. They differ from protocol to protocol. Therefore, the special mapping table should be based on a well-defined mapping table for each protocol. As an example of special mapping, [RFC4518] defines a rule that some code points (see Appendix B.4) are mapped to SPACE (U+0020).


4.3. Local case mapping

Local case mapping is case folding that depends on language context. For example, if a user ID string contains an uppercase I, the language context of that user ID must be taken into account when the character is mapped to lowercase. If the context is Turkish, the character should be mapped to LATIN SMALL LETTER DOTLESS I (U+0131) as its lowercase form.
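
The Turkish example can be illustrated with a short Python sketch. The names TURKIC_LOWER and local_case_map are hypothetical. The sketch is a simplification: it handles only U+0049 and U+0130 for the language tags "tr" and "az", ignores the context conditions that SpecialCasing.txt attaches to some rules (such as I followed by COMBINING DOT ABOVE), and uses Python's str.casefold() as a stand-in for the case folding performed later by the precis framework.

```python
# Simplified Turkish/Azerbaijanian lowercase rules (context rules omitted).
TURKIC_LOWER = {
    0x0049: 0x0131,  # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
    0x0130: 0x0069,  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
}

def local_case_map(text: str, language: str) -> str:
    """Apply language-sensitive case mapping before generic case folding."""
    if language in ("tr", "az"):
        return text.translate(TURKIC_LOWER)
    return text  # no local rule is defined for this language in the sketch

# Without language context, I folds to i; with Turkish context it becomes U+0131.
print(local_case_map("ISIK", "en").casefold())  # prints "isik"
print(local_case_map("ISIK", "tr").casefold())  # prints "ısık"
```

The two results differ, which is the situation that local case mapping is meant to handle before the string reaches generic case folding.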

To solve this problem, this document defines the characters that need local case mapping based on SpecialCasing.txt [Specialcasing], described in section 3.13 of The Unicode Standard [Unicode]. Because the precis framework already performs case folding, local case mapping targets only those characters for which two procedures give different results: (1) performing only the case folding defined in CaseFolding.txt [Casefolding], and (2) performing the special case mapping defined in SpecialCasing.txt followed by that case folding.

SpecialCasing.txt defines two types of mapping: Unconditional Mappings and Conditional Mappings. Conditional Mappings are divided into Language-Insensitive Mappings, which target characters whose full case mappings do not depend on language but do depend on context, and Language-Sensitive Mappings, which target characters whose full case mappings depend on language and perhaps also on context.

Of these, the characters targeted by Unconditional Mappings and by Language-Insensitive Mappings are mapped to the same code point(s) whether case folding alone is performed or special case mapping is performed before case folding. The characters targeted by Language-Sensitive Mappings, however, are mapped to different code points by the two procedures. Therefore, this document defines as targets of local case mapping the subset of Lithuanian (lt), Turkish (tr), and Azerbaijanian (az) characters targeted by Language-Sensitive Mappings.

The list of characters that need local case mapping is as follows. Section 6, "IANA Considerations", contains a template for registering these characters with IANA in the precis local case mapping registry.

5. Applying order of mapping

In principle, the mappings described in this document are not sensitive to the order in which they are applied. This section nevertheless defines an order that minimizes the effect of code point changes introduced by the mappings. The order is very general and was designed to be acceptable to the widest user community.

  1. width mapping
  2. delimiter mapping
  3. special mapping
  4. local case mapping
  5. precis framework

The mappings described in this document should be performed before the precis framework is applied.
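
The order above can be sketched as a simple pipeline in Python. Everything here is illustrative: the function names and the three small tables are hypothetical, each table holds a single sample entry rather than a complete protocol table, and str.casefold() is only a stand-in for the processing done by the precis framework in step 5.

```python
import unicodedata

def width_mapping(text: str) -> str:
    # Replace characters whose decomposition is tagged <wide> or <narrow>.
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)

# Sample tables only; a real profile takes these from its protocol.
DELIMITERS = {0x3002: 0x002E}  # IDEOGRAPHIC FULL STOP -> FULL STOP
SPECIALS = {0x2003: 0x0020}    # EM SPACE -> SPACE
TURKIC = {0x0049: 0x0131}      # I -> DOTLESS I for tr and az

def prepare(text: str, language: str) -> str:
    text = width_mapping(text)         # 1. width mapping
    text = text.translate(DELIMITERS)  # 2. delimiter mapping
    text = text.translate(SPECIALS)    # 3. special mapping
    if language in ("tr", "az"):       # 4. local case mapping
        text = text.translate(TURKIC)
    return text.casefold()             # 5. stand-in for the precis framework

# Fullwidth "ID", IDEOGRAPHIC FULL STOP, fullwidth "EX"
print(prepare("\uff29\uff24\u3002\uff25\uff38", "en"))  # prints "id.ex"
```

In this sketch, running width mapping first means that a fullwidth I (U+FF29) has already become U+0049 by the time local case mapping examines it, so the Turkish rule applies to it as well.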

6. IANA Considerations

6.1. precis local case mapping registry

IANA is requested to create a registry of precis local case mapping. In accordance with [RFC5226], the registration policy is "RFC Required".

6.2. Template for precis local case mapping registry

The following information is to be provided when a new precis local case mapping rule is registered. The registration template is as follows. Appendix C contains further discussion and a table from which the registry can be initialized.

7. Security Considerations


8. Acknowledgment

Martin Dürst suggested the need for case folding in the mapping (for example, mapping final sigma to sigma and German sz to ss).

Pete Resnick et al. gave important suggestions for this document during a WG meeting.

Appendix A. Mapping type list for each protocol

A.1. Mapping type list for each protocol

This table lists the mapping types used by each protocol. A value of "o" indicates that the protocol uses the type of mapping. A value of "-" indicates that the protocol does not use the type of mapping.

|    \ Type of mapping |    Width    | Delimiter | Case | Special |
| RFC \                |    (NFKC)   |           |      |         |
|         3490         |      -      |     o     |   -  |    -    |
|         3491         |      o      |     -     |   o  |    -    |
|         3722         |      o      |     -     |   o  |    -    |
|         3748         |      o      |     -     |   -  |    o    |
|         4013         |      o      |     -     |   -  |    o    |
|         4314         |      o      |     -     |   -  |    o    |
|         4518         |      o      |     -     |   o  |    o    |
|         6120         |      -      |     -     |   o  |    -    |

Appendix B. Codepoints which need special mapping

B.1. RFC3748

Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to SPACE (U+0020).

B.2. RFC4013

Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to SPACE (U+0020).

B.3. RFC4314

Non-ASCII space characters [StringPrep, C.1.2] that can be mapped to SPACE (U+0020).

B.4. RFC4518

The code points mapped to SPACE (U+0020) are as follows:

U+0085 (NEXT LINE (NEL))
U+0020 (SPACE)
U+2000 (EN QUAD)
U+2001 (EM QUAD)
U+2002 (EN SPACE)
U+2003 (EM SPACE)
U+2028 (LINE SEPARATOR)
U+2029 (PARAGRAPH SEPARATOR)
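
As an illustration, the code points listed above can be turned into a translation table in Python. The names SPACE_MAP and special_map are hypothetical, and the table contains only the code points shown in this appendix; it is a sketch of the mapping to SPACE, not a complete implementation of the [RFC4518] rules.

```python
# Code points from the list above that are mapped to SPACE (U+0020).
# U+0020 itself is omitted because mapping it to SPACE changes nothing.
SPACE_MAP = {
    cp: 0x0020
    for cp in (0x0085, 0x2000, 0x2001, 0x2002, 0x2003, 0x2028, 0x2029)
}

def special_map(text: str) -> str:
    """Map the listed space-like code points to SPACE (U+0020)."""
    return text.translate(SPACE_MAP)

# EM SPACE and LINE SEPARATOR both become SPACE.
print(special_map("a\u2003b\u2028c"))  # prints "a b c"
```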

All other control code points (e.g., Cc) or code points with a control function (e.g., Cf) are mapped to nothing. The code points mapped to nothing that are not specified by Stringprep are as follows:


Appendix C. The initial precis local case mapping registrations

C.1. Lithuanian

language: Lithuanian

C.2. Turkish

language: Turkish

C.3. Azerbaijanian

language: Azerbaijanian

