Internet Draft                                                 M. Duerst
                                                    University of Zurich
Expires 30 February 1997                                  30 August 1996


                 Ruby in the Hypertext Markup Language

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   Distribution of this document is unlimited. Please send comments to
   the author at <mduerst@ifi.unizh.ch>.

   This document is intended to become an informational RFC, and its
   contents are designed for adoption in other standards and
   specifications.

Abstract

   The Hypertext Markup Language (HTML) is a markup language used to
   create hypertext documents that are platform independent. Initially
   HTML was designed primarily for Western European languages; most of
   the basic internationalization issues that limited its usability
   for other languages have since been addressed. Ruby are important
   phonetic annotations used mainly for ideographic characters in East
   Asia. This document proposes markup for ruby in HTML and explains
   its usage.


Expires 28 February 1997                                        [Page 1]

Internet Draft                Ruby in HTML                28 August 1996


Table of contents

   1. Introduction
      1.1 General
      1.2 Notational Conventions
   2.
   Syntax
      2.1 The RUBY Attribute
      2.2 Usage Limitations
      2.3 Changes to the DTD
      2.4 Nested Attributes
   3. Guidelines for Implementation
   4. Design Considerations
   Bibliography
   Author's Address


1. Introduction

1.1 General

   The Hypertext Markup Language (HTML) [HTML] is a simple markup
   language used to create hypertext documents that are platform
   independent. The main features for full international use of HTML
   are described in [HTML-I18N]. This draft describes markup for an
   additional feature needed for international HTML, namely ruby. Ruby
   are short phonetic annotations for ideographic characters used
   throughout East Asia.

   Ruby are placed to the right of their base characters in vertical
   text, and above them in horizontal text. They are rendered at about
   half the size of their base characters.

   Ruby are used frequently in Japan in most kinds of publications,
   such as books and magazines, but also in China, especially in
   schoolbooks.

   With the increasing international use of the WWW, new and very
   beneficial uses of ruby can also appear. In texts stored
   electronically and enriched with structural markup, ruby can be
   very convenient for applications other than rendering. In
   particular, they should be of immense value for searching,
   indexing, and text-to-speech conversion.

   The name "ruby" comes from the British name for the 5.5 point type
   size, which was the size most often used for ruby. In Japan, the
   term "furigana" is also used.

1.2 Notational Conventions

   In the examples in this document, ideographic characters are
   denoted as space-separated strings of uppercase letters. Annotation
   characters are denoted by lowercase letters.


2. Syntax

2.1 The RUBY Attribute

   A ruby annotation is a string of ruby characters associated with a
   string of characters from the base text. This association is
   expressed by introducing an attribute RUBY to the inline elements
   of HTML. Examples of inline elements are , , , and . <SPAN> is the
   generic phrase-level element. Other than carrying attributes, it
   does not have any particular semantics. As ruby usually are not
   combined with other kinds of markup, <SPAN> will be used most of
   the time to place ruby on base characters. This is an example:

      KO HAYASHI

2.2 Usage Limitations

   Neither the length of a group of base characters nor the number of
   ruby characters per base character is limited by this
   specification. However, authors and tools are requested to keep
   these numbers reasonably low. Otherwise, it will be very difficult
   even for a sophisticated renderer to construct a nice display.
   Also, this specification does not limit the types of base
   characters to which ruby can be attached, or the types of
   characters that can be used as ruby.

   The length of a group of base characters, in the case of Japanese,
   will average about two, with four or five characters still being
   common. For the number of ruby per base character, examples with
   five are known, but here also the average will be close to two. For
   both linguistic and typographic reasons, it is not possible to
   restrict ruby to being associated with single base characters.

   For Chinese texts annotated with Pinyin romanization, the average
   number of ruby per base character is closer to four; for Chinese
   texts with bopomofo annotations, the average number of ruby per
   base character is again around two.
   For other combinations of base characters and ruby, these numbers
   can be different.

2.3 Changes to the DTD

   This section describes the changes to the HTML DTD necessary to
   include the RUBY attribute. The description is based on the DTD in
   [HTML-I18N]. In this case, the only change necessary is to add the
   following text to the "attrs" DTD "Macro":

      RUBY CDATA #IMPLIED -- phonetic annotation for ideographs --

   For other versions of HTML, other changes may be necessary.

2.4 Nested Attributes

   If RUBY attributes are present on several levels of nested inline
   elements, then these attributes are to be considered as
   alternatives, and not in a cumulative way. Thus, for example,

      KO HAYASHI

   could be interpreted as

      KO HAYASHI

   to distribute the ruby evenly over the base characters, or as

      KO HAYASHI

   to allow the ruby to be split correctly when a line is broken
   between KO and HAYASHI.

      NOTE -- the above is designed to allow extremely sophisticated
      renderers to do high quality line breaking. The author of this
      draft, however, does not know of any display algorithm or
      software that is currently able to perform this function, and
      therefore suggests that authors not use this feature.


3. Guidelines for Implementation

   This document does not specify any particular implementation for
   the rendering of ruby. The following are some possibilities, listed
   by increasing typographic quality, with some comments.

   - Display ruby in-line, after their base characters, in
     parentheses. In this case, an option to switch off ruby display
     is almost mandatory, because texts with many ruby will otherwise
     be difficult to read. For other implementations, an option to
     switch off ruby display may also be a good idea, but it is not as
     necessary as here.

   - Place ruby above their base characters, at half the height of the
     base characters. Use fixed spacing.
     In case the ruby are longer than their corresponding base
     characters, leave some space blank after the base characters.
     Always keep a group of base characters and their ruby on the same
     line.

   - Same as the previous solution, but expand ruby proportionally in
     case they are shorter than their associated base characters.

   - In case the ruby are longer than their associated base
     characters, test whether the previous or following characters of
     the base text have associated ruby. If this is not the case
     (particularly if these characters are not ideographic), let the
     ruby overlap the neighboring base characters to avoid blank
     space.

   - Use nested ruby attributes for highest-quality rendering
     including line breaks (very difficult to implement).

   Stricter implementation specifications with examples can be found
   in [JIS95].


4. Design Considerations

   Besides the solution proposed in this document, various
   alternatives for ruby markup were discussed. They all turned out to
   be more complex than having ruby as an attribute, without
   significant additional benefits. For some more details about these
   proposals, please see [DUR96].

   Some solutions, defining one or more elements for base characters
   and ruby, would have made ruby visible even in browsers not aware
   of the new markup. However, to provide reasonable rendering in
   these cases, complicated rules about the removal of parentheses
   would have had to be introduced.

   Using an attribute to indicate ruby also has the disadvantage that
   only the whole string of ruby, but not individual characters in it,
   can be given a special appearance. As it is highly unlikely that
   such a feature has ever been used anywhere, this is not really a
   problem.


Acknowledgements

   I am grateful in particular to the following persons for their
   advice and help: Junichiro Kida, Literary Critic, Japan; Yasuo
   Kida, Apple Japan; Tatsuo L.
   Kobayashi, Just Systems, Japan; Francois Yergeau, Alis Technology,
   Canada; Gavin Nicol, ETB, Tokyo; Martin Brian, The SGML Centre, UK;
   the organizers of the 8th Unicode conference; the participants of
   the I18N workshop at the 1996 WWW conference in Paris.


Bibliography

   [DUR96]      M.J. Duerst, "Ruby in HTML", May 1996.

   [GOLD90]     C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
                Oxford University Press, 1990.

   [HTML]       T. Berners-Lee and D. Connolly, "Hypertext Markup
                Language - 2.0" (RFC1866), MIT/W3C, November 1995.

   [HTML-I18N]  F. Yergeau, G. Nicol, G. Adams, and M. Duerst,
                "Internationalization of the Hypertext Markup
                Language", Work in progress
                (draft-ietf-html-i18n-05.txt), August 1996.

   [JIS95]      Japanese Industrial Standards Committee, "Line
                composition Rules in Japanese", Japanese Industrial
                Standard JIS X 4051-1995 (in Japanese).


Author's Address

   Martin J. Duerst
   Multimedia-Laboratory
   Department of Computer Science
   University of Zurich
   Winterthurerstrasse 190
   CH-8057 Zurich
   Switzerland

   Tel: +41 1 257 43 16
   Fax: +41 1 363 00 35
   E-mail: mduerst@ifi.unizh.ch

      NOTE -- Please write the author's name with u-Umlaut wherever
      possible, e.g. in HTML as D&uuml;rst.
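
   As an informal illustration of the first rendering possibility in
   Section 3 (ruby displayed in-line, in parentheses after their base
   characters), the following Python sketch may be helpful. It is not
   part of the proposal. The names RubyInlineRenderer and
   render_inline, the use of SPAN in the sample markup, and the ruby
   values "kobayashi", "ko", and "bayashi" are assumptions made for
   this example only. Where RUBY attributes are nested, the sketch
   uses the outermost one and ignores the others, which is only one of
   the alternatives allowed by Section 2.4.

```python
from html.parser import HTMLParser


class RubyInlineRenderer(HTMLParser):
    """Hypothetical renderer: base text followed by its ruby in parentheses."""

    def __init__(self, show_ruby=True):
        super().__init__()
        self.show_ruby = show_ruby  # the "switch off ruby display" option
        self.out = []               # pieces of rendered text
        self.stack = []             # (tag, ruby to emit on close, or None)
        self.ruby_in_use = False    # True while inside an element whose RUBY is used

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so RUBY arrives as "ruby".
        ruby = dict(attrs).get("ruby")
        # Nested RUBY attributes are alternatives, not cumulative:
        # this sketch keeps the outermost one and ignores inner ones.
        if ruby is not None and self.show_ruby and not self.ruby_in_use:
            self.stack.append((tag, ruby))
            self.ruby_in_use = True
        else:
            self.stack.append((tag, None))

    def handle_endtag(self, tag):
        # Ignore end tags that have no matching open element.
        if tag not in (open_tag for open_tag, _ in self.stack):
            return
        # Pop back to the matching start tag; this also discards entries
        # left behind by elements that have no end tag (for example BR).
        while self.stack:
            open_tag, ruby = self.stack.pop()
            if ruby is not None:
                self.out.append("(" + ruby + ")")
                self.ruby_in_use = False
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)


def render_inline(markup, show_ruby=True):
    """Return the text of `markup` with ruby placed in-line after the base text."""
    renderer = RubyInlineRenderer(show_ruby)
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.out)
```

   With the assumed markup <SPAN RUBY="ko">KO</SPAN> <SPAN
   RUBY="bayashi">HAYASHI</SPAN>, render_inline returns
   "KO(ko) HAYASHI(bayashi)". If both SPAN elements are wrapped in an
   outer <SPAN RUBY="kobayashi">, it returns "KO HAYASHI(kobayashi)",
   and with show_ruby=False it returns "KO HAYASHI". A renderer that
   breaks lines between KO and HAYASHI would prefer the inner
   attributes instead.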