Network Working Group J. Hildebrand, Ed.
Internet-Draft Cisco Systems, Inc.
Intended status: Informational H. Flanagan, Ed.
Expires: April 30, 2015 RFC Editor
October 27, 2014

HyperText Markup Language Request For Comments Format
draft-hildebrand-html-rfc-04

Abstract

This document defines the HTML format that will be rendered from the canonical XML format for an RFC. The HTML output will include a default CSS to enable page layout.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 30, 2015.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

As described in [I-D.flanagan-rfc-framework], the RFC Series is changing. One of those changes is that the RFC Editor will publish a non-canonical HTML version of RFCs.

This memo describes the HTML format that will be used as one of the publication formats for the RFC Series. It defines a strict subset of HTML appropriate for RFC Series documents. The visual layout of the document will be defined through a cascading style sheet (CSS) [W3C.REC-CSS2-20110607]. The CSS will be included in the HTML file but will be described in a separate document.

2. Requirements for the HTML Format

This section lists the design requirements used to create the HTML format described in this document. These requirements build on those found in [RFC6949].

The HTML has to render correctly on a list of browser versions that the RFC Editor will keep up to date outside of this document.

These requirements are expected to change in the future to reflect the expectation that HTML rendering will be required for current versions of browsers and platforms, while ideally continuing to render correctly on earlier versions.

The HTML documents may be re-rendered from the canonical XML format in the future to ensure the ongoing readability of the documents. The intent is that any re-rendering would be due to exceptional circumstances rather than for minor annoyances.

The HTML must display adequately in at least one text-based browser. Some consumers of the RFC series can only access the series on text-based terminals.

The HTML document will be self-contained, without requiring external files for images, CSS, JavaScript, or the like. This will allow the HTML file to be moved over various non-HTTP transports (such as e-mail, FTP, and rsync) without breakage.

Any use of JavaScript in the HTML document will not negatively impact the ability to read the document. Some consumers of the RFC series routinely disable JavaScript for security purposes.

The HTML document will allow easy local override of the default CSS formatting. This will allow users who have a different visual style that they prefer to make RFCs display with that style without having to alter the contents of the HTML document. This might also be valuable for allowing people with specific accessibility needs to use a customized CSS.

HTML tags in documents will rarely have attributes whose only purpose is to affect the rendered styling, and those will only be used if it would not be possible to specify that styling in CSS.

Both user-defined and autogenerated anchors must be supported and linkable, with user-defined anchors appearing in an "id" attribute. Autogenerated anchors will be generated for every heading, paragraph, and so on, not just those that do not have user-defined anchors. User-defined anchors may, and autogenerated anchors will, appear next to paragraphs, figures, tables, blockquotes, and section titles.

All sections, subsections, figures, and paragraphs should have stable numbered link anchors. Additionally, anchors expressed in the source XML should be exposed as anchors in the HTML as well.

The HTML must make it easy to separate sections along with all of their subsections into separate files. This will make creating EPUB documents easier in the future.

The abstract must be marked up or tagged in a way that popular search engines will extract it as a summary.

The format will consist of a subset of HTML deemed to be widely implemented by common browsers at the time the specification is created, likely to continue to be widely implemented, and unlikely to cause security issues. This will maximize the chances that future HTML renderers (such as new web browsers) will continue to produce readable text from the HTML format without the format needing to be changed frequently.

Normative information must be easily accessible to the following consumers:

Specific instances where goals for accessibility are important in the design choices of the format have been called out in the text.

NOTE: designing for these consumers does not preclude the use of features they cannot use, but does require that key semantic data is not lost when read using the tools and settings that are required by a given constituency.

3. HTML Version

The RFC Editor will periodically determine which version of the HTML specification will be referenced for tools generating the format defined in this document. The starting version will be that defined in [W3C.PR-html5-20140916], commonly known as "HTML5". Although the HTML specification mandates several of the syntax and structure rules in this document, they are called out here for emphasis.

4. HTML Syntax

The processor emitting HTML from the XML source will follow these rules:

NOTE: none of these rules affect the rendered output of the HTML; they are intended to make difference tools that operate on the HTML source easier to write.

5. Prologue

The front matter of the HTML format contains processing information, metadata of various types, and styling information that applies to the document as a whole. This section describes HTML that is not necessarily a direct transform from the XML format. For more details on each of the tags that generate content in this section, see Section 6.

5.1. DOCTYPE

The DOCTYPE of the document is "html", which declares that the document is compliant with HTML5. The document will start with exactly this string:

<!DOCTYPE html>

5.2. Root Element

The root element of the document is <html>. This element includes a lang attribute, whose value is a [RFC5646] language tag describing the natural language of the document. The language of the RFC Series is English and so the language tag to be included is 'en'.

5.3. Head Element

The root <html> will contain a <head> element that contains the following elements, as needed.

5.3.1. Charset Declaration

In order to be correctly processed by browsers that load the HTML using a mechanism that does not provide a valid MIME content-type or charset (such as from a local file system using a "file:" URL), the HTML <head> element contains a <meta> element with a charset attribute whose value is 'utf-8'.

5.3.2. Document Title

The contents of the <title> element from the XML source will be placed inside an HTML <title> element in the header.

5.3.3. Document metadata

The following <meta> elements will be included:

5.3.4. Style

The <head> element contains an embedded CSS stylesheet in a <style> element. The styles in the stylesheet are to be set consistently between documents by the RFC Editor, according to the best practices of the day.

To ensure consistent formatting, individual style attributes are not used in the main portion of the document source except in highly exceptional circumstances; each use of such attributes will be individually justified.

Different readers of a specification will desire different formatting when reading the HTML versions of RFCs. To facilitate this, the <head> element also includes a <link> to a stylesheet in the same directory as the HTML file, named "rfc-local.css". Any formatting in the linked stylesheet will override the formatting in the included stylesheet.

<style>
  body {}
  ...
</style>
<link rel="stylesheet" type="text/css" href="rfc-local.css">
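
As a non-normative illustration, the prologue pieces described in Sections 5.1 through 5.3.4 could be assembled as in the following Python sketch. The function name render_head and its parameters are hypothetical and not part of this format. The <meta> elements of Section 5.3.3 are omitted, and the caller is assumed to supply CSS text that does not itself contain a closing style tag.

```python
from html import escape


def render_head(title: str, css: str, local_css: str = "rfc-local.css") -> str:
    """Return the DOCTYPE, the opening <html> tag, and the <head> element."""
    lines = [
        "<!DOCTYPE html>",                  # Section 5.1: exact DOCTYPE string
        '<html lang="en">',                 # Section 5.2: language tag 'en'
        "<head>",
        '  <meta charset="utf-8">',         # Section 5.3.1
        f"  <title>{escape(title, quote=False)}</title>",
        "  <style>",
        css,                                # embedded default stylesheet
        "  </style>",
        # The local stylesheet is linked after the embedded one so that
        # its rules can override the defaults.
        f'  <link rel="stylesheet" type="text/css" href="{escape(local_css)}">',
        "</head>",
    ]
    return "\n".join(lines)
```

Emitting the <link> after the <style> element mirrors the example above; with equal specificity, later rules win in the CSS cascade, which is what lets "rfc-local.css" override the embedded styles.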

5.4. Document Information

Information about the document as a whole appears in a <dl> element with id="identifiers", which is the first child element of the HTML <body> element. The defined terms in the definition list are "Workgroup:", "Series:", "Status:", "Published:", and "Authors:".

<dl id="identifiers">
  <dt>Workgroup:</dt>
    <dd class="workgroup">rfc-interest</dd>
  <dt>Series:</dt>
    <dd class="series">Internet-Draft</dd>
  <dt>Status:</dt>
    <dd class="status">Informational</dd>
  <dt>Published:</dt>
    <dd><time datetime="2014-10-25"
              class="published">2014-10-25</time></dd>
  <dt>Authors:</dt>
    <dd class="authors">
      <div class="author">
        <span class="initial">J.</span>
        <span class="surname">Hildebrand</span>
        (<span class="organization">Cisco Systems, Inc.</span>)</div>
      <div class="author">
        <span class="initial">H.</span>
        <span class="surname">Flanagan</span>
        (<span class="organization">RFC Editor</span>)</div>
    </dd>
</dl>

5.5. Table of Contents

To be documented.

5.6. Index

To be documented.

5.7. Authors' Addresses

At the end of the document, author information will be included inside an HTML <address> element.

<address class="vcard">
  <span class="n hidden">
    <span class="family-name">Hildebrand</span>
    <span class="given-name">Joe</span>
  </span>
  <span class="nickname hidden">hildjj</span>
  <span class="fn">Joe Hildebrand</span>
  <span class="org">Cisco Systems, Inc.</span>
  <a class="email"
     href="mailto:jhildebr@example.com">jhildebr@example.com</a>
  <div class="adr">
    <div class="street-address">1899 Wynkoop St, Suite 600</div>
    <div>
      <span class="locality">Denver</span>,
      <span class="region">CO</span>
      <span class="postal-code">80202</span>
    </div>
    <div class="country-name">US</div>
  </div>
</address>

Figure 1: Sample author information

5.8. IDs

HTML elements that are generated from XML elements that include an anchor attribute will use the value of the anchor attribute (prepended by "#") as the id of the corresponding HTML element. If there is no anchor attribute, the slugifiedName attribute of the contained <name> element will be used. Otherwise, the partNumber attribute will be used, where it exists.

Some HTML constructs (such as <section> [element.section]) will use multiple of these identifiers.
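
The order of precedence described in this section can be sketched as follows. This is a non-normative illustration: the function name choose_id and its parameter names are hypothetical, and the sketch returns the chosen value unchanged, leaving any prefixing to the caller.

```python
from typing import Optional


def choose_id(anchor: Optional[str] = None,
              slugified_name: Optional[str] = None,
              part_number: Optional[str] = None) -> Optional[str]:
    """Pick an identifier: anchor first, then slugifiedName, then partNumber."""
    for candidate in (anchor, slugified_name, part_number):
        if candidate:  # skip values that are missing or empty
            return candidate
    return None  # the source element offers no usable identifier
```

For example, choose_id(None, "n-introduction", "s-1") yields "n-introduction", since no anchor attribute is present.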

5.9. Pilcrows

Each paragraph, artwork, or sourcecode segment outside of a <figure> or <table> element will be appended with a space and a "pilcrow" (U+00B6: PILCROW SIGN), otherwise known as a "paragraph sign". For the purposes of clarity, in this document pilcrows are rendered as "&para;".

The pilcrow will normally be invisible unless the element it is attached to is moused over. The pilcrow will be surrounded by a link that points to the element it is attached to.

Pilcrows are never included inside <table> or <figure> elements, since the figure number or table number serves as an adequate link target.

Elements that might otherwise contain a pilcrow do not get marked with a pilcrow if they contain one or more child elements that are marked with a pilcrow.

<blockquote id="p-1.2-1">
  <p id="p-1.2-2">Four score and seven years ago our fathers brought
    forth on this continent, a new nation, conceived in Liberty, and
    dedicated to the proposition that all men are created equal.
    <a href="#p-1.2-2" class="pilcrow">&para;</a></p>
  <!-- NO pilcrow here -->
</blockquote>
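
The rule that a container is not marked when one of its children is marked can be sketched in Python as follows. This is a non-normative illustration under several assumptions: the Node class, the PILCROW_TAGS set, and both function names are hypothetical, only direct children are inspected, and the exclusion of <figure> and <table> content is not modeled.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: HTML tags that are candidates for a pilcrow in this sketch.
PILCROW_TAGS = {"p", "pre", "aside", "blockquote", "li"}


@dataclass
class Node:
    tag: str
    id: str
    children: List["Node"] = field(default_factory=list)


def has_pilcrow(node: Node) -> bool:
    """True if this element itself is marked with a pilcrow."""
    if node.tag not in PILCROW_TAGS:
        return False
    # A candidate is skipped when any direct child is already marked.
    return not any(has_pilcrow(child) for child in node.children)


def pilcrow_link(elem_id: str) -> str:
    """Return the link that wraps the pilcrow and points at its element."""
    return f'<a href="#{elem_id}" class="pilcrow">&para;</a>'
```

Applied to the example above, the <p> element is marked and the enclosing <blockquote> is not.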

6. Elements

This section describes how each of the XML elements from [I-D.hoffman-xml2rfc] is rendered to HTML.

6.1. <abstract>

The abstract is rendered similarly to a <section> [element.section] with anchor="abstract" and <name>Abstract</name>, but without a section number.

<section id="abstract">
  <h2><a href="#abstract" class="self-ref">Abstract</a></h2>
  <p id="p-abstract-1">This document defines...
    <a href="#p-abstract-1" class="pilcrow">&para;</a>
  </p>
</section>

6.2. <address>

To be documented.

6.3. <annotation>

To be documented.

6.4. <area>

Not currently rendered to HTML.

6.5. <artwork>

Artwork can either consist of inline text or SVG. If the artwork is not inside a <figure> element, a pilcrow [pilcrows] is included. Inside a <figure> element, the figure title serves the purpose of the pilcrow.

6.5.1. Text Artwork

Text artwork is rendered inside an HTML <pre> element. Note that CDATA blocks do not work consistently in HTML, so all <, >, and & must be escaped as &lt;, &gt;, and &amp;, respectively.

The <pre> element will have CSS classes of "artwork" and "art-" prepended to the value of the <artwork>'s "type" attribute, if it exists.

<pre class="artwork art-ascii-art" id="p-2-16">
 ______________
&lt; hello, world &gt;
 --------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
<a href="#p-2-16" class="pilcrow">&para;</a>
</pre>
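
The escaping and class rules for text artwork can be sketched as follows. This is a non-normative illustration: the function name render_text_artwork and its parameters are hypothetical, and the exact placement of line breaks inside the <pre> element is an assumption.

```python
from html import escape
from typing import Optional


def render_text_artwork(text: str, elem_id: str,
                        art_type: Optional[str] = None,
                        in_figure: bool = False) -> str:
    """Render inline text artwork as an HTML <pre> element."""
    classes = "artwork"
    if art_type:
        classes += " art-" + art_type  # "art-" prepended to the type value
    # Escape &, <, and > only; quotes are left as-is in element content.
    body = escape(text, quote=False)
    # Inside a <figure>, the figure title stands in for the pilcrow.
    tail = "" if in_figure else f'\n<a href="#{elem_id}" class="pilcrow">&para;</a>'
    return f'<pre class="{classes}" id="{elem_id}">\n{body}{tail}</pre>'
```

Escaping "&" before "<" and ">" matters, since otherwise the ampersands introduced by "&lt;" and "&gt;" would be escaped a second time; html.escape handles this ordering internally.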

6.5.2. SVG Artwork

SVG artwork MUST be included inline. The SVG is wrapped in a <div> element with CSS classes "artwork" and "art-svg".

<div class="artwork art-svg" id="p-2-17">
  <svg width="100" height="100">
    <circle
      cx="50" cy="50" r="40"
      stroke="green" stroke-width="4" fill="yellow" />
  </svg>
  <a href="#p-2-17" class="pilcrow">&para;</a>
</div>

6.6. <aside>

This element is rendered as an HTML <aside> element, with a pilcrow [pilcrows] added.

<aside id="p-1.2-6">A little more than kin, and less than kind.
  <a href="#p-1.2-6" class="pilcrow">&para;</a>
</aside>

6.7. <author>

The <author> element from the <front> of the document is rendered into the Document Information (Section 5.4), the Document metadata (Section 5.3.3), and the Authors' Addresses (Section 5.7). See each of those sections for details.

6.8. <b>

This element is directly rendered as its HTML counterpart.

6.9. <back>

This element does not add any direct output to HTML.

6.10. <bcp14>

This element marks up words like MUST and SHOULD with an HTML <span> element with the CSS class "bcp14".

You <span class="bcp14">MUST</span> be joking.

6.11. <blockquote>

This element renders as the similar HTML <blockquote> element. If there is a "cite" attribute, it is copied to the HTML cite attribute. If there is a "quoteFrom" attribute, it is placed inside a <cite> element at the end of the quote, with an <a> element surrounding it (if there is a "cite" attribute), linking to the "cite" URL.

If the blockquote does not contain another element that gets a pilcrow [pilcrows], a pilcrow is added.

<blockquote id="p-1.2-1"
  cite="http://...">
  <p id="p-1.2-2">Four score and seven years ago our fathers
    brought forth on this continent, a new nation, conceived
    in Liberty, and dedicated to the proposition that all men
    are created equal.
    <a href="#p-1.2-2" class="pilcrow">&para;</a>
  </p>
  <cite>— <a href="http://...">Abraham Lincoln</a></cite>
</blockquote>
  

6.12. <boilerplate>

The IPR boilerplate for the document appears directly after the Abstract. The children of the input <boilerplate> element are treated similarly to sections.

<section id="status-of-this-memo">
  <h2 id="s-boilerplate-1">
    <a href="#status-of-this-memo" class="self-ref">
      Status of this Memo</a>
  </h2>
  <p id="p-boilerplate-1-1">This Internet-Draft is submitted in full
    conformance with the provisions of BCP 78 and BCP 79.
    <a href="#p-boilerplate-1-1" class="pilcrow">&para;</a>
  </p>
...
  

6.13. <city>

This element is rendered as a <span> element with CSS class "locality".

<span class='locality'>Denver</span>

6.14. <code>

This element is rendered as a <span> element with CSS class "postal-code".

<span class="postal-code">80202</span>

6.15. <country>

This element is rendered as a <div> element with CSS class "country-name".

<div class="country-name">US</div>

6.16. <cref>

This element is rendered as a <span> element with CSS class "cref".

<span class="cref">This is a comment</span>

6.17. <date>

This element is rendered as the HTML <time> element. If the "year", "month", or "day" attribute is included on the XML element, an appropriate "datetime" attribute will be generated in HTML.

If this date is a child of the <front> element, it gets the CSS class "published".

<time datetime="2014-10" class="published">October 2014</time>
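
One way to derive the "datetime" value from the available date parts is sketched below. This is a non-normative illustration: the function name render_date is hypothetical, the inputs are assumed to be numeric, and the layout of the visible text is an assumption rather than something this document defines.

```python
from typing import Optional

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def render_date(year: int, month: Optional[int] = None,
                day: Optional[int] = None, published: bool = False) -> str:
    """Render a date as <time>, using only the parts that are present."""
    datetime_value = f"{year:04d}"
    text = str(year)
    if month is not None:
        datetime_value += f"-{month:02d}"
        text = f"{MONTHS[month - 1]} {year}"
        if day is not None:  # a day is only meaningful alongside a month
            datetime_value += f"-{day:02d}"
            text = f"{day} {MONTHS[month - 1]} {year}"
    # Dates that are children of <front> carry the "published" class.
    class_attr = ' class="published"' if published else ""
    return f'<time datetime="{datetime_value}"{class_attr}>{text}</time>'
```

Calling render_date(2014, 10, published=True) reproduces the example above.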

6.18. <dd>

This element is directly rendered as its HTML counterpart.

6.19. <displayreference>

To be documented, pending RFC editor guidance on desired semantics.

6.20. <dl>

This element is directly rendered as its HTML counterpart.

6.21. <dt>

This element is directly rendered as its HTML counterpart.

6.22. <em>

This element is directly rendered as its HTML counterpart.

6.23. <email>

This element is rendered as an HTML <a> element, with "href" attribute set to the equivalent "mailto:" URI, and CSS class of "email".

<a class="email"
   href="mailto:jhildebr@example.com">jhildebr@example.com</a>

6.24. <eref>

This element is rendered as an HTML <a> element, with the "href" attribute set to the value of the "target" attribute, and the CSS class of "eref".

<a href="https://..." class="eref">the text</a>

6.25. <figure>

This element renders as the HTML <figure> element, containing the artwork or sourcecode indicated and an HTML <figcaption> element. The <figcaption> will contain an <a> element with CSS class "self-ref" around the figure number. It will also contain another <a> element with CSS class "self-ref" around the figure name, if a name was given.

<figure id="f-1">
  ...
  <figcaption>
    <a href="#f-1" class="self-ref">Figure 1.</a>
    <a href="#n-it-figures" class="self-ref">It figures</a>
  </figcaption>
</figure>
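
The caption structure described above can be sketched as follows. This is a non-normative illustration: the function name render_figcaption and its parameters are hypothetical, and the "f-" prefix for the figure identifier is taken from the example rather than from a stated rule.

```python
from html import escape
from typing import Optional


def render_figcaption(number: int, name: Optional[str] = None,
                      name_id: Optional[str] = None) -> str:
    """Render a <figcaption> with self-referencing links."""
    lines = [
        "<figcaption>",
        f'  <a href="#f-{number}" class="self-ref">Figure {number}.</a>',
    ]
    # The second link is only emitted when the figure was given a name.
    if name and name_id:
        lines.append(
            f'  <a href="#{name_id}" class="self-ref">'
            f"{escape(name, quote=False)}</a>"
        )
    lines.append("</figcaption>")
    return "\n".join(lines)
```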

6.26. <front>

This element does not add any direct output to HTML.

6.27. <i>

This element is directly rendered as its HTML counterpart.

6.28. <iref>

To be documented, once the <iref> element is better documented.

6.29. <keyword>

Each of these elements renders its text into the <meta> keywords in the document's header, separated by commas.

<meta name="keywords" content="html,css,rfc">
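
Collecting the keywords into a single <meta> element can be sketched as follows. This is a non-normative illustration: the function name render_keywords is hypothetical, and trimming surrounding whitespace from each keyword is an assumption.

```python
from html import escape
from typing import Iterable


def render_keywords(keywords: Iterable[str]) -> str:
    """Join keyword texts with commas inside one <meta> element."""
    content = ",".join(keyword.strip() for keyword in keywords)
    # Quotes are escaped too, because the text lands in an attribute value.
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'
```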

6.30. <li>

This element is rendered as its HTML counterpart. However, if no contained element has a pilcrow [pilcrows] attached, a pilcrow is added.

<li id="p-2-7">Item <a href="#p-2-7" class="pilcrow">&para;</a></li>

6.31. <link>

To be documented, pending RFC editor guidance on desired semantics.

6.32. <middle>

This element does not add any direct output to HTML.

6.33. <name>

This element is never rendered directly, but instead when considering its parent element, such as <section> [element.section].

6.34. <note>

This element is rendered similarly to a <section> [element.section], but without a section number, and with the CSS class of "note". If the "removeInRFC" attribute is set to "yes", the generated element will also include the CSS class "rfceditor-remove".

<section id="s-note-1" class="note rfceditor-remove">
  <h2>
    <a href="#n-editorial-note" class="self-ref">Editorial Note</a>
  </h2>
  <p id="p-note-1-1">
    Discussion of this draft takes place...
    <a href="#p-note-1-1" class="pilcrow">&para;</a>
  </p>
</section>

6.35. <ol>

This element is directly rendered as its HTML counterpart, with the minor exception that if the spacing attribute has the value "compact", a CSS class of "olcompact" will be added.

6.36. <organization>

This element is rendered as an HTML <span> tag with CSS class "org".

<span class="org">Cisco Systems, Inc.</span>

6.37. <phone>

6.38. <postal>

This element renders as an HTML <div> with CSS class "adr".

<div class="adr">...</div>

6.39. <postalLine>

This element renders as an HTML <div> with CSS class "street-address".

<div class="street-address">1899 Wynkoop St, Suite 600</div>

6.40. <refcontent>

This element renders as an HTML <span> with CSS class "refcontent".

<span class="refcontent">Self-published pamphlet</span>

6.41. <reference>

This element will render as a <dt> <dd> pair, with the defined term being the reference "anchor" attribute surrounded by square brackets, and the definition including the correct set of bibliographic information as specified by [RFC7322]. The <dt> element will have an "id" attribute of the reference anchor.

<dl class="reference">
  <dt id="RFC5646">[RFC5646]</dt>
  <dd><span class="refauthor">Phillips, A.</span> ...</dd>
</dl>

6.42. <referencegroup>

To be documented, pending RFC editor guidance on desired semantics.

6.43. <references>

If there is at least one <references> element, a "References" section is added to the document, continuing with the next major section number after the last <section> [element.section].

Each references element will be added to that "References" section as if it were a section itself.

<section id="n-references">
  <h2 id="s-3">
    <a href="#s-3" class="self-ref">3.</a>
    <a href="#n-references" class="self-ref">References</a>
  </h2>
  <section id="n-informative-references">
    <h3 id="s-3.1">
      <a href="#s-3.1" class="self-ref">3.1.</a>
      <a href="#n-informative-references" class="self-ref">
        Informative References</a></h3>
    <dl class="reference">...
    </dl>
  </section>
</section>

6.44. <region>

This element is rendered as a <span> element with CSS class "region".

<span class="region">CO</span>

6.45. <rfc>

Various attributes of this element are represented in the HTML document.

6.46. <section>

This element is rendered as an HTML <section> element, containing an appropriate level HTML heading element (<h2>-<h6>). That heading element contains an <a> element around the sectionNumber, if applicable (e.g., <abstract> does not get a section number). Another <a> element is included with the section's name.

<section id="intro">
  <h2 id="s-1">
    <a href="#s-1" class="self-ref">1.</a>
    <a href="#intro" class="self-ref">Introduction</a>
  </h2>
  <p id="p-1-1">Paragraph <a href="#p-1-1" class="pilcrow">&para;</a>
  </p>
</section>
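
The heading structure can be sketched as follows. This is a non-normative illustration: the function name render_section_heading and its parameters are hypothetical, the "s-" identifier prefix is taken from the examples, and clamping deeply nested sections to <h6> is an assumption.

```python
from html import escape
from typing import Optional


def render_section_heading(depth: int, name: str, name_id: str,
                           number: Optional[str] = None) -> str:
    """Render the heading of a section nested `depth` levels deep (1 = top)."""
    level = min(depth + 1, 6)  # top-level sections use <h2>; never beyond <h6>
    links = []
    if number:
        open_tag = f'<h{level} id="s-{number}">'
        links.append(f'  <a href="#s-{number}" class="self-ref">{number}.</a>')
    else:
        # Unnumbered parts, such as the abstract, get no number link.
        open_tag = f"<h{level}>"
    links.append(
        f'  <a href="#{name_id}" class="self-ref">{escape(name, quote=False)}</a>'
    )
    return "\n".join([open_tag, *links, f"</h{level}>"])
```

Calling render_section_heading(1, "Introduction", "intro", "1") reproduces the heading in the example above.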

6.47. <seriesInfo>

This element is rendered in an HTML <span> element with CSS class "seriesInfo".

<span class="seriesInfo">RFC 5646</span>

6.48. <sourcecode>

This element is rendered in an HTML <pre> with a CSS class of "sourcecode". Note that CDATA blocks do not work consistently in HTML, so all <, >, and & must be escaped as &lt;, &gt;, and &amp;, respectively. If the input XML has a "type" attribute, another CSS class of "lang-" and the type is added.

If the sourcecode is not inside a <figure> element, a pilcrow [pilcrows] is included. Inside a <figure> element, the figure title serves the purpose of the pilcrow.

<pre class="sourcecode lang-c">
#include &lt;stdio.h&gt;

int main(void)
{
    printf(&quot;hello, world\n&quot;);
    return 0;
}
</pre>

6.49. <street>

This element renders as an HTML <div> with CSS class "street-address".

<div class="street-address">1899 Wynkoop St, Suite 600</div>

6.50. <strong>

This element is directly rendered as its HTML counterpart.

6.51. <sub>

This element is directly rendered as its HTML counterpart.

6.52. <sup>

This element is directly rendered as its HTML counterpart.

6.53. <t>

This element is rendered as an HTML <p> element. A pilcrow [pilcrows] is included.

<p id="p-1-1">A paragraph.
  <a href="#p-1-1" class="pilcrow">&para;</a></p>

6.54. <table>

This element is directly rendered as its HTML counterpart.

6.55. <tbody>

This element is directly rendered as its HTML counterpart.

6.56. <td>

This element is directly rendered as its HTML counterpart.

6.57. <tfoot>

This element is directly rendered as its HTML counterpart.

6.58. <th>

This element is directly rendered as its HTML counterpart.

6.59. <thead>

This element is directly rendered as its HTML counterpart.

6.60. <title>

The title of the document appears in an <h1> element, and follows directly after the Document Information. The <h1> element has an id attribute with value "title". For example:

<h1 id="title">HyperText Markup Language Request For
    Comments Format</h1>

6.61. <tr>

This element is directly rendered as its HTML counterpart.

6.62. <tt>

This element is directly rendered as its HTML counterpart.

6.63. <ul>

This element is directly rendered as its HTML counterpart, with the minor exception that if the spacing attribute has the value "compact", a CSS class of "ulcompact" will be added.
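
The handling of the spacing attribute, which Section 6.35 describes in the same way for <ol>, can be sketched as follows. This is a non-normative illustration: the function name list_open_tag is hypothetical, and other list attributes are ignored.

```python
from typing import Optional


def list_open_tag(tag: str, spacing: Optional[str] = None) -> str:
    """Return the opening HTML tag for a <ul> or <ol> list."""
    if tag not in ("ul", "ol"):
        raise ValueError(f"not a list element: {tag}")
    if spacing == "compact":
        # The class name is the tag name followed by "compact".
        return f'<{tag} class="{tag}compact">'
    return f"<{tag}>"
```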

6.64. <uri>

This element is rendered as an HTML <a> element, containing the URI as both the "href" attribute and the linked text.

6.65. <workgroup>

This element does not add any direct output to HTML.

6.66. <xref>

This element is rendered as an HTML <a> element containing an appropriate local link as the "href" attribute.

7. IANA Considerations

This document contains no actions for IANA.

8. Security Considerations

Since RFCs are sometimes exchanged outside the normal Web sandboxing mechanism (e.g., rsync to a mirror) and then loaded from a local file, more care must be taken with the HTML than is ordinary on the web.

9. Acknowledgments

The authors gratefully acknowledge the contributions of Patrick Linskey and the members of the RFC Format Design Team (Nevil Brownlee, Ted Lemon, Paul Hoffman, Julian Reschke, Adam Roach, Alice Russo, Robert Sparks, Dave Thaler).

10. References

10.1. Normative References

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[W3C.PR-html5-20140916] Berjon, R., Faulkner, S., Leithead, T., Navara, E., O'Connor, E. and S. Pfeiffer, "HTML5", World Wide Web Consortium PR PR-html5-20140916, September 2014.
[W3C.REC-CSS2-20110607] Bos, B., Celik, T., Hickson, I. and H. Lie, "Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification", World Wide Web Consortium Recommendation REC-CSS2-20110607, June 2011.
[I-D.flanagan-rfc-framework] Flanagan, H., "RFC Format Framework", Internet-Draft draft-flanagan-rfc-framework-01, September 2014.

10.2. Informative References

[RFC5646] Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 5646, September 2009.
[RFC6949] Flanagan, H. and N. Brownlee, "RFC Series Format Requirements and Future Development", RFC 6949, May 2013.
[RFC7322] Flanagan, H. and S. Ginoza, "RFC Style Guide", RFC 7322, September 2014.
[I-D.hoffman-xml2rfc] Hoffman, P., "The 'XML2RFC' version 3 Vocabulary", Internet-Draft draft-hoffman-xml2rfc-08, May 2014.

Authors' Addresses

Joe Hildebrand (editor) Cisco Systems, Inc. EMail: jhildebr@cisco.com
Heather Flanagan (editor) RFC Editor EMail: rse@rfc-editor.org