INTERNET DRAFT                                          Davide Musella
draft-musella-html-metatag-03.txt
                                 Institute for Multimedia Technologies
                                             National Research Council
                                                         24 March 1997
Expires in six months

                         The META Tag of HTML

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast) or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to:

   Davide Musella
   (e-Mail) davide@itim.mi.cnr.it
   (voice)  +39.(0)2.70643271
   (fax)    +39.(0)2.70643292

Abstract

This document defines a strict syntax for cataloguing an HTML document using the META tag of HTML. The definition provides a base set of cataloguing keys that serves as a preliminary classification method.

1 - Introduction

At present the syntax of the META HTTP-EQUIV tag is not strict, which allows different key words to be used to define the same thing. Functions like this: or could represent the same concept with two different syntaxes.

The aim of this Draft is to define the key words that describe the content of an HTML document, without excluding a more specific classification carried out with different techniques.
The method used to accomplish this was defined at the "Distributed Indexing/Searching Workshop" [http://www.w3.org/pub/www/Searching/9605-Indexing-Workshop/index.html]. It provides for a defined prefix that indicates which cataloguing method is used to describe a classification key.

2 - The META Tag

The META element is used within the HEAD element to embed document meta-information not defined by other HTML elements. Such information can be extracted by servers/clients for use in identifying, indexing and cataloguing specialized document meta-information.

It is generally preferable to use named elements that have well defined semantics for each type of meta-information. The META element is provided for situations where strict SGML parsing is necessary and the local DTD is not extensible.

In addition, HTTP servers can read the content of the document head to generate response headers corresponding to any element defining a value for the attribute HTTP-EQUIV. This provides document authors with a mechanism (not necessarily the preferred one) for identifying information that should be included in the response headers of an HTTP request.

The META element has three attributes:

   NAME
   HTTP-EQUIV
   CONTENT

The META tag may be used anywhere in the HEAD part. Multiple META tags referring to the same string must be considered tied, and their contents combined (concatenated as a comma-separated list).

3 - NAME

This attribute can be used to define properties such as "number of pages" or "preferred browser", or any other information an author wants to insert in his document. An example: or

Do not use the META element to define information that should be associated with an existing HTML element.

4 - HTTP-EQUIV

This attribute binds the element to an HTTP response header. If the semantics of the HTTP response header named by this attribute are known, then the contents can be processed based on a well defined syntactic mapping, whether or not the DTD includes anything about it.
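The handling of HTTP-EQUIV and of repeated META tags described in sections 2 and 4 can be illustrated with a minimal sketch. It assumes Python's standard html.parser module; the class MetaCollector, the function collect_meta and the sample markup (including its CONTENT values) are hypothetical names and data chosen for illustration, not part of this Draft. Lower-casing the attribute value used as key is likewise an assumption, made because HTTP header names are case-insensitive.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect META HTTP-EQUIV and NAME values found inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.http_equiv = {}  # header name -> combined CONTENT
        self.named = {}       # NAME property -> combined CONTENT

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            content = attr.get("content")
            if content is None:
                return
            if attr.get("http-equiv"):
                self._combine(self.http_equiv, attr["http-equiv"], content)
            elif attr.get("name"):
                self._combine(self.named, attr["name"], content)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

    @staticmethod
    def _combine(store, key, content):
        # Assumption: keys are compared case-insensitively.
        key = key.strip().lower()
        content = content.strip()
        # Tags referring to the same string are tied: their contents
        # are concatenated as a comma-separated list.
        store[key] = store[key] + ", " + content if key in store else content


def collect_meta(html_text):
    """Return (http_equiv, named) dictionaries for one HTML document."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.http_equiv, parser.named


# Hypothetical sample document, for illustration only.
SAMPLE = """<html><head>
<title>Sample</title>
<meta http-equiv="Keywords" content="Red ball">
<META HTTP-EQUIV="keywords" CONTENT="White pen">
<meta name="preferred browser" content="any">
</head><body><meta http-equiv="Ignored" content="x"></body></html>"""

if __name__ == "__main__":
    equiv, named = collect_meta(SAMPLE)
    print(equiv)  # {'keywords': 'Red ball, White pen'}
    print(named)  # {'preferred browser': 'any'}
```

A server following this sketch would emit one response header per key of the first dictionary. The META element placed in the BODY of the sample is ignored, since only the HEAD part is inspected.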
An HTTP server must process these tags for a HEAD HTTP request.

Do not name an HTTP-EQUIV attribute the same as a response header that should typically only be generated by the HTTP server. Some inappropriate names are "Server", "Date", and "Last-Modified". Whether a name is inappropriate depends on the particular server implementation. It is recommended that servers ignore any META element that specifies HTTP equivalents (case insensitively) to their own reserved response headers.

The HTTP-EQUIV attribute has the same semantics as the NAME attribute, with the sole exception of its effect on the HTTP response headers.

5 - CONTENT

Used to supply a value for a named property. It may contain more than one piece of information.

6 - Cataloguing an HTML document

An HTML document can be classified with the META tag; using this method the author can control how his document is indexed.

The intention is to define a base set of meta information that is "normal_user oriented". The idea is that most authors of HTML documents have no specialist background: they are neither librarians nor Internet specialists, so their knowledge of cataloguing problems is limited. Internet users normally avoid using what they do not know; therefore, to encourage the use of meta information, I have defined the following keys for a first rough cataloguing of an HTML document:

Author: to indicate the author(s) of the document. Ex: To differentiate the name from the surname, they must be separated with an underscore character "_" (ASCII[95]), giving the name(s) first and then the surname; so an example could be:

Description: to describe the contents of the document. It must be reasonably shorter than the whole document. Ex:

Keywords: to indicate the keywords of the document. It is a sequence of comma-separated phrases.
To represent this concept with boolean logic, the AND operator is represented by the SPACE (ASCII[32]) and the OR operator by the COMMA (ASCII[44]). The AND operator is processed before the OR operator. So a string like this: "Red ball, White pen" means: "(Red AND ball) OR (White AND pen)". Ex: Spaces between a comma and a word, or vice versa, are ignored.

Language: its content specifies the language in which the document is written. It is composed of two or three language-code letters, based on ISO-639 or ISO-639/2 respectively, optionally followed by a dash (ASCII[45]) and a two-letter ISO-3166 country code to represent national variants. Ex:

Publisher: to indicate the organization responsible for publishing the document in its current form. Ex:

Timestamp: to indicate when the document was authored (HTTP date format). Ex:

The TITLE information (concerning the title of the document) is considered to be given by the content of the TITLE tag, to avoid useless redundancy.

It is highly recommended to use the HTTP-EQUIV attribute instead of NAME, so that an agent can obtain this meta information without requesting the full document.

A more complex description of the text content can be added, without removing this meta information, using more specific techniques such as the Dublin Core or the MCF.
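The Keywords and Language rules of section 6 can be sketched as follows. This is only an illustration under stated assumptions: the function names parse_keywords, to_boolean_expression and is_valid_language are hypothetical, and the language check verifies only the shape of the value (two or three letters, optionally a dash and two letters), not whether the codes are actually registered in ISO-639 or ISO-3166.

```python
import re


def parse_keywords(content):
    """Split a Keywords value into OR-alternatives of AND-ed words.

    COMMA (ASCII 44) separates the OR alternatives; SPACE (ASCII 32)
    separates the AND-ed words. AND is therefore processed before OR.
    Spaces around commas and empty alternatives are ignored.
    """
    alternatives = []
    for phrase in content.split(","):
        words = [w for w in phrase.split(" ") if w]
        if words:
            alternatives.append(words)
    return alternatives


def to_boolean_expression(content):
    """Render a Keywords value as an explicit boolean expression."""
    groups = parse_keywords(content)
    return " OR ".join("(" + " AND ".join(g) + ")" for g in groups)


# Shape check only: 2 or 3 letters, optionally "-" and 2 letters.
LANGUAGE_RE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z]{2})?")


def is_valid_language(value):
    """Return True if value has the shape of a Language key content."""
    return LANGUAGE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    print(to_boolean_expression("Red ball, White pen"))
    # (Red AND ball) OR (White AND pen)
    print(is_valid_language("en-US"), is_valid_language("en_US"))
    # True False
```

Keeping the parsed form as a list of word lists lets an indexing agent evaluate the expression directly: a document matches if all the words of at least one inner list are present.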
Appendix 1

HTTP date format

The HTTP date format is defined as:

   HTTP-date    = rfc1123-date | rfc850-date | asctime-date

where

   rfc1123-date = wkday "," SP date1 SP time SP "GMT"
   rfc850-date  = weekday "," SP date2 SP time SP "GMT"
   asctime-date = wkday SP date3 SP time SP 4DIGIT

but the RFC850 format and the asctime format are obsolete (they are kept for backward compatibility), so it is highly recommended to use the rfc1123 format:

   rfc1123-date = [wkday "," SP] date1 SP time

   date1        = 1*2DIGIT SP month SP 4DIGIT
                  ; day month year
                  ; Ex: 25 Feb 1997

   time         = hour zone

   hour         = 2DIGIT ":" 2DIGIT [":" 2DIGIT]
                  ; hours:minutes[:seconds]
                  ; Ex: 22:55:30

   wkday        = "Mon" | "Tue" | "Wed" | "Thu"
                | "Fri" | "Sat" | "Sun"

   month        = "Jan" | "Feb" | "Mar" | "Apr"
                | "May" | "Jun" | "Jul" | "Aug"
                | "Sep" | "Oct" | "Nov" | "Dec"

   zone         = "UT"  | "GMT"            ; Universal Time
                                           ; North American : UT
                | "EST" | "EDT"            ;  Eastern:  - 5 | - 4
                | "CST" | "CDT"            ;  Central:  - 6 | - 5
                | "MST" | "MDT"            ;  Mountain: - 7 | - 6
                | "PST" | "PDT"            ;  Pacific:  - 8 | - 7
                | 1ALPHA                   ; Military: Z = UT;
                | ( ("+" | "-") 4DIGIT )   ; Local differential
                                           ;  hours+min. (HHMM)

rfc1123-date examples:

   28 Apr 1997 19:30 GMT
   Mon, 28 Apr 1997 19:30:00 GMT
   28 Apr 1997 20:30 +0100
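The recommended rfc1123 format can be read with a short parser. The sketch below is an illustration, not a reference implementation: the function name parse_rfc1123_date is hypothetical, a single space is assumed between the hour and the zone (as in the examples above), the weekday is accepted but not checked against the date, and of the military zones only "Z" is supported.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Named zones from the grammar, as hour offsets from Universal Time.
NAMED_ZONES = {"UT": 0, "GMT": 0, "Z": 0,
               "EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
               "MST": -7, "MDT": -6, "PST": -8, "PDT": -7}

DATE_RE = re.compile(
    r"(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"             # [wkday "," SP]
    r"(?P<day>\d{1,2}) (?P<month>[A-Za-z]{3}) (?P<year>\d{4}) "  # date1
    r"(?P<hour>\d{2}):(?P<minute>\d{2})(?::(?P<second>\d{2}))? "  # hour
    r"(?P<zone>[A-Z]{1,3}|[+-]\d{4})"                      # zone
)


def parse_rfc1123_date(text):
    """Parse an rfc1123-date string into a timezone-aware datetime."""
    m = DATE_RE.fullmatch(text.strip())
    if m is None or m["month"] not in MONTHS:
        raise ValueError("not an rfc1123-date: %r" % text)

    zone = m["zone"]
    if zone[0] in "+-":
        # Local differential: sign, then hours and minutes (HHMM).
        sign = 1 if zone[0] == "+" else -1
        offset = sign * timedelta(hours=int(zone[1:3]),
                                  minutes=int(zone[3:5]))
    elif zone in NAMED_ZONES:
        offset = timedelta(hours=NAMED_ZONES[zone])
    else:
        raise ValueError("unsupported zone: %r" % zone)

    return datetime(int(m["year"]), MONTHS.index(m["month"]) + 1,
                    int(m["day"]), int(m["hour"]), int(m["minute"]),
                    int(m["second"] or 0), tzinfo=timezone(offset))


if __name__ == "__main__":
    a = parse_rfc1123_date("28 Apr 1997 19:30 GMT")
    b = parse_rfc1123_date("Mon, 28 Apr 1997 19:30:00 GMT")
    c = parse_rfc1123_date("28 Apr 1997 20:30 +0100")
    print(a == b == c)  # True
```

The three example dates of this appendix denote the same instant, which the comparison at the end confirms: timezone-aware datetime values compare by their position in Universal Time.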