INTERNET-DRAFT
Vancouver Webpages
July 2003 (Expires Feb 2004)

Geographic registration of HTML documents

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This memo describes a method of registering HTML documents with a specific geographic location by means of embedded META tags. The content of the META tags gives the geographic position of the resource described by the HTML document in terms of Latitude, Longitude, and optionally Elevation in a simple, machine-readable manner. This information may be used for automated resource discovery by means of an HTML indexing agent or search engine.

1. Introduction

Many resources described by HTML documents on the World-Wide-Web are associated with a particular place on the Earth's surface. While resource discovery on the Web has thus far focussed on document title and open-text keyword searching, in these cases it may be beneficial to facilitate geographic searching. Examples of this kind of resource include pages describing restaurants, shipwrecks, retail stores, etc. Consumers may use this information in order to select the closest facility, and in order to navigate towards a resource by road, on foot or by other means.

This draft describes a method of adding static location data to legacy HTML documents using a construct that is familiar to many HTML authors. It is intended to be concise, unambiguous, simple to use and compatible with existing editing tools. The intended use is to provide location data to Web robots that typically revisit pages every few weeks.

It is anticipated that in many cases this location data will be added manually by persons unfamiliar with GIS terminology or metadata standards. For this reason a minimal data set with few options is preferred over a more complex and extensible one.

The method described in this draft is not intended to preempt existing or future metadata encapsulation schemes which may better serve the needs of a particular community, such as geographic information systems (GIS).

2. Coordinate Systems

Resource positions on the Earth's surface should be expressed in degrees of Latitude North and degrees of Longitude East, as signed decimal numbers. Where the precision of the coordinates is such that the datum used is significant, typically more precise than one kilometre distance, positions should be converted to the WGS 84 datum [3]. Elevations, if given, should be in metres above datum.

Positions given by a GPS set [4] with datum set to "WGS 84" will in most cases be adequate, of the order of 15 metres accuracy in horizontal position and 25 metres in elevation.

It should be noted that elevations referred to the WGS 84 geoid will in some areas differ appreciably from those measured with respect to local datum in coastal regions, which may be Mean High Water Springs, Mean Sea Level, Higher High Water or a similar reference level, and will differ substantially from "ground level". Use of elevation is not recommended unless its value may be reliably determined.

3. Implementation

HTML markup should be added to the document in the form of a META statement. This should be placed in the document head in accordance with the HTML 4 specification [1].

There are three GEO identifiers:

The identifier "geo.position" is used for Latitude, Longitude and optionally Elevation data.

The identifier "geo.region" is used for the country subdivision code from ISO 3166-2 [10].

The optional identifier "geo.placename" is used for a free text representation of the position, for example "city, province" or "town, county, state".

For resources within the United States and Canada, the "geo.region" identifier as given by ISO 3166-2 is typically constructed from the 2-character country code [5] as used in Internet domain names, and the common 2-character State/Province codes [8][9], joined with a hyphen, for example "CA-BC" for British Columbia, Canada. Where the official subdivision code is unknown, the 2-character country code alone may be used in "geo.region", for example "DE" for Germany.

The "geo.placename" identifier should not be used for indexing purposes, due to possible ambiguities in naming convention, language, word ordering and placename duplicates. It may be used for descriptive purposes.

If the resource described is localized to a country or region, but not to a single point, the "geo.region" identifier may be used alone without a corresponding "geo.position" identifier.

It is the intention of this draft to provide a means to associate a single point with an HTML document. Some consideration should be given to the choice of location when describing a resource, given that positioning mechanisms may provide an accuracy of the order of ten metres in horizontal position. For instance, when describing a retail store or small business, it may be more meaningful to give the position of the street entrance rather than the centroid of the property.

Although the HTML specification [1] states that the name field is in general case-sensitive, these GEO tags should be recognized by compliant agents regardless of case.
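
The case-insensitive recognition described above can be illustrated with a minimal sketch of an indexing agent, written here in Python using the standard html.parser module. The names GeoMetaParser and extract_geo_tags, and the sample markup, are hypothetical and are not part of this draft. The sketch keeps only the first value seen for each identifier, which is a simplification.

```python
from html.parser import HTMLParser

# The three identifiers named in this section, in lower case for comparison.
GEO_NAMES = {"geo.position", "geo.region", "geo.placename"}


class GeoMetaParser(HTMLParser):
    """Collect GEO META tags, matching the name attribute regardless of case."""

    def __init__(self):
        super().__init__()
        self.geo = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but leaves
        # attribute values untouched, so the name value is folded here.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip().lower()
        content = attr.get("content")
        if name in GEO_NAMES and content is not None:
            # Simplification (an assumption of this sketch): keep only the
            # first value seen for each identifier.
            self.geo.setdefault(name, content)


def extract_geo_tags(html_text):
    """Return a dict mapping lower-cased GEO identifiers to their content."""
    parser = GeoMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.geo


if __name__ == "__main__":
    # Hypothetical sample document with mixed-case tag and identifier names.
    sample = (
        '<html><head><title>Example</title>'
        '<META NAME="Geo.Position" CONTENT="48.54;-123.84;115">'
        '<meta name="GEO.REGION" content="CA-BC">'
        '</head><body></body></html>'
    )
    print(extract_geo_tags(sample))
```

The content values are returned as uninterpreted strings; parsing and validating the position is a separate step governed by the semantics and interpretation rules of this draft.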

Coordinates should be ordered (Latitude ; Longitude) as for RFC 2426, RFC 2445 (vCard and iCal specifications) [6][7]. If elevation is given, coordinates should be ordered (Latitude ; Longitude ; Elevation). (This is at variance with common GIS practice, but better matches the intended audience of this Draft.)

The Metadata Profile "http://geotags.com/geo" may be used as defined in [1] to define the geo tag properties.

4. Examples

   <META NAME="geo.position" CONTENT="48.54;-123.84;115">

describes a resource 115 metres above datum at position 48.54 degrees North, 123.84 degrees West, while

   <META NAME="geo.position" CONTENT="-10;60">

describes a resource at position 10 degrees South, 60 degrees East.

describes a resource in London, Ontario, Canada, while describes a resource in London, England (Great Britain).

The HTML attributes "lang", "dir" may be used to define the language and directionality for the "geo.placename" identifier as defined in [1], for instance

5. Semantics

Values for latitude and longitude shall be expressed as decimal fractions of degrees. Whole degrees of latitude shall be represented by a decimal number ranging from 0 through 90. Whole degrees of longitude shall be represented by a decimal number ranging from 0 through 180. When a decimal fraction of a degree is specified, it shall be separated from the whole number of degrees by a decimal point (the period character, "."). Decimal fractions of a degree should be expressed to the precision available, with trailing zeroes being used as placeholders if required. A decimal point is optional where the position is known only to the nearest whole degree.

Some effort should be made to preserve the apparent precision when converting from another datum or representation, for example 41 degrees 13 minutes should be represented as 41.22 and not 41.21666, while 41 13' 11" may be represented as 41.2197.

Latitudes north of the equator MAY be specified by a plus sign (+), or by the absence of a minus sign (-), preceding the designating degrees.
Latitudes south of the Equator MUST be designated by a minus sign (-) preceding the digits designating degrees. Latitudes on the Equator MUST be designated by a latitude value of 0.

Longitudes east of the prime meridian shall be specified by a plus sign (+), or by the absence of a minus sign (-), preceding the designating degrees. Longitudes west of the prime meridian MUST be designated by a minus sign (-) preceding the digits designating degrees. Longitudes on the prime meridian MUST be designated by a longitude value of 0. A point on the 180th meridian shall be taken as 180 degrees West, and shall include a minus sign.

Any spatial address with a latitude of +90 (90) or -90 degrees will specify a position at the True North or True South Poles, respectively. The component for longitude may have any legal value.

The vertical coordinate (Elevation) must be expressed in metres above WGS 84 datum. Points having zero elevation must not have a negative sign.

5.1 Interpretation

Whitespace within a position value shall be ignored.

An interpreting agent shall internally mark position values either valid or invalid. If a position is marked invalid, it shall not be used to index or qualify the containing document.

A position having a Latitude greater than 90 degrees, or less than -90 degrees, shall be marked invalid.

A position having a Longitude greater than 180 degrees, or less than -180 degrees, shall be marked invalid.

Where a value is given for geo.region, and the latitude and longitude values given for geo.position fall outside the recognized boundaries of this region, the position may be marked invalid. For example, if a region of "US" is given for a location in the US mainland, the position may be marked invalid if the Latitude is negative or the Longitude is positive.

No formal reliance shall be placed on the precision implicit in position data.
It is likely that few content providers are qualified to determine reliable precision or accuracy data, and may use position data from other sources which does not give the datum.

6. Formal Syntax

   DIGIT      = %x30-39           ; 0-9
   PLUS       = %x2B              ; +
   MINUS      = %x2D              ; -
   DECIMAL    = %x2E              ; .
   SEMI       = %x3B              ; ;
   CRLF       = %x0D.0A           ; return, linefeed
   SP         = %x20              ; space
   HTAB       = %x09              ; tab
   WSP        = SP / HTAB
   LWSP       = (WSP / CRLF WSP)  ; linear whitespace
   UCASE      = %x41-5A           ; A-Z
   HYPHEN     = %x2D              ; -
   USCORE     = %x5F              ; _

   country    = 2UCASE            ; 2-letter code from ISO3166
   region     = 1*3UCASE / 2DIGIT ; region code from ISO3166-2
   TEXT       =
   placename  = 1*TEXT
   delimiter  = SEMI
   latitude   = [ MINUS / PLUS ] 0*2DIGIT [ DECIMAL *DIGIT ]
   longitude  = [ MINUS / PLUS ] 0*3DIGIT [ DECIMAL *DIGIT ]
   elevation  = [ MINUS / PLUS ] 0*DIGIT [ DECIMAL *DIGIT ]
   position   = latitude delimiter longitude [ delimiter elevation ]
   georegion  = country [ ( HYPHEN / USCORE ) region ]

HTML syntax:

7. Applicability

As stated in the introduction, certain HTML documents may be associated with a geographic position, while other documents are not. For proper use of the GEO tags as described in this draft, the resource described in an HTML document should be associated with a particular geographic location for the lifetime of the document.

The tags may thus be properly used to describe an object fixed on the surface of the earth (or more properly, fixed in position relative to the surface of the earth) such as a retail store, a mountain peak or a railway station. They may not be used to describe a non-localized, moving, or intangible object such as a multinational company, river, aircraft or mathematical theory.

The geographic position given is associated with the resource described by the HTML document, not with the physical location of the document [2], or the location of the company responsible for publishing or hosting the document.
Thus, in some cases the country code used in "geo.region" may differ from the country code forming part of the host address in the document URL.

Since the position given is associated with the content of the document, not the author, publishing and document conversion tools should not cache position data or store it in a template.

In cases where the object being described is an area, such as a lake or a building, the position of the object should not in general be given to greater precision than the width of the object. If desired, features within the object may be described in another page and their position given with greater precision. In the case of an object such as a place of business, where only one page exists, the position of the entrance may be given rather than the position of the centroid.

8. Security Considerations

This draft raises no security issues.

The intended use of GEO metadata as described in this draft raises no privacy issues beyond those associated with normal use of the Web.

Concern for privacy requires that personal information, such as a private address or location, not be published without the consent of the subject, and that due care be taken in the design of access control mechanisms when such personal information is present on an Internet-connected data storage system.

It is axiomatic that information including location data published on a public Web page is public, and that location-based queries may suggest the present or future location of the person making them in the same manner that text queries may suggest personal interests or plans.

It is suggested that publishing tools clearly indicate when potentially sensitive metadata that is normally not visible, such as position, author's name or address, is published to a public area.

Use of GEO metadata in an incorrect manner or in a manner other than that described may raise privacy issues.
For instance, a publishing system that incorrectly places the author's location on every page, and a mobile device which transmits its current location, both raise potential privacy issues.

An example of such a mobile device is an embedded diagnostic system in an automobile. Automatic inclusion of position data may lead to the user's location being determined remotely. In such a case, the device should be equipped with appropriate encryption and access controls to ensure the privacy of the user. Specification of such access controls is outside the scope of this draft.

9. Internationalization considerations

The "geo.placename" tag content is free text, and should obey the internationalization rules of HTML 4. "lang" and "dir" modifiers may be used to specify the language of the content. Multiple instances of geo.placename may be used with different "lang" modifiers.

Geo.placename content is coded using the character set of the containing document. Geo.position and geo.region tag content should use US-ASCII or UTF-8.

10. References

[1] Raggett, Le Hors, Jacobs, "HTML 4.01 Specification", http://www.w3.org/TR/html4/ , W3C, December 1999

[2] Davis et al., "A Means for Expressing Location Information in the Domain Name System", RFC 1876, January 1996 http://www.ietf.org/rfc/rfc1876.txt

[3] United States Department of Defense; DoD WGS-1984 - Its Definition and Relationships with Local Geodetic Systems; Washington, D.C.; 1985; Report AD-A188 815 DMA; 6127; 7-R-138-R; CV, KV;

[4] ARINC Research Corporation, "Navstar GPS Space Segment / Navigation User Interfaces", IRN-200C-002, September 1997

[5] International Organization For Standardization / Organisation Internationale De Normalisation (ISO), "Standard ISO 3166-1:1997: Codes for the Representation of Names of Countries and their subdivisions -- Part 1: Country codes", 1997.

[6] Dawson & Stenerson, Internet Calendaring and Scheduling Core Object Specification (iCalendar), RFC 2445, November 1998 http://www.ietf.org/rfc/rfc2445.txt

[7] Dawson & Howes, vCard MIME Directory Profile, RFC 2426, September 1998 http://www.ietf.org/rfc/rfc2426.txt

[8] United States Postal Service, Official Abbreviations - States and Possessions, http://www.usps.gov/ncsc/lookups/abbr_state.txt

[9] Canada Postal Guide, Province and Territory Symbols http://www.canadapost.ca/tools/pg/manual/b03-e.asp

[10] International Organization For Standardization / Organisation Internationale De Normalisation (ISO), "Standard ISO 3166-2:1998: Codes for the Representation of Names of Countries and their subdivisions -- Part 2: Country subdivision code", 1998.

11. Acknowledgments

Rohan Mahy and Patrik Fältström of Cisco Systems, for semantics.

12. Author's Address

Andrew Daviel, BSc.
Vancouver Webpages, Box 357
185-9040 Blundell Rd
Richmond BC
V6Y 1K3
Canada
Tel. (604)-377-4796
Fax. (604)-270-8285
mailto:andrew@vancouver-webpages.com

Felix A. Kaegi
Dipl.Informatik Ing. ETH (M.Sc.)
Friedensgasse 51
CH-4056 Basel
SWITZERLAND
+41 61 383 10 01
felix_kaegi@hotmail.com

13. Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.