Internet Engineering Task Force L. Masinter
Internet-Draft Adobe
Intended status: Informational January 11, 2011
Expires: July 15, 2011
MIME and the Web
draft-masinter-mime-web-info-02
Abstract
This document describes some of the ways in which parts of the MIME
system, originally designed for electronic mail, have been used in
the Web, and some of the ways in which those uses have resulted in
difficulties. Given this background and justification, this document
then goes on to outline requirements for changes to MIME registries
and practices for their use within W3C and IETF, in order to address
those difficulties. Within IETF, it is expected that a companion
Best Current Practice document will make specific changes to the
Internet Media Types and Charset registries, among others.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 15, 2011.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
Masinter Expires July 15, 2011 [Page 1]
Internet-Draft MIME and the Web January 2011
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. History . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Origins of MIME . . . . . . . . . . . . . . . . . . . . . 3
2.2. Introducing MIME into the Web . . . . . . . . . . . . . . 4
2.3. Distributed Extensibility . . . . . . . . . . . . . . . . 5
3. Problems with application to the Web . . . . . . . . . . . . . 5
3.1. Lack of clarity . . . . . . . . . . . . . . . . . . . . . 5
3.2. Differences between email and Web delivery . . . . . . . . 6
3.3. The Rules Weren't Quite Followed . . . . . . . . . . . . . 7
3.4. Consequences . . . . . . . . . . . . . . . . . . . . . . . 9
3.5. The Down Side of Extensibility . . . . . . . . . . . . . . 9
4. Additional considerations . . . . . . . . . . . . . . . . . . 9
4.1. There are related problems with charsets . . . . . . . . . 10
4.2. Embedded, downloaded, launch independent application . . . 10
4.3. Additional Use Cases: Polyglot and Multiview . . . . . . . 10
4.4. Evolution, Versioning, Forking . . . . . . . . . . . . . . 11
4.5. Content Negotiation . . . . . . . . . . . . . . . . . . . 12
4.6. Fragment identifiers . . . . . . . . . . . . . . . . . . . 12
5. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 12
5.1. Internet Media Type registration . . . . . . . . . . . . . 13
5.1.1. MIME registry magic numbers for sniffing . . . . . . . 13
5.1.2. Scripting and scriptable content safety . . . . . . . 13
5.1.3. Fragment identifiers . . . . . . . . . . . . . . . . . 13
5.1.4. Application info . . . . . . . . . . . . . . . . . . . 13
5.1.5. File extensions in registry . . . . . . . . . . . . . 14
5.2. Sniffing . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2.1. Sniffing uses Media Type magic number . . . . . . . . 14
5.2.2. Sniffing when there are multiple different
definitions . . . . . . . . . . . . . . . . . . . . . 14
5.2.3. Sniffing charsets . . . . . . . . . . . . . . . . . . 14
5.2.4. Sniffing security uses scriptability info . . . . . . 14
5.3. Changes to IANA processes for MIME registries . . . . . . 15
5.4. FTP specification . . . . . . . . . . . . . . . . . . . . 15
5.5. Update some URI definitions . . . . . . . . . . . . . . . 15
5.6. Changes to W3C findings, processes . . . . . . . . . . . . 15
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 15
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
8. Security Considerations . . . . . . . . . . . . . . . . . . . 16
9. Informative References . . . . . . . . . . . . . . . . . . . . 16
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 17
Masinter Expires July 15, 2011 [Page 2]
Internet-Draft MIME and the Web January 2011
1. Introduction
This document was prompted by discussions about Web architecture and
the difficulties surrounding evolution of the Web, Internet Media
types, multiple specifications for a single media type, and related
discussions.
The document gives some of the history of MIME and its introduction
and use in the Web (Section 2). It then describes some of the current
difficulties with the use of MIME in the Web context (Section 3). This
background and context are then followed by a description of changes
which would reduce some of those difficulties; the changes involve
specifications, practices, and registries within IETF and W3C
(Section 5). In particular, changes to the registry and maintenance
procedures for MIME-related registries maintained by IANA are
described.
Currently, discussion of this document is suggested on the mailing
list www-tag@w3.org (open for subscription to all), with archives at
http://lists.w3.org/Archives/Public/www-tag/.
2. History
2.1. Origins of MIME
MIME ("Multipurpose Internet Mail Extensions") was originally invented
for email, based on general principles of "messaging" (a
foundational architecture framework). The role of MIME was to extend
Internet email messaging from ASCII-only plain text to include other
character sets, images, rich documents, etc. [RFC1521], [RFC1522].
The basic architecture of complex content messaging is:
o Message sent from A to B.
o Message includes some data. Sender A includes standard 'headers'
telling recipient B enough information that recipient B knows how
sender A intends the message to be interpreted.
o Recipient B gets the message, interprets the headers for the data,
and uses them as information on how to interpret the data.
MIME is a "tagging and bagging" specification:
tagging: How to label content so the intent of how the content
should be interpreted is known.
bagging: How to wrap the content so the label is clear, or, if there
are multiple parts to a single message, how to combine them.
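The two roles can be seen in a small sketch using Python's standard
"email" package (an illustrative choice; the addresses, file name, and
handler strings are hypothetical). Sender A tags each part with a media
type and bags the parts into one multipart message; recipient B reads
each tag to decide how to interpret the part it labels.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Sender A: tag two pieces of content and bag them into one message."""
    msg = EmailMessage()
    msg["From"] = "a@example.org"  # hypothetical addresses
    msg["To"] = "b@example.org"
    msg["Subject"] = "Tagging and bagging"
    # Tagging: set_content() labels this body text/plain with a charset.
    msg.set_content("Hello, B.\n")
    # Bagging: adding an attachment turns the message into multipart/mixed,
    # with the text and the image as separately labeled parts.
    # Placeholder bytes: only the 8-byte PNG signature, not a full image.
    msg.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image",
                       subtype="png", filename="logo.png")
    return msg


def interpret(msg: EmailMessage) -> list:
    """Recipient B: use each part's label to choose how to handle it."""
    actions = []
    for part in msg.iter_parts():
        ctype = part.get_content_type()
        if ctype == "text/plain":
            actions.append("display text: " + part.get_content().strip())
        elif ctype.startswith("image/"):
            actions.append("render image as " + ctype)
        else:
            actions.append("store opaque data labeled " + ctype)
    return actions


if __name__ == "__main__":
    for action in interpret(build_message()):
        print(action)
```

Note that B never inspects the bytes themselves; the decision rests
entirely on the labels A supplied.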
"MIME types" (renamed "Internet Media Types" in later specs
[RFC2046]) are part of the "tagging": a way to describe the content
of a message so that it can be used to initiate interpretation of
the message. The "Internet Media Type registry" (MIME type registry)
is where someone can tell the world what a particular label means,
both as the sender's intent for how recipients should process a
message of that type and as a description, for senders, of a
recipient's capabilities.
2.2. Introducing MIME into the Web
The original World Wide Web (the 0.9 version of HTTP, see [RFC1945])
didn't have "tagging and bagging": everything sent via HTTP was
assumed to be HTML. However, at the time (the early 1990s) other
distributed information access systems, including Gopher (a
distributed menu system) and WAIS (remote access to document
databases), were adding capabilities for accessing many things other
than text and hypertext, and the WWW developers were considering type
tagging. It was agreed that HTTP should use MIME as the vocabulary for
talking about file types and character sets. The result was that
HTTP/1.0 added the "Content-Type" header, following (more or less)
MIME. Later, for content negotiation, additional uses of this
technology (in "Accept" headers) were also added.
The differences between the use of Internet Media Types in email and
in HTTP were minor:
o default charset: HTTP originally specified ISO-8859-1 as the
default character set, not US-ASCII ((NEED REF TO HTTP ISSUE see
http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20; the text
that it refers to currently is here:
http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-11.html#rfc.section.2.3.1 ))
o requirement for CRLF in plain text: in practice, Web clients
didn't require text/* MIME bodies to use CRLF line endings.
These minor differences have caused a lot of trouble.
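The practical effect is easy to see when the same unlabeled bytes are
decoded under each protocol's default. The sketch below assumes only
the two defaults named in the list above (US-ASCII for MIME email,
ISO-8859-1 for HTTP as originally specified); the function names are
hypothetical, and Python's "email" package is used only to parse the
Content-Type parameter.

```python
from email.message import Message

# Default charset for text/* when the label carries no charset parameter,
# per protocol, as described in the text (hypothetical table).
DEFAULT_CHARSET = {"email": "us-ascii", "http": "iso-8859-1"}


def effective_charset(content_type: str, protocol: str) -> str:
    """Return the labeled charset, or the protocol's default if none is given."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, or None.
    return header.get_content_charset() or DEFAULT_CHARSET[protocol]


def decode_text(body: bytes, content_type: str, protocol: str) -> list:
    """Decode a text/* body and split it into lines.

    splitlines() accepts bare LF as well as CRLF, mirroring Web clients
    that never insisted on CRLF line endings in text/* bodies.
    """
    return body.decode(effective_charset(content_type, protocol)).splitlines()
```

For example, the Latin-1 bytes b"caf\xe9\n" labeled only "text/plain"
decode to "café" under the HTTP default, but raise UnicodeDecodeError
under the email default, because 0xE9 is outside US-ASCII.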
2.3. Distributed Extensibility
The real advantage of using Internet Media Types to label content
was that the Web was no longer restricted to a single format. This
one addition meant expanding from Global Hypertext to Global
Hypermedia (as suggested in a 1992 email [connolly92]):
+-------------------------------------------------------------------+
| The Internet currently serves as the backbone for a global |
| hypertext. FTP and email provided a good start, and the gopher, |
| WWW, or WAIS clients and servers make wide area information |
| browsing simple. These systems even interoperate, with email |
| servers talking to FTP servers, WWW clients talking to gopher |
| servers, on and on. |
| This currently works quite well for text. But what should WWW |
| clients do as Gopher and WAIS servers begin to serve up pictures, |
| sounds, movies, spreadsheet templates, postscript files, etc.? |
| It would be a shame for each to adopt its own multimedia typing |
| system. |
| If they all adopt the MIME typing system (and as many other |
| features from MIME as are appropriate), we can step from global |
| hypertext to global hypermedia that much easier. |
+-------------------------------------------------------------------+
The fact that HTTP could reliably transport images of different
formats, for example, allowed NCSA to add <img> to HTML. MIME
allowed other document formats (Word, PDF, PostScript) and other
kinds of hypermedia, as well as other applications, to be part of the
Web. MIME was arguably the most important extensibility mechanism in
the Web.
3. Problems with application to the Web
Unfortunately, while the use of Internet Media Types for the Web
added incredible power, a number of problems have arisen.
3.1. Lack of clarity
Many people are confused about the purpose of MIME in the Web, its
uses, and the meaning of Internet Media Types. Many W3C
specifications, TAG findings, and Internet Media Type registrations
make incorrect assumptions about the meaning and purpose of an
Internet Media Type registration.
3.2. Differences between email and Web delivery
Some of the differences between the application contexts of email and
Web delivery lead to different requirements:
o In the Web, the transfer of data is initiated differently than in
email: the "messages" with labeled content are usually HTTP
responses to a specific (GET) request (although the request is
itself a message, a GET request has no content). In the most common
case, then, the recipient asked for the data and already knows
something about it before it has been sent.
o Clients would like to know more about the content before they
retrieve it. The "tagging" is often not sufficient to know, for
example, "can I interpret this if I retrieve it", because of
versioning, capabilities, or dependencies on things like screen
size or interaction capabilities of the recipient.
o Some content isn't delivered over HTTP (files on a local file
system), or there is no opportunity for tagging (data delivered
over FTP), and in those cases some other way is needed for
determining file type.
Operating systems used (and continue to evolve) other ways of
determining the 'type' of something, distinct from MIME tagging
and bagging:
o 'magic numbers': in many contexts, file types can be guessed by
looking for some unique string, number, or pattern which only
appears in files of that type. Where this was a unique number, it
was called a "magic number", although the concept has been
extended to other textual patterns.
o Originally, Mac OS had a 4-character 'file type' and another
4-character 'creator code' for files.
o Windows evolved to use the "file extension" (3 letters, and later
more, at the end of the file name) as the initial determination
of the overall type of a file. This practice has now extended to
other systems.
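When there is no label, these mechanisms can be combined into a guess.
A minimal sketch: the signatures listed are widely known ones, the
extension table is a hypothetical stand-in for OS or server settings,
and checking content before the extension is a design choice made
here, not something this document prescribes.

```python
import os
from typing import Optional

# A few widely known signatures ("magic numbers") at the start of a file.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]

# Hypothetical extension table; real systems consult OS or server settings.
EXTENSIONS = {
    ".png": "image/png",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}


def sniff_magic(data: bytes) -> Optional[str]:
    """Return the media type whose signature prefixes data, if any."""
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type
    return None


def guess_type(filename: str, data: bytes) -> str:
    """Guess a type without a label: content signature first, then extension."""
    sniffed = sniff_magic(data)
    if sniffed is not None:
        return sniffed
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(extension, "application/octet-stream")
```

With this ordering, a PNG file misnamed "photo.jpg" is still reported
as image/png, while content without a recognized signature falls back
to its extension.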
Information about these other ways of determining type (rather than
by the Content-Type label) was gathered for the Internet Media Type
registry; those registering types are encouraged to also describe
'magic numbers', Macintosh file types, and common file extensions.
However, since there was no formal use of that information, the quality of
that information in the registry is haphazard.
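Because these fields are free text with no defined consumer, any tool
wanting to use them would first have to parse and sanity-check them. A
minimal sketch, assuming field labels modeled on the "Additional
information" part of the media type registration template and a
made-up registration for a hypothetical application/example type:

```python
# Labels modeled on the "Additional information" part of the media type
# registration template (assumed here; check the current template).
FIELDS = ("Magic number(s)", "File extension(s)", "Macintosh file type code(s)")
EMPTY_VALUES = {"", "none", "n/a"}


def additional_info(template: str) -> dict:
    """Collect the type-detection fields from a registration template."""
    info = {}
    for line in template.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name in FIELDS:
            info[name] = value.strip()
    return info


def unusable_fields(template: str) -> list:
    """List detection fields that are absent or carry no usable value."""
    info = additional_info(template)
    return [f for f in FIELDS if info.get(f, "").lower() in EMPTY_VALUES]


# Hypothetical registration text, for illustration only.
EXAMPLE = """\
Type name: application
Subtype name: example
Additional information:
  Magic number(s): none
  File extension(s): .exm
  Macintosh file type code(s):
"""
```

Here only the file extension would be usable by a sniffer; the other
two fields give it nothing to work with.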
Finally, while tagging and bagging might be adequate for unilaterally
initiated (one-way) messaging, a Web recipient might want to know
whether it could handle the data before retrieving and interpreting
it, and Internet Media Types alone weren't enough to tell.
3.3. The Rules Weren't Quite Followed
The behavior of the community hasn't matched the expectations held
when the Internet Media Type registry was designed:
o Lots of file types aren't registered (they have no entry in the
IANA registry).
o For many file types that are registered, the registration is
incomplete or incorrect (people doing registration didn't
understand 'magic number' or other fields).
o The actual content deployed or created by deployed software
doesn't match the registration.
These problems arise for various reasons, for example:
o The benefit of registration to the organization that designed the
file type is unclear compared to the overhead of shepherding the
registration through the process.
o Registration requires announcing product plans in advance of
product release.
o Organizations are unaware of the registration process or are
misinformed about it.
In particular, Web implementations of Internet Media Types diverged
from expected behavior:
o Browser implementors were liberal in what they accepted, and
used what looked like a file extension in the URL and/or magic
numbers or other "sniffing" techniques to decide file type, without
assuming the Content-Type label was authoritative. This was
necessary anyway for files that weren't delivered by HTTP.
o HTTP server implementors and administrators didn't supply ways of
easily associating the 'intended' file type label with the file,
resulting in files frequently being delivered with a label other
than the one they would have chosen if they had thought about it
and if browsers *had* treated Content-Type as authoritative.
Some popular servers had default configuration files that treated
any unknown type as "text/plain" (plain text in ASCII). Since it
didn't matter (the browsers worked anyway), it was hard to get
this fixed.
Thus, in many situations, because of poor control over server
administration or weak file-type detection in popular web server
technology, receivers might find that 'magic number' scanning was
more reliable than the actual labeled content-type.
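Incorrect senders coupled with liberal readers wind up feeding a
self-reinforcing (and harmful) feedback loop based on the robustness
principle ([WikiRobust], [RFC3117]): because readers tolerate
mislabeled content, senders have little reason to fix their labels,
and because labels stay wrong, readers must remain liberal.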
In addition, since the "magic number" technology is heuristic, it is
possible for different formats to share the same "magic number" or,
more generally, for a single piece of content to be reasonably
"sniffed" as more than one format.
First, there are cases where the reuse of one file type's magic
number for another file type is intentional: deliberate "puns", or
attempts to usurp another vendor's, group's, or standards
organization's control over a file format.
Second, there are cases where a single file might match more than
one 'magic number' or recognition pattern, and different receivers
apply heuristics differently.
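This case can be made concrete. In the sketch below, two hypothetical
receivers apply the same two recognition rules in a different order to
the same bytes (a GIF signature followed by HTML markup) and reach
different conclusions. The rules and sample content are illustrative
assumptions, not any particular browser's algorithm.

```python
from typing import Callable, List, Optional

Rule = Callable[[bytes], Optional[str]]


def gif_rule(data: bytes) -> Optional[str]:
    """Recognize the GIF signature at the start of the data."""
    return "image/gif" if data[:6] in (b"GIF87a", b"GIF89a") else None


def html_rule(data: bytes) -> Optional[str]:
    """Recognize an HTML tag anywhere in the first 512 bytes."""
    return "text/html" if b"<html" in data[:512].lower() else None


def sniff(data: bytes, rules: List[Rule]) -> str:
    """Apply rules in order; the first one that matches decides."""
    for rule in rules:
        result = rule(data)
        if result is not None:
            return result
    return "application/octet-stream"


# One (hypothetical) file that plausibly matches both patterns.
polyglot = b"GIF89a\x01\x00\x01\x00<html><body>hi</body></html>"

receiver_a = [gif_rule, html_rule]  # checks image signatures first
receiver_b = [html_rule, gif_rule]  # looks for markup first
```

Receiver A treats the file as image/gif and receiver B as text/html,
even though neither rule is wrong on its own terms.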
Finally, there are simple cases where the labeled type (text/plain,
application/octet-stream) is more general and could reasonably be
used with content which might otherwise match other patterns.
For example, consider the sniffing that some web browsers apply to
text/plain. If you serve a perfectly valid text file with the
content:
Rufus
Kitty
the browser will not display it (there are intentionally mismatched
tags on the 3rd line). Something like this might come up, for
example, if you had a bug database with links to the text of
documents that caused problems. This buggy XML, served as
text/plain, should render, but it does not in browsers that incorrectly
guess "application/xml".
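The failure can be reproduced with a short sketch. The sample text is
an invented stand-in with a mismatched closing tag on its third line,
and the leading-"<" test is a deliberately naive, hypothetical
heuristic; the point is only that a receiver which overrides a
text/plain label can turn a displayable file into a parse error.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for a plain-text bug report quoting broken XML.
BUG_REPORT = "<pets>\n<cat>Rufus</cat>\n<cat>Kitty</dog>\n</pets>\n"


def naive_sniff(label: str, body: str) -> str:
    """Hypothetical heuristic: treat text/plain that starts with '<' as XML."""
    if label == "text/plain" and body.lstrip().startswith("<"):
        return "application/xml"
    return label


def render(label: str, body: str) -> str:
    """Show text as-is; show XML only if it is well-formed."""
    if label == "application/xml":
        try:
            ET.fromstring(body)
        except ET.ParseError:
            return "error: not well-formed XML"
        return "rendered XML tree"
    return body


# Respecting the label displays the file; sniffing breaks it.
respecting = render("text/plain", BUG_REPORT)
sniffing = render(naive_sniff("text/plain", BUG_REPORT), BUG_REPORT)
```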
The ".
[RFC1521] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and
Describing the Format of Internet Message Bodies",
RFC 1521, September 1993.
[RFC1522] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
Part Two: Message Header Extensions for Non-ASCII Text",
RFC 1522, September 1993.
[RFC1945] Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext
Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
November 1996.
[RFC3117] Rose, M., "On the Design of Application Protocols",
RFC 3117, November 2001.
[Widgets] Caceres, M., "Widget Packaging and Configuration".
[WikiRobust]
"Robustness principle", 2010.
[connolly92]
Connolly, D., "Global Hypermedia", October 1992.
[mime-sniff]
Barth, A. and I. Hickson, "Media Type Sniffing",
December 2010.
Author's Address
Larry Masinter
Adobe
345 Park Ave.
San Jose, 95110
USA
Phone: +1 408 536 3024
Email: masinter@adobe.com
URI: http://larry.masinter.net