Network Working Group                                        Jacob Palme
Internet Draft                                  Stockholm University/KTH
draft-palme-text-html-issues-00.txt                               Sweden
Category-to-be: None                                       December 1995
Expires June 1996


            Issues on sending HTML documents via MIME e-mail


Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind, since this
   document is mainly a compilation of information taken from other
   RFC-s. Distribution of this memo is unlimited.

Abstract

   This memo discusses some issues raised by
   draft-palme-text-html-00.txt, "The Text/HTML content type and the
   Content-Location MIME header or Sending HTML documents via MIME
   e-mail", and tries to summarize the discussion on that document.

Palme                                                           [Page 1]

draft-palme-text-html-issues-00.txt                        December 1995

Table of contents

   Issue 1: Syntax of embedding URL-s in message headers
   Issue 2: Allow hyperlinks outside multipart/related or not?
   Issue 3: Is multipart/related to be used at all
   Issue 4: Name of the Content-Location header
   Issue 5: Should the Base header (defined in RFC 1808) be renamed to
            "Content-Base"?
   Issue 6: Relative URL-s and the Content-Base header
   Issue 7: Relative URL-s referring to body parts within the same
            message
   Issue 8: The message itself as a Base URL
   Issue 9: Priority of bases
   Issue 10: Ambiguity of "Content-Base"
   Issue 11: Giving body parts names to be used in URL-s
   Issue 12: How to indicate that relative URL-s refer to body parts?
   Issue 13: Combination of Text/html and Multipart/Alternative
   Issue 14: What should "start" refer to
   Issue 15: Including remotely available objects in a message
   Issue 16: The use of mid-s and cid-s
   Issue 17: Content-Disposition inline or attachment
   Issue 18: Content-Location or Content-Disposition

Issue 1: Syntax of embedding URL-s in message headers

   Several different IETF memos require URL-s to be embedded in message
   headers:

   (i)   the Content-Location header as defined in
         "draft-palme-text-html-01.txt";
   (ii)  the Base (or Content-Base) header as defined in RFC 1808;
   (iii) the URL access-type defined in
         draft-ietf-mailext-acc-url-01.txt.

   Obviously we should agree on one common way of encoding URL-s in all
   cases where URL-s appear in message headers. The syntax problems
   are:

   (a) Which characters need encoding, and if so, which encoding scheme
       should be used?
   (b) How should line folding be handled, given that URL-s can be very
       long and blanks are not allowed in URL-s?

   draft-ietf-mailext-acc-url-01.txt defines this as follows:

      URL-parameter := <"> URL-word *(*LWSP-char URL-word) <">

      URL-word := token
                  ; Must not exceed 40 characters in length

      The syntax of an actual URL string is given in RFC 1738. URL
      strings can be of any length and can contain arbitrary character
      content. This presents problems when URLs are embedded in MIME
      body part headers that are wrapped according to RFC 822 rules.
      For this reason they are transformed into a URL-parameter for
      inclusion in a message/external-body content-type specification
      as follows:

      (1) A check is made to make sure that all occurrences of SPACE,
          CTLs, double quotes, backslashes, and 8-bit characters in the
          URL string are already encoded using the URL encoding scheme
          specified in RFC 1738. Any unencoded occurrences of these
          characters must be encoded. Note that the result of this
          operation is nothing more than a different representation of
          the original URL.

      (2) The resulting URL string is broken up into substrings of 40
          characters or less.

      (3) Each substring is placed in a URL-parameter string as a
          URL-word, separated by one or more spaces.

      Note that the enclosing quotes are always required since all URLs
      contain one or more colons, and colons are tspecial characters
      [RFC 1521].

      Extraction of the URL string from the URL-parameter is even
      simpler: The enclosing quotes and any linear whitespace are
      removed and the remaining material is the URL string.

   RFC 1808 uses the following definition:

      base-header  = "Base" ":" "<URL:" absoluteURI ">"

      where "Base" is case-insensitive and any whitespace (including
      that used for line folding) inside the angle brackets is ignored.
      For example, the header field [...]

   Which characters need encoding? Obviously any eight-bit characters
   in the URL must be encoded. But must ":" and "/" be encoded? Or is
   it enough to require <"> before and after the URL? Should <"> or
   "<" and ">" be used to surround the URL string?

Issue 2: Allow hyperlinks outside multipart/related or not?

   Issue specification: Should a text/html body part be allowed to
   contain hyperlinks to any other part of the same message, or only to
   other parts within the same multipart/related?
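   Returning briefly to Issue 1: the URL-parameter transformation
   quoted there from draft-ietf-mailext-acc-url-01.txt is mechanical
   enough to sketch. The Python below is an illustrative sketch, not
   code from that draft. The function names are hypothetical, it
   assumes non-ASCII characters are encoded as UTF-8 octets (the draft
   names no charset), and it leaves ":" and "/" unencoded, i.e. it
   takes one side of the open question above.

```python
from urllib.parse import quote

MAX_WORD_LENGTH = 40  # URL-word limit in draft-ietf-mailext-acc-url-01.txt


def encode_url_parameter(url: str) -> str:
    """Wrap a URL as a quoted URL-parameter (steps (1) to (3))."""
    # Step (1): %-encode SPACE, CTLs, double quote, backslash and 8-bit
    # characters. "%" itself is left alone, so existing escapes survive.
    # Non-ASCII characters become their UTF-8 octets (an assumption).
    pieces = []
    for ch in url:
        code = ord(ch)
        if code <= 0x20 or code >= 0x7F or ch in '"\\':
            pieces.append(quote(ch, safe=""))
        else:
            pieces.append(ch)
    encoded = "".join(pieces)
    # Step (2): break the string into substrings of at most 40 characters.
    words = [encoded[i:i + MAX_WORD_LENGTH]
             for i in range(0, len(encoded), MAX_WORD_LENGTH)]
    # Step (3): separate the words by spaces and add the enclosing quotes.
    return '"' + " ".join(words) + '"'


def decode_url_parameter(parameter: str) -> str:
    """Remove the enclosing quotes and all linear whitespace."""
    text = parameter.strip()
    if len(text) < 2 or not (text.startswith('"') and text.endswith('"')):
        raise ValueError("a URL-parameter must be enclosed in double quotes")
    # str.split() with no argument also swallows the CRLF of a folded line.
    return "".join(text[1:-1].split())


if __name__ == "__main__":
    long_url = ("http://www.example.com/a/fairly/long/path/"
                "to/some/document with space.html")
    parameter = encode_url_parameter(long_url)
    print(parameter)
    print(decode_url_parameter(parameter))
```

   Because decoding simply discards whitespace, a 40-character split
   that happens to fall inside a %XX escape does no harm, and a header
   folded between two URL-words decodes to the same URL.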
   Opinion A: The multipart/related header tells the mailer that "here
   come some body parts which are to be treated together in a special
   way"; as a consequence, a text/html body part should only be allowed
   to refer to other body parts within this multipart/related group of
   body parts.

   Opinion B: A text/html body part should be allowed to contain
   hyperlinks to any other body part in the same message (or, if CID or
   MID URL-s are used, to any body part in any other message).

   The argument for opinion A is that it makes things simpler for the
   receiving mail agent: when it gets a multipart/related, it knows
   that the body parts within it are to be treated in a special way
   (usually stored as files, with the start object then turned over to
   a Web browser acting as a helper application).

   The majority seems to favour opinion A.

Issue 3: Is multipart/related to be used at all

   Some participants in the discussion have proposed that plain
   multipart/mixed could be used instead of multipart/related for a set
   of objects with hyperlinks between them. The rough consensus seems,
   however, to be that multipart/related should be used.

Issue 4: Name of the Content-Location header

   Opinion A: Its name should be Content-Location.

   Opinion B: Its name should be just Location or just URL.

   The rough consensus seems to be that its name should be
   Content-Location, since MIME expects all headers that describe the
   content of a body part to have names beginning with "Content-".

Issue 5: Should the Base header (defined in RFC 1808) be renamed to
"Content-Base"?

   Based on the discussion about the Content-Location header, it seems
   that the next revision of RFC 1808 should rename the Base header to
   Content-Base.

Issue 6: Relative URL-s and the Content-Base header

   Issue specification: Under which circumstances should relative URL-s
   be allowed in text/html body parts, and how should such relative
   URL-s be resolved?
   Relative URL-s should only be allowed if their base is known. The
   base can be made known in one of three ways:

   (a) There is a BASE element in the HTML document, which resolves the
       relative URL into a non-relative URL.
   (b) There is a Content-Location header on the Text/HTML body part,
       which can then serve as the base.
   (c) There is a Content-Base header (as defined in RFC 1808) giving
       the base to be used.

Issue 7: Relative URL-s referring to body parts within the same message

   The base for relative URL-s can either be an external base (for
   example an HTTP base), in which case the relative URL-s are resolved
   according to the scheme of the base URL, or the base can be the
   multipart/related set of objects within the MIME message.

Issue 8: The message itself as a Base URL

   When the Text/HTML uses "cid" URL-s, these might be relative to the
   message itself. A "Content-Base: CID:://." header might be used to
   indicate this. Someone suggested that the relative URL-s would then
   be "../cid:xxx@foo.org" instead of just "cid:xxx@foo.org".

   Question: Does this mean that Content-ID-s need not be globally
   unique? If that is what it means, I am very much against it. Or is
   it just a way of indicating that this message contains hyperlinks of
   the "CID" scheme, and that these hyperlinks refer to objects in the
   current message?

Issue 9: Priority of bases

   Bases for relative URL-s in Text/HTML bodies may be defined in three
   ways:

   (a) There is a BASE element in the HTML document, which resolves the
       relative URL into a non-relative URL.
   (b) There is a Content-Location header on the Text/HTML body part,
       which can then serve as the base.
   (c) There is a Content-Base header (as defined in RFC 1808) giving
       the base to be used.

   Question: If more than one of these three methods is used in the
   same message, which of them should the recipient use?
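   Whatever order is finally agreed, the recipient's logic is short.
   The Python sketch below tries the sources in the order (a), (b), (c)
   as listed, which is also the priority suggested in the answer that
   follows, and refuses to resolve a relative URL when no base is known
   (Issue 6). It is a hedged illustration, not specified behaviour:
   the function names are hypothetical, and the header handling assumes
   the values may be wrapped in quotes, in angle brackets, or in the
   RFC 1738 "<URL:...>" form, since Issue 1 has not settled this.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _BaseElementFinder(HTMLParser):
    """Records the HREF of the first BASE element that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <BASE HREF=...> matches.
        if tag == "base" and self.href is None:
            self.href = dict(attrs).get("href")


def _unwrap(header_value: str) -> str:
    """Drop enclosing quotes or angle brackets, a "URL:" prefix, and folding whitespace."""
    value = "".join(header_value.strip().strip('"<>').split())
    if value[:4].upper() == "URL:":
        value = value[4:]
    return value


def choose_base(html: str,
                content_location: Optional[str] = None,
                content_base: Optional[str] = None) -> Optional[str]:
    """Return the first available base in the order (a), (b), (c)."""
    finder = _BaseElementFinder()
    finder.feed(html)
    finder.close()
    candidates = [finder.href]
    candidates += [_unwrap(v) for v in (content_location, content_base) if v]
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def resolve_relative(url: str, html: str,
                     content_location: Optional[str] = None,
                     content_base: Optional[str] = None) -> str:
    """Resolve a relative URL found in a Text/HTML body part."""
    base = choose_base(html, content_location, content_base)
    if base is None:
        # Issue 6: relative URL-s are only allowed if their base is known.
        raise ValueError("relative URL %r has no known base" % url)
    return urljoin(base, url)
```

   Keeping the order in one list makes it a one-line change if the
   discussion ends up preferring Content-Location over the BASE
   element.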
   Suggested: priority in the order listed above. If more than one base
   is specified, a BASE element should be used in preference to
   Content-Location (since this is the way HTML normally works), and
   Content-Location should be used in preference to Content-Base. (Is
   this the way HTTP works when it uses the Base/Content-Base header?)

Issue 10: Ambiguity of "Content-Base"

   Some participants have pointed out that "Content-Base" is ambiguous
   in a message, since it might refer either to the situation as seen
   by the sender or to the situation as seen by the recipient.

   This does not seem to me to be a problem. A Content-Base should of
   course have a scheme. If the scheme is, for example, "HTTP", then
   this is a base for HTTP retrieval; if the scheme is "LOCAL-FILE",
   then this is a base for retrieval of local files in the recipient's
   mailbox (probably files created by saving other body parts of the
   same message as files).

Issue 11: Giving body parts names to be used in URL-s

   If the text/html can contain hyperlinks referring to other body
   parts, then we need a way to give names to these body parts.

   Choice A: Use the file names in the "Content-Disposition: inline;
   filename=" headers of the body parts.

   Choice B: Use the Content-ID of the body parts.

   Discussion: The advantage of using file names is that most Web
   browsers are already capable of interpreting relative URL-s which
   refer to file names. In fact, most Web browsers, when asked to
   display a file, will assume that relative URL-s within that file
   refer to other files in the same folder as the file being displayed.
   Thus, if file names are used, an existing Web browser can display
   the text/html object, provided that the mailer saves the various
   parts of the multipart/related as files in a common folder and then
   turns the start object over to the browser.

   The use of Content-ID could be allowed as an alternative, but the
   use of file names seems to be the easiest choice.
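   The file-name approach favoured under Issue 11 can be sketched with
   Python's standard email package. This is an illustrative sketch
   under stated assumptions, not a specified algorithm: the function
   names and the example message are hypothetical, only the
   Content-Disposition filename parameter is used to name files, and
   any directory component supplied by the sender is discarded for
   safety.

```python
import email
import os
import tempfile
from email import policy
from email.message import EmailMessage
from pathlib import Path
from typing import Dict
from urllib.parse import urljoin


def save_parts_as_files(raw_message: bytes, folder: str) -> Dict[str, Path]:
    """Save every leaf body part that has a Content-Disposition filename."""
    message = email.message_from_bytes(raw_message, policy=policy.default)
    saved: Dict[str, Path] = {}
    for part in message.walk():
        if part.is_multipart():
            continue
        name = part.get_filename()
        # Keep only the last path component; never trust sender-supplied paths.
        name = os.path.basename(name) if name else ""
        if not name:
            continue  # unnamed parts cannot be targets of relative URL-s
        path = Path(folder) / name
        path.write_bytes(part.get_payload(decode=True))
        saved[name] = path
    return saved


def build_example_message() -> EmailMessage:
    """A hypothetical multipart/related whose HTML root links to foo.gif."""
    msg = EmailMessage()
    msg.set_content('<p><img src="foo.gif"></p>', subtype="html",
                    disposition="inline", filename="start.htm")
    msg.add_related(b"GIF89a", maintype="image", subtype="gif",
                    disposition="inline", filename="foo.gif")
    return msg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        files = save_parts_as_files(build_example_message().as_bytes(), folder)
        start_uri = files["start.htm"].as_uri()
        # A browser handed start.htm resolves "foo.gif" to the sibling file.
        print(urljoin(start_uri, "foo.gif") == files["foo.gif"].as_uri())
```

   The example names "start.htm" and "foo.gif" were chosen to stay
   within the eight-plus-three file name subset discussed next.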
   The syntax of these file names should be the subset that is valid on
   most platforms: a name of at most eight characters, followed by a
   period and an extension of at most three characters. Only Latin
   letters and digits should be used, and the first character should
   be a letter.

Issue 12: How to indicate that relative URL-s refer to body parts?

   draft-palme-text-html-00.txt proposed a new parameter "linking" for
   the "Content-Type: Text/HTML" header, with the values "external",
   "filename", "location" and "cid", to indicate various ways of
   interpreting URL-s in the Text/HTML body. At that time I was not
   aware of the proposal for the "Base/Content-Base" header in
   RFC 1808.

   When the base for relative URL-s is the file names in the
   Content-Disposition headers of the referenced objects, this should
   in some way be shown in the Content-Base header. I suggest the
   following syntax:

      Content-Base: "LOCAL-FILE://."

   where "LOCAL-FILE" is taken from RFC 1521, and "//." is taken from
   RFC 1808. (Check that I have correctly understood what RFC 1808
   means by "//.".)

Issue 13: Combination of Text/html and Multipart/Alternative

   When a Text/html body is sent, many recipients will not be able to
   display the HTML text, at least not directly, since their mailers
   do not support Text/html. There is therefore a need to use
   Multipart/Alternative. This can, however, be done in many ways.

   Choice a: The construct shown in the following example was proposed
   in "draft-palme-text-html-00.txt":

      Content-Type: Multipart/related; boundary="boundary-example-1";
              type="Text/HTML";
              start="<content-id-example@example.host>"

      --boundary-example-1
      Content-Type: MULTIPART/ALTERNATIVE;
              boundary="boundary-example-2"

      --boundary-example-2
      Content-Type: Text/plain

      ... plain text version of the document for recipients whose
      mailers cannot handle Text/HTML ...

      --boundary-example-2
      Content-Type: Text/HTML
      Content-ID: <content-id-example@example.host>

      ... text of the HTML document ...

      --boundary-example-2--

      --boundary-example-1
      Content-Type: Image/GIF

      ... a body part, to which the HTML document has a link ...

      --boundary-example-1--

   An abbreviated form of this, used only as a notation within this
   issues document, is:

      Multipart/related; type=Text/HTML; start=foo@bar
         Multipart/alternative
            Text/plain
            Text/HTML (contains hyperlink to the Image/GIF object)
               Content-ID: start=foo@bar
         Image/GIF

   Choice b: Same as Choice a, but use Multipart/mixed instead of
   Multipart/related; see Issue 2 above.

   Choice c:

      Multipart/alternative
         Multipart/mixed
            Text/Plain
            Image/GIF
         Multipart/Related; type=Text/HTML; start=foo@bar
            Text/HTML
               Content-ID: start=foo@bar
            Message/External-body; access-type=Content-ID
               (pointing to the Image/GIF object)

   Choice d:

      Multipart/alternative
         Multipart/mixed
            Text/Plain
            Image/GIF
         Multipart/Related; type=Text/HTML; start=foo@bar
            Text/HTML
               Content-ID: start=foo@bar
            Image/GIF

   Choice e:

      Multipart/related; type=Text/HTML; start=foo@bar
         Image/GIF
         Multipart/alternative
            Multipart/mixed
               Text/plain
               message/external-body; access-type=cid:
                  (pointer to the image/GIF)
            Text/HTML (contains hyperlink to the Image/GIF object)
               Content-ID: start=foo@bar

   Choice f:

      multipart/mixed (Message-ID: message-unique@node.net)
         1: image/gif
               (Content-ID:
                Content-Disposition: attachment; uri=./neat.gif;
                base=file://localhost/anypath/to_here)
         2: multipart/alternative
               text/plain (Content-Disposition: inline; including text
                  reference to neat.gif and that the GIF is the first
                  part of this MIME message)
               text/HTML (Content-disposition: inline; file=me.html;
                  embeds URN of mid://node.net/message-unique?BL8V3T ;
                  or whatever the cid URN syntax is)

Issue 14: What should "start" refer to

   Which of the following two cases should be used?

      Multipart/related; type=Text/HTML; start=foo@bar
         Multipart/alternative
            Text/plain
            Text/HTML (contains hyperlink to the Image/GIF object)
               Content-ID: start=foo@bar
         Image/GIF

      Multipart/related; type=Text/HTML; start=foo@bar
         Multipart/alternative
               Content-ID: start=foo@bar
            Text/plain
            Text/HTML (contains hyperlink to the Image/GIF object)
         Image/GIF

   That is, should "start" refer to the Text/HTML or to the
   Multipart/Alternative?

Issue 15: Including remotely available objects in a message

   There are several reasons why the sender of a message containing a
   Text/HTML body part with externally resolvable hyperlinks might
   still want to include some or all of the external objects in the
   message.

   Reason i: Some recipients may have e-mail but not full Internet
   access.

   Reason ii: To make retrieval of the body parts safer and faster for
   the recipient.

   In "draft-palme-text-html-00.txt" a new header, "Content-Location",
   was proposed for this.

   The issue has been raised that such an included object should be
   seen as a "cached" version of the original object, and that a
   parameter "validity" should perhaps be added to indicate the maximum
   cache time. Note that this does not mean that the mailer should
   necessarily put something in the cache of the recipient's Web
   browser; that is a different issue. It is just a way of saying "if
   you save this object locally, we recommend a maximum saving time".

   Example:

      Content-Location: "http://www.jazzie.com/ii/internet
          /mailnews.html"; LIFN: 1 month.

   Question: Has the syntax of such a parameter already been defined in
   some Internet-Draft or RFC? Is LIFN defined in some RFC or
   Internet-Draft? If so, could someone refer me to this definition?

Issue 16: The use of mid-s and cid-s

   There has been a long discussion on the ietf-types mailing list about
   how to use mid-s and cid-s, whether cid-s can be qualified by mid-s,
   whether a cid URL scheme is needed, etc.
   I have not understood the whole of this discussion and am not sure
   whether it should influence the specifications in
   "draft-palme-text-html-00.txt" or not. If this discussion requires
   changes in "draft-palme-text-html-00.txt", could someone please
   explain how these changes should be made?

Issue 17: Content-Disposition inline or attachment

   Assume a construct such as this:

      Multipart/related; type=Text/HTML; start=foo@bar
      Content-Base: "LOCAL-FILE://."
         Text/HTML (contains hyperlink to the Image/GIF object)
            Content-ID: start=foo@bar
         Image/GIF
            Content-Disposition: inline; filename=foo.GIF

   Should the Content-Disposition above be "inline" or "attachment"?

   Discussion: A mailer which does not understand Multipart/related
   should treat it in the same way as Multipart/mixed. From that
   viewpoint, the Content-Disposition should be "inline" if the picture
   is to be shown at the same time as the root text.

   A mailer which understands Multipart/related should know that all
   body parts are to be saved as files and then turned over to an
   interpreter for the type of the start object.
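   Handing the start object to an interpreter first requires finding
   it. The Python sketch below matches the "start" parameter of a
   multipart/related against the Content-ID of every nested part, so it
   works for both alternatives in Issue 14 (start naming the Text/HTML
   or the Multipart/Alternative). It is an illustration under
   assumptions, not specified behaviour: the function name is
   hypothetical, angle brackets are ignored when comparing identifiers
   because the examples in this document write them both with and
   without brackets, and falling back to the first body part when
   there is no start parameter is an assumption made for the sketch.

```python
import email
from email import policy
from email.message import EmailMessage
from typing import Optional


def _bare_id(value) -> str:
    """Normalize a Content-ID or start value by removing angle brackets."""
    return str(value or "").strip().strip("<>")


def find_start_object(related: EmailMessage) -> Optional[EmailMessage]:
    """Return the body part that a multipart/related names as its start object."""
    if related.get_content_type() != "multipart/related":
        raise ValueError("expected a multipart/related entity")
    parts = related.get_payload()
    start = related.get_param("start")
    if start is None:
        # Assumption: without a start parameter, the first body part is the start.
        return parts[0] if parts else None
    wanted = _bare_id(start)
    # Search nested parts too, so start may name a Text/HTML inside a
    # Multipart/Alternative as well as the Multipart/Alternative itself.
    for part in related.walk():
        if part is not related and _bare_id(part.get("Content-ID")) == wanted:
            return part
    return None
```

   A mailer could parse the message with
   email.message_from_bytes(raw, policy=policy.default), save the parts
   as files, and then choose the interpreter from
   find_start_object(message).get_content_type().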