Network Working Group M. Hardy
Internet-Draft L. Masinter
Obsoletes: 3778 (if approved) D. Markovic
Intended status: Informational Adobe Systems Incorporated
Expires: August 27, 2017 D. Johnson
PDF Association
M. Bailey
Global Graphics
February 23, 2017

The application/pdf Media Type
draft-hardy-pdf-mime-05

Abstract

The Portable Document Format (PDF) is an ISO standard (ISO 32000-1:2008) defining a final-form document representation language in use for document exchange, including on the Internet, since 1993. This document provides an overview of the PDF format and updates the media type registration of application/pdf. It obsoletes RFC 3778.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 27, 2017.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

This document provides updated information on the registration of the MIME Media Type application/pdf for documents in the Portable Document Format (PDF) syntax [ISOPDF]. It obsoletes [RFC3778].

PDF was originally envisioned as a way to reliably communicate and view printed information electronically across a wide variety of machine configurations, operating systems, and communication networks.

PDF is used to represent "final form" formatted documents. PDF pages may include text, images, graphics and multimedia content such as video and audio. PDF is also capable of containing auxiliary structures including annotations, bookmarks, file attachments, hyperlinks, logical structure and metadata. These features are useful for navigation, building collections of related documents and for reviewing and commenting on documents. A rich JavaScript model has been defined for interacting with PDF documents.

PDF uses the imaging model of the PostScript [PS] page description language to render complex text, images, and graphics in a device- and resolution-independent manner.

PDF supports encryption and digital signatures. The encryption capability is combined with access control information to facilitate management of the functionality available to the recipient. PDF supports the inclusion of document and object-level metadata through the Extensible Metadata Platform (XMP) [XMP].

2. History

PDF is used widely in the Internet community. The first version of PDF, 1.0, was published in 1993 by Adobe Systems Incorporated. Since then, PDF has grown to be a widely used format for capturing and exchanging formatted documents electronically across the Web, via e-mail, and via virtually every other document exchange mechanism. In 2008, PDF 1.7 was published as an ISO standard [ISOPDF], ISO 32000-1:2008. It was adopted using the ISO Fast-Track process and is technically identical to Adobe Portable Document Format version 1.7 [AdobePDF] referenced by [RFC3778].

The ISO TC-171 committee is presently working on a refresh of PDF, known as ISO 32000-2, which defines PDF 2.0 and is expected to be published in 2017.

In addition to ISO 32000-1:2008 and 32000-2, several subset standards have been defined to address specific use cases and standardized by the ISO. These standards include PDF for Archival (PDF/A) [ISOPDFA], PDF for Engineering (PDF/E) [ISOPDFE], PDF for Universal Accessibility (PDF/UA) [ISOPDFUA], PDF for Variable Data and Transactional Printing (PDF/VT) [ISOPDFVT], and PDF for Prepress Digital Data Exchange (PDF/X) [ISOPDFX]. Files conforming to the subset standards are fully compliant PDF files capable of being displayed in a general PDF viewer.

3. Fragment Identifiers

Fragment identifiers appear at the end of a URI, and provide a way to reference an anchor to subordinate content within the target of the URI, or additional parameters to the process of opening the identified content. The syntax and semantics of fragment identifiers are referenced in the media type definition.

The specification of fragment identifiers for PDF appeared originally in [RFC3778], but will now be included in ISO 32000-2 [ISOPDF2]. This section is a summary of that material. Any disagreements between that document and this one should be resolved in favor of the ISO 32000-2 definition, once that has been approved.

A fragment identifier for PDF has one or more parameters, separated by the ampersand (&) or pound (#) character. Each parameter consists of the parameter name, "=" (equal), and the parameter value; lists of values are comma-separated, and parameter value strings may be URI-encoded ([RFC3986]). Parameters are processed left to right.
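The syntax described above can be sketched in a few lines of Python. This is a minimal illustration rather than a normative parser: the function name parse_pdf_fragment is hypothetical, and the sketch assumes that commas separating list values are never percent-encoded. It does not treat any parameter specially, so a URI-valued parameter that itself contains "&" or "#" would need extra handling.

```python
from urllib.parse import unquote


def parse_pdf_fragment(fragment):
    """Split a PDF fragment identifier into an ordered list of (name, values).

    The order of the list preserves the left-to-right processing order.
    """
    params = []
    # Parameters are separated by '&' or '#'; normalise to a single delimiter.
    for part in fragment.replace("#", "&").split("&"):
        if not part:
            continue  # tolerate a leading or doubled delimiter
        name, _, raw = part.partition("=")
        # Split on commas before decoding so an encoded %2C stays inside its value.
        values = [unquote(v) for v in raw.split(",")] if raw else []
        params.append((name, values))
    return params


print(parse_pdf_fragment("page=3&zoom=200,10,20"))
# [('page', ['3']), ('zoom', ['200', '10', '20'])]
```

A list of pairs is used instead of a dictionary so that repeated parameters and their order are retained.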

Coordinate values (such as <left>, <right>, <width>) are expressed in the default user space coordinate system of the document: 1/72 of an inch measured down and to the right from the upper-left corner of the (current) page. ([ISOPDF2] 8.3.2.3 "User Space")
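As a worked example of the unit described above, the following Python sketch converts measurements in inches to coordinate values and assembles a viewrect parameter (defined later in this section). The helper names are hypothetical, and the sketch assumes the values are already measured from the upper-left corner of the page.

```python
UNITS_PER_INCH = 72  # one coordinate unit is 1/72 of an inch


def inches_to_units(inches):
    """Convert a length in inches to coordinate units."""
    return inches * UNITS_PER_INCH


def viewrect_fragment(left_in, top_in, width_in, height_in):
    """Build a viewrect parameter from a rectangle given in inches."""
    values = [inches_to_units(v) for v in (left_in, top_in, width_in, height_in)]
    # The 'g' format drops a trailing ".0" so whole numbers stay compact.
    return "viewrect=" + ",".join(f"{v:g}" for v in values)


print(viewrect_fragment(1, 0.5, 4, 3))
# viewrect=72,36,288,216
```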

The following parameters identify subordinate content of a PDF file, but may also be used to set the document view to make (the start of) the identified content visible:

page=<pageNum>

Identifies a specified (physical) page; the first page in the document has a pageNum value of 1.
nameddest=<name>

Identifies a named destination ([ISOPDF2] 12.3.2.4 "Named destinations").
structelem=<structID>

Identifies the structure element whose StructElem dictionary has an ID key matching structID. structID is a byte string with URI encoding.
comment=<commentID>

The commentID is the value of an annotation name, which is defined by the NM key in the corresponding annotation dictionary of the selected page ([ISOPDF2] 12.5.2 "Annotation dictionaries").
ef=<name>

Identifies the embedded file where the parameter string <name> matches a file specification dictionary in the EmbeddedFiles name tree. If the "ef" parameter is not at the end of the fragment identifier, then the rest of the fragment identifier (after the ampersand or hash delimiter) is applied to the embedded file according to its own media type. This allows identification of content within the embedded file (which itself might be a PDF file).
NOTE: When opening a PDF file that is not from a trusted source, a processor may choose to prompt the user or even prevent opening of the file.
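The handling of the "ef" parameter described above, where the remainder of the fragment identifier is handed to the embedded file, might be sketched as follows in Python. The function name split_at_embedded_file is hypothetical; the sketch only splits the string and leaves name-tree lookup and the trust decision from the NOTE to the caller.

```python
import re
from urllib.parse import unquote


def split_at_embedded_file(fragment):
    """Return (outer, ef_name, inner).

    outer   -- the parameters that precede the first 'ef' parameter
    ef_name -- the decoded embedded file name, or None if there is no 'ef'
    inner   -- the remainder to apply to the embedded file, or None
    """
    pos = 0  # index in 'fragment' where the current parameter starts
    for part in re.split(r"[&#]", fragment):
        end = pos + len(part)
        if part.startswith("ef="):
            outer = fragment[:pos].rstrip("&#")
            inner = fragment[end + 1:] or None
            return outer, unquote(part[3:]), inner
        pos = end + 1  # skip the delimiter that follows this parameter
    return fragment, None, None


print(split_at_embedded_file("page=2&ef=report.pdf&page=5"))
# ('page=2', 'report.pdf', 'page=5')
```

Because the embedded file may itself be a PDF file, the returned remainder can be passed through the same function again to handle nesting.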

These parameters also operate on the view of the PDF document when it is opened.

zoom=<scale>,<left>,<top>

<scale> is the percentage to which the document should be zoomed, where a value of 100 corresponds to a zoom of 100%. <left> and <top> are optional, but both must be specified if either is included.
view=<keyword>,<position>

The arguments correspond to those found in [ISOPDF2] 12.3.2.2 "Explicit destinations". keyword is one of the keywords defined in [ISOPDF2] "Table 149: Destination syntax" with appropriate position values.
viewrect=<left>,<top>,<width>,<height>

Set the view rectangle.
highlight=<left>,<right>,<top>,<bottom>

Highlight the specified rectangle.
search=<wordList>

Open the document and search for one or more words, selecting the first matching word in the document. wordList is a string enclosed in quotation marks where individual words are separated by the space character (or %20).
fdf=<URI>

Imports data into PDF form fields. The URI is either a relative or absolute URI to an FDF or XFDF file. The fdf parameter should be specified as the last parameter to a given URI.
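The rule for the zoom parameter above, that <left> and <top> must be given together or not at all, can be checked with a short Python sketch. The function name parse_zoom is hypothetical, and rejecting a malformed value with an exception is an assumption of this sketch; the text above does not say how a processor should react.

```python
def parse_zoom(value):
    """Parse the value of a zoom parameter into (scale, left, top).

    left and top are None when only <scale> is supplied.
    """
    fields = value.split(",")
    if len(fields) == 1:
        return float(fields[0]), None, None
    if len(fields) == 3:
        scale, left, top = (float(f) for f in fields)
        return scale, left, top
    # Exactly two fields means only one of <left>/<top> was given.
    raise ValueError("zoom takes <scale> or <scale>,<left>,<top>")


print(parse_zoom("150"))        # (150.0, None, None)
print(parse_zoom("200,10,20"))  # (200.0, 10.0, 20.0)
```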

4. Subset Standards

Several subsets of PDF have been published as distinct ISO standards, including PDF/A, PDF/E, PDF/UA, PDF/VT, and PDF/X (see Section 2).

All of these subset standards use the application/pdf media type. The subset standards are generally not exclusive, so it is possible to construct a PDF file which conforms to, for example, both the PDF/A-2b and PDF/X-4 subset standards.

PDF documents claiming conformance to one or more of the subset standards use XMP metadata to identify levels of conformance. PDF processors should examine document metadata streams for such subset standards identifiers and, if appropriate, label documents as such when presenting them to the user.
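As an illustration of examining metadata for a subset identifier, the following Python sketch looks for a PDF/A claim in an XMP packet that has already been extracted from the file. It rests on assumptions not stated in this document: the identifier names (the "part" and "conformance" properties in the namespace http://www.aiim.org/pdfa/ns/id/) are taken from the PDF/A identification schema, and the function name pdfa_claim is hypothetical. The result is only the claim made by the document, not a validation of conformance.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFA_ID_NS = "http://www.aiim.org/pdfa/ns/id/"  # assumed: PDF/A identification schema


def pdfa_claim(xmp_xml):
    """Return a label such as 'PDF/A-2b' if the XMP packet claims PDF/A, else None."""
    root = ET.fromstring(xmp_xml)
    part_key = f"{{{PDFA_ID_NS}}}part"
    level_key = f"{{{PDFA_ID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements.
        part = desc.get(part_key) or desc.findtext(part_key)
        if part:
            level = desc.get(level_key) or desc.findtext(level_key) or ""
            return f"PDF/A-{part.strip()}{level.strip().lower()}"
    return None


sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>2</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(pdfa_claim(sample))  # PDF/A-2b
```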

5. PDF Versions

The PDF format has gone through several revisions, primarily for the addition of features. PDF features have generally been added in such a way that older viewers "fail gracefully", because they can just ignore features they do not recognize. Even so, the older the PDF version produced, the more legacy viewers will support that version, but the fewer features will be enabled. See [ISOPDF] Annex I, "PDF Versions and Compatibility".

6. PDF Implementations

PDF files are experienced through a reader or viewer of PDF files. For most of the common platforms in use (iOS, OS X, Windows, Android, ChromeOS, Kindle) and for most browsers (Edge, Safari, Chrome, Firefox), PDF viewing is built-in. In addition, there are many PDF viewers available for download and installation. The PDF specification has been published and freely available since the format was introduced in 1993, so hundreds of companies and organizations make tools for PDF creation, viewing, and manipulation.

7. Security Considerations

PDF is certainly a complex media type as per Section 4.6 of [RFC6838], which sets requirements for security analysis of media type registrations. [RFC3778] (which this document obsoletes) contained a detailed analysis of some of the security issues for PDF implementations known at the time. While the analysis isn't necessarily wrong, the threat analysis is much too limited, and the mitigations somewhat out of date. There is now extensive literature on security threats involving PDF implementations and how to avoid them, consistent with broad implementation over decades. We are not registering a new media type but rather making a primarily administrative update. With those caveats:

The PDF file format allows several constructs which may compromise security if handled inadequately by PDF processors.

PDF interpreters executing any scripts or programs related to these constructs must be extremely careful to ensure that untrusted software is executed in a protected environment.

In addition, the PDF processor itself, as well as its plugins, scripts etc. may be a source of insecurity, by either obvious or subtle means.

8. IANA Considerations

This document updates the registration of application/pdf, a media type registration as defined in [RFC6838]:

Type name: application

Subtype name: pdf

Required parameters: none

Optional parameters: none

Encoding considerations: binary

Security considerations: See Section 7 of this document.

Interoperability considerations: See Section 5 of this document.

Published specification: ISO 32000-1:2008 (PDF 1.7) [ISOPDF]. ISO 32000-2 (PDF 2.0) [ISOPDF2] is currently under development.

Applications which use this media type: See Section 6 of this document.

Fragment identifier considerations: See Section 3 of this document.

Additional information:

Deprecated alias names for this type: none

Magic number(s): All PDF files start with the characters "%PDF-" followed by the PDF version number, e.g., "%PDF-1.7". These characters are in US-ASCII encoding.

File extension(s): .pdf

Macintosh file type code(s): "PDF "

Person & email address to contact for further information: Duff Johnson <duff@duff-johnson.com>, Peter Wyatt <Peter.wyatt@cisra.canon.com.au>, ISO 32000 Project Leaders

Intended usage: COMMON

Restrictions on usage: none

Author: Authors of this document

Change controller: ISO; in particular, ISO 32000 is maintained by ISO/TC 171/SC 02/WG 08, "PDF specification". Duff Johnson <duff@duff-johnson.com> and Peter Wyatt <Peter.wyatt@cisra.canon.com.au> are the current ISO 32000 Project Leaders.
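The magic number given in the registration above lends itself to a simple check. The Python sketch below is illustrative only: the function name pdf_header_version is hypothetical, and the digit-dot-digit version pattern is an assumption based on the "%PDF-1.7" example. It is strict about the header appearing at the very first byte.

```python
import re


def pdf_header_version(data):
    """Return the version string if 'data' starts with a PDF header, else None."""
    # The header is plain US-ASCII, so a bytes pattern is sufficient.
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


print(pdf_header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # 1.7
print(pdf_header_version(b"GIF89a"))                          # None
```

Only the first few bytes of a file are needed, so a caller can read a short prefix instead of the whole file.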

9. References

9.1. Normative References

[ISOPDF] ISO, "Document management -- Portable document format -- Part 1: PDF 1.7", ISO 32000-1:2008, 2008.

Also available free from Adobe.

[ISOPDF2] ISO, "Document management -- Portable document format -- Part 2: PDF 2.0", ISO 32000-2

Currently under development - publication expected in 2017. This becomes a Normative Reference on approval.

9.2. Informative References

[ISOPDFX] ISO, "Graphic technology -- Prepress digital data exchange using PDF -- Part 8: Partial exchange of printing data using PDF 1.6 (PDF/X-5)", ISO 15930-8:2008, 2008.
[ISOPDFA] ISO, "Document management -- Electronic document file format for long-term preservation -- Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)", ISO 19005-3:2012, 2012.
[ISOPDFE] ISO, "Document management -- Engineering document format using PDF -- Part 1: Use of PDF 1.6 (PDF/E-1)", ISO 24517-1:2008, 2008.
[ISOPDFVT] ISO, "Graphic technology -- Variable data exchange -- Part 2: Using PDF/X-4 and PDF/X-5 (PDF/VT-1 and PDF/VT-2)", ISO 16612-2:2010, 2010.
[ISOPDFUA] ISO, "Document management applications -- Electronic document file format enhancement for accessibility -- Part 1: Use of ISO 32000-1 (PDF/UA-1)", ISO 14289-1:2014, 2014.
[XMP] ISO, "Extensible metadata platform (XMP) specification -- Part 1: Data model, serialization and core properties", ISO 16684-1, 2012.

Not available for free, but there are a number of descriptive resources, e.g.,

[PS] Adobe Systems Incorporated, "PostScript Language Reference, third edition", 1999.
[AdobePDF] Adobe Systems Incorporated, "PDF Reference, sixth edition", 2006.
[RFC6838] Freed, N., Klensin, J. and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, DOI 10.17487/RFC6838, January 2013.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005.
[RFC3778] Taft, E., Pravetz, J., Zilles, S. and L. Masinter, "The application/pdf Media Type", RFC 3778, DOI 10.17487/RFC3778, May 2004.

Appendix A. Changes since RFC 3778

This specification replaces RFC 3778, which previously defined the application/pdf Media Type. Differences include:

Authors' Addresses

Matthew Hardy
Adobe Systems Incorporated
345 Park Ave
San Jose, CA 95110
USA
EMail: mahardy@adobe.com

Larry Masinter
Adobe Systems Incorporated
345 Park Ave
San Jose, CA 95110
USA
EMail: masinter@adobe.com
URI: http://larry.masinter.net

Dejan Markovic
Adobe Systems Incorporated
345 Park Ave
San Jose, CA 95110
USA
EMail: dmarkovi@adobe.com

Duff Johnson
PDF Association
Neue Kantstrasse 14
Berlin, 14057
Germany
EMail: duff.johnson@pdfa.org

Martin Bailey
Global Graphics
2030 Cambourne Business Park
Cambridge, CB23 6DW
UK
EMail: martin.bailey@globalgraphics.com
URI: http://www.globalgraphics.com