dispatch J. Yasskin
Internet-Draft Google
Intended status: Informational August 30, 2017
Expires: March 3, 2018

Use Cases and Requirements for Web Packages
draft-yasskin-webpackage-use-cases-00

Abstract

This document lists use cases for signing and/or bundling collections of web pages, and extracts a set of requirements from them.

Note to Readers

Discussion of this draft takes place on the ART area mailing list (art@ietf.org), which is archived at https://mailarchive.ietf.org/arch/search/?email_list=art.

The source code and issues list for this draft can be found in https://github.com/WICG/webpackage.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 3, 2018.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

People would like to use content offline and in other situations where there isn’t a direct connection to the server where the content originates. However, it’s difficult to distribute and verify the authenticity of applications and content without a connection to the network. The W3C has addressed running applications offline with Service Workers ([ServiceWorkers]), but not the problem of distribution.

Previous attempts at packaging web resources (e.g. Resource Packages and the W3C TAG’s packaging proposal) were motivated by speeding up the download of resources from a single server, which is probably better achieved through other mechanisms like HTTP/2 PUSH, possibly augmented with a simple manifest of URLs a page plans to use. This attempt is instead motivated by avoiding a connection to the origin server at all. It may still be useful for the earlier use cases, so they’re still listed, but they’re not primary.

2. Use cases

These use cases are in rough descending priority order. If use cases have conflicting requirements, the design should favor the higher-priority use case.

2.1. Essential

2.1.1. Offline installation

Alex can download a file containing a website (a PWA), including a Service Worker, from origin O and transmit it to their peer Bailey. Bailey can then install the Service Worker with a proof that it came from O. This saves Bailey the bandwidth costs of transferring the website.

Associated requirements:

2.1.1.1. Online use

Bailey may have an internet connection through which they can, in real time, fetch updates to the package they received from Alex.

2.1.1.2. Fully offline use

Alternatively, Bailey may lack an internet connection a significant fraction of the time, because they have no internet access at all, because they turn it off except when intentionally downloading content, or because they use up their data plan partway through each month.

Associated requirements beyond Offline installation:

2.1.2. Offline browsing

Alex can download a file containing a large website (e.g. Wikipedia) from its origin O, save it to transferable storage (e.g. an SD card), and hand it to their peer Bailey. Then Bailey can browse the website with a proof that it came from O. Bailey may not have the storage space to copy the website before browsing it.

Associated requirements beyond Offline installation:

2.1.3. Save and share a web page

Casey is viewing a web page and wants to save it either for offline use or to show it to their friend Dakota. Since Casey isn’t the web page’s author, they don’t have the private key needed to sign the page. Browsers currently allow their users to save pages, but each browser uses a different format (MHTML, Web Archive, or files in a directory), so Dakota and Casey would need to be using the same browser. Casey could also take a screenshot, at the cost of losing links and accessibility.

Associated requirements:

2.2. Nice-to-have

2.2.1. Packaged Web Publications

The W3C’s Publishing Working Group, merged from the International Digital Publishing Forum (IDPF) and in charge of EPUB maintenance, wants to be able to create publications on the web and then let them be copied to different servers or to other users via arbitrary protocols. See their Packaged Web Publications use cases for more details.

Associated requirements:

Other requirements are similar to those from Offline installation:

2.2.2. Third-party security review

Some users may want to grant certain permissions only to applications that have been reviewed for security by a trusted third party. These third parties could provide guarantees similar to those provided by the iOS, Android, or ChromeOS app stores, which might allow browsers to offer more powerful capabilities than have been deemed safe for unaudited websites.

Binary transparency for websites is similar: as with Certificate Transparency [RFC6962], the transparency logs would sign the content of the package to provide assurance that experts had a chance to audit the exact package a client received.

Associated requirements:

2.2.3. Building packages from multiple libraries

Large programs are built from smaller components. In the case of the web, components can be included either as JavaScript files or as <iframe>d subresources. In the first case, the packager could copy the JS files to their own origin; but in the second, it may be important for the <iframe>d resources to be able to make same-origin requests back to their own origin, for example to implement federated sign-in.

Associated requirements:

2.2.3.1. Shared libraries

In ecosystems like Electron and Node, many packages may share some common dependencies. The cost of downloading each package can be greatly reduced if the package can merely point at other dependencies to download instead of including them all inline.

Associated requirements:

2.2.4. CDNs

CDNs want to re-publish other origins’ content so readers can access it more quickly or more privately. Currently, to attribute that content to the original origin, they need the full ability to publish arbitrary content under that origin’s name. There should be a way to let them attribute only the exact content that the original origin published.

Web Packages would allow CDNs to publish content attributed to another site, as long as the user visited a URL explicitly mentioning the CDN.

CDNs want to serve only the bytes that most optimally represent the content the current user needs, even though the origin needs to provide representations for all users. Think PNG vs WebP and small vs large resolutions.

Associated requirements:

2.2.5. Installation from a self-extracting executable

The Node and Electron communities would like to install packages using self-extracting executables. The traditional way to design a self-extracting executable is to concatenate the package to the end of the executable, have the executable look for a length at its own end, and seek backwards from there for the start of the package.
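
As a rough illustration of the lookup described above, the following Python sketch assumes a hypothetical layout in which the last 8 bytes of the file hold the package length as a big-endian unsigned integer, counting the payload plus the length field itself. The field width, byte order, and the names append_package and extract_payload are assumptions made for illustration, not part of any proposed format.

```python
import io
import struct
from typing import BinaryIO

LENGTH_FIELD = struct.Struct(">Q")  # assumed: 8-byte big-endian length


def append_package(executable: bytes, payload: bytes) -> bytes:
    """Concatenate a package (payload + trailing length) onto an executable."""
    package_length = len(payload) + LENGTH_FIELD.size
    return executable + payload + LENGTH_FIELD.pack(package_length)


def extract_payload(f: BinaryIO) -> bytes:
    """Find the package by reading its length from the end of the file."""
    end = f.seek(0, io.SEEK_END)
    if end < LENGTH_FIELD.size:
        raise ValueError("file too short to hold a trailing length")
    f.seek(end - LENGTH_FIELD.size)
    (package_length,) = LENGTH_FIELD.unpack(f.read(LENGTH_FIELD.size))
    if not LENGTH_FIELD.size <= package_length <= end:
        raise ValueError("trailing length is out of range")
    f.seek(end - package_length)  # seek backwards to the package start
    return f.read(package_length - LENGTH_FIELD.size)


image = append_package(b"\x7fELF-stub", b"package bytes")
payload = extract_payload(io.BytesIO(image))
```

The executable never needs to know its own size in advance: everything is located relative to the end of the file, which is why the length has to sit at a fixed offset from the end.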

Associated requirements:

2.2.6. Ergonomic replacement for HTTP/2 PUSH

HTTP/2 PUSH ([RFC7540], section 8.2) is hard for developers to configure, and an explicit package format might be easier.

Trying to bundle resources in order to speed up page loads has a long history, including Resource Packages from 2010 and the W3C TAG’s packaging proposal from 2015.

However, the HTTPWG is doing a lot of work to let servers optimize the PUSHed data, and packaging would either have to re-do that or accept lower performance. Accepting lower performance might be worthwhile if it allows more developers to adopt the smaller optimization.

Associated requirements:

2.2.7. Packages in version control

Once packages are generated, they should be stored in version control. Many popular version control systems auto-detect text files in order to “fix” their line endings. If the first bytes of a package look like text, while later bytes store binary data, the version control system may break the package.

Associated requirements:

3. Requirements

3.1. Essential

3.1.1. Indexed by URL

Resources should be keyed by URLs, matching how browsers look resources up over HTTP.

3.1.2. Request headers

Resource keys should include request headers like accept and accept-language, which allows content-negotiated resources to be represented.

This would require an extension to [MHTML], which uses the content-location response header to encode the requested URL, but has no way to encode other request headers. MHTML also has no instructions for handling multiple resources with the same content-location.

This also requires an extension to [ZIP]: we’d need to encode the request headers into ZIP’s filename fields.
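
One way to picture such a key is sketched below in Python. It assumes, purely for illustration, that only the accept and accept-language headers take part in the key and that lookup is by exact match; real content negotiation is more involved, and the names KEYED_HEADERS and resource_key are hypothetical.

```python
from typing import Dict, FrozenSet, Mapping, Tuple

# Assumed set of request headers that take part in the key.
KEYED_HEADERS = ("accept", "accept-language")

ResourceKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def resource_key(url: str, request_headers: Mapping[str, str]) -> ResourceKey:
    """Build a key from the URL plus the selected request headers."""
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    selected = frozenset(
        (name, lowered[name]) for name in KEYED_HEADERS if name in lowered
    )
    return (url, selected)


# Two representations of the same URL, distinguished by the accept header.
index: Dict[ResourceKey, bytes] = {
    resource_key("https://example.com/logo", {"Accept": "image/webp"}): b"<webp bytes>",
    resource_key("https://example.com/logo", {"Accept": "image/png"}): b"<png bytes>",
}

body = index[resource_key("https://example.com/logo", {"accept": "image/png"})]
```

A key that held only the URL could store just one of the two representations, which is the limitation noted above for content-location in MHTML.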

3.1.3. Response headers

Resources should include their HTTP response headers, like content-type, content-encoding, expires, content-security-policy, etc.

This requires an extension to [ZIP]: we’d need something like [JAR]’s META-INF directory to hold extra metadata beyond the resource’s body.

3.1.4. Signing as an origin

Resources within a package are provably from an entity with the ability to serve HTTPS requests for those resources’ origin [RFC6454].

Note that previous attempts to sign HTTP messages ([I-D.thomson-http-content-signature], [I-D.burke-content-signature], and [I-D.cavage-http-signatures]) omit a description of how a client should use a signature to prove that a resource comes from a particular origin, and they’re probably not usable for that purpose.

This would require an extension to the [ZIP] format, similar to [JAR]’s signatures.

In any cryptographic system, the specification is responsible for making correct implementations easier to deploy than incorrect implementations (Section 3.1.12).

3.1.5. Random access

When a package is stored on disk, the browser can access arbitrary resources without a linear scan.

[MHTML] would need to be extended with an index of the byte offsets of each contained resource.
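
The kind of index meant here can be sketched in Python as follows. The in-memory dictionary, the Index type, and the function names are assumptions for illustration; an actual format would also need to serialize the index into the package itself.

```python
import io
from typing import BinaryIO, Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # URL -> (byte offset, length)


def write_bodies(out: BinaryIO, resources: Dict[str, bytes]) -> Index:
    """Write resource bodies back to back and record where each one starts."""
    index: Index = {}
    for url, body in resources.items():
        index[url] = (out.tell(), len(body))
        out.write(body)
    return index


def read_resource(f: BinaryIO, index: Index, url: str) -> bytes:
    """Read one resource with a single seek, without scanning the others."""
    offset, length = index[url]
    f.seek(offset)
    return f.read(length)


storage = io.BytesIO()
index = write_bodies(storage, {
    "https://example.com/": b"<html>home</html>",
    "https://example.com/app.js": b"console.log('hi');",
})
script = read_resource(storage, index, "https://example.com/app.js")
```

With the offsets known, the cost of reading one resource does not depend on how many other resources the package holds.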

3.1.6. Resources from multiple origins in a package

A package from origin A can contain resources from origin B authenticated at the same level as those from A.

3.1.7. Cryptographic agility

Obsolete cryptographic algorithms can be replaced.

Planning to upgrade the cryptography also means we should include some way to know when it’s safe to remove old cryptography (Section 3.2.5).

3.1.8. Unsigned content

Alex can create their own package without a CA-signed certificate, and Bailey can view the content of the package.

3.1.9. Certificate revocation

When a package is signed by a revoked certificate, online browsers can detect this reasonably quickly.

3.1.10. Downgrade prevention

Attackers can’t cause a browser to trust an older, vulnerable version of a package after the browser has seen a newer version.
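
A minimal sketch of the client-side bookkeeping this implies is shown below in Python. It assumes that each package carries an identifier and a monotonically increasing version number, neither of which is defined by this document; the DowngradeGuard class is likewise hypothetical.

```python
from typing import Dict


class DowngradeGuard:
    """Remembers the newest version seen for each package identifier."""

    def __init__(self) -> None:
        self._newest_seen: Dict[str, int] = {}

    def accept(self, package_id: str, version: int) -> bool:
        """Return False if a newer version of this package was already seen."""
        newest = self._newest_seen.get(package_id)
        if newest is not None and version < newest:
            return False
        self._newest_seen[package_id] = version
        return True


guard = DowngradeGuard()
first = guard.accept("https://example.com/app", 3)
older = guard.accept("https://example.com/app", 2)
same = guard.accept("https://example.com/app", 3)
```

This only protects a browser that has already seen the newer version; a browser that is offline and has seen nothing newer cannot detect the downgrade this way.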

3.1.11. Metadata

Metadata like that found in the W3C’s Application Manifest [W3C.WD-appmanifest-20170828] can help a client know how to load and display a package.

3.1.12. Implementations are hard to get wrong

The design should incorporate aspects that tend to cause incorrect implementations to get noticed quickly, and avoid aspects that are easy to implement incorrectly. For example:

3.2. Nice to have

3.2.1. Streamed loading

The browser can load a package as it downloads.

This conflicts with ZIP, since ZIP’s index is at the end.
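
The conflict can be observed with Python's standard zipfile module, which, like many ZIP readers, locates entries through the central directory at the end of the archive. The sketch below builds a small archive in memory and then tries to open only its first half, as a stand-in for a download still in progress; the file names and the can_open helper are illustrative.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("index.html", "<html>home</html>")
    archive.writestr("app.js", "console.log('hi');")
complete = buffer.getvalue()

# Simulate a download that has delivered only the first half of the bytes.
partial = complete[: len(complete) // 2]


def can_open(data: bytes) -> bool:
    """Report whether zipfile can list the archive's entries."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.namelist()
        return True
    except zipfile.BadZipFile:
        return False


opened_complete = can_open(complete)
opened_partial = can_open(partial)
```

Even though the bytes of the first entry are fully present in the partial data, the reader cannot list or load anything until the end of the archive has arrived.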

3.2.2. Cross-signatures

Third parties can vouch for packages by signing them.

3.2.3. Binary

The format is identified as binary by tools that might try to “fix” line endings.

This conflicts with using an [MHTML]-based format.
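
To make the risk concrete, the Python sketch below uses an assumed detection heuristic: a file is treated as text if its first few thousand bytes contain no NUL byte. This is similar in spirit to what some tools do, but the exact rule and the 8000-byte window are assumptions here, not a description of any particular tool.

```python
SNIFF_BYTES = 8000  # assumed size of the prefix a tool inspects


def looks_like_text(data: bytes) -> bool:
    """Assumed heuristic: no NUL byte in the inspected prefix means text."""
    return b"\x00" not in data[:SNIFF_BYTES]


# A package that opens with a long run of textual headers, then binary data.
text_first = b"Content-Type: multipart/related\r\n" * 400 + b"\x00\x01\x02binary body"

# The same bytes behind a short binary marker at the very start.
binary_first = b"\x00PKG\x00" + text_first

misdetected = looks_like_text(text_first)
detected_as_binary = not looks_like_text(binary_first)
```

Under this heuristic, the first package would be classified as text and could have its line endings rewritten, while the second is recognized as binary from its first byte.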

3.2.4. Deduplication of diamond dependencies

Nested packages that have multiple dependency routes to the same sub-package can be transmitted and stored with only one copy of that sub-package.
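
One common way to get this property is content addressing, sketched below in Python: sub-packages are stored under the SHA-256 digest of their bytes, so two dependency routes to the same bytes resolve to a single stored copy. The PackageStore class and the diamond shape used in the example are hypothetical, and this document does not prescribe content addressing.

```python
import hashlib
from typing import Dict, List


class PackageStore:
    """Stores each sub-package once, keyed by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        """Store data if it is new and return its digest as a reference."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        return digest


store = PackageStore()
shared = b"shared library bytes"

# Hypothetical diamond: app -> {left, right}, and both depend on `shared`.
left_deps: List[str] = [store.add(shared)]
right_deps: List[str] = [store.add(shared)]
```

Both dependency lists end up holding the same digest, and the store holds the shared bytes once.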

3.2.5. Old crypto can be removed

The ecosystem can identify when an obsolete cryptographic algorithm is no longer needed and can be removed.

3.2.6. Compress transfers

Transferring a package over the network takes as few bytes as possible. This is an easier problem than Compress stored packages (Section 3.2.7) since it doesn’t have to preserve Random access (Section 3.1.5).

3.2.7. Compress stored packages

Storing a package on disk takes as few bytes as possible.

3.2.8. Subsetting and reordering

Resources can be removed from and reordered within a package, without breaking signatures.
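
One design that has this property is to sign a manifest of per-resource digests rather than the package bytes as a whole. The Python sketch below shows only the digest bookkeeping; it assumes the manifest itself would be signed as a unit, and that signature step is omitted. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Resource = Tuple[str, bytes]  # (URL, body)


def build_manifest(resources: List[Resource]) -> Dict[str, str]:
    """Map each URL to the SHA-256 of its body; this is what would be signed."""
    return {url: hashlib.sha256(body).hexdigest() for url, body in resources}


def verify_subset(manifest: Dict[str, str], resources: List[Resource]) -> bool:
    """Check every present resource against the manifest, in any order."""
    return all(
        manifest.get(url) == hashlib.sha256(body).hexdigest()
        for url, body in resources
    )


full = [
    ("https://example.com/", b"<html>home</html>"),
    ("https://example.com/app.js", b"console.log('hi');"),
    ("https://example.com/big.mp4", b"video bytes"),
]
manifest = build_manifest(full)

# Drop the large video and reverse the order of what remains.
subset = [full[1], full[0]]
subset_ok = verify_subset(manifest, subset)
tampered_ok = verify_subset(manifest, [("https://example.com/", b"<html>evil</html>")])
```

Because each resource is checked on its own against the manifest, removing or reordering resources leaves the remaining checks unaffected, while a modified body fails.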

3.2.9. Packaged validity information

Certificate revocation (Section 3.1.9) and Downgrade prevention (Section 3.1.10) information can itself be packaged or included in other packages.

3.2.10. Signing uses existing TLS certificates

A “normal” TLS certificate can be used for signing packages. Avoiding extra requirements like “code signing” certificates makes packaging more accessible to all sites.

3.2.11. External dependencies

Sub-packages can be “external” to the main package, meaning the browser will need to either fetch them separately or already have them. (#35, App Installer Story)

3.2.12. Trailing length

The package’s length in bytes appears at a fixed offset from the end of the package.

This conflicts with [MHTML].

4. Non-goals

Some features often come along with packaging and signing, and it’s important to explicitly note that they don’t appear in the list of Requirements.

4.1. Store confidential data

Packages are designed to hold public information and to be shared with people with whom the original author never has an interactive connection. In that situation, there’s no way to keep the contents confidential: even if they were encrypted, to make the data public, anyone would have to be able to get the decryption key.

It’s possible to maintain something similar to confidentiality for non-public packaged data, but doing so complicates the format design and can give users a false sense of security.

We believe we’ll cause fewer privacy breaches if we omit any mechanism for encrypting data than if we include something and try to teach people when it’s unsafe to use.

4.2. Generate packages on the fly

See discussion at WICG/webpackage#6.

4.3. Non-origin identity

A package can be primarily identified as coming from something other than a Web Origin.

4.4. DRM

Special support for blocking access to downloaded content based on licensing. Note that DRM systems can be shipped inside the package even if the packaging format doesn’t specifically support them.

5. Security Considerations

The security considerations will depend on the solution designed to satisfy the above requirements. See [I-D.yasskin-dispatch-web-packaging] for one possible set of security considerations.

6. IANA Considerations

This document has no actions for IANA.

7. Informative References

[I-D.burke-content-signature] Burke, B., "HTTP Header for digital signatures", Internet-Draft draft-burke-content-signature-00, March 2011.
[I-D.cavage-http-signatures] Cavage, M. and M. Sporny, "Signing HTTP Messages", Internet-Draft draft-cavage-http-signatures-07, July 2017.
[I-D.thomson-http-content-signature] Thomson, M., "Content-Signature Header Field for HTTP", Internet-Draft draft-thomson-http-content-signature-00, July 2015.
[I-D.yasskin-dispatch-web-packaging] Yasskin, J., "Web Packaging", Internet-Draft draft-yasskin-dispatch-web-packaging-00, June 2017.
[JAR] "JAR File Specification", 2014.
[MHTML] Palme, J., Hopmann, A. and N. Shelness, "MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)", RFC 2557, DOI 10.17487/RFC2557, March 1999.
[RFC6454] Barth, A., "The Web Origin Concept", RFC 6454, DOI 10.17487/RFC6454, December 2011.
[RFC6962] Laurie, B., Langley, A. and E. Kasper, "Certificate Transparency", RFC 6962, DOI 10.17487/RFC6962, June 2013.
[RFC7515] Jones, M., Bradley, J. and N. Sakimura, "JSON Web Signature (JWS)", RFC 7515, DOI 10.17487/RFC7515, May 2015.
[RFC7540] Belshe, M., Peon, R. and M. Thomson, "Hypertext Transfer Protocol Version 2 (HTTP/2)", RFC 7540, DOI 10.17487/RFC7540, May 2015.
[ServiceWorkers] Russell, A., Song, J., Archibald, J. and M. Kruisselbrink, "Service Workers 1", World Wide Web Consortium WD WD-service-workers-1-20161011, October 2016.
[W3C.WD-appmanifest-20170828] Caceres, M., Christiansen, K., Lamouri, M., Kostiainen, A. and R. Dolin, "Web App Manifest", World Wide Web Consortium WD WD-appmanifest-20170828, August 2017.
[ZIP] "APPNOTE.TXT - .ZIP File Format Specification", October 2014.

Appendix A. Acknowledgements

Author's Address

Jeffrey Yasskin
Google

EMail: jyasskin@chromium.org