Internet Engineering Task Force E. Zierau, Ed.
Internet-Draft The Royal Danish Library
Intended status: Informational December 8, 2017
Expires: June 11, 2018

A Persistent Web IDentifier (PWID) URN Namespace
draft-pwid-urn-specification-01

Abstract

This document specifies a Uniform Resource Name (URN) namespace for Persistent Web IDentifiers (PWIDs) that reference web material in web archives, using the 'pwid' namespace identifier. The purpose of the PWID URN is to support general, global, sustainable, human-readable, technology-agnostic, persistent, and precise references to web material.

The PWID URN can assist in two ways. First, it provides a potentially resolvable, precise, and persistent reference scheme for documents, a need that existing web reference practices do not sufficiently cover. Second, it provides a standardized way to specify the web elements in a web collection, also known as a web corpus. Definitions of web collections are often needed when extracting the data used to produce research results, e.g., so that the results can be evaluated in the future. Current practices are not persistent, as they often rely on some version of CDX, which varies between implementations.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on June 11, 2018.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

The purpose of the PWID URN is to represent general, global, sustainable, human-readable, technology-agnostic, persistent, and precise references to web archive resources, in a way that technical solutions can use.

The motivation for defining a PWID namespace is the growing challenge of referencing web resources, both when citing web resources from papers and when defining web collections or corpora. This challenge calls for action now, and PWID as a URN can provide it, although the "urn:" prefix is not ideal. In detail, the challenges are:

For the sake of usability and sustainability, the definition of the PWID URN is limited to the minimum information required to precisely identify a resource in an arbitrary web archive. Recent research has found that this is obtained with the following information [ResawRef]:

The PWID URN represents this information in an unambiguous way, thereby enabling technical solutions to be built on this URN.
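As an illustration of how these pieces of information combine into a single identifier, the following is a minimal sketch in Python. It assumes the four components named in the syntax of Section 2 (archive identifier, archival time, coverage, and archived item). The function name `format_pwid` and the example values are hypothetical and are not part of this specification.

```python
from datetime import datetime, timedelta, timezone

# Coverage keywords as listed in the coverage-spec rule of Section 2.
COVERAGE_SPECS = {
    "part", "page", "subsite", "site",
    "collection", "recording", "snapshot", "other",
}


def format_pwid(archive_id: str, archival_time: datetime,
                coverage: str, archived_item: str) -> str:
    """Compose a PWID URN string (hypothetical helper, not a normative API)."""
    if coverage not in COVERAGE_SPECS:
        raise ValueError(f"unknown coverage-spec: {coverage!r}")
    if archival_time.tzinfo is None:
        # The syntax requires a trailing "Z", so the time must be expressible in UTC.
        raise ValueError("archival_time must be timezone-aware")
    stamp = archival_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumption: archive_id and archived_item are already syntactically valid;
    # this sketch does not check them against the grammar.
    return f"urn:pwid:{archive_id}:{stamp}:{coverage}:{archived_item}"


example = format_pwid(
    "archive.org",
    datetime(2016, 10, 20, 22, 26, 35, tzinfo=timezone.utc),
    "site",
    "https://www.doi.org/",
)
```

The sketch normalizes the archival time to UTC before formatting and always emits the colon-separated form of the time, one of the forms the syntax permits.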

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2. Namespace Registration Template

Namespace Identifier:

Version:

Date:

Registrant:

Purpose:

                 pwid-urn = "urn" ":" pwid-NID ":" pwid-NSS

                 pwid-NID = "pwid"
                 pwid-NSS = archive-id ":" archival-time ":" coverage-spec
                            ":" archived-item

                 ; unreserved and URI are defined in [RFC3986]
                 archive-id = 1*( unreserved )

                 ; full-date, time-hour, time-minute, and time-second
                 ; are defined in [RFC3339]
                 archival-time  = full-date datetime-delim full-pwid-time
                 datetime-delim = "T"
                 full-pwid-time = time-hour [":"] time-minute [":"] time-second "Z"

                 coverage-spec = "part" / "page" / "subsite" / "site"
                                 / "collection" / "recording" / "snapshot"
                                 / "other"

                 archived-item    = URI / archived-item-id
                 archived-item-id = 1*( unreserved )
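
The grammar above can be exercised with a small parser. The following Python sketch is illustrative only and is not a reference implementation. The names `PWID_RE`, `Pwid`, and `parse_pwid` are hypothetical. As a simplification, the archived item is accepted as any non-empty string rather than being validated against the URI rule of [RFC3986]. The example URN is adapted for illustration to the time format of the grammar.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

COVERAGE = ("part", "page", "subsite", "site",
            "collection", "recording", "snapshot", "other")

# Literal strings in ABNF are case-insensitive, hence re.IGNORECASE.
PWID_RE = re.compile(
    r"urn:pwid:"
    r"(?P<archive_id>[A-Za-z0-9._~-]+):"              # 1*( unreserved )
    r"(?P<date>\d{4}-\d{2}-\d{2})T"                   # full-date "T"
    r"(?P<hour>\d{2}):?(?P<minute>\d{2}):?(?P<second>\d{2})Z:"
    r"(?P<coverage>" + "|".join(COVERAGE) + r"):"
    r"(?P<item>.+)",                                  # simplification: not URI-validated
    re.IGNORECASE,
)


class Pwid(NamedTuple):
    archive_id: str
    archival_time: datetime
    coverage: str
    archived_item: str


def parse_pwid(urn: str) -> Pwid:
    """Split a PWID URN into its four components (hypothetical helper)."""
    m = PWID_RE.fullmatch(urn)
    if m is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    year, month, day = (int(part) for part in m["date"].split("-"))
    # datetime() raises ValueError for impossible dates and times; it also
    # rejects a leap second (second 60), which RFC 3339 would allow.
    when = datetime(year, month, day, int(m["hour"]), int(m["minute"]),
                    int(m["second"]), tzinfo=timezone.utc)
    return Pwid(m["archive_id"], when, m["coverage"].lower(), m["item"])


parsed = parse_pwid(
    "urn:pwid:archive.org:2016-10-20T22:26:35Z:site:https://www.doi.org/")
```

A regular expression is used instead of splitting on ":" because both the archival time and the archived item may themselves contain colons.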
               

Syntax:

Assignment:

Security and Privacy:

Interoperability:

Resolution:

Documentation:

Additional Information:

Revision Information:

3. Acknowledgements

Special thanks to Caroline Nyvang and Thomas Kromann, who contributed to the research identifying the minimum information required in a persistent web reference, and to Bolette Jurik, who contributed supplementary research concerning requirements for web collection/corpora definitions. Thanks also to everyone who has contributed to this work through research and by reviewing this document.

4. References

4.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008.
[RFC8141] Saint-Andre, P. and J. Klensin, "Uniform Resource Names (URNs)", RFC 8141, DOI 10.17487/RFC8141, April 2017.

4.2. Informative References

[DOI] International DOI Foundation, "The DOI System", 2016, <pwid:archive.org:2016-10-20_22.26.35:site:https://www.doi.org/>.
[DraftPwidUri] Zierau, E., "DRAFT: Scheme Specification for the pwid URI, version 3", December 2017.
[IPRES] Zierau, E., Nyvang, C. and T. Kromann, "Persistent Web References - Best Practices and New Suggestions", In: proceedings of the 13th International Conference on Preservation of Digital Objects (iPres) 2016, pp. 237-246, October 2016.
[ISO28500] International Organization for Standardization, "Information and documentation -- WARC file format", 2017.
[ISO8601] International Organization for Standardization, "Data elements and interchange formats -- Information interchange -- Representation of dates and times", 2004.
[RESAW] The Resaw Community, "A Research infrastructure for the Study of Archived Web materials", 2017, <pwid:archive.org:2017-05-29_11.31.50Z:site:http://resaw.eu/>.
[ResawColl] Jurik, B. and E. Zierau, "Data Management of Web archive Research Data", In: proceedings of the RESAW 2017 Conference, DOI 10.14296/resaw.0002, 2017.
[ResawRef] Nyvang, C., Kromann, T. and E. Zierau, "Capturing the Web at Large - a Critique of Current Web Referencing Practices", In: proceedings of the RESAW 2017 Conference, DOI 10.14296/resaw.0004, 2017.
[RFC2141] Moats, R., "URN Syntax", RFC 2141, DOI 10.17487/RFC2141, May 1997.
[RFC6068] Duerst, M., Masinter, L. and J. Zawinski, "The 'mailto' URI Scheme", RFC 6068, DOI 10.17487/RFC6068, October 2010.
[W3CDTF] W3C, "Date and Time Formats: note submitted to the W3C. 15 September 1997", W3C profile of ISO 8601, 1997, <pwid:archive.org:2017-04-03_03.37.42Z:page:http://www.w3.org/TR/NOTE-datetime>.

Author's Address

Eld Maj-Britt Olmuetz Zierau (editor)
The Royal Danish Library
Soeren Kierkegaards Plads 1
Copenhagen, 1219
Denmark

Phone: +45 9132 4690
EMail: elzi@kb.dk