Network Working Group A. Newton
Internet-Draft VeriSign, Inc.
Expires: August 10, 2005 Y. Shafranovich
SolidMatrix Technologies, Inc.
February 9, 2005
Distributed Black/White Lists
draft-newton-shafranovich-distributed-blacklists-00
Status of this Memo
By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed,
and any of which I become aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 10, 2005.
Copyright Notice
Copyright (C) The Internet Society (2005). All Rights Reserved.
Abstract
Many traditional, centrally-managed blacklists and whitelists
describe Internet end-points by characteristics such as connectivity
type or network function, and these characteristics are often used to
infer behavior from which authorization is derived. However,
connectivity type and network function are often unrelated to good
or bad behavior. This document describes a means of
creating blacklists and whitelists representative of Internet
end-points based on observed behavior by many participants in a
Newton & Shafranovich Expires August 10, 2005 [Page 1]
Internet-Draft DxL February 2005
distributed monitoring network. The authors hope that distributed
lists will mitigate some of the problems associated with existing
centrally managed lists. While the concept, architecture, and data
model are general enough to be applied to any type of network
service, the authors of this document are specifically addressing the
problem of spam in blogs.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Document Terminology . . . . . . . . . . . . . . . . . . . . . 5
3. Motivations . . . . . . . . . . . . . . . . . . . . . . . . . 6
4. Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 8
5. Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6. Formal XML Syntax . . . . . . . . . . . . . . . . . . . . . . 12
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15
7.1 Normative References . . . . . . . . . . . . . . . . . . . . 15
7.2 Informative References . . . . . . . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 15
Intellectual Property and Copyright Statements . . . . . . . . 17
1. Introduction
For years, blacklists have been used as an authorization policy
mechanism for public network services, mostly email. These
centrally-managed blacklists can be categorized into two
groups:
o lists containing Internet end-points based on certain
characteristics, such as how they are connected to the Internet
(e.g. dial-up or residential broadband) or a type of network
function they may serve (e.g. proxy or relay)
o lists containing Internet end-points that have been observed to
exhibit certain behavior (e.g. sending unsolicited email).
More recently, a smaller but ever-growing number of whitelists
have been developed and deployed to assist network administrators in
determining authorization rights for public network services.
Centrally managed whitelists usually contain positive information
about Internet end-points that is being vouched for by the party that
administers the list. In some cases this information is collected by
the administering party independently of the end-points listed, but
in many cases the party administering the list charges a fee for
inclusion, thus essentially operating an accreditation service.
Some blacklists and whitelists do not necessarily list bad or
good information, but rather seek to provide reputation information
about Internet end-points. Unfortunately, as is the case with
blacklists, reputation services tend to suffer from many of the same
problems stemming from accountability issues.
The purpose of such lists is to eradicate certain undesirable
side-effects of a highly successful network, usually unsolicited
email. However, these lists have a great tendency to inhibit
universal network access, in many cases outweighing their perceived
benefits. For example:
o While it is true that many senders of unsolicited email (spam) use
dial-up network connections, it is not reasonable to assume that
all dial-up network connections are used to send spam: the two are
unrelated.
o Constrained by the need for human verification, many lists
specializing in observed unwanted behavior tend to mark whole
networks as bad rather than specific end-points, though there is no
evidence that every end-point in a network has exhibited
undesirable behavior.
o There is often little guidance available on the criteria used to
create these lists and seldom useful information on how to correct
errors in these lists.
o In the case of whitelists, a fee chargeable for accreditation and
inclusion in a whitelist may inhibit certain Internet users from
obtaining network access. For example, individuals and
non-commercial users, especially those from poorer countries, may
not have the resources to pay an admission fee for inclusion in
a whitelist. If multiple whitelists become popular, the financial
burden will greatly decrease accessibility of Internet services
to those users.
For these reasons and more, these centrally-managed lists have failed
to make an impact on the spam problem and to be universally adopted.
This is all too evident given that spam continues to be a growing
problem, not only in email but increasingly in other network
services as well.
This document describes an architecture and data model for
Distributed Black/White Lists (DxL). The intent is to leverage a
peer-to-peer web of trust as opposed to a centrally managed list,
hopefully providing greater accuracy and better-understood
accountability. The concept, architecture, and data model for DxLs
could be applied to other network services; however, the authors
chose to target the design of DxLs toward a relatively new type of
web application called blogging.
2. Document Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].
3. Motivations
Many of the problems arising in the use of blacklists and whitelists
stem from the fact that they are centrally managed by a third party
that may not be accountable to or trusted by a network administrator
who wishes to use such lists. List users may also wish to express
their opinion on specific list entries or entire lists, but due to
the centralized nature of these lists that is not currently possible.
Additionally, many Internet users and network operators already have
existing relationships in place with others which can be utilized to
pass along blacklist and whitelist information, instead of
establishing new ones with the parties administering central lists.
In the real world, existing relationships and social networks are
often used to pass along reputation information, and the digital
world should in theory be no different. Therefore, in order to step
around the problem of trusting the party administering the central
list, we choose to distribute DxL information in a peer-to-peer
fashion. This gives users the ability to use their existing
relationships to establish a web of trust for the purposes of
authorizing access to public network services (which in this case are
the ability to leave comments and trackbacks on blog posts, and the
passing of referrer information). We also chose to allow lists to be
combined and passed on as new lists, thus allowing trust information
to be propagated via a social network.
Additionally, in order to enforce accountability and transparency, we
chose to require URLs pointing to the original list from which the
information originates, URLs pointing to a removal page, and
creation/update data for all entries. While these may not be checked
for validity in all cases, their presence nevertheless indicates to
the list creators and users that these are matters not to be ignored.
We also believe that users will take the validity of this
information into account when deciding whether to trust specific
lists.
In order to allow flexibility in this system, we choose to add
weights to the list entries indicating the "black" or "white" value.
Many existing lists provide a binary "yes/no" decision with regard to
their entries, which may not be flexible enough for all cases.
Additionally, a weight mechanism allows users to adjust weight
ratings on lists coming from other users based on their trust level.
Though this document may be the first formalization of a distributed
black/white list using XML, the concept of a peer-to-peer style
distribution of these lists has been seen in
and
.
4. Architecture
Unlike DNS-based blacklists [9] (known as DNSBLs), which operate over
DNS, a DxL is an XML document and is retrieved over the Internet by
using a protocol such as HTTP. This is modelled after RSS, which is
commonly found in the "blogosphere". Once retrieved, a DxL is cached
for a period of time and checked for updates upon expiration. Note
that this is not the only possible implementation or exchange
mechanism available for this data.
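The retrieve, cache, and refresh cycle described above might be
sketched as follows. This is a minimal illustration and not part of
the DxL specification: the DxlCache class, the injected fetch
callable, and the default_ttl fallback are hypothetical names, and
this document does not define what a consumer should do when a DxL
carries no expiration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional, Tuple

# fetch(uri) returns (document_text, expiration or None). How the document
# is retrieved (for example, over HTTP) is left to the caller.
Fetch = Callable[[str], Tuple[str, Optional[datetime]]]


class DxlCache:
    """Hypothetical cache that re-fetches a DxL once its expiration passes."""

    def __init__(self, fetch: Fetch, default_ttl: timedelta = timedelta(hours=1)):
        self._fetch = fetch
        self._default_ttl = default_ttl  # assumption: used when no expiration is given
        self._entries: Dict[str, Tuple[str, datetime]] = {}

    def get(self, uri: str, now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        cached = self._entries.get(uri)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: serve from the cache
        # Missing or expired: retrieve again and remember the new expiration.
        text, expires = self._fetch(uri)
        self._entries[uri] = (text, expires or now + self._default_ttl)
        return text
```

Passing the retrieval function in as a parameter keeps the sketch
independent of any particular transport, in line with the note that
HTTP is only one possible exchange mechanism.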
A DxL can be composed of entries derived from a private list based on
direct observation and from other DxLs, known as component DxLs.
Hence, a DxL propagates data from many sources.
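One way to compose a DxL from a private list and component DxLs is
sketched below, using the method and hops meta-data defined in
Section 5. The Item type, the compose function, the choice to match
items by address, and the rule that a publisher's own weight wins are
all assumptions made for illustration; this document does not specify
how items are matched or how weights from different sources are
reconciled.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    address: str          # IPv4 or IPv6 address from the trace data
    weight: float         # -1.0 (black) .. 1.0 (white)
    method: str = "direct"
    hops: int = 0


def compose(private: Iterable[Item], components: Iterable[Iterable[Item]]) -> List[Item]:
    """Merge directly observed items with items taken from component DxLs."""
    # Items from the private list were observed by this publisher: hops is 0.
    merged: Dict[str, Item] = {
        i.address: replace(i, method="direct", hops=0) for i in private
    }
    for component in components:
        for item in component:
            known = merged.get(item.address)
            if known is None:
                # Known only through a component DxL: one more derivation step.
                merged[item.address] = replace(
                    item, method="union", hops=item.hops + 1
                )
            elif known.hops == 0:
                # Also observed directly (assumed policy: keep our own weight).
                merged[item.address] = replace(known, method="intersection")
            # Otherwise the item already came from an earlier component; keep it.
    return list(merged.values())
```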
5. Data Model
This section describes the data model of a DxL. The formal syntax
for a DxL is described in Section 6.
Each DxL has the following attributes:
o DxL URI - a URI pointing to the DxL
o description - a short textual description of the DxL
o description URI - a URI pointing to a longer description of the
DxL
o expiration date and time
o creation date and time
o last updated date and time
Each of these attributes is optional.
Each item in a DxL describes an observed instance with the following
trace data:
o either an IPv4 or IPv6 address
o a protocol identifier: either a domain name or a URI (a domain
name is RECOMMENDED given that URIs are free to manufacture)
o protocol content: domain names, URIs, or regular expressions
(regex) describing parts of content (domain names are RECOMMENDED)
- regular expressions must be typed with one of the following
identifiers:
* Perl - denotes a Perl style regular expression
* POSIX-enhanced - denotes a POSIX enhanced style regular
expression
* POSIX-basic - denotes a POSIX basic style regular expression
o proxy - a simple note indicating it was possible to detect that
the end-point served as a protocol-level proxy
o user agent
o application: text in the form of XXX.YYY where XXX is an
application name and YYY is a sub-application name - describes the
application or network service type specific to the trace data.
These values are defined as:
* web.referrer - web-based referrals
* blog.comments
* blog.trackbacks
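The trace data fields listed above could be represented and checked
along the following lines. The TraceData class and its field names
are hypothetical; only the regular expression type identifiers and
the application values are taken from the list above.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

REGEX_TYPES = {"Perl", "POSIX-enhanced", "POSIX-basic"}
APPLICATIONS = {"web.referrer", "blog.comments", "blog.trackbacks"}


@dataclass
class TraceData:
    address: str                        # IPv4 or IPv6 address
    application: str                    # "XXX.YYY" form
    protocol_id: Optional[str] = None   # domain name (RECOMMENDED) or URI
    content: List[str] = field(default_factory=list)  # domain names or URIs
    # Each regular expression is stored as a (type, pattern) pair.
    content_regex: List[Tuple[str, str]] = field(default_factory=list)
    proxy: bool = False
    user_agent: Optional[str] = None

    def __post_init__(self) -> None:
        ipaddress.ip_address(self.address)  # raises ValueError if not IPv4/IPv6
        if self.application not in APPLICATIONS:
            raise ValueError(f"unknown application: {self.application}")
        for regex_type, _pattern in self.content_regex:
            if regex_type not in REGEX_TYPES:
                raise ValueError(f"unknown regex type: {regex_type}")
```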
The following are two examples of trace data from observed incidents:
1. A comment is left on a blog. The blog software records the
comment as coming from 192.0.2.1. The "URL" field was submitted
with the URI "http://example.org/foo" and the "comment" field was
submitted with the text "Buy all your foos at foo.example.org for
the lowest prices". The trace data would consist of the
following:
* an IPv4 address of 192.0.2.1
* a protocol URI of http://example.org/foo
* a content domain of foo.example.org or example.org
2. An entry is left in a referrer log on a web server. The entry
shows the request coming from 192.0.10.1 with a referral URI of
http://example.com/bar. The trace data would consist of the
following:
* an IPv4 address of 192.0.10.1
* a protocol URI of http://example.com/bar or a protocol domain
name of example.com
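The second example can be reproduced in a few lines of code. This
sketch assumes the referral URI is available as a string and uses
the urllib.parse module from the Python standard library to derive
the protocol domain name; the function name and the dictionary keys
are illustrative only.

```python
from urllib.parse import urlsplit


def referrer_trace(address: str, referral_uri: str) -> dict:
    """Build trace data for a referrer-log entry (web.referrer)."""
    host = urlsplit(referral_uri).hostname  # None if the URI has no authority part
    return {
        "address": address,
        "application": "web.referrer",
        "protocol_uri": referral_uri,
        "protocol_domain": host,
    }


trace = referrer_trace("192.0.10.1", "http://example.com/bar")
print(trace["protocol_domain"])  # example.com
```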
Each item in a DxL has the following meta-data associated with it:
o URI of DxL source - taken directly from the DxL URI of the DxL
document where the item originated
o description
o description URI
o removal URI - points to a location where instructions may be found
for removing an item from the source DxL
o method - describes what process was used to determine inclusion of
the item if it originated from a component DxL. These methods
are:
* intersection - the item was found in a component DxL and by
direct observation of this DxL publisher
* union - the item was found in a component DxL and was not
directly observed by the publisher of this DxL
* direct - the item was found only by direct observation
o hops - a non-negative integer indicating the number of times the
item has been derived from a component DxL. Zero indicates the
item is in the DxL of the publisher who made the observation.
o weight - a value between -1.0 and 1.0 indicating a value judgement
on the item. Values less than 0 are considered negative (i.e. a
blacklisted item) and values greater than 0 are considered
positive (i.e. a whitelisted item). Zero is considered neutral.
If value judgements are simply to be boolean (either positive or
negative), the values 1.0 and -1.0 SHOULD be used.
o expiration date and time
o created date and time
o last updated date and time
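The weight rules above, together with the trust-based adjustment
mentioned in Section 3, might look like this in practice. The
classify and adjust_weight functions and the multiplicative trust
factor are assumptions; this document says only that users may adjust
weights according to their trust level, not how.

```python
def classify(weight: float) -> str:
    """Map a DxL weight to the judgement described in the data model."""
    if not -1.0 <= weight <= 1.0:
        raise ValueError("weight must be between -1.0 and 1.0")
    if weight < 0:
        return "black"
    if weight > 0:
        return "white"
    return "neutral"


def adjust_weight(weight: float, trust: float) -> float:
    """Scale a weight from another publisher by a local trust factor in [0, 1].

    Multiplication is an assumed policy: trust 1.0 keeps the weight as
    published, trust 0.0 makes every item from that publisher neutral.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be between 0.0 and 1.0")
    return weight * trust
```

Because the scaled weight stays within the range -1.0 to 1.0, the
result can be republished in a derived DxL without further clamping.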
The following are the values of the two items in an example DxL
document, labeled with the data model fields described above:

Item 1
   address:          192.0.2.1
   protocol domain:  online-poker.com
   content domain:   www.online-poker.com
   content domain:   online-poker.com
   content URI:      http://www.online-poker.com/bogus
   proxy:            false
   user agent:       SpamBuddy/1.0
   source DxL URI:   http://hxr.us/grumpops/dxl.xml
   description:      a persistent spammer
   description URI:  http://hxr.us/grumpops/dxl?item=abc123
   removal URI:      http://hxr.us/grumpops/dxl-removal?item=abc123
   method:           intersection
   hops:             0
   weight:           1.0
   expiration:       2005-01-30T12:00:00Z
   created:          2005-01-20T12:00:00Z
   last updated:     2005-01-25T12:00:00Z

Item 2
   address:          ff:ee::00
   protocol URI:     http://vegas-hotels.com/
   content domain:   www.vegas-hotels.com
   content domain:   visit.vegas-hotels.com
   content URI:      http://www.vegas-hotels.com/offer
   content URI:      http://www.vegas-hotels.com/redeem
   proxy:            true
   user agent:       SpamBuddy/1.0
   source DxL URI:   http://shaftek.org/dxl.xml
   description:      a very persistent spammer
   description URI:  http://shaftek.org/dxl?item=def456
   removal URI:      http://shaftek.org/dxl-removal?item=def456
   method:           intersection
   hops:             1
   weight:           0.7
   expiration:       2005-01-31T12:00:00Z
   created:          2005-01-22T12:00:00Z
   last updated:     2005-01-25T12:00:00Z
6. Formal XML Syntax
The following describes the formal XML syntax for DxL instances using
XML Schema (see [2], [3], [5], and [4]). Implementors should note
that this is only a formalization of the syntax for creation of
interoperable processes and that an XML Schema capable parser is not
required.
This formal definition uses the XML Schema 'anyType' in places where
formal syntax definitions already exist:
o the syntax for domains is defined in [8]
o the syntax for IPv4 addresses is defined in [7]
o the syntax for IPv6 addresses is defined in [6]
In these cases, the formal syntax defers to the appropriate original
definition.
A schema for describing
distributed black/white lists (DxL)
as defined by RFC 0791
as defined by RFC 3513
as defined by RFC 1035
as defined by RFC 1035
7. References
7.1 Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", RFC 2119, BCP 14, March 1997.
[2] World Wide Web Consortium, "Extensible Markup Language (XML)
1.0", W3C XML, February 1998.
[3] World Wide Web Consortium, "Namespaces in XML", W3C XML
Namespaces, January 1999.
[4] World Wide Web Consortium, "XML Schema Part 2: Datatypes", W3C
XML Schema, October 2000.
[5] World Wide Web Consortium, "XML Schema Part 1: Structures", W3C
XML Schema, October 2000.
[6] Hinden, R. and S. Deering, "Internet Protocol Version 6 (IPv6)
Addressing Architecture", RFC 3513, April 2003.
[7] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.
[8] Mockapetris, P., "Domain names - implementation and
specification", STD 13, RFC 1035, November 1987.
7.2 Informative References
[9] Levine, J., "DNS Based Blacklists and Whitelists for E-Mail",
draft-irtf-asrg-dnsbl-01.txt (work in progress), November 2004.
Authors' Addresses
Andrew L. Newton
VeriSign, Inc.
21345 Ridgetop Circle
Sterling, VA 20166
USA
Phone: +1 703 948 3382
EMail: anewton@verisignlabs.com; andy@hxr.us
URI: http://www.verisignlabs.com/
Yakov Shafranovich
SolidMatrix Technologies, Inc.
EMail: YakovS@solidmatrix.com; ietf@shaftek.org
URI: http://www.shaftek.org/
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.