Network Working Group J. Levine
Internet-Draft Taughannock Networks
Intended status: Experimental March 31, 2015
Expires: October 2, 2015
Encoding mailbox local-parts in the DNS
draft-levine-dns-mailbox-00
Abstract
Many applications would like to store per-mailbox information
securely in the DNS. Mapping mailbox local-parts into the DNS is a
difficult problem, due to the fuzzy matching that most mail systems
do, and the DNS design that only does exact matching. We propose
several experimental approaches that attempt to implement the
required fuzzy matching through DNS queries.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 2, 2015.
Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
Levine Expires October 2, 2015 [Page 1]
Internet-Draft DNS mailbox March 2015
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3
2. Summary of the approaches . . . . . . . . . . . . . . . . . . 3
3. Literal bytes . . . . . . . . . . . . . . . . . . . . . . . . 3
4. Encoded bytes . . . . . . . . . . . . . . . . . . . . . . . . 4
4.1. Static or Dynamic name servers . . . . . . . . . . . . . 4
4.2. All names valid . . . . . . . . . . . . . . . . . . . . . 5
5. Encoded regular expressions . . . . . . . . . . . . . . . . . 5
5.1. Representing the DFA in the DNS . . . . . . . . . . . . . 6
5.2. Matching a local-part against a DFA . . . . . . . . . . . 6
6. Pointer to server . . . . . . . . . . . . . . . . . . . . . . 7
7. Scaling Issues . . . . . . . . . . . . . . . . . . . . . . . 7
8. Security Considerations . . . . . . . . . . . . . . . . . . . 8
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8
9.1. Normative References . . . . . . . . . . . . . . . . . . 8
9.2. Informative References . . . . . . . . . . . . . . . . . 9
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 9
1. Introduction
E-mail mailboxes consist of a local-part (sometimes informally called
left hand side or LHS), an @-sign and a domain name. While the
domain name works like any other domain name, the local-part can
contain any ASCII characters, up to 64 characters long. Mailboxes in
Internationalized mail [RFC6532] can contain arbitrary UTF-8
characters in the local-part, not just ASCII. (The domain name also
can contain UTF-8 U-labels, but the process to translate U-labels to
ASCII A-labels for DNS resolution is well defined and is not further
addressed here.) While the DNS protocol is mostly 8-bit clean, other
than ASCII case folding, DNS provisioning software often works poorly
when dealing with characters outside the ASCII set.
Mail systems usually handle variant forms of local-parts. The most
common variants are upper and lower case, which are now invariably
treated as equivalent. But many other variants are possible. Some
systems allow and ignore "noise" characters such as dots, so local
parts johnsmith and John.Smith would be equivalent. Many systems
allow "extensions" such as john-ext or mary+ext where john or mary is
treated as the effective local-part, and the ext is passed to the
recipient for further handling. Yet other systems use an LDAP or
other directory to do approximate matching, so an address such as
john.smith might also match jsmith so long as there's no other
address that also matches.
Levine Expires October 2, 2015 [Page 2]
Internet-Draft DNS mailbox March 2015
[RFC5321] and its predecessors have always made it clear that only
the recipient MTA is allowed to interpret the local-part of an
address:
"... due to a long history of problems when intermediate hosts
have attempted to optimize transport by modifying them, the local-
part MUST be interpreted and assigned semantics only by the host
specified in the domain part of the address." (Sec 2.3.11.)
This presents a problem when attempting to map local-parts into the
DNS, since the DNS only handles exact matchies, and clients cannot
make any assumptions about variants of local parts, and hence cannot
try to normalize variants to a "standard" version published in the
DNS.
This document suggests some approaches to shoehorn local-parts into
the DNS. Since none of them have, to our knowledge, been implemented
they are all presented as experiments, with the hope that people
implement them, see how well the work, and perhaps later select one
of them for standardization.
1.1. Definitions
The ABNF terms "mailbox" and "local-part" are used as in [RFC5321].
2. Summary of the approaches
o Literal bytes: put the local-part directly into the DNS as a name
o Encoded bytes: encode the local-part into names consisting of
letters and digits
o Regex: encode the set of names as a Deterministic Finite Automaton
(DFA) corresponding to a regular expression that matches the valid
names.
o Pointer to server: securely identify an http server that will
handle the lookup.
3. Literal bytes
Since the DNS protocol is mostly 8-bit clean, just put the local-part
into the DNS as is. The suggested separator is _lmailbox so the
address Bob.Smith@example.com would be represented as:
Bob\.Smith._lmailbox.example.com
Levine Expires October 2, 2015 [Page 3]
Internet-Draft DNS mailbox March 2015
(The \. is the master file convention for a literal dot in a name.)
The maximum length of a local-part is 64 characters, while a DNS name
component is limited to 63, but actual local-parts of 64 characters
are vanishingly rare. It also cannot distinguish between upper and
lower case ASCII characters, but MTAs that do not treat them the same
are also very rare.
This has the benefit of simplicity--the server can directly see
exactly what name the client is looking up. Its disdvantage is that
some authoritative DNS servers do not handle names well if they
contain characters outside the usual ASCII printing character set.
Its other characteristics are similar to those for encoded bytes,
described next.
4. Encoded bytes
To avoid problems with characters in DNS names, we can encode the
local-part with a simple reversible transformation. To preserve
lexical order, which might be useful, take the local-part, pad it out
to 64 bytes with xFF bytes, which are invalid both in ASCII and UTF-
8, break the string into two 32 byte chunks. Then encode each chunk
as 52 base32 characters, with each 5-bit section represented as a
character from the sequence 0-9a-v. Then use the encoded high part,
a dot, and the encoded low part as end of the DNS name. The
suggested separator is _emailbox so the address Bob.Smith@example.com
would be represented as:
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvg.
89nm4bijdlkn8q7vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvg.
_emailbox.example.com
(The name is is displayed on several lines to make it fit in the
margins, but the actual name is one long string delimited by dots.)
4.1. Static or Dynamic name servers
A mail server with a small set of variants could export the names as
either literal or encoded bytes to be served by an ordinary
authoritative DNS server. A mail server with the more typical wide
range of variants could be lashed up to a special purpose DNS server
that recovers the local-part from the literal or encoded bytes,
figures out what key it corresponds to, and synthesizes an OPENPGPKEY
or SMIMEA or whatever, or NXDOMAIN if there isn't one.
Levine Expires October 2, 2015 [Page 4]
Internet-Draft DNS mailbox March 2015
4.2. All names valid
Synthesizing NXDOMAIN responses is likely to be hard, due to the
difficulty of figuring what the valid addresses above and below it
are (or even worse, the NSEC3 hashes.) Also, a static zone with NSEC
is easily enumerated, which would leak the set of mailboxes in the
domain.
A dynamic server has the option of returning a record for every query
for a syntactically valid encoded name, i.e. anything that is two
names of 52 characters from the set [0-9a-v]. If there is no key for
the mailbox (which may mean the mailbox doesn't exist or that it does
exist but doesn't have a key), the key field in the record is zero
length. This makes dynamic DNSSEC somewhat easier, since the server
doesn't have to synthesize NXDOMAIN responses for valid encoded
names, and for other names it is straightforward to compute the
nearest possible encoded names. It also makes it unproductive to try
to enumerate the names in the domain.
5. Encoded regular expressions
Many variant local-parts are easily described using regular
expressions. For example, the local-parts matching "bobsmith" on a
system that ignores ASCII case distinctions and allows dots between
the characters would be described as
"[Bb].?[Oo].?[Bb].?[Ss].?[Mm].?[Ii].?[Tt].?[Hh]". The local-parts
for the address "bob" with optional + extensions would be
"[Bb][Oo][Bb](+.*)?" For typical variant rules, it is
straightforward to generate the regular expressions, and even for
variants not easily described by patterns, it is possible to
enumerate the distinct variants, e.g.
"([Bb][Oo][Bb]|[Bb][Oo][Bb][Bb][Yy]|[Rr][Oo][Bb][Ee][Rr][Tt])".
Regular expressions are equivalent to Deterministitic Finite Automata
(DFA), often called state machines, and algorithms to translate
betwen them are well known. See, for example, chapter 3 of [ASU86].
Lexical analyzer generators such as lex [LESK75] take a collection of
regular expressions and translate them into a DFA that can be used to
match the regular expressions against input strings efficiently.
This approach stores the state machine in the DNS, to allow DNS
clients to efficiently match valid local-parts against the regular
expression. The state machine in a DFA consists of a set of states,
conventionally identified by decimal numbers. Each state can be a
terminal state, which means that if the input is at the end of the
string, the regular expression has matched. The state also has a set
of transitions, pairs of (character,state) that tell the DFA to
switch to the given state based on the next input character.
Levine Expires October 2, 2015 [Page 5]
Internet-Draft DNS mailbox March 2015
To match an input string, the client starts at state zero, then uses
each character in the input string (in this case the local-part) to
choose a next state. If at any stage the character does not have a
corresponding next state, the match fails. If at the end of the
string, the final state is a terminal state, the match succeeded and
the terminal state identifies which regular expression it matched.
The DFA matcher here is considerably simpler than the one that lex
and similar programs use, since they repeatedly match expressions
against a long string of input while in this application there is one
input string that either matches or not.
5.1. Representing the DFA in the DNS
Each state in the DFA is represented by a collection of DNS names and
records. We define a new DFA record that contains a single 16-bit
field, which is the state number of the next state. Most records are
of the form:
cccc.ddd._rmailbox.example.com IN DFA 123
In this example, ddd is the current state number as a decimal number,
and cccc is the hex value of the next character. Non-terminal states
have a DFA record to identify the next state. Terminal states (which
may also be non-terminal states if one local-part is a prefix of
another) have key records such as SMIMEA.
For wildcards, written as "." in regular expressions, the cccc is a *
DNS wildcard. The DNS closest encloser rule allows states where a
few characters have specific matches, and everything goes to a
default state, as in situations were a user calls out a few specific
address extensions, e.g. "bob-dnslist" and "bob-jokes" and every
other extension matches "bob-*".
Once the local-parts are compiled into the state machine records,
they are an ordinary DNS zone that can be served by an ordinary
authoritative server.
5.2. Matching a local-part against a DFA
Start by turning the local-part into a list of characters. For
traditional ASCII local-parts, the characters are octets, for
internationalized local-parts the characters are Unicode characters,
which may be represented by several UTF-8 octets. Set the state
number to zero, which is by convention the initial state.
For each character, create a DNS name using the hex code of the
current character, the current state and _rmailbox.domain. If this
is not the last character in the local-part, look up a DFA record to
Levine Expires October 2, 2015 [Page 6]
Internet-Draft DNS mailbox March 2015
find the next state. If the DFA record is found, use its value as
the next state and advance to the next character. If there is no DFA
record, stop, there is no key for this name.
If this is the last character of the local-part, look up whatever key
record is desired. If it's found, it's the key for the local-part.
If not, there is no key.
As a minor optimization, state number 65535 means a trailing wildcard
that matches the rest of the local-part. This permits more efficient
matching of the common extension idioms such as "bob+*" without
having to iterate through the characters in the extension. If a
retrieved DFA record contains 65535, the client fetches the key
record at the same name.
6. Pointer to server
Rather than trying to encode local-parts into the DNS, publish a
pointer to a per-domain web server that can provide the keys,
identified by URI RR [URIRR]. Each key type will have to register a
new enumservice [RFC6117] type for naming the URI record, e.g.:
_smimecert._smtp.example.com URI 0 0 "https://keyserver.example.com/
smimecerts"
The URI has to be https, with the name suitably verified by TLSA
certificates. To find a key, take the URI, add "?mailbox={escmbx}"
where {escmbx} is the full mailbox name suitably hex escaped for a
URI, and fetch it. The server will either return a result as
application/pgp-keys or application/pkix-cert or other appropriate
type or a 4xx status if there is no key available.
This is certainly slower than a single DNS lookup, but it's probably
not slower than the sequence of lookups for the DFA encoding, and
it's about the same speed as the subsequent SMTP session to send a
message, so it' probably fast enough.
7. Scaling Issues
Mail systems vary from tiny home systems with a handful of users to
giant public systems with upwards of 400 million users. Signing and
publishing a zone with one key per user for a large mail system would
likely exceed the capacity of much DNS software. For comparison, the
largest signed zone is probably the .COM TLD, with about 280 million
records and 117 million names. Considering the large size of key
records, a zone with one key per user for a large mail system could
easily be an order of magnitude larger. Hence, any approach that
Levine Expires October 2, 2015 [Page 7]
Internet-Draft DNS mailbox March 2015
requires putting all of the keys into a static signed zone is unikely
to be practical at scale.
With this in mind, the more promising approaches appear to be encoded
names (Section 4), which offers the possibility of responses
generated from the underlying database on the fly, or pointer to
server (Section 6) going directly to a web service.
8. Security Considerations
Some approaches may make it somewhat easier to extract valid local
parts for a domain. The All Names Valid option makes name searches
unproductive.
The regular expression representation is difficult to reverse
engineer. With NSEC records it's possible to recover the DFA and in
principle to translate it back into a large regular expression, but
there's no efficient way to take the regular expression and extract a
useful set of distinct names. (It's easy to enumerate lots of
variants of the same name, which is not useful to spammers since a
blast of mail to the same recipient is typically shut down in moments
by bulk counters.)
All of the usual attacks against DNS servers are likely to occur.
The usual techniques for mitigating them should work. Many queries
will cache poorly, but probably no worse than rDNS or DNSBL queries
do now.
9. References
9.1. Normative References
[RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
October 2008.
[RFC6117] Hoeneisen, B., Mayrhofer, A., and J. Livingood, "IANA
Registration of Enumservices: Guide, Template, and IANA
Considerations", RFC 6117, March 2011.
[RFC6532] Yang, A., Steele, S., and N. Freed, "Internationalized
Email Headers", RFC 6532, February 2012.
[URIRR] Faltstrom, P. and O. Kolkman, "The Uniform Resource
Identifier (URI) DNS Resource Record", March 2015,
.
Levine Expires October 2, 2015 [Page 8]
Internet-Draft DNS mailbox March 2015
9.2. Informative References
[ASU86] Aho, A., Sethi, R., and J. Ullman, "Compilers: Principles,
Techniques, and Tools", 1986.
[LESK75] Lesk, M., "Lex--A Lexical Analyzer Generator", CSTR 39,
1975, .
Author's Address
John Levine
Taughannock Networks
PO Box 727
Trumansburg, NY 14886
Phone: +1 831 480 2300
Email: standards@taugh.com
URI: http://jl.ly
Levine Expires October 2, 2015 [Page 9]