INTERNET DRAFT                                             Melinda Shore
draft-shore-h323-firewalls-00.txt                                  Nokia
February 3, 2000
Expires: July 3, 2000


     H.323 and Firewalls: Problem Statement and Solution Framework


STATUS OF THIS MEMO

     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC2026.  Internet-Drafts are work-
     ing documents of the Internet Engineering Task Force (IETF), its
     areas, and its working groups.  Note that other groups may also
     distribute working documents as Internet Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other docu-
     ments at any time.  It is inappropriate to use Internet-Drafts as
     reference material or to cite them other than as "work in
     progress".

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


ABSTRACT

This paper attempts to describe in detail the problems associated with
passing H.323 through firewalls and NAT devices, and discuss the appli-
cability of a range of technologies currently available to solve these
problems.  We conclude that the only general solution to the problem is
external application control of firewalls.

1.  INTRODUCTION

It is generally recognized throughout the IP telephony industry that the
standard signaling protocol, H.323, is difficult to operate through
firewalls.  Worse, it is nearly impossible to operate when one of the
entities involved in a call, whether it is a gatekeeper, a terminal, or
a gateway, has its IP address hidden through the use of network address
translation (NAT).  A few firewall vendors have built products which
perform stateful inspection of H.323 signaling streams and do address
rewriting, allowing successful interaction with NATs, but this solution
cannot work in a secure signaling environment.  In this paper we try to
provide some detail about why the problem is so difficult, describe some
available technologies and discuss their applicability, and try to pre-
sent a framework for addressing the problem.


                                                                [Page 1]

Internet Draft             H.323 and Firewalls             February 2000


2.  THE PROBLEM

2.1.  Basics

H.323 [1] is a description of how to use a family of protocols to per-
form call control for multimedia communication on packet networks.  The
most important protocols used to set up, manage, and tear down calls are
H.225 and H.245.  H.225 is used to perform call control, and H.245 is
used to perform call management.

In the most basic use of H.323v1 to set up a call, an endpoint initiates
an H.225 exchange on a TCP well-known port with another endpoint.  This
exchange uses ISDN Q.931 signaling.  Once a call has been established
using Q.931 procedures, the H.245 call management phase of the call is
begun.  H.245 negotiations take place on a separate channel from the one
used for H.225 call setup (although with the use of H.245 tunneling,
H.245 messages can be encapsulated in Q.931 messages on existing H.225
channels), and the H.245 channel is dynamically-allocated during the
H.225 phase.  The port number to be used for H.245 negotiation is not
known in advance.  The media channels (those used to transport voice and
video) are similarly dynamically-allocated, this time using the H.245
OpenLogicalChannel procedure.

The following table lists the kinds of data streams used in H.323 and
H.225, and whether they are allocated on a well-known port or on one
unknown in advance:

           Type of data stream     Well known or dynamic port

           Audio/RTP               Dynamic
           Audio/RTCP              Dynamic
           Video/RTP               Dynamic
           Video/RTCP              Dynamic
           Call Signalling         Well known or dynamic
           H.245                   Dynamic
           RAS                     Well known or dynamic

                                Table 1

Note that the logical channels opened using H.245 are unidirectional.
In a minimal situation with direct call signaling between endpoints and
the use of one bidirectional voice channel, each call involves one H.225
channel, one H.245 channel, and the RTP and RTCP streams that carry the
voice channel (see Table 1).  Of these, only the H.225 channel may be on
a well-known port; the others are on dynamically allocated ports.

Because of the heavy use of dynamically-allocated ports, it is not pos-
sible to preconfigure firewalls to allow H.323-signaled traffic without
opening up large numbers of holes in the firewall.  Microsoft's web site
has a page [2] on configuring firewalls for use with NetMeeting, which
is H.323-based, and they recommend this: "To establish outbound NetMeet-
ing connections through a firewall, the firewall must be configured to
do the following:


+    Pass through primary TCP connections on ports 389, 522, 1503, 1720,
     and 1731.

+    Pass through secondary TCP and UDP connections on dynamically
     assigned ports (1024-65535)."

Needless to say, this represents a somewhat more lax firewall policy
than would be acceptable at many sites, and it does not address the
problem of receiving incoming calls.

One very popular mechanism used by firewalls to accommodate applications
in which port numbers are not known in advance is "stateful inspection."
In firewalls which use stateful inspection, knowledge of certain
protocols (such as H.323 or Sun RPC) is configured into the firewalls.
They examine traffic to recognize when new ports are being allocated and
then open "pinholes" in the firewall, allowing that traffic to pass.

Stateful inspection of H.323 is complicated by the way the protocols are
encoded.  The H.323 family of protocols is specified in ASN.1 notation,
which is compiled into a wire-line protocol using the ITU-T's Packed
Encoding Rules (PER).  PER is designed to optimize the use of bandwidth,
but the tradeoff is complexity.  For example, there are five different
ways to encode integer values, and "unconstrained" integer values (i.e.,
those whose range of potential values is unlimited) are fit into the
minimum number of octets needed.  That is to say, some fields are
variable in length.  The problem of locating desired information within
a data stream is aggravated by the use of optional fields, which may or
may not appear at all.
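
To illustrate why field offsets cannot be fixed in advance, the
following sketch shows how an unconstrained integer occupies a varying
number of octets.  It is a simplified illustration in Python, not a PER
implementation: the function name is hypothetical, and it assumes the
simple case of a single length octet followed by the value in the fewest
two's-complement octets, which only covers values shorter than 128
octets.

```python
def encode_unconstrained_int(value: int) -> bytes:
    """Length octet followed by the value in the fewest two's-complement octets."""
    # Number of magnitude bits; negative values are shifted by one so that
    #, for example, -128 is still recognized as fitting in a single octet.
    magnitude_bits = value.bit_length() if value >= 0 else (value + 1).bit_length()
    length = magnitude_bits // 8 + 1  # one further bit is needed for the sign
    if length >= 128:
        raise ValueError("sketch only handles the single-octet length form")
    return bytes([length]) + value.to_bytes(length, "big", signed=True)


# The same field type takes two, three, or four octets on the wire.
for n in (0, 127, 128, -129, 65536):
    print(n, encode_unconstrained_int(n).hex())
```

A firewall that wants the field following such an integer must decode
the integer first; it cannot simply look at a fixed offset.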

Another way to look at the problem is to consider the end-to-end nature
of IP.  Firewalls introduce a disruption in the end-to-end model at the
IP layer, much like a malfunctioning router.  However, the layered model
for IP and other networking protocols assumes that there is minimal or
no communication between the layers in an endpoint, and it therefore
provides no mechanism by which an application can learn that it is a
firewall disrupting communication.

2.2.  Network address translation

"Network Address Translation is a method by which IP addresses are
mapped from one realm to another, in an attempt to provide transparent
routing to hosts. Traditionally, NAT devices are used to connect an iso-
lated address realm with private unregistered addresses to an external
realm with globally unique registered addresses." [3] NAT is generally
used for two purposes: 1) as a mechanism to work around the problem of
IPv4 address space depletion, and 2) for security purposes (to hide
hosts at an unroutable address).

NAT works by having a NAT device, often implemented as part of a fire-
wall application, rewrite IP headers as packets pass through the NAT.
The NAT maintains a table of mappings between IP addresses and port num-
bers.

The problem with NAT from an H.323 perspective is that H.225 and H.245
make heavy use of embedded IP addresses.  If NAT is being used,
addresses in the protocol stream will be the addresses in the private
address space (behind the NAT), rather than the address at which the
host has a public, routable interface.  For example, a host may have its
address in a private address space, 172.16.0.81 [4], which when travers-
ing a NAT is translated to 207.127.234.239.  When that host attempts to
place a call, the "calling party" information element in the H.225 sig-
naling stream will contain the private, non-routable address
(172.16.0.81), and attempts to make an H.225 connection back to that
address will fail.
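
The mismatch can be sketched in a few lines of Python.  This is an
illustration only: the Packet and Nat classes, the dictionary standing
in for a decoded signaling message, the source port, and the starting
value for mapped ports are all assumptions made for the example.  The
addresses are the ones used in the text above.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src_ip: str      # address in the IP header
    src_port: int
    payload: dict    # stand-in for a decoded signaling message


class Nat:
    """Rewrites IP headers only; it has no knowledge of the payload."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.table = {}          # (private ip, port) -> (public ip, port)
        self.next_port = 40000   # arbitrary starting point for mapped ports

    def translate_outbound(self, pkt: Packet) -> Packet:
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.table:
            self.table[key] = (self.public_ip, self.next_port)
            self.next_port += 1
        pub_ip, pub_port = self.table[key]
        # Only the header fields change; the payload is passed through as-is.
        return Packet(pub_ip, pub_port, pkt.payload)


nat = Nat("207.127.234.239")
setup = Packet("172.16.0.81", 3456, {"calling_party": "172.16.0.81"})
seen_outside = nat.translate_outbound(setup)
print(seen_outside.src_ip, seen_outside.payload["calling_party"])
```

The receiver sees the public address in the header but is told, in the
payload, to call back a private address it cannot reach.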

2.3.  Encrypted signaling

Recognizing the need for secure (authenticated, confidential, non-
spoofable) signaling for IP telephony, the ITU-T ratified H.235 in 1998.
H.235 provides a framework for signaling security parameters, such as
encryption and authentication mechanisms, among H.323 entities.  H.235
allows the initial H.225 connection to be either encrypted or unen-
crypted.  During initial call setup, call participants may negotiate
among themselves whether other data streams, such as H.245 channels,
media channels, and so on, will be encrypted.

Any solution which relies on being able to inspect the contents of sig-
naling streams, such as firewalls which provide stateful inspection
capabilities, will fail if the signaling streams are encrypted.

2.4.  The combined problem

If we take a look at the NAT problem in conjunction with the problem of
the impossibility of deciphering encrypted signaling streams, we can see
that

+    NAT causes a mismatch between addresses in IP headers and addresses
     in signaling payloads

+    encrypting the signaling data prevents an H.323-aware NAT device
     from rewriting addresses in the signaling payloads, and

+    if the signaling data are unencrypted but authenticated using a
     MAC, rewriting the addresses as they cross a NAT will cause the
     authentication check upon receipt to fail.

That is to say, using the technologies available today (see Section 3),
if signaling streams are encrypted the NAT problem is insoluble without
the modification of H.323 (see Section 4.2).
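
The third point in the list above can be demonstrated with any keyed
MAC.  The sketch below uses HMAC-SHA256 from the Python standard library
purely as a stand-in; the key, the message format, and the choice of
algorithm are assumptions made for illustration and are not taken from
H.235.

```python
import hashlib
import hmac

key = b"shared-secret-between-endpoints"   # hypothetical end-to-end key
message = b"calling-party=172.16.0.81"     # hypothetical cleartext payload

# The sender attaches a MAC computed over the original payload.
tag = hmac.new(key, message, hashlib.sha256).digest()

# An H.323-aware NAT rewrites the embedded address in transit.
rewritten = message.replace(b"172.16.0.81", b"207.127.234.239")


def verify(payload: bytes, received_tag: bytes) -> bool:
    """Recompute the MAC at the receiver and compare in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)


print(verify(message, tag))     # untouched payload
print(verify(rewritten, tag))   # payload rewritten by the NAT
```

The NAT does not hold the key, so it cannot recompute the tag after
rewriting, and the receiver rejects the modified message.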

3.  TECHNOLOGIES

A number of different firewall and firewall-related technologies are
available, offering potential solutions of varying applicability to the
problem posed by running H.323 through firewalls and address
translators.  While older technologies, such as simple packet filtering,
provide no mechanism for passing H.323 traffic, more sophisticated tech-
nologies are now available.  In the following sections we examine the
applicability of a variety of technologies, with particular attention
paid to their ability to function in the presence of either NAT or
encrypted signaling.

Table 2 summarizes the applicability of the various technologies to
unencrypted/untranslated H.323 signaling, encrypted signaling, and
address-translated hosts.

                                 Cleartext    Encrypted
                                 signaling    signaling    NAT

Simple packet filtering          NO           NO           NO
Stateful inspection              YES          NO           YES
Application proxy                YES          NO           MAYBE
Virtual Private Network (VPN)    YES          LIMITED      YES
Circuit proxy (SOCKS)            YES          NO           YES
Firewall control interface       YES          YES          MAYBE

                                Table 2


3.1.  Simple packet filtering

This is the original, and simplest, form of firewalling.  A packet
filter examines all traffic traversing it and passes or discards that
traffic based on rules configured by the systems administrator.  For
example, an administrator may decide that a given host will accept only
incoming connections destined for the SMTP port and will reject all
others.  This is implemented in the firewall by examining the IP header
on each packet.  If the packet is destined for that particular host and
the protocol type is TCP, the TCP header is then examined to see if the
destination port is 25.  If so, the packet is relayed to its
destination; if not, it is dropped.

Problem: Simple packet filters cannot accommodate protocols in which new
ports (streams) are allocated during a protocol session.  H.323 will not
work with a simple packet filtering firewall.
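
A minimal sketch of such a filter, in Python, shows why dynamically
allocated ports defeat it.  The Rule and PacketFilter names, the host
address, and the media port number are hypothetical and chosen only for
the example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    dst_ip: str
    protocol: str
    dst_port: int


class PacketFilter:
    def __init__(self, rules):
        self.rules = set(rules)

    def permits(self, dst_ip: str, protocol: str, dst_port: int) -> bool:
        # Header fields only: no payload inspection and no per-session state.
        return Rule(dst_ip, protocol, dst_port) in self.rules


MAIL_HOST = "192.0.2.25"   # hypothetical address from a documentation range
fw = PacketFilter([Rule(MAIL_HOST, "tcp", 25)])

print(fw.permits(MAIL_HOST, "tcp", 25))      # SMTP: matches the static rule
print(fw.permits(MAIL_HOST, "tcp", 1720))    # call signaling: no rule
print(fw.permits(MAIL_HOST, "udp", 49170))   # dynamically allocated media port
```

A rule could be added for port 1720, but the media port is not known
until the call is in progress, so no rule written in advance can match
it without opening the whole dynamic range.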

3.2.  Stateful inspection

Stateful inspection is a more sophisticated form of packet filtering in
which the packet payload is examined for more detailed information which
would indicate whether or not the packet is acceptable.  Continuing from
our previous example, a systems administrator would be able to install a
rule specifying that any email passing to a particular host containing a
particular text string (say, offensive language) or a certain MIME type
(say, executable files) will not be permitted through.

Firewalls which use stateful inspection may be able to parse H.323 sig-
naling streams and use the contents of those streams to recognize the
creation of H.245 control channels and media channels in order to open
pinholes.  Particularly sophisticated firewalls which also do NAT may be
able to rewrite addresses in H.225 and H.245 streams, allowing H.323 to
be used successfully through both firewalls and NAT devices.  Check
Point's Firewall-1 is an example of a firewall with this capability.


Problem: It is not possible to inspect the content of encrypted signal-
ing streams, and it is not possible to alter the contents of messages
which have been authenticated for end-to-end delivery.
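
The pinhole mechanism, and the way it breaks down, can be sketched as
follows.  This is a toy model in Python under stated assumptions: a
cleartext signaling message is represented as an already-decoded
dictionary with a hypothetical "announced_addresses" field, and an
encrypted message is represented as opaque bytes.

```python
class StatefulFirewall:
    def __init__(self, static_ports):
        self.static_ports = set(static_ports)
        self.pinholes = set()    # (ip, port) pairs opened on demand

    def inspect_signaling(self, message) -> bool:
        """Open a pinhole for each transport address announced in the message.

        `message` is a dict for cleartext signaling, or bytes when the
        stream is encrypted and therefore opaque to the firewall.
        """
        if not isinstance(message, dict):
            return False         # cannot parse ciphertext: nothing is opened
        for ip, port in message.get("announced_addresses", []):
            self.pinholes.add((ip, port))
        return True

    def permits(self, ip, port) -> bool:
        return port in self.static_ports or (ip, port) in self.pinholes


fw = StatefulFirewall(static_ports=[1720])
fw.inspect_signaling({"announced_addresses": [("172.16.0.81", 49170)]})
print(fw.permits("172.16.0.81", 49170))    # pinhole opened from cleartext
fw.inspect_signaling(b"\x8f\x02\x1c...")   # encrypted: opaque bytes
print(fw.permits("172.16.0.81", 49172))    # no pinhole could be opened
```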

3.3.  Application proxying

An application proxy is an instance of the application (in this case, an
H.323 entity such as a gatekeeper or gateway) which runs on a trusted
host and acts as a relay between external, untrusted entities and inter-
nal ones.  Signaling and media circuits terminate on the proxy, which
means that the addresses in the IP headers are those of the host on
which the proxy is running.

Problem: When NAT is used, whether or not the proxy has knowledge of and
access to the private space depends on where the proxy is located.  If
it is located on the public side of the firewall, it sees the trans-
lated-to address in the IP headers and the translated-from address in
the signaling stream.  One might think that this would afford it the
possibility to do address rewriting in the signaling data, but it has no
way of knowing in advance to what address/port combination the NAT will
map the new streams (H.245, media) as they are created, nor does it have
read or write access to the NAT table in the firewall.  If the proxy is
located on the private side of the firewall, it sees only the private
addresses in both the IP headers and in the signaling stream, and does
not have sufficient information to be able to do address rewriting.  If
the proxy is integrated into the firewall, however, it has knowledge of
both public and private address spaces as well as access to the NAT
table.

While passing encrypted media streams would probably not be difficult
for an application proxy, since it would not be examining the contents
of media streams, end-to-end encryption of signaling remains a
significant problem.  Also, application proxying is known to perform
poorly in terms of processor consumption and packet rates.

3.4.  Virtual private networks

A VPN is basically just a secure connection between entities over an
insecure medium.  This is generally accomplished through the use of
encryption (traffic management may or may not be available, as well, but
is outside the scope of this paper).  The encryption may take place
between hosts, between firewalls or routers, or between some combination
of hosts, firewalls, and routers.

As this suggests, each participant in a VPN must be running encryption
software compatible with the software being run by the other partici-
pants, and each participant must be configured and authenticated to par-
ticipate in any given VPN.  This means that all participants must be
known in advance.  If the participant is a router or firewall, rather
than an endpoint, communications remain unsecured from the host to the
router/firewall unless additional encryption is used end-to-end, rein-
troducing the problem of reading encrypted signaling streams.  Encryp-
tion between hosts and firewalls is certainly a possibility, but
encrypting at the host, decrypting and re-encrypting at the firewalls,
and decrypting at the opposite end can introduce tremendous latencies.

VPNs are well-suited to toll bypass applications in which all of the
gateways which might be called are known in advance (firewall to fire-
wall communication), or to enterprise environments in which endpoints
can be guaranteed to be running particular software.  The day when IPSec
can be universally assumed to be available is still far away.

Problem: As described above, all participants in the call must be run-
ning compatible VPN software, or the firewalls must be.  Furthermore, a
VPN is an encryption/decryption process, so, for example, if an IP phone
is placing a call over the public internet to a gateway on a remote net-
work, either the phone must encrypt at its end or it must be behind a
firewall which can participate in the VPN, and the gateway must decrypt
at its end or it must be behind a firewall which can participate in the
VPN.  Satisfying this constraint may not be feasible in all circum-
stances.

Another difficulty is that it is generally recommended that VPNs be run
in conjunction with some sort of packet-filtering (stateful inspection
or otherwise) firewall, which reintroduces the H.323 and firewalls prob-
lem.

3.5.  Circuit proxies

A circuit proxy is much like an application proxy -- the difference is
that instead of putting application logic on the proxy, it remains in
the endpoint or host.  The host requests that specific address/port com-
binations be proxied for it.  The most widely-used circuit-proxying pro-
tocol is SOCKS, a product of the IETF's Authenticated Firewall Traversal
working group.  SOCKS vendors often provide a SOCKS .dll for Windows
systems or SOCKS daemon for Unix systems.  These intercept network sys-
tem calls and request that the streams being created be proxied on a
SOCKS server.  Because the application logic resides on the host, there
is no need to inspect signaling streams to check for the creation of new
information flows.  This means that encrypted signaling streams are sim-
ply not an issue.

Problem: SOCKS is generally implemented as a stand-alone server rather
than as a firewall.  As such, it has no access to a firewall's NAT
table, and NAT continues to be a problem.  Also, SOCKS libraries and
daemons work without application modification under limited circum-
stances, and for complex server applications, some code modification
would almost certainly be necessary.  Another problem is that each
endpoint would need to establish a trust relationship with a SOCKS proxy
server, which introduces obvious management overhead.  And, of course,
where NAT is in use, the problem of rewriting addresses inside
end-to-end encrypted and/or authenticated signaling remains.
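
As an illustration of a host asking for a specific address/port
combination to be proxied, the sketch below builds a CONNECT request in
Python.  It assumes SOCKS version 5 as specified in RFC 1928 (the text
above does not name a version), covers only the IPv4 form of the
request, and omits the method negotiation that precedes it.  The
function name is hypothetical; the destination reuses the example
address from Section 2.2 and the call signaling port 1720.

```python
import socket
import struct

SOCKS_VERSION = 5
CMD_CONNECT = 1
ATYP_IPV4 = 1


def build_connect_request(dst_ip: str, dst_port: int) -> bytes:
    """Build a SOCKS5 CONNECT request for an IPv4 destination."""
    # Version, command, reserved octet (zero), address type.
    header = struct.pack("!BBBB", SOCKS_VERSION, CMD_CONNECT, 0, ATYP_IPV4)
    # Four address octets followed by the port in network byte order.
    return header + socket.inet_aton(dst_ip) + struct.pack("!H", dst_port)


request = build_connect_request("207.127.234.239", 1720)
print(request.hex())
```

The proxy learns the destination from this explicit request rather than
from the signaling stream, which is why it has no need to parse H.323.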

3.6.  RSIP

Realm Specific IP (RSIP) [5] allows an IP endpoint in one address space
to "borrow" an IP address from another address space, preserving the
integrity of end-to-end addressing.  Because it is implemented by
installing a virtual network interface on a client, it effectively makes
that client multi-homed.  H.323 requires that endpoints be able to embed
their own IP address in signaling packets, which means that multi-homed
hosts must be able to determine which address among several is the one
to use.
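
One common way for an application on a multi-homed host to choose the
address to embed is to ask the operating system which local address it
would use to reach the peer.  The Python sketch below shows that
technique; the function name is hypothetical, and whether the result is
the RSIP-assigned address depends on how the RSIP client's virtual
interface and routes are set up, which is an assumption here.

```python
import socket


def local_address_for(peer_ip: str, peer_port: int = 1720) -> str:
    """Return the local IP address the host would use to reach the peer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        # connect() on a UDP socket sends no packets; it only asks the
        # kernel to pick a route and, with it, a source address.
        probe.connect((peer_ip, peer_port))
        return probe.getsockname()[0]


print(local_address_for("127.0.0.1"))
```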

3.7.  Firewall control protocols

A firewall control protocol is similar to a circuit proxy, in that
application logic remains in the end systems and requests are made over
a secure channel to the firewall to open and close pinholes and to
manipulate or read NAT table entries.  This allows the use of both
encrypted signaling traffic and address-translated endpoints.

Problem: There is no such thing.  Storage Technology proposed a firewall
control interface in a now-expired internet draft, but the work has been
dropped.
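
Since no such protocol exists, the following Python sketch is purely
hypothetical.  The class, its methods, and the port range are invented
for illustration, and the secure channel between application and
firewall is left out.  It is meant only to show the order of operations
that makes encrypted signaling and NAT compatible.

```python
import itertools


class ControlledFirewall:
    """Toy firewall that applications can drive directly (hypothetical)."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.pinholes = {}                     # public port -> private (ip, port)
        self._ports = itertools.count(50000)   # arbitrary public port range

    def open_pinhole(self, private_ip: str, private_port: int):
        """Create a NAT mapping and pinhole; return the public address."""
        public_port = next(self._ports)
        self.pinholes[public_port] = (private_ip, private_port)
        return self.public_ip, public_port

    def close_pinhole(self, public_port: int) -> None:
        self.pinholes.pop(public_port, None)


fw = ControlledFirewall("207.127.234.239")
# The endpoint asks before signaling, so it can advertise the public
# address itself and then encrypt or authenticate the message end-to-end.
media_ip, media_port = fw.open_pinhole("172.16.0.81", 49170)
print(media_ip, media_port)
fw.close_pinhole(media_port)
```

Because the endpoint learns its public address before it builds the
signaling message, nothing in the path needs to read or rewrite that
message afterwards.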

4.  DISCUSSION

4.1.  Current State

Service providers and enterprise network managers consider the ability
to place components of their H.323 systems behind firewalls to be a very
high priority.  Firewalls provide a measure of host security beyond what
can be engineered into individual applications, and for those who build
a business around providing internet telephony services, they provide
increased protection against theft of services and denial-of-service
attacks.  H.323, however, is an extremely firewall-hostile protocol.

Firewall vendors are aware of this problem, and some of them are working
hard on solving it within the framework of their existing products.
Approaches to firewalling vary widely, as described above, and companies
which produce particular kinds of products have a vested interest in
continuing with their existing strategy.  Some firewall vendors, for
example, believe quite strongly that applications should have no aware-
ness of firewalls in their network paths, and appear to be inflexible in
their adherence to a stateful inspection model.

VoIP vendors have been slow to produce H.235 implementations, but IP
telephony service providers are increasingly demanding H.235-based secu-
rity features, particularly the use of encrypted and digitally-signed
signaling messages.  This increases the pressure to find some solution
to the problem other than ones which require the ability to read, parse,
and possibly modify H.323 signaling messages.

The widespread use of devices which break the end-to-end model is
causing the viability of that model to come under investigation [6].
H.323 may be the most egregious and/or visible example of a
protocol which violates network layering in its use of transport
addresses and which uses a third party to control communications between
two other parties, but it is not the only one.  In response to this kind
of problem, there have recently been proposals suggesting that certain
types of network devices should be made actively visible [7], as well as
proposed protocols for controlling network elements from application
servers [8].

4.2.  Standards

There has recently been a flurry of activity around firewalls and H.323
in various IP telephony standards bodies.  Most of this activity has
been around identifying that there is a problem, with relatively little
being done to solve it.

An exception to this is a proposed change to H.225 to help H.323 func-
tion through NAT devices, almost certainly to be included in H.225v4
(scheduled for decision in February 2000).  This change requires the use
of H.245 tunneling and requires that RTP and RTCP streams be sent on the
same ports on which they expect to receive the corresponding stream.  It
imposes constraints on the network architecture and does not solve prob-
lems associated with common requirements, such as the need for endpoints
behind a NAT to receive incoming calls from outside the NAT and the need
to be served by a gatekeeper in a different address space.
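
The requirement that a stream be sent from the same port on which the
corresponding stream is received can be sketched with two UDP sockets on
the loopback interface.  This Python example is an illustration only: it
uses plain datagrams rather than RTP, and the explanation that follows
it about NAT mappings is the usual rationale for this technique rather
than something stated in the proposed H.225 change.

```python
import socket

# Stand-in for the remote endpoint, listening on an ephemeral loopback port.
remote = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
remote.bind(("127.0.0.1", 0))
remote.settimeout(2.0)

# One socket, bound once, is used both to receive and to send media.
local = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
local.bind(("127.0.0.1", 0))
local.settimeout(2.0)
receive_port = local.getsockname()[1]

local.sendto(b"media-out", remote.getsockname())
_, observed_source = remote.recvfrom(1500)

# Replying to the observed source reaches the port we receive on.
remote.sendto(b"media-in", observed_source)
reply, _ = local.recvfrom(1500)

print(observed_source[1] == receive_port, reply)
local.close()
remote.close()
```

When a NAT sits between the two endpoints, the outbound packet creates
the mapping, and the reply sent to the observed source address can
follow that mapping back in without the NAT reading the signaling.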

5.  CONCLUSION

Firewalls are turning out to be a significant impediment to the provi-
sion of commercial VoIP services -- not many providers are willing to
compromise the security of their networks by allowing unfiltered traffic
through.  The approaches which have been used with varying success to
date will not work at all when signaling channels are secured end-to-
end.  A more comprehensive approach is needed -- either the firewall
needs to be aware of the application or the application needs to be
aware of the firewall, and the former is not possible if signaling is
encrypted.  We believe that some sort of firewall and/or NAT control
protocol is necessary to solve this problem.

6.  REFERENCES

[1] ITU-T Recommendation H.323.  "Packet-based Multimedia Communications
     Systems," 1998.

[2] Microsoft Corporation, "Firewall Configuration."
     http://www.microsoft.com/Windows/NetMeeting/Corp/ResKit/Chapter4/default.asp

[3] Srisuresh, P. and M. Holdrege, "IP Network Address Translator
     (NAT) Terminology and Considerations."  Internet draft
     draft-ietf-nat-terminology-03.txt, June 1999.

[4] Rekhter, Y. et al., "Address Allocation for Private Internets."  RFC
     1918, February 1996.

[5] Borella, M. et al., "Realm Specific IP: Framework."  Internet draft
     draft-ietf-nat-rsip-framework-03.txt, December 1999.

[6] Carpenter, Brian, "Internet Transparency."  Internet draft
     draft-carpenter-transparency-05.txt, December 1999.

[7] Lear, Eliot, "NAT and other Network "Intelligence": Clearing
     Architectural Haze through the use of Fog Lamps."  Internet draft
     draft-lear-foglamps-01.txt, December 1999.

[8] Cerpa, A. et al., "NECP: The Network Element Control Protocol."
     Internet draft draft-cerpa-necp-00.txt, November 1999.

7.  AUTHOR'S ADDRESS

Melinda Shore
Nokia IP Telephony
127 West State Street
Ithaca, NY  14850
USA
Phone: +1 607 273 0724 x81
Fax: +1 607 275 3610
Email:  melinda.shore@nokia.com

