INTERNET DRAFT                                          Melinda Shore
draft-shore-h323-firewalls-00.txt                               Nokia
                                                     February 3, 2000
                                                Expires: July 3, 2000


     H.323 and Firewalls: Problem Statement and Solution Framework


STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


ABSTRACT

This paper attempts to describe in detail the problems associated with
passing H.323 through firewalls and NAT devices, and discuss the
applicability of a range of technologies currently available to solve
these problems. We conclude that the only general solution to the
problem is external application control of firewalls.


1. INTRODUCTION

It is generally recognized throughout the IP telephony industry that
the standard signaling protocol, H.323, is difficult to operate through
firewalls. Worse, it is nearly impossible to operate when one of the
entities involved in a call, whether it is a gatekeeper, a terminal, or
a gateway, has its IP address hidden through the use of network address
translation (NAT). A few firewall vendors have built products which
perform stateful inspection of H.323 signaling streams and do address
rewriting, allowing successful interaction with NATs, but this solution
cannot work in a secure signaling environment.
In this paper we try to provide some detail about why the problem is so
difficult, describe some available technologies and discuss their
applicability, and try to present a framework for addressing the
problem.


2. THE PROBLEM

2.1. Basics

H.323 [1] is a description of how to use a family of protocols to
perform call control for multimedia communication on packet networks.
The most important protocols used to set up, manage, and tear down
calls are H.225 and H.245. H.225 is used to perform call control, and
H.245 is used to perform call management.

In the most basic use of H.323v1 to set up a call, an endpoint
initiates an H.225 exchange on a TCP well-known port with another
endpoint. This exchange uses ISDN Q.931 signaling. Once a call has been
established using Q.931 procedures, the H.245 call management phase of
the call is begun. H.245 negotiations take place on a separate channel
from the one used for H.225 call setup (although with the use of H.245
tunneling, H.245 messages can be encapsulated in Q.931 messages on
existing H.225 channels), and the H.245 channel is dynamically-allocated
during the H.225 phase. The port number to be used for H.245
negotiation is not known in advance.

The media channels (those used to transport voice and video) are
similarly dynamically-allocated, this time using the H.245
OpenLogicalChannel procedure.

The following table lists the kinds of data streams used in H.323 and
H.225, and whether they are allocated on a well-known port or on one
unknown in advance:

     Type of data stream     Well known or dynamic port
     -------------------     --------------------------
     Audio/RTP               Dynamic
     Audio/RTCP              Dynamic
     Video/RTP               Dynamic
     Video/RTCP              Dynamic
     Call Signalling         Well known or dynamic
     H.245                   Dynamic
     RAS                     Well known or dynamic

                          Table 1

Note that H.245 channels are unidirectional.
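As an illustration of what Table 1 implies for filtering, the following
Python sketch encodes the table as data and separates the stream types
whose port is never known in advance from those that may use a
well-known port. The names TABLE_1, always_dynamic, and
possibly_well_known are assumptions made for this example; they are not
defined by H.323.

```python
# Table 1 from the text, encoded as data. The dictionary name and the
# helper functions are illustrative assumptions, not H.323 definitions.
TABLE_1 = {
    "Audio/RTP": "dynamic",
    "Audio/RTCP": "dynamic",
    "Video/RTP": "dynamic",
    "Video/RTCP": "dynamic",
    "Call Signalling": "well known or dynamic",
    "H.245": "dynamic",
    "RAS": "well known or dynamic",
}


def always_dynamic(table):
    """Return the stream types whose port is never known in advance."""
    return sorted(name for name, kind in table.items() if kind == "dynamic")


def possibly_well_known(table):
    """Return the stream types that may be found on a well-known port."""
    return sorted(name for name, kind in table.items() if "well known" in kind)


if __name__ == "__main__":
    print("never on a fixed port:", always_dynamic(TABLE_1))
    print("sometimes on a fixed port:", possibly_well_known(TABLE_1))
```

Only the stream types in the second group could ever be matched by a
rule written before the call starts, and even those only when the
well-known port is actually used.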
In a minimal situation with direct call signaling between endpoints and
the use of one bidirectional voice channel, for each call there will be
a minimum of five channels (one H.225 channel, one H.245 channel, and
one shared voice channel). Three of these will be on
dynamically-allocated ports.

Because of the heavy use of dynamically-allocated ports, it is not
possible to preconfigure firewalls to allow H.323-signaled traffic
without opening up large numbers of holes in the firewall. Microsoft's
web site has a page [2] on configuring firewalls for use with
NetMeeting, which is H.323-based, and they recommend this:

     "To establish outbound NetMeeting connections through a firewall,
     the firewall must be configured to do the following:

     +    Pass through primary TCP connections on ports 389, 522, 1503,
          1720, and 1731.

     +    Pass through secondary TCP and UDP connections on dynamically
          assigned ports (1024-65535)."

Needless to say, this represents a somewhat more lax firewall policy
than would be acceptable at many sites, and it does not address the
problem of receiving incoming calls.

One very popular mechanism used by firewalls to accommodate
applications in which port numbers are not known in advance is
"stateful inspection." In firewalls which use stateful inspection,
knowledge of certain protocols (such as H.323 or Sun RPC) is configured
into the firewalls, and they are able to examine traffic in order to be
able to recognize when new ports are being allocated in order to open
"pinholes" in the firewall, allowing traffic to pass.

The H.323 family of protocols is represented in ASN.1 notation, which
is compiled into a wire-line protocol using the ITU-T's Packed Encoding
Rules (PER). PER is designed to optimize the use of bandwidth, but the
tradeoff is complexity -- for example, there are five different ways to
encode integer values, and "unconstrained" integer values (i.e. the
range of potential values is unlimited) are fit into the minimum number
of octets needed. That is to say, some fields are variable in length.
The problem of locating desired information within a data stream is
aggravated by the use of optional fields, which may or may not appear
at all.

Another way to look at the problem is to consider the end-to-end nature
of IP. Firewalls introduce a disruption in the end-to-end model at the
IP layer, much like a malfunctioning router. However, the layered model
for IP and other networking protocols assumes that there is minimal or
no communication between the layers in an endpoint, and therefore no
mechanism for knowing that it is a firewall disrupting communication.

2.2. Network address translation

"Network Address Translation is a method by which IP addresses are
mapped from one realm to another, in an attempt to provide transparent
routing to hosts. Traditionally, NAT devices are used to connect an
isolated address realm with private unregistered addresses to an
external realm with globally unique registered addresses." [3] NAT is
generally used for two purposes: 1) as a mechanism to work around the
problem of IPv4 address space depletion, and 2) for security purposes
(to hide hosts at an unroutable address).

NAT works by having a NAT device, often implemented as part of a
firewall application, rewrite IP headers as packets pass through the
NAT. The NAT maintains a table of mappings between IP addresses and
port numbers.

The problem with NAT from an H.323 perspective is that H.225 and H.245
make heavy use of embedded IP addresses. If NAT is being used,
addresses in the protocol stream will be the addresses in the private
address space (behind the NAT), rather than the address at which the
host has a public, routable interface.
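A minimal Python sketch of that mismatch follows. It models a packet as
a header address plus an address embedded in the signaling payload, and
a NAT that rewrites only the header. The Packet class, the nat_outbound
function, and both addresses are hypothetical and chosen only for
illustration; real H.225 and H.245 messages are PER-encoded and far
more complex.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str       # source address in the IP header
    embedded_ip: str  # address carried inside the signaling payload


def nat_outbound(packet, mapping):
    """Rewrite only the IP header, as a NAT with no H.323 knowledge would."""
    return replace(packet, src_ip=mapping[packet.src_ip])


PRIVATE = "10.0.0.5"    # hypothetical private (RFC 1918) address
PUBLIC = "192.0.2.10"   # hypothetical public address (documentation range)

sent = Packet(src_ip=PRIVATE, embedded_ip=PRIVATE)
seen = nat_outbound(sent, {PRIVATE: PUBLIC})

if __name__ == "__main__":
    # The far end sees the public address in the header but is told, in
    # the payload, to call back an address it cannot route to.
    print("header:", seen.src_ip, "payload:", seen.embedded_ip)
```

The header and the payload disagree after translation, and the payload
is the part the remote H.323 entity acts on.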
For example, a host may have its address in a private address space,
172.16.0.81 [4], which when traversing a NAT is translated to
207.127.234.239. When that host attempts to place a call, the "calling
party" information element in the H.225 signaling stream will contain
the private, non-routable address (172.16.0.81), and attempts to make
an H.225 connection back to that address will fail.

2.3. Encrypted signaling

Recognizing the need for secure (authenticated, confidential,
non-spoofable) signaling for IP telephony, the ITU-T ratified H.235 in
1998. H.235 provides a framework for signaling security parameters,
such as encryption and authentication mechanisms, among H.323 entities.

H.235 allows the initial H.225 connection to be either encrypted or
unencrypted. During initial call setup, call participants may negotiate
among themselves whether other data streams, such as H.245 channels,
media channels, and so on, will be encrypted.

Any solution which relies on being able to inspect the contents of
signaling streams, such as firewalls which provide stateful inspection
capabilities, will fail if the signaling streams are encrypted.

2.4. The combined problem

If we take a look at the NAT problem in conjunction with the problem of
the impossibility of deciphering encrypted signaling streams, we can
see that

+    NAT causes a mismatch between addresses in IP headers and
     addresses in signaling payloads

+    encrypting the signaling data prevents an H.323-aware NAT device
     from rewriting addresses in the signaling payloads, and

+    if the signaling data are unencrypted but authenticated using a
     MAC, rewriting the addresses as they cross a NAT will cause the
     authentication check upon receipt to fail.

That is to say, using the technologies available today (see below), if
signaling streams are encrypted the NAT problem is insoluble without
the modification of H.323.


3. TECHNOLOGIES

A number of different firewall and firewall-related technologies are
available, and all provide potential solutions of varying applicability
to the problem posed by running H.323 through firewalls and address
translators. While older technologies, such as simple packet filtering,
provide no mechanism for passing H.323 traffic, more sophisticated
technologies are now available. In the following sections we examine
the applicability of a variety of technologies, with particular
attention paid to their ability to function in the presence of either
NAT or encrypted signaling.

Table 2 summarizes the applicability of the various technologies to
unencrypted/untranslated H.323 signaling, encrypted signaling, and
network-translated hosts.

                                   Cleartext   Encrypted
                                   signaling   signaling   NAT
                                   ---------   ---------   -----
     Simple packet filtering       NO          NO          NO
     Stateful inspection           YES         NO          YES
     Application proxy             YES         NO          MAYBE
     Virtual Private Network (VPN) YES         LIMITED     YES
     Circuit proxy (SOCKS)         YES         NO          YES
     Firewall control interface    YES         YES         MAYBE

                               Table 2

3.1. Simple packet filtering

This is the original, and simplest, form of firewalling. A packet
filter will examine all traffic traversing it and will pass that
traffic or discard it based on rules configured by the systems
administrator. For example, an administrator may decide that a given
host will accept only incoming connections destined for the SMTP port
and will reject all others. This is implemented in the firewall by
examining the IP header on each packet. If the packet is destined for
that particular host and the protocol type is TCP, the TCP header is
then examined to see if the TCP port is 25. If so, the packet is
relayed to its destination; if not, it is dropped.

Problem: Simple packet filters cannot accommodate protocols in which
new ports (streams) are allocated during a protocol session. H.323 will
not work with a simple packet filtering firewall.

3.2. Stateful inspection

Stateful inspection is a more sophisticated form of packet filtering in
which the packet payload is examined for more detailed information
which would indicate whether or not the packet is acceptable.
Continuing from our previous example, a systems administrator would be
able to install a rule specifying that any email passing to a
particular host containing a particular text string (say, offensive
language) or a certain MIME type (say, executable files) will not be
permitted through.

Firewalls which use stateful inspection may be able to parse H.323
signaling streams and use the contents of those streams to recognize
the creation of H.245 control channels and media channels in order to
open pinholes. Particularly sophisticated firewalls which also do NAT
may be able to rewrite addresses in H.225 and H.245 streams, allowing
H.323 to be used successfully through both firewalls and NAT devices.
Check Point's Firewall-1 is an example of a firewall with this
capability.

Problem: It is not possible to inspect the content of encrypted
signaling streams, and it is not possible to alter the contents of
messages which have been authenticated for end-to-end delivery.

3.3. Application proxying

An application proxy is an instance of the application (in this case,
an H.323 entity such as a gatekeeper or gateway) which runs on a
trusted host and acts as a relay between external, untrusted entities
and internal ones. Signaling and media circuits terminate on the proxy,
which means that the addresses in the IP headers are those of the host
on which the proxy is running.

Problem: When NAT is used, whether or not the proxy has knowledge of
and access to the private space depends on where the proxy is located.
If it is located on the public side of the firewall, it sees the
translated-to address in the IP headers and the translated-from address
in the signaling stream.
One might think that this would afford it the possibility to do address
rewriting in the signaling data, but it has no way of knowing in
advance to what address/port combination the NAT will map the new
streams (H.245, media) as they are created, nor does it have read or
write access to the NAT table in the firewall. If the proxy is located
on the private side of the firewall, it sees only the private addresses
in both the IP headers and in the signaling stream, and does not have
sufficient information to be able to do address rewriting. If the proxy
is integrated into the firewall, however, it has knowledge of both
public and private address spaces as well as access to the NAT table.

While passing encrypted media streams would probably not be difficult
for an application proxy, since it would not be examining the contents
of media streams, end-to-end encryption of signaling remains a
significant problem. Also, application proxying is known to perform
poorly, in terms of processor consumption and packet rates.

3.4. Virtual private networks

A VPN is basically just a secure connection between entities over an
insecure medium. This is generally accomplished through the use of
encryption (traffic management may or may not be available, as well,
but is outside the scope of this paper). The encryption may take place
between hosts, between firewalls or routers, or between some
combination of hosts, firewalls, and routers.

As this suggests, each participant in a VPN must be running encryption
software compatible with the software being run by the other
participants, and each participant must be configured and authenticated
to participate in any given VPN. This means that all participants must
be known in advance. If the participant is a router or firewall, rather
than an endpoint, communications remain unsecured from the host to the
router/firewall unless additional encryption is used end-to-end,
reintroducing the problem of reading encrypted signaling streams.
Encryption between hosts and firewalls is certainly a possibility, but
encrypting at the host, decrypting and re-encrypting at the firewalls,
and decrypting at the opposite end can introduce tremendous latencies.

VPNs are well-suited to toll bypass applications in which all of the
gateways which might be called are known in advance (firewall to
firewall communication), or to enterprise environments in which
endpoints can be guaranteed to be running particular software. The day
when IPSec can be universally assumed to be available is still far
away.

Problem: As described above, all participants in the call must be
running compatible VPN software, or the firewalls must be. Furthermore,
a VPN is an encryption/decryption process, so, for example, if an IP
phone is placing a call over the public internet to a gateway on a
remote network, either the phone must encrypt at its end or it must be
behind a firewall which can participate in the VPN, and the gateway
must decrypt at its end or it must be behind a firewall which can
participate in the VPN. Satisfying this constraint may not be feasible
in all circumstances.

Another difficulty is that it is generally recommended that VPNs be run
in conjunction with some sort of packet-filtering (stateful inspection
or otherwise) firewall, which reintroduces the H.323 and firewalls
problem.

3.5. Circuit proxies

A circuit proxy is much like an application proxy -- the difference is
that instead of putting application logic on the proxy, it remains in
the endpoint or host. The host requests that specific address/port
combinations be proxied for it. The most widely-used circuit-proxying
protocol is SOCKS, a product of the IETF's Authenticated Firewall
Traversal working group. SOCKS vendors often provide a SOCKS .dll for
Windows systems or SOCKS daemon for Unix systems. These intercept
network system calls and request that the streams being created be
proxied on a SOCKS server. Because the application logic resides on the
host, there is no need to inspect signaling streams to check for the
creation of new information flows. This means that encrypted signaling
streams are simply not an issue.

Problem: SOCKS is generally implemented as a stand-alone server rather
than as a firewall. As such, it has no access to a firewall's NAT
table, and NAT continues to be a problem. Also, SOCKS libraries and
daemons work without application modification only under limited
circumstances, and for complex server applications, some code
modification would almost certainly be necessary. Another problem is
that each endpoint would need to establish a trust relationship with a
SOCKS proxy server, which introduces obvious management overhead. And,
of course, the problem of dealing with end-to-end encryption and/or
authentication remains.

3.6. RSIP

RSIP [5] is a mechanism allowing an IP endpoint in one address space to
"borrow" an IP address from another address space, allowing for the
integrity of end-to-end addressing. Because it is implemented by
installing a virtual network interface on a client, it effectively
makes that client multi-homed. H.323 requires that endpoints be able to
embed their own IP address in signaling packets, which means that
multi-homed hosts must be able to determine which address among several
is the one to use.

3.7. Firewall control protocols

A firewall control protocol is similar to a circuit proxy, in that
application logic remains in the end systems and requests are made over
a secure channel to the firewall to open and close pinholes and to
manipulate or read NAT table entries. This allows the use of both
encrypted signaling traffic and address translated endpoints.

Problem: There is no such thing.
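Because no such protocol exists, the following Python fragment is
purely a hypothetical sketch of the request pattern described above: an
end system asks the firewall for a pinhole, learns the public
address/port the NAT assigned, and can then place that address in its
own signaling before any encryption is applied. The ToyFirewall class,
its method names, the port range, and the addresses are all assumptions
made for illustration, and the secure channel between end system and
firewall is omitted entirely.

```python
import itertools


class ToyFirewall:
    """Hypothetical firewall/NAT that exposes a control interface."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        # (private_ip, private_port) -> (public_ip, public_port)
        self.nat_table = {}
        # public (ip, port) pairs through which traffic is admitted
        self.pinholes = set()

    def open_pinhole(self, private_ip, private_port):
        """Create a NAT mapping, open it, and return the public address."""
        public = (self.public_ip, next(self._ports))
        self.nat_table[(private_ip, private_port)] = public
        self.pinholes.add(public)
        return public

    def close_pinhole(self, private_ip, private_port):
        """Remove the NAT mapping and close the matching pinhole."""
        public = self.nat_table.pop((private_ip, private_port))
        self.pinholes.discard(public)

    def admits(self, ip, port):
        """Report whether inbound traffic to (ip, port) would pass."""
        return (ip, port) in self.pinholes


fw = ToyFirewall("192.0.2.10")                  # hypothetical public address
media_addr = fw.open_pinhole("10.0.0.5", 5004)  # hypothetical private endpoint

if __name__ == "__main__":
    # The end system would advertise media_addr, not its private
    # address, in the signaling it sends to the far end.
    print("advertise", media_addr, "open:", fw.admits(*media_addr))
```

The point of the sketch is that the firewall never needs to read the
signaling stream: the end system, which already knows which streams it
is about to create, supplies that knowledge directly.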
Storage Technology proposed a firewall control interface in a
now-expired internet draft, but the work has been dropped.


4. DISCUSSION

4.1. Current State

Service providers and enterprise network managers consider the ability
to place components of their H.323 systems behind firewalls to be a
very high priority. Firewalls provide a measure of host security beyond
what can be engineered into individual applications, and for those who
build a business around providing internet telephony services, they
provide increased protection against theft of services and
denial-of-service attacks. H.323, however, is an extremely
firewall-hostile protocol.

Firewall vendors are aware of this problem, and some of them are
working hard on solving it within the framework of their existing
products. Approaches to firewalling vary widely, as described above,
and companies which produce particular kinds of products have a vested
interest in continuing with their existing strategy. Some firewall
vendors, for example, believe quite strongly that applications should
have no awareness of firewalls in their network paths, and appear to be
inflexible in their adherence to a stateful inspection model.

VoIP vendors have been slow to produce H.235 implementations, but IP
telephony service providers are increasingly demanding H.235-based
security features, particularly the use of encrypted and
digitally-signed signaling messages. This increases the pressure to
find some solution to the problem other than ones which require the
ability to read, parse, and possibly modify H.323 signaling messages.

The widespread use of devices which break the end-to-end model is
causing the question of the viability of that model to come under
investigation [6]. H.323 may be the most egregious and/or visible
example of a protocol which violates network layering in its use of
transport addresses and which uses a third party to control
communications between two other parties, but it is not the only one.
In response to this kind of problem, there have recently been proposals
suggesting that certain types of network devices should be made
actively visible [7], as well as proposed protocols for controlling
network elements from application servers [8].

4.2. Standards

There has recently been a flurry of activity around firewalls and H.323
in various IP telephony standards bodies. Most of this activity has
been around identifying that there is a problem, with relatively little
being done to solve it.

An exception to this is a proposed change to H.225 to help H.323
function through NAT devices, almost certainly to be included in
H.225v4 (scheduled for decision in February 2000). This change requires
the use of H.245 tunneling and requires that RTP and RTCP streams be
sent on the same ports on which they expect to receive the
corresponding stream. It imposes constraints on the network
architecture and does not solve problems associated with common
requirements, such as the need for endpoints behind a NAT to receive
incoming calls from outside the NAT and the need to be served by a
gatekeeper in a different address space.


5. CONCLUSION

Firewalls are turning out to be a significant impediment to the
provision of commercial VoIP services -- not many providers are willing
to compromise the security of their networks by allowing unfiltered
traffic through. The approaches which have been used with varying
success to date will not work at all when signaling channels are
secured end-to-end. A more comprehensive approach is needed -- either
the firewall needs to be aware of the application or the application
needs to be aware of the firewall, and the former is not possible if
signaling is encrypted. We believe that some sort of firewall and/or
NAT control protocol is necessary to solve this problem.


6. REFERENCES

[1]  ITU-T Recommendation H.323. "Packet-based Multimedia
     Communications Systems," 1998.
[2]  Microsoft Corporation, "Firewall Configuration."
     http://www.microsoft.com/Windows/NetMeeting/Corp/ResKit/Chapter4/default.asp

[3]  Srisuresh, P. and Matt Holdrege, "IP Network Address Translator
     (NAT) Terminology and Considerations." Internet draft
     draft-ietf-nat-terminology-03.txt, June 1999.

[4]  Rekhter, Y. et al., "Address Allocation for Private Internets."
     RFC 1918, February 1996.

[5]  Borella, M. et al., "Realm Specific IP: Framework." Internet draft
     draft-ietf-nat-rsip-framework-03.txt, December 1999.

[6]  Carpenter, Brian, "Internet Transparency." Internet draft
     draft-carpenter-transparency-05.txt, December 1999.

[7]  Lear, Eliot, "NAT and other Network "Intelligence": Clearing
     Architectural Haze through the use of Fog Lamps." Internet draft
     draft-lear-foglamps-01.txt, December 1999.

[8]  Cerpa, A. et al., "NECP: The Network Element Control Protocol."
     Internet draft draft-cerpa-necp-00.txt, November 1999.


7. Author's Address

Melinda Shore
Nokia IP Telephony
127 West State Street
Ithaca, NY 14850
USA

Phone: +1 607 273 0724 x81
Fax:   +1 607 275 3610
Email: melinda.shore@nokia.com