Internet Engineering Task Force                                MidCom WG
Internet Draft                                                P. Cordell
draft-cordell-midcom-span-discuss-00.txt     Ridgeway Systems & Software
29 August, 2002
Expires: 29 February, 2003

                         SPAN Discussion Issues

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as work in progress.

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document collects points of discussion surrounding the pre-midcom
SPAN deliverable (SPAN = Simple Protocol for Augmenting NATs). As far
as possible it is intended to act as a collation point for facts
surrounding the SPAN deliverable. Where a discussion item is not black
and white it attempts to collate opinion from all angles as far as the
author is able to without bias. It does not draw conclusions of any
sort.

1. Introduction

This document collects points of discussion surrounding the pre-midcom
SPAN deliverable. As far as possible it is intended to act as a
collation point for facts surrounding the SPAN deliverable.

Cordell                                                         [Page 1]

Internet Draft              SPAN Discussion                  August 2002

Where a discussion item is not black and white it attempts to collate
opinion from all angles as far as the author is able to without bias.
It does not draw conclusions of any sort.

It is intended that other designers will contribute to the knowledge
base presented by this document without compromising its goal of being
as impartial as possible. It is suggested that the designers should
present their personal conclusions on the various topics raised in this
document in separate documents. Hopefully a number of these documents
will be collaborative efforts!

This document is very much in the form of a brain dump. As it is not
expected to go beyond being an interim Internet Draft, only minimal
effort has been made to make it more presentable. Its main purpose is
to capture design issues and it is more akin to a set of meeting
minutes than a formal design document.

Note: This document was originally drafted during the early stages of
the pre-midcom design team. Design decisions in SPAN-A have been made
as a result of drawing conclusions from the issues raised in this
document.

2. Definitions

Client: A device in the Inner Network running the client side of the
SPAN protocol. Also known as a SPAN client. In some cases the client
may be run on a proxy and so may not run on the same device as the
client for the protocol that is being traversed.

Connection: A data path between two devices identified by source
address and port and destination address and port. For the purposes of
this document, a connection may be TCP or UDP, even though UDP is
connectionless.

Inner Address: An address valid for the Inner Network.

Inner Network: A network that is separated from an Outer Network such
that the addresses in the inner network need to be mapped to unique
addresses in the outer network using a NAT function before connections
can be made across the outer network.

NAT Function: A device for connecting the address spaces of an Inner
Network and an Outer Network. A number of addresses from the outer
network are assigned to represent connections made from the inner
network.
Typically the number of addresses allocated from the outer network
address space will be smaller than the actual number of addresses in
use in the inner network and so a dynamic mapping between inner
addresses and outer addresses needs to be made.

NAT Address: The address used to represent an Inner Address on the
Outer Network.

Outer Network: A network that connects multiple Inner Networks. Note
that it is possible for an Outer Network to also be an Inner Network at
some other level. Thus Outer Networks can be nested.

Outer Address: An address valid for the Outer Network.

Relay: A device in a shared network that will accept incoming packets
and forward them to a client in a 'private' network.

TCP Connection: A connection that uses TCP.

UDP Connection: A connection that uses UDP. (Note that because UDP is
connectionless this is really more of an association rather than a true
connection.)

3. Background Information

3.1. Types of NAT/NAPT

[STUN] lists a number of different types of NAT/NAPT that are in
deployment. These are (with paraphrased descriptions):

Full Cone: New connections using the same inner address are mapped to
the same NAT address. Return packets allowed from anywhere.

Restricted Cone: New connections using the same inner address are
mapped to the same NAT address. Return packets only allowed from
addresses that have already been sent to.

Port Restricted Cone: New connections using the same inner address are
mapped to the same NAT address. Return packets only allowed from
addresses and ports that have already been sent to.

Symmetric: Each source address/port and destination address/port
combination is given a separate NAT address.

Additionally, there is 1-to-1 NAT, which maps a separate NAT address to
each private address in use. {N.B. I'm assuming that 1-to-1 NAT does
Cone. Is this true?}

STUN can be used for UDP with 1-to-1 and full cone types.
With variations to the protocols that use it, STUN can also be used
with various restricted cone types. SPAN is intended to be used with
all types, but is required for the symmetric type. SPAN is also
intended to address inbound TCP.

The residential market currently uses 1-to-1 NATs, but this area is
migrating to more use of Port Restricted Cone. The enterprise segment
is predominantly Port Restricted Cone [Mahadev]. {What types of NAT are
used in the airport and NAT scenarios?}

4. Firewall Security Policy

It is important for SPAN not to compromise site security policy. This
is not only with regard to the letter of the law (i.e. the firewall
rules), but also the spirit of the law (the intent that led to a
particular set of rules being written).

In this respect there are a number of different types of inbound
connection that SPAN could potentially allow, the two main types being
TCP and UDP.

5. UDP Issues

Incoming UDP can be handled in two ways:

UDP permanent cone: A SPAN relay could open a port that, throughout the
lifetime of the relay connection, allows packets from any location to
be relayed to the client.

UDP collapsing cone: The UDP port on the relay starts as a cone, but
will collapse down after receiving the first packet so that it will
only forward packets received from the initial source. All other
received packets are discarded.

Both methods begin as a cone because the source of packets is typically
not known a priori.

The permanent cone is attractive to allow multiple incoming call
attempts. The permanent cone is also required to allow for remote RTCP
sender reports and receiver reports to come from different locations on
the remote client (even though this may not happen often).

A UDP permanent cone will allow attacker packets to get to a client in
addition to packets from the genuine remote party. This may allow the
attacker to compromise the client.
Further, if the NAT is intentionally a symmetric NAT, the presence of
the permanent cone relay will undermine what little security the NAT
offers.

On the other hand, a collapsing cone makes it easy for an attacker to
launch a denial of service attack based on stealing the service. And
once an attacker has access to the path, they may carry out any attack
that they might carry out over a non-collapsing cone.

For both forms it is necessary to decide whether the address that data
is received from needs to be identified to the client. Such information
would allow the relaying of protocols that require sending replies back
to the original source address, and would allow a client to know that
it is receiving packets from multiple sources, which may suggest some
sort of attack. There may be a case for hiding this information from
the client in the interests of attempting to conform more closely to
the assumed site security policy.

In the case of the permanent cone, if address hiding is thought
desirable, it will be possible for the relay to identify the source of
the packets to the client using logical identifiers. This will allow
the client to tell which session the received packets relate to, but
will not give it detailed information about the actual source. This
implies some additional multiplexing in the channel (and additional
work if the component were embedded in a NAT), which might not be
desirable.

The use of logical identifiers instead of real source addresses can be
seen as a violation of the end-to-end principle. Going further, not
including any indication of source address (real or otherwise) is an
even bigger violation of the principle, and it may follow that relays
MUST include the source address in all cases.

It is not clear at this time whether only one of the types will be
used, or the client will be able to tell the server what behaviour is
required.
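
As an illustration only, the two UDP relay behaviours described in this
section can be modelled in a few lines of Python. This is a sketch, not
part of any SPAN specification: the class RelayPort, its fields and the
accept() method are names invented here, and a packet source is reduced
to an (address, port) pair. The sketch also shows the logical
identifier option for hiding the real source from the client.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, UDP port) of a packet source


@dataclass
class RelayPort:
    """Hypothetical model of one UDP port opened on a relay for a client."""

    collapsing: bool                  # True: collapsing cone; False: permanent cone
    hide_source: bool = False         # True: report logical identifiers only
    locked_to: Optional[Addr] = None  # set once a collapsing cone has collapsed
    _ids: Dict[Addr, int] = field(default_factory=dict)

    def accept(self, source: Addr):
        """Return the source tag to hand to the client, or None to discard."""
        if self.collapsing:
            if self.locked_to is None:
                # The first packet received collapses the cone onto its source.
                self.locked_to = source
            elif source != self.locked_to:
                # Packets from any other source are discarded.
                return None
        if self.hide_source:
            # The same source always maps to the same small integer, so the
            # client can separate sessions without learning the real address.
            return self._ids.setdefault(source, len(self._ids) + 1)
        return source
```

In this model, a collapsing port whose first packet comes from an
attacker locks onto the attacker and silently drops the genuine peer,
which is the service-stealing case noted above; a permanent port
forwards packets from both.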

Almost by definition, STUN does not have to consider firewall security
as it is only usable in the absence of a firewall. (i.e. STUN only
works with cone NATs, but a firewall will typically make the NAT appear
like a symmetric NAT.)

6. TCP Issues

The handling of in-bound TCP comes down to the characteristics of the
TCP listeners on the relay. There are two options:

(1) One-shot listener - A listener is posted on the relay that is
    terminated as soon as the first incoming connection is received.

(2) Persistent listener - The listener remains active and able to
    receive additional incoming connections after the first connection
    is received.

The problem with (1) is that if a one-shot listener is used to route
incoming call notification (e.g. as in H.323) the address to route
those notifications is lost as soon as the first call notification is
received. The client then has to obtain a new 'one-shot listener'
address and re-register that address. This potentially opens up an
attack where the attacker can repeatedly force the client (or clients)
to keep re-registering with the proxy. This effectively represents an
amplification of computing load by the time it reaches the registrar
and amounts to a form of DDoS.

The conundrum for (2) is that its main benefit is also its main
weakness. The main benefit is that it is able to allow multiple clients
in the 'private' domain to post listeners that can be used to accept
multiple incoming calls. The main weakness is that multiple hosts in
the 'private' domain are able to accept multiple incoming connections.
It seems that it is impossible to allow the benefit without the
problem!

(2) is mainly of use when multiple clients are behind a firewall/NAT
and the initial protocol signaling is done over TCP (e.g. H.323, SIP
over TCP or TLS).
Where only one client is behind a firewall or NAT, or it is possible to
deploy a proxy within the protected network, hard coding of the NAT may
be possible (e.g. port 5060 is mapped to the client). However, even in
these cases relay forwarding may be required if it is not possible to
configure the NAT. (What is the situation with residential ADSL, cable
modems etc?)

Initially it appears that there are at least two classes of problem
that a persistent TCP listener enables:

(a) It allows unauthorised servers to be set up for essentially
    malicious purposes,

(b) It allows servers to be set up that, through ignorance, are not
    sufficiently hardened against attack.

In the case of (a) it needs to be decided whether this represents a
significantly greater threat to enterprise security policy than a
number of other inside attacks that already exist.

In the case of (b) it needs to be decided whether the threat of a
persistent TCP listener is really any more of a threat than a one-shot
listener. Both types of listener likely require some sort of public
database (such as a SIP proxy) to map external requests to the
dynamically allocated ports on the relay. Method (1) may be used for
receiving multiple inbound call requests if, on receiving a new
connection, the client immediately requests a new listener and
re-registers with the public proxy. The presence of a public proxy is
likely to work against the benefit realized by using what amounts to
port hopping, and if this is thought to be a threat it might be
necessary to somehow ban use of both types of TCP listener initiating
sessions.

On the other hand, if an attacker is attempting to find victims by port
scanning the relay, as long as multiple clients are using a single
relay, it is unlikely that the attacker will be able to repeatedly
connect to the same client and attack it in a sustained way.
The attacker may also not be able to readily associate a particular
communication session with a particular client and so attacks requiring
multiple sessions will be harder. But this may simply slow the attacker
down, rather than preventing the problem.

One option to allow for persistent listeners with the slightly higher
security of one-shot listeners might be for the client to explicitly
signal acceptance of an incoming connection and for the relay to get
involved in some form of TCP accept rate limiting.

(2) potentially makes the client susceptible to a DoS attack based on
request flooding. However, (1) enables other forms of DoS attack, such
as service stealing.

Before (2) is discarded, it needs to be weighed up whether (1) is
practical enough to make (2) unnecessary, whether the threat of (2) is
sufficiently greater than the threat of (1), and whether there are
comparable threats already associated with allowing inbound UDP
connections.

There is obviously much discussion to be had here!

6.1. Listener Lifetime Management

Both types of TCP listener may be in the listening state for a long
time before they are actually triggered. In both cases it may be
necessary to terminate the listener before it actually receives an
incoming connection.

A one-shot listener could be associated with a TCP connection between
the relay and the client, and the lifetimes of the two tied together
such that when the TCP connection is closed, the listener is also
closed.

A persistent listener will likely require some form of out-of-band
control to close it down. Both hard state and soft state mechanisms
could be used to maintain the status of the listener.

7. Outbound Forwarding through the Relay

An issue is whether it is ever necessary to send out-bound packets
through the relay. (The other option being to always send them directly
from the client.)
Enabling the relay to do outbound forwarding would allow packets to
flow back along the same logical path (defined solely by source and
destination addresses and ports) that they were received on. This is
primarily an issue for UDP as any inbound TCP connection is implicitly
bi-directional and thus requires no explicit forwarding rules.

There seem to be four options here:

(1) do not allow the relay to do outbound forwarding,

(2) allow the relay to forward packets to the destination that packets
    were first received from,

(3) allow the relay to forward outbound packets based on an explicit
    command from the client that remains in force until a subsequent
    forwarding command is issued, during which time multiple UDP
    packets may be forwarded, and

(4) allow the relay to forward outbound packets based on per UDP packet
    explicit commands from the client.

It may be that (1) is sufficient.

(2) enables an effective denial of service attack in which an attacker
can simply send port scanning packets to the relay and steal service.

(3) is much harder to implement as it implies out-of-band control, or
multiplexed in-band control.

(4) implies multiplexed in-band control.

(2) is probably only useful when something like symmetric RTP has been
explicitly indicated in the signalling, as in the general case RTP
packets do not go back from whence they came.

One reason for relay based out-bound forwarding might be legal
interception, although thus far the IETF has decided not to support
such features. Additionally, service provider firewalls are likely to
be a more appropriate location to support this function.

To decide whether this is necessary, we need to look at:

   RTP

   SIP

   Use of symmetric RTP in SIP {is this still an option?}

   H.323 RAS

   H.323 Annex E

   Whether remote NATs will cause problems with non-symmetric paths.

When carrying out the above analysis it may be necessary to consider
real-world implementations rather than simply what the standards say.
For example, experience has shown that some H.323 endpoints expect RTP
and RTCP data to come from the same location that they are sending it
to, even though there is no basis for this mode of operation in the
various recommendations.

8. In-band Control Vs. Out-of-Band Control

There needs to be communication between the client and the relay. Three
possibilities exist:

(1) Control uses a different transport connection to the data path,

(2) Control is multiplexed onto the same transport connection as the
    data using some form of multiplexing,

(3) Control initially uses the transport connection, and then
    irreversibly switches over to data transport once the connection
    has been suitably configured.

(1) is out-of-band and (3) is in-band control. Whether (2) is in-band
or out-of-band is a matter of opinion. It is in-band if the transport
is considered to be the level of multiplexing, and out-of-band if you
consider that the necessary muxing on top of the transport connection
is part of the multiplexing.

(1) and (2) allow control and data to be exchanged at any time that it
is required to do so. The most obvious benefit of this is that it
allows controlled termination of the relay operation. It also readily
allows the protocol to be extended.

(1) requires a separate transport connection per client for the
control.

(2) requires some form of in-band multiplexing, although with a
constrained data set and suitable care it may be possible to define a
multiplexing scheme that does not normally require additional copying
of the data in order to insert header shims. However, in the general
case it does require special handling on occasion, and so may make it
less attractive if the relay function is integrated into a NAT.
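
To make option (2) concrete, one very simple form of in-band
multiplexing is sketched below in Python, assuming a one-byte shim in
front of every UDP payload. The shim values and the functions mux() and
demux() are assumptions made for illustration and are not taken from
any SPAN draft. Note that this naive version copies the payload in
order to prepend the shim, which is exactly the cost that a more
careful scheme would try to avoid.

```python
from typing import Tuple

DATA = 0x00     # shim value (assumed): payload is relayed application data
CONTROL = 0x01  # shim value (assumed): payload is a client/relay control message


def mux(kind: int, payload: bytes) -> bytes:
    """Prefix a payload with a one-byte shim identifying its kind."""
    if kind not in (DATA, CONTROL):
        raise ValueError("unknown shim value")
    # The concatenation below is the extra copy of the data.
    return bytes([kind]) + payload


def demux(datagram: bytes) -> Tuple[int, bytes]:
    """Split a received datagram into (kind, payload)."""
    if not datagram:
        raise ValueError("empty datagram carries no shim")
    kind = datagram[0]
    if kind not in (DATA, CONTROL):
        raise ValueError("unknown shim value")
    return kind, datagram[1:]
```

A relay integrated into a NAT would have to perform the equivalent of
demux() on every packet on the shared connection, which is the special
handling referred to above.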

(2) and (3) will often require the client user to be authenticated for
each flow that is set up (e.g. twice per RTP session), whereas (1)
allows for user authentication at initial control channel setup, and
then a more localised authentication scheme for each flow that is set
up thereafter. (Here user authentication is assumed to involve
accessing user records, whereas a localized scheme may simply be based
on some cryptographic token that does not require access to per user
information.)

It may be possible to ameliorate the difference between user
authentication and localized authentication by using some form of user
credential caching. This may be complicated by the fact that typically
two data paths will be set up at the same time (RTP and RTCP), and
hence at the time of authenticating the second data path the first may
not have completed caching. There are solutions to this problem, but
their complexity has to be compared to the complexity involved in
setting up a separate control connection.

User authentication may also adopt a challenge-based scheme to prevent
exchange of actual passwords. This would make per flow user
authentication less attractive as it requires more round-trips.

Adopting a scheme where the data path authenticates using a more
localized scheme allows the media relays to not have access to the user
records. This simplifies their implementation, and hence helps with
scalability.

In the case of (1), if the out-of-band control channel is based on TCP
then TLS can readily be used to help encrypt and authenticate it. In
the case of (2) or (3), IPSec or a more application specific scheme
will have to be adopted for the UDP sessions. {Does IPSec allow for
things like certificate exchange?}

The requirement for multiple streams to have a particular port
relationship (such as RTP and RTCP) may favour an out-of-band scheme as
the second of the pair will have effectively been set up out-of-band.
Hence, using an in-band scheme to set up pairs would result in both
in-band and out-of-band techniques and at first sight may lead to a
more complex design than a purely out-of-band technique. More detailed
design should readily resolve this particular issue.

9. Keep Alives

What methods should be used to keep UDP NAT and firewall bindings
alive? Are zero length UDP packets sufficient?

For streams that don't have much traffic (e.g. call signalling paths -
assuming we have such things) is there a trade off between continuous
keep-alives over UDP, and transporting the UDP data over TCP, which has
better state management? Even if it has benefits, the latter is likely
to be seen as a violation of firewall policy as it does not allow an
administrator to permit TCP traffic while blocking UDP traffic. On the
other hand the feeling seems to be that TCP is a bigger threat to
security than UDP, so maybe UDP over TCP is not that bad!!!

TCP NAT binding lifetimes vary widely. Some last as long as 24 hours.
The Linux TCP NAT bindings appear to be as short as 15 minutes.

One option to keep the TCP NAT bindings alive is to use TCP Keep-Alive.
The recommended default period for this is 2 hours. It is recommended
that stacks make this value configurable, but in a number of cases it
isn't configurable or requires a re-build of the OS. These
considerations suggest that the TCP keep-alive mechanism isn't
appropriate to keep the NAT bindings alive, and some other method is
required.

10. ICMP

Typically an ICMP message will be sent to a remote client if a UDP
packet is received on a port that is not in use. This can give
information to an attacker on which ports are active on a relay. To
prevent this it may be worth recommending that ICMP reports are not
sent in this situation. This mirrors the way some firewalls can be
configured.

11. IP Fragmentation

Do we need to do/specify anything about this?

12. SCTP

Do we need to cover SCTP or can we leave it as FFS (for further study)?

13. Security Considerations

The main consideration throughout this document has been security.

The overriding philosophy of this document, as mentioned previously, is
that it is important for SPAN not to compromise site security policy.
This is not only with regard to the letter of the law (i.e. the
firewall rules), but also the spirit of the law (the intent that led to
a particular set of rules being written).

Inferring such intent is difficult and there are likely to be as many
opinions on the subject as there are people contributing to the debate.
This document attempts to tread a path comparing threats that already
exist against threats that may be introduced by a SPAN-type protocol,
tempered with the knowledge that even minor concessions to
functionality start to affect a site's security situation. (Even the
deployment of something simple like STUN might have an impact on a
site's intended security policy.)

This is a difficult area that is not exclusive to SPAN-like protocols.
For example, Teredo faces exactly the same sorts of issues and, to some
degree, so does OPES.

The situation where a SPAN-type deployment may work without
administrator intervention is where any outbound TCP connection is
allowed through the firewall. The main reason for this type of rule is
to enable FTP operating in PASV mode without an ALG. However, most
major firewalls and the majority of small DSL firewalls include FTP
ALGs or stateful inspection, so such 'any outbound protocol is OK'
rules are no longer required and are becoming increasingly rare.
Indeed, even protocols such as HTTP and POP3 are subject to
considerable scrutiny by various proxies to avoid the introduction of
viruses.

Additionally, there is good reason to avoid rules that allow any
outbound protocol.
Chief amongst these reasons is avoiding the effects of trojans such as
Back Orifice that, having obtained resources on an internal machine,
will attempt to make an outbound connection back to their host.

Hence, it would appear that in many situations, particularly those
where security is considered important, it would not be possible to run
a SPAN-like protocol without involvement of the administrator.

Another issue for SPAN is that somebody inside an enterprise might be
able to use it for malicious purposes. Chances are that they would have
to be technically literate to make use of SPAN, in which case they
could implement their own solution without knowledge of SPAN etc. Also
Back Orifice and a number of other attacks such as sendmail attacks
already make use of outbound connections made from compromised
machines, and the presence of something like SPAN would not make such
attacks any easier.

Also, there are probably many better ways for a malicious internal
person to operate than using SPAN. For example, rather than post an
internal web server they could simply copy all the data they wanted
onto a CD-ROM and post it externally. Such an approach is much less
traceable, and far more sensible from the malicious person's
perspective.

One thing that would seem to be beneficial to the community of
administrators would be to have guidance that there are protocols such
as SPAN around and that there are simple ways to prevent them
compromising a site's intended firewall security policy. One way to do
this would be to publish a SPAN-like protocol and include in the
specification ways that it and other protocols can be blocked by
administrators. That way firewall vendors can include a suitable set of
firewall rules 'out-of-the-box', the reasons for such rules can be
openly discussed and administrators will not get any unexpected
results.

14. References

[STUN]     J. Rosenberg, "STUN - Simple Traversal of UDP Through NATs,"
           IETF Internet Draft, draft-rosenberg-midcom-stun-01.txt,
           March 1 2002.

[Mahadev]  Information kindly provided by Mahadev Somasundaram in a
           private e-mail.

15. Authors' Addresses

Pete Cordell
Ridgeway Systems & Software
66 Suttons Business Park
Reading RG6 1AZ
England
pcordell@ridgewaysystems.com