INTERNET DRAFT                            Srinivas Mantripragada
Category: Informational                   NetContinuum
Title: draft-srinivas-wat-01.txt          Prasad Vellanki
Date: December 1, 2003                    NetContinuum
Expires: June 1, 2004                     Sridhar Raman
                                          NetContinuum
                                          Venkata Nambula
                                          NetContinuum

Web Address Translation (WAT)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at: http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at: http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This draft specifies the Web Address Translation (WAT) mechanism. The scheme allows a user to hide or rewrite backend (internal) domain addresses. It is based on a suite of URL translation schemes and does not require any of the backend application servers to be reconfigured. The WAT mechanism allows multiple applications to be hosted on different virtual servers via a single domain. WAT is analogous to NAT, except that the proposed scheme operates at the web application layer and moves web address translation up into the network rather than performing it only in the web servers at the end point.

Srinivas                  Expires May 24, 2004                  [Page 1]
Internet-Draft                      WAT                         Nov 2003

Table of Contents

1.0 Introduction
2.0 Terminology
    2.1 Network Address Translation (NAT)
    2.2 Uniform Resource Identifier (URI)
    2.3 World Wide Web (WWW)
    2.4 Uniform Resource Locator (URL)
    2.5 Hypertext Reference (HREF)
3.0 WAT Implementation
    3.1 Website Cloaking
        3.1.1 Status
        3.1.2 Suppress Return Code
        3.1.3 Filter Response Header
        3.1.4 Headers to Filter
    3.2 URL Translations
        3.2.1 Status
        3.2.2 External URL
        3.2.3 External Domain
        3.2.4 Internal URL
        3.2.5 Internal Domain
    3.3 URL Rewrite
        3.3.1 Status
        3.3.2 Matching Rule
        3.3.3 Sequence Number
        3.3.4 Action
            3.3.4.1 Insert Header
            3.3.4.2 Remove Header
            3.3.4.3 Replace Header
            3.3.4.4 Rewrite URL
            3.3.4.5 Redirect URL
        3.3.5 Header
        3.3.6 Continue Processing other Rewrites
        3.3.7 New Value
4.0 References
5.0 Authors
6.0 Full Copyright statement

1.0 Introduction

Enterprises are actively migrating business applications to web technologies to improve access and control costs. At the same time, the threat of attack is growing rapidly. The majority of attacks now exploit application-layer vulnerabilities.

While traditional network firewalls address network access control and block unauthorized network-level requests, application firewalls address the application layer by enforcing security policies within application sessions. An application firewall specifically protects the Web application communication stream and all associated application resources from attacks that happen via the Web protocol.

The logical place to add this protection is at the corporate edge, where the traditional firewall currently sits. A significant number of web attacks work by tampering with HTTP-compliant URLs and header fields. One of the prerequisites of a true web application firewall is therefore URL and header protection.
In this draft, we propose the Web Address Translation (WAT) scheme, which can be implemented as a standard mechanism to provide URL and header protection. The key highlights of the WAT implementation scheme include:

(1) Ability to hide the internal structure (or layout) of a company's web site.

(2) Ability to create a homogeneous and consistent URL layout over all WWW servers within an Intranet Web cluster.

(3) Ability to give the WWW namespace a consistent server-independent layout.

(4) A consistent URL translation mechanism by which:

    (4.1) Exported URLs do not need to bind to any physically correct target server.

    (4.2) Applications do not need to be altered to work outside the firewall.

The authors feel that WAT is a natural extension of the NAT implementation (RFC 1631), with a different goal in mind. NAT presents a technique that lets end hosts with IP addresses in a public (or external) network communicate with end hosts with IP addresses in a private (or internal) network, and vice versa. NAT works by using the several million private addresses that have been set aside by the Internet Engineering Task Force, turning a public IP address such as 192.156.136.22 into a private address, such as 10.0.0.4, for delivery to a user's PC. Private IP addresses cannot be "seen" by the Internet, and therefore may be reused by various enterprise networks.

The WAT implementation adopts a similar philosophy and proposes a series of techniques to modify and translate URLs and headers that are globally visible in the WWW namespace into a private URL namespace that is not visible to the external world. The WAT implementation specifics are described in the subsequent sections.

2.0 Terminology

The following terms are used in the rest of the document.

2.1 Network Address Translation (NAT)

The term NAT in this document refers to the translation of a private/internal IP address to a public/external IP address and vice versa.

2.2 Uniform Resource Identifier (URI)

The W3C's codification of the name and address syntax of present and future objects on the Internet. In its most basic form, a URI consists of a scheme name (such as file, http, ftp, news, mailto, gopher) followed by a colon, followed by a path whose nature is determined by the scheme that precedes it (see RFC 1630). URI is the umbrella term for URNs, URLs, and all other Uniform Resource Identifiers.

2.3 World Wide Web (WWW)

The World Wide Web is a collection of information servers linked together through hypertext. This allows a user to select a hypertext link on one page and be taken to a different server halfway around the world.

2.4 Uniform Resource Locator (URL)

The World Wide Web address of a site on the Internet.

2.5 Hypertext Reference (HREF)

This is an attribute used to set the URL of an object that is being referenced. This attribute is used in many tags, but most often with the anchor (A) tag.

3.0 WAT Implementation

The proposed WAT implementation is split into three main techniques:

(1) Website Cloaking
(2) URL Translations
(3) URL Rewrite (Request and Response)

3.1 Website Cloaking

Website Cloaking is a method to conceal enterprise web resources from hackers and worms scanning for vulnerabilities. Almost every successful attack is preceded by probing websites for weaknesses. Readily available tools on the Internet such as Whisker, Nessus and Nikto make it easy for potential intruders to scan any website. This allows them to determine exactly how the applications were built, what kind of servers they are running on, and which URLs contain the vulnerabilities. Worms such as Code Red automatically scan the Internet for specific server types with known vulnerabilities in order to launch an attack.

In the proposed implementation, website cloaking hides URL return codes, HTTP headers and backend IP addresses.
As a result, outsiders have no visibility into which web servers, application servers, operating systems, directory structures and patches are in use on the protected web sites. The implementation details follow:

3.1.1 Status

This parameter is used to enable or disable the Website Cloaking feature.

3.1.2 Suppress Return Code

When enabled, this parameter blocks the return of an HTTP status code in a response. These codes are returned from a server if there is a problem with the client request or with the Web server itself. The following two classes of response error codes are suppressed:

. 4xx (client): These are "400-series" error codes. These codes are intended for instances where a client seems to have erred when attempting to access a Web page. For example, "404: Page not found."

. 5xx (server): These are "500-series" error codes. These codes are intended to indicate that a Web server is aware that it has a problem or that it is incapable of performing a request. For example, "500: Internal Error".

With these codes suppressed, weaknesses in the infrastructure are also concealed: an attacker cannot tell whether a problem lies with the client or with the Web server.

3.1.3 Filter Response Header

When enabled, this parameter filters a specific HTTP header in a response. The HTTP header to be filtered is defined by using the "Headers to Filter" option (defined below).

3.1.4 Headers to Filter

This parameter defines the header (for example, a server banner) in a response that needs to be filtered. The input is a string, for example "Host", "Server" or "Content-Length".

3.2 URL Translations

When a Web site sends a page to a user, it typically includes a variety of embedded references to other objects on the site. If the references are relative, meaning that they don't include the name of the server within them (e.g.
/content.html) rather than absolute (http://www.example.com/content.html), there is no problem.

However, most Web sites do embed absolute links. This creates two problems.

The first problem frequently occurs when a proxy is performing SSL acceleration. When links embedded in the document are prefixed with "http" instead of "https", user clicks lead to the unencrypted pages. As a result, such URLs are sometimes served without encryption or simply do not work. The URL translation mechanism should allow parsing the response and rewriting "http" to "https".

The second problem occurs when a proxy's domain name is different from the server's name, for example a server named server.example.com and a proxy named www.example.com. Applications that look to the host name might end up embedding links such as http://server.example.com/content.html when they should say http://www.example.com/content.html.

JavaScript and HTTP cookies compound the problem. JavaScript-driven pages often assemble URLs dynamically on the client side. HTTP cookies are sent from the server such that the client will only return them when it communicates with the server directly, without a proxy in between.

In most cases, site administrators lack the resources to change applications to fix the problem. Instead, what is needed is a rewriting/mapping of incorrect URLs to the correct form. The rewriting/mapping has to happen both for links sent from the server to the client and for HTTP requests from the client to the server.

The URL translation is able to rewrite URLs embedded within HTML, DHTML, XHTML, Cascading Style Sheets, JavaScript, HTTP cookies and Flash.
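
As a rough illustration of the response-side rewriting described above, the following Python sketch rewrites absolute links that point at an internal server so that they use "https" and the proxy's public name. The function name, the regular expression and the host names are assumptions made for this illustration only; they are not part of the WAT proposal. A real implementation would need format-aware handling of HTML, JavaScript, cookies and the other content types listed above.

```python
import re


def translate_response_links(body: str, internal_host: str, external_host: str) -> str:
    """Rewrite absolute http/https links to internal_host so that they use
    https and external_host. Relative links are left untouched.

    Hypothetical helper for illustration; not defined by the draft.
    """
    # Match "http://" or "https://" followed by the internal host name.
    # The negative lookahead requires the host name to end there, so a
    # longer name such as "server.example.com.evil.org" and a host with an
    # explicit port ("server.example.com:8080") are both left alone.
    pattern = re.compile(
        r"https?://" + re.escape(internal_host) + r"(?![\w.:-])",
        re.IGNORECASE,
    )
    return pattern.sub("https://" + external_host, body)


if __name__ == "__main__":
    # Assumed sample content: one absolute link and one relative link.
    page = 'href="http://server.example.com/content.html" src="/images/logo.jpg"'
    print(translate_response_links(page, "server.example.com", "www.example.com"))
```

In this sketch the relative reference passes through unchanged, which matches the observation that relative links cause no problem; only the absolute link is mapped to the public name.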
A link that once appeared as

   http://intranet.company.com/content.html

will now appear as

   https://proxy.company.com/prx/000/http/intranet.company.com/content.html

The URL translation occurs such that everything is syntactically and semantically correct. The steps are as follows.

Step 1: User requests https://www.example.com

Step 2: Server responds with content which includes the link http://server.example.com/images/logo.jpg

Step 3: URL translation rewrites the outgoing response and sends it to the user.

Step 4: User requests https://www.example.com/prx/00/http/server.example.com/images/logo.jpg

Step 5: URL translation rewrites the incoming request and the server receives: http://server.example.com/images/logo.jpg

Because of the above operations, the server does not realize that the content was modified in any way. This helps provide application security. The implementation details follow:

3.2.1 Status

Expects a Boolean input [Yes/No]. This parameter is used to enable or disable this feature.

3.2.2 External URL

Expects a string input. The external URL should be a publicly exported URL in the WWW namespace and has to be unique. An empty value means that no translations need to be performed on this external URL. Requests coming from the client with the matching input string are mapped to a unique URL translation rule. The domain part of the outgoing requests is rewritten back with the input value. The string "*" means rewrite all absolute URLs in the response data. Domain can be a suffix pattern or a simple string.

3.2.3 External Domain

Expects a string input. The external domain should be the publicly exported domain in the WWW namespace and has to be unique, for example www.mysite.com or www.mydomain.com.

3.2.4 Internal URL

Expects a string input. The internal URL should always start with a '/' character and should be locally visible.

3.2.5 Internal Domain

Expects a string input.
The internal domain represents the local namespace server or IP address that is not visible (or exported) to the external user.

3.2.6 Example

The following example configuration can be used to translate an internal URL such as /bugzilla to an externally visible URL, http://www.mydomain.com/bugs. As a result, the internal URL is invisible to the external user.

   http://www.mydomain.com/bugs => /bugzilla

   Name:            bugs
   Status:          On/Off
   External URL:    /bugs
   External Domain: www.mydomain.com
   Internal URL:    /
   Internal Domain: bugzilla

3.3 URL Rewrite

The WAT implementation proposes URL rewrite for both incoming requests and outgoing responses. The specific implementation details follow:

3.3.1 Status

This parameter is used to enable or disable this feature.

3.3.2 Matching Rule

Expects a string input. This can be a regular expression or a prefix-suffix pattern. Multiple rules can be specified. The pattern is used to match the URL or the Header, as specified in the Action field below.

3.3.3 Sequence Number

Expects a non-negative value. This number specifies the order in which the matching rules specified in Section 3.3.2 are processed.

3.3.4 Action

The Action parameter specifies the operation to be performed once the rule is matched. The actions apply only to Header and URL field items and are listed below:

3.3.4.1 Insert Header

The matching rule specified in Section 3.3.2 applies to the Header field. This applies to both incoming requests and outgoing responses. If the rule as specified in Section 3.3.2 matches, the URL rewrite mechanism inserts a new header. The new header string is taken as an input parameter and is specified in Section 3.3.7.

3.3.4.2 Remove Header

The matching rule specified in Section 3.3.2 applies to the Header field. This applies to both incoming requests and outgoing responses.
If the rule as specified in Section 3.3.2 matches, the URL rewrite mechanism removes the header portion of the string.

3.3.4.3 Replace Header

The matching rule specified in Section 3.3.2 applies to the Header field. This applies to both incoming requests and outgoing responses. If the rule as specified in Section 3.3.2 matches, the URL rewrite mechanism replaces the old header string with the new header string. The new header string is taken as an input parameter and is specified in Section 3.3.7.

3.3.4.4 Rewrite URL

The matching rule specified in Section 3.3.2 applies to the URL field. This applies to the incoming request only. If the rule as specified in Section 3.3.2 matches, the URL rewrite mechanism rewrites the old URL string with the new URL string. The new URL string is taken as an input parameter and is specified in Section 3.3.7.

3.3.4.5 Redirect URL

The matching rule specified in Section 3.3.2 applies to the URL field. This applies to the incoming request only. If the rule as specified in Section 3.3.2 matches, the URL rewrite mechanism redirects the URL to a new location. The new location is taken as an input parameter and is specified in Section 3.3.7.

3.3.5 Header

Expects a string input. It specifies the header field that needs to be matched and on which the corresponding action specified in Section 3.3.4 is taken.

3.3.6 Continue Processing other Rewrites

Expects a Boolean input [Yes/No]. Provides an option for the URL rewrite engine to stop after the first match or to continue processing all the rules specified.

3.3.7 New Value

Expects a string input. This specifies the new value that the action specified in Section 3.3.4 will operate upon.

4.0 References

[NAT] Egevang, K. and P. Francis, "The IP Network Address Translator (NAT)", RFC 1631, May 1994.

[NAT-TERM] Srisuresh, P. and M.
Holdrege, "IP Network Address Translator (NAT) Terminology and Considerations", RFC 2663, August 1999.

5.0 Authors

Srinivas Mantripragada
1705 Wyatt Drive
Santa Clara, CA 95054
USA
Phone: 408-961-5600
Fax: 408-986-8997
Email: srinivas@netcontinuum.com

The authors would like to thank Robert Doyle and Neil Correa for their comments and feedback on the initial version of this document.

6.0 Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.