INTERNET DRAFT                                       Srinivas Mantripragada
Category: Informational                                        NetContinuum
Title: draft-srinivas-wat-01.txt                            Prasad Vellanki
Date: December 1, 2003                                         NetContinuum
Expires: June 1, 2004                                         Sridhar Raman
                                                               NetContinuum
                                                            Venkata Nambula
                                                               NetContinuum


                         Web Address Translation (WAT)


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task
   Force (IETF), its areas, and its working groups.  Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at
   any time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at:

      http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at:

      http://www.ietf.org/shadow.html.

   Copyright Notice

     Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   This draft specifies the Web Address Translation (WAT) mechanism.
   The scheme allows a user to hide or rewrite backend (internal)
   domain addresses. It is based on a suite of URL translation schemes
   that do not require any of the backend application servers to be
   reconfigured. The WAT mechanism allows multiple applications hosted
   on different virtual servers to be reached via a single domain. WAT
   is analogous to NAT, except that it operates at the web application
   layer and moves web address translation into the network rather
   than performing it only in the web servers at the end point.


Srinivas			Expires May 24, 2004		[Page 1]

Internet-Draft                          WAT                     Nov 2003


Table of Contents

   1.0  Introduction
   2.0  Terminology
        2.1  Network Address Translation (NAT)
        2.2  Uniform Resource Identifier (URI)
        2.3  World Wide Web (WWW)
        2.4  Uniform Resource Locator (URL)
        2.5  Hypertext Reference (HREF)
   3.0  WAT Implementation
        3.1  Website Cloaking
             3.1.1 Status
             3.1.2 Suppress Return Code
             3.1.3 Filter Response Header
             3.1.4 Headers to Filter
        3.2  URL Translations
             3.2.1 Status
             3.2.2 External URL
             3.2.3 External Domain
             3.2.4 Internal URL
             3.2.5 Internal Domain
             3.2.6 Example
        3.3  URL Rewrite
             3.3.1 Status
             3.3.2 Matching Rule
             3.3.3 Sequence Number
             3.3.4 Action
                   3.3.4.1 Insert Header
                   3.3.4.2 Remove Header
                   3.3.4.3 Replace Header
                   3.3.4.4 Rewrite URL
                   3.3.4.5 Redirect URL
             3.3.5 Header
             3.3.6 Continue Processing other Rewrites
             3.3.7 New Value
   4.0 References
   5.0 Authors
   6.0 Full Copyright statement

1.0  Introduction

   Enterprises are actively migrating business applications to web
   technologies to improve access and control costs. At the same
   time, the threat of attack is growing rapidly. The majority of
   attacks now exploit application-layer vulnerabilities.
   While traditional network firewalls address network access control
   and block unauthorized network-level requests, application firewalls
   address the application layer by enforcing security policies within
   application sessions. An application firewall specifically protects 
   the Web application communication stream and all associated 
   application resources from attacks that happen via the Web protocol.


   The logical place to add this protection is at the corporate edge,
   where the traditional firewall currently sits. A significant number
   of web attacks work by tampering with protocol-compliant HTTP URLs
   and header fields, so URL and header protection is a prerequisite
   of a true web application firewall. In this draft, we propose the
   Web Address Translation (WAT) scheme, which can be implemented as a
   standard mechanism to provide URL and header protection. The key
   highlights of the WAT implementation scheme include:

   (1) Hide the internal structure (or layout) of a company's web
       site.
   (2) Create a homogeneous and consistent URL layout over all WWW
       servers within an Intranet Web cluster.
   (3) Give the WWW namespace a consistent, server-independent layout.
   (4) Provide a consistent URL translation mechanism by which:
       (4.1) Exported URLs do not need to bind to any physically
             correct target server.
       (4.2) Applications do not need to be altered to work outside
             the firewall.

   The authors feel that WAT is a natural extension of NAT (RFC 1631)
   with a different goal in mind. NAT is a technique that lets hosts
   with IP addresses in a public (or external) network communicate
   with hosts with IP addresses in a private (or internal) network,
   and vice versa. NAT works by using the several million private
   addresses that have been put aside by the Internet Engineering
   Task Force, turning a public IP address such as 192.156.136.22
   into a private address, such as 10.0.0.4, for delivery to a user's
   PC. Private IP addresses cannot be "seen" by the Internet, and
   therefore may be reused by various enterprise networks. The WAT
   implementation adopts a similar philosophy and proposes a series
   of techniques to modify and translate URLs and headers that are
   globally visible in the WWW namespace to a private URL namespace
   that is not visible to the external world. The WAT implementation
   specifics are described in the subsequent sections.


2.0  Terminology

   The following terms are used in the rest of the document.

   2.1 Network Address Translation (NAT)

   The term NAT in this document refers to the translation of a private/
   internal IP address to a public/external IP address and vice-versa.

   2.2 Uniform Resource Identifier (URI)

   The W3C's codification of the name and address syntax of present and
   future objects on the Internet. In its most basic form, a URI consists 
   of a scheme name (such as file, http, ftp, news, mailto, gopher)
   followed by a colon, followed by a path whose nature is determined by
   the scheme that precedes it (see RFC 1630). URI is the umbrella term 
   for URNs, URLs, and all other Uniform Resource Identifiers.

   2.3 World Wide Web (WWW)

   The World Wide Web is a collection of information servers linked
   together through a language called hypertext. This allows you to select
   a hypertext link on one page which may take you to a different server 
   halfway around the world.

   2.4 Uniform Resource Locator (URL)

   The World Wide Web address of a site on the Internet.

   2.5 Hypertext Reference (HREF)

   This is an attribute used to set the URL of an object that is being
   referenced. This attribute is used in many tags, but often with the 
   <A> tag.

3.0  WAT Implementation

   The proposed WAT implementation is split into three main techniques:
   (1) Website Cloaking
   (2) URL Translations
   (3) URL Rewrite (Request and Response)

   3.1 Website Cloaking

       Website Cloaking is a method of concealing enterprise web
       resources from hackers and worms scanning for vulnerabilities.
       Almost every successful attack is preceded by probing websites
       for weaknesses. Readily available tools on the Internet such as
       Whisker, Nessus and Nikto make it easy for potential intruders
       to scan any website. This allows them to determine how the
       applications were built, what kind of servers they are running
       on, and which URLs contain vulnerabilities. Worms such as Code
       Red automatically scan the Internet for specific server types
       with known vulnerabilities in order to launch an attack.

       In the proposed implementation, website cloaking hides URL
       return codes, HTTP headers and backend IP addresses. As a
       result, outsiders have no visibility into which web servers,
       application servers, operating systems, directory structures
       and patches are in use on the protected web sites.

       The implementation details follow:


       3.1.1 Status

       This parameter is used to enable or disable the Website Cloaking
       feature.

       3.1.2 Suppress Return Code

       When enabled, this parameter blocks the return of an HTTP status
       code in a response. These codes are returned by a server when
       there is a problem with the client's request or with the Web
       server itself. The two types of response error codes that are
       suppressed are the following:

       . 4xx (client): These are "400-series" error codes.
       These codes are intended for instances where a client seems to
       have erred when attempting to access a Web page. For example,
       "404 Not Found".

       . 5xx (server): These are "500-series" error codes.
       These codes are intended to indicate that a Web server is aware
       that it has a problem or that it is incapable of performing a
       request. For example, "500 Internal Server Error".

       With these codes suppressed, the weaknesses they would reveal
       in the infrastructure are also hidden. An attacker cannot tell
       whether a problem lies with the client or with the Web server.
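
       As an illustration only, the following Python sketch shows one
       way a proxy might mask 4xx and 5xx codes. The function name
       suppress_status, the replacement code 200, and the generic body
       are assumptions made for this example; this draft does not
       specify what the client receives in place of the original code.

```python
from typing import Tuple

# Hypothetical sketch: mask 4xx/5xx status codes before a response leaves
# the proxy. The replacement status (200) and the generic body are
# assumptions for illustration, not values defined by this draft.
GENERIC_BODY = b"The request could not be completed."


def suppress_status(status: int, body: bytes, enabled: bool = True) -> Tuple[int, bytes]:
    """Return the (status, body) pair that the client should see."""
    if enabled and 400 <= status <= 599:
        # Hide both the code and any server-generated error page.
        return 200, GENERIC_BODY
    return status, body


masked = suppress_status(404, b"<h1>Not Found</h1>")
passed = suppress_status(200, b"<h1>Welcome</h1>")
```

       Replacing the body as well as the code matters in this sketch,
       because a default error page often identifies the server
       software even when the status code is hidden.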

       3.1.3 Filter Response Header

       When enabled, this parameter filters specific HTTP headers out
       of a response. The headers to filter are defined by using the
       "Headers to Filter" option (Section 3.1.4).

       3.1.4 Headers to Filter

       This parameter defines the banner headers in a response that
       need to be filtered. The input is specified in string format.
       Examples include "Host", "Server", and "Content-Length".
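
       A minimal Python sketch of this filtering step follows. The
       function name filter_response_headers and the sample header
       values are hypothetical and chosen only for illustration. Names
       are compared case-insensitively because HTTP header field names
       are case-insensitive.

```python
from typing import Dict, Iterable


def filter_response_headers(headers: Dict[str, str], to_filter: Iterable[str]) -> Dict[str, str]:
    """Return a copy of headers without the configured names (case-insensitive)."""
    blocked = {name.lower() for name in to_filter}
    return {name: value for name, value in headers.items() if name.lower() not in blocked}


# Sample response headers (illustrative values only).
response_headers = {
    "Server": "Apache/1.3.27",
    "Content-Type": "text/html",
    "X-Powered-By": "PHP/4.3.0",
}
cloaked = filter_response_headers(response_headers, ["server", "X-Powered-By"])
```

       The original dictionary is left untouched, so the proxy can
       still log the unfiltered backend headers if it needs to.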


   3.2 URL Translations

   When a Web site sends a page to a user, it typically includes a
   variety of embedded references to other objects on the site. If the
   references are relative, meaning that they don't include the name
   of the server (e.g. /content.html), rather than absolute
   (e.g. http://www.example.com/content.html), there is no problem.


   However, most Web sites do embed absolute links. This exposes two
   problems. The first problem frequently occurs when a proxy is
   performing SSL acceleration. When links embedded in the document
   are prefixed with "http" instead of "https", user clicks are
   directed to the unencrypted pages. As a result, pages are sometimes
   delivered unencrypted or the links simply don't work. The URL
   translation mechanism should allow parsing the response and
   rewriting "http" to "https". The second problem occurs when a
   proxy's domain name is different from the server's name, for
   example, a server named server.example.com and a proxy named
   www.example.com. Applications that look to the host name might end
   up embedding links such as

   http://server.example.com/content.html when they should say

   http://www.example.com/content.html.
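
   Both problems can be handled in one pass over the response body.
   The Python sketch below is one possible approach, not a mechanism
   defined by this draft: the function name rewrite_absolute_links is
   hypothetical, and a plain regular expression stands in for a real
   parser of HTML and the other content types.

```python
import re


def rewrite_absolute_links(body: str, internal_host: str, external_host: str) -> str:
    """Point absolute links for internal_host at https://external_host."""
    # The negative lookahead keeps longer host names that merely start
    # with internal_host (e.g. server.example.com.evil.org) untouched.
    pattern = re.compile(r"https?://" + re.escape(internal_host) + r"(?![\w.-])")
    return pattern.sub("https://" + external_host, body)


html = '<a href="http://server.example.com/content.html">Content</a>'
rewritten = rewrite_absolute_links(html, "server.example.com", "www.example.com")
```

   Relative links contain no scheme or host, so they pass through this
   function unchanged.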

   JavaScript and HTTP cookies compound the problem. JavaScript-driven
   pages often assemble URLs dynamically on the client side. HTTP
   cookies are issued by the server in such a way that the client
   returns them only when it communicates with the server directly,
   without a proxy in between.
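
   For cookies, one possible remedy is to rewrite the Domain attribute
   of each Set-Cookie header as it passes through the proxy. The
   Python sketch below illustrates the idea under that assumption; the
   function name rewrite_cookie_domain is hypothetical, and this draft
   does not prescribe this particular method.

```python
import re


def rewrite_cookie_domain(set_cookie: str, internal_domain: str, external_domain: str) -> str:
    """Replace a Domain attribute naming the internal host with the external one."""
    pattern = re.compile(
        r"(;\s*domain=)\.?" + re.escape(internal_domain) + r"(?=\s*(;|$))",
        re.IGNORECASE,
    )
    # A function is used as the replacement so that external_domain is
    # inserted literally, without backslash or group interpretation.
    return pattern.sub(lambda m: m.group(1) + external_domain, set_cookie)


header = "sid=abc123; Domain=server.example.com; Path=/"
fixed = rewrite_cookie_domain(header, "server.example.com", "www.example.com")
```

   A cookie that carries no Domain attribute is returned unchanged by
   this sketch.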

   In most cases, site administrators lack the resources to change
   the applications to fix these problems. Instead, what is needed
   is a rewriting/mapping of incorrect URLs to the correct form. The
   rewriting/mapping has to happen both for links sent from the server
   to the client and for HTTP requests from the client to the server.

   The URL translation is able to rewrite URLs embedded within HTML,
   DHTML, XHTML, Cascading Style Sheets, JavaScript, HTTP cookies and 
   Flash.

   A link that once appeared as

   http://intranet.company.com/content.html

   will now appear as
   https://proxy.company.com/prx/000/http/intranet.company.com/content.html
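
   The mapping shown above can be sketched in Python as follows. The
   function name encode_url is hypothetical. The "/prx/000" prefix is
   copied from the example link; this draft does not define the
   meaning of "000", so the sketch treats it as an opaque tag.

```python
from urllib.parse import urlsplit


def encode_url(original: str, proxy_host: str, prefix: str = "/prx/000") -> str:
    """Embed the original scheme, host, and path in a URL on the proxy."""
    parts = urlsplit(original)
    encoded = f"https://{proxy_host}{prefix}/{parts.scheme}/{parts.netloc}{parts.path}"
    if parts.query:
        # Carry the query string over unchanged.
        encoded += "?" + parts.query
    return encoded


link = encode_url("http://intranet.company.com/content.html", "proxy.company.com")
```

   Keeping the original scheme and host inside the path is what lets
   the proxy later recover the backend URL without a lookup table.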

   The URL translation occurs such that everything remains
   syntactically and semantically correct. The steps are as follows.

   Step1: User requests https://www.example.com

   Step2: Server responds with content which includes the link

   http://server.example.com/images/logo.jpg

   Step3: URL translation rewrites the outgoing response and sends it to  
          the User.


   Step4: User requests
   https://www.example.com/prx/000/http/server.example.com/images/logo.jpg

   Step5: URL translation rewrites the incoming request and the server
          receives:
   http://server.example.com/images/logo.jpg

   By performing the above operations, the server doesn't realize that
   the content was modified in any way. This helps provide application
   security.
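
   The request-side rewrite in Step5 can be sketched in Python as
   shown below. The function name decode_url is hypothetical, and the
   assumed path layout of marker, tag, scheme, host, and remaining
   path is inferred from the example URLs rather than specified by
   this draft.

```python
from urllib.parse import urlsplit


def decode_url(translated: str, marker: str = "/prx/") -> str:
    """Recover the backend URL; return the input unchanged if it is not translated."""
    parts = urlsplit(translated)
    if not parts.path.startswith(marker):
        return translated
    # Remaining path is assumed to be: <tag>/<scheme>/<host>/<path...>
    pieces = parts.path[len(marker):].split("/", 3)
    if len(pieces) < 3:
        return translated
    scheme, host = pieces[1], pieces[2]
    path = "/" + pieces[3] if len(pieces) == 4 else "/"
    decoded = f"{scheme}://{host}{path}"
    if parts.query:
        decoded += "?" + parts.query
    return decoded


backend = decode_url(
    "https://www.example.com/prx/000/http/server.example.com/images/logo.jpg"
)
```

   A deployed proxy would presumably also check the decoded host
   against its configured internal domains, so that it cannot be used
   to reach arbitrary servers; that check is omitted here.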

   The implementation details follow:

     3.2.1 Status

     Expects a Boolean input [Yes/No]. This parameter is used to enable or
     disable this feature.

     3.2.2 External URL

     Expects a string input. The external URL should be a publicly
     exported URL in the WWW namespace and has to be unique. An empty
     value means that no translations need to be performed on this
     external URL. Requests coming from the client that match the
     input string are mapped to a unique URL translation rule. The
     domain part of URLs in the outgoing responses is rewritten back
     to the input value. The string "*" means rewrite all absolute
     URLs in the response data. The domain can be a suffix pattern or
     a simple string.

     3.2.3 External Domain

     Expects a string input. The external domain should be the publicly
     exported domain in the WWW namespace and has to be unique, for
     example www.mysite.com or www.mydomain.com.

     3.2.4 Internal URL

     Expects a string input. The internal URL should always start with a '/'
     character and should be locally visible.

     3.2.5 Internal Domain

     Expects a string input. The internal domain represents the local
     namespace server or IP address that is not visible (or exported) to 
     the external user.

     3.2.6 Example

     The following example configuration can be used to translate an
     internal URL such as /bugzilla to an externally visible URL,
     http://www.mydomain.com/bugs. As a result, the internal URL is
     now invisible to the external user.

     http://www.mydomain.com/bugs => /bugzilla

     Name:              bugs
     Status:            On/Off
     External URL:      /bugs
     External Domain:   www.mydomain.com
     Internal URL:      /
     Internal Domain:   bugzilla
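
     As a rough illustration, the Python sketch below applies a rule
     with these six fields to an incoming request. The class
     TranslationRule, the function translate_request, the sample path
     show_bug.cgi, and the use of plain "http" toward the internal
     server are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslationRule:
    name: str
    status: bool            # True = On, False = Off
    external_url: str
    external_domain: str
    internal_url: str
    internal_domain: str


def translate_request(rule: TranslationRule, host: str, path: str) -> Optional[str]:
    """Map an external host and path to an internal URL, or None if the rule does not apply."""
    if not rule.status or host.lower() != rule.external_domain.lower():
        return None
    prefix = rule.external_url.rstrip("/")
    # Match the prefix only on a path-segment boundary (/bugs, not /bugsy).
    if path != prefix and not path.startswith(prefix + "/"):
        return None
    rest = path[len(prefix):].lstrip("/")
    return f"http://{rule.internal_domain}{rule.internal_url.rstrip('/')}/{rest}"


bugs = TranslationRule("bugs", True, "/bugs", "www.mydomain.com", "/", "bugzilla")
internal = translate_request(bugs, "www.mydomain.com", "/bugs/show_bug.cgi")
```

     Requests for other hosts or paths yield None, which a caller
     could treat as "no translation configured".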


   3.3 URL Rewrite

   The WAT implementation proposes URL rewrite for both incoming
   requests and outgoing responses. The specific implementation
   details follow:

     3.3.1 Status

     This parameter is used to enable or disable this feature.

     3.3.2 Matching Rule

     Expects a string input. This can be a regular expression or a prefix-
     suffix pattern. Multiple rules can be specified. The pattern will be 
     used to match the URL or the Header as specified in the Action field 
     below.

     3.3.3 Sequence Number

     Expects a non-negative integer. This number specifies the order in
     which the matching rules specified in Section 3.3.2 are processed.

     3.3.4 Action

     The Action parameter specifies the operation to perform once the
     rule is matched. The actions apply only to Header and URL field
     items and are listed below:

           3.3.4.1 Insert Header
                   The matching rule specified in Section 3.3.2
                   applies to the Header field. This applies to both
                   the incoming request and the outgoing response. If
                   the rule matches, the URL rewrite mechanism inserts
                   a new header. The new header string is taken as an
                   input parameter and is specified in Section 3.3.7.

           3.3.4.2 Remove Header
                   The matching rule specified in Section 3.3.2
                   applies to the Header field. This applies to both
                   the incoming request and the outgoing response. If
                   the rule matches, the URL rewrite mechanism removes
                   the header portion of the string.

           3.3.4.3 Replace Header
                   The matching rule specified in Section 3.3.2
                   applies to the Header field. This applies to both
                   the incoming request and the outgoing response. If
                   the rule matches, the URL rewrite mechanism
                   replaces the old header string with the new header
                   string. The new header string is taken as an input
                   parameter and is specified in Section 3.3.7.

           3.3.4.4 Rewrite URL
                   The matching rule specified in Section 3.3.2
                   applies to the URL field. This applies to the
                   incoming request only. If the rule matches, the URL
                   rewrite mechanism rewrites the old URL string with
                   the new URL string. The new URL string is taken as
                   an input parameter and is specified in
                   Section 3.3.7.

           3.3.4.5 Redirect URL
                   The matching rule specified in Section 3.3.2
                   applies to the URL field. This applies to the
                   incoming request only. If the rule matches, the URL
                   rewrite mechanism redirects the URL to a new
                   location. The new location is taken as an input
                   parameter and is specified in Section 3.3.7.

     3.3.5 Header

     Expects a string input. It specifies one of the many header fields that 
     need to be matched and the corresponding action as specified in 
     Section 3.3.4 that needs to be taken.

     3.3.6 Continue Processing other Rewrites

     Expects a boolean input [Yes/No]. Provides an option for the URL
     rewrite engine to stop after the first match or continue processing 
     all the rules specified.

     3.3.7 New Value

     Expects a string input. This specifies the new value that the action
     specified in Section 3.3.4 will operate upon.
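
     To show how the parameters of Section 3.3 could fit together,
     the Python sketch below applies a list of rules in sequence
     order. It is an illustration under several assumptions, not the
     mechanism defined by this draft: the names RewriteRule and
     apply_rules and the action strings are hypothetical, matching
     rules are treated as regular expressions only, header names are
     compared case-sensitively, and header rules are matched against
     the current value of the named header (an absent header is
     treated as an empty string).

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RewriteRule:
    sequence: int                      # 3.3.3 Sequence Number
    pattern: str                       # 3.3.2 Matching Rule (regular expression)
    action: str                        # 3.3.4 Action (hypothetical names below)
    header: str = ""                   # 3.3.5 Header (unused for URL actions)
    new_value: str = ""                # 3.3.7 New Value
    continue_processing: bool = True   # 3.3.6 Continue Processing other Rewrites


def apply_rules(
    rules: List[RewriteRule], url: str, headers: Dict[str, str]
) -> Tuple[str, Dict[str, str], Optional[str]]:
    """Return (url, headers, redirect_location) after applying rules in order."""
    headers = dict(headers)  # work on a copy
    redirect = None
    for rule in sorted(rules, key=lambda r: r.sequence):
        if rule.action in ("rewrite_url", "redirect_url"):
            if not re.search(rule.pattern, url):
                continue
            if rule.action == "rewrite_url":
                url = re.sub(rule.pattern, rule.new_value, url)
            else:
                redirect = rule.new_value
        else:
            present = rule.header in headers
            if not re.search(rule.pattern, headers.get(rule.header, "")):
                continue
            if rule.action == "insert_header" and not present:
                headers[rule.header] = rule.new_value
            elif rule.action == "replace_header" and present:
                headers[rule.header] = rule.new_value
            elif rule.action == "remove_header" and present:
                del headers[rule.header]
        # Reached only when the rule matched: stop if so configured.
        if not rule.continue_processing:
            break
    return url, headers, redirect


rules = [
    RewriteRule(sequence=20, pattern=r"^/old/", action="rewrite_url", new_value="/new/"),
    RewriteRule(sequence=10, pattern=r".*", action="remove_header", header="X-Debug"),
    RewriteRule(sequence=30, pattern=r".*", action="insert_header", header="X-Via", new_value="wat"),
]
result = apply_rules(rules, "/old/index.html", {"X-Debug": "1", "Accept": "*/*"})
```

     Sorting by sequence number means the rules may be configured in
     any order. A rule that does not match never stops processing,
     whatever its "continue" setting.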

4.0  References

[NAT]      Egevang, K. and P. Francis, "The IP Network Address
           Translator (NAT)", RFC 1631, May 1994.

[NAT-TERM] Srisuresh, P. and M. Holdrege, "IP Network Address
           Translator (NAT) Terminology and Considerations", RFC
           2663, August 1999.


5.0  Authors

     Srinivas Mantripragada
     1705 Wyatt Drive
     Santa Clara, CA 95054 USA
     Phone: 408-961-5600
     Fax: 408-986-8997
     Email: srinivas@netcontinuum.com

     The authors would like to thank Robert Doyle and Neil Correa for their
     comments and feedback on the initial version of this document.

6.0  Full Copyright Statement

     Copyright (C) The Internet Society (2003). All Rights Reserved.

     This document and translations of it may be copied and furnished to
     others, and derivative works that comment on or otherwise explain it
     or assist in its implementation may be prepared, copied, published
     and distributed, in whole or in part, without restriction of any
     kind, provided that the above copyright notice and this paragraph are
     included on all such copies and derivative works. However, this
     document itself may not be modified in any way, such as by removing
     the copyright notice or references to the Internet Society or other
     Internet organizations, except as needed for the purpose of
     developing Internet standards in which case the procedures for
     copyrights defined in the Internet Standards process must be
     followed, or as required to translate it into languages other than
     English.

     The limited permissions granted above are perpetual and will not be
     revoked by the Internet Society or its successors or assignees.
     This document and the information contained herein is provided on an
     "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
     TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
     BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
     HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
     MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

