Document                                                       June 2002

Internet Engineering Task Force                             S. Ayyasamy
Internet Draft                                                     UMKC
Expires: December 2002                                         F. Baker
                                                          Cisco Systems

          Recommended Internet Service Provider Procedures
                     For Emergency Preparedness

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

The purpose of this document is to express what the engineering community expects of service providers with respect to emergency preparedness. The document can be considered a set of recommendations for supporting mission-critical traffic at various points in the network, such as the access and the core. The goal of the draft is to raise discussion among Internet service providers (ISPs) on the importance of accommodating emergency services across the Internet.

Conventions used in this document

The recommendations are stated as numbered points in the order in which they appear. For example, "R (X)" indicates recommendation number X. The numbering does not represent any order of preference.

Ayyasamy                Expires - December 2002                [Page 1]

1. Introduction

1.1 Background

The Internet can be useful before, during, and after disaster events.
For a long time, it has proved useful when other communication failed, thereby acting as a secondary form of communication. Identifying and prioritizing mission-critical traffic is the basic requirement. Other obvious needs that the Internet can serve include information dissemination for the media and the public, on-line access to operational data, back-up communication, instant messaging, and visualizing events via video. Other useful applications include lists of victims, route maps, and dynamically updated information sources. The Internet has drawbacks, however, such as congestion, dependence on power, security risks, and software/hardware requirements. Despite these shortcomings, history shows that the Internet worked well during disasters when compared with telephone circuits [3]. The success of emergency management nevertheless lies in integrating information from satellites, surveillance, ad-hoc management, and location identification from the global positioning system (GPS) into the Internet.

1.2 Objective

The set of solution techniques proposed for emergency requirements [4] is very broad. It includes effective control of best-effort traffic, enhancements to QoS models like intserv and diffserv, application-level signaling support with SIP/H.323, and traffic engineering methods using IP, MPLS, and other multi-service networks. The document [5] serves as the framework for supporting emergency traffic in an IP telephony environment. The solution space can be enhanced or may even change as newer applications and technologies arise. Hence, effective implementation lies in the hands of ISPs. They can simply over-provision or implement any of the standard solutions recommended by the IETF.

Service providers manage the autonomous systems that form the global Internet. They operate the backbone networks, which provide communication between customers and the rest of the Internet.
In today's commercial Internet, backbone networks are engineered so that there is virtually no packet loss, even during the worst possible network congestion. Access and peering points are carefully rate limited and tuned to adhere to service level agreements and peering arrangements. Hence, recommendations for access and peering points differ largely from those for the backbone, which necessitates a separate study.

The objective behind the draft is to raise awareness among ISPs of the possible use of the Internet as a mission-critical information transfer medium.

1.3 Scope

The draft focuses on recommendations for emergency management within the Internet and doesn't deal with the mode of action at junction points between the Internet and the PSTN. Also, the draft addresses service providers only.

1.4 Terminology

1. Authorized Users

They include people authorized by an appropriate authority (e.g., the NCS [6] in the US) to establish emergency communication sessions through public networking facilities in order to facilitate immediate disaster recovery operations [4].

2. Disaster

It includes incidents (e.g., terrorist attacks, forest fires, state emergencies) that lead to event-driven congestion across the Internet and other telecommunication networks.

3. Mission-critical traffic

Information transferred between authorized users is classified as mission-critical traffic. It excludes other priority traffic that doesn't support emergency management. Alternative terms include IEPREP traffic and high-priority traffic.

4. Emergency service

It includes any type of service that helps the process of critical information dissemination. For example, a service can be at the application level, like instant messaging (IM), or at the physical layer, like optical restoration techniques.

5. Emergency management

It includes the operation and management (OAM) functionalities necessary for providing priority across the Internet.
Traffic engineering, signaling, and reservation are operation functions, while traffic monitoring and measurement are management functions.

6. Service providers

They include organizations in the business of providing Internet connectivity or other Internet services, including but not restricted to web hosting services, content providers, and e-mail services.

2. Intra-domain issues

2.1 Over-Provisioning

This is the default capacity management technique exercised by service providers on backbone networks. In the face of extreme congestion, QoS methods like Intserv, diffserv, and MPLS work only when the overall capacity requirement of emergency traffic is less than the available bandwidth. Network performance degrades rapidly once the high-priority traffic exceeds that limit. Some of the issues in over-provisioning include:

R (1): Improved switching and capacity augmentation guarantee a high probability of call completion across the Internet. Adding backbone capacity appears to be an obvious step, although bandwidth alone cannot solve the problem. Another possible area of improvement is deploying dense wavelength division multiplexing (DWDM) equipment to increase transmission capacity. Routers still remain a bottleneck to the bandwidth supply. Hence, switching and routing capacity can also be increased.

R (2): It is also important to conserve bandwidth wisely. Some of the techniques for bandwidth management [7] include traffic shifting, dynamic delay pools, dynamic bandwidth allocation, and content supply tools.

R (3): Unpredictable event-driven congestion during periods of disaster leads to excess web traffic. Hence, redundancy and an increase in content supply can prevent web servers from being overloaded. This helps reduce access congestion, thereby improving availability.

R (4): If service providers have agreements to provide guarantees to high-priority traffic, it is advisable to over-provision rather than over-subscribe links.

R (5): Sometimes capacity augmentation increases congestion because of uneven traffic distribution. Hence, network planners are recommended to use a combination of QoS and bandwidth augmentation for effective emergency management.

2.2 QoS and Diffserv

Over-provisioning cannot prevent or avoid hot spots caused by uneven traffic distribution. The various quality of service models (Intserv, Diffserv, MPLS, etc.) are the only means by which a predictable service can be guaranteed. The framework document [5] is the standard document to be consulted for giving priority to VoIP applications in a QoS environment. The following points have to be taken into account when using QoS as a means of achieving availability:

R (6): To avoid inter-operability problems, it is highly recommended to re-use existing technology rather than implement new methods. It is equally emphasized, however, that providers can use any kind of method within their own domain as long as it doesn't have an impact on the Internet.

R (7): Complex implementations and modifications should be avoided. Our goal depends heavily on short time scales, and complex implementations make the work difficult (e.g., specifying multiple parameters for per-flow queuing schemes).

2.2.1 diffserv

R (8): A combination of the Differentiated Services Architecture [8] and the Framework for Integrated Services Operation over Differentiated Services Networks [9] avoids inter-class effects. The exact configuration is documented in [10]. Our requirements are best achieved when diffserv is used in combination with traffic engineering and active queue management [11] techniques.

R (9): The various active queue management schemes can be used for managing capacity and can act as best-effort control in addition to the functions mentioned in [11]. The queuing and discard behavior of the AF PHB group, as stated in section 4 of [12], provides many suggestions on the possible use of RED [13] within a diffserv domain. RED can be used to implement queues holding packets of different drop priorities. During times of congestion, low-priority traffic can be dropped preferentially, with packets of emergency traffic being dropped last.

R (10): New PHB traffic classes and code points can be used within a private network, but one has to conform to [14] when they are used across the Internet.

R (11): The per-domain behavior (PDB) document [15] defines a PDB as "The expected treatment that an identifiable or target group of packets will receive from "edge-to-edge" of a DS domain. A particular PHB (or, if applicable, list of PHBs) and traffic conditioning requirements are associated with each PDB."

The virtual wire PDB offers strict traffic conditioning and attributes. Virtual wire gives highly guaranteed support for emergency services.

Author's note: The draft on the virtual wire PDB has expired. Recent discussion on the Diffserv mailing list suggests further work in this area.

2.2.2 MPLS

Some of the recommendations include:

R (12): Explicit routes can be specified for label switched paths (LSPs), which can be configured manually or dynamically using constraint-based routing. Fast reroute [16] serves as a backup when the primary route fails, by pre-configuring the desired link. Explicit routing and fast reroute are the two important techniques recommended for emergency services.

R (13): Providers who have an ATM core and want to support IEPREP can use MPLS as a circuit-emulation technology and can easily inter-operate with IP networks.
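
As an illustration of the drop-priority behavior described in R (9), the following is a minimal sketch of a RED-style drop curve with one parameter set per drop precedence. It is not taken from [12] or [13]: the profile names, thresholds, and probabilities are hypothetical, and the sketch omits the refinement in RED that spaces out drops according to the number of packets accepted since the last drop.

```python
# Per-precedence RED parameters:
# (min_threshold, max_threshold, max_drop_probability), thresholds in packets.
# All values are hypothetical and would need tuning on a real link.
PROFILES = {
    "emergency": (40, 60, 0.02),  # dropped last
    "normal": (20, 40, 0.10),
    "low": (5, 20, 0.50),         # dropped first
}


def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def drop_probability(avg, min_th, max_th, max_p):
    """Linear drop curve: 0 below min_th, 1 at or above max_th."""
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


def drop_probability_for(precedence, avg):
    """Drop probability for a packet of the given precedence."""
    return drop_probability(avg, *PROFILES[precedence])
```

With these assumed profiles and an average queue length of 30 packets, every low-priority packet is dropped, normal packets are dropped with probability of about 0.05, and emergency packets are not dropped at all.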

2.3 Traffic Engineering (TE)

Traffic engineering is the method by which traffic is routed selectively and controls are applied at various levels to bring a congested network back to normal conditions. The term is widely misused, and its meaning depends on the context of the problem being approached.

2.3.1 TE in IP network

R (14): Interior gateway protocols (IGPs) use link weights to find the route a packet should take. The additive weights are assigned manually to influence the route used. Hence, IGP metrics can be tailored so that emergency traffic is routed on an alternate path, thereby avoiding congested links. There are some limitations. First, there is no effective relation between metric attributes and the routes. Second, additive metrics like link state attributes do not provide the ability to handle non-additive constraints.

TE in IP networks is normally done at large time scales, but disaster events require service within short time scales. Hence, a hybrid approach with MPLS has to be used to converge at short time scales. [17] provides pointers on doing traffic engineering in IP networks.

R (15): The congestion arising from uneven distribution of traffic during periods of disaster can be averted by using the equal-cost multi-path (ECMP) technique, which distributes traffic evenly among the available paths.

2.3.2 TE with MPLS

R (16): A combination of the techniques stated in section 2.2.2 with constraint-based routing algorithms is helpful in dynamic emergency management. The RFC on the overview and principles of Internet traffic engineering [18] explains the various TE techniques in more detail.

2.4 Other recommendations

R (17): During extreme overload conditions, OSPF LSA ACK and Hello packets can be prioritized by a DSCP marking [19].

R (18): Dynamic routing schemes can be used along with constraint-based algorithms for pre-emption and re-routing of desired traffic [20].

3. Inter-domain issues

The inter-domain area has long been neglected because of policy issues. Redundancy, symmetry, and load balancing are some of the inter-domain protocol properties relevant to our objectives. Though prevalent in use, BGP is considered the weakest link in the Internet. Stability and configuration problems affect high-priority traffic more than other background traffic. Hence, BGP has to be made more robust and resilient. Some of the factors to be considered include:

R (19): For a long time, service providers have relied on manual configuration to tweak BGP routing policies. Humans are prone to mistakes, which are reflected in the BGP peering. One of the reasons for longer convergence times is massive prefix loss from the routing tables during peak congestion [21]. This underscores the importance of a robust inter-domain protocol. Service providers are recommended to avoid manual misconfiguration, which leads to excess loss during periods of disaster.

R (20): Service providers should make their filtering policies and contact information public.

R (22): The evolutionary architecture advocated by the IRTF routing group [22] must support diffserv domains in recognizing service level specifications.

The routing system MUST provide means for detecting infrastructural failures of both node equipment and communication links at short time intervals. The future routing architecture should take these factors into account for a resilient Internet.

R (23): The class of inter-domain protocols is governed by the policies of the respective peering domains. Hence, co-ordination between service providers is essential. Service provider conferences like NANOG [24] are the places where discussions between service providers on emergency management can be initiated by way of BOF sessions.

4. Access network

Access providers give connectivity to customers at various levels depending on the infrastructure cost. Customers increase their fault tolerance and capacity by means of multi-homing. Also, the access can be more congested than the backbone because there is less statistical multiplexing. For instance, a flash crowd can easily overwhelm a web site while having little effect on backbone traffic. The recommendations in section 2 apply to access networks too. Some recommended practices include:

R (25): The flash crowd problem results in many SYNs and SYN ACKs, congesting the access links. This causes some SYNs to be dropped, but it also results in loss of data. SYNs and SYN ACKs, in opposing directions, represent new sessions. Under this circumstance, the SYNs and SYN ACKs can be put at a lower priority, meaning that they will either be dropped or simply wait. The traffic of the emergency service has to be identified in order to give it higher priority. When early congestion notification is applied to subsequent packets in the TCP exchange, it can help in characterizing emergency traffic, while SYNs and SYN ACKs are dropped [35].

Author's note: Multi-homing with BGP, speed mis-match issues and ?? can also be discussed.

5. Peering and service level agreements

R (26): The robustness of the Internet lies in the redundancy among service providers in providing service to a particular geographical area. A certain degree of volunteering is necessary beyond the business motives behind peering agreements. Tier-one service providers can give transit at subsidized rates to other service providers who want to route emergency traffic. The respective government agencies can also reduce regulations and provide incentives for service providers who support emergency management.
The Peering BOF sessions [24] should also discuss ways for effective co-operation between service providers in times of disaster. ISPs should also consider publishing their contact information transparently and correctly for urgent contacts.

R (27): A service level specification (SLS) sets out the technical requirements to be met by the provider toward the customer or another provider. The technical specification required for satisfying the availability requirement is more important than guaranteed service. Automatic and dynamic negotiation of service level specifications can be exercised. Reliability parameters like maximum down time and recovery time are more important to negotiate than other performance metrics. A tightly bounded SLS can also be guaranteed for emergency services through the use of virtual leased lines.

R (28): For a long time, flat-rate pricing was the commonly used practice for charging customers. Though it satisfies the social fairness criterion of pricing, it is not suitable for the economic efficiency of service providers in general or for emergency management operations in particular. Any pricing scheme used by service providers should assume that resources will be scarce or that the network is running at a high level of utilization. Since prices indicate one's usage, pricing based on priority adds economic value for the providers and also serves as a useful method for charging high-priority users. Though social fairness cannot be accommodated, at least proportional fairness has to be exercised. Service level specifications should also address this recommendation.

R (29): The presence of priority fields in protocol formats tempts one to insert local priority policy into the protocol fields. The end-to-end principle requires separating label from policy. So, the policy of a particular domain should be stored either in SIP proxies or in bandwidth brokers in a diffserv domain, thereby allowing dynamic policy allocation and transfer.

Another debatable issue is preemption. This term relates more to circuit-based networks and has little relevance to connection-less networks like the Internet. Even with circuit-emulation technologies like MPLS, a particular LSP can pre-empt another LSP, and as such it doesn't depend on any policy issue.

6. Infrastructure management

Link outages and the destruction of transmission systems can thwart all priority mechanisms implemented across one's network. Hence, this is an important factor to be considered by service providers.

R (30) [25]: During outages, interoperability and reliability are the main issues to be considered when re-routing traffic across other networks. Also, vendors should work cooperatively with service providers to get essential communications up and running. Power consumption and conservation should also be addressed.

7. Traffic monitoring

R (31): A recommended method for giving priority to mission-critical traffic edge to edge is stated below. Traffic tools should be employed to perform the following four steps:

1. Traffic differentiation
2. Traffic examination
3. Control mechanisms
4. Verification

Traffic differentiation can be done at either the packet level or the flow level. The recommended classification markers include the SIP resource priority header [26] at the application level and Diffserv markers [10] at the network level. Classification based on fancy applications (like MPLS) has become a norm in present day routers obviating predicate classification based on .

The second step concentrates on performance and utilization analysis of various flows (responsive flows, high-bandwidth flows, unresponsive flows, high-priority flows). It throws light on various efficiency factors that have to be considered for bandwidth allocation. The results of this phase can be obtained from sniffers like tcpdump [27] or by passive packet measurement techniques [28], [29].
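
The first two steps above (differentiation by Diffserv marker, then examination of utilization per class) can be sketched as follows. This is an illustrative sketch only. The input is a simplified list of (TOS byte, packet length) pairs standing in for records from a measurement tool, and the emergency code point is an assumption, since the draft does not name one.

```python
from collections import defaultdict

# Hypothetical choice: the draft does not name a code point for emergency
# traffic, so the EF code point (46) is used purely as a placeholder.
EMERGENCY_DSCP = 46


def dscp_of(tos_byte):
    """Return the 6-bit DSCP held in the upper bits of the IPv4 TOS byte."""
    return (tos_byte & 0xFF) >> 2


def bytes_per_class(packets):
    """Sum packet lengths per DSCP value.

    `packets` is an iterable of (tos_byte, length_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for tos_byte, length in packets:
        totals[dscp_of(tos_byte)] += length
    return dict(totals)


def emergency_share(packets):
    """Fraction of observed bytes marked EMERGENCY_DSCP (0.0 if no traffic)."""
    totals = bytes_per_class(packets)
    total = sum(totals.values())
    return totals.get(EMERGENCY_DSCP, 0) / total if total else 0.0
```

A share computed this way could feed the verification step, for instance by comparing it against the capacity set aside for emergency traffic.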

Reports provide insight into historical performance, emergency service compliance, metric analysis, and billing. If the reports do not meet our expectations, the control policies of the ISP should be modified or the SLAs reviewed. These mechanisms are standardized at the packet level in [28] and at the flow level in [30].

8. Specific recommendations

8.1 Web hosting services

R (33): Disaster events lead to access congestion at news and government web sites. An obvious approach to reducing access congestion is to increase the content supply. Mirroring works by duplicating entire web sites at several locations, placing them closer to the user. But each site has a different uniform resource locator (URL), and the user must choose one of the mirror sites with no idea which might provide the best performance.

A content distribution network (CDN) can be employed by web hosting services to load balance web traffic during such hard times. A CDN is a network optimized to deliver specific content, such as static web pages, transaction-based web sites, and streaming media. It works much like a web cache system but, in addition, places copies of the data as near to the user as possible by means of anycasting [31].

R (34): Content limitation and reduction: A gradual reduction can be done, starting with the removal of advertisements and graphics, then links and other information, down to a text-only format. The rate of reduction can depend on the congestion at the server.

8.2 Application service providers

R (35): The potential applications that support emergency service include information retrieval from distributed databases, instant messaging and presence, application-layer signaling (SIP/H.323), VoIP-based services, electronic mail, file transfer, and the World Wide Web.

R (36): Most emergency calls originate in PSTN networks and have their destination in the Internet. This requires an access platform for transferring emergency calls.
ASPs can use soft switches to provide critical connectivity to the PSTN in a highly scalable and reliable fashion.

R (37): The I Am Alive (IAA) system works as a database for handling victims' information. But such web sites often become vulnerable to security hazards. Hence, application service providers should come up with more reliable distributed algorithms.

9. Other recommendations

R (38): Regardless of transport, applications should comply with the IETF Congestion Control Principles [33].

R (39): Today's Internet is predominantly point-to-point, or unicast. Multicast offers ISPs the promise of significant bandwidth savings and information transfer. Multicast can be used to reduce demand on the server. Multicast was one of the few services used on September 11 for communication. But multicasting is difficult to deploy and is not in much use. If service providers offer multicast-based solutions, they are recommended to create a ubiquitous and reliable Internet.

10. Security Considerations

The draft [34] details the security mechanisms required for emergency preparedness.

11. Acknowledgements

Many thanks to Jennifer Rexford of AT&T Research for private discussions on traffic engineering and other inter-domain issues. Additional thanks to Arvind and Kapil for reviewing the draft.

12. References

1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

2 Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

3 Brewin, B., "Nation's Networks See Large Volume Spikes After Attacks", Computerworld, September 17, 2001.

4 Folts, H., Beard, C., "Requirements for Emergency Telecommunication Capabilities in the Internet", Internet Draft, work in progress, June 2002.

5 Carlberg, K. and I. Brown, "Framework for Supporting IEPS in IP Telephony", Internet Draft, work in progress, May 2002.

6 http://www.ncs.gov/

7 Dias, G.V., "Managing Internet Bandwidth: The LEARN Experience", INET 2001, June 2001.

8 Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, W., "An Architecture for Differentiated Services", RFC 2475, December 1998.

9 Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., Speer, M., Braden, R., Davie, B., Wroclawski, J. and E. Felstaine, "A Framework for Integrated Services Operation over Diffserv Networks", RFC 2998, November 2000.

10 Baker, F., "Edge Interface PHB recommendations", Internet Draft, work in progress, June 2002.

11 Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., Zhang, L., "Recommendations on Queue Management and Congestion Avoidance in the Internet", RFC 2309, April 1998.

12 Heinanen, J., Baker, F., Weiss, W. and Wroclawski, J., "Assured Forwarding PHB Group", RFC 2597, June 1999.

13 Floyd, S., and Jacobson, V., "Random Early Detection Gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, Volume 1, Number 4, August 1993, pp. 397-413.

14 Brim, S., Carpenter, B., Le Faucheur, F., and Black, D., "Per Hop Behavior Identification Codes", RFC 3140, June 2001.

15 Nichols, K., Carpenter, B., "Definition of Differentiated Services Per Domain Behaviors and Rules for their Specification", RFC 3086, April 2001.

16 Pan, P., Gan, D., Swallow, G., Vasseur, J.P., Cooper, D., Atlas, A., Jork, M., "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", Internet Draft, work in progress, July 2002.

17 Feldmann, A., Greenberg, A., Lund, C., Reingold, N., Rexford, J., "NetScope: Traffic engineering for IP networks", IEEE Network Magazine, Mar./Apr. 2000, pp. 11-19.

18 Awduche, D., Chiu, A., Elwalid, A., Widjaja, I., Xiao, X., "Overview and Principles of Internet Traffic Engineering", RFC 3272, May 2002.

19 Ash, J., Choudhury, G.L., Sapozhnikova, V.D., Sherif, M., Manral, V., Maunder, A., "Congestion Avoidance & Control for OSPF Networks", Internet Draft, work in progress, April 2002.

20 Ash, G.R., "Dynamic Routing in Telecommunications Networks", McGraw-Hill, 1998.

21 Mahajan, R., Wetherall, D., Anderson, T., "The impact of BGP misconfiguration on connectivity", in Proc. NANOG 23, October 2001.

22 http://www.irtf.org.

23 Feamster, N., Borkenhagen, J., Rexford, J., "Controlling the impact of BGP policy changes on IP traffic", AT&T Research Technical Report 011106-02, November 2001.

24 http://nanog.org.

25 Yankee Group, "September 11, 2001: Infrastructure Impacts, Implications, and Recommendations", Special Report, September 2001.

26 Polk, J., Schulzrinne, H., "SIP Communications Resource Priority Header", Internet Draft, work in progress, December 2001.

27 http://www.tcpdump.org

28 Duffield, N., Greenberg, A., Grossglauser, M., Rexford, J., "A Framework for Passive Packet Measurement", work in progress, February 2002.

29 http://www.Sflow.org.

30 http://ipfix.doit.wisc.edu/.

31 Partridge, C., Mendez, T., Milliken, W., "Host Anycasting Service", RFC 1546, November 1993.

32 Berners-Lee, T., Gettys, J., Nielsen, H.F., "Replication and Caching Position Statement", World-Wide Web Consortium, November 2000.

33 Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, September 2000.

34 Brown, I., "A Security Framework for Prioritised Emergency Communication", Internet Draft, March 2002.

35 http://iepscheme.net/archive/

Author's Addresses

Ayyasamy SenthilKumar
University Of Missouri
Kansas City
MO 64110
USA
Email: saq66@umkc.edu

Fred Baker
Cisco Systems
1121 Via Del Rey
Santa Barbara, CA 93117
USA
Phone: +1-408-526-4257
Fax: +1-413-473-2403
Email: fred.baker@cisco.com