Network Working Group                                      Chia Yuan Cho
Internet-document                                    Sukanta Kumar Hazra
Expires: August 2004                                    February 9, 2004

        Statistical Inter-flow Field Behaviour for Context
                      Replication in ROHC-TCP

Status of This Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

Context replication increases header compression gains by reducing the
redundancy between flows via efficient replicate (IR-CR) packets. The
optimum design of IR-CR packet formats requires elaborate understanding
of the inter-flow redundancy. As context replication is most
well-suited for TCP, this document presents a statistical analysis of
TCP/IP inter-flow field behaviour. Based on the analysis,
recommendations on ROHC-TCP packet format specifications for context
replication are made. It is also shown that inter-flow field behaviour
is inherently and significantly asymmetrical, and various ways of
handling it are considered. Finally, based on the inter-flow behaviour
of TCP Window field, it is noted that current encoding methods do not
compress it efficiently.

Cho & Hazra                                                     [Page 1]

Internet-document  Statistical Inter-flow Field Behaviour  February 2004
                   for Context Replication in ROHC-TCP

Table of contents

1. Introduction
2. Terminology
3. Header Compression Model
4. Methodology
5. Results
   5.1. IPv4 Identification
   5.2. IP Don't Fragment and Time To Live
   5.3. IP Destination Address
   5.4. TCP Source Port
   5.5. TCP Destination Port
   5.6. TCP Sequence Number and Acknowledgement Number
   5.7. TCP Flags and Urgent Pointer
   5.8. TCP Window
   5.9. TCP Checksum
   5.10. TCP Options
   5.11. Mean Sizes of Compressed Fields
6. Handling Asymmetrical Inter-flow Behaviour
7. Security Considerations
8. References
9. Authors' Addresses
Appendix A. State Transition Threshold

1. Introduction

Context replication offers an alternative to the conventional context
initialization procedure by performing context initialization via more
efficient IR-CR packets. In contrast to IR packets, which contain
mostly uncompressed fields, IR-CR packets carry compressed header
fields, obtained by reducing the redundancy between packets of
different flows. As such, header compression can possibly start right
from the first packet of a flow and compression efficiency is improved.
The motivations for context replication, as well as details of the
context replication mechanism, are given in [ROHC-CR]. Although context
replication is a general ROHC mechanism, this document focuses on its
application to the ROHC-TCP profile. This is because the motivation for
context replication originated from the ROHC-TCP profile and, because
TCP flows are often 'short-lived', context replication improves header
compression gains most significantly for the ROHC-TCP profile.

Context replication is possible due to significant redundancy between
multiple simultaneous, or near-simultaneous, flows passing through the
same compressor-decompressor pair. For any header compression scheme to
work, the first step is to understand the field behaviour in order to
recognize areas of redundancy. Context replication relies on relatively
unexplored inter-flow field behaviour, rather than well-understood
intra-flow field behaviour. [TCP-BEH] provides an elaborate qualitative
analysis of TCP/IP field behaviour. However, it focuses more on the
intra-flow aspect than on the inter-flow aspect; this document is meant
in part as an extension of that analysis to inter-flow behaviour.

The difficulty in understanding and describing inter-flow field
behaviour is compounded by the fact that it depends on human usage
patterns, in addition to the underlying protocol characteristics. This
gives inter-flow field behaviour a much larger variance and a higher
degree of uncertainty.

This document presents a method of extracting the inter-flow field
behaviour relevant for context replication, together with the
quantitative results of a statistical analysis of TCP/IP inter-flow
behaviour, based on four Tcpdump traces containing 1.9 million TCP/IP
packet samples.
From the results, a number of recommendations are made. Firstly, a
possibly optimum combination of encoding methods is recommended for
each field during context replication, together with parameters and
estimated probabilities of success for each encoding method. Secondly,
it is shown that inter-flow field behaviour is significantly
asymmetrical, and ways of handling this behaviour are explored.
Finally, it is noted that current encoding methods can be improved upon
to compress the Window field more efficiently.

For verification of the replicate packet format specifications
prescribed in this document, the EPIC-LITE implementation [EPIC-IMPL]
from the University of Split was modified to support context
replication.

2. Terminology

This document reuses some of the terminology found in [RFC-3095],
[ROHC-TCP], [ROHC-CR], [TCP-BEH], [EPIC-LITE] and [ROHC-FN]. In
addition, this document defines the following terms:

'Incoming' and 'Outgoing' Packets

   'Incoming' packets are packets traveling towards client hosts
   through the channel of interest over which ROHC is employed.

   'Outgoing' packets are packets traveling away from client hosts
   through the channel of interest over which ROHC is employed.

Asymmetrical Header Compression

   Header compression is performed asymmetrically when 'incoming' and
   'outgoing' packets are compressed differently. This requires the
   packet format specifications for compressor-decompressor pairs to be
   configured differently depending on the direction of packet flow
   they deal with.

Replication Match Rate

   The replication match rate for a trace is defined as the percentage
   of uni-directional flows within the trace which can be context
   replicated. A new flow is replicable when there is at least one
   suitable base context present in the compressor upon arrival of the
   first packet of the flow.
   This is used as a measure to estimate the probability of using
   context replication for context initialization.

State Transition Threshold

   The State Transition Threshold for a uni-directional flow is the
   number of initial TCP/IP packets (near the start of a flow)
   converted into IR or IR-CR packets.

3. Header Compression Model

With the objective of extracting the TCP/IP inter-flow field behaviour,
we focus on the deployment of ROHC over the final hop. The ROHC
compressor-decompressor pair is deployed at the two endpoints of the
(possibly wireless) low-bandwidth channel and cooperates to transmit
packets efficiently in the direction towards the decompressor. Since
TCP requires a full-duplex channel, another compressor-decompressor
pair may be present to compress packets in the reverse direction.
Considering the direction of flow of packets with respect to clients
using the low-bandwidth channel, packets can thus be classified as
'incoming' and 'outgoing'. 'Incoming' and 'outgoing' packets use
different compressor-decompressor pairs. This is shown in Fig. 1.

Although ROHC was originally targeted at cellular links, the
convergence of the telecommunication and computer communication
industries means that it may be employed over wireless links in
general. As such, the header compression model in Fig. 1 does not
define the target 'low-bandwidth' channel explicitly. Mobile Terminal
clients are connected to the Internet via a last-hop router node, as
seen in Fig. 1. We focus on the 'header compression entity' situated in
the data link layer of this node.
This can have different manifestations depending on the nature of the
wireless link. For example, in Universal Mobile Telecommunications
System (UMTS), the ROHC entity is part of the Packet Data Convergence
Protocol (PDCP) sub-layer on a Base Station; if ROHC is employed over
Wireless Ethernet (IEEE 802.11), it can be part of the data link layer
on a wireless router; in Mobile Ad Hoc networks, the ROHC entity can
reside on a 'forwarding node'.

[Fig. 1 diagram: client terminals, each with a compressor (C) and a
decompressor (D), connect over the low-bandwidth wired or wireless
channel to the last-hop router, which holds the matching D and C
entities; the router connects over wired links through the Internet to
servers and other clients. 'Outgoing' packets travel from a client C to
a router D; 'incoming' packets travel from a router C to a client D.]

C - Compressor   D - Decompressor

Fig. 1: Header compression model showing 'incoming' and 'outgoing' flows

Due to the nature of the protocol suite under study, we expect
client-server computing to dominate over peer-to-peer, as is the case
currently. As such, 'incoming' and 'outgoing' flows are inherently
asymmetrical. As noted in [ROHC-TCP], some asymmetry is already present
in TCP/IP intra-flow field behaviour. An example is the relationship
between TCP Sequence Number and Acknowledgement Number, for which
'outgoing' flows are likely to exhibit large deltas between consecutive
packets in Acknowledgement Number and small deltas in Sequence Number,
but the converse is likely for 'incoming' flows.
With respect to context replication, [ROHC-TCP] also acknowledges some
inter-flow asymmetry in the TCP source/destination port.

As will be shown in Section 5, asymmetry becomes even more pronounced
between flows. Fig. 1 partly serves to illustrate that asymmetrical
header compression, if desired, can be achieved by configuring
compressor-decompressor pairs differently based on their 'incoming' or
'outgoing' role.

Finally, it should be noted that the focus on ROHC over the final hop
in Fig. 1 does not reduce the applicability of the results obtained on
inter-flow behaviour. In general, header compression may be deployed
over any hop, e.g. over core network links in Multiprotocol Label
Switching (MPLS), or over intermediate hops in Mobile Ad Hoc networks.
Regardless of the location of ROHC deployment, the TCP/IP endpoints
remain the same. The advantage of focusing on the last hop, then, is
that it allows any asymmetrical behaviour to be distilled.
Bi-directional asymmetry over intermediate hops causes inherent
asymmetrical behaviour to be lost. However, over intermediate hops,
inter-flow results continue to be applicable using the symmetric
treatment prescribed in Section 6.

4. Methodology

Given the wide range of inter-flow field behaviour, a methodology for
obtaining the inter-flow field behaviour relevant for context
replication is proposed.

Inter-flow field behaviour can be obtained by emulating a
context-replication enabled compressor. To observe any asymmetrical
behaviour, Tcpdump traces are fed into the 'compressor emulator'
separately, according to the direction in which they flow, i.e.
'incoming' or 'outgoing'. Thus, the emulator simulates the compressors
found on client terminals and routers in the 'outgoing' and 'incoming'
directions respectively.
In the same way as a compressor, the emulator creates, maintains and
updates a list of contexts dynamically for each arriving packet. The
emulator keeps an extensible list of contexts, one for each unique TCP
connection, arranged in a Most Recently Used (MRU) stack. Each TCP/IP
packet updates the context unique to its flow. A context retrieved for
updating or referencing is placed at the top of the stack, followed by
its base context if a base context has just been used as reference at
the same time.

Whenever possible, each new flow is context replicated. Context
replication is possible when a base context exists. The selection
criteria are implementation-dependent; here they require the IP source
to be shared, with preference but no necessity for the same IP
destination.

For simplicity, all contexts are assumed to be acknowledged by default.
Furthermore, if the first packet of a flow can be context replicated,
then it is assumed that the subsequent two packets of the flow would
also be replicated. This means that up to the first 3 packets of each
flow are converted into IR-CR packets. This number is the upper bound
of the State Transition Threshold range, and is based on an estimate of
the upper bound of TCP/IP packets possibly converted to IR-CR packets.
This is elaborated in Appendix A. Even though we show results at the
upper bound of the State Transition Threshold, it was also found that
the inter-flow field behaviour remains invariant at smaller State
Transition Threshold values.

For the purpose of this study, four Tcpdump traces totaling 1.9 million
packets were captured from within the Local Area Network of the
Institute for Infocomm Research. The LAN configuration is shown in
Fig. 2. Macro statistics of each trace are shown in Table 1.
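The emulator behaviour described above can be sketched in a few lines
of Python. This is only a minimal illustration under stated
assumptions, not the emulator used for this study: the names Context,
Emulator and THRESHOLD are hypothetical, the flow key is assumed to be
a (source IP, destination IP, source port, destination port) tuple, the
label "CO" is a placeholder for any later compressed packet, and only
the first packet of a flow is taken to move the base context in the
stack.

```python
from dataclasses import dataclass

THRESHOLD = 3  # upper bound of the State Transition Threshold used in the text


@dataclass(eq=False)
class Context:
    flow: tuple        # assumed key: (src_ip, dst_ip, src_port, dst_port)
    replicated: bool   # True if a base context existed when the flow started
    packets: int = 0


class Emulator:
    def __init__(self):
        self.stack = []  # index 0 is the most recently used context

    def _find_base(self, flow):
        """Same source IP is required; same destination IP is preferred."""
        fallback = None
        for ctx in self.stack:  # scanned in MRU order
            if ctx.flow[0] != flow[0]:
                continue
            if ctx.flow[1] == flow[1]:
                return ctx
            if fallback is None:
                fallback = ctx
        return fallback

    def process(self, flow):
        """Update the context list for one packet and return its packet type."""
        ctx = next((c for c in self.stack if c.flow == flow), None)
        base = None
        if ctx is None:
            base = self._find_base(flow)
            ctx = Context(flow, replicated=base is not None)
        else:
            self.stack.remove(ctx)
        ctx.packets += 1
        if base is not None:
            # The base context ends up directly below the context using it.
            self.stack.remove(base)
            self.stack.insert(0, base)
        self.stack.insert(0, ctx)
        if ctx.packets <= THRESHOLD:
            return "IR-CR" if ctx.replicated else "IR"
        return "CO"
```

Feeding 'incoming' and 'outgoing' packets to two separate Emulator
instances mirrors the per-direction treatment described in this
section.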
[Fig. 2 diagram: client terminals connect to last-hop routers, which
connect through NAT routers to the Border Gateway, which in turn
connects to the Internet.]

Fig. 2: Configuration of Local Area Network

Three out of four traces were captured at the Border Gateway, so that
traffic from a large number of client terminals can be gathered in each
single trace. However, as in most LANs, Network Address Translation
(NAT) is in use. NAT transparently changes the 'outgoing' Source IP
Address and Port, as well as the 'incoming' Destination IP Address and
Port. Thus, packets captured at the Border Gateway reflect the changed
values rather than the original values. To deal with this, the fourth
trace, TCP180903, captured at a client terminal, was used to
investigate these fields as well as to verify results from traces
captured at the Border Gateway.

+---------------+-----------+------------+------------+-----------+
| Trace         | TCP180803 | TCP080903a | TCP080903b | TCP180903 |
|Identification |           |            |            |           |
+---------------+-----------+------------+------------+-----------+
| Duration      |  30 min   |   30 min   |   30 min   | 27.4 hrs  |
+---------------+-----------+------------+------------+-----------+
| Location      |  Gateway  |  Gateway   |  Gateway   |  Client   |
|               |  Router   |  Router    |  Router    | Terminal  |
+---------------+-----------+------------+------------+-----------+
|No. of packets |  516172   |   509281   |   507293   |  383594   |
+---------------+-----------+------------+------------+-----------+
| Replication   |   97.5    |    94.4    |    94.3    |   93.4    |
| Match Rate(%) |           |            |            |           |
+---------------+-----------+------------+------------+-----------+

Table 1: Macro statistics of Tcpdump traces

By using packets captured from our LAN, it is assumed that TCP/IP
inter-flow field behaviour does not vary significantly between the
wired Ethernet-based channel and the target low-bandwidth, possibly
less reliable channel where header compression takes place. Provided
the header compression layer is sufficiently robust to be transparent,
this is reasonable because the upper (network, transport and
application) layer protocol characteristics and human usage behaviour
remain the same.

It is desirable to map the inter-flow behaviour of TCP/IP fields using
a system of classification such that fields within a category share the
same characteristic. [TCP-BEH] already provides a good system of
classification for intra-flow field behaviour: INFERRED, STATIC,
STATIC-DEF, STATIC-KNOWN, CHANGING, where each category follows some
general trend(s) hinting at how fields in that category may be
compressed. For inter-flow behaviour, [TCP-BEH] uses a different system
of classification: 'N/A', 'No', 'Yes'. This is less effective, because
one can only discern whether a field is compressible for context
replication, not how to compress it suitably.

Therefore, in this document, the inter-flow field behaviour is
classified based on the same categories as used for intra-flow
behaviour: INFERRED, STATIC, STATIC-KNOWN, CHANGING. However, it should
be noted that these categories here describe inter-flow field
behaviour. Furthermore, STATIC-DEF is merged into STATIC here, because
a separate category for fields that define a packet stream is
meaningless where inter-flow field behaviour is concerned.
Classification can be done by observing the range of deltas. Here,
delta is defined as the difference between the field value in the
current packet and the stored field value in the base context. The
delta analysis is useful for the following reasons.

For any field not known to be INFERRED or STATIC-KNOWN, if delta = 0 in
all samples, then this field is a STATIC field. If not, the field is
categorized as CHANGING.

For CHANGING fields, further analysis of the range of deltas shows
whether the field can still be encoded using the STATIC encoding method
with significant probability. Where deltas tend to be small, the number
of least significant bits needed (in LSB encoding) to encode the field
with a significant probability of success can be determined. Fields
which tend to have uniformly distributed deltas may only be suitably
encoded as IRREGULAR. Finally, where certain unique trends are observed
for a field, raw and/or network-byte-order converted versions of field
values are also studied.

5. Results

Our initial categorization is shown in Table 2. Differences between the
intra-flow classification (in [TCP-BEH]) and the inter-flow
classification here are marked with '(2)'. At this stage, there is no
asymmetry observed in categorization between 'incoming' and 'outgoing'
flows.
+-----------------------------------+------------+
| Field                             |Category    |
+-----------------------------------+------------+
|IPv4 Version                       |STATIC      |
|IPv4 Header Length                 |STATIC-KNOWN|
|IPv4 Type Of Service               |STATIC(1)   |
|IPv4 ECN Capable Transport         |STATIC(1)   |
|IPv4 Congestion Experienced        |STATIC(1)   |
|IPv4 Packet Length                 |INFERRED    |
|IPv4 Identification                |CHANGING    |
|IPv4 Reserved Flag                 |STATIC(1)   |
|IPv4 Don't Fragment Flag           |CHANGING    |
|IPv4 More Fragments Flag           |STATIC-KNOWN|
|IPv4 Fragment Offset               |STATIC-KNOWN|
|IPv4 Time To Live                  |CHANGING    |
|IPv4 Protocol                      |STATIC      |
|IPv4 Header Checksum               |INFERRED    |
|IPv4 Source Address                |STATIC      |
|IPv4 Destination Address           |CHANGING(2) |
|TCP Source Port                    |CHANGING(2) |
|TCP Destination Port               |CHANGING(2) |
|TCP Sequence Number                |CHANGING    |
|TCP Acknowledgement Number         |CHANGING    |
|TCP Data Offset                    |INFERRED    |
|TCP Reserved                       |STATIC(1)   |
|TCP Congestion Window Reduced      |STATIC(1)   |
|TCP Echo Congestion Experienced    |STATIC(1)   |
|TCP URG flag                       |CHANGING    |
|TCP ACK flag                       |CHANGING    |
|TCP PSH flag                       |CHANGING    |
|TCP RST flag                       |CHANGING    |
|TCP SYN flag                       |CHANGING    |
|TCP FIN flag                       |CHANGING    |
|TCP Window                         |CHANGING    |
|TCP Checksum                       |CHANGING    |
|TCP Urgent Pointer                 |CHANGING    |
|TCP Options                        |CHANGING    |
+-----------------------------------+------------+

(1) These fields were found to be STATIC from samples, but context
    replication should follow the classification in [TCP-BEH] for
    future-proofing.
(2) Differs from intra-flow classification [TCP-BEH] due to context
    replication.

Table 2: TCP/IP Fields and Classifications

Some changes in categorization are made in this study because of the
current slow adoption of IP and TCP congestion notification fields.
However, these fields are expected to be used in the future and should
be CHANGING instead of STATIC.
The encoding methods to be used for STATIC, STATIC-KNOWN and INFERRED
fields are straightforward, but CHANGING fields need further analysis,
which is given in the following sub-sections.

CHANGING fields can sometimes be encoded with STATIC, LSB, or other
encoding methods with significant probability. For LSB encoding, it is
necessary to determine a suitable number of least significant bits with
which to encode the field. Therefore, our frequency bins are defined in
increasing order of ceil(log2(|delta|+1)) (the reason for this
expression is explained in Section 5.1), which is effectively the
minimum number of bits needed to encode delta values within that bin.
Negative delta values are mapped to -ceil(log2(|delta|+1)), and are
useful for defining the offset value used in LSB encoding. From our
frequency tables, we can also derive a suitable combination of encoding
methods to use, as well as the estimated probability of each encoding
method being used.

The inter-flow behaviour of CHANGING fields can be summarized directly
in the form of packet format specifications for IR-CR packets. This is
shown in Fig. 3, in EPIC-LITE terminology [EPIC-LITE], which is derived
from the BNF input language [RFC-2234].

To illustrate asymmetrical inter-flow behaviour, packet format
specifications with any differences between 'incoming' and 'outgoing'
flows are defined separately for each field with the postfix '_in' or
'_out'. Note, however, that if the same set of encoding methods is used
in both directions for the same field and only the probabilities
differ, then significant asymmetrical behaviour may not have been
observed.
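The step from frequency bins to estimated encoding probabilities can be
sketched as follows. This is a minimal illustration, not the procedure
actually used by the authors: it assumes that the share of samples in
bins 1..k approximates the probability that a k-bit LSB encoding
succeeds, and the names OUTGOING_ID_BINS, share and encoding_split are
hypothetical. The input percentages are the 'outgoing' IPv4
Identification frequencies for bins 1 to 8 reported in Table 3
(Section 5.1).

```python
# Percentage of samples per bin n, copied from the 'outgoing' column of
# Table 3 (IPv4 Identification delta) for bins 1..8.
OUTGOING_ID_BINS = {1: 33.7, 2: 20.8, 3: 10.5, 4: 4.3,
                    5: 3.3, 6: 2.4, 7: 2.0, 8: 1.6}


def share(bins, lo, hi):
    """Sum the percentage of samples whose bin index n lies in [lo, hi]."""
    return sum(pct for n, pct in bins.items() if lo <= n <= hi)


def encoding_split(bins, small_k, large_k, lowest_bin=1):
    """Split samples between a small LSB, a larger LSB and IRREGULAR.

    Assumption: a delta in bin n is treated as encodable by the smallest
    LSB width k with n <= k; everything else falls back to IRREGULAR.
    """
    p_small = share(bins, lowest_bin, small_k)
    p_large = share(bins, small_k + 1, large_k)
    p_irregular = 100.0 - p_small - p_large
    return p_small, p_large, p_irregular


p3, p8, p_irr = encoding_split(OUTGOING_ID_BINS, small_k=3, large_k=8)
print(f"LSB(3): {p3:.1f}%  LSB(8): {p8:.1f}%  IRREGULAR: {p_irr:.1f}%")
```

Under these assumptions the three shares come to 65.0%, 13.6% and
21.4%, which round to the 65%, 14% and 21% given for Identification_out
in Fig. 3.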
Identification_in          ::= NBO(16)        ; network byte order
                               LSB(3,-1,50%) | LSB(8,-1,17%) |
                               IRREGULAR(33%)
Identification_out         ::= NBO(16)
                               LSB(3,-1,65%) | LSB(8,-1,14%) |
                               IRREGULAR(21%)
Don't_Fragment_in          ::= STATIC(73%) | IRREGULAR(1,27%)
Don't_Fragment_out         ::= STATIC(99%) | IRREGULAR(1,1%)
Time_To_Live_in            ::= STATIC(98%) | IRREGULAR(8,2%)
Time_To_Live_out           ::= STATIC(97%) | IRREGULAR(8,3%)
Destination_Address_in     ::= STATIC(100%)
Destination_Address_out    ::= STATIC(86%) | IRREGULAR(32,14%)
Source_Port_in             ::= STATIC(70%) | IRREGULAR(16,30%)
Source_Port_out            ::= LSB(3,0,73%) | LSB(8,0,14%) |
                               IRREGULAR(16,13%)
Destination_Port_in        ::= LSB(3,0,73%) | LSB(8,0,14%) |
                               IRREGULAR(16,13%)
Destination_Port_out       ::= STATIC(70%) | IRREGULAR(16,30%)
Sequence_Number            ::= IRREGULAR(32,100%)
Acknowledgement_Number_in  ::= IRREGULAR(32,100%)
Acknowledgement_Number_out ::= VALUE(32,0,33%) | IRREGULAR(32,67%)
URG_flag                   ::= IRREGULAR(1,100%)
ACK_flag                   ::= IRREGULAR(1,100%)
PSH_flag                   ::= IRREGULAR(1,100%)
RST_SYN_FIN_flag           ::= VALUE(3,2,30%) | VALUE(3,0,65%) |
                               IRREGULAR(3,5%)
Urgent_Pointer             ::= STATIC(99%) | IRREGULAR(16,1%)
Window_in                  ::= STATIC(30%) | IRREGULAR(16,70%)
Window_out                 ::= STATIC(43%) | IRREGULAR(16,57%)

Fig. 3: Packet format specifications for CHANGING fields.

In Fig. 3, specifications are expressed in the notation used by
EPIC-LITE instead of the Formal Notation [ROHC-FN] for a number of
reasons. Firstly, the basic encoding methods used in both remain the
same, so EPIC-LITE expressions can be easily converted into Formal
Notation.
Moreover, the equivalent of the 'multiple_packet_formats' encoding
method in ROHC-FN, used to specify multiple encoding methods for a
field, can be represented in a more compact form using the OR operator,
'|', in EPIC-LITE. Also, because EPIC-LITE involves Huffman coding, it
allows the probability of each encoding method being successful to be
expressed as a parameter, which is also useful for expressing the
frequency of use of an encoding method. Finally, it allows the packet
format specifications to be readily verified via the context
replication implementation in EPIC-LITE.

Details of the inter-flow behaviour of each CHANGING field are
elaborated in the following sub-sections.

5.1. IPv4 Identification

Table 3 shows the distribution of delta values on a logarithmic scale.
Note that for delta > 0, the number of bits used to encode the delta
may be expressed as n = ceil(log2(|delta|+1)), as we are trying to find
the smallest n for which delta <= 2^n - 1. For delta < 0, the
equivalent mapping is n = -ceil(log2(|delta|+1)).
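The mapping from a delta to its signed bin index n can be sketched as
follows. The function name bin_index is hypothetical. For a positive
integer x, Python's x.bit_length() equals ceil(log2(x+1)), so the
sketch uses integer arithmetic in place of the floating-point formula.

```python
def bin_index(delta: int) -> int:
    """Return the signed bin index n for a field delta.

    n is the smallest bit count with |delta| <= 2^n - 1, negated for
    negative deltas; delta = 0 maps to bin 0.
    """
    # For x >= 1, x.bit_length() == ceil(log2(x + 1)).
    n = abs(delta).bit_length()
    return n if delta >= 0 else -n


# The bin edges agree with the delta ranges listed in Table 3.
for delta in (0, 1, 3, 4, 7, 255, 256, -1, -128, -255, 65535):
    print(delta, bin_index(delta))
```

For example, deltas 128 to 255 all fall in bin 8, and deltas -255 to
-128 fall in bin -8, in agreement with the corresponding rows of
Table 3.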
+--------+---------------+-----------+-----------+
|Encoded | Delta Range   | Incoming  | Outgoing  |
|Bits,n  |               | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16     |[-65535:-32768]|    6.0%   |    2.3%   |
|-15     |[-32767:-16384]|    4.5%   |    2.1%   |
|-14     |[-16383:-8192] |    2.4%   |    2.1%   |
|-13     |[-8191:-4096]  |    1.5%   |    0.8%   |
|-12     |[-4095:-2048]  |    0.7%   |    0.6%   |
|-11     |[-2047:-1024]  |    0.3%   |    0.3%   |
|-10     |[-1023:-512]   |    0.2%   |    0.1%   |
|-9      |[-511:-256]    |    0.1%   |    0.1%   |
|-8      |[-255:-128]    |    0.1%   |    0.1%   |
|-7      |[-127:-64]     |    0.0%   |    0.0%   |
|-6      |[-63:-32]      |    0.0%   |    0.0%   |
|-5      |[-31:-16]      |    0.0%   |    0.0%   |
|-4      |[-15:-8]       |    0.0%   |    0.0%   |
|-3      |[-7:-4]        |    0.1%   |    0.0%   |
|-2      |[-3:-2]        |    0.2%   |    0.2%   |
|-1      |[-1]           |    0.6%   |    0.4%   |
|0       |[0]            |    0.3%   |    0.0%   |
|1       |[1]            |   23.4%   |   33.7%   |
|2       |[2:3]          |   20.6%   |   20.8%   |
|3       |[4:7]          |    6.6%   |   10.5%   |
|4       |[8:15]         |    3.9%   |    4.3%   |
|5       |[16:31]        |    3.6%   |    3.3%   |
|6       |[32:63]        |    3.6%   |    2.4%   |
|7       |[64:127]       |    3.4%   |    2.0%   |
|8       |[128:255]      |    2.3%   |    1.6%   |
|9       |[256:511]      |    2.3%   |    1.2%   |
|10      |[512:1023]     |    1.7%   |    1.2%   |
|11      |[1024:2047]    |    1.4%   |    1.1%   |
|12      |[2048:4095]    |    0.9%   |    1.0%   |
|13      |[4096:8191]    |    1.3%   |    1.1%   |
|14      |[8192:16383]   |    2.5%   |    2.3%   |
|15      |[16384:32767]  |    3.0%   |    2.4%   |
|16      |[32768:65535]  |    2.4%   |    1.9%   |
+--------+---------------+-----------+-----------+

Table 3: Frequency distribution of Identification delta

Slightly asymmetrical behaviour can be observed from Table 3.
'Incoming' replicated packets are less likely to be encoded within 3
bits than 'outgoing' replicated packets. Moreover, 'incoming' delta
values are more widely distributed, with a higher occurrence of
negative deltas as well as of deltas encodable in 6 to 10 bits. This is
reasonable because 'incoming' replicated packets face larger deltas due
to busy servers handling multiple connections simultaneously or
near-simultaneously.
Inter-flow Identification deltas for 'outgoing' replicated packets tend
to be smaller than for 'incoming', as clients do not usually maintain a
large number of simultaneous or near-simultaneous TCP connections.

It should be noted that Table 3 depicts network-byte-order corrected
Identification deltas. Typical implementation policies for incrementing
the IPv4 Identification are: sequential (increments by 1),
sequential-jump (typically increments by 256) and random. Linux-based
implementations usually implement the sequential policy, and older
versions of Microsoft Windows usually implement the sequential-jump
policy with a jump size of 256. This is equivalent to incrementing the
more significant byte of the two-byte Identification field by 1.

From a compression viewpoint, sequential-jump implementations can be
network-byte-order corrected at the compressor end and reverted to the
original form at the decompressor end. This approach has the advantage
of compressing Identification fields generated by both policies
efficiently using the same encoding method. A network byte order (NBO)
flag is communicated to differentiate between the two policies.
Randomly incremented Identification implementations cannot be
efficiently compressed and are sent as-is.

Current proposals for context replication compress the IPv4
Identification field into 0 or 16 bits, using the VALUE and IRREGULAR
encoding methods respectively. The VALUE encoding method is suitable
for protocols like DHCP, and is not seen in Fig. 3 because we are
focusing on TCP/IP. However, it can be seen from the above inter-flow
behaviour that this field can also be compressed more efficiently using
LSB encoding, with recommended parameters as shown in Fig. 3.

5.2. IP Don't Fragment Flag and Time to Live

The DF Flag is a single bit which may be set or unset.
Although it may be impractical to allow multiple encoding methods for a
single-bit field, for the sake of characterizing its behaviour, the
STATIC and IRREGULAR encoding methods are used.

The IPv4 TTL (or equivalently, IPv6 Hop Limit) is an 8-bit field which
remains constant when the route between the two endpoints is unchanged;
when the route does change due to congestion, it is better to simply
send the field uncompressed.

Therefore, DF can be analyzed in the same category as TTL: each is
either encoded as STATIC, or sent uncompressed as IRREGULAR. The actual
probabilities associated with each encoding method based on the samples
are shown in Table 4.

+----------------+--------+-----------+
|Encoding Method | STATIC | IRREGULAR |
+----------------+--------+-----------+
|          'Incoming' flows           |
+----------------+--------+-----------+
|Don't Fragment  | 72.8%  |   27.2%   |
|Time To Live    | 98.1%  |    1.9%   |
+----------------+--------+-----------+
|          'Outgoing' flows           |
+----------------+--------+-----------+
|Don't Fragment  | 98.5%  |    1.5%   |
|Time To Live    | 96.9%  |    3.1%   |
+----------------+--------+-----------+

Table 4: Percentage frequency of STATIC and IRREGULAR for DF and TTL

5.3. IP Destination Address

We have allowed for an implementation to use context replication for
scenarios where packets share at least the same Source IP Address, but
the Destination IP Address may be different. Therefore, the Destination
IP Address may be STATIC or IRREGULAR for these two scenarios.

The proportion of IR-CR packets replicable due to the same or a
different Destination IP Address is of interest. This determines how
effective the use of context replication to cover different IP
Destination Addresses can be. This proportion is tabulated in Table 5.

   +------------+--------+-----------+
   |            | STATIC | IRREGULAR |
   +------------+--------+-----------+
   | 'Incoming' | 100.0% |      0.0% |
   | 'Outgoing' |  85.8% |     14.2% |
   +------------+--------+-----------+

   Table 5: Percentage frequency of STATIC and IRREGULAR for IP
   Destination Address

   As can be noted from Table 5, the results are skewed towards STATIC
   (same Destination IP Address). This is because our emulator selects
   the base context with preference for sharing the same Source and
   Destination IP Address, although it is much easier to find contexts
   sharing only the same Source IP Address. For some intervals, the
   proportion of 'outgoing' IRREGULAR cases was as high as 48%.

   Asymmetry is again observed to be inherent between 'incoming' and
   'outgoing' flows. 'Incoming' flows originating from Internet servers
   are not likely to engage multiple clients on a common subnet within
   a short period of time. However, the converse is true for
   'outgoing' flows, corresponding to prevalent usage patterns.

   Our results also justify an implementation which considers context
   replication even when the Destination IP Address is different. This
   maximizes context replication efficiency gains for 'outgoing' flows.

5.4. TCP Source Port

   As can be seen from Table 6, clearly asymmetrical inter-flow
   behaviour is observed for the TCP Source Port field. This behaviour
   is seen mainly because ports at servers are well-known ports which
   remain unchanged.

   +---------+-----------------+-----------+-----------+
   | Encoded | Delta Range     | Incoming  | Outgoing  |
   | Bits, n |                 | Frequency | Frequency |
   +---------+-----------------+-----------+-----------+
   | -16     | [-65535:-32768] |      0.0% |      0.0% |
   | -15     | [-32767:-16384] |      0.0% |      0.0% |
   | -14     | [-16383:-8192]  |      0.0% |      0.0% |
   | -13     | [-8191:-4096]   |      0.0% |      0.3% |
   | -12     | [-4095:-2048]   |      5.8% |      0.2% |
   | -11     | [-2047:-1024]   |      1.8% |      0.6% |
   | -10     | [-1023:-512]    |      0.1% |      1.7% |
   | -9      | [-511:-256]     |      1.0% |      0.0% |
   | -8      | [-255:-128]     |      0.5% |      0.0% |
   | -7      | [-127:-64]      |      0.7% |      0.0% |
   | -6      | [-63:-32]       |      0.7% |      0.0% |
   | -5      | [-31:-16]       |      0.0% |      0.0% |
   | -4      | [-15:-8]        |      0.0% |      0.0% |
   | -3      | [-7:-4]         |      0.3% |      0.1% |
   | -2      | [-3:-2]         |      0.0% |      0.1% |
   | -1      | [-1]            |      0.0% |      2.3% |
   | 0       | [0]             |     72.0% |     15.8% |
   | 1       | [1]             |      0.0% |     31.9% |
   | 2       | [2:3]           |      0.0% |     17.4% |
   | 3       | [4:7]           |      0.0% |      7.8% |
   | 4       | [8:15]          |      0.1% |      4.7% |
   | 5       | [16:31]         |      0.1% |      3.3% |
   | 6       | [32:63]         |      0.3% |      2.0% |
   | 7       | [64:127]        |      0.3% |      3.0% |
   | 8       | [128:255]       |      0.7% |      1.1% |
   | 9       | [256:511]       |      0.8% |      2.7% |
   | 10      | [512:1023]      |      3.0% |      3.2% |
   | 11      | [1024:2047]     |     10.5% |      1.5% |
   | 12      | [2048:4095]     |      1.2% |      0.1% |
   | 13      | [4096:8191]     |      0.0% |      0.3% |
   | 14      | [8192:16383]    |      0.0% |      0.0% |
   | 15      | [16384:32767]   |      0.0% |      0.0% |
   | 16      | [32768:65535]   |      0.1% |      0.0% |
   +---------+-----------------+-----------+-----------+

   Table 6: Frequency distribution of Source Port delta

5.5. TCP Destination Port

   The inter-flow behaviour of the TCP Destination Port field is shown
   in Table 7. It can be observed that the trend is the opposite of
   that of the TCP Source Port presented previously. This is because
   the Destination Ports of 'outgoing' packets are the Source Ports of
   replying 'incoming' packets.
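
   The "Encoded Bits, n" column of the delta tables in this section
   can be read as a signed bit length: n is 0 for a zero delta, and
   otherwise |n| is the number of bits needed to represent |delta|,
   with the sign of n following the sign of the delta. The following
   Python sketch illustrates this bucketing. The names delta_bucket
   and bucket_histogram are hypothetical, and treating the delta as a
   plain signed difference of two 16-bit values is an assumption made
   for illustration; it is not taken from the emulator used in this
   study.

```python
def delta_bucket(delta: int) -> int:
    """Map an inter-flow delta of a 16-bit field to its signed bucket n.

    n = 0 for delta 0; otherwise |n| is the bit length of |delta| and
    n carries the sign of delta (assumed reading of the tables).
    """
    if not -65535 <= delta <= 65535:
        raise ValueError("delta of a 16-bit field must be in [-65535, 65535]")
    n = abs(delta).bit_length()
    return n if delta >= 0 else -n


def bucket_histogram(deltas):
    """Return the percentage of deltas falling into each bucket -16..16."""
    counts = {n: 0 for n in range(-16, 17)}
    for d in deltas:
        counts[delta_bucket(d)] += 1
    total = sum(counts.values())
    return {n: (100.0 * c / total if total else 0.0)
            for n, c in counts.items()}


# Example: a delta of 1024 lies in [1024:2047], i.e. bucket 11,
# and a delta of -4095 lies in [-4095:-2048], i.e. bucket -12.
example = [delta_bucket(1024), delta_bucket(-4095)]
```

   Under this reading, each row's frequency is simply the share of
   observed deltas whose bucket equals that row's n.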

   +---------+-----------------+-----------+-----------+
   | Encoded | Delta Range     | Incoming  | Outgoing  |
   | Bits, n |                 | Frequency | Frequency |
   +---------+-----------------+-----------+-----------+
   | -16     | [-65535:-32768] |      0.0% |      0.0% |
   | -15     | [-32767:-16384] |      0.0% |      0.0% |
   | -14     | [-16383:-8192]  |      0.0% |      0.0% |
   | -13     | [-8191:-4096]   |      0.3% |      0.0% |
   | -12     | [-4095:-2048]   |      0.0% |      0.4% |
   | -11     | [-2047:-1024]   |      0.0% |      4.1% |
   | -10     | [-1023:-512]    |      0.0% |      2.0% |
   | -9      | [-511:-256]     |      0.0% |      0.1% |
   | -8      | [-255:-128]     |      0.0% |      0.9% |
   | -7      | [-127:-64]      |      0.0% |      0.4% |
   | -6      | [-63:-32]       |      0.0% |      0.5% |
   | -5      | [-31:-16]       |      0.0% |      1.9% |
   | -4      | [-15:-8]        |      0.0% |      0.0% |
   | -3      | [-7:-4]         |      0.3% |      0.0% |
   | -2      | [-3:-2]         |      0.2% |      0.2% |
   | -1      | [-1]            |      6.8% |      0.0% |
   | 0       | [0]             |     23.3% |     74.3% |
   | 1       | [1]             |     33.4% |      0.0% |
   | 2       | [2:3]           |      8.4% |      0.1% |
   | 3       | [4:7]           |      6.9% |      0.0% |
   | 4       | [8:15]          |      3.8% |      0.1% |
   | 5       | [16:31]         |      2.8% |      0.1% |
   | 6       | [32:63]         |      2.3% |      0.8% |
   | 7       | [64:127]        |      3.4% |      0.2% |
   | 8       | [128:255]       |      1.2% |      0.4% |
   | 9       | [256:511]       |      2.7% |      0.8% |
   | 10      | [512:1023]      |      2.4% |      2.1% |
   | 11      | [1024:2047]     |      1.4% |      8.2% |
   | 12      | [2048:4095]     |      0.0% |      1.8% |
   | 13      | [4096:8191]     |      0.4% |      0.4% |
   | 14      | [8192:16383]    |      0.0% |      0.1% |
   | 15      | [16384:32767]   |      0.0% |      0.0% |
   | 16      | [32768:65535]   |      0.0% |      0.0% |
   +---------+-----------------+-----------+-----------+

   Table 7: Frequency distribution of Destination Port delta

5.6. TCP Sequence Number and Acknowledgement Number

   The TCP Sequence Number (SEQNUM) cannot be replicated as the
   inter-flow delta is random with a uniform probability density
   function, regardless of the direction of flow.

   The TCP Acknowledgement Number (ACKNUM) generally follows the
   randomness of SEQNUM, but a particular behaviour can be exploited
   for compression of the first packet of most 'outgoing' flows. All
   handshaking packets with SYN set but ACK clear (the first packet of
   TCP connections) carry ACKNUM with zero value.
   This is a behaviour unique to 'outgoing' flows because
   service-requesting clients typically send the first packet of a
   TCP connection. The first 'incoming' packet typically carries both
   SYN and ACK set, and its ACKNUM would be non-zero. Because up to
   the first three packets of each flow may be replicated, such
   packets represent between roughly 30% and 100% of all 'outgoing'
   replicated packets. Thus, ACKNUM can at worst be compressed as shown
   in Fig. 3. Alternatively, instead of basing the specifications on
   asymmetry, all compressor-decompressor pairs can treat the SYN-set
   ACK-not-set case as a flag to infer that the value of ACKNUM is 0.
   These fields are already appropriately handled as prescribed in
   [ROHC-TCP].

5.7. TCP Flags and Urgent Pointer

   "TCP Flags" refers to the TCP group of six flags: URG (Urgent), ACK
   (Acknowledgement), PSH (Push), RST (Reset), SYN (Synchronize) and
   FIN (Finish).

   The URG flag was almost never set in our sample, i.e. it is much
   more likely to be 0 than 1. In some applications, however, the URG
   flag may be used extensively. Thus, it can be encoded as
   IRREGULAR(1,100%). The URG flag is also useful for indicating the
   presence of the Urgent Pointer field. The compressor-decompressor
   pair can treat the Urgent Pointer as IRREGULAR when URG is set and
   as zero when URG is not set.

   ACK is clear only in the first handshaking packet of each
   connection (similar to ACKNUM), as well as in a minority of packets
   with RST set. Since the proportion of IR-CR packets carrying an
   unset ACK can range from 33% to 100%, it should be sent as
   IRREGULAR(1,100%).

   PSH was found to vary unpredictably between 0 and 1, and is thus
   best left as IRREGULAR(1,100%).

   There is high correlation between RST, SYN and FIN behaviour,
   allowing them to be encoded together. RST and FIN are not set in
   almost 100% of replicated packets.
   These three flags can therefore be encoded as:

      VALUE(3,2,30%) | VALUE(3,0,65%) | IRREGULAR(3,5%)

   Equivalently, these three flags can also be encoded as prescribed
   in [ROHC-TCP] using the "index" encoding method, with FIN or RST
   exclusively set as the two other common values.

5.8. TCP Window

   Table 8 shows the delta distribution. For flows in both directions,
   the main peak is at delta = 0, with amplitude 43% for 'outgoing'
   replicated packets and 30% for 'incoming' packets. We can encode
   these cases with STATIC encoding.

   +---------+-----------------+-----------+-----------+
   | Encoded | Delta Range     | Incoming  | Outgoing  |
   | Bits, n |                 | Frequency | Frequency |
   +---------+-----------------+-----------+-----------+
   | -16     | [-65535:-32768] |      0.0% |      0.0% |
   | -15     | [-32767:-16384] |      3.4% |      2.8% |
   | -14     | [-16383:-8192]  |      0.2% |      0.4% |
   | -13     | [-8191:-4096]   |     14.0% |      2.1% |
   | -12     | [-4095:-2048]   |     20.7% |      0.9% |
   | -11     | [-2047:-1024]   |      1.3% |      0.1% |
   | -10     | [-1023:-512]    |      6.6% |      1.7% |
   | -9      | [-511:-256]     |      4.4% |      2.3% |
   | -8      | [-255:-128]     |      4.1% |      0.8% |
   | -7      | [-127:-64]      |      0.6% |      2.6% |
   | -6      | [-63:-32]       |      0.4% |      1.2% |
   | -5      | [-31:-16]       |      0.2% |      0.7% |
   | -4      | [-15:-8]        |      0.1% |      0.5% |
   | -3      | [-7:-4]         |      0.1% |      0.1% |
   | -2      | [-3:-2]         |      0.2% |      0.0% |
   | -1      | [-1]            |      0.2% |      0.0% |
   | 0       | [0]             |     30.4% |     43.2% |
   | 1       | [1]             |      0.1% |      0.0% |
   | 2       | [2:3]           |      0.1% |      0.1% |
   | 3       | [4:7]           |      0.1% |      0.1% |
   | 4       | [8:15]          |      0.1% |      0.2% |
   | 5       | [16:31]         |      0.2% |      0.2% |
   | 6       | [32:63]         |      0.1% |      0.8% |
   | 7       | [64:127]        |      0.4% |      1.7% |
   | 8       | [128:255]       |      0.2% |      3.4% |
   | 9       | [256:511]       |      1.1% |      4.0% |
   | 10      | [512:1023]      |      1.1% |      6.8% |
   | 11      | [1024:2047]     |      2.0% |      3.0% |
   | 12      | [2048:4095]     |      0.5% |      0.1% |
   | 13      | [4096:8191]     |      2.3% |      0.3% |
   | 14      | [8192:16383]    |      2.5% |      3.2% |
   | 15      | [16384:32767]   |      0.1% |      3.5% |
   | 16      | [32768:65535]   |      2.2% |     13.1% |
   +---------+-----------------+-----------+-----------+

   Table 8: Frequency distribution of Window delta

   Unlike other fields,
   Window delta values tend not to cluster near the main peak. This is
   an expected behaviour. Naturally, LSB would not be a suitable
   encoding method for the Window field. A number of secondary peaks
   can be observed in Table 8, which suggests that Windows tend to
   vary among a few discontinuous but commonly used values. We
   determine the most common Window values for 'incoming' and
   'outgoing' flows separately and obtain a distribution of these
   common Window values. This is shown in Table 9.

   It can be observed again that asymmetry is inherent between
   'incoming' and 'outgoing' flows. In this case, asymmetry is due to
   the use of different ranges of popular Window values by 'incoming'
   and 'outgoing' flows. 'Incoming' advertised Window fields typically
   come from HTTP servers that send more data than they receive.
   Servers typically advertise their receiver window conservatively
   and are slow to grow their windows, to prevent data overloads from
   handling multiple clients concurrently, and because of the
   congestion window slow start algorithm [RFC-2581]. On the other
   hand, sources of 'outgoing' traffic are normally clients
   downloading data from servers. To utilize bandwidth efficiently,
   the advertised window is usually large, often right from the first
   packet. This is consistent with recent proposals for increasing the
   TCP initial Window size [RFC-3390].

   +------------------------+------------------------+
   |        Incoming        |        Outgoing        |
   +--------+---------------+--------+---------------+
   | Value  | Probability   | Value  | Probability   |
   |        | (%)           |        | (%)           |
   +--------+---------------+--------+---------------+
   |   1380 |           1.1 |   1460 |           1.6 |
   |   1460 |          23.5 |   2920 |           1.6 |
   |   2760 |           1.3 |   8192 |           3.1 |
   |   2920 |          22.2 |   8280 |           6.6 |
   |   5840 |           2.2 |  16384 |          10.3 |
   |   8280 |          11.7 |  16560 |           8.0 |
   |  11680 |           4.9 |  64240 |          26.3 |
   |  16384 |           6.9 |  64860 |           8.8 |
   |  16560 |           2.1 |  65520 |           2.6 |
   |  65535 |           4.6 |  65535 |          18.3 |
   +--------+---------------+--------+---------------+
   | Total  |          80.4 | -      |          87.2 |
   +--------+---------------+--------+---------------+

   Table 9: Common Window field values

   The common values of the Window field, including all values found
   in Table 9, can typically be expressed as either (i) a multiple of
   the Maximum Segment Size of the end-to-end channel, or (ii) a power
   of 2, possibly with an offset of 1.

   The Maximum Segment Size (MSS) is negotiated between both TCP
   endpoints, through the TCP Options in TCP handshaking packets. The
   negotiated MSS is in turn derived from the IP Maximum Transmission
   Unit (MTU) of the underlying network [RFC-1122]. The MTU over
   Ethernet is 1500 bytes, or 1492 if used with the Subnetwork Access
   Protocol (SNAP), or 1300 if used with PPP over Ethernet (for ADSL
   links). Subtracting 40 bytes for the TCP/IPv4 protocol stack, or 60
   bytes for the TCP/IPv6 protocol stack, or 120 bytes for the maximum
   TCP/IP header size, typically advertised MSS values are 1460, 1380,
   1260, 1440 or 1452 bytes, in decreasing popularity.

   From the above set of MSS values, 1460 and 1380 are used almost
   exclusively. Consequently, almost all the Window values found in
   Table 9 can be expressed as multiples of either 1460 or 1380.
   Exceptions are 8192, 16384 and 65535, which are powers of 2,
   possibly with an offset of 1, and 65520, which is a multiple of
   1260.
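
   The classification described above can be checked mechanically.
   The following Python sketch tests whether a Window value is a
   multiple of one of the MSS values listed above, or a power of 2
   possibly offset by 1. The names MSS_VALUES, TABLE_9_VALUES and
   classify_window are hypothetical; trying the MSS values in the
   listed popularity order, and checking only the "2^k - 1" offset (the
   only offset form appearing in Table 9), are assumptions made for
   illustration.

```python
# MSS values named in the text, in decreasing popularity.
MSS_VALUES = (1460, 1380, 1260, 1440, 1452)

# Distinct Window values appearing in Table 9 (both directions).
TABLE_9_VALUES = (1380, 1460, 2760, 2920, 5840, 8192, 8280, 11680,
                  16384, 16560, 64240, 64860, 65520, 65535)


def is_power_of_two(value: int) -> bool:
    """True if value is 2^k for some k >= 0."""
    return value > 0 and value & (value - 1) == 0


def classify_window(window: int) -> str:
    """Describe a 16-bit Window value using the two patterns in the text."""
    if window == 0:
        return "zero"
    for mss in MSS_VALUES:
        if window % mss == 0:
            return f"{window // mss} x MSS {mss}"
    if is_power_of_two(window):
        return f"2^{window.bit_length() - 1}"
    if is_power_of_two(window + 1):
        return f"2^{(window + 1).bit_length() - 1} - 1"
    return "other"


# Every value listed in Table 9 falls into one of the two patterns.
classified = {w: classify_window(w) for w in TABLE_9_VALUES}
```

   With these assumptions, 64240 is reported as 44 x MSS 1460, 65520
   as 52 x MSS 1260, and 65535 as 2^16 - 1, in agreement with the
   observations above.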

   Thus, commonly used Window values not expressible as multiples of
   the MSS values are powers of 2, possibly with an offset of 1. From
   Table 9, 8192, 16384 and 65535 are 2^13, 2^14 and 2^16 - 1
   respectively.

   Also, the TCP Window is always 0 when RST (Reset flag) is set.
   Therefore, the decompressor can infer the Window value whenever RST
   is set and there is no need to send it.

   The TCP Window field is used in both congestion and flow control.
   The use of congestion control can account partly for the commonly
   used values discussed above, as congestion control changes are in
   multiples of the MSS. However, values due to flow control do not
   follow the pattern discussed above but are typically small offsets
   from the above commonly used values.

   Currently, the Window field is encoded as either STATIC or
   IRREGULAR for context replication [ROHC-TCP]. The above
   observations illustrate that the encoding methods currently in use
   do not sufficiently exploit the unique behaviour of the Window
   field. They also provide the motivation for devising a more
   efficient way of encoding the Window field. This encoding method is
   elaborated upon in [TCP-WIN].

5.9. TCP Checksum

   The TCP Checksum field covers the pseudo-header, payload and TCP
   header, and varies between packets. Although ROHC packets may
   contain a CRC field, the CRC does not cover the payload. Since it
   is important to preserve data integrity, the Checksum field is sent
   uncompressed as IRREGULAR(16,100%).

5.10. TCP Options

   TCP options contain a wide variety of optional fields, but commonly
   used options include the MSS, Window Scale and SACK-Permitted
   options found in handshaking packets. These fields do not change
   between replicated packets and can thus be compressed efficiently
   as STATIC for context replication.

5.11. Mean Sizes of Compressed Fields

   Table 10 shows the TCP/IP fields found in 'incoming' IR-CR packets
   and calculates the mean sizes of their encoded forms. Compressed
   TCP/IP fields take up a mean size of 107.3 bits for 'incoming'
   flows. By repeating the calculation based on the 'outgoing' packet
   format specifications, it can be shown that the mean 'outgoing'
   IR-CR size is 97.5 bits.

   +----------------------+--------+----------------------------+---------+
   | Field                | Size   | Encoded size (bits) and    | Mean    |
   |                      | (bits) | probability                | Encoded |
   |                      |        |                            | Size    |
   |                      |        |                            | (bits)  |
   +----------------------+--------+----------------------------+---------+
   | IPv4 Identification  |     16 | 3(50%) | 8(17%) | 16(33%)  |    8.14 |
   | IPv4 Don't Fragment  |      1 | 0(73%) | 1(27%)           |    0.27 |
   | IPv4 Time To Live    |      8 | 0(98%) | 8(2%)            |    0.16 |
   | IPv4 Dest. Address   |     32 | 0(98%) | 32(2%)           |    0.64 |
   | TCP Source Port      |     16 | 0(70%) | 16(30%)          |    4.80 |
   | TCP Dest. Port       |     16 | 3(73%) | 8(14%) | 16(13%)  |    5.39 |
   | TCP Sequence Number  |     32 | 32(100%)                   |   32    |
   | TCP Ack. Number      |     32 | 32(100%)                   |   32    |
   | TCP Flags            |      8 | 2(95%) | 5(5%)            |    2.15 |
   | TCP Window           |     16 | 0(30%) | 6(47%) | 4(8%) |  |    5.54 |
   |                      |        | 16(15%)                    |         |
   | TCP Checksum         |     16 | 16(100%)                   |   16    |
   | TCP Urgent Pointer   |     16 | 0(99%) | 16(1%)           |    0.16 |
   +----------------------+--------+----------------------------+---------+
   | TOTAL                |    209 | -                          |  107.3  |
   +----------------------+--------+----------------------------+---------+

   Table 10: Mean Encoded Sizes of 'incoming' TCP/IP Fields

6. Handling Asymmetrical Inter-flow Behaviour

   From the previous section, and as summarized in Fig. 3, some TCP/IP
   fields exhibit inherently asymmetrical behaviour. The issue, then,
   is to explore various ways of handling such asymmetrical behaviour
   such that the gain versus complexity tradeoff can be optimized.

   As can be observed from the header compression model in Fig. 1 and
   the asymmetrical packet format specifications in Fig.
   3, asymmetrical inter-flow behaviour can be handled by asymmetrical
   header compression. This can be done by configuring each
   compressor-decompressor pair with a different set of packet format
   specifications, based on its 'incoming' or 'outgoing' role. While
   this treatment has the highest compression efficiency, its main
   disadvantage is that it may be more complicated than symmetrical
   header compression.

   Alternatively, asymmetrical behaviour can also be handled using
   symmetrical packet format specifications, by expanding the use of
   the 'multiple_packet_formats' encoding method [ROHC-FN] to cover
   asymmetrical behaviour, at the cost of using a few more
   'discriminator bits'. This is the methodology being adopted in
   current ROHC drafts.

   From Fig. 3, the fields exhibiting significant asymmetrical
   behaviour are the IP Destination Address, TCP Source Port,
   Destination Port and Acknowledgement Number. (The behaviour of TCP
   Window is in fact also asymmetrical, but this asymmetry cannot be
   expressed using current encoding methods.) To handle these fields
   symmetrically, the following packet format specifications can be
   used instead:

      Destination_Address    ::= STATIC(.) | IRREGULAR(32,.)
                                 % 1 discriminator bit

      Source_Port            ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.)
                                 | IRREGULAR(16,.)
                                 % 2 discriminator bits

      Destination_Port       ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.)
                                 | IRREGULAR(16,.)
                                 % 2 discriminator bits

      Acknowledgement_Number ::= VALUE(32,0,.) | IRREGULAR(32,.)
                                 % 1 discriminator bit

   Fig. 4: Symmetrical packet format specifications for fields with
   asymmetrical behaviour

   The asymmetrical behaviour of the Window field may be handled
   efficiently using a proposed encoding method as elaborated in
   [TCP-WIN]. This encoding method can be either symmetrical or
   asymmetrical.

7. Security Considerations

   This document does not introduce any new security considerations.

8. References

   [RFC-3390]  Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
               Initial Window", RFC 3390, October 2002.

   [RFC-3095]  Bormann, C., Burmeister, C., Degermark, M., Fukushima,
               H., Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T.,
               Le, K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro,
               K., Wiebke, T., Yoshimura, T. and H. Zheng, "RObust
               Header Compression (ROHC): Framework and four profiles:
               RTP, UDP, ESP, and uncompressed", RFC 3095, July 2001.

   [RFC-2581]  Allman, M., Paxson, V., Stevens, W., "TCP Congestion
               Control", RFC 2581, April 1999.

   [RFC-2234]  Crocker, D., et al., "Augmented BNF for Syntax
               Specifications: ABNF", RFC 2234, 1997.

   [RFC-1122]  Braden, R., Editor, "Requirements for Internet Hosts --
               Communication Layers", RFC 1122, 1989.

   [ROHC-TCP]  Pelletier, G., Zhang, Q., Jonsson, L-E., Liao, H., West,
               M., "RObust Header Compression (ROHC): TCP/IP Profile
               (ROHC-TCP)", Internet Draft (work in progress), May
               2003.

   [TCP-BEH]   West, M. and S. McCann, "TCP/IP Field behavior",
               Internet Draft (work in progress), March 2003.

   [ROHC-CR]   Pelletier, G., "RObust Header Compression (ROHC):
               Context Replication for ROHC Profiles", Internet Draft
               (work in progress), October 2003.

   [ROHC-FN]   Price, R., et al., "Formal Notation for Robust Header
               Compression (ROHC-FN)", (work in progress), March 2003.

   [EPIC-LITE] Price, R., Hancock, R., McCann, S., Surtees, A., Ollis,
               P., West, M., "Framework for EPIC-LITE", Internet Draft
               (work in progress), 2002.

   [EPIC-IMPL] Vidjak, L., Stula, M., Ozegovic, J., "Program Structures
               for EPIC-LITE Experimental Implementation", SoftCOM
               2002.

   [TCP-WIN]   Cho, C.Y., Hazra, S.K., "Encoding Method for TCP Window
               in Context Replication", Internet Draft, to be
               submitted.

9. Authors' Addresses

   Chia Yuan Cho
   Institute for Infocomm Research (I2R)
   21 Heng Mui Keng Terrace
   Singapore 119613

   Phone: +65 6874 6643
   Email: stucyc2@i2r.a-star.edu.sg

   Sukanta Kumar Hazra
   Institute for Infocomm Research (I2R)
   21 Heng Mui Keng Terrace
   Singapore 119613

   Phone: +65 6874 1953
   Email: sukanta@i2r.a-star.edu.sg

Appendix A. State Transition Threshold

   The aim of this section is to determine a reasonable range for the
   number of initial TCP/IP packets possibly converted into IR or
   IR-CR packets, which is defined as the State Transition Threshold.

   The compressor state machine controls the type of packet
   transmitted to the decompressor. As elaborated in [ROHC-TCP],
   transition from the CR state to the CO state at the compressor is
   initiated optimistically, or explicitly through reception of an
   ROHC ACK from the decompressor. Because at least 1 IR/IR-CR packet
   must be sent before state transition, the State Transition
   Threshold H satisfies H >= 1. The State Transition Threshold is
   different from simply the number of context-initializing IR/IR-CR
   packets sent, because in uni-directional mode or optimistic
   bi-directional mode a single TCP/IP packet may be sent as a number
   of duplicate IR/IR-CR packets (to allow the compressor to gain the
   optimism necessary for upwards transition).

   A range of suitable values for H is derived from the nature of the
   protocol stack and the channel characteristics. For the TCP/IP
   protocol stack, we begin by looking at the first few packets
   exchanged for a TCP connection. Fig. 4 shows a TCP connection using
   TCP/IP header compression over a low-bandwidth channel. Packets in
   the forward direction are numbered. The first TCP packet is always
   converted into an IR/IR-CR packet.
   In the following analysis, we focus on the compressor at the client
   and the decompressor at the router. Suppose the channel is
   full-duplex, and an ROHC ACK is sent upon the successful
   decompression of the first packet. ROHC ACKs may be piggybacked.
   The earliest possible ROHC ACK is indicated in Fig. 4 as a dotted
   arrow. When the compressor receives the ROHC ACK, it transitions
   from the IR/CR state to the CO state and subsequently sends CO
   packets instead.

   If the channel is reliable, then the compressor receives its ROHC
   ACK before it sends the second TCP/IP packet and only a single
   TCP/IP packet becomes an IR/IR-CR packet, i.e. H = 1. This is also
   likely if router-server RTT >> client-router RTT, in which case
   even if the first ROHC ACK is lost, the compressor may have ample
   opportunity to receive retransmitted ROHC ACKs before it sends
   packet #2.

   Conversely, if the channel is unreliable, and/or if client-router
   RTT >> router-server RTT (as is likely the case for cellular
   links), then it is likely that the ROHC ACK is not received
   immediately and subsequent TCP/IP packets are still sent as IR-CR
   packets. However, as seen from Fig. 4, the time lapse between
   TCP/IP packet #1 and packet #4 is long compared to that between all
   subsequent packets (when the TCP sliding window mechanism kicks
   in), and it is reasonable to assume that the ROHC ACK is received
   before packet #4 is sent. Thus, a reasonable range is 1 <= H <= 3.

                  Client    Router    Server
                     |         |         |
                 SYN |--- #1   |         |
                     |   ---   |         |
                     |      -->|---      |
                     |      ...|   ---   |
                     |   ...   |      -->|
   +-- ROHC ACK      |<..      |      ---| SYN,ACK
   |  (best case)    |         |   ---   |
   |                 |      ---|<--      |
   |                 |   ---   |         |
   |                 |<--      |         |
   |             ACK |--- #2   |         |
   |                 |   ---   |         |
   |         request |--- #3-->|---      |
   |                 |   ---   |   ---   |
   |                 |      -->|---   -->|
   |  large          |         |   ---   |
   |  time           |         |      -->|
   |  lapse          |         |      ---| reply
   |                 |         |   ---   |
   |                 |      ---|<--      |
   |                 |   ---   |         |
   +--(worst case)   |<--      |         |
                     |--- #4   |         |
                     |   ---   |         |
                     |      -->|---      |
                     |         |   ---   |
                     |         |      -->|
                Compressor Decompressor
                     |_________|_________|
                         Low      Wired
                      Bandwidth    or
                       Channel  Wireless

   Fig. 4: TCP handshaking and ROHC ACKs

   Finally, because TCP/IP carries bi-directional traffic, header
   compression may occur in both directions, in which case the overall
   state transition threshold is Ho = 2H. For uni-directional protocol
   stacks like RTP/UDP/IP, the overall state transition threshold Ho
   remains at H.