Network Working Group                           Hans Hannu, Ericsson
INTERNET-DRAFT                          Jan Christoffersson, Ericsson
Expires: August 2001                        Krister Svanbro, Ericsson
                                                               Sweden
                                                    February 23, 2001

RObust GEneric message size Reduction (ROGER)

Status of this memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/lid-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This document is an individual submission to the ROHC working group in IETF. Comments should be directed to the authors or to the ROHC mailing list (rohc@cdt.luth.se).

Abstract

Using existing ASCII based application signaling protocols over bandwidth limited channels, such as cellular access channels, creates problems such as long session setup times and long control times, and wastes scarce radio resources. This draft provides a robust and efficient compression scheme for ASCII based protocols, which reduces these problems.

Hannu, Christoffersson, Svanbro                                [Page 1]
INTERNET-DRAFT   RObust GEneric message size Reduction   Feb 23, 2001

TABLE OF CONTENTS

1. Introduction
2. General description
3. Terminology
4. Compression algorithm
   4.1. Dictionary build-up and maintenance
5. Message header
   5.1. Message ID field
   5.2. Bit-mask
   5.3. The CRC for messages
   5.4. Errors in the Dynamic Dictionary
   5.5. Wrap around
   5.6. Avoiding deadlock
6. Compressor-Decompressor entities
   6.1. No contact mode
      6.1.1. Compression
      6.1.2. Decompression
   6.2. Limited contact mode
      6.2.1. Compression
      6.2.2. Decompression
   6.3. Full contact mode
      6.3.1. Compression
      6.3.2. Decompression
   6.4. Move of table content
      6.4.1. TRD to DD
      6.4.2. TST to DD
7. Relation to header compression
   7.1. ROHC and ROGER
      7.1.1. ROHC and ROGER, limited contact mode
         7.1.1.1. Packet types
      7.1.2. ROHC and ROGER, full contact mode
   7.2. ROGER realized outside of ROHC scheme
      7.2.1. ROGER realized outside of ROHC scheme, limited contact mode
      7.2.2. ROGER realized outside of ROHC scheme, full contact mode
8. Evaluation of compression scheme
9. Conclusion
10. Security considerations
11. IANA considerations
12. Acknowledgments
13. Intellectual property considerations
14. Authors addresses
15. References

1. Introduction

Two communication technologies have become commonly used by the general public in recent years: cellular telephony and the Internet. Cellular telephony has provided its users with the freedom of mobility - the possibility of always being reachable with reasonable service quality no matter where they are. However, until now the main service provided has been speech. With the Internet, the conditions have been almost the opposite. While flexibility for all kinds of usage has been its strength, its focus has been on fixed connections and large terminals, and the experienced quality of some services (such as Internet telephony) has generally been low.

Due to new enhanced technologies this is about to change. Internet and cellular technologies are beginning to merge. Future cellular "phones" will contain an IP-stack and support voice over IP in addition to web-browsing, e-mail, etc. One could say that the Internet is going mobile, or that cellular systems are becoming much more than telephony, depending on one's point of view.

Commonly used terms in this technical area are "all-IP" and "IP all the way". These terms all relate to the case where IP is used end to end, even if the path involves cellular links. This is done for all types of traffic, both the user data (e.g.
voice or streaming) and control signaling data (e.g. SIP or RTSP). A great benefit of this is the service flexibility introduced by IP combined with the freedom provided by continuous mobility. A high cost, on the other hand, is the relatively large overhead the IP protocol suite typically introduces, due to large headers and text-based signaling protocols.

It is very important in cellular systems to use the scarce radio resources in an efficient way. It must be possible to support a sufficient number of users per cell, otherwise costs will be prohibitive [CELL]. The ROHC (RObust Header Compression) working group has successfully solved the problem of reducing bandwidth requirements for the header parts of e.g. RTP/UDP/IP packets [ROHC]. With this obstacle removed, new challenges of enhancing mobile Internet performance arise. One of these relates to application signaling protocols. Protocols such as SIP [SIP], SDP [SDP] and RTSP [RTSP] will typically be used to set up and control applications also in a mobile Internet. However, the generous size of the protocol messages, combined with their request/response nature, creates delays and wastes bandwidth. Compression of these messages should be considered in order to increase spectrum efficiency and reduce transmission delay [APP].

2. General description

This chapter describes compression of protocol data above IP/UDP or IP/TCP. The solution is a framework which is robust to packet loss and will give efficient compression of ASCII based protocol messages. Furthermore, the compression is transparent, i.e. a compressed message will after decompression be identical to the original message. The framework is especially suitable for SIP/SDP, RTSP and HTTP [HTTP], but could also be used for other ASCII based protocols.

Three possible compression/decompression scenarios are identified.
The scenarios differ in the extent to which the compressor and decompressor can communicate. In all three cases it is assumed that the messages are compressed when sent over a narrow band link, such as a cellular link. This means that one of the entities may reside in the terminal equipment (mobile phone, thin client etc.) and the other (somewhere) in the core network of a cellular system. The different cases and how to handle them are described in Chapter 6.

The compression scheme enhances dictionary based compression of single messages by compressing messages from designated packet flow(s) in a sequential manner. Already transmitted messages are used when compressing new messages. The great gain in doing so stems from the fact that previous messages will contain much of the information or text strings that are found in later messages.

In order to classify messages as belonging to a certain flow, the messages must all pass through the points where the compressor/decompressor entities reside. That is, packets that go to different mobile terminals cannot in general belong to the same compressed packet flow.

The method takes advantage of the possibility to acknowledge received messages. The acknowledgements can either be sent with messages travelling in the opposite direction or using a dedicated backwards channel. All sent and received messages are temporarily stored. Once the messages are acknowledged, they are put in a dictionary which is used for compression/decompression of future messages. Details on where the messages are stored while waiting for acknowledgements are given in Chapter 3. The dictionary management and a more thorough description of the different scenarios are given in Chapter 6.

The compression algorithm is based on the Lempel-Ziv algorithm, which replaces strings in the message by references to previous occurrences in the message or dictionary.
The use of a dictionary will greatly increase the compression efficiency. More information on the compression algorithm can be found in Chapter 4.

An important feature needed in order for the proposed compression scheme to be efficient is a method to classify flows, that is, a method for identifying the type of protocol which is carried on top of IP/UDP, [IP], [UDP], or IP/TCP, [TCP]. This avoids compressing data that is not suitable for this type of compression algorithm. This is not an elementary problem. One solution could be to look for certain protocol characteristics in the data transported by UDP or TCP.

3. Terminology

* Static Dictionary (SD)

  A dictionary which is static, i.e. does not change during or between compression of message flows. The dictionary contains protocol-specific Header field names, Methods, Status-codes etc. The static dictionary is known by the compressor and decompressor prior to compression/decompression at both sides of the link. The SD is used for both compression and decompression.

* Dynamic Dictionary (DD)

  Contains acknowledged messages (or parts of them), which have been transmitted during the session. The dynamic dictionary is known by both the compressor and the decompressor on the opposite side. The DD is empty when the compression begins and is updated according to some specific scheme during the message sequence. The DD is used for both compression and decompression.

* Temporary Receiver Dictionary (TRD)

  Messages (or parts of them), which have traversed the link, are stored at the receiver side in the TRD. When a receiving entity is positive that the opposite side knows that the messages have been received, the messages are moved from the TRD to the DD. The TRD is used for compression only.
* Temporary Sender Table (TST)

  Messages that have been sent over the link are stored in the TST at the sending side until it is positive that the messages have been received at the opposite side; then they are moved to the DD.

* Temporary Receiver Dictionary Table (TRDT)

  The TRDT is used to keep track of when to move a message from the TRD to the DD. When a message has been put in the TRD and an acknowledgement has been sent indicating that the message in the TRD has been received, the sequence number of the sent message is put in the TRDT. When an acknowledgement of the message whose sequence number is in the TRDT arrives, the message in the TRD is moved to the DD.

* Headers

  In order for the two entities to keep track of which messages have been sent from and received by the other entity, the compressed messages are supplied with a header. The header holds the sequence number of the present message and the sequence numbers of all the received messages which have not yet been acknowledged.

* Context

  The context in ROHC, [ROHC], contains the information necessary to perform compression and decompression. The context belongs to a certain flow of packets, which is identified by the IP source and destination address in combination with the source and destination port. For RTP, the SSRC identifier is also used to identify the context. The lifetime of the context is not specified within ROHC (implementation issue). The context of ROGER consists of the dictionaries and the tables, in this draft called the ROGER context. The ROGER context is identified in the same way as the ROHC context: by the IP addresses and the port numbers of UDP or TCP.

4. Compression algorithm

The default compression algorithm used to compress messages is a slightly modified LZSS [LZSS], which is of Lempel-Ziv type.
The algorithm works by scanning through the file from left to right and replacing repeated strings by references to the last previous occurrence in the file. The reference is of the form (offset, length of match) and is typically represented using two bytes.

The implementation of LZSS must be done so that it is possible to compress and decompress messages using dictionaries. A logical representation of how this can be achieved is as follows, see also Figure 4.1:

Compression

1. Append the message to the dictionary and compress the extended file using LZSS.

2. Separate the part of the compressed file that corresponds to the dictionary from the part which corresponds to the message. This is possible since LZSS processes the file from left to right and the part which has already been compressed does not change as the compression proceeds. That is, compressing the dictionary by itself or compressing it with a message appended to it will produce the same output (apart from the compressed message).

Decompression

1. Append the compressed message to the compressed dictionary and decompress the extended file.

2. Separate the message from the dictionary.

It is of course of vital importance that the same dictionary is used by both the compressor and decompressor.

   +--------------+---------+                 +--------+------+
   |  Dictionary  | Message |      --->       |   CD   |  CM  |
   +--------------+---------+   Compression   +--------+------+

   +--------------+                           +--------+
   |  Dictionary  |                --->       |   CD   |
   +--------------+             Compression   +--------+

Figure 4.1. Compression of the dictionary with a message appended and the dictionary by itself. CD is the compressed dictionary and CM is the compressed message.

The LZSS implementation should be tailored to enable the split of the compressed file into these two parts in a simple fashion.
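The logical compression and decompression steps above can be illustrated with a minimal Python sketch. It is not the draft's LZSS bit format: tokens are kept as a plain Python list instead of flag bits and two-byte references, the match search is brute force, and the names `compress`, `decompress`, `MIN_MATCH` and `MAX_MATCH` are assumptions made for illustration. The sketch preloads the search window with the dictionary and emits tokens for the message only, which corresponds to keeping just the CM part of Figure 4.1.

```python
MIN_MATCH = 3      # assumed threshold: shorter matches are sent as literals
MAX_MATCH = 255    # assumed cap on the match length


def compress(dictionary: bytes, message: bytes) -> list:
    """Return tokens for `message` only, using `dictionary` as preloaded window.

    A token is either an int (literal byte) or an (offset, length) pair, where
    offset counts backwards from the current position in dictionary + message.
    """
    data = dictionary + message
    pos = len(dictionary)              # encoding starts at the first message byte
    tokens = []
    while pos < len(data):
        best_len, best_off = 0, 0
        limit = min(MAX_MATCH, len(data) - pos)
        for start in range(pos):       # brute-force search, acceptable for a sketch
            length = 0
            while length < limit and data[start + length] == data[pos + length]:
                length += 1
            if length > 0 and length >= best_len:   # ties go to the latest match
                best_len, best_off = length, pos - start
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))
            pos += best_len
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def decompress(dictionary: bytes, tokens: list) -> bytes:
    """Rebuild the message from tokens, starting from the same dictionary."""
    out = bytearray(dictionary)
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):    # byte-wise copy also handles overlaps
                out.append(out[-offset])
        else:
            out.append(token)
    return bytes(out[len(dictionary):])   # separate the message from the dictionary
```

Because only the message bytes are encoded, no single reference replaces a string that runs from the dictionary into the message, although a reference may of course point back into the dictionary.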
To facilitate the splitting, the implementation should not replace a repeated string which runs from the dictionary and into the message with a single reference. The part of the string in the dictionary should be replaced with one reference and the part of the string in the message should be replaced by another reference.

Compression with LZSS is valid for virtually all types of protocol data, not just ASCII based. However, compression would probably not be as efficient for other types of data.

Note: LZSS is chosen as the default compression algorithm in this draft. However, it is left as an open issue how LZSS could be modified or if some other compression algorithm should be used, in order to enhance the performance of ROGER.

4.1. Dictionary build-up and maintenance

The dynamic dictionary is specific to each packet flow. That is, each new packet flow, identified by its IP address and port number, gives rise to its own dynamic dictionary. The dictionary is kept as long as packets arrive. Determining whether a packet flow is still active can be done using a timer. It could also be possible to identify the end of a packet flow from the semantics of the protocol, but this would complicate the compression scheme by forcing the compressor to know the semantics of the compressed protocols and also to keep track of the type of the transmitted messages. Once a packet flow has ended, the DD, TRD and TST are emptied.

Note: How long a dynamic dictionary is kept after the last packet has arrived is decided by the system where ROGER is implemented. However, the ROGER context at the compressor and the decompressor must be kept equally long, to avoid a situation where a compressed message cannot be decompressed.

There are several different strategies which could be used to determine what to update the dynamic dictionary with.
The method should ensure a high compression efficiency while keeping the dictionary size and the complexity of the scheme at a reasonable level. Some possibilities on how to update the dictionary are given below:

* Append all messages to the dictionary. This would give very large dictionaries in long sessions with many messages.

* Only use the first (or last) n messages of the session. Typically, n would be small to ensure that the dictionary does not grow to an unreasonable size. Still, it would have to be large enough to make the compression efficient.

* Append only new strings or rows to the dictionary. This would also ensure a slower growth of the dictionary. However, it remains to be investigated how this will affect the compression efficiency.

* Messages, strings or rows could also be deleted from the dictionary to avoid having an unreasonably large dictionary. One strategy for this could be to delete from the beginning of the dictionary, i.e. deleting the oldest parts first.

Note: It still remains to be investigated what is the most efficient way to update the dictionary. However, the dictionaries at compressor and decompressor must be updated according to the same procedure, otherwise it is impossible to decompress a compressed message.

5. Message header

To achieve robustness and keep track of sent and received messages, a header is added to the compressed message. There are two header types available: a basic header used for the great majority of messages, and an extended header used to verify the correctness of the dynamic dictionary. The basic header consists of a message identification field, a bit-mask for indication of previously received messages and finally a cyclic redundancy code (CRC). Figure 5.1 shows the basic header.

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |   Message ID  |     Bit-      |
   +---+---+---+---+               +
   |             mask              |
   +---+---+---+---+---+---+---+---+
   |              CRC              |
   +---+---+---+---+---+---+---+---+
   /       Compressed message      /
   +---+---+---+---+---+---+---+---+

Figure 5.1. Basic message header

* Message ID - 4 bits: The message identification field (number). The number is increased by one for each sent message. This number is used by the receiving entity (decompressor) to determine which message it has received. The field can also be used as a code point to signal special actions, see Section 5.1.

* Bit-mask - 12 bits: Indicates received messages by using their ID. These messages have been received at the entity generating this message and are stored in its TRD. Once a message has been moved from the TRD it will not be indicated in the bit-mask any further. See Chapter 6 for details.

* CRC - 8 bits: The checksum is computed over the uncompressed message and saved in the header. After decompression, the checksum is computed again and compared to the CRC in the header. If the two CRCs fail to match, this indicates that an error has occurred.

5.1. Message ID field

The bit-mask of 12 bits can indicate 12 received messages. Using 4 bits for the message identification field gives 16 possible numbers. Using 12 numbers for message identification leaves 4 numbers for indication of other actions.

* Identification

   +---+---+---+---+      +---+---+---+---+
   | 0   0   0   1 |  to  | 1   1   0   0 |
   +---+---+---+---+      +---+---+---+---+

  are used for the message identification number.

* Code points

   +---+---+---+---+
   | 0   0   0   0 | : Flush everything in the DD, TRD and TST and use
   +---+---+---+---+   this message as message number 1.

   +---+---+---+---+
   | 1   1   0   1 | : The header contains a CRC for the DD. This
   +---+---+---+---+   implies that the header has a special form, see
                       Section 5.4.

   +---+---+---+---+
   | 1   1   1   0 | : Reserved.
   +---+---+---+---+

   +---+---+---+---+
   | 1   1   1   1 | : Do not place any part of this message in the
   +---+---+---+---+   TRD of the receiving entity or in the TST of the
                       sending entity.

Every sent or received message is put in a suitable table or dictionary. If this should not be the case, it must be signaled to the decompressor using the last of the above code points.

5.2. Bit-mask

There are 12 bits in the mask. To indicate that a message has been received, the bit corresponding to the message's identification number is set. Thus, to indicate a message with identification number 1 the bit-mask is set to:

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |   Message ID  | 0   0   0   0 |
   +---+---+---+---+               +
   | 0   0   0   0   0   0   0   1 |
   +---+---+---+---+---+---+---+---+
   |        CRC for Message        |
   +---+---+---+---+---+---+---+---+
   /       Compressed message      /
   +---+---+---+---+---+---+---+---+

To indicate messages with identification numbers 2, 4 and 9 the bit-mask is set to:

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |   Message ID  | 0   0   0   1 |
   +---+---+---+---+               +
   | 0   0   0   0   1   0   1   0 |
   +---+---+---+---+---+---+---+---+
   |        CRC for Message        |
   +---+---+---+---+---+---+---+---+
   /       Compressed message      /
   +---+---+---+---+---+---+---+---+

Note also that a received message should be indicated in the following sent messages until the received message is moved from the TRD to the DD.

5.3. The CRC for messages

To discover residual bit errors in the messages, an 8 bit CRC is computed over the message before compression. The CRC is then placed in the message header as shown in Figure 5.1. After decompression, a CRC is computed over the decompressed message and compared to the CRC in the header. If the two CRCs do not match, an error has occurred. In this case the message is not placed in the TRD and not acknowledged.

The CRC polynomial is given by

   C(x) = 1 + x + x^2 + x^8.
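
As an illustration of Sections 5.1 to 5.3, the following Python sketch builds and parses the three-octet basic header. The draft gives the CRC polynomial but not the bit ordering or the initial register value, so the sketch assumes MSB-first processing with an initial value of zero; an implementation following a different convention would produce different CRC values. The function names are hypothetical.

```python
def crc8(data: bytes) -> int:
    """CRC for C(x) = 1 + x + x^2 + x^8 (0x07 with the x^8 term implicit).

    Assumptions not fixed by the draft: MSB-first processing, initial value 0.
    """
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def ack_mask(received_ids) -> int:
    """Set bit (ID - 1) of the 12-bit mask for every received message ID."""
    mask = 0
    for msg_id in received_ids:
        if not 1 <= msg_id <= 12:
            raise ValueError("identification numbers are 1..12")
        mask |= 1 << (msg_id - 1)
    return mask


def pack_basic_header(msg_id: int, received_ids, message: bytes) -> bytes:
    """Build the basic header; `message` is the uncompressed message."""
    if not 0 <= msg_id <= 15 or msg_id == 0b1101:
        raise ValueError("code point 1101 needs the extended header")
    mask = ack_mask(received_ids)
    first = (msg_id << 4) | (mask >> 8)     # Message ID + upper 4 mask bits
    return bytes([first, mask & 0xFF, crc8(message)])


def unpack_basic_header(header: bytes):
    """Return (message ID, list of indicated IDs, CRC) from a basic header."""
    msg_id = header[0] >> 4
    mask = ((header[0] & 0x0F) << 8) | header[1]
    received = [bit + 1 for bit in range(12) if mask & (1 << bit)]
    return msg_id, received, header[2]
```

With these assumptions, indicating messages 2, 4 and 9 yields the mask bits 0001 00001010, matching the second example in Section 5.2.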

5.4. Errors in the Dynamic Dictionary

To obtain robustness to errors in the Dynamic Dictionary which would cause the decompression of messages to fail, a CRC can be computed over the dynamic dictionary. This CRC is compared to the CRC computed over the dynamic dictionary at the other entity. This can be useful when the decompressor has failed to decompress several consecutive messages. A CRC computed over the dynamic dictionary is signaled using the code point 1 1 0 1. The extended header used in combination with this code point is as follows:

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 1   1   0   1 |  Message ID   |
   +---+---+---+---+---+---+---+---+
   |             Bit-              |
   +               +---+---+---+---+
   |     Mask      |    CRC for    |
   +---+---+---+---+               +
   |      Dynamic Dictionary       |
   +---+---+---+---+---+---+---+---+
   |        CRC for Message        |
   +---+---+---+---+---+---+---+---+
   /       Compressed message      /
   +---+---+---+---+---+---+---+---+

The CRC for the dynamic dictionary is 12 bits and the CRC polynomial is given by

   C(x) = 1 + x^2 + x^3 + x^11 + x^12.

5.5. Wrap around

A wrap around problem arises when no acknowledgment has been received for the message ID number that is next in turn to be assigned to a new compressed message. There are some possible solutions to this problem.

One is to assign the next following ID number or, if no ID number is free, to use the code point which indicates that this message is not to be saved in the TRD or in the TST, so that it is not used for further compression. This approach might reduce the compression efficiency in case the following messages differ substantially from the previous messages stored in the dictionaries.

Message ID numbers could be freed even if no acknowledgment has been received. However, this must be done very carefully to maintain robustness.
One approach is to free a message ID if an acknowledgement is received for some later message and a prescribed "time" has expired.

Note: This is solved by the implementation.

5.6. Avoiding deadlock

If more than 12 consecutive messages in one direction are lost, the compressor runs out of ID numbers and a deadlock may occur. To avoid this, the following scheme will restart the compression.

Messages sent within a prescribed time after the 12th message are sent with the code point 1 1 1 1. This forces the compressor to wait for the acknowledgement of the 12th message. After the prescribed time period, and if no acknowledgment has been received, the code point 0 0 0 0 is sent. This signals that the DD, TRD and TST are emptied and the compression scheme restarts. If no acknowledgement is received for this message the procedure is repeated.

Note: The prescribed time period is system dependent and should thus be decided by the implementation for the system.

6. Compressor-Decompressor entities

In the following sections of this draft the compressor/decompressor entities will be referred to as entity-u (entity-uplink) and entity-d (entity-downlink), see Figure 6.1. Entity-u's compressor sends messages to entity-d's decompressor and entity-d's compressor sends messages to entity-u's decompressor.

Depending on how the compressor and decompressor at one entity reside, or more specifically, to what extent they are able to communicate, different compression modes are possible. The compression efficiency will vary depending on the applied mode. The following three sections describe the different scenarios. Although Figure 6.1 shows a mobile communicating with a base station, the ROGER scheme could be applied to other types of systems and scenarios.

   [Figure: a mobile (entity-d) and the fixed network (entity-u) exchanging messages over the link]

Figure 6.1.
Placement of the compression entities

6.1. No contact mode

The compressor and decompressor residing at the same entity are unable to communicate. Also, the decompressor at entity-u is unable to communicate with the compressor at entity-d and vice versa, making use of acknowledgements impossible. This precludes the use of ROGER headers. Thus, in this particular case no ROGER header is attached to the message. This also implies that the compressor at entity-d can never be positive that a sent message has been received at the decompressor of entity-u. Consequently, use of a dynamic dictionary would make decompression impossible if a previous packet had been lost. Only the static dictionary can be used without losing robustness against packet losses. Figure 6.2 shows this scenario.

    Entity-d                    Entity-u
   +---------+                 +---------+
   |  Comp.  |---------------->| Decomp. |
   +---------+                 +---------+

   +---------+                 +---------+
   | Decomp. |<----------------|  Comp.  |
   +---------+                 +---------+

Figure 6.2. No information about the arrival of sent messages reaches the compressor.

6.1.1. Compression

Compression is carried out by using the static dictionary only.

* Compression steps
   1) Compress message using SD
   2) Send message

6.1.2. Decompression

Decompression is carried out by using the static dictionary only.

* Decompression steps
   1) Decompress message using SD

6.2. Limited contact mode

The decompressor at e.g. entity-u can acknowledge messages to the compressor at entity-d. The decompressor at entity-u has a system provided link, which from ROGER's point of view looks like a direct link, to the compressor at entity-d, see Figure 6.3.

    Entity-d                    Entity-u
   +---------+                 +---------+
   |  Comp.  |---------------->| Decomp. |
   |         |<-----ACK--------|         |
   +---------+                 +---------+

Figure 6.3. Acknowledgements generated by decompressor.

From ROGER's point of view, the basic acknowledgement has the same format as the message header described in Chapter 5, except for the CRC, see Figure 6.4.

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |     ACK ID    |     Bit-      |
   +---+---+---+---+               +
   |             mask              |
   +---+---+---+---+---+---+---+---+

Figure 6.4. Basic acknowledgement.

* ACK ID - 4 bits: The acknowledgement identification number. The number is increased by one for each sent acknowledgement. This number is used by the receiving entity (compressor) to indicate to the originator of this message (decompressor at the other entity) that the acknowledgement has been received.

* Bit-mask - 12 bits: Indicates received messages by using their ID. These messages have been received at the entity generating this acknowledgement and are stored in its TRD.

In situations when the decompressor wants to verify the correctness of its dynamic dictionary, i.e. send code point 1 1 0 1, the extended acknowledgement should be used, see Figure 6.5.

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | 1   1   0   1 |    ACK ID     |
   +---+---+---+---+---+---+---+---+
   |             Bit-              |
   +               +---+---+---+---+
   |     Mask      |    CRC for    |
   +---+---+---+---+               +
   |      Dynamic Dictionary       |
   +---+---+---+---+---+---+---+---+

Figure 6.5. Extended acknowledgement.

Section 7.1 describes how ROGER could be realized within ROHC and also how the feedback (acknowledging) is handled with ROHC.

6.2.1. Compression

When the session starts, the dynamic dictionary and the TST are empty. The message is compressed using the static and the dynamic dictionary and is also stored in the TST. The message header indicates which message is sent and which acknowledgements have been received.
Hannu, Christoffersson, Svanbro [Page 15] INTERNET-DRAFT RObust GEneric message size Reduction Feb 23, 2001 * Compression steps: 1) If necessary, move content of TST to DD 2) Compress using SD+DD 3) Put message in TST 4) Attached header 5) Send message 6.2.2. Decompression Decompression is done by first looking at the header attached to the compressed message. The header indicates which messages were in the DD when the message was compressed. That is, the bit-mask indicates which acknowledgements that have arrived to the compressor and the messages corresponding to the acknowledgements are used in the compression process. The decompressor makes sure that the same messages are used for decompression, i.e. moving the content of the TRD to the DD which is indicated in the bit-mask. The received message is put into the TRD and an acknowledgement is sent to the compressor. One could consider to use a sparse acknowledging scheme here. * Decompression steps: 1) If necessary move content of TRD to DD, see Section 6.4.1. 2) Decompress message using SD+DD 3) Put message in TRD 4) Send Acknowledgement 6.3. Full contact mode The compressor and decompressor on both sides reside together. Thus, both sent and received messages can be used in the compression process since the compressor and decompressor share dictionaries. The decompressor uses the compressor, which it resides together with, to inform the compressor on the other side that a message has been received. This is done by an indication in the bitmask of the sent message's header. See Figure 6.6 for scenario. +---------+ +----------+ | Comp./ |---------------->| Decomp./ | | Decomp. |<----------------| Comp. | +---------+ +----------+ Figure 6.6. Compressor and decompressor reside together and are able to share the context. Hannu, Christoffersson, Svanbro [Page 16] INTERNET-DRAFT RObust GEneric message size Reduction Feb 23, 2001 How this mode is handled within the ROHC scheme is described in Section 7.2. 6.3.1. 
Compression

Compression is done using the SD, the DD and the TRD. The sent
message is put into the TST. The message ID is put into the header
together with the bit-mask indicating which messages have been
received but not yet put into the DD.

* Compression steps:
   1) Compress using SD+DD+TRD
   2) Put message in TST
   3) Attach header
   4) Send message

6.3.2. Decompression

The decompression starts by reading the message header. If the
bit-mask indicates a previously sent message that acknowledged an
earlier received message, that earlier received message is moved from
the TRD to the DD. If the bit-mask indicates that a previously sent
message has been received at the other entity, this previously sent
message is moved from the TST to the DD.

* Decompression steps:
   1) If necessary move content of TRD to DD, see Section 6.4.1.
   2) If necessary move content of TST to DD, see Section 6.4.2.
   3) Decompress message using SD+DD
   4) Put message in TRD
   5) Send acknowledgement if needed

6.4. Move of table content

This section defines when to move contents from the TRD or TST to the
DD. In general the order of movement is:

   1) move contents from TRD to DD
   2) move contents from TST to DD

The TRD and the TST may each contain several messages. Only the
messages that correspond to a certain acknowledgement are moved.

6.4.1. TRD to DD

When a message is sent carrying indications of received messages in
the TRD, a mapping between the ID of the sent message and the IDs of
the indicated messages is stored in the TRDT. When a later message is
received by this entity, the entity extracts the acknowledged message
IDs from the received message header. The acknowledged message IDs
are compared with the IDs stored in the TRDT. If there is a match,
the corresponding contents of the TRD (given by the mapping) are
moved to the DD and the mapping is removed from the TRDT.
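The TRDT bookkeeping described above can be sketched as follows. The
class and method names are hypothetical, the DD is kept as a list of
messages purely for readability, and the parsing of the bit-mask into
message IDs is assumed to happen elsewhere.

```python
class TrdTables:
    """Hypothetical sketch of the TRD, the TRDT and the move to the DD."""

    def __init__(self):
        self.dd = []    # DD content, modelled as a list of messages
        self.trd = {}   # TRD: received message ID -> message
        self.trdt = {}  # TRDT: sent message ID -> received IDs it indicated

    def store_received(self, msg_id, message):
        """Put a newly received message into the TRD."""
        self.trd[msg_id] = message

    def on_send(self, sent_id, indicated_received_ids):
        """Record which received messages the outgoing header indicates."""
        self.trdt[sent_id] = list(indicated_received_ids)

    def on_receive_header(self, acked_sent_ids):
        """Move TRD content to the DD for every acknowledged sent message."""
        moved = []
        for sent_id in acked_sent_ids:
            received_ids = self.trdt.pop(sent_id, None)
            if received_ids is None:
                continue  # mapping already removed: repeated acknowledgement
            for received_id in received_ids:
                message = self.trd.pop(received_id, None)
                if message is not None:
                    self.dd.append(message)
                    moved.append(received_id)
        return moved


tables = TrdTables()
tables.store_received(3, b"received message 3")
tables.on_send(7, [3])                        # message 7 indicates receipt of 3
first = tables.on_receive_header([7])         # peer acknowledges message 7
second = tables.on_receive_header([7])        # the same acknowledgement again
```

Here the first call moves message 3 into the DD, while the repeated
acknowledgement finds no mapping and moves nothing.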
If the next received message carries the same acknowledgement, it
causes no difficulty, since the mapping has already been removed from
the TRDT.

6.4.2. TST to DD

The contents of the TST are moved to the DD when an acknowledgement
is received for the message stored in the TST. The TST must be
constructed so that if subsequent messages acknowledge the same
message, no content is moved from the TST to the DD again.

7. Relation to header compression

The protocols discussed in this draft, i.e. SIP/SDP, RTSP and HTTP,
are all carried on UDP/IP or TCP/IP. To obtain the full benefit of
ROGER, attention should be paid to UDP/IP or TCP/IP compression as
well. An efficient method for header compression of UDP/IP is given
by ROHC. In the near future ROHC profiles for compressing TCP/IP will
also be available.

An appealing solution is to handle the UDP/IP compression with ROHC
and the compression of the application signaling messages with ROGER.
Figure 7.1 shows a packet before and after compression. CM is the
compressed information, e.g. SIP/SDP, H is the ROGER header and R is
the ROHC header handling the UDP/IP part of the packet.

   +--------+---------+                      +---+---+----+
   | IP/UDP | SIP/SDP |---- compression ---->| R | H | CM |
   +--------+---------+                      +---+---+----+

   Figure 7.1. Packet before and after compression.

There are several alternatives for identifying messages that have
been compressed with ROGER, depending on the environment. For
example, a link layer identification method could be used or, in
conjunction with ROHC, a profile number. As ROGER compresses the part
above TCP/IP and UDP/IP, it is assumed that context identification is
handled by the underlying TCP/IP or UDP/IP compression scheme.

7.1. ROHC and ROGER

The compression scheme ROGER could be made a part of the ROHC
framework. Profile 2 in ROHC is a UDP/IP compression profile.
A new profile that has the same functions and packet types as
profile 2 and includes ROGER could be defined. Thus, the new profile
would also compress the UDP payload with ROGER. A new profile can
also be defined in a similar manner using the future ROHC TCP
profile.

7.1.1. ROHC and ROGER, limited contact mode

The limited contact mode, see Section 6.2, does not require any
changes to the ROHC scheme or any additional features from the system
that are not already required by ROHC. The next section describes the
packet types needed to fit ROGER into the ROHC scheme.

7.1.1.1. Packet types

The packet types of this combination of ROHC and ROGER have the same
formats as defined in ROHC for profile 2, with the addition of the
ROGER message header, see Section 5. The ROGER message header is
placed at the end of the profile 2 ROHC header.

The feedback types are the same as for ROHC profile 2, with the
addition of this option:

   +---+---+---+---+---+---+---+---+
   | Opt Type = 5  |  Opt Len = *  |
   +---+---+---+---+---+---+---+---+
   |   ID Number   |     Bit-      |
   +---+---+---+---+               +  - ROGER feedback
   |             mask              |
   +---+---+---+---+---+---+---+---+
   /                               /  - Other types of feedback
   +---+---+---+---+---+---+---+---+

*If the "Opt Len" field has a value larger than 2 octets there are
more feedback options in this packet, which start after the ROGER
feedback.

7.1.2. ROHC and ROGER, full contact mode

ROGER gains in compression efficiency if the dictionaries can be
shared between the compressor and decompressor at the entities. How
to share contexts is not defined in the ROHC scheme and can therefore
be regarded as an implementation issue. However, this feature must be
implemented on both sides of the link. The use of shared dictionaries
requires that an entity's inbound and outbound flows can be
associated.
The criteria for associating the flows could be the IP addresses and
possibly also the port numbers. It is necessary that both the uplink
and downlink flows pass through the same point. To enable the use of
shared dictionaries, it is up to the underlying system to associate
flows going in both directions and to pass the ROGER headers with the
compressed information to ROGER.

7.2. ROGER realized outside of the ROHC scheme

ROGER can of course be realized outside of the ROHC scheme, but this
places somewhat more requirements on the system in which ROGER is
used. For the three modes defined in this draft there is one common
requirement on the system (underlying link layer): the system must
handle the negotiation of whether ROGER is to be used, and if so in
which mode.

The following two sections describe the requirements for the two
latter modes. These requirements are implementation matters and
probably do not need to be standardized. In the no contact mode there
are no other requirements, since in this mode only the static
dictionary is used.

7.2.1. ROGER realized outside of the ROHC scheme, limited contact
       mode

In the limited contact mode the dictionaries are not shared among
packet flows in opposite directions. There is one context per
direction and flow. The additional link layer requirements in this
mode are:

*The system must be able to identify the flow(s) that correspond to a
certain context. As the IP header and UDP (or TCP) header are in the
clear, this can be done by looking at the IP addresses and port
numbers.

*As ROGER does not define any piggyback headers, the system must
provide the feedback to the ROGER entities, e.g. by a dedicated
feedback channel.

7.2.2. ROGER realized outside of the ROHC scheme, full contact mode

In the full contact mode the dictionaries are shared among packet
flows in both directions. One context (the ROGER dictionaries) can be
used by multiple flows in both directions.
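One way a system might map flows to contexts from IP addresses and
port numbers is sketched below. This is an assumption for
illustration only: the draft does not define such a function, the
names are hypothetical, and the addresses are documentation examples.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """Addresses and ports taken from the uncompressed IP and UDP/TCP headers."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


def directed_context_key(flow: Flow):
    """Limited contact mode: one context per direction and flow."""
    return ((flow.src_addr, flow.src_port), (flow.dst_addr, flow.dst_port))


def shared_context_key(flow: Flow):
    """Full contact mode: uplink and downlink map to the same context.

    The two endpoints are sorted so that the key does not depend on
    the direction of the packet.
    """
    a = (flow.src_addr, flow.src_port)
    b = (flow.dst_addr, flow.dst_port)
    return (a, b) if a <= b else (b, a)


uplink = Flow("192.0.2.1", 5060, "198.51.100.7", 5060)
downlink = Flow("198.51.100.7", 5060, "192.0.2.1", 5060)
same_shared = shared_context_key(uplink) == shared_context_key(downlink)
same_directed = directed_context_key(uplink) == directed_context_key(downlink)
```

With these keys the two directions share one context in the full
contact case and get separate contexts in the limited contact case.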
The additional link layer requirements in this mode are:

*The system must be able to identify the flow(s) that correspond to a
certain context. As the IP header and UDP (or TCP) header are in the
clear, this can be done by looking at the IP addresses and port
numbers.

*The system must be able to associate uplink packet flow(s) with
downlink packet flow(s), since both sent and received messages are
used in the compression process.

*Since this mode requires that the system can associate uplink flows
with downlink flows, acknowledging of messages is handled with the
bit-mask in the headers of the compressed messages.

8. Evaluation of compression scheme

A small test was carried out in the limited contact and full contact
modes to evaluate the performance of ROGER. The messages from a SIP
trace of a call setup were compressed. The packet flow consisted of
13 messages sent between a client and a SIP proxy. The compression
was performed using an implementation of the LZSS algorithm as
described in Section 4. To determine the size of a compressed
message, the size of the compressed dictionary was compared to the
size of the compressed extended file. The static dictionary used for
compression and decompression is built from header field names, e.g.
To:, From: and Via:.

In the limited contact mode it was assumed that every message was
acknowledged before the next message was sent, i.e. a dedicated
channel was available. This gives slightly better compression than
piggybacking acknowledgements on SIP messages travelling in the
opposite direction, since piggybacking gives a slower dictionary
expansion.

The results in terms of compression factors (size uncompressed/size
compressed) are given in Table 1. The overall compression factors
were 3.3 for the limited contact mode and 4.6 for the full contact
mode.
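Note that an overall factor, presumably total uncompressed size over
total compressed size, is not the arithmetic mean of the per-message
factors in Table 1. The sketch below uses the factors from Table 1.
The draft does not list the message sizes, so the equal message size
used here is purely a hypothetical assumption, and the function name
is illustrative.

```python
from statistics import mean

# Per-message compression factors copied from Table 1.
LIMITED = [1.5, 1.5, 1.9, 3.5, 5.2, 5.7, 6.8, 7.4, 6.9, 7.2, 7.1, 6.6, 7.8]
FULL = [1.5, 5.6, 3.1, 4.9, 6.4, 5.7, 8.1, 8.3, 7.0, 7.4, 7.8, 7.9, 7.8]


def overall_factor(uncompressed_sizes, compressed_sizes):
    """Total uncompressed size divided by total compressed size."""
    return sum(uncompressed_sizes) / sum(compressed_sizes)


def overall_if_equal_sizes(factors, size=1000.0):
    """Overall factor if every message had the same (hypothetical) size."""
    uncompressed = [size] * len(factors)
    compressed = [size / factor for factor in factors]
    return overall_factor(uncompressed, compressed)


mean_limited = mean(LIMITED)
mean_full = mean(FULL)
equal_limited = overall_if_equal_sizes(LIMITED)
equal_full = overall_if_equal_sizes(FULL)
```

The arithmetic means come out at roughly 5.3 and 6.3, while the
equal-size overall factors are roughly 3.7 and 5.0. The reported 3.3
and 4.6 are lower still, which would be consistent with the early,
poorly compressed messages being larger than the later ones.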
   Message #   Originating source   Compression factor
                                     Limited    Full
       1       Client                  1.5      1.5
       2       Proxy                   1.5      5.6
       3       Proxy                   1.9      3.1
       4       Client                  3.5      4.9
       5       Proxy                   5.2      6.4
       6       Client                  5.7      5.7
       7       Proxy                   6.8      8.1
       8       Proxy                   7.4      8.3
       9       Proxy                   6.9      7.0
      10       Client                  7.2      7.4
      11       Proxy                   7.1      7.8
      12       Proxy                   6.6      7.9
      13       UAC                     7.8      7.8
   Overall:                            3.3      4.6

   Table 1.

As can be seen from the table, compression is more efficient in the
full contact mode. This is because the dictionaries grow faster when
both sent and received messages are used.

9. Conclusions

This draft has presented the compression scheme ROGER for application
signaling protocols. The scheme is simple to implement and shows
promising results for compression of the ASCII based protocols
SIP/SDP, and can be expected to have a similar performance on other
ASCII based protocols such as RTSP.

Depending on what the system's link layer can support, there are
three different modes of operation: no contact mode, limited contact
mode and full contact mode.

ROGER can easily fit in the ROHC framework. Requirements on the
system if ROGER is run outside of the ROHC framework are listed.

10. Security considerations

In general, encryption and compression do not go together very well.
More specifically, messages that are encrypted cannot be compressed
efficiently. It is of course possible to run a lossless compression
algorithm like ROGER on an encrypted message, but the compression
will most likely not decrease the size of the message.

These points also hold for the use of ROGER. The use of ROGER does
not affect the possibility of using encryption algorithms. Use of
ROGER on encrypted messages is possible, although it is not believed
to result in any size reduction of the encrypted message.

11.
IANA considerations

If ROGER is to be a part of the ROHC framework, a ROHC profile
identifier must be reserved by the IANA for the IP/UDP/ROGER profile
defined in this document, and also a ROHC profile identifier for a
future IP/TCP/ROGER profile.

12. Acknowledgements

Thanks to Arto Mahkonen, Ericsson LMF, and Lars-Erik Jonsson,
Ericsson Erisoft, for their valuable input to this work.

13. Intellectual property rights considerations

Ericsson has filed patent applications that might possibly have
technical relations to this contribution. See further:
http://www.ietf.org/ietf/IPR/ERICSSON-General

14. Authors' Addresses

Hans Hannu
Ericsson Erisoft AB
Lulea, Sweden
Tel: +46 920 20 21 84
EMail: Hans.Hannu@epl.ericsson.se

Jan Christoffersson
Ericsson Erisoft AB
Lulea, Sweden
Tel: +46 920 20 28 40
EMail: Jan.Christoffersson@epl.ericsson.se

Krister Svanbro
Ericsson Erisoft AB
Lulea, Sweden
Tel: +46 920 20 20 77
EMail: Krister.Svanbro@epl.ericsson.se

15. References

[APP] H. Hannu, J. Christoffersson and K. Svanbro, Application
signaling over cellular links, Internet Draft (work in progress),
November 2000.

[CELL] L. Westberg and M. Lindqvist, Realtime traffic over cellular
access networks, Internet Draft (work in progress), November 2000.

[HTTP] R. Fielding et al., Hypertext Transfer Protocol - HTTP/1.1,
RFC 2616, June 1999.

[IP] J. Postel, Internet Protocol, RFC 791, September 1981.

[LZSS] J.A. Storer and T.G. Szymanski, Data Compression via Textual
Substitution, Journal of the ACM 29, 1982.

[ROHC] C. Bormann et al., RObust Header Compression, Internet Draft
(work in progress), February 2001.

[RTSP] H. Schulzrinne, A. Rao and R. Lanphier, Real Time Streaming
Protocol (RTSP), RFC 2326, April 1998.

[SDP] M. Handley and V.
Jacobson, SDP: Session Description Protocol, RFC 2327, April 1998.

[SIP] M. Handley, H. Schulzrinne, E. Schooler and J. Rosenberg, SIP:
Session Initiation Protocol, RFC 2543, March 1999.

[TCP] J. Postel, Transmission Control Protocol, RFC 793, September
1981.

[UDP] J. Postel, User Datagram Protocol, RFC 768, August 1980.

This Internet-Draft expires in August 2001.