Internet Draft                                               V. Cavanna 
<draft-cavanna-iscsi-crc-vs-cksum-01.txt>          Agilent Technologies 
Expires September 2001                                    March 2, 2001 
                                                                        
 
                    iSCSI Digests û CRC or Checksum? 
 
 
Status of this Memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents 
   at any time.  It is inappropriate to use Internet- Drafts as 
   reference material or to cite them other than as "work in progress." 
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
Copyright Notice 
    
   Copyright (C) Agilent Technologies (2001). All Rights Reserved. 
    
    
1. Abstract 
    
   The iSCSI working group is currently considering the method of 
   protecting iSCSI messages from errors. This I-D presents a proposal 
   to use the Fletcher-32 checksum algorithm or some variant of it 
   rather than Adler-32; if we must use CRC, to make it CRC32 (or 
   CRC32Q) rather than CRC64. 
    
    
2. Introduction 
    
   The iSCSI I-D proposes using a 64 bit CRC to protect the header and 
   payload of the iSCSI message. The overhead and complexity of a 64 
   bit CRC seems inappropriate for the type of errors from which the 
   iSCSI message needs protection. In fact a 32-bit checksum seems 
   adequate for our purposes. 
 
   This document proposes using a 32 bit Fletcher-32 checksum rather 
   than a 64 bit CRC and offers some justification. 
    

Cavanna                                                       [Page 1] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
3. Brief description of the algorithms CRC-32, Fletcher-32, Adler-32 
    
    
3.1. CRC-32 (of Ethernet and FDDI fame) 
    
   CRC-32 treats the data as binary coefficients of a polynomial in x 
   and divides, using mod2 division, by the CRC polynomial P(x) (after 
   multiplication by x^32 to make room for adding CRC bits to the low 
   order bits of the shifted data without affecting the data). The 
   coefficients of the remainder polynomial are then used as the CRC 
   bits. The CRC register is initialized to a value other than zero to 
   detect extra or missing leading zeroes. The CRC field is actually 
   complemented before being transmitted in order to detect extra 
   trailing zeroes in the case where the transmitted CRC would 
   otherwise be zero.  
    
    
3.2. Fletcher-32 checksum 
    
   Fletcher-32 checksum has two sums, S1 and S2 which are initially 
   zero. The sums can be ones complement (modulo 2^16-1) or twos 
   complement (modulo 2^16). S1 is a 16 bit sum of the data taken 2 
   bytes at a time. S2 is a 16 bit sum of the data multiplied by the 
   position of the data from the end of the packet. No multiplication 
   is actually required in the algorithm. The multiplication effect 
   results from the way the sum is accumulated. S2 accumulates values 
   of S1 after S1 is updated; so a given data word appears multiple 
   times in S2. The number of times a word appears in S2 depends on its 
   position from the end of the packet.  
 
    
3.3. Adler-32 checksum 
    
   Adler-32 checksum has two sums S1 and S2. S1 is a 16 bit sum of the 
   data taken one byte at a time and modulo 65521. S2 is a 16 bit sum, 
   modulo 65521, of the data taken one byte at a time multiplied by its 
   position from the end of the packet, and the sequence length. The 
   sequence length enters the picture because of the initial value of 1 
   for S1. No multiplication is actually required in the algorithm. The 
   multiplicative effect results from the way the sum is accumulated as 
   described above in section 3.2. Adler-32 is essentially the same as 
   Fletcher-32 except S1 is initialized to 1 instead of 0 and the sum 
   is modulo 65521. Another difference which has implications on the 
   burst detection capability of Adler-32, as I explain later, is that 
   Adler-32 is defined to sum data one byte at a time whereas Fletcher-
   32 is defined to handle data 2 bytes at a time. 
    

Cavanna                                                       [Page 2] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
4. What a CRC-n polynomial, P(x), can protect from? 
    
   Assuming n check bits, we can claim the following error detection 
   properties without placing much restrictions on the form of P(x). 
    
    
4.1. Individual bit errors 
    
  o All single bit errors provided p(x) has two or more terms and 
     regardless of record length. 
    
  o All double bit errors provided p(x) has a factor with 3 or more 
     terms and record length less than the period of p(x).  
    
  o The period of P(x) depends on its form (specifically the cycle 
     length of the primitive polynomial that is usually one of the 
     factors) and for most standard polynomials is within a factor of 4 
     of 2^n-1. Even for n=32 this number is much larger than the record 
     length (4.3 billion bits). 
    
  o All triple bit errors and odd number of errors provided p(x) 
     contains the factor x^c+1  where c is an integer and regardless of 
     record length. Most CRC polynomials (but not CRC-32) contain the 
     factor x+1. This makes the hamming distance 4! 
    
  o Most occurrences of more than three single bit errors regardless 
     of record length. 
    
   Thus most CRCs, with few restrictions on their form, will detect all 
   possible 3 bit errors (Hamming distance is 4) in practical record 
   lengths, and most occurrences of more than three single bit errors.  
    
   CRC-32 used in Ethernet and FDDI and that I refer to in this 
   document is prime and does not have x+1 as a factor. The implication 
   is that it does not detect ALL odd number of errors (in particular 
   3) regardless of record length.  
    
    
Cavanna                                                       [Page 3] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
4.2. Burst errors 
    
  o All single bursts of length up to and including n, regardless of 
     record length, provided P(x) has an x^0 term. A burst is a range 
     of bits in which at least the first and last bit are in error. The 
     size of the burst is the number of bits including the first and 
     last bits. 
    
  o 1-1/2^(n-1) * 100 % of single bursts of length n+1 if all errors 
     patterns are equally probable. The only pattern that is undetected 
     is that which corresponds to the coefficients of P(x).  
    
  o 1-1/2^(n) * 100 % of single bursts of length greater than n+1 if 
     all errors patterns are equally probable. For n=32 this is 
     99.9999999767% of all single bursts. 
    
  o If P(x) has (x^c+1) as a factor it will also detect double bursts 
     such that the length of the shorter burst does not exceed the 
     degree of p(x) and the sum of the lengths of the two bursts does 
     not exceed c+1, provided the record length including check bits 
     does not exceed the period of p(x). A double burst consists of two 
     bursts separated by an arbitrary number of bits.  
    
  o Most of remaining bursts. 
    
    
   By picking a polynomial with a factor x+1 we guarantee a hamming 
   distance of 4 (without this term hamming distance is at most 3 as 
   long as P(x) has two or more terms and a factor with 3 or more 
   terms). By picking the other factor to be a primitive poly we 
   maximize the cycle length and thus increase our coverage of errors 
   for which the coverage depends on cycle length, such as for double 
   bit errors and for burst detection. Burst detection allows more bits 
   to change than the hamming distance would imply but the bits must be 
   close together. 
    
   If all error patterns are equally probable the form of the 
   polynomial is unimportant and only the number of check bits matters. 
    

Cavanna                                                       [Page 4] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
5. CRC-64 is overkill 
    
   Protection length increases to 2^64 or 1.8 x10^19 bits. This 
   increased protection length is not needed for iSCSI. The fraction of 
   errors that are detected when all errors are equally probable is 1-
   1/2^64. When all errors are equally probable the fraction detected 
   by CRC-32, 1-1/2^32 or 99.9999999767%, is large enough. The size of 
   single burst that is always detected is 64 whereas for CRC-32 it is 
   32 that is, CRC-64 will catch ALL single bursts of size 64 whereas 
   CRC32 will only catch MOST such bursts. Many more errors are, of 
   course, detected by CRC64 than by CRC32. Note that a checksum with 
   64 bits will detect the same fraction as CRC-64 ,1-1/2^64 û when all 
   errors are equally probable [ref3]. 
    
   Since we want to detect errors in buses that are wider than 64 bits 
   the burst protection of CRC-64 is not sufficient to guarantee 100% 
   detection. 
    
    
6. What Fletcher-32/Adler-32 Checksum can protect from? 
    
   Assume 2s complement sum. We can claim the following protection [ref 
   1]: 
    
  o All single bit errors. 
    
  o For Fletcher-32 and for Adler-32 modified to handle 2 bytes at 
     once, all single bursts of less than 31 bits (possibly 32 for 
     Adler-32). 
    
  o Most other errors. 
    
  o 1-1/2^32*100 % of all errors when all errors are equiprobable. 
     This is the same as CRC-32 [ref 3] 
    
  o Adler-32 checks the sequence length so a separate check is not 
     necessary. 
    

Cavanna                                                       [Page 5] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
7. Adler vs. Fletcher for iSCSI. 
    
   The only comparison (RFC-1950) I have seen between Fletcher and 
   Adler involved Fletcher-16 (ones complement version) vs. Adler-32 
   and Adler-32 seemed better but, for the type of errors we are trying 
   to protect from, is primarily due to the increase in the number of 
   check bits rather than the change in algorithm û my opinion. If we 
   compare Adler-32 to 32 bit Fletcher there are only two differences, 
   both of which are minor in the context of error protection for 
   iSCSI. Adler-32 differs in: 
    
  o Initial value for S1 is 1 to make the length of the sequence part 
     of S2 so extra or missing bytes with a value of zero will produce 
     a different checksum. Since the length of the data record is 
     easily verifiable by other means this is not a significant 
     benefit. 
    
  o Modulo is 65521 (a prime number) to avoid (according to RFC 1950) 
     a possible large class of two-byte errors from going undetected. 
     RFC 1950 was referring to ones complement Fletcher-16 which has 
     the blind spot for 16 bit bursts due to its two representations 
     for zero which produce the same checksum. Ref 1 also claims that 
     Fletcher-16 guarantees detection of only 15 bits. My belief is 
     that Fletcher-32 detects all bursts of 31 bits and may have a 
     similar blind spot as Fletcher-16, but for 4-byte errors. 
     Presumably Adler-32, which handles one data byte at a time, 
     guarantees detection of all 2-byte bursts but I am not sure. 
     Presumably Adler-32 if modified to handle two bytes at a time 
     would catch all bursts of 32 bits. Again I am not positive of this 
     but will confirm and report my conclusion. 
    
    
Cavanna                                                       [Page 6] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
8. Example of errors that are not detected by Adler-32, Fletcher-32, 
 CRC-32. 
    
    
8.1. [Ref2] shows some patterns that Fletcher and Adler will not detect 
  but that CRC-32 will detect by virtue of its single burst detection 
  capability. Adler-32 will not detect a change in 3 consecutive bytes 
  from x,y,z to xÆ,yÆ,zÆ when the bytes are related as follows: 
    
   zÆ-z= -(xÆ-x)-(yÆ-y) and 2xÆ+yÆ=2x+y 
    
   For example the 3 byte sequence consisting of 4,2,1 is changed to 
   5,0,2 and the checksum is the same. 
    
   This is only a 3 byte burst. Why is it not detected by Adler-32 when 
   Fletcher-32 detects all bursts of 31? Because Adler-32 has been 
   defined to handle one byte of data at a time. In this respect Adler-
   32 is inferior to Fletcher-32 which is defined to handle 2-bytes of 
   data at a time. I suppose Adler-32 can easily be modified to handle 
   2 bytes of data at a time. 
    
   Note that this change involves 5 bits and requires invocation of the 
   burst detection (since the bits are close together) properties of 
   the Adler-32 checksum as it would of CRC-32 which has a hamming 
   distance as low as 4. 
     
8.2. CRC-32 will not detect any error pattern consisting of its 
  coefficients nor any multiple of it (sum of shifted versions). The 
  shortest such pattern is 33 bits.  
    
   I should point out that the patterns that get by Adler-32 may be 
   easily expressed in a short equation but are not necessarily more 
   likely to occur than the patterns that get through the CRC-32. 
    

Cavanna                                                       [Page 7] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
9. Classes of errors to protect against 
    
    
   Note that iSCSI is not the primary level of protection. In 
   particular we do not need to protect from link type of errors. 
    
  o Errors introduced at layers below iSCSI by sending or receiving 
     node. 
    
  o Software errors (buffer mismanagement). 
    
  o DMA hardware that fails to DMA all the data or DMAs more data than 
     it should, or corrupts the data. 
    
  o Errors introduced by middle boxes that recomputed checksums for 
     layers below iSCSI after introducing intentional and perhaps 
     unintentional changes. 
    
  o Unintentional changes in middle boxes and end nodes 
    
  o Memory errors (random or pattern dependent) 
    
  o Bus errors (widths: 16,32,64,128) 
    
  o Fifo overruns and underruns 
    
  o Summarizing probable error patterns 
    
  o Missing data in chunks of 16, 32, 64, 128 bits 
    
  o Extra data in chunks of 16, 32, 64, 128 bits 
    
  o Changed data in chunks of 1, 16, 32, 64, 128 bits 
    
  o Periodic single bit errors every 16 or 32 or 64 or 128 bits. 
    
    
Cavanna                                                       [Page 8] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
10.     How good is our coverage of such errors? 
    
  o When a single chunk is modified a 32 bit checksum and a 32 bit CRC 
     both provide complete coverage for chunk sizes less than 32 bits. 
     CRC-32 provides complete coverage for chunk sizes of 32. 
    
  o CRC-64 provides complete coverage up to chunk size of 64 bits. 
    
    
  o None provide complete coverage for chunk sizes of greater than 64 
     bits. 
    
  o For single bit errors all provide complete coverage. 
    
    
  o CRC-32 provides complete coverage for 2 and good coverage for 3 
     single bit errors.  
    
    
11.     Comparing CRC with Checksum 
    
  o CRC algorithm uses division rather than addition. 
    
  o A small change in input affects many bits of the CRC. 
    
  o The number of bits is most important. 
    
  o The form is not so important for the types of errors that we need 
     to detect 
    
  o CRCs are more sensitive to the bursty nature of communication line 
     noise and will detect ALL bursts shorter than the size of the CRC.  
     This is not the job of the iSCSI CRC but of the frame level CRC. 
     The ones complement sum is not insensitive to these error 
     patterns, it is just not especially sensitive to them. The extra 
     sensitivity of a CRC to burst errors comes at the expense of 
     lessened sensitivity to other bit pattern errors since the total 
     fraction of errors detected remains the same. See Ref [3] 
    
  o CRC is more likely to detect error patterns displaying regularity 
     such as periodic single bit errors but this comes at the price of 
     decreased sensitivity to other types of errors since the total 
     fraction of error patterns detected, when all are equiprobable, 
     for a given number of check bits is the same for CRC or a 
     checksum. 
    
  o For small enough number of errors CRC is guaranteed to catch ALL 
     patterns as explained in a previous section and thus, if error 
     patterns with small number of errors are more common, then CRC 
     provides better error coverage. Again, this comes at the expense 
     of decreased sensitivity to other error patterns although such 
     patterns tend to be irregular and presumably less common. 
 
 
Cavanna                                                       [Page 9] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
  o CHECKSUM, by contrast has more holes for error patterns with small 
     number of errors, but CHECKSUM compensates by catching errors that 
     CRC will miss. Again, the total fraction of error patterns 
     detected only depends on the number of check bits [ref3]. 
    
  o In a physical link where the primary source of errors is  random 
     noise it is very attractive to use an error protection scheme 
     which guarantees complete protection for the most common errors 
     which are single, double and triple individual bits or bursts 
     (because of group encoding). For example if the bit error rate is 
     low enough that the probability of 3 or more bits in error within 
     a data record is so low that, at the link bit rate, the 
     probability of 3 or more bits is negligible, then the CRC offers 
     practically complete protection. However, to reiterate, it is the 
     function of the link CRC to provide this protection, not the iSCSI 
     message check sequence. 
    
  o For group-encoded data, detecting all occurrences of 2 bits or 
     less implies detecting all occurrences of two bursts of size that 
     depends on the group encoding. For example for 8B:10B encoding, 
     the bursts are 8 bits wide. CRCs can be easily designed to 
     guarantee 100% protection from 2 bursts of 8 bits or less 
     occurring anywhere within a record (of practical length). In fact 
     CRC-32 will catch all double bursts of size 9 for code length up 
     to 13000 [ref4]. This is not as easy to implement with checksums. 
    
  o We need to characterize the types of errors that we need to 
     protect from and confirm that CHECKSUM provides sufficient 
     protection. 
    
    
12.     Checksum partials easier to handle 
    
   When an iSCSI message is composed of multiple blocks that arrive at 
   different time and perhaps out of sequence, the receiver needs to 
   accumulate partial digests (CRC or checksum) from the blocks. 
    
   Combining the partial digests appears to be significantly cheaper 
   and faster for checksums than for CRCs. 
    
    
Cavanna                                                      [Page 10] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
13.     Recommendation 
 
   Fletcher-32 is my first choice. It is cheaper to implement than CRC-
   32 and probably provides adequate protection from the expected 
   errors. Unless Adler-32 is modified to handle 2 bytes at a time 
   Fletcher-32 has superior single burst detection performance. There 
   is not sufficient benefit in Adler-32 to justify the additional 
   complexity of the mod 65521 arithmetic.  
    
   CRC-32 is my second choice. There is not sufficient benefit in CRC-
   64 to justify the additional complexity and overhead. 
 
    
Cavanna                                                      [Page 11] 

Internet-Draft         iSCSI û CRC or Checksum             March 2001 
 
 
14.     References 
    
   [1] Performance of Checksums and CRCÆs over Real Data by Jonathan 
   Stone, Michael Greenwald, Craig Partridge. IEEE/ACM Transactions on 
   Networking, Vol. 6, no. 5, October 1998 
    
   [2] On Fletcher and Adler codes, and classic CRCs. email from Dr. 
   Dafna Sheinwald from the IBM Haifa Research Lab to iSCSI group via 
   Julian Satran of IBM circa 1/25/01. 
    
   [3] FITS Checksum Proposal by R.L. Seaman and W.D. Pence 24 August 
   1995. 
   http://rosat.gsfc.nasa.gov/docs/heasarc/ofwg/docs/general/checksum/n
   ode18.html 
    
   [4] Toru Fujiwara, Tadao Kasami, Shu Lin. Error Detecting 
   Capabilities of the Shortened Hamming codes adapted for error 
   detection in IEEE standard 802.3. IEEE Transactions on 
   Communications, vol. 37, no.9, 1989, pp. 986-989 
    
 
15.     Authors' Addresses 
    
   Vicente Cavanna 
   1101 Creekside Ridge Drive Suite 100 
   Roseville, CA 95661-9051 
   USA 
   Phone: 916-788-5797 
   Email: vince_cavanna@agilent.com 
    
   Matt Wakeley 
   1101 Creekside Ridge Drive Suite 100 
   Roseville, CA 95661-9051 
   USA 
   Phone: 916-788-5670 
   Email: matt_wakeley@agilent.com 
    
    
   Expires September 2001 


Cavanna                                                      [Page 12]