Internet Draft                                               V. Cavanna 
<draft-cavanna-iscsi-crc-vs-cksum-00.txt>          Agilent Technologies 
Expires August 2001                                       February 2001 
                                                                        
 
                    iSCSI Digests û CRC or Checksum? 
 
 
Status of this Memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents 
   at any time.  It is inappropriate to use Internet- Drafts as 
   reference material or to cite them other than as "work in progress." 
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
Copyright Notice 
    
   Copyright (C) Agilent Technologies (2001). All Rights Reserved. 
    
    
1. Abstract 
    
   The iSCSI working group is currently considering the method of 
   protecting iSCSI messages from errors. This I-D presents a proposal 
   to use the Fletcher-32 checksum algorithm. 
    

Cavanna                                                       [Page 1] 

Internet-Draft         iSCSI û CRC or Checksum          February 2001 

2. Introduction 
    
   The iSCSI I-D proposes using a 64 bit CRC to protect the header and 
   payload of the iSCSI message. The overhead and complexity of a 64 
   bit CRC seems inappropriate for the type of errors that the iSCSI 
   message needs protection from.  
 
   This document proposes using a Checksum rather than a CRC and with 
   only 32 bit instead of 64 and attempts to justify the reasoning 
   behind the proposal. 
    

Cavanna                                                       [Page 2] 

Internet-Draft         iSCSI û CRC or Checksum          February 2001 

3. Brief description of the algorithms CRC-32, Fletcher-32, Adler-32 
    
3.1. CRC-32 
    
   CRC-32 treats the data as binary coefficients of a polynomial in x 
   and divides using mod2 division by the CRC polynomial P(x) (after 
   multiplication by x^32 to make room for adding CRC bits to the low 
   order bits of the shifted data without affecting the data). The 
   remainder polynomial has the coefficients of the CRC. 
    
    
3.2. Fletcher-32 checksum 
    
   Fletcher-32 checksum has two sums, S1 and S2. Sums can be ones 
   complement (modulo 2^16-1) or twos complement (modulo 2^16). S1 is a 
   16 bit sum of the data taken 2 bytes at a time. S2 is a 16 bit sum 
   of the data multiplied by the position of the data from the end of 
   the packet. No multiplication is actually required in the algorithm. 
   The multiplication effect results from the way the sum is 
   accumulated. S2 accumulates values of S1 after S1 is updated; so a 
   given data word appears multiple times in S2. The number of times a 
   word appears in S2 depends on its position in the data stream. 
 
    
3.3. Adler-32 checksum 
    
   Adler-32 checksum has two sums S1 and S2. Adler-32 is essentially 
   the same as Fletcher-32 except S1 initialized to 1 and the sum is 
   modulo 65521. S1 is a 16 bit sum modulo 65521 of the data taken 2 
   bytes at a time and with the sum initialized to one instead of zero. 
   S2 is a 16 bit sum modulo 65521 of the data, multiplied by its 
   position from the end of the packet, and the sequence length 
   (because of the initial value of 1 for S1). 
   No multiplication is actually required in the algorithm. The 
   multiplication effect results from the way the sum is accumulated. 
    
4. What a CRC-n polynomial, P(x), can protect from? 
    
   Assuming n check bits we can claim the following error detection 
   properties without placing much restriction on the form of P(x). 
    
4.1. Individual bit errors 
    
4.1.1. All Single bit errors provided p(x) has two or more terms and 
     regardless of record length. 
    
4.1.2. All Double bit errors provided p(x) has factor with 3 or more 
     terms and record length less than the period of p(x). 
    
4.1.2.1.  The period of P(x) depends on its form and for most standard 
      polynomials is within a factor of 4 of 2^n-1. Even for n=32 this 
      number is much larger than the record length (4.3 billion bits). 
    

Cavanna                                                       [Page 3] 

Internet-Draft         iSCSI û CRC or Checksum          February 2001 

4.1.3. All triple bit errors and odd number of errors provided p(x) 
     contains the factor 1+x^c  where c is an integer and regardless of 
     record length. Most CRC polynomials contain the factor 1+x. 
    
    
4.1.4. Most occurrences of more than three single bit errors. 
    
4.1.5. Thus most CRCs, with few restrictions on their form, will detect 
     all possible 3 bits in error in practical record lengths and most 
     occurrences of more than three single bit errors. 
    
4.2. Burst errors 
    
4.2.1. All single bursts of length up to n (size of CRC) regardless of 
     record length. 
    
4.2.2. 1-1/2^(n-1) * 100 % of single bursts of length n+1 if all errors 
     patterns are equally probable. 
    
4.2.3. 1-1/2^(n) * 100 % of single bursts of length greater than n+1 if 
     all errors patterns are equally probable. For n=32 this is 
     99.9999999767% of all single bursts. 
    
4.2.4. If P(x) has (x^c+1) as a factor it will detect single, double 
     and triple bit errors and double bursts such that length of 
     shorter burst does not exceed degree of p(x) and the sum of two 
     bursts lengths does not exceed c+1, provided the record length 
     including check bits does not exceed the period of p(x). 
    
4.2.5. The majority of remaining bursts. 
    
    
Cavanna                                                       [Page 4] 

Internet-Draft         iSCSI û CRC or Checksum          February 2001 

5. CRC-64 is overkill 
    
   Protection length increases to 2^64 or 1.8 x10^19 bits. This 
   increased protection length is not needed for iSCSI. The fraction of 
   errors that are detected when all errors are equiprobable is 1-
   1/2^64. When all errors are equiprobable the fraction detected by 
   CRC32, 99.9999999767%, is large enough. The size of single burst 
   that is always detected is 64 whereas for CRC32 it is 32 i.e. CRC64 
   will catch ALL single bursts of size 64 whereas CRC32 will only 
   catch MOST such bursts. Lots more errors are, of course, detected by 
   CRC64 than CRC32. When all errors are equiprobable the fraction of 
   errors detected is 1-1/2^64 vs 1-1/2^32 for CRC-32. This number was 
   99.9999999767% for CRC-32 and is much higher for CRC-64. Note that a 
   checksum with 64 bits will detect the same fraction û when all 
   errors are equally probable [ref3]. 
    
6. What Fletcher-32/Adler-32 Checksum can protect from? 
    
   Assume 2s complement sum. We can claim the following protection [ref 
   1]. 
    
6.1. All single bit errors. 
    
6.2. All single bursts of less than 32 bits. 
    
6.3. Most other errors. 
    
6.4. 1-1/2^32*100 % of all errors when all errors are equiprobable. 
  This is the same as CRC-32 [ref 3] 
    
7. Why Adler checksum is not much better than Fletcher checksum for 
 iSCSI 
    
  Adler-32 is actually better than Fletcher-16 but primarily due to the 
  increase in the number of check bits. But if we compare to 32 bit 
  Fletcher there are only two differences both of which are minor in 
  the context of error protection for iSCSI: 
    
7.1. Initial value for S1 is 1 to make length of sequence part of S2 so 
  extra or missing zero bytes will produce different checksum. Since 
  the length of the data record is easily verifiable by other means 
  this is not a significant benefit. 
    
7.2. Modulo is 65521 (a prime number) to avoid a possible large class 
  of two-byte errors from going undetected. It is not clear that the 
  type of errors iSCSI needs to protect from falls into this class. I 
  suspect not. 
    
8. Example of errors that are not detected 
    
8.1. [Ref2] shows some patterns that Fletcher and Adler will not 
  detect.  
    

Cavanna                                                       [Page 5] 

Internet-Draft         iSCSI û CRC or Checksum          February 2001 

8.2. Similar patterns can easily be determined for CRC-32. They will 
  likely be more complicated patterns. Next week I will publish an 
  equation similar to that in [Ref 2]. 
    
9. Classes of errors to protect against 
    
  Note that iSCSI is not the primary level of protection. 
    
9.1. Errors introduced at layers below iSCSI by sending or receiving 
  node. 
    
9.1.1. Software errors (buffer mismanagement). 
    
9.1.2. DMA hardware that fail to DMA all the data or DMA more data than 
     they should or corrupt the data. 
    
9.2. Errors introduced by middle boxes that recomputed checksums for 
  layers below iSCSI after introducing intentional and perhaps 
  unintentional changes. 
    
9.3. Unintentional changes in middle boxes and end nodes 
    
9.3.1. Memory errors (random and pattern dependent) 
    
9.3.2. Bus errors (widths: 16,32,64,128) 
    
9.3.3. Fifo overruns and underruns 
    
9.4. Summarizing probable error patterns 
    
9.4.1. Missing data in chunks of 16, 32, 64, 128 bits 
    
9.4.2. Extra data in chunks of 16, 32, 64, 128 
    
9.4.3. Changed data in chunks of 1, 16, 32, 64, 128 
    
9.4.4. Periodic single bit errors every 16 or 32 or 64 or 128 bits. 
    
10.     Comparing CRC with Checksum 
    
10.1.   CRC is Division rather than addition. 
    
10.2.   Small change in input affects many bits of the CRC. 
    
10.3.   Number of bits is most important 
    
10.4.   The form is not so important for the types of errors that we 
  need to detect 
    
10.5.   CRCs are more sensitive to the bursty nature of communication 
  line noise and will detect ALL bursts shorter than the size of the 
  CRC.  This is not the job of the iSCSI CRC but of the frame level 
  CRC. The ones complement sum is not insensitive to these error 
  patterns, it is just not especially sensitive to them. The extra 
  sensitivity of a CRC to burst errors comes at the expense of lessened 
Cavanna                                                       [Page 6] 

Internet-Draft         iSCSI û CRC or Checksum          February 2001 

  sensitivity to other bit pattern errors since the total fraction of 
  errors detected remains the same. See Ref [3] 
    
10.6.   CRC is more likely to detect error patterns displaying 
  regularity such as periodic single bit errors but this comes at the 
  price of decreased sensitivity to other types of errors since the 
  total fraction of error patterns detected, when all are equiprobable, 
  for a given number of check bits is the same for CRC or CHECKSUM. 
    
10.7.   For small enough number of errors CRC is guaranteed to catch 
  ALL patterns as explained in a previous section and thus, if error 
  patterns with small number of errors are more common, then CRC 
  provides better error coverage. Again, this comes at the expense of 
  decreased sensitivity to other error patterns although such patterns 
  tend to be irregular and presumably less common. 
    
10.8.   CHECKSUM, by contrast has more holes for error patterns with 
  small number of errors but CHECKSUM compensates by catching errors 
  that CRC will miss. Again, the total fraction of error patterns 
  detected only depends on the number of check bits [ref3]. 
    
10.9.   In a physical link with a certain bit error rate and random 
  errors it is certainly true that error patterns with small number of 
  errors are more probable and there is great attraction to use a CRC 
  that can guarantee 100% protection from all error patterns that have 
  a significant probability of occurring. For example if the bit error 
  rate is low enough that the probability of 3 or more bits in error 
  within a data record is so low that, at whatever bit rate the link 
  runs, the probability of 3 or more bits is negligible then the CRC 
  offers practically complete protection. Again, it is the function of 
  the link CRC to provide this protection. 
    
10.10.  For group-encoded data, detecting all occurrences of 2 bits or 
  less implies detecting all occurrences of two bursts of size that 
  depends on the group encoding. For example for 8B:10B encoding the 
  bursts are 10 bits wide. CRCs can be easily designed to guarantee 
  100% protection from 2 bursts of 10 bits or less occurring anywhere 
  within a record (of practical length). 
    
10.11.  We need to characterize the types of errors that we need to 
  protect from and confirm that CHECKSUM provides sufficient 
  protection. 
    
    
11.     Checksum partials easier to handle 
    
        When an iSCSI message is composed of multiple blocks that 
        arrive at different time; perhaps out of sequence the receiver 
        needs to accumulate partial digests (CRC or checksum) from the 
        blocks. 
            
        Combining the partial digests appears to be significantly 
        easier (faster) for checksums than for CRCs. 
    
12.     Recommendation 
Cavanna                                                       [Page 7] 

Internet-Draft         iSCSI û CRC or Checksum          February 2001 

    
        Fletcher-32 is first choice. There is not sufficient benefit in 
        Adler-32 to justify the additional complication of the mod 
        65521 arithmetic. 
            
        CRC-32 is second choice. There is not sufficient benefit in 
        CRC-64 to justify the additional complexity and overhead. 
    
13.     References 
    
        [1] Performance of Checksums and CRCÆs over Real Data by 
        Jonathan Stone, Michael Greenwald, Craig Partridge. IEEE/ACM 
        Transactions on Networking, Vol. 6, no. 5, October 1998 
            
        [2] Memo from Dr. Dafna Sheinwald from the IBM Haifa Research 
        Lab to iSCSI group via Julian Satran of IBM. 
            
        [3] FITS Checksum Proposal by R.L. Seaman and W.D. Pence 24 
        August 1995. 
        http://rosat.gsfc.nasa.gov/docs/heasarc/ofwg/docs/general/check
        sum/node18.html 
 
 
14.     Authors' Addresses 
    
   Vicente Cavanna 
   1101 Creekside Ridge Drive Suite 100 
   Roseville, CA 95661 
   USA 
   Phone: 916-788-5797 
   Email: vince_cavanna@agilent.com 
    
   Matt Wakeley 
   1101 Creekside Ridge Drive Suite 100 
   Roseville, CA 95661 
   USA 
   Phone: 916-788-5670 
   Email: matt_wakeley@agilent.com 
    
    
   Expires August 2001 


Cavanna                                                       [Page 8]