
    Re: SNACK and recovery



    > - Does a 16-bit TCP checksum catch enough of
    > the corruption events to make it acceptable to
    > take drastic measures like aborting a backup
    > when a 32 bit CRC fails on a response that
    > made it through the 16 bit checksum?
    
    Absolutely.
    
    Events which create end-to-end integrity check errors are as handily
    caught by the TCP checksum as by a CRC.  Link errors are caught by link
    integrity checks, so those are not for the e2e check to protect
    against.  The remaining errors which are detectable by an e2e check
    have a signature that most any check that's not blind stupid will
    detect.  For example,
    back in the day, VMS's clustering software ran on Ethernet, and there
    were many problems as a result of an early generation Ethernet
    controller (my group...) corrupting data.  So, the VMS folks said, to
    heck with performance, we're going to put a checksum on every cluster
    packet.  Problem absolutely solved.  I don't know what the checksum
    algorithm was, but it was not a CRC.  It was more like the TCP
    checksum.
    
    The TCP checksum escape evidence in the papers seems to be primarily in
    paths which are not actually protected by it (host end points).
    
    Looking at it from the other direction, backups have historically
    always had to handle occasional problems, which has resulted in the
    implementation of high-level recovery mechanisms.
    
    Who can say with absolute certainty, and first-hand experience, that
    there WILL be a high frequency of checksum escapes which don't also
    escape a CRC?  It seems a somewhat unlikely scenario, and my concern
    is that we're making complicated, incremental improvements for
    handling a situation which will not occur.
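
For readers who want to see what "a checksum escape which doesn't also escape a CRC" looks like, here is a minimal Python sketch. It is an illustration only, not anything from the thread: internet_checksum is a hypothetical helper written in the style of the RFC 1071 one's-complement sum, zlib.crc32 stands in for "a 32 bit CRC" (not necessarily the polynomial under discussion), and the payloads are made up. It shows one class of error the 16-bit sum cannot see (two 16-bit words swapped) and says nothing about how often such errors occur in practice.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


original = b"ABCDEFGH"
bit_flip = b"ABCDEFGI"   # last byte differs from the original by one bit
word_swap = b"CDABEFGH"  # first two 16-bit words exchanged


def detected_by(payload: bytes) -> dict:
    """Report which check notices that payload differs from original."""
    return {
        "checksum": internet_checksum(payload) != internet_checksum(original),
        "crc32": zlib.crc32(payload) != zlib.crc32(original),
    }


if __name__ == "__main__":
    print("bit flip: ", detected_by(bit_flip))
    print("word swap:", detected_by(word_swap))
```

Both checks notice the single-bit error. Only the CRC notices the word swap, because addition is order-independent. Whether real paths produce that kind of error often enough to matter is exactly the open question above.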
    
    It would be one thing if there were NO e2e check, or if the e2e check
    also had to protect against link errors, or if the existing e2e check
    were completely trivial, but that is just not the case here.
    
    Steph
    



Last updated: Tue Sep 04 01:05:08 2001
6315 messages in chronological order