
    RE: iSCSI ERT: data SACK/replay buffer/"semi-transport"



    Steph,
    
    Not to beat a dead horse, but here is why link-level CRCs may not be of
    much help.
    
    The paper "When the CRC and TCP Checksum Disagree" section 5.1 describes the
    data transmission path and potential for error introduction at various
    points in the path.
    
    At a layer 3 device, upon receipt of a frame, you have:
    
    1. The existing link-level CRC is verified and stripped.
    
    2. The payload (IP packet) is DMA'ed into some buffers, preserving the
    original IP header checksum and TCP checksum.
    
    3. A new link-level header is created.
    
    4. A new CRC is computed.
    
    5. The data is sent to the next hop.
    
    If an error is introduced (by software or hardware) in step 2 or 3, the new
    CRC computed in step 4 is of no help, because it is computed over the
    already corrupted data. The introduced error can be:
    
    1. In the IP header (for example, the IP address bytes were munged).
    
    2. In the TCP header (for example, the port got corrupted).
    
    3. In the TCP checksum itself.
    
    4. In the payload.
    
    Error categories 1 and 2 may cause the packet not to be delivered at all. It
    is okay if we do not detect these, because such packets are not delivered to
    the iSCSI processing layer. Error 3 would cause the packet to be rejected.
    Error 4 should normally be caught by the TCP checksum, but at an escape rate
    of 1 in 10e8 it escapes detection. (Actually, given the error bias toward
    the headers, I'm not sure whether this rate is the rate within the payload
    of a TCP segment.) The iSCSI header and data digests are present to detect
    that escape.
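
The exposure described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not a model of any real router:
zlib.crc32 stands in for both the link-level CRC and the iSCSI digest, the
TCP-style checksum is computed over the payload alone (the real one also covers
the TCP header and a pseudo-header), and forward() and swap_words() are
hypothetical names. The fault chosen, two 16-bit words exchanged in the buffer,
is one that a ones'-complement sum cannot see because the sum does not depend
on word order.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def add_link_crc(packet: bytes) -> bytes:
    """Append a 4-byte CRC-32 as a stand-in for the link-level CRC."""
    return packet + zlib.crc32(packet).to_bytes(4, "big")


def strip_link_crc(frame: bytes) -> bytes:
    """Step 1: verify the link-level CRC and strip it."""
    packet, crc = frame[:-4], frame[-4:]
    if zlib.crc32(packet).to_bytes(4, "big") != crc:
        raise ValueError("link-level CRC mismatch")
    return packet


def forward(frame: bytes, corrupt=None) -> bytes:
    """Steps 1-5 at a hypothetical layer 3 hop."""
    packet = strip_link_crc(frame)   # step 1
    if corrupt is not None:
        packet = corrupt(packet)     # error introduced in step 2 or 3
    return add_link_crc(packet)      # steps 4 and 5: fresh CRC, send on


def swap_words(packet: bytes) -> bytes:
    """Hypothetical buffer fault: exchange the first two 16-bit words."""
    return packet[2:4] + packet[0:2] + packet[4:]


payload = b"iSCSI data block"
tcp_sum = internet_checksum(payload)  # computed by the sender
digest = zlib.crc32(payload)          # end-to-end digest stand-in

# The receiver's link-level check passes: strip_link_crc() does not raise.
received = strip_link_crc(forward(add_link_crc(payload), swap_words))

tcp_sum_passed = internet_checksum(received) == tcp_sum
digest_passed = zlib.crc32(received) == digest
print(received != payload, tcp_sum_passed, digest_passed)
```

In this run the corrupted packet arrives with a valid link-level CRC and a
matching TCP-style checksum, and only the end-to-end digest rejects it.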
    
    In the presence of middle boxes that do more than layer 2 forwarding (say, a
    box that terminates a TCP connection and initiates a new one), if the middle
    box retains the iSCSI header and data digests and only computes a new TCP
    checksum, the transmission path exposure is similar to steps 2 and 3 above.
    The header and data digests will enable detection of such errors.
    
    If the middle box does more than just terminate TCP connections, and changes
    the iSCSI header, recomputes the iSCSI header digest, and leaves the data
    digest alone, then at least the data part is protected, but not the header.
    If it changes both header and data, there is no protection. To get true
    end-to-end protection, the application needs to apply a separate digest,
    such as creating a 516-byte data block for every 512-byte sector of data
    and storing that on the media.
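
A minimal sketch of the 516-byte block idea follows. The message does not say
what the 4 extra bytes contain, so treating them as a CRC-32 of the sector
(via zlib.crc32) is an assumption, and seal_sector() and open_block() are
hypothetical names.

```python
import zlib

SECTOR_SIZE = 512
DIGEST_SIZE = 4
BLOCK_SIZE = SECTOR_SIZE + DIGEST_SIZE  # 516 bytes stored per sector


def seal_sector(sector: bytes) -> bytes:
    """Application side: append a digest before the data leaves the host."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    return sector + zlib.crc32(sector).to_bytes(DIGEST_SIZE, "big")


def open_block(block: bytes) -> bytes:
    """Application side: verify the digest after reading the block back."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected a 516-byte block")
    sector, stored = block[:SECTOR_SIZE], block[SECTOR_SIZE:]
    if zlib.crc32(sector).to_bytes(DIGEST_SIZE, "big") != stored:
        raise ValueError("sector digest mismatch")
    return sector


block = seal_sector(bytes(SECTOR_SIZE))
assert open_block(block) == bytes(SECTOR_SIZE)

# Flip one bit anywhere along the path; the application notices on read.
damaged = bytes([block[0] ^ 0x01]) + block[1:]
try:
    open_block(damaged)
    detected = False
except ValueError:
    detected = True
```

Because the digest is created and checked by the application, it survives any
middle box that rewrites the iSCSI header or data digests along the way.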
    
    So, the escape rate depends quite a bit on the number of middle boxes and
    the exposure of data paths. How much do we rely on middle boxes to never
    introduce an error during the exposure? Since the referenced papers suggest
    that TCP segments are being delivered end to end with checksum errors in
    them, exposed paths in the middle boxes have been a factor. Still, the
    rates quoted (1 in 200 million or 1 in 300 million) suggest that it is
    necessary to have very strong CRC and detection mechanisms, but that it may
    not be necessary to optimize the recovery options so that we recover with
    the smallest amount of retransmitted data.
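
To put the quoted rates in perspective, here is a rough back-of-the-envelope
calculation. Everything except the two rates is an assumption: the rates are
read as undetected errors per TCP segment (the message does not state the
unit), each segment is taken to carry 1460 bytes, and the transfer size is the
terabyte backup mentioned in the quoted message below.

```python
TERABYTE = 10**12        # assumed transfer size in bytes
SEGMENT_PAYLOAD = 1460   # assumed payload bytes per TCP segment


def expected_escapes(total_bytes: int, segment_payload: int,
                     escape_rate: float) -> float:
    """Expected number of undetected errors over one transfer."""
    segments = -(-total_bytes // segment_payload)  # ceiling division
    return segments * escape_rate


for one_in in (200e6, 300e6):
    n = expected_escapes(TERABYTE, SEGMENT_PAYLOAD, 1 / one_in)
    print(f"1 in {one_in:.0e}: about {n:.1f} undetected errors per terabyte")
```

Under these assumptions a terabyte transfer sees a small number of escapes:
often enough that detection matters, rarely enough that a coarse recovery
scheme costs little.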
    
    I haven't studied the two other references on the subject, but again I
    suspect there is evidence to suggest that errors will creep in at
    intermediate processing elements.
    
    Venkat Rangan
    Rhapsody Networks Inc.
    http://www.rhapsodynetworks.com
    
    
    -----Original Message-----
    From: Stephen Bailey [mailto:steph@cs.uchicago.edu]
    Sent: Monday, April 09, 2001 11:57 AM
    To: ips@ece.cmu.edu
    Subject: Re: iSCSI ERT: data SACK/replay buffer/"semi-transport" 
    
    
    > Exactly, I've worked in this context (though it's been some years now).
    > It was true (at one time) that tape had a tractability limit, e.g.,
    > a tape backup of a terabyte was out of the question.  Has that changed?
    
    I think this is precisely the point.  Existing, off-the-shelf SCSI
    solutions DO NOT presently solve this problem.  Both ||SCSI and FCP
    burp the operation at an expectable, O(days) failure rate.  The rate of
    adoption for the FCP-2 command recovery feature is overwhelming to the
    point that the tape guys have been talking about end-running the
    problem with explicitly addressed commands.
    
    What we have running iSCSI on TCP is such a drastic improvement in
    what you can expect from your SCSI service that we can eventually
    expect a disruptive change.  Trying to engineer it to the point where
    it's 2^100 times more disruptive, when we don't really know where it's
    taking us in the first place, is meaningless.
    
    [Warning: repetition ahead]
    
    TCP + link layer error detection is engineered precisely to ensure
    reliable data delivery.  It's clear from an engineering standpoint
    that it is likely (not guaranteed, what is?) to do this quite well.
    In spite of much research, it seems like nobody here has come up with
    a strong indication that TCP + link layer error detection does NOT do
    its job well.  I do not think this is because nobody has ever looked
    at the problem.
    
    The lack of concrete information to support the case that TCP + link
    layer error detection is inadequate has us chasing our tails.
    
    Given the layer iSCSI occupies in the protocol layer cake, if we don't
    try to solve a problem which is presently assigned to a lower layer, it
    seems quite comfortable to shim in additional checks or recovery, or even
    a completely different transport substrate underneath, if we do discover
    that TCP + link layer error detection is not doing the trick.  But it
    really seems like folly to engineer based upon an assumption that nobody
    has done a good job documenting.
    
    Steph
    



Last updated: Tue Sep 04 01:05:08 2001