
    Re: iSCSI: error recovery



    Mike,
    
    Michael Krause wrote:
    
    > Acknowledgements should not be generated until a responder has received the
    > data and placed it into the fault zone, i.e. the location where if a
    > failure occurs, the session is aborted.  If the NIC generates the
    > acknowledgement, then it should have either delivered it to host memory
    
    If a NIC presents the data to its (PCI) bus, how is it supposed to know that the
    data really made it across possibly multiple bus bridges to memory before it
    sends the "ack"?
    
    > or upon its failure detection, the host will fail the session.
    
    ... but the goal is to NOT fail the session, but rather "fail over" to another NIC
    (hence a different TCP connection on the same session).
    
    Since the NIC presented the data to the (PCI) bus, it did all it could to transfer
    the data.  It has no way of knowing that a bus bridge failed and the data never
    arrived, so it "acked" the data.
    
    Using Julian's current scheme, the initiator will send the "retry" command to the
    target (over the new NIC and TCP connection), but the target thinks it "knows
    what's best" for the initiator and only sends the data it thinks the initiator
    didn't get.
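
    To make that failure case concrete, here is a minimal sketch.  It is a toy model,
    not anything from the iSCSI draft: the function name, its parameters, and the byte
    counts are all hypothetical.  It assumes the target resumes from the last offset
    the NIC acked, while host memory actually holds less than that.

```python
def hole_after_retry(total, in_host_memory, acked_by_nic, restart_from_beginning):
    """Return the (start, end) byte range still missing in host memory
    after the retry, or None if the transfer is complete.

    Hypothetical model: the first `in_host_memory` bytes really arrived,
    but the NIC acked `acked_by_nic` bytes before the bus bridge failed.
    """
    # A partial retry resumes where the target believes the initiator stopped;
    # a full retry resends the whole transfer from offset 0.
    resend_start = 0 if restart_from_beginning else acked_by_nic
    resend_start = min(resend_start, total)

    # Anything between what memory really holds and where the resend
    # starts is never delivered.
    if resend_start > in_host_memory:
        return (in_host_memory, resend_start)
    return None


# Assumed example: 8 KB read, 2 KB reached memory, NIC acked 4 KB.
partial = hole_after_retry(8192, 2048, 4096, restart_from_beginning=False)
full = hole_after_retry(8192, 2048, 4096, restart_from_beginning=True)
print(partial, full)
```

    In this model the partial retry leaves a gap covering the bytes that were acked
    but never arrived, while restarting from the beginning leaves none.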
    
    >  To do anything
    > else adds undue complexity with little real application benefit.
    >
    > Hence, for fail-over from one set of hardware to another, there should be a
    > clean indication of where one restarts the operation.
    
    And I say the easiest and cleanest restart position is at the beginning.
    
    >  In general, a
    > sequence number on all data units can provide a faster recovery by not
    > repeating the entire data set's retransmission.  Is this worth it?  For
    > large transfers, i.e. measured in MB, yes;
    
    Why is the answer "yes"?  Remember, this only happens very rarely in the first
    place.  Why optimize for it?
    
    > for small transfers, no.  Again,
    > there should be only one way to accomplish this in the spec and my
    > preference would be to always sequence number all of these transactions
    
    Unless you want to "stitch together" pieces of I/Os, there is no need to number data
    PDUs.
    
    > and
    > have the command interpretation decide whether to enforce that sequence
    > number and the recovery starting point upon failure.  Simplifies hardware
    > and provides future flexibility.
    >
    > Mike
    
    -Matt
    
    
    



Last updated: Tue Sep 04 01:06:32 2001
6315 messages in chronological order