    Re: iSCSI: error recovery



    At 02:46 PM 10/30/00 -0800, Matt Wakeley wrote:
    >julian_satran@il.ibm.com wrote:
    >
    > > Matt,
    > >
    > > I think I read your note and I still maintain that the target will fare
    > > better and the initiator does not have to do anything different.
    > >
    > > When failing over the initiator will reissue the command (including all
    > > scatter gather lists) to the new HBA. It is the target that will send only
    > > the buffers he has and as long as the initiator is not scoreboarding it
    > > does not have to do anything different the second time than first.
    >
    >You are making the *big* assumption that an iSCSI initiator will "confirm" the
    >receipt of this "numbered" data after the data has been transferred to initiator
    >host memory.  What if it's buffered on the card somewhere, and the card dies
    >and the system fails over to a different card?  (or perhaps an I/O subsystem
    >fails and the system "fails over" to a standby subsystem) How is the initiator
    >going to be absolutely sure that the "partial" I/O on the first card plus the
    >"partial" I/O on the second card equal a complete error free I/O?
    
    Acknowledgements should not be generated until a responder has received the 
    data and placed it into the fault zone, i.e. the location where, if a 
    failure occurs, the session is aborted.  If the NIC generates the 
    acknowledgement, then either it has already delivered the data to host 
    memory or, upon detecting the NIC's failure, the host will fail the 
    session.  To do anything else adds undue complexity with little real 
    application benefit.
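
    A minimal sketch of this acknowledgement rule, in Python.  The class and 
    method names (Responder, deliver, early_ack, card_failed) are hypothetical 
    and chosen only for illustration; nothing here is taken from the iSCSI 
    draft.  It assumes host memory is the fault zone and models the two cases 
    above: acknowledge after delivery, or acknowledge early and fail the 
    session if the card dies while holding acknowledged data.

```python
class Responder:
    """Toy model of a responder whose NIC buffers data before host delivery."""

    def __init__(self):
        self.on_card = {}         # seq -> payload still buffered on the NIC
        self.in_host = {}         # seq -> payload placed in host memory
        self.acked = set()        # sequence numbers already acknowledged
        self.session_open = True

    def receive(self, seq, payload):
        # Arrival on the card alone does not produce an acknowledgement.
        self.on_card[seq] = payload

    def deliver(self, seq):
        # Preferred order: place the data in host memory, then acknowledge.
        self.in_host[seq] = self.on_card.pop(seq)
        self.acked.add(seq)
        return seq

    def early_ack(self, seq):
        # The NIC acknowledges while the data is still only on the card.
        self.acked.add(seq)
        return seq

    def card_failed(self):
        # Acknowledged data that never reached host memory cannot be
        # recovered, so the host fails the whole session.
        lost_acked = self.acked & set(self.on_card)
        self.on_card.clear()
        if lost_acked:
            self.session_open = False
        return self.session_open
```

    With deliver(), a card failure only loses unacknowledged data, which the 
    sender still holds; with early_ack(), the same failure ends the session.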
    
    Hence, for fail-over from one set of hardware to another, there should be a 
    clean indication of where one restarts the operation.  In general, a 
    sequence number on all data units can provide a faster recovery by not 
    repeating the retransmission of the entire data set.  Is this worth it?  
    For large transfers, i.e. measured in MB, yes; for small transfers, no.  
    Again, there should be only one way to accomplish this in the spec, and my 
    preference would be to always sequence number all of these transactions and 
    have the command interpretation decide whether to enforce that sequence 
    number and the recovery starting point upon failure.  This simplifies 
    hardware and provides future flexibility.
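
    A rough Python sketch of how a recovery starting point could be derived 
    from per-unit sequence numbers.  The function names and the 
    enforce_sequence flag are assumptions made for this example, standing in 
    for the "command interpretation decides" choice; they are not defined by 
    any spec.

```python
def restart_point(acked, total_units, enforce_sequence):
    """Return the first data unit that must be sent again after fail-over.

    acked            -- set of sequence numbers acknowledged before the failure
    total_units      -- number of data units in the whole transfer
    enforce_sequence -- hypothetical per-command choice: use the sequence
                        numbers (True) or simply repeat everything (False)
    """
    if not enforce_sequence:
        return 0  # small transfer: resend the entire data set
    nxt = 0
    # Only the contiguous acknowledged prefix is safe to skip.
    while nxt < total_units and nxt in acked:
        nxt += 1
    return nxt


def units_to_resend(acked, total_units, enforce_sequence):
    """List the sequence numbers to retransmit on the new hardware."""
    start = restart_point(acked, total_units, enforce_sequence)
    return list(range(start, total_units))
```

    For example, with units 0, 1, 2 and 4 acknowledged out of 6, enforcing the 
    sequence restarts at unit 3; without enforcement all 6 units are resent.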
    
    Mike 
    
    


Last updated: Tue Sep 04 01:06:34 2001