RE: iSCSI: error recovery

To: "Michael Krause" <krause@cup.hp.com>, <ips@ece.cmu.edu>
Subject: RE: iSCSI: error recovery
From: "Douglas Otis" <dotis@sanlight.net>
Date: Tue, 31 Oct 2000 13:58:31 -0800
In-Reply-To: <4.2.2.20001031053620.00c957a0@hpindlm.cup.hp.com>
Sender: owner-ips@ece.cmu.edu

Matt,

A failure could be nothing more than a periodic network routing flap.  At the
transport level, recovery may try an alternative IP address as the first step
of the recovery process.  For this recovery to be meaningful, the protocol
should allow a different adapter to pick up where another adapter left off.
Failing at the SCSI layer should be the last choice for recovery.

Would the adapter provide data numbering in a local sense, as I think you
once suggested?  If the adapter provides data numbering, how would a
different adapter know what was acknowledged or how data was numbered?
"Pick up from where?" would be the difficult question to answer.  Assume one
adapter is by 3Com and another is by Intel, with various levels of
acceleration.  How would these adapters communicate with each other?  How is
this determined within the specification?  Should the specification make
transport recovery clear in all cases?  Requiring connection allegiance
assumes isolation of resource information.  What portions of this
information must not be isolated for recovery?

Doug

> At 02:46 PM 10/30/00 -0800, Matt Wakeley wrote:
> >julian_satran@il.ibm.com wrote:
> >
> > > Matt,
> > >
> > > I think I read your note and I still maintain that the target will
> > > fare better and the initiator does not have to do anything different.
> > >
> > > When failing over the initiator will reissue the command (including
> > > all scatter gather lists) to the new HBA. It is the target that will
> > > send only the buffers he has and as long as the initiator is not
> > > scoreboarding it does not have to do anything different the second
> > > time than first.
> >
> >You are making the *big* assumption that an iSCSI initiator will
> >"confirm" the receipt of this "numbered" data after the data has been
> >transferred to initiator host memory.  What if it's buffered on the card
> >somewhere, and the card dies and the system fails over to a different
> >card?  (or perhaps an I/O subsystem fails and the system "fails over" to
> >a standby subsystem)  How is the initiator going to be absolutely sure
> >that the "partial" I/O on the first card plus the "partial" I/O on the
> >second card equal a complete error free I/O?
>
> Acknowledgements should not be generated until a responder has received
> the data and placed it into the fault zone, i.e. the location where if a
> failure occurs, the session is aborted.  If the NIC generates the
> acknowledgement, then it should have either delivered it to host memory
> or upon its failure detection, the host will fail the session.  To do
> anything else adds undue complexity with little real application benefit.
>
> Hence, for fail-over from one set of hardware to another, there should be
> a clean indication of where one restarts the operation.  In general, a
> sequence number on all data units can provide a faster recovery by not
> repeating the entire data set's retransmission.  Is this worth it?  For
> large transfers, i.e. measured in MB, yes; for small transfers, no.
> Again, there should be only one way to accomplish this in the spec and my
> preference would be to always sequence number all of these transactions
> and have the command interpretation decide whether to enforce that
> sequence number and the recovery starting point upon failure.  Simplifies
> hardware and provides future flexibility.
>
> Mike
>
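
To make the restart-point idea quoted above concrete, here is a minimal
sketch in Python.  It is only an illustration under stated assumptions, not
anything taken from the iSCSI draft: the names TransferState, acknowledge,
restart_point and units_to_resend are hypothetical, data units are assumed to
be numbered from 0, and the per-command state is assumed to live in host
memory (the "fault zone") so that a second adapter can read it after the
first one fails.

```python
from dataclasses import dataclass, field


@dataclass
class TransferState:
    """Per-command state, assumed to be held in host memory rather than on
    any single adapter (hypothetical structure, for illustration only)."""
    total_units: int
    received: set = field(default_factory=set)

    def acknowledge(self, seq: int) -> None:
        # Record a data unit only once it has reached host memory.
        if not 0 <= seq < self.total_units:
            raise ValueError(f"sequence number {seq} out of range")
        self.received.add(seq)

    def restart_point(self) -> int:
        # Lowest sequence number not yet acknowledged; every unit below
        # it is known to be safely in host memory.
        seq = 0
        while seq in self.received:
            seq += 1
        return seq

    def is_complete(self) -> bool:
        return self.restart_point() == self.total_units


def units_to_resend(state: TransferState, enforce_sequence: bool) -> range:
    """Data units a replacement adapter would ask for after fail-over.

    With enforce_sequence=False the whole transfer is repeated, which may
    be acceptable for small transfers; with True only the tail after the
    restart point is requested.
    """
    start = state.restart_point() if enforce_sequence else 0
    return range(start, state.total_units)


if __name__ == "__main__":
    state = TransferState(total_units=8)
    for seq in (0, 1, 2, 4):          # unit 3 was lost with the first card
        state.acknowledge(seq)
    print(list(units_to_resend(state, enforce_sequence=True)))
    print(list(units_to_resend(state, enforce_sequence=False)))
```

Note that the sketch resends unit 4 even though it was acknowledged, because
it restarts from the first gap rather than tracking individual holes.  That
keeps the restart indication to a single number, at the cost of some
duplicate data.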
