
    RE: iSCSI: remove recovery from transport-layer connection failure(?)



    Somesh,
    
    Are you referring to 802.1Q, 802.1D, or HSRP or just EtherChannel in
    general?
    
    Doug
    
    
    > -----Original Message-----
    > From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
    > GUPTA,SOMESH (HP-Cupertino,ex1)
    > Sent: Monday, October 02, 2000 1:48 PM
    > To: julian_satran@il.ibm.com; ips@ece.cmu.edu
    > Subject: RE: iSCSI: remove recovery from transport-layer connection
    > failure(?)
    >
    >
    > Julian,
    >
    > If the scenario you point out is correct (a single command lasting
    > for such a long time), then of course we need a mechanism where
    > we can restart the command from the approximate point of failure.
    > However that would be failures lasting for "more than a fraction
    > of a sec".
    >
    > First of all, a TCP connection does not indicate a failure that
    > quickly. Secondly, there are ways to recover from a path failure
    > and still preserve a TCP connection in High-Availability environments.
    > I am sure most system vendors would be implementing such techniques.
    >
    > Somesh
    >
    > -----Original Message-----
    > From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
    > Sent: Sunday, October 01, 2000 1:52 AM
    > To: ips@ece.cmu.edu
    > Subject: Re: iSCSI: remove recovery from transport-layer connection
    > failure(?)
    >
    >
    >
    >
    > Steph,
    >
    > Assume that in the new wonderful SAN world you have started a long
    > disk-to-tape (or disk-to-disk) third-party copy. The SAN is fine and
    > the copy proceeds for an hour, but the lousy initiator-to-copy-manager
    > link (on which, incidentally, no data transfer took place) fails for a
    > fraction of a second.
    > Should we restart the command under-the-cover or drop it or ask
    > the parties to provide state information to a specific SCSI
    > restart driver?
    >
    > And we can build many similar scenarios.
    >
    > I think that whatever we can do to simplify exception handling we should do
    > (the same arguments that hold for multiple connections hold here too).
    >
    > I would add that in an ideal world I would like to have the transport
    > "splice" a new TCP connection onto an old TCP connection, but failing
    > that (again, is SCTP doing it already or not?) we should take care that
    > simple events, like a cable pulled out in some obscure part of the
    > network, only seldom affect higher layers.
    >
    > Julo
    >
    > Stephen Bailey <steph@cs.uchicago.edu> on 27/09/2000 07:16:12
    >
    > Please respond to Stephen Bailey <steph@cs.uchicago.edu>
    >
    > To:   ips@ece.cmu.edu
    > cc:    (bcc: Julian Satran/Haifa/IBM)
    > Subject:  Re: iSCSI: remove recovery from transport-layer connection
    >       failure(?)
    >
    >
    >
    >
    > > Currently, iSCSI is spec'ed to recover from transport-layer
    > > connection failures.
    > >
    > > The main motivation for this decision was to support tape backup
    > > applications that are quite sensitive to any failures that get
    > > propagated to their layer.
    > >
    > > So, perhaps we can remove the requirement of recovering from
    > > transport-layer connection failures in iSCSI. This would simplify
    > > the protocol somewhat.
    > >
    > > Thoughts?
    >
    > I'm all for eliminating command recovery.
    >
    > There seem to be several reasons advanced for command recovery.
    >
    > The first seems to be based upon an inappropriate analogy to FCP.
    > Command recovery had to be added to FCP-2 because the FC layer is
    > unreliable.  A single dropped FC frame leads to a failed FCP command.
    > This clearly upsets tape operation even when the link is performing
    > nominally.  In FCP, without command recovery, with some observable
    > frequency, you will get an expected error that leads to complete,
    > irrecoverable failure of a transfer stream.  The other thing that
    > makes FCP-2 command recovery work well is when you are doing a write,
    > which is 90% (maybe it's 99%?) of tape operation, the target can
    > return an early indication of most frame drops, rather than waiting
    > for a timer to expire.
    >
    > TCP's reliability solves this problem in another way.  By the time you
    > get a TCP connection failure, you have already exhausted a set of
    > reliability mechanisms which guarantee, with high certainty, that
    > further data cannot be transferred between the two endpoints.
    >
    > The phrase `the two endpoints' suggests the other reason advanced for
    > command recovery.  That is, to permit path failover for commands which
    > are not idempotent, such as tape write sequential.  The
    > problem with this is that it is not clear HOW iSCSI command recovery
    > can actually work properly, given a TCP connection failure indication.
    > It takes a long time for a TCP connection to fail, and by that time,
    > I'm not sure recovery would reasonably be possible.  Perhaps I'm in
    > error on this assumption.  Can a tape guru (Joe from Exabyte?) comment
    > on whether recovery would be possible after many seconds (tens,
    > hundreds) have elapsed?
    >
    > The SCSI layer has never been solely responsible for ensuring reliable
    > backup.  Macro scale things go wrong with tape (run off the end, get
    > eaten, etc.) with relatively high frequency.  A low level backup
    > engine like tar or dump will fail on a SCSI error, and that's OK.
    > There must also be a higher level software component like Amanda,
    > which manages retries, including operator intervention, to ensure
    > reliable backup.
    >
    > It seems like whether iSCSI has a command recovery mechanism should be
    > a function of whether somebody can stand up and say for sure that it
    > solves a real problem.  So far it only seems like it MIGHT solve a
    > problem.  Who can say `this solves MY problem!'?
    >
    > Steph
    >
    
    


