RE: iSCSI: remove recovery from transport-layer connection failure(?)

To: Douglas Otis <dotis@sanlight.net>, "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com>, julian_satran@il.ibm.com, ips@ece.cmu.edu
Subject: RE: iSCSI: remove recovery from transport-layer connection failure(?)
From: "GUPTA,SOMESH (HP-Cupertino,ex1)" <somesh_gupta@am.exch.hp.com>
Date: Mon, 2 Oct 2000 18:15:45 -0600
Content-Type: text/plain;charset="iso-8859-1"
Sender: owner-ips@ece.cmu.edu

The techniques take advantage of the protocol layering. A
TCP connection is tied to an IP address, the route table
tells you the next hop, and the ARP table specifies the
IP address to MAC address translation.

Without giving you the exact details (I don't know whether
proper behavior on the mailing list requires me to spill the beans),
you can see that there is significant flexibility to recover from
any kind of failure.
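
The layering argument above can be illustrated with a minimal sketch. It is
not the vendor technique alluded to (those details are not given here); it
only assumes a toy host model in which a standby device takes over the
next-hop IP address after a path failure. The class names, addresses, ports,
and MAC values are all hypothetical. The point it shows is that the TCP
4-tuple never changes while the lower-layer mapping does.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpConnection:
    """A TCP connection is identified only by its IP/port 4-tuple."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class Host:
    """Toy model of a host's route table and ARP table (hypothetical)."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop IP (None means on-link)
        self.arp = {}     # IP address string -> MAC address string

    def add_route(self, cidr, next_hop=None):
        self.routes[ipaddress.ip_network(cidr)] = next_hop

    def resolve(self, conn):
        """Return (next_hop_ip, mac) used to reach the connection's peer."""
        dst = ipaddress.ip_address(conn.remote_ip)
        matches = [net for net in self.routes if dst in net]
        best = max(matches, key=lambda net: net.prefixlen)  # longest prefix
        next_hop = self.routes[best] or conn.remote_ip
        return next_hop, self.arp[next_hop]


# Example values only: documentation address ranges and made-up MACs.
host = Host()
host.add_route("0.0.0.0/0", next_hop="192.0.2.1")  # default route
host.add_route("192.0.2.0/24")                     # on-link subnet
host.arp["192.0.2.1"] = "02:00:00:00:00:01"        # primary gateway MAC

conn = TcpConnection("192.0.2.10", 49152, "198.51.100.20", 3260)
before = host.resolve(conn)

# Assumed failover: a standby takes over the gateway IP, so only the
# ARP entry changes. The connection object itself is untouched.
host.arp["192.0.2.1"] = "02:00:00:00:00:02"
after = host.resolve(conn)

print(before, after)
```

In this model the failover is invisible to TCP: `conn` is the same value
before and after, and only the MAC address returned by `resolve` differs.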

-----Original Message-----
From: Douglas Otis [mailto:dotis@sanlight.net]
Sent: Monday, October 02, 2000 3:25 PM
To: GUPTA,SOMESH (HP-Cupertino,ex1); julian_satran@il.ibm.com;
ips@ece.cmu.edu
Subject: RE: iSCSI: remove recovery from transport-layer connection
failure(?)


Somesh,

Are you referring to 802.1Q, 802.1D, or HSRP or just EtherChannel in
general?

Doug


> -----Original Message-----
> From: owner-ips@ece.cmu.edu [mailto:owner-ips@ece.cmu.edu]On Behalf Of
> GUPTA,SOMESH (HP-Cupertino,ex1)
> Sent: Monday, October 02, 2000 1:48 PM
> To: julian_satran@il.ibm.com; ips@ece.cmu.edu
> Subject: RE: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
>
> Julian,
>
> If the scenario you point out is correct (a single command lasting
> for such a long time), then of course we need a mechanism where
> we can restart the command from the approximate point of failure.
> However, that would be for failures lasting "more than a fraction
> of a sec".
>
> First of all, a TCP connection does not indicate a failure that
> quickly. Secondly, there are ways to recover from a path failure
> and still preserve a TCP connection in High-Availability environments.
> I am sure most system vendors would be implementing such techniques.
>
> Somesh
>
> -----Original Message-----
> From: julian_satran@il.ibm.com [mailto:julian_satran@il.ibm.com]
> Sent: Sunday, October 01, 2000 1:52 AM
> To: ips@ece.cmu.edu
> Subject: Re: iSCSI: remove recovery from transport-layer connection
> failure(?)
>
>
>
>
> Steph,
>
> Assume that in the new wonderful SAN world you have started a long
> disk-to-tape (or disk-to-disk) third-party copy. The SAN is fine and
> the copy proceeds for an hour, but the lousy initiator-to-copy-manager
> link (on which, incidentally, no data transfer took place) fails for a
> fraction of a second.
> Should we restart the command under-the-cover or drop it or ask
> the parties to provide state information to a specific SCSI
> restart driver?
>
> And we can build many similar scenarios.
>
> I think that whatever we can do to simplify exception handling, we should
> do (the same arguments that hold for multiple connections hold here too).
>
> I would add that in an ideal world I would like to have the transport
> "splice" a new TCP connection onto an old TCP connection, but failing
> that (again, is SCTP doing it already or not?) we should take care that
> simple events, like a cable pulled out in some obscure part of the
> network, will only seldom affect higher layers.
>
> Julo
>
> Stephen Bailey <steph@cs.uchicago.edu> on 27/09/2000 07:16:12
>
> Please respond to Stephen Bailey <steph@cs.uchicago.edu>
>
> To:   ips@ece.cmu.edu
> cc:    (bcc: Julian Satran/Haifa/IBM)
> Subject:  Re: iSCSI: remove recovery from transport-layer connection
>       failure(?)
>
>
>
>
> > Currently, iSCSI is spec'ed to recover from transport-layer
> > connection failures.
> >
> > The main motivation for this decision was to support tape backup
> > applications that are quite sensitive to any failures that get
> > propagated to their layer.
> >
> > So, perhaps we can remove the requirement of recovering from
> > transport-layer connection failures in iSCSI. This would simplify
> > the protocol somewhat.
> >
> > Thoughts?
>
> I'm all for eliminating command recovery.
>
> There seem to be several reasons advanced for command recovery.
>
> The first seems to be based upon an inappropriate analogy to FCP.
> Command recovery had to be added to FCP-2 because the FC layer is
> unreliable.  A single dropped FC frame leads to a failed FCP command.
> This clearly upsets tape operation even when the link is performing
> nominally.  In FCP, without command recovery, with some observable
> frequency, you will get an expected error that leads to complete,
> irrecoverable failure of a transfer stream.  The other thing that
> makes FCP-2 command recovery work well is that when you are doing a
> write, which is 90% (maybe it's 99%?) of tape operation, the target can
> return an early indication of most frame drops, rather than waiting
> for a timer to expire.
>
> TCP's reliability solves this problem in another way.  By the time you
> get a TCP connection failure, you have already exhausted a set of
> reliability mechanisms which guarantee, with high certainty, that
> further data can not be transferred between the two endpoints.
>
> The phrase `the two endpoints' suggests the other reason advanced for
> command recovery.  That is, to permit path failover for commands which
> are not idempotent, such as tape write sequential.  The problem with
> this is that it is not clear HOW iSCSI command recovery can actually
> work properly, given a TCP connection failure indication.
> It takes a long time for a TCP connection to fail, and by that time,
> I'm not sure recovery would reasonably be possible.  Perhaps I'm in
> error on this assumption.  Can a tape guru (Joe from Exabyte?) comment
> on whether recovery would be possible after many seconds (tens,
> hundreds) have elapsed?
>
> The SCSI layer has never been solely responsible for ensuring reliable
> backup.  Macro scale things go wrong with tape (run off the end, get
> eaten, etc.) with relatively high frequency.  A low-level backup
> engine like tar or dump will fail on a SCSI error, and that's OK.
> There must also be a higher level software component like Amanda,
> which manages retries, including operator intervention, to ensure
> reliable backup.
>
> It seems like whether iSCSI has a command recovery mechanism should be
> a function of whether somebody can stand up and say for sure that it
> solves a real problem.  So far it only seems like it MIGHT solve a
> problem.  Who can say `this solves MY problem!'?
>
> Steph
>
