Re: iSCSI: more on StatRN

To: ips@ece.cmu.edu
Subject: Re: iSCSI: more on StatRN
From: Stephen Bailey <steph@cs.uchicago.edu>
Date: Tue, 24 Oct 2000 18:03:43 -0500
In-Reply-To: Message from julian_satran@il.ibm.com of "Sat, 21 Oct 2000 14:35:40 +0300." <C125697F.0042957B.00@d12mta02.de.ibm.com>
Sender: owner-ips@ece.cmu.edu

Julian,

> The reason I suggested dropping connections after several format errors was
> tolerance to software "glitches".

'tolerating' software glitches usually means detecting them where
possible and making sure that you don't go off in the weeds as a
result of them.  Unfortunately, most (? should we vote by distinct
glitches, glitch occurrences, or maybe the amount of time (wall clock?
programmer?) wasted by glitches %^) software glitches are not
recoverable by mere retry.  They require explicit work-around.
Therefore, I think the appropriate stance is to specify that the
detector should hit the source of the glitch with the biggest possible
hammer (connection reset) immediately.

Obviously, work-arounds will happen, and as a result, they'll violate
the SHALLs in the spec, but the fact is they're already addressing
other violations of the SHALL.  No big deal.

> The Check Condition is meant for cases in which SCSI can act - and yes from
> the transport POV the command has finished.

I guess the only point I'm trying to make is that I don't think SCSI
status should be used for conditions which are not already defined in
SAM/T10.  FCP and SST both define a `response' status mechanism which
is used to report conditions which can be reported in-line, but are
not SCSI generic.  For example, conflicting option flag settings in
the CMD PDU (other than those in the CDB).  A key point (of which
you're probably already aware) is that any error which CAN be
reported in-line should be reported in-line, to improve overall
responsiveness.
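
To make the split concrete, here is a minimal sketch of a result that
carries a transport `response' code separately from the SCSI status.
The Response codes, FLAG_A/FLAG_B and submit() are made up for
illustration and are not taken from FCP, SST or any iSCSI draft; only
the two SCSI status values are the ones SAM defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

# SCSI status codes as defined by SAM.
SCSI_GOOD = 0x00
SCSI_CHECK_CONDITION = 0x02


class Response(Enum):
    """Hypothetical transport-level response codes (not from any spec)."""
    COMMAND_COMPLETED = 0   # SCSI ran the command; see scsi_status
    INVALID_FLAGS = 1       # rejected by the transport before reaching SCSI


# Hypothetical pair of option flags that must not be set together.
FLAG_A = 0x01
FLAG_B = 0x02


@dataclass
class CommandResult:
    response: Response
    scsi_status: Optional[int] = None  # set only when SCSI actually ran


def submit(flags: int, run_scsi: Callable[[], int]) -> CommandResult:
    """Report transport errors in-line; hand everything else to SCSI."""
    if flags & FLAG_A and flags & FLAG_B:
        # Conflicting flags: answer immediately, with no SCSI status at all.
        return CommandResult(Response.INVALID_FLAGS)
    return CommandResult(Response.COMMAND_COMPLETED, run_scsi())
```

The point of the design is that a transport-level rejection never has
to be dressed up as a Check Condition: the SCSI status field stays
empty unless SCSI itself produced one.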

If you're already on top of all that, and I'm preaching to the choir,
right on.  If not, there it is.

> Dropped PDUs will help us avoid DOS attacks with badly formed PDUs.

What's the DOS attack that this addresses?  Certainly PDUs outside a
connection will be dropped, but at the TCP layer before iSCSI ever
sees it.  Once an iSCSI connection is established, I don't see how
you're any more open or protected from a DOS attack.  Specifically,
you initiate a TCP connection close on the first bogus PDU, and while
you're closing you ignore everything that's not part of the close
protocol, right?
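
A rough sketch of the behavior described above, assuming a toy
Connection object and a placeholder is_well_formed() check (both are
hypothetical; a real check would validate the actual PDU header
layout):

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for one iSCSI connection over TCP."""
    is_open: bool = True
    delivered: list = field(default_factory=list)

    def reset(self) -> None:
        # A real implementation would start the TCP close/reset here.
        self.is_open = False


def is_well_formed(pdu: dict) -> bool:
    """Placeholder format check; the fields tested are assumptions."""
    opcode = pdu.get("opcode")
    length = pdu.get("length")
    return isinstance(opcode, int) and isinstance(length, int) and length >= 0


def handle_pdu(conn: Connection, pdu: dict) -> bool:
    """Deliver a PDU, or reset the connection on the first bad one.

    Returns True only if the PDU was delivered.
    """
    if not conn.is_open:
        return False          # closing: ignore everything else
    if not is_well_formed(pdu):
        conn.reset()          # biggest hammer, applied immediately
        return False
    conn.delivered.append(pdu)
    return True
```

There is no error counter to maintain: the first malformed PDU ends
the connection, and nothing that arrives afterwards is examined.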

> And I will suggest activating the TCP keep alive option for early detection
> of link failures.

TCP keep alive has a chequered history, and may not be the right thing
here.  Stevens said somewhere (TCP/IP Illustrated, I think) that it's
more chic to have the ULP do keep alive if desired, which is where
this whole discussion started.

As long as you have no outstanding operations on a connection, neither
end probably needs (or wants, if you believe Stevens' arguments) a
keep alive.  Once you have operations in progress, the initiator is
already keeping timers on every operation, so connection failure can
initially be detected in that way.

The reason why we specified a connection viability check on operation
timeout in SST is to improve responsiveness during link failures.  You
don't NEED to do the viability test at all, in which case, each
operation will fail under its own timeout.  However, badly engineered
FC implementations have shown that it's important to detect failure as
early as possible wherever possible.  Otherwise the system can get
extremely sluggish.
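
As an illustration of why the check helps, here is a small sketch of
an initiator that keeps a deadline per operation and runs a viability
probe when the first one expires.  The Initiator class, its method
names and the probe callable are hypothetical, and time is passed in
explicitly so the example stays deterministic; this is not how SST
words it, just one way to read it.

```python
from typing import Callable, Dict, List


class Initiator:
    """Toy initiator tracking per-operation deadlines on one connection."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self.probe = probe                    # returns True if the peer answers
        self.deadlines: Dict[str, float] = {}  # op id -> absolute deadline

    def start(self, op_id: str, now: float, timeout: float) -> None:
        self.deadlines[op_id] = now + timeout

    def complete(self, op_id: str) -> None:
        self.deadlines.pop(op_id, None)

    def tick(self, now: float) -> List[str]:
        """Return the operations that fail at time `now`."""
        expired = [op for op, d in self.deadlines.items() if d <= now]
        if not expired:
            return []
        if self.probe():
            # Connection is viable: only the timed-out operations fail.
            failed = expired
        else:
            # Connection is dead: fail everything now rather than
            # waiting for each remaining operation's own timeout.
            failed = list(self.deadlines)
        for op in failed:
            del self.deadlines[op]
        return sorted(failed)
```

Without the probe, an operation with a long timeout would sit on a
dead connection until its own timer fired, which is the sluggishness
described above.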

And then there's the issue of the target recovering resources in a
bounded amount of time.  In SST we specified that the target shall
perform keep alives for this reason.  In iSCSI, I would suggest that
it would be appropriate to specify that targets MAY perform an iSCSI
keep alive when they have live commands on a connection if they care
about recovering their resources.

The key thing to remember about keep alives is that iSCSI endpoints
may have extremely high connectivity degree, but are likely to have
many inactive connections.  Having everybody banging away on each
other with keep alives could have a substantial cost (or was everybody
planning to hardware accelerate the keep alives :-?)
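
One way to keep that cost down, sketched under the assumption that
the target tracks a live-command count and a last-heard time per
connection (the function name, its arguments and the idle limit are
all hypothetical):

```python
from typing import Dict, List


def connections_to_probe(
    live_commands: Dict[str, int],
    last_heard: Dict[str, float],
    now: float,
    idle_limit: float,
) -> List[str]:
    """Pick the connections worth sending a keep alive on.

    A connection qualifies only if it has live commands (so there are
    resources to recover) and has been silent for at least idle_limit.
    Connections with no outstanding work are never probed.
    """
    return sorted(
        conn
        for conn, count in live_commands.items()
        if count > 0 and now - last_heard[conn] >= idle_limit
    )
```

For example, with three connections where only one is both busy and
quiet, only that one gets probed:

```python
live = {"c1": 2, "c2": 0, "c3": 1}
heard = {"c1": 0.0, "c2": 0.0, "c3": 95.0}
print(connections_to_probe(live, heard, now=100.0, idle_limit=30.0))
```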

Steph

Follow-Ups:
- RE: iSCSI: more on StatRN
  - From: "Douglas Otis" <dotis@sanlight.net>

References:
- Re: iSCSI: more on StatRN
  - From: julian_satran@il.ibm.com

