RE: Avoiding deadlock in iSCSI

To: ips@ece.cmu.edu
Subject: RE: Avoiding deadlock in iSCSI
From: David Robinson <David.Robinson@EBay.Sun.COM>
Date: Mon, 11 Sep 2000 18:36:15 -0700 (PDT)
Reply-To: David Robinson <David.Robinson@EBay.Sun.COM>

Thanks for the information. I think part of my confusion comes from the
difference between mapping SCSI onto a datagram protocol and mapping it
onto a reliable stream protocol. In a datagram protocol, if data is sent
without the receiver's cooperation, the receiver's buffers may not be
adequate and the data must be discarded. Credits and RTT can be used to
handle this case.
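A minimal Python sketch of the datagram case, using hypothetical names (`DatagramReceiver`, `send_blind`, `send_with_credits` are illustrations, not part of any iSCSI draft). A receiver with a fixed number of buffers discards whatever arrives while none are free; a sender that only transmits against granted credits never overruns it. The credit grant is modeled as a direct read of the receiver's free-buffer count rather than a real protocol message.

```python
from collections import deque


class DatagramReceiver:
    """Receiver with a fixed number of buffers on an unreliable transport."""

    def __init__(self, buffers: int) -> None:
        self.capacity = buffers
        self.held: deque = deque()
        self.dropped = 0

    def free_buffers(self) -> int:
        return self.capacity - len(self.held)

    def deliver(self, datagram) -> None:
        # No free buffer: the datagram has nowhere to go and is discarded.
        if len(self.held) < self.capacity:
            self.held.append(datagram)
        else:
            self.dropped += 1

    def consume(self) -> None:
        # The application finishes with one buffer, freeing it.
        if self.held:
            self.held.popleft()


def send_blind(rx: DatagramReceiver, datagrams) -> None:
    """Send everything at once, without the receiver's cooperation."""
    for d in datagrams:
        rx.deliver(d)


def send_with_credits(rx: DatagramReceiver, datagrams) -> None:
    """Send only as many datagrams as the receiver has credits for.

    Assumes the receiver has at least one buffer.
    """
    pending = deque(datagrams)
    while pending:
        credits = rx.free_buffers()  # stands in for a credit-grant message
        for _ in range(min(credits, len(pending))):
            rx.deliver(pending.popleft())
        rx.consume()  # the receiver drains a buffer, freeing a new credit
```

With four buffers and ten datagrams, `send_blind` loses six of them, while `send_with_credits` loses none at the cost of waiting for the receiver between bursts.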

With a reliable stream transport like TCP, you don't get into
this situation because the receiver will never open the TCP
window beyond its buffer capacity. For small amounts of buffering
it might not be as efficient as using RTT, but there are no
correctness or deadlock issues. Because each sender has its own
connection and its own flow control, the senders are handled independently.
Likewise, with separate data connections each one is also flow controlled,
so "unsolicited" data is not an issue. "Overflow" conditions simply
never occur.
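To illustrate the TCP argument, here is a toy model in which the receiver advertises a window equal to its free buffer space. `WindowedConnection` and `transfer` are hypothetical names, not a real TCP API, and the model ignores segmentation and window-update timing. The point it shows is that occupancy can never exceed the buffer, and that a closed window on one connection does not affect another.

```python
class WindowedConnection:
    """Toy model of one TCP connection's receive-side flow control.

    The receiver never advertises more window than it has free buffer,
    so the sender cannot overflow it; it can only be slowed down.
    """

    def __init__(self, buffer_bytes: int) -> None:
        self.buffer_bytes = buffer_bytes
        self.buffered = 0  # bytes received but not yet read by the application
        self.peak = 0      # highest occupancy ever observed

    def window(self) -> int:
        return self.buffer_bytes - self.buffered

    def send(self, nbytes: int) -> int:
        """The sender transmits at most the advertised window; returns bytes sent."""
        sent = min(nbytes, self.window())
        self.buffered += sent
        self.peak = max(self.peak, self.buffered)
        return sent

    def read(self, nbytes: int) -> None:
        """The receiving application consumes data, reopening the window."""
        self.buffered -= min(nbytes, self.buffered)


def transfer(conn: WindowedConnection, total: int, read_chunk: int) -> int:
    """Push `total` bytes through `conn`, reading `read_chunk` (> 0) bytes per round."""
    remaining, rounds = total, 0
    while remaining:
        remaining -= conn.send(remaining)
        conn.read(read_chunk)
        rounds += 1
    return rounds
```

A slow reader only costs extra rounds (throughput), never an overflow, which matches the "not as efficient, but correct" trade-off described above.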

The only major design criterion is that the sender MUST maintain
the ordering of data sent on any connection: data Dn MUST always be
sent before data Dm where n < m. In particular, if unsolicited
data and RTT are mixed, the sender cannot send data Dm before it
has received an RTT for data Dn if both are to use the same connection.
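The ordering rule can be sketched as a per-connection FIFO whose head blocks everything behind it. `DataPDU`, `OrderedSender`, and `on_rtt` are hypothetical names for illustration; the sketch assumes each data item either needs an RTT or is unsolicited.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DataPDU:
    seq: int         # n in "Dn"
    needs_rtt: bool  # True if this transfer must wait for the receiver's RTT


class OrderedSender:
    """Per-connection sender that never lets Dm go out before Dn when n < m."""

    def __init__(self) -> None:
        self.queue: deque = deque()
        self.last_seq = -1
        self.rtt_received: set = set()
        self.wire: list = []  # sequence numbers in the order actually sent

    def enqueue(self, pdu: DataPDU) -> None:
        if pdu.seq <= self.last_seq:
            raise ValueError("data must be queued in increasing order")
        self.last_seq = pdu.seq
        self.queue.append(pdu)
        self.pump()

    def on_rtt(self, seq: int) -> None:
        self.rtt_received.add(seq)
        self.pump()

    def pump(self) -> None:
        # Only the head of the queue may be sent. While it waits for its RTT,
        # everything behind it waits too, including unsolicited data.
        while self.queue:
            head = self.queue[0]
            if head.needs_rtt and head.seq not in self.rtt_received:
                return
            self.wire.append(self.queue.popleft().seq)
```

Queuing D1 (needs an RTT) and then unsolicited D2 sends nothing; once the RTT for D1 arrives, both go out as `[1, 2]`.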

	-David

> I think people have been meaning "unsolicited data" to really mean data sent
> to a receiver without that receiver having first indicated that there is
> enough buffering to hold the data.  For initiators acting as receivers they
> have to verify this before they initiate the command (not enough space for
> the whole command?  Then break up the command.)  For Targets this requires
> something like a credit mechanism with RTTs being used.
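For the initiator-as-receiver case, "break up the command" can be sketched as splitting one large read into commands whose data fits the free buffer. `split_read` is a hypothetical helper; it assumes fixed-size blocks addressed by logical block address (LBA) and a buffer size known before the commands are issued.

```python
def split_read(start_lba: int, blocks: int, block_size: int, buffer_bytes: int):
    """Split one read into (lba, block_count) commands that each fit the buffer."""
    per_cmd = buffer_bytes // block_size
    if per_cmd == 0:
        raise ValueError("buffer is smaller than one block")
    cmds = []
    lba, remaining = start_lba, blocks
    while remaining:
        n = min(per_cmd, remaining)
        cmds.append((lba, n))
        lba += n
        remaining -= n
    return cmds
```

For example, `split_read(0, 100, 512, 16384)` yields four commands: three of 32 blocks and a final one of 4 blocks.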
> 
> So there is an "unsolicited command" problem and an "unsolicited data"
> problem.  In both cases the sender creates the problem by not first
> reserving with the receiver enough resources for the commands/data.
>   
> In the command case there is no SCSI mechanism to reserve resources (QUEUE
> FULL is used to indicate overflows).  Historically it has been assumed that
> queues of commands do not overflow often in practice.  In reality initiators
> have often artificially limited the number of commands they are willing to
> try to queue at the target in order to avoid this rejection (a lost
> opportunity in my mind).
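A sketch of the command-side trade-off described above, with hypothetical `Target` and `Initiator` classes. The target has no reservation step and simply rejects with QUEUE FULL when its queue is full; a cautious initiator that caps itself below the target's real depth avoids rejections but leaves queue slots unused.

```python
from collections import deque

QUEUE_FULL = "QUEUE FULL"
ACCEPTED = "ACCEPTED"


class Target:
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.tasks: deque = deque()

    def submit(self, cmd) -> str:
        # No reservation step: an overflowing command is simply rejected.
        if len(self.tasks) >= self.depth:
            return QUEUE_FULL
        self.tasks.append(cmd)
        return ACCEPTED


class Initiator:
    """Initiator that caps its own outstanding commands to avoid rejections."""

    def __init__(self, target: Target, max_outstanding: int) -> None:
        self.target = target
        self.max_outstanding = max_outstanding
        self.outstanding = 0
        self.rejections = 0

    def issue(self, cmds) -> list:
        deferred = []
        for cmd in cmds:
            if self.outstanding >= self.max_outstanding:
                deferred.append(cmd)  # held back by the initiator's own limit
            elif self.target.submit(cmd) == QUEUE_FULL:
                self.rejections += 1
                deferred.append(cmd)  # rejected; must be retried later
            else:
                self.outstanding += 1
        return deferred
```

Against a target with depth 8, issuing 12 commands with a cap of 16 yields 4 rejections; with a cap of 4 there are no rejections, but only 4 of the target's 8 slots are ever used.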
> 
> In the data case there is no "DATA QUEUE FULL" - instead, an explicit credit
> model of some sort is used to indicate the receiver has reserved space for
> the data (REQs in parallel SCSI, BB credits in Fibre Channel).  In this case
> the assumption was that data overflows would occur a lot otherwise.
> 
> You can solve these problems by rejecting the overflow cleanly (as SCSI does
> with commands), which is low latency and works well under light loads.  Or
> you can do credits.  Credits add latency, or get you into the problem of
> credit allocation, which can be optimized for light load (over allocate
> credits) or heavy loads (allocate only what you have), but not both at once.
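The credit-allocation dilemma can be made concrete with a little arithmetic. The functions below are hypothetical and assume a target with a fixed pool of buffers shared by several initiators, where each active initiator spends all of its credits in one round.

```python
def round_outcome(buffers: int, active: int, credits_each: int):
    """One round in which every active initiator spends all of its credits.

    Returns (accepted, overflowed) in buffer-sized blocks. Assumes each active
    initiator has more data queued than its credits cover.
    """
    offered = active * credits_each
    accepted = min(offered, buffers)
    return accepted, offered - accepted


def exact_credits(buffers: int, initiators: int) -> int:
    # "Allocate only what you have": the grants sum to at most `buffers`.
    return buffers // initiators


def overcommitted_credits(buffers: int, initiators: int, factor: int = 4) -> int:
    # Over-allocate, betting that most initiators are idle at any moment.
    return factor * buffers // initiators
```

With 64 buffers and 16 initiators, exact allocation grants 4 credits each and overcommitting grants 16. Under light load (2 active), exact allocation moves only 8 blocks per round (more rounds, more latency) while overcommitting moves 32 with no overflow. Under heavy load (all 16 active), exact allocation fills all 64 buffers with no overflow, while overcommitting offers 256 blocks and overflows 192.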
> 
> Historically, SCSI has used rejection for commands and credits for data,
> optimized for heavy loads.  But this is only a T10 given rule, not a God
> given rule (although some of us who have served on T10 can get that confused
> at times :-)).
> 
> Hope this helps.
> 
> Jim
> 
> -----Original Message-----
> From: David Robinson [mailto:David.Robinson@EBay.Sun.COM]
> Sent: Monday, September 11, 2000 3:35 PM
> To: ips@ece.cmu.edu
> Subject: Re: Avoiding deadlock in iSCSI
> 
> 
> I think in following this discussion the terminology has been
> confusing me.  When I read "unsolicited data" I interpreted that
> to mean data for which no command has yet been sent. In general
> I consider that to be a bug and the receiver should just drop the
> data on the floor.  The only possible scenario where it might
> not be a bug is if a command was sent on one connection and the
> data on the data connection arrived first, thus it is unsolicited.
> My first assumption is that the sender would not send commands
> C1 and C2 and data D2 and D1 on the same connection. Doing that
> creates nasty ordering problems we want to avoid.  So if the
> receiver simply allows the data connection TCP window to shrink
> the unsolicited data will flow control to a stop until the command
> queue catches up.  With multiple data connections, some may flow
> control but the active command will be able to make progress on
> one connection. This may not be the most efficient mechanism but
> it is "safe".  Preferably the data will either follow the command
> on the same data/command connection or the sender will request an
> RTT (aka R2T). It is also a sender bug to request a connection
> for a data transfer for which it has already sent "unsolicited" data.
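The "let the TCP window shrink" idea can be sketched as a target that reads a data connection only when the next PDU on it belongs to a command it already knows. `DataConnTarget` and its methods are hypothetical; leaving a PDU unread stands in for the connection's window closing, so an early sender is throttled rather than overrunning the target.

```python
from collections import deque


class DataConnTarget:
    """Target that stalls a data connection until the matching command arrives."""

    def __init__(self, data_conns: int) -> None:
        self.known_tags: set = set()
        self.data_conns = [deque() for _ in range(data_conns)]
        self.completed: list = []

    def on_command(self, tag: int) -> None:
        self.known_tags.add(tag)
        self.poll()

    def on_data(self, conn: int, tag: int) -> None:
        self.data_conns[conn].append(tag)
        self.poll()

    def poll(self) -> None:
        # Drain every data connection whose head PDU matches a known command;
        # a connection whose head is still unknown is left unread (stalled).
        for q in self.data_conns:
            while q and q[0] in self.known_tags:
                self.completed.append(q.popleft())

    def stalled(self) -> list:
        return [i for i, q in enumerate(self.data_conns) if q]
```

If data for command 7 arrives on connection 0 before the command itself, connection 0 stalls, yet command 5 can still complete via connection 1; when command 7 arrives, connection 0 drains.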
> 
> Unless my assumptions and definitions are wrong, I don't see the issue.
> 
> 	-David
> 	
> > The problem:
> > 
> > iSCSI, as currently spec'ed, allows SCSI commands and data to be
> > interleaved fairly freely on a TCP connection. A target that stops
> > reading from a TCP connection to avoid reading more command packets
> > also prevents itself from reading data packets.  Those data packets
> > may be critical to making progress on the currently executing
> > command.
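The deadlock described above can be reproduced with a small replay of a single connection. `replay_single_connection` is a hypothetical model that assumes commands execute oldest first and a command completes once its data has been read.

```python
from collections import deque


def replay_single_connection(stream, queue_depth: int) -> str:
    """Replay one TCP connection that interleaves commands and data.

    `stream` holds ("cmd", tag) and ("data", tag) items in wire order. When
    the command queue is full the target stops reading the connection, which
    also stops it from reading any data queued behind the next command.
    """
    wire = deque(stream)
    queue: deque = deque()  # accepted commands that have not completed
    have_data: set = set()
    while wire or queue:
        if queue and queue[0] in have_data:
            queue.popleft()  # oldest command has its data: complete it
            continue
        if not wire:
            return "stuck"  # the data the oldest command needs never arrives
        kind, tag = wire[0]
        if kind == "cmd" and len(queue) >= queue_depth:
            # Refusing to read this command also hides the data behind it,
            # so the executing command can never finish.
            return "deadlock"
        wire.popleft()
        if kind == "cmd":
            queue.append(tag)
        else:
            have_data.add(tag)
    return "done"
```

With a queue depth of 1, the wire order `C1 C2 D1 D2` deadlocks, while `C1 D1 C2 D2` completes; a depth of 2 also gets `C1 C2 D1 D2` through.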
> > 
> > Note the issue appears with one TCP connection for control and data
> > and even appears in many of the multiple connection schemes.
> > 
> > Data in iSCSI comes in two forms:
> > 
> > 	1) solicited - data requested by target via RTT 
> > 	             - data requested by initiator via a SCSI command
> > 	2) unsolicited - data sent by initiator without having received an RTT
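The two forms of data can be expressed as a small classifier. `classify` is hypothetical, and it assumes that an RTT grants a byte range (offset, length) of the command's data, which the message itself does not specify.

```python
def classify(to_target: bool, offset: int, length: int, rtt_grants) -> str:
    """Classify one data PDU; `rtt_grants` lists the (offset, length) ranges granted."""
    if not to_target:
        # Data flowing to the initiator was requested by its own SCSI command.
        return "solicited: requested by the initiator's SCSI command"
    for g_off, g_len in rtt_grants:
        if g_off <= offset and offset + length <= g_off + g_len:
            return "solicited: requested by the target via RTT"
    return "unsolicited: sent without a covering RTT"
```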
> > 
> > The analysis below assumes that unsolicited data travels over the same
> > TCP connection as SCSI commands. Otherwise, you run the risk of receiving
> > unsolicited data before the relevant SCSI command (thus making
> > implementations more complex).
> > 
> > Four solutions:
> > 
> > 1) Don't overflow the command queue (i.e. use credits)
> > 	- and what do you do if a misbehaving initiator overflows
> >         your command queue anyway? Drop the connection?
> > 	
> > 	- requires you to reserve resources per initiator. Some people
> >         may want to overcommit.
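Option 1 can be sketched as per-initiator command credits. `CommandCreditTarget` is hypothetical, and dropping the connection on overflow is only one of the policies the message raises; the `reserved` method shows why reserved resources grow with the number of initiators.

```python
class CommandCreditTarget:
    """Target that grants each initiator a fixed number of command slots."""

    def __init__(self, slots_per_initiator: int) -> None:
        self.slots = slots_per_initiator
        self.in_use: dict = {}
        self.dropped: set = set()

    def on_command(self, initiator: str) -> bool:
        if initiator in self.dropped:
            return False
        used = self.in_use.get(initiator, 0)
        if used >= self.slots:
            # Misbehaving initiator exceeded its credits: drop its connection.
            self.dropped.add(initiator)
            self.in_use.pop(initiator, None)
            return False
        self.in_use[initiator] = used + 1
        return True

    def on_complete(self, initiator: str) -> None:
        # Completing a command returns its credit to the initiator.
        if self.in_use.get(initiator, 0) > 0:
            self.in_use[initiator] -= 1

    def reserved(self, initiators: int) -> int:
        # Slots that must be reserved if every initiator is honored in full.
        return self.slots * initiators
```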
> > 
> > 2) Allow dropping of SCSI commands when queue fills
> > 	- how do you clean up after a dropped SCSI command?
> > 	    - there may be other commands in the pipeline
> > 	
> > 	One approach: On command drop, the target enters an error
> > 	state. While in the error state, all newly received commands
> > 	terminate with an error until the initiator explicitly clears
> > 	the error state using a "clear error state" message.
> > 
> > 	You might think that TASK SET FULL and ACA mechanisms from SCSI
> >         could be used to attack this problem. However, TASK SET FULL errors
> > 	don't trigger ACA (in my reading of the SAM). Also, ACA is only
> > 	triggered by the current enabled command, not by random commands
> > 	entered into the task set.
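The error-state approach in option 2 can be sketched as a small state machine. `ErrorStateTarget` and `on_clear_error_state` are hypothetical names standing in for the proposed "clear error state" message. Once a command is dropped, every later command fails until the initiator clears the state, so the initiator knows exactly which commands to reissue, including those already in the pipeline.

```python
class ErrorStateTarget:
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.queue: list = []
        self.error_state = False

    def on_command(self, tag: int) -> str:
        if self.error_state:
            return "ERROR"  # terminated: the target is still in the error state
        if len(self.queue) >= self.depth:
            self.error_state = True  # command dropped; enter the error state
            return "DROPPED"
        self.queue.append(tag)
        return "QUEUED"

    def on_clear_error_state(self) -> None:
        # The initiator has seen the failures and will reissue in order.
        self.error_state = False
```

Note that a command arriving after queue space frees up still fails until the state is cleared; that is what keeps the reissued commands in their original order.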
> > 
> > 3) Put solicited data on a dedicated TCP connection. Require that
> > unsolicited data MUST follow the command, ideally in the same iSCSI
> > PDU
> > 
> > 4) (Do it like NFS) Make all transfers from initiator to target
> > unsolicited. Make sure unsolicited data follows the command
> > immediately.
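Option 4 can be sketched as a target that reads each command together with its inline data, and only when it has room for the whole unit. `run_nfs_style` is hypothetical and assumes the target executes accepted writes oldest first. Every accepted command can complete without reading further from the connection, so stopping reads cannot deadlock; the cost is that a write larger than the whole buffer can never be accepted, one of the ramifications for target designs.

```python
from collections import deque


def run_nfs_style(requests, buffer_bytes: int) -> list:
    """`requests`: (tag, nbytes) pairs, each command carrying its data inline.

    Returns the tags in completion order.
    """
    pending = deque(requests)
    in_buffer: deque = deque()
    used = 0
    completed = []
    while pending or in_buffer:
        if pending and used + pending[0][1] <= buffer_bytes:
            tag, nbytes = pending.popleft()  # read command and data as one unit
            in_buffer.append((tag, nbytes))
            used += nbytes
        elif in_buffer:
            tag, nbytes = in_buffer.popleft()  # execute the oldest write
            used -= nbytes
            completed.append(tag)
        else:
            raise ValueError(f"request {pending[0][0]} does not fit in the whole buffer")
    return completed
```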
> >    
> > 
> > Of all the options, #1 and #4 sound the easiest to implement. #2 is more
> > sophisticated than #1. #3 is just plain clever but that's rarely a good
> > thing. :)  #4 has large ramifications on current SCSI target designs.
> > 
> > -Costa