Re: Avoiding deadlock in iSCSI

To: David Robinson <David.Robinson@EBay.Sun.COM>
Subject: Re: Avoiding deadlock in iSCSI
From: "Randall R. Stewart" <randall@stewart.chicago.il.us>
Date: Tue, 12 Sep 2000 22:07:53 -0500
CC: ips@ece.cmu.edu, dotis@sanlight.net
References: <200009121807.LAA29970@ha10nwk.EBay.Sun.COM>
Sender: owner-ips@ece.cmu.edu

David:

Question for you... 

Your reply leads me to believe a tight coupling exists between
the iSCSI layer and the TCP stack, i.e. iSCSI is going in
and "tweaking on a dynamic basis" the TCP implementation's
rwnd. Other email that has crossed on this subject implies
that "a TCP implementation is broken if it offers more
buffers than it has"... In some ways these two views conflict:
iSCSI controlling TCP's rwnd versus TCP controlling its own rwnd...


Could you please clarify this for me? Is TCP in control
of rwnd, or is there a tight coupling here?

Thanks

R


David Robinson wrote:
> 
> > As you can not predict latency on two TCP connections, there would be no
> > assurance of delivery order.  A skew buffer would seem a requirement.
> 
> If I understand your definition, a "skew buffer" is some place to
> hold the data until the command arrives.  This buffer is simply
> the TCP receive buffer if we require data to be sent in order.
> Any TCP implementation that offers a total window size (sum of
> windows on all connections) greater than available buffer space
> is just plain broken, so this is not an issue.
> 
> > Each TCP stream can deliver sequential data, but not within multiple
> > streams.  As such, sequential delivery goes out the window with respect to
> > aggregation.  As far as the wire, TCP is part of the Upper Layer Protocol.
> 
> Yes, each connection maintains data ordering, and if we also require
> that all data is sent in order, the early arrival on one connection
> or another does not change the ordering. The receiver simply maintains
> the data buffer and closes the TCP window if flow control is needed
> until the command arrives, on the command connection, to be processed.
> Again not an issue.
> 
> > The balance between commands and data is not within the control of the
> > target via flow-control.  As such, either a data or command resource may
> > become exhausted.  At some point, either data or commands may be stopped
> > without necessarily stopping TCP.  The means for stopping a command is Check
> > Condition, and for data, discarding.
> 
> With multiple data connections, any well implemented target will not
> open the total window size greater than the available buffers.  Once
> a given connection uses up its window it is flow controlled by TCP
> and no more data is sent. Because the commands are on a separate
> connection and we require data ordering, the commands will make
> progress either using immediate data, or reading data from a
> data connection that is guaranteed to be available or will be available.
> Commands should never need to be stopped by iSCSI and neither should
> the data connections; TCP flow control should be sufficient.
> 
> There may be rare exceptional conditions where an initiator or target
> fails to follow the protocol (bug), drops a connection, or hits some
> other rare event.  The recovery mechanism should simply handle this.
> 
> > Should there be a TCP connection per LUN, the number of TCP connections
> > would be large if a controller is sitting on 48 LUNs.  With asymmetrical
> > connections that would imply 96 TCP sessions per client possible plus fail
> > over connections.  On the network, TCP shares on a session basis.  This
> > would mean a device with a single connection on the same network would then
> > enjoy only 1% of the bandwidth.  This says nothing about TCP overhead.
> 
> Although I like a connection per LUN, the WG consensus seems to be that
> the protocol allows multiple LUNs per connection; if you choose, you
> can implement just one, but there is no requirement to restrict it
> to just one.  With one connection per LUN there is very little
> reason to have a separate data connection, and instead it is
> best to simply provide all the data immediately after the command.
> Having separate data connections allows better concurrency when
> commands are for many LUNs; the concurrency for a single LUN is
> likely to be low, if not sequential in the common case, so extra data
> connections are not necessary.
> 
> With either a connection per LUN or all LUNs on one connection, if a
> connection can consume 100% of the bandwidth of the link, with 100
> LUNs each will only get 1% of the bandwidth. The only difference is
> whether the multiplexing is done at the TCP layer or the iSCSI layer;
> the complexity and overhead are the same (modulo a small constant).
> TCP already has the mux/demux capability, whereas we will have to
> add it to iSCSI. I prefer the simplicity of leaving it to TCP but
> lost the battle; it is simply not a big enough issue to worry
> about.
> 
> > The iSCSI means of limiting the amount of data presented in an unsolicited
> > fashion is to discard.  For a given amount of buffer space, the number of
> > commands associated with this space is unknown.  The overhead for staging
> > these commands would add an additional overhead on a command basis not a
> > data basis.  As such, it would be like adding water to a box of rice. Within
> > a margin that stays out of trouble, how much of the buffer would you be
> > wasting to handle all situations?  What if you oops due to the latency of
> > responding?
> 
> Unlike a datagram protocol where you accept either all of the data or
> none of it, a reliable stream allows you to accept as much or as
> little as the receiver desires.  You never overflow because you
> never advertise a TCP window larger than your available buffer space.
> With N connections you never allow any one connection to consume
> more than 1/N of the buffer space. You may waste buffer space but
> memory is cheap and you can control how many connections you allow.
> Latency is not an issue: if data arrives before a command it
> is simply flow controlled.  In a really bizarre implementation with
> minimal buffering you could hold all data TCP windows at zero
> until a command arrives that indicates its data is on that connection.
> A perverse method of implementing RTT/R2T, but it shows that the scheme
> works in the extreme.
> 
>         -David
> 
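
For reference, the 1/N window budget David describes above can be
modelled in a few lines. This is only a sketch under assumptions: the
names WindowBudget, window, receive and consume are hypothetical and
belong to no TCP stack or iSCSI draft, and the model only counts bytes
rather than touching real sockets. It shows that if no connection may
hold more than 1/N of the buffer, the sum of advertised windows can
never exceed the free buffer space.

```python
class WindowBudget:
    """Toy model: split one receive buffer evenly across N connections."""

    def __init__(self, total_buffer, num_connections):
        self.total_buffer = total_buffer
        # Each connection may hold at most 1/N of the buffer (assumption
        # taken from the quoted text; integer division drops any remainder).
        self.share = total_buffer // num_connections
        self.buffered = [0] * num_connections

    def window(self, conn):
        # Advertised window: the unused part of this connection's share.
        return self.share - self.buffered[conn]

    def receive(self, conn, nbytes):
        # The sender is flow controlled: only window(conn) bytes are accepted.
        accepted = min(nbytes, self.window(conn))
        self.buffered[conn] += accepted
        return accepted

    def consume(self, conn, nbytes):
        # The upper layer reads data (e.g. once the command has arrived),
        # which reopens the window by the same amount.
        taken = min(nbytes, self.buffered[conn])
        self.buffered[conn] -= taken
        return taken

    def total_advertised(self):
        return sum(self.window(c) for c in range(len(self.buffered)))

    def free_space(self):
        return self.total_buffer - sum(self.buffered)


budget = WindowBudget(total_buffer=64 * 1024, num_connections=4)
accepted = budget.receive(0, 20000)   # sender tries to push past its share
closed = budget.window(0)             # connection 0 is now flow controlled
budget.consume(0, 4096)               # command arrives, data is drained
reopened = budget.window(0)
```

The "bizarre" zero-window case in the quoted text corresponds to never
calling consume on a data connection until its command shows up: the
window stays closed and the sender simply waits.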

-- 
Randall R. Stewart
randall@stewart.chicago.il.us or rrs@cisco.com
815-342-5222 (cell) 815-477-2127 (work)
