
    RE: Avoiding deadlock in iSCSI



    > As you can not predict latency on two TCP connections, there would be no
    > assurance of delivery order.  A skew buffer would seem a requirement.
    
    If I understand your definition of a "skew buffer", it is some place to
    hold the data until the command arrives.  That buffer is simply
    the TCP receive buffer if we require data to be sent in order.
    Any TCP implementation that offers a total window size (sum of
    windows on all connections) greater than available buffer space
    is just plain broken, so this is not an issue.
    
    > Each TCP stream can deliver sequential data, but not within multiple
    > streams.  As such, sequential delivery goes out the window with respect to
    > aggregation.  As far as the wire, TCP is part of the Upper Layer Protocol.
    
    Yes, each connection maintains data ordering, and if we also require
    that all data be sent in order, early arrival on one connection
    or another does not change the ordering.  The receiver simply holds
    the data in its buffer, closing the TCP window if flow control is
    needed, until the command arrives on the command connection to be
    processed.  Again, not an issue.
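
    As a rough illustration of the point above, here is a toy model of one
    data connection's receive buffer.  It is only a sketch: DataConnection,
    receive, and consume are hypothetical names, not part of any TCP stack
    or iSCSI draft, and the "window" is just the free space in the buffer.
    Data that arrives early sits in the buffer in order; once the buffer is
    full the window is zero and the sender has to hold the rest.

```python
class DataConnection:
    """Toy model of one data connection's receive buffer (not a real socket)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = bytearray()

    def window(self):
        # The advertised window never exceeds the free buffer space.
        return self.buffer_size - len(self.buffer)

    def receive(self, data):
        # A byte stream may accept only part of what the sender offers.
        accepted = min(len(data), self.window())
        self.buffer += data[:accepted]
        return accepted

    def consume(self, length):
        # Called when a command arrives and names this connection.
        if len(self.buffer) < length:
            return None  # data not here yet; the command waits
        chunk = bytes(self.buffer[:length])
        del self.buffer[:length]
        return chunk


conn = DataConnection(buffer_size=8)
stream = b"AAAABBBBCCCC"             # data for three 4-byte writes, in order

sent = conn.receive(stream)          # data arrives before any command
window_when_full = conn.window()     # sender is now flow controlled

first = conn.consume(4)              # first command arrives
sent += conn.receive(stream[sent:])  # window reopened, rest of the data flows
second = conn.consume(4)
third = conn.consume(4)
```

    The early data never changes the pairing: each command takes the next
    bytes in stream order, however long the data waited in the buffer.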
    
    > The balance between commands and data are not within the control of the
    > target via flow-control.  As such, either a data or command resource may
    > become exhausted.  At some point, either data or commands may be stopped
    > without necessarily stopping TCP.  The means for stopping a command is Check
    > Condition, and for data, discarding.
    
    With multiple data connections, any well-implemented target will not
    open a total window size greater than its available buffers.  Once
    a given connection uses up its window, it is flow controlled by TCP
    and no more data is sent.  Because the commands are on a separate
    connection and we require data ordering, the commands will make
    progress either using immediate data or by reading data from a
    data connection, where the data is guaranteed to be available already
    or to become available.  Neither commands nor data connections should
    ever need to be stopped by iSCSI; TCP flow control should be sufficient.
    
    There may be rare exceptional conditions where an initiator or target
    fails to follow the protocol (bug), drops a connection, or other rare
    event.  The recovery mechanism should simply handle this.
    
    > Should there be a TCP connection per LUN, the number of TCP connections
    > would be large if a controller is sitting on 48 LUNs.  With asymmetrical
    > connections that would imply 96 TCP sessions per client possible plus fail
    > over connections.  On the network, TCP shares on a session basis.  This
    > would mean a device with a single connection on the same network would then
    > enjoy only 1% of the bandwidth.  This says nothing about TCP overhead.
    
    Although I like a connection per LUN, the WG consensus seems to be that
    the protocol allows multiple LUNs per connection; if you choose, you
    can implement just one LUN per connection, but there is no requirement
    to restrict it to one.  With one connection per LUN there is very little
    reason to have a separate data connection, and it is best to simply
    send all the data immediately after the command.  Separate data
    connections allow better concurrency when commands are for many LUNs.
    The concurrency for a single LUN is likely to be low, if not
    sequential in the common case, so extra data connections are not
    necessary there.
    
    With either a connection per LUN or all LUNs on one connection, if a
    connection can consume 100% of the bandwidth of the link, then with 100
    LUNs each will get only 1% of the bandwidth.  The only difference is
    whether the multiplexing is done at the TCP layer or the iSCSI layer;
    the complexity and overhead are the same (modulo a small constant).
    TCP already has the mux/demux capability, whereas we will have to add
    it to iSCSI.  I prefer the simplicity of leaving it to TCP but lost
    that battle, and it is simply not a big enough issue to worry about.
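
    The bandwidth arithmetic on both sides of this exchange can be written
    out as below.  This is a back-of-the-envelope sketch that assumes an
    idealized bottleneck shared equally per TCP connection; real TCP shares
    depend on round-trip time, loss, and window sizes.  The function and
    host names are made up for the example.

```python
from fractions import Fraction


def host_shares(connections_per_host):
    """Link share per host, assuming equal sharing per TCP connection."""
    total = sum(connections_per_host.values())
    return {host: Fraction(n, total) for host, n in connections_per_host.items()}


def per_lun_share(lun_count, host_share=Fraction(1)):
    """Share per LUN when a host's bandwidth is split evenly across its LUNs.

    The result is the same whether the split is done by TCP (one connection
    per LUN) or by iSCSI (all LUNs multiplexed on one connection).
    """
    return host_share / lun_count


# The quoted scenario: 96 connections competing with a single connection.
shares = host_shares({"multi_conn_client": 96, "single_conn_device": 1})

# The reply's scenario: one host owning the whole link, 100 LUNs.
lun_share = per_lun_share(100)
```

    Here shares["single_conn_device"] is 1/97, about 1%, and lun_share is
    exactly 1/100.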
    
    > The iSCSI means of limiting the amount of data presented in an unsolicited
    > fashion is to discard.  For a given amount of buffer space, the number of
    > commands associated with this space is unknown.  The overhead for staging
    > these commands would add an additional overhead on a command basis not a
    > data basis.  As such, it would be like adding water to a box of rice. Within
    > a margin that stays out of trouble, how much of the buffer would you be
    > wasting to handle all situations?  What if you oops due to the latency of
    > responding?
    
    Unlike a datagram protocol where you accept either all of the data or
    none of it, a reliable stream allows you to accept as much or as
    little as the receiver desires.  You never overflow because you
    never advertise a TCP window larger than your available buffer space.
    With N connections you never allow any one connection to consume
    more than 1/N of the buffer space. You may waste buffer space but
    memory is cheap and you can control how many connections you allow.
    Latency is not an issue: if data arrives before a command, it
    is simply flow controlled.  In a really bizarre implementation with
    minimal buffering, you could hold all data TCP windows at zero
    until a command arrives that indicates its data is on that connection.
    That is a perverse method of implementing RTT/R2T, but it shows that
    the scheme works in the extreme.
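
    A minimal sketch of that window budget, assuming an even 1/N split of
    one shared buffer pool.  The function name, the gated_open parameter,
    and the sizes are hypothetical, and a real stack would set windows
    through its own receive-buffer tuning rather than a call like this.

```python
def advertised_windows(total_buffer, buffered, gated_open=None):
    """Window to advertise on each of N data connections.

    total_buffer: bytes of buffer shared by all connections.
    buffered:     bytes currently held for each connection.
    gated_open:   if given, only these connection indexes may open a
                  window; every other connection is held at zero (the
                  extreme "wait for the command" scheme).
    """
    cap = total_buffer // len(buffered)  # no connection gets more than 1/N
    windows = []
    for index, used in enumerate(buffered):
        if gated_open is not None and index not in gated_open:
            windows.append(0)
        else:
            windows.append(max(cap - used, 0))
    return windows


TOTAL = 64 * 1024
held = [0, 4096, 16384, 8192]

windows = advertised_windows(TOTAL, held)

# Extreme case: nothing buffered, and only connection 2 has a pending command.
gated = advertised_windows(TOTAL, [0, 0, 0, 0], gated_open={2})
```

    With these numbers windows is [16384, 12288, 0, 8192], whose sum equals
    the free buffer space, and gated is [0, 0, 16384, 0].  The cost of the
    even split is the wasted space mentioned above: an idle connection
    keeps its 1/N share even while a busy one sits at a zero window.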
    
    	-David
    	
    
    


Last updated: Tue Sep 04 01:07:20 2001