
    RE: iSCSI: Flow Control



    Matt,
    
    Comments below
    
    Somesh
    
    > -----Original Message-----
    > From: Matt Wakeley [mailto:matt_wakeley@agilent.com]
    > Sent: Monday, October 09, 2000 4:49 PM
    > To: IPS Reflector
    > Subject: Re: iSCSI: Flow Control
    > 
    > 
    > "GUPTA,SOMESH (HP-Cupertino,ex1)" wrote:
    > 
    > > Hi all,
    > >
    > > Assuming that we have consensus, especially on [1] below (minimum
    > > connections is 1), I think we should try and resolve the flow
    > > control issue.
    > >
    > > It seems to me that there is sufficient consensus that command
    > > flow control is needed -
    > >
    > > [1]   To enable fastest possible flow of commands given the
    > >       capabilities of the target & initiator, and accommodating
    > >       increased latencies of IP networks
    > 
    > Well, the "fastest" possible flow of commands would be to send the
    > commands on a dedicated command channel.  Otherwise, they will
    > always be queued up behind data.
    > 
    > >
    > >
    > > [2]   To significantly minimize the queue full condition. And to
    > >       provide a recovery mechanism at the iSCSI level when command
    > >       overflow happens at the target.
    > 
    > If the "MaxCmdRN" mechanism that is already implemented in the
    > draft is observed, there will be no "dropped commands" because the
    > target has indicated how many command buffers it has available (as
    > long as the target doesn't "lie" and the initiator doesn't ignore
    > the target's values).
    
    I agree. The only difference of opinion I have is whether the
    credit/window should be on a per connection basis or a session
    basis.
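    To make the point concrete, here is a minimal sketch (not from the
    draft; class and field names are hypothetical) of the kind of
    MaxCmdRn-style credit window being discussed, where the target
    advertises the highest command number it will currently accept:

```python
class CmdWindow:
    """Sketch of a command credit window: the target advertises
    MaxCmdRN; the initiator may send commands whose CmdRN falls in
    [ExpCmdRN, MaxCmdRN] inclusive. Illustrative only."""

    def __init__(self, exp_cmd_rn: int, max_cmd_rn: int):
        self.exp_cmd_rn = exp_cmd_rn  # next CmdRN the target expects
        self.max_cmd_rn = max_cmd_rn  # highest CmdRN the target will accept

    def credit(self) -> int:
        # How many more commands the initiator may send right now.
        return self.max_cmd_rn - self.exp_cmd_rn + 1

    def may_send(self, cmd_rn: int) -> bool:
        # An obedient initiator checks this before transmitting.
        return self.exp_cmd_rn <= cmd_rn <= self.max_cmd_rn


w = CmdWindow(exp_cmd_rn=10, max_cmd_rn=13)
assert w.credit() == 4
assert w.may_send(13)
assert not w.may_send(14)  # beyond the advertised window
```

    The open question above is only whether one such window spans the
    whole session or one exists per connection.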
    
    > 
    > > [4] Another question that comes up is - Should the credit be per
    > >     connection or per session (multiple connections)?
    > >
    > > The current draft does provide for a session wide "flow control"
    > > through MaxCmdRn.
    > 
    > This is simply an artifact of the real purpose of the CmdRN
    > fields... to enable re-ordering of commands at the target across
    > multiple (symmetric) iSCSI TCP connections.
    > 
    > > I believe that it is better to have flow control on a per
    > > connection basis.
    > 
    > Does this mean you now no longer care about command ordering?
    
    I don't see why the objectives are at odds with each other. If
    you could point that out, I will try to answer.
    
    > 
    > > This enables each connection (which might be on different NICs)
    > > to operate independently of each other. Having a session wide
    > > flow control would cause sync points in both the initiator and
    > > the target.
    > 
    > If you want such independence, why not simply use multiple iSCSI
    > sessions and use the wedge driver as others have stated?
    
    Again, I am not trying to subvert the desire to make the bulk of
    the multiple-connections-per-session functionality standard. Yes,
    there will be some field redefinition (new fields, smaller fields),
    whatever.
    
    > 
    > >
    > >
    > > Also a smaller field could be used if it is just to indicate
    > > a credit window.
    > >
    > > [5] The credit should be a "pretty good effort" and not a
    > > "guarantee".
    > >
    > > This allows smart targets to overcommit as the number of initiators
    > > logged in increases (while reducing the credit available to the
    > > initiators) and increase the credit and reduce overcommitment as
    > > the number of initiators logged in decreases.
    > >
    > > Some mechanism is required to recover from the infrequent case where
    > > command buffers get exhausted and have to be thrown away.
    > >
    > > [6] I would recommend that iSCSI provide a way to recover from
    > > command overflow and also maintain ordering.
    > >
    > > The current proposal does not have a drop notification. It has
    > > an ack mechanism (ExpCmdRn).
    > 
    > And this mechanism tells you what commands got to the target.  If
    > the command didn't get to the target, you would know by the
    > ExpCmdRn.  Remember, TCP always delivers (bytes) in order, so if
    > command x didn't make it, neither did all the commands after x.
    
    Remember, we are talking about a case where the command did get
    to the other side based on TCP, but due to some temporary
    congestion, got thrown away by the app.
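    A sketch of that situation (illustrative names, not from the draft):
    TCP delivers every command in order, but a congested target
    application may still have to discard some, and would then need to
    record which CmdRNs it dropped so it could send the proposed drop
    notification:

```python
def receive_commands(cmd_rns, free_buffers):
    """Sketch of a target whose application layer runs out of command
    buffers even though TCP delivered everything in order. Returns the
    accepted CmdRNs and the dropped ones (the candidate range for a
    NAK / drop notification). Illustrative only."""
    accepted, dropped = [], []
    for rn in cmd_rns:          # TCP guarantees in-order arrival
        if free_buffers > 0:
            free_buffers -= 1
            accepted.append(rn)
        else:
            dropped.append(rn)  # no buffer: thrown away by the app
    return accepted, dropped


acc, drp = receive_commands(range(1, 7), free_buffers=4)
assert acc == [1, 2, 3, 4]
assert drp == [5, 6]  # TCP delivered these, the app could not keep them
```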
    
    > 
    > > I think for the purpose of drop
    > > notification, it is better to be able to indicate the range of
    > > commands dropped. TCP acks do tell me which commands
    > > reached the target,
    > 
    > No, TCP acks tell you nothing, because iSCSI sends the commands to
    > the TCP layer in the TCP byte stream, and TCP does not tell the
    > application layer what bytes have been "acked".
    
    What I really meant (I suppose I should not post on Sunday
    evenings, and should take two deep breaths any other day) was that
    if it was a segment transmission problem, TCP will recover from it.
    
    > 
    > > and command responses tell me which were processed.
    > 
    > If the "MaxCmdRN" mechanism is observed, there will be no "dropped
    > commands" because the target has indicated how many command
    > buffers it has available.
    > 
    > >
    > >
    > > When a target suffers from command exhaustion, it could behave
    > > in 2 different ways - one is to drop all the commands it receives
    > > till it detects a retransmission. In this case it would send a drop
    > > notification of all commands it receives till it starts receiving
    > > the command from where the drop started.
    > >
    > > The other would be to store all the commands it is able to provide
    > > buffers for and provide NAKs for only those that it has dropped.
    > > This would be more efficient.
    > >
    > > In this case, we should also agree on what the semantics of the
    > > processing of the out or order commands are. Should they be
    > > processed only when the gaps are filled? Or can they be processed
    > > in any order?
    > >
    > > [7] There was some discussion of whether we should propose a slow
    > > start algorithm or a fast start algorithm.
    > >
    > > I think we should use a fast start algorithm at this level. At TCP
    > > level, the slow start algorithm is important because the two
    > > ends are unaware of the state of the network and have to probe it.
    > > At the iSCSI level, the target should be reasonably knowledgeable
    > > about its own state and be able to provide a credit or
    > > reduce/increase it per login as the conditions change (hopefully
    > > with some hysteresis built in).
    > >
    > > [8] On flow control of immediate data, should we first work out
    > > the command flow control and then turn our efforts to the
    > > data flow control?
    > 
    > Once again, if the asymmetric model is used, with a minimum of two
    > TCP connections, there is no command flow control problem.  There
    > is no command ordering problem.  There is no data flow control
    > problem.  All commands will flow on one TCP connection.  When the
    > command buffers at the target become full, the target will simply
    > let TCP flow control itself.  If there are no data buffers at the
    > target, the target will again simply let the TCP flow control
    > mechanism kick in.
    
    Once again, this mechanism has very bad performance in all but one
    scenario. The only scenario it "works" for is where you have one
    (and only one) iSCSI accelerated NIC. Some time back we went
    through (in this group) the kind of synchronization costs this has
    for
    -- a software solution
    -- TCP offload adapters
    -- a session running across multiple iSCSI adapters
    -- and even a single iSCSI NIC where no R2T is used
       and data gets to the other side before the command
       (this last case can happen even with a single connection
        - packet drop - but the number of scenarios increases
          quite a bit when using different connections --
          port aggregation might send them on different
          paths through the fabric since they are on
          different connections).
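    The synchronization cost can be sketched as follows (hypothetical
    structures, not from any draft): a session-wide credit forces every
    connection, possibly on a separate NIC, to serialize on one shared
    counter, while per-connection credit lets each connection consume
    its own pool with no cross-connection locking:

```python
import threading


class SessionCredit:
    """One credit pool per session: every connection contends here."""
    def __init__(self, credit):
        self.credit = credit
        self.lock = threading.Lock()   # the sync point shared by all NICs

    def try_consume(self):
        with self.lock:                # serializes all connections
            if self.credit > 0:
                self.credit -= 1
                return True
            return False


class ConnectionCredit:
    """One pool per connection: no cross-connection synchronization."""
    def __init__(self, credit):
        self.credit = credit

    def try_consume(self):
        if self.credit > 0:
            self.credit -= 1
            return True
        return False


session = SessionCredit(2)
conns = [ConnectionCredit(2), ConnectionCredit(2)]
assert session.try_consume() and session.try_consume()
assert not session.try_consume()            # exhausted for every connection
assert all(c.try_consume() for c in conns)  # each connection independent
```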
    
    > 
    > -Matt
    > 
    > >
    > >
    > > Once we can agree on some of the basic issues, then it should be
    > > relatively easy to work out the credit indication/numbering
    > > details etc.
    > >
    > > Somesh
    > 
    > 
    
    Somesh
    

