    RE: iSCSI: Flow Control



    Matt,
    
    I will try to explain below.
    
    Somesh
    
    > -----Original Message-----
    > From: Matt Wakeley [mailto:matt_wakeley@agilent.com]
    > Sent: Saturday, October 14, 2000 7:18 PM
    > To: IPS Reflector
    > Subject: Re: iSCSI: Flow Control
    > 
    > 
    > Somesh,
    > 
    > I still don't understand what you are trying to solve.
    > 
    > With the iSCSI session wide command credit method, there is a portion
    > of the iSCSI layer that sits right below the SCSI layer.  It receives
    > the commands from the SCSI layer and passes the results of each I/O
    > from each NIC back to the SCSI layer. The MaxCmdRn indicates how many
    > commands the target (as a whole) can "buffer". The iSCSI layer will
    > "scatter" the commands to the NICs until it has used up the MaxCmdRn
    > buffers. Each NIC, once iSCSI has posted a command to it, will attempt
    > to send the command as long as the TCP window is open. Practically
    > every message sent from the target to the initiator contains the new
    > MaxCmdRn.  Each initiator NIC that receives a message passes this
    > (new) value to the common iSCSI.  This value does NOT have to be sent
    > to every other NIC, since once a command is posted to a NIC, it is
    > committed to send it.
    
    What you describe is a good model for the initiator side (even
    though there could be some implementation optimizations). As the
    iSCSI host driver (IHD) receives commands from the SCSI layer, it
    has to check the following before it can post a command:
    
    Check whether there is space in the host queues for each NIC (i.e.
    the host memory that has been designated for posting commands to a
    NIC; this may be limited by the NIC or by host memory). There may be
    models where there is no such limit. This is also the point at which
    (Mike's comment) the scatter is done according to some algorithm,
    and it is independent of the flow control model.
    
    In the session-wide flow control model: The IHD has to perform the
    additional check of whether MaxCmdRn is being exceeded.
    
    In a connection-wide model: No such check has to be performed as the
    NIC should be able to handle that on its own.
    
    NOTE: There is a cost to performing each of these checks in SMP
    servers if multiple processors are involved: taking a lock and
    moving the variable from cache to cache.
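
A minimal sketch of the two posting paths described above, not taken from any iSCSI draft or real driver. The names `Nic`, `Driver`, `post`, and `waiting` are hypothetical, round-robin stands in for "some algorithm", and MaxCmdRn is modeled (as an assumption) as the highest command number the target will currently accept.

```python
import threading
from collections import deque


class Nic:
    """Hypothetical NIC with a bounded host-side posting queue."""

    def __init__(self, depth):
        self.depth = depth
        self.queue = deque()

    def has_space(self):
        return len(self.queue) < self.depth


class Driver:
    """Hypothetical iSCSI host driver (IHD) posting path."""

    def __init__(self, nics, max_cmd_rn=None):
        self.nics = nics
        self.max_cmd_rn = max_cmd_rn   # None selects the connection-wide model
        self.next_cmd_rn = 1
        self.lock = threading.Lock()   # guards the session-wide state
        self.waiting = deque()         # commands that could not be posted
        self.turn = 0

    def _scatter(self):
        # Round-robin over NICs that still have queue space.
        count = len(self.nics)
        for i in range(count):
            nic = self.nics[(self.turn + i) % count]
            if nic.has_space():
                self.turn = (self.turn + i + 1) % count
                return nic
        return None

    def post(self, cmd):
        if self.max_cmd_rn is None:
            # Connection-wide: only the per-NIC queue check is needed;
            # the NIC is assumed to enforce its own credit.
            nic = self._scatter()
            if nic is None:
                self.waiting.append(cmd)
                return False
            nic.queue.append(cmd)
            return True
        # Session-wide: the shared credit check is done under a lock.
        with self.lock:
            if self.next_cmd_rn > self.max_cmd_rn:
                self.waiting.append(cmd)
                return False
            nic = self._scatter()
            if nic is None:
                self.waiting.append(cmd)
                return False
            nic.queue.append(cmd)
            self.next_cmd_rn += 1
            return True
```

With two NICs of depth 2, a session-wide driver built with `max_cmd_rn=3` posts three commands and parks the rest in `waiting`, while the connection-wide driver posts until the NIC queues themselves are full. The lock in the session-wide branch is where the SMP cost mentioned in the NOTE would show up.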
    
    --
    Now in cases where the command cannot be posted to the NIC queue,
    it must be left in another queue in the host, which is then processed
    when the blocking condition is removed. The condition is removed
    when a command status is received and the host is interrupted. (An
    RTT could also remove it, but that is useless if the model assumes
    interrupting the host; you really don't want to interrupt the host
    on RTT.)
    
    In a connection-wide model: The interrupt processing routine checks
    the NIC's command posting queue (or equivalent status) and, if it had
    been full, knows to check the common queue for more commands. If not,
    then it knows there is nothing to do for command posting.
    
    In a session-wide model: Update the global location of MaxCmdRn
    (take a lock, release the lock, and thrash the cache if multiple CPUs
    are active). Then it always has to check whether there are commands
    waiting to be posted (again by checking variables, locks, etc.).
    If yes, then post those commands, repeating the algorithm that
    was used when the upper layer posted a command.
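
The two interrupt-time paths above can be sketched as follows. This is an illustration under stated assumptions, not a real driver: `SessionState`, `on_status_connection_wide`, and `on_status_session_wide` are hypothetical names, the NIC queue is passed in after the completed command has already been removed, and MaxCmdRn is again modeled as the highest command number the target will accept.

```python
import threading
from collections import deque


class SessionState:
    """Hypothetical session-wide state shared by every NIC in the session."""

    def __init__(self, max_cmd_rn):
        self.lock = threading.Lock()
        self.max_cmd_rn = max_cmd_rn
        self.next_cmd_rn = 1
        self.waiting = deque()


def on_status_connection_wide(nic_queue, depth, was_full, waiting):
    """Connection-wide: look at the common queue only if this NIC had been full."""
    posted = []
    if not was_full:
        return posted              # nothing to do for command posting
    while waiting and len(nic_queue) < depth:
        cmd = waiting.popleft()
        nic_queue.append(cmd)
        posted.append(cmd)
    return posted


def on_status_session_wide(state, new_max_cmd_rn, nic_queue, depth):
    """Session-wide: always take the lock, update the credit, re-run posting."""
    posted = []
    with state.lock:
        state.max_cmd_rn = max(state.max_cmd_rn, new_max_cmd_rn)
        while (state.waiting
               and len(nic_queue) < depth
               and state.next_cmd_rn <= state.max_cmd_rn):
            cmd = state.waiting.popleft()
            nic_queue.append(cmd)
            state.next_cmd_rn += 1
            posted.append(cmd)
    return posted
```

The connection-wide handler returns immediately in the common case, whereas the session-wide handler takes the shared lock on every status, which is the extra synchronization point being argued about.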
    
    NOTE: If we feel that the SCSI layer will generate commands faster
    than the session-wide credit allows, then the session-wide credit will
    cause extra processing. It is much more straightforward to be
    able to post from the top half than to have to try to post from the
    top half and then actually post from the bottom half. If there is a
    significant credit issue, then the outbound command queues will
    be going through starvation at times.
    
    
    > 
    > Each Target NIC will have a pool of buffers to receive asynchronous
    > (non DATA) iSCSI messages.  As each (small) command message is
    > received, it is placed into one of these buffers, processed by common
    > iSCSI and the CDB is passed to the SCSI layer which stores it into its
    > command buffer. The message buffer is then given back to the NIC for
    > further messages.
    
    The question is how much credit you are going to hand out to the
    remote side. If there are N buffers posted per card and M cards, will
    you make a credit of N available (underutilization) or N * M (which
    assumes that the sender will send evenly and is risky if there
    is sudden congestion on one or more connections)? Also, the
    same discussion of the system cost of calculating and using a
    centralized value of MaxCmdRn applies if arrays have multiple
    processors.
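
To make the N versus N * M trade-off concrete, here is a small worked sketch. The function names and the example numbers (8 buffers per card, 4 cards, and the skewed arrival pattern) are made up for illustration.

```python
def advertised_credit(n_buffers, m_cards, optimistic):
    """Session-wide credit handed to the initiator under each policy."""
    return n_buffers * m_cards if optimistic else n_buffers


def utilization(credit, n_buffers, m_cards):
    """Fraction of all posted buffers that the credit can ever occupy."""
    return credit / (n_buffers * m_cards)


def overflow_per_card(n_buffers, arrivals):
    """Commands that find no buffer on each card for a given arrival pattern."""
    return [max(0, a - n_buffers) for a in arrivals]


N, M = 8, 4
safe = advertised_credit(N, M, optimistic=False)    # 8
risky = advertised_credit(N, M, optimistic=True)    # 32

print(utilization(safe, N, M))                      # 0.25
print(overflow_per_card(N, [8, 8, 8, 8]))           # [0, 0, 0, 0]
print(overflow_per_card(N, [20, 4, 4, 4]))          # [12, 0, 0, 0]
```

A credit of N can never overflow a card but leaves three quarters of the buffers idle in this example. A credit of N * M uses every buffer only when the 32 commands arrive evenly; the skewed pattern, also 32 commands in total, leaves 12 commands without a buffer on the first card.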
    
    > 
    > "GUPTA,SOMESH (HP-Cupertino,ex1)" wrote:
    > 
    > > Yes I am trying to describe the synchronization pts and software
    > > intervention caused by a session wide flow control model
    > 
    > But I still don't understand the "problem" that the credit per
    > connection solves over the credit per session model.
    > 
    > In your description, the initiator still "scatters" the commands to
    > the NICs, then the NICs have the burden of trying to figure out if
    > they can send the command or not.  Furthermore, if some NICs have open
    > TCP windows, but don't have command credit, the command can't be sent.
    
    Look at it as an opportunity to differentiate and streamline
    performance rather than as a burden. It would definitely be a feature
    for multi-port NICs where all the ports used for a session
    are on the same NIC. It saves host CPU cycles, thereby improving
    the attractiveness of the solution :-)
    
    > 
    > In the iSCSI session wide credit model, the initiator will not post
    > commands to any NIC if it doesn't have credit.  Any commands posted to
    > a NIC will be sent as long as its TCP window is open.
    > 
    > > 1. Post a large enough number at each NIC. OK. The window opens up
    > > (indicated through a new MaxCmdRn received on one connection). This
    > > value now must be communicated to the other connections, so that
    > > they can not be flow controlled also. Or the new value must be
    > > received on each connection.
    > 
    > As I indicated above, the goal is to not overflow the SCSI command
    > buffer, so the command is not discarded causing a lot of error
    > recovery.  A command CDB is only 16 bytes.  It does not make sense to
    > allocate 16 byte buffers to NICs for command reception. As I indicated
    > above, the NIC receives the message, the iSCSI layer strips out the
    > CDB and hands it to SCSI, then reposts the message buffer to the NIC.
    > 
    > > Also since you have posted a large enough number at each NIC,
    > > you are really not having any benefit at all from the session-wide
    > > value - what is the advantage?
    > 
    > Having a session wide MaxCmdRn allows the initiator to stop sending
    > SCSI commands, while still enabling non command messages to be sent.
    > They are received by each NIC and passed to iSCSI for processing, but
    > since they are not passed up to SCSI, nothing is overflowed.
    
    Again, there is no benefit over what a connection-wide flow control
    would provide. So that is a tie.
    
    In terms of being flow controlled by the TCP window, the ability to
    scatter commands across the connections appropriately, not
    overflowing, or letting data/status packets continue flowing, there
    is no difference.
    > 
    > 
    > > 2. Have the NICs grab them from a pool through an atomic bus
    > > transaction. That has got to be tougher to implement than it
    > > looks, and the bus performance issues due to the need to maintain
    > > ordering etc?
    > 
    > As indicated above, each NIC passes the iSCSI messages to a central
    > iSCSI message processor that sends the appropriate SCSI messages to
    > SCSI.
    > 
    > -Matt
    > 
    > 
    


Last updated: Tue Sep 04 01:06:37 2001