
    RE: iSCSI: Flow Control



    At 04:56 PM 9/21/00 -0700, Jim McGrath wrote:
    
    >While memory may be getting cheaper, latency and transfer rates are getting
    >higher.  We have gone from 25 m parallel SCSI buses to transcontinental
    >TCP/IP connections; from 1 MB/s to 100 MB/s (and greater) transfer rates.
    >These combine to make the maximum amount of data in flight that keeps the
    >connection full to be growing much faster than memory cost is declining.
    >(Exponential growth rates are applied to both memory cost and transmission
    >speed; distance also appears to be growing very fast, although perhaps not
    >exponentially).
    >
    >So while your argument works if you keep the fabric size the same and
    >increase the transfer rate (as it has been with the ATA interface - buffer
    >costs have declined over the years), it does not work if the fabric keeps on
    >growing as well.
    >
    >If a fabric introduces 1 ms (two orders of magnitude less than the worst
    >cases I have heard) at Gbit speed, then we need 100 Kbytes of buffer space
    >for a connection.  We don't have enough buffer to reserve this for all
    >possible connections we could get (Fibre Channel designs could not reserve 4
    >KByte for a smaller number of potential connections until recently).
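The buffer-sizing arithmetic in the quoted paragraph is a bandwidth-delay
product calculation; a quick sketch (the 1 Gbit/s and 1 ms figures are from
the text above, and the result is the ~100 KByte figure Jim cites, before
rounding):

```python
def bdp_bytes(bandwidth_bps, delay_s):
    """Bandwidth-delay product: bytes that must be in flight
    to keep a link of the given rate full over the given delay."""
    return bandwidth_bps / 8 * delay_s

# 1 Gbit/s link with 1 ms of fabric-introduced delay:
# about 125 KB of buffer per connection (the text rounds to ~100 KB).
print(bdp_bytes(1e9, 1e-3))  # 125000.0
```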
    
    Something to think about w.r.t. this problem:
    
    RDMA semantics:
       Pros:
         - Sender only targets memory that it knows is available and thus 
    does not inject more data than the receiver can accept.  This 
    mitigates the overflow problem.
    
         - End-to-end ULP ACKs provide an implicit credit scheme for the 
    associated target resources.
    
       Cons:
         - One must "slice" up the target resources among a set of senders, 
    which can create scalability problems depending upon the resources required 
    per session.  This is where SEND semantics have their advantage - one can 
    use statistical multiplexing to absorb bursts with minimal buffer overflow 
    reserves and combine this with the idea described below.
    
         - RDMA support requires additional buffer access / tracking logic 
    within the endnode to track the impacted memory.  The semantics are not 
    difficult to implement, but they add cost to the implementation.  
    Note: SEND semantics have DMA chain costs as well, so the actual delta in 
    implementation will vary depending upon the amount of resources one can 
    effectively map / register at a given time.
    
         - For small messages, RDMA does not always provide a cost/benefit 
    advantage, which is why most implementations support both SEND and RDMA 
    semantics.
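The first con above - statically slicing a fixed target buffer pool among
senders so that no sender can inject more than its advertised share - can be
sketched as follows (illustrative only; the class and parameter names are
hypothetical, not from any iSCSI or RDMA specification):

```python
class RdmaTargetPool:
    """Sketch of slicing a fixed target buffer pool among sessions."""

    def __init__(self, total_bytes, sessions):
        # Static slicing: each sender may only target its own slice,
        # which is what creates the scalability problem as the number
        # of sessions grows.
        self.slice = total_bytes // sessions

    def admit(self, session_outstanding, write_len):
        # Sender-side check: target only memory known to be available,
        # so no more data is injected than the receiver can accept.
        return session_outstanding + write_len <= self.slice


# 1 MB of target buffer sliced among 100 sessions -> 10 KB each.
pool = RdmaTargetPool(total_bytes=1_000_000, sessions=100)
print(pool.admit(0, 8_192))      # True: fits within the slice
print(pool.admit(4_096, 8_192))  # False: would exceed the slice
```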
    
    
    >Jim
    >
    >PS if we actually are starting to need windows greater than 64 KBytes, is
    >this a problem?  My understanding is that deployed TCP/IP products do not
    >easily support extremely large windows.  This argues for spreading a single
    >SCSI command across multiple TCP/IP connections for pipelining to overcome
    >latency, not for bandwidth.
    
    Large window support is not difficult to implement and is supported in many 
    endnodes.  However, memory even in large endnodes is limited and subject to 
    oversubscription, so if a link cannot replenish its buffers quickly enough, 
    it drops the incoming packet, and the transport's retransmission / 
    congestion management takes over and adjusts the injection rate.
    
    The question is whether one would like to implement a WRED (weighted random 
    early detection - used today in routing elements) type of system within an 
    endnode (server, storage, etc.) whereby it would drop inbound packets when 
    resources are tight, based on some criteria of the inbound packet (IP addr, 
    QoS, TCP port, etc.).  This would allow the endnode to control which 
    services should have priority when the workload approaches or exceeds the 
    available buffer resources.  It would also allow one to vary the amount 
    of "emergency" reserve buffers discussed by others without having to 
    communicate any of this end-to-end or specify it within the architecture 
    beyond the interface and the interpretation of the drop values.
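A WRED-style drop decision follows the classic RED curve, with a per-class
weight standing in for the packet criteria above (IP addr, QoS, TCP port).
A minimal sketch; the thresholds and parameters are illustrative, not from
any standard:

```python
import random

def wred_drop(avg_queue, min_th, max_th, max_p, weight=1.0):
    """WRED-style drop decision for one inbound packet.

    avg_queue : averaged buffer occupancy
    min_th    : below this, never drop (resources plentiful)
    max_th    : at or above this, always drop (resources exhausted)
    max_p     : maximum drop probability at the top of the ramp
    weight    : per-class bias - a low-priority class gets weight > 1
                and is dropped earlier than a high-priority class
    """
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    # Linear ramp between the thresholds, biased by the class weight.
    p = max_p * (avg_queue - min_th) / (max_th - min_th) * weight
    return random.random() < min(p, 1.0)

# At the same queue depth, a bulk class (weight 2.0) sees twice the
# drop probability of a latency-sensitive class (weight 0.5).
```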
    
    I believe there is value in creating policy interfaces to communicate 
    whether a given connection has any special policies associated with it; 
    one such policy could be the connection's position in the drop priority 
    list when circumstances warrant it.  The actual policy would live outside 
    of iSCSI (see the e-mail discussions about QoS and policy from this summer 
    for other areas where a policy interface would have benefit), keeping iSCSI 
    opaque to the upper layer / application requirements.
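Such a policy interface could be as simple as a table mapping a connection's
classifier to a drop-priority rank, with the policy content itself maintained
outside iSCSI.  A hypothetical sketch (the classifier fields and ranks are
made up for illustration):

```python
# Hypothetical policy table: (IP addr, TCP port, service class) -> rank.
# Higher rank means dropped earlier when buffers run short; the table's
# contents would be set by an external policy system, not by iSCSI.
drop_policy = {
    ("10.0.0.5", 3260, "bulk-backup"): 3,  # dropped first
    ("10.0.0.7", 3260, "oltp"):        1,  # dropped last
}

def drop_rank(conn, default=2):
    # Connections with no special policy get the middle rank,
    # so the interface stays opaque to unconfigured upper layers.
    return drop_policy.get(conn, default)

print(drop_rank(("10.0.0.5", 3260, "bulk-backup")))  # 3
print(drop_rank(("192.168.1.1", 3260, "unknown")))   # 2
```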
    
    Mike
    
    

