    Re: Multiple TCP connections



    At 11:55 AM 8/9/00 -0700, David Robinson wrote:
    >[This mailing list is acting up so forgive me if this is a repeat]
    >
    >Randy provides a good summary of the "design team's" reasoning for
    >why they chose multiple connections per session.
    >
    >My contention is that the argument is inverted: you need multiple
    >paths through the fabric to get performance, and current link layer
    >technology (802.3ad) only provides concurrency based on TCP layer
    >headers, thus multiple connections are needed.  Further, given
    >multiple connections per session you can't have a connection per
    >LUN, as there will not be enough connections based on simple math.
    >
    >If you work the argument backwards I see a different result.  If you
    >assume instead that we have one connection per LUN, we should look at
    >the number of concurrent connections possible.  TCP, with its 16-bit
    >port number, limits us to 64K ports and therefore only 64K active
    >connections per IP interface.
    
    This is per IP address, and a given interface can have multiple IP addresses 
    without much complexity.
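
    A quick back-of-the-envelope on the port space (my own numbers,
    purely illustrative):

        # A TCP connection is identified by (local IP, local port,
        # remote IP, remote port), so the 16-bit port field caps the
        # distinct local ports per IP address, not per physical interface.
        PORT_BITS = 16
        ports_per_address = 2 ** PORT_BITS        # 65536
        aliases_per_interface = 4                 # assumption: IP aliases on one NIC
        print(ports_per_address * aliases_per_interface)   # 262144 possible connections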
    
    >Given the high bandwidth of existing drives,
    >especially with the amount of cache appearing in controllers, just 10%
    >of the possible connections active will saturate any link layer
    >technology for the next few decades.  Therefore, to get to the range
    >of that many LUNs and thus connections, you will need multiple IP
    >interfaces.  Even with the existing draft proposal, no sane implementor
    >would throttle 10K+ LUNs with a single IP interface.
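
    To put rough numbers behind that saturation claim (my figures,
    purely illustrative):

        # How few active connections it takes to fill a link.  All
        # figures are assumptions for illustration, not measurements.
        active_connections = int(64 * 1024 * 0.10)   # 10% of the 64K port space
        per_lun_MBps = 20                            # assumed throughput to a cached controller
        aggregate_MBps = active_connections * per_lun_MBps
        gige_MBps = 125                              # ~1 Gb/s link
        print(aggregate_MBps / gige_MBps)            # ~1000 GigE links' worth of traffic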
    
    Most implementations would be expected to have multiple adapter / chipset 
    interfaces per storage device and server - both for load balancing and 
    availability.  Also, it is unlikely that any scheduler in hardware is going 
    to be able to handle 64K connections with the state being constantly 
    updated (an on-chip cache will be needed, and then the chip will need to 
    access off-chip memory or host memory for the rest of the state 
    structures).  In any case, this situation will require high memory 
    bandwidth for the state structures, independent of the technology being 
    used (TCP, InfiniBand, VI, etc.).
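
    Rough numbers for the state-update problem (per-connection state size
    and packet rate below are my assumptions, not from any draft):

        # Memory traffic just for connection state, independent of
        # TCP vs. InfiniBand vs. VI.
        connections = 64 * 1024
        state_bytes = 256                    # assumed per-connection control block
        print(connections * state_bytes / 1e6)   # ~16.8 MB - too big for an on-chip cache
        packets_per_sec = 1000000            # assumed aggregate packet rate
        # read + write of the control block on every packet
        print(packets_per_sec * state_bytes * 2 / 1e6)   # ~512 MB/s of state traffic alone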
    
    
    >I would propose that the requirement be 64K active sessions.  Given
    >that requirement, having a session per LUN makes sense.
    
    Sessions per interface, per host, per endnode?
    
    
    >The next issue is performance.  I agree that to get maximal use of a
    >fabric you need to exploit concurrency.  The question becomes where is
    >the correct place to put the concurrency.  If we follow the argument
    >that we should have a session per LUN, then since the standard
    >semantics of LUNs are in-order request/response with minimal
    >concurrency, the per-connection performance requirement is on par
    >with today's link level technology and protocol implementations and
    >will likely grow at the same rates.  The throughput performance that
    >is really a concern is the aggregate bandwidth of an initiator to
    >multiple LUNs.  With a session per LUN, each TCP connection can be
    >placed on a different link layer channel (802.3ad) using TCP layer
    >header information.  Therefore the performance will scale with link
    >layer improvements using the existing link layer aggregation
    >mechanism.  Ultimately the initiator-to-LUN bandwidth will be a host
    >memory to storage controller cache memory copy; interconnect
    >technologies currently being designed, such as InfiniBand, are
    >designed exactly for these kinds of copies, so as storage devices
    >move towards a memory-to-memory model the interconnects will exist.
    
    Whether one uses InfiniBand is irrelevant - the issue is whether or not 
    your proposal uses RDMA semantics.  The work being discussed here does 
    support the use of RDMA technology and thus would meet your 
    requirements.  InfiniBand's main priority was a PCI-X replacement and 
    limited-distance IPC.  Some companies also envision using it as a backbone 
    for other traffic types as well, but there are architectural limitations - 
    it was designed for the data center and its associated distances - that 
    may limit its value in storage solutions.
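
    On the 802.3ad point above: the per-connection distribution being
    described is essentially a transmit hash over the TCP/IP headers.  A
    toy sketch (field and function names are mine, purely illustrative):

        from zlib import crc32

        def pick_link(src_ip, dst_ip, src_port, dst_port, num_links):
            # Every segment of one TCP connection maps to the same link,
            # while different connections (e.g. one per LUN) spread
            # across the aggregated links.
            key = "%s:%d-%s:%d" % (src_ip, src_port, dst_ip, dst_port)
            return crc32(key.encode()) % num_links

        # One connection per LUN: each LUN's traffic can land on a
        # different link, but no single connection exceeds one link.
        for lun_port in range(5001, 5005):
            print(lun_port, pick_link("10.0.0.1", "10.0.0.2", lun_port, 5000, 4))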
    
    
    >The issue is no longer trying to figure out how to exploit
    >multi-link concurrency for a single TCP stream, but what the contents
    >of a single TCP stream are, given that we know existing link technology
    >will allow multiple TCP streams to be passed concurrently.  I assert
    >that a LUN is the natural level of concurrency and that the
    >performance demand of a single LUN can be met by existing TCP
    >implementations and should scale over time.
    >
    >The numerical argument given claims that a single storage controller
    >may have 160K concurrent connections.  That is likely to be an extreme
    >case with a poorly balanced set of hardware, but I will grant it for
    >the sake of argument.  The argument is posed that maintaining that
    >much TCP state will be too expensive.  The proposed cost is ~10MB of
    >memory, which today will add about $1 to the cost of a box containing
    >10,000 disk drives (~$1M assuming $100 drives).  Not a compelling
    >argument.  Furthermore, if you multiplex multiple LUNs per connection
    >you still need sufficient state to mux-demux requests, which will be
    >on the same order of magnitude as TCP state.  So ultimately the
    >"cost" argument is a wash.
    >
    > > Conclusion: one (or two) TCP connections per LU is both too many (resulting
    > > in too much memory devoted to state records) and too few (insufficient
    > > bandwidth for high-speed IO to controller cache).  Decoupling the number of
    > > TCP connections from the number of LUs is the necessary result.
    >
    >I don't buy the conclusion: the amount of memory devoted to state
    >records is relatively small and is actually constant regardless of
    >whether the mux-demux is done at the TCP layer or the session layer.
    >Also, the driver for interconnect technology is memory-to-memory
    >copying, so the advances in storage technology will not likely outgrow
    >the link layer.
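
    For what it's worth, the state-memory arithmetic above works out
    roughly as follows (the per-connection size is my assumption):

        # 64 bytes per connection reproduces the ~10MB figure; a fatter
        # control block of a few hundred bytes still gives only tens of
        # MB, which is noise against a ~$1M box of 10,000 $100 drives.
        connections = 160 * 1000
        state_bytes_per_conn = 64            # assumption, for illustration
        print(connections * state_bytes_per_conn / 1e6)   # ~10.2 MB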
    
    Mike
    
    

