
    Re: TCP RDMA option to accelerate NFS, CIFS, SCSI, etc.



    > From: Costa Sapuntzakis <csapuntz@cisco.com>
    
    > ...
    > Today, you have specialized silicon that for simple bus protocols
    > (SCSI parallel interface and ATA) will directly transfer blocks
    > between the device and the buffer cache. This is not currently done
    > with TCP, to the best of my knowledge. ...
    
    It might be good to investigate the history of Protocol Engines Inc.,
    including its goals, the reasons for its failure as a business, and what
    it achieved technically.  A skewed history might be:
     1. founded to make silicon for XTP, a nominally faster protocol than TCP.
     2. when the XTP protocol and the XTP chips got bogged down, shifted to
        making chips to help TCP go wire speed over FDDI.
     3. other people made TCP go wire speed over FDDI without any special
        silicon or new protocols.  That took some wind out of XTP's sails,
        and tore the sails driving PEI's TCP accelerator chips.
     4. the standard standards-committee problems with XTP didn't help PEI's
        other sails.
    
    If you ask me, SCSI/IP and RDMA have striking parallels to #1 and #2. 
    I bet you'll meet parallels to #3 before any real deployment.  You've
    started to see #4 in some of the suggested improvements to RDMA today.
    It's not that the suggestions are not good ideas.  The problem is that
    committees cannot say no to good ideas, while the one thing that matters
    above all in any design task is saying no to almost everything.
    
    Protocol Engines and XTP were based on the unexamined assumption that TCP
    is very difficult to implement and an unavoidably slow protocol.  Most
    people just knew those "facts" 15 years ago.  I think RDMA suffers a
    similar problem.  Instead of starting by assuming that a new protocol is
    needed for a new goal, if you actually look within the existing boundaries,
    you'll often find a solution.  Often the inside solution is better than
    any possible extension of the protocol.  Protocol extensions require more
    bandwidth and more processing on both sender and receiver.  They also have
    problems gaining enough market share to survive.
    
    Please don't misunderstand me.  Greg put my name among the authors of
    one of the XTP specs, and not because I had said XTP was a stupid
    idea.  I still like lots of XTP.  I also think that many of the
    XTP ideas can be *and have been* applied to TCP implementations.
    
    
    > However, in the case of most storage protocols, you don't want
    > the data in the receive buffer. You want it in the buffer cache, so
    > there is a copy to the buffer cache.
    
    Which NFS implementation written in the last 10 or at least 5 years and
    intended to be fast doesn't move data between the buffer cache near the
    disk and the buffer cache near the application with zero (0) copies?
    Page flipping to and from buffer caches is especially easy, because
    buffer caches tend to be page aligned, and file systems like to move
    data in page-sized or larger chunks.
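    To make the zero-copy point concrete, here is a toy Python model of
    page flipping.  It is a sketch under stated assumptions, not any real
    kernel's code: a "page" is a bytearray, the buffer cache is a dict
    from block number to page, PAGE_SIZE is assumed to be 4096, and
    "flipping" is modeled as handing over a reference (a real kernel would
    remap page-table entries instead).  The hypothetical move_to_cache()
    flips when the payload is page aligned and a whole number of pages
    long, and otherwise falls back to copying.

```python
PAGE_SIZE = 4096  # assumed page size; real systems vary


def move_to_cache(rx_pages, offset, length, cache, first_block):
    """Place `length` payload bytes, starting at byte `offset` of the
    receive pages, into `cache` (a dict mapping block number -> page).

    Returns the number of payload bytes copied; 0 means every page was
    flipped (handed over by reference) instead of copied.
    """
    if offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0:
        # Page aligned, whole pages: give the receive pages themselves to
        # the cache.  Only references change hands; no payload byte moves.
        start = offset // PAGE_SIZE
        for i in range(length // PAGE_SIZE):
            cache[first_block + i] = rx_pages[start + i]
        return 0
    # Misaligned or partial pages: fall back to copying into fresh pages.
    flat = b"".join(bytes(p) for p in rx_pages)[offset:offset + length]
    for i in range(0, length, PAGE_SIZE):
        page = bytearray(PAGE_SIZE)
        chunk = flat[i:i + PAGE_SIZE]
        page[:len(chunk)] = chunk
        cache[first_block + i // PAGE_SIZE] = page
    return length
```

    Flipping two aligned pages leaves the cache holding the very same page
    objects the receive path filled, while a 200-byte payload at offset 100
    has to be copied.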
    
    
    > So, NFS has a  CPU overhead hit as compared to optimized storage host bus
    > adapters. The goal was to eliminate part of this hit, by getting rid of an
    > extra copy.
    
    How can you have fewer than zero copies?
    
    > Now, this proposal doesn't fix the interrupt overhead problem. 
    > Optimized FC/SCSI NICs have one interrupt/transfer or less.
    
    Interrupts are killers, and so for the last 5 or 10 years, a competitive
    NFS system has had about 0.1 interrupts per packet.  The trick is not
    reducing the ratio of interrupts to packets, but reducing it only so far
    that things don't slow down, and increasing the ratio when the total
    system (client & server) moves into a regime that requires more
    interrupts.
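    The balancing act above is what interrupt moderation aims for.  The
    sketch below uses a common count-plus-timer scheme as a stand-in; the
    message does not say which algorithm any NFS system used, and the
    defaults max_batch=16 and max_delay_us=50 are assumed values.  Under a
    burst, batches fill and the ratio falls to about 1/16; when traffic is
    sparse, the timer fires for every packet and the ratio rises toward 1,
    so latency-bound exchanges are not held back by batching.

```python
def count_interrupts(arrivals_us, max_batch=16, max_delay_us=50.0):
    """Count the interrupts raised for packets arriving at the given
    sorted times (in microseconds) under count-plus-timer moderation.

    An interrupt fires when `max_batch` packets are pending, or when the
    oldest pending packet has waited `max_delay_us`, whichever comes first.
    Both defaults are assumed values, not figures from the message.
    """
    interrupts = 0
    pending = 0
    oldest = None
    for t in arrivals_us:
        # If the oldest pending packet has waited long enough, the timer
        # has fired by the time this packet arrives.
        if pending and t - oldest >= max_delay_us:
            interrupts += 1
            pending = 0
        if pending == 0:
            oldest = t
        pending += 1
        if pending == max_batch:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually flushes any leftovers
    return interrupts
```

    For example, 160 back-to-back packets 1 microsecond apart raise 10
    interrupts (0.0625 per packet), while 10 packets 1 millisecond apart
    also raise 10 (1.0 per packet).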
    
    
    ] From: Michael Krause <krause@cup.hp.com>
    
    ] It ain't free and there are plenty of reasons to avoid copying data since 
    ] ...
    ] touching the buffers themselves.  Also, one could use this technology with 
    ] storage devices to bypass the server and send data to one or more NICs for 
    ] remote access - RDMA is still quite good for this type of operation and 
    ] does not involve touching the data.
    
    There are other, much easier ways to separate data and control
    information in the receiver than being forced to parse optional
    new bits in TCP or IP headers.
    
    For 10 years, network interfaces in commercial UNIX systems have been
    putting the headers (including RPC/XDR) of incoming NFS traffic in one
    place (a "small mbuf") and the data in another place (the buffer cache)
    without extra copies and without parsing any headers, let alone new
    header bits with the nasty problems of TCP or IP options.  And they
    have done this even though the RPC/XDR part is variable in length
    (recall the NFS group list) or at best hard to predict.
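    The message does not say how those interfaces found the boundary.  One
    illustrative way to split header from data without parsing the RPC/XDR
    part, assuming the receiver already knows how many data bytes to expect
    (for instance, the count it asked for in a READ), that the count is a
    multiple of 4 so XDR adds no padding, and that those bytes end the
    message, is to split from the tail.  XDR encodes variable-length opaque
    data as a 4-byte big-endian count followed by the bytes, so the sketch
    checks only that one word.  The function split_reply() and its 256-byte
    small-buffer limit are hypothetical; when a check fails it returns None
    so the caller can take the ordinary copying path.

```python
import struct


def split_reply(message, expected_data_len, small_buf_max=256):
    """Split `message` into (header, data) from the tail, without parsing
    the variable-length header.  Returns None if the assumptions fail."""
    header_len = len(message) - expected_data_len
    # Need room for the 4-byte XDR count, and the header must fit the
    # (assumed) small header buffer.
    if header_len < 4 or header_len > small_buf_max:
        return None
    # The word just before the data should be the XDR opaque count.
    (count,) = struct.unpack_from(">I", message, header_len - 4)
    if count != expected_data_len:
        return None
    header = message[:header_len]            # would go to a small buffer
    data = memoryview(message)[header_len:]  # a view of the payload, not a copy
    return header, data
```

    Because the split point is computed from the total length, a 36-byte
    header and a 132-byte header are handled the same way.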
    
    
    Vernon Schryver    vjs@rhyolite.com
    



Last updated: Tue Sep 04 01:08:18 2001
6315 messages in chronological order