    Re: TCP RDMA option to accelerate NFS, CIFS, SCSI, etc.



    > From: "David R. Cheriton" <cheriton@cisco.com>
    
    > ...
    >   Clearly, data is being received from hardware and software does not
    > get to touch it until it has been stored to some memory.  My 
    > assumption is that the storage system memory is arranged in fixed
    > size pages of disk/file pages.  Without hardware RDMA to the storage
    > level, I believe one requires an extra copy, from whatever the
    > hardware delivers to what the storage system expects.  Either
    > you use twice the bandwidth in the storage system memory system or
    > else you have a separate memory system for the network, and
    > have software/processor power adequate to copy between at wire
    > speed (with all the associated support facilities for this processor.)
    >  Unless there is something wrong with this reasoning,
    > it seems like a cost issue of providing the above hardware resources
    > vs. providing a NIC chip that can RDMA.  
    
    Depending on how you are counting copies, that reasoning has been wrong
    in commercial UNIX systems for more than 10 years.
    Do you use the RDMA bits before the IP checksum, the TCP checksum, and
    the medium FCS or checksum have been checked?  If not, that is, if you
    receive the entire link layer frame into some kind of temporary buffer
    or FIFO, probably in the "network interface card/controller," so that
    the trailing FCS is checked before the RDMA bits are used, then
    commercial UNIX systems have been doing as you say to save copies since
    the late 1980s.  As I said before, such systems were a part of what
    killed Protocol Engines Inc.
    
    If you do use the RDMA bits in the TCP header after 50-60 bytes of
    the frame have arrived, but before the frame FCS, aren't you worried
    about bit rot in the RDMA?
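
    To make the ordering concern concrete, here is a minimal sketch of
    "verify first, then trust the steering bits."  It uses the standard
    Internet ones' complement checksum (RFC 1071); the function name
    steering_bits_trustworthy() and the idea of passing the covered bytes
    as one blob are assumptions for illustration, not anyone's actual NIC
    design.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def steering_bits_trustworthy(covered: bytes) -> bool:
    """Hypothetical gate: only act on placement hints once the checksum
    over the covered bytes (checksum field included) verifies to zero."""
    return internet_checksum(covered) == 0
```

    Anything that acts on header bits before this kind of check has
    completed is acting on bits that may have rotted in transit.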
    
    > My guesstimate is that the software-only approach would be easily
    > 10 times more expensive here at the higher speed rates, of 10 Gbps.
    > If there is serious doubt about the merits of real hardware support,
    > we should try to quantify costs further at these speed ranges, IMHO.
    
    By "expensive," are you talking about dollars or bits/second?
    
    Regardless, if you look at the number of CPU cycles or gates in custom
    silicon required to support incoming page flipping in old, existing
    implementations, I bet you'll find that they are less "expensive" than
    any likely RDMA implementation.  Power of 2 modular arithmetic is awfully
    cheap compared to parsing and validating TCP options.
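
    A rough sketch of the two kinds of work being compared, under the
    assumption of a 4096-byte page (PAGE_SIZE and both function names are
    made up for illustration).  The page-flip test is two mask operations;
    the option walk is a loop with a length check at every step.  The
    kind/length/value layout and kinds 0 and 1 are standard TCP; nothing
    here is specific to the proposed RDMA option.

```python
PAGE_SIZE = 4096  # assumed page size; must be a power of 2


def is_page_flippable(offset: int, length: int) -> bool:
    """True if the payload starts on a page boundary and fills whole pages."""
    mask = PAGE_SIZE - 1
    return length > 0 and (offset & mask) == 0 and (length & mask) == 0


def parse_tcp_options(options: bytes) -> list:
    """Walk TCP option bytes, validating each length; returns (kind, value) pairs."""
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                # End of Option List
            break
        if kind == 1:                # No-Operation is a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed
```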
    
    
    > ...
    > It would help me to have a more careful definition of the types of
    > attacks you have in mind.  In an unsecure network with intruders,
    > presumably I can end up with bad data in the right buffer
    > or right data in the wrong buffer without using RDMA.
    > Do you view we have made things worse, and if so, how?
    > or are you objecting to us not making things better?
    
    Is it possible for a bad guy to use RDMA to put bad data into memory
    that is not a buffer?
    
    If the RID does no more than choose from a safe list of buffers, then how
    does RDMA usefully differ from the old FDDI, ATM, and HIPPI implementations
    that put incoming page-flippable data in buffers that get into user space
    with the data having been seen on the system bus the absolute minimum
    number of times for any scheme, including RDMA: once?
    Systems I've worked on have done mbuf allocation in the network interface
    hardware, including putting page-flippable payloads into page-mbufs that
    can eventually be flipped into user space.  And of course, the hardware
    takes care of the TCP or UDP checksum.
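
    For what "choose from a safe list of buffers" might mean in practice,
    here is a minimal sketch.  BufferTable, place(), and the use of the RID
    as a plain table index are all assumptions for illustration; the point
    is only that the identifier and offset are bounds-checked, so nothing
    outside the pre-registered buffers can be written.

```python
class BufferTable:
    """Hypothetical table of receive buffers registered ahead of time."""

    def __init__(self, buffers):
        self._buffers = list(buffers)

    def lookup(self, rid):
        # An out-of-range or non-integer RID selects nothing at all.
        if not isinstance(rid, int) or not 0 <= rid < len(self._buffers):
            return None
        return self._buffers[rid]


def place(table, rid, offset, payload):
    """Copy payload into the buffer named by rid; refuse anything out of bounds."""
    buf = table.lookup(rid)
    if buf is None or offset < 0 or offset + len(payload) > len(buf):
        return False
    buf[offset:offset + len(payload)] = payload
    return True
```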
    
    Given the recently described extensions to readv(), absolutely
    all data received by a system like that would be page-flippable,
    and without needing the silicon or CPU cycles to parse RDMA options
    or requiring the sender to send RDMA options or even know that the
    receiver is being fast.
    
    
    Vernon Schryver    vjs@rhyolite.com
    



Last updated: Tue Sep 04 01:08:17 2001
6315 messages in chronological order