
    Re: TCP RDMA option to accelerate NFS, CIFS, SCSI, etc.



    ] From: Charles Esson <charlese@cvs.com.au>
    
    ] I must have missed something.
    ]
    ] If we don't have this, you can take the destination port, convert to a
    ] table address, use the sequence number,
    ] do some calculations and come up with a buffer address and an offset. If
    ] you want to mess up the layering
    ] of your stack, they are all things you can do now.
    
    Standards committees don't like hashing.  It looks complicated and
    insufficiently deterministic on an overhead projector.
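
A minimal sketch of the lookup described in the quoted text, assuming a per-connection receive buffer registered in advance. The names `Connection`, `place_segment`, and `table` are hypothetical and do not come from the thread; a real stack would key on the full address/port 4-tuple rather than the destination port alone.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


class Connection:
    """Hypothetical per-connection state: first payload sequence number and buffer."""

    def __init__(self, initial_seq, buffer_size):
        self.initial_seq = initial_seq          # sequence number of payload byte 0
        self.buffer = bytearray(buffer_size)    # pre-registered receive buffer


def place_segment(table, dst_port, seq, payload):
    """Copy payload to the buffer offset implied by its sequence number.

    Returns the offset used, or None if the port is unknown or the data
    would not fit in the registered buffer.
    """
    conn = table.get(dst_port)  # dict lookup stands in for "convert to a table address"
    if conn is None:
        return None
    offset = (seq - conn.initial_seq) % SEQ_MOD  # modular subtraction handles wraparound
    if offset + len(payload) > len(conn.buffer):
        return None
    conn.buffer[offset:offset + len(payload)] = payload
    return offset
```

Because the offset is computed from the sequence number alone, segments that arrive out of order still land in the right place without any tag from the sender.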
    
    ] or using RDMA
    ] ...
    
    ] --->An attacker plays with the data so returned.
    ] ...
    
    That's a very good point.  The tagging in RDMA cannot be used until
    after it has been validated by the receiver.  The validating consists
    of looking at sequence numbers, RPC/XDR headers, etc. to figure
    out where the data can and should go, and then checking that the
    sender guessed right.  Why not skip the last part and ignore the
    RDMA tag?  Then why send the RDMA tag?
    
     .......
    
    
    > From: Pete Zaitcev <zaitcev@metabyte.com>
    
    > Very well, but what about its companion document (SCOT)?
    >  http://search.ietf.org/internet-drafts/draft-satran-scot-00.txt
    > It is published, isn't it? It was somewhat disturbing to see the
    > notice, but on the other hand it was honest. IBM could just as
    > easily come up silently with a silly software patent for RDMA option
    > or for SCSI over TCP idea as such.
    
    The IETF's protections against patent games are well intended, but nothing
    to worry about if you want to play them and nothing to rely upon if you
    don't.  The history of IETF patent games demonstrates that the IETF is
    powerless to limit them (or worse), and that they're harder to play than
    the players hope.  (E.g. PPP CCP and PPP 48-bit FCS, respectively)
    
      ......
    
    } From: "Justin T. Gibbs" <gibbs@FreeBSD.org>
    
    } ...
    } >Can you elaborate on this?  Suppose TCP "blindly" does zero copy everything to
    } >an app's buffer (for example, to a web browser's receive buffer) without
    } >RDMA.  Then the browser app looks at the data and displays it.  What is the
    } >difference RDMA makes in this case?  Yes, RDMA can separate different messages
    } >in the buffer.  But this can also be done by the browser app, not by TCP.
    }
    } You seem to be saying that in the common case zero copy is achievable.
    } Most implementations I've seen require the network driver to make
    } a guess about where the payload will be in an incoming packet so the header
    } can be stripped off and the payload dmaed to an aligned area.   A page
    } flip is then performed to get the data where the user wants it,
    } imposing the restriction that your  payload be page sized so you don't
    } leave gaps in the user's destination buffer.
    
    That is required only if you stick to the current API.  Obvious, 
    minor changes in the direction of some operating systems that existed
    before UNIX are sufficient to relax the page boundary requirement.
    To use RDMA, you have to change the API.
    
    }                                               Certainly, with a more
    } intelligent network adapter that knows every protocol you can determine
    } exactly where the data is in each packet.  If you add connection tracking
    } and sequence number sniffing to the nic with a mechanism to register user
    } buffers to connections, you can get zero copy every time*.  Unfortunately
    } this is not very general purpose solution.
    
    Only standards committees and some academics care about "every protocol"
    or optimizing absolutely every application.  The rest of us (including
    academics) only care about optimizing the important stuff.
    
    Also as you say, looking at sequence numbers in the interface and relaxing
    the sockets API rules about not touching any bytes in the buffer except
    those that are actually received lets you avoid copies all of the time.
    I don't see why that is not a general purpose solution, if you want one.
    
    }                                             The point of RDMA seems to be
    } to allow nic manufacturers to add support for a single tcp option that, at
    } the very least, allows the nic to align the payload for you.  Add RID
    } registration with the nic and you get the payload exactly where you want it
    } too.  All without too much state information kept by the nic.
    
    I've been hearing since the mid-1980's proposals to do TSP lookups in the
    network interface instead of software because it is so incredibly difficult
    to find the right TSP quickly in software.  I think those ideas are similar
    to the RDMA idea.  They assume facts not in evidence, that there is a
    problem that needs to be solved, and that the solution is not worse than
    the nominal problem.  There are reasons why such proposals appear in
    standards committees before implementations.
    
     ......
    
    ] From: Lloyd Wood <l.wood@eim.surrey.ac.uk>
    
    ] Note the mentions of SCSI and SCSI/TCP and the tie-in with the
    ] proposed IP Storage efforts (recent ietf general list discussion).
    ]
    ] I'd still like to know _why_.
    
    ] ...
    ] SCSI DMA over TCP? What _is_ all this aiming for - trying to build
    ] distributed RAID arrays with really poor performance that are subject
    ] to WAN outages and DoS attacks?
    
    Why put SCSI over a protocol that measures RTT's, worries about
    congestion in routers, and that expects the error rates that come
    with 5000 miles of wire and 20 routers in the path?  Does anyone
    really think that TCP/IP or even IP with its 64K byte packet limit
    are remotely close to the right protocol, particularly given the
    existing and commercially available alternatives?
    
    A standards committee is the venue of first and last resort for
    such ideas, especially a committee that is related to currently
    trendy things like the SuperInfoHypeWay.
    
     ....
    
    ) From: julian_satran@il.ibm.com
    
    ) That is not completely accurate. You will need appreciably more silicon to
    ) do what you suggest.   And you can do it only with information that "passes
    ) through the protocol" .
    
    Significantly more silicon than what, to do what?  Since the comment
    was addressed to me, I'll assume one 'what' was looking at sequence
    numbers, port numbers, and so forth to page flip.  Clearly it takes more
    silicon to support page flipping in hardware than to not support page
    flipping in hardware.  I will not agree that the required silicon is a
    big deal, not because I have a clue about floor plans and so forth (I
    don't), but because at a previous employer I fought to keep the hardware
    guys from throwing in gates to do it.  They had the silicon to spare and
    had heard so much about the wonderfulness of page flipping that they wanted
    to get in on the fun.
    
    Doing things in hardware is ok only if you absolutely must.  Software is
    always better when it is good enough, because it is soft.
    
    ) The good thing about the  proposal is that it can TAG whatever the
    ) application wants (and that can be several layers away from the protocol).
    ) You can't "page-flip" to buffers that you are not aware of. And page
    ) flipping wherever is applicable assumes  also page boundaries for buffers.
    
    That's important only if you stick close to the sockets or UNIX read()
    API.  If you are not ultra-conservative, and if you know a little of the
    history of file and device I/O API's, or if you think about such things
    for 10 seconds, then RDMA tagging becomes less interesting.
    To use RDMA tagging, you must abandon the UNIX read() API.  If you change
    the API, then you may as well think about the whole problem instead of
    only a corner.  If you let the operating system tell the application where
    the incoming data arrived, then you don't need elaborate hints from the
    sender to the receiver's hardware to say where the receiving software will
    want the data.
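
A rough sketch of the kind of interface that paragraph points at, in which the system chooses where data lands and reports the location, instead of the application naming a destination as with read(). The class `ReceivePool` and its methods are hypothetical illustrations, not an existing API, and `deliver` merely stands in for a driver.

```python
class ReceivePool:
    """Hypothetical receive API: the system picks the buffer and reports where data is."""

    def __init__(self, size=16384):
        self.memory = bytearray(size)  # memory shared between "driver" and application
        self.completed = []            # (offset, length) of each arrived payload, in order
        self.next_free = 0

    def deliver(self, header_len, packet):
        """Stand-in for the driver: the packet lands whole, wherever there is room."""
        start = self.next_free
        if start + len(packet) > len(self.memory):
            raise BufferError("receive pool exhausted")
        self.memory[start:start + len(packet)] = packet
        # Record where the payload starts; no alignment or page flip is needed.
        self.completed.append((start + header_len, len(packet) - header_len))
        self.next_free = start + len(packet)

    def receive(self):
        """Return (offset, view) for the next payload without copying it."""
        offset, length = self.completed.pop(0)
        return offset, memoryview(self.memory)[offset:offset + length]
```

The application learns the payload's offset after the fact, so the payload does not have to be page sized or page aligned, and the sender supplies no placement hints.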
    
    
    ) Vernon Schryver <vjs@calcite.rhyolite.com> on 25/02/2000 04:23:47
    )
    ) Please respond to Vernon Schryver <vjs@calcite.rhyolite.com>
    
    I did not write that!
    
     .....
    
    ) From: Alan Cox <alan@lxorguk.ukuu.org.uk>
    )
    ) > flip is then performed to get the data where the user wants it,
    ) > imposing the restriction that your  payload be page sized so you don't
    ) > leave gaps in the user's destination buffer.  Certainly, with a more
    )
    ) Perhaps its about time the world put together an official, sane, ring buffer
    ) style mmap socket api. A lot of the requirement to align data is coming
    ) from the existing socket API. 
    
    The IETF should not get involved in API's.  There are plenty of other
    standards committees in that arena, as well as big commercial outfits
    including one in the U.S. Pacific Northwest.  In other words, do you think
    the IETF would be more successful arguing with Microsoft about winsock
    than the IETF has been in dealing with Microsoft's obviously completely
    stupid and wrong PPP ideas?
    
    If you do get involved in standardizing such things, then *PLEASE* don't
    limit yourself to #$%$#@! ring buffers!  The ancient Excelan and preceding
    (I've a mental block against the name starting with 'I') ring buffer notion
    was ok as an initial hack, but WRONG for something to go fast.  To start,
    you don't need pointers or indices that must be written by both the
    interface and the host.
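
One common way to avoid indices written by both sides is a per-slot ownership flag, with each side keeping its own private index. The sketch below assumes that scheme; it is not necessarily what the message has in mind, it is still ring-shaped, and it only illustrates the single-writer point. All names (`DescriptorRing`, `nic_put`, `host_get`) are hypothetical.

```python
NIC, HOST = 0, 1  # who currently owns a slot


class DescriptorRing:
    """Hypothetical descriptor ring: no index is ever written by both sides."""

    def __init__(self, slots):
        self.owner = [NIC] * slots   # per-slot ownership flag
        self.data = [None] * slots
        self.nic_next = 0            # private to the interface
        self.host_next = 0           # private to the host

    def nic_put(self, frame):
        """Interface side: fill the next slot if the host has handed it back."""
        i = self.nic_next
        if self.owner[i] != NIC:
            return False             # ring full
        self.data[i] = frame
        self.owner[i] = HOST         # hand over only after the data is in place
        self.nic_next = (i + 1) % len(self.owner)
        return True

    def host_get(self):
        """Host side: take the next slot if the interface has filled it."""
        i = self.host_next
        if self.owner[i] != HOST:
            return None              # nothing new
        frame = self.data[i]
        self.owner[i] = NIC          # return the slot to the interface
        self.host_next = (i + 1) % len(self.owner)
        return frame
```

Each side reads the other's progress from the slot flags alone, so neither needs to read or update the other's index.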
    
    
    Vernon Schryver    vjs@rhyolite.com
    


Last updated: Tue Sep 04 01:08:18 2001