    Re: (iSCSI) A question on Zero Copy



    Stephen (and all others who have replied)
    
    Thanks for the confirmation.. I thought it was
    type (B)... See some comments below...
    
    Stephen Byan wrote:
    > 
    > Randall R. Stewart [mailto:randall@stewart.chicago.il.us] wrote:
    > 
    > > Does the iSCSI layer want:
    > >
    > > A) Plain Zero copy, where the upper layer (iSCSI) asks
    > >    to read the next available "message" from the wire
    > >    into a buffer passed to the transport by iSCSI?
    > >
    > > <OR>
    > >
    > > B) A directed Zero Copy, where the upper layer (iSCSI) asks
    > >    to read a particular request to a specific buffer?
    > 
    > I think most folks implementing iSCSI want class B zero copy, but it is
    > restricted to the case of solicited data. Commands and status can be class A
    > zero copy, or even just copied.
    > 
    > I don't know what people are thinking about unsolicited data; it seems to me
    > that it must be buffered anonymously, and thence copied, but the
    > resource-poor environments with which I am familiar would opt not to support
    > unsolicited data at all.
    > 
    > It's possible to imagine iSCSI implementations that use another kind of
    > zero-copy, where the iSCSI application simply lives with a scatter-gather
    > list of anonymous buffers allocated by the network stack. But I think it's
    > rather hard to implement iSCSI application code on top of the indirection of
    > scatter-gather lists. It's much easier to think about your [file system|disk
    > controller] cache blocks as named, contiguous regions of (possibly virtual)
    > memory, rather than a random collection of bits of anonymous buffers. I
    > think the anonymous buffer approach also has a memory utilization penalty,
    > and so is not too good in memory-constrained environments. So I vote for
    > class B zero-copy, which lets my application manage memory as named
    > contiguous buffers.
    > 
    > I haven't the faintest idea how to achieve class B zero copy, without
    > putting the entire fast-path TCP processing and some of the iSCSI processing
    > into hardware state-machines running at wire-speed.
    > 
    
    These were exactly my thoughts: how does one achieve this without
    merging TCP and iSCSI together? In order to get class B, at any
    moment one must:
    
    A) Be able to tell what buffer a particular segment coming off
       the wire belongs with
    <and>
    
    B) Be able to always maintain the framing.
    
    Now with TCP I am faced with a stream of bytes. Unless there is
    some sort of option in the TCP header (the RDMA proposal) <OR> a
    direction in the data being sent itself saying which buffer
    address it goes with, the TCP stack has no idea what buffer to
    shove the incoming segment in. In fact, if you don't have the
    RDMA option you are stuck unless you totally merge TCP into
    iSCSI, since the TCP stack itself must become "iSCSI" aware...
    very bad in my view.
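
The two requirements above (knowing the target buffer, and keeping the framing) can be modeled in a few lines. The following is only a rough Python sketch under loud assumptions: the 12-byte header (tag, offset, length), the `DirectedReceiver` class and its method names are all hypothetical, and this is not the iSCSI PDU format or the RDMA proposal. It only shows why the receiver has to carry framing state across arbitrary segment boundaries before it can place payload bytes into a named buffer.

```python
import struct

# Hypothetical 12-byte framing header: buffer tag, offset into that buffer,
# payload length. NOT the iSCSI PDU layout, purely an illustration.
HEADER = struct.Struct("!III")


class DirectedReceiver:
    """Places payload from an in-order byte stream into tagged buffers."""

    def __init__(self):
        self.buffers = {}        # tag -> bytearray registered by the upper layer
        self._hdr = bytearray()  # header bytes collected so far
        self._target = None      # memoryview of the region being filled
        self._filled = 0         # payload bytes already placed in the region

    def register(self, tag, size):
        """Upper layer names a buffer that later data may be directed into."""
        self.buffers[tag] = bytearray(size)
        return self.buffers[tag]

    def feed(self, segment):
        """Consume one segment; its boundaries need not match the framing."""
        view = memoryview(segment)
        while len(view):
            if self._target is None:
                # Framing: the header itself may be split across segments.
                need = HEADER.size - len(self._hdr)
                self._hdr += view[:need]
                view = view[need:]
                if len(self._hdr) < HEADER.size:
                    return
                tag, offset, length = HEADER.unpack(self._hdr)
                self._hdr.clear()
                buf = self.buffers[tag]
                if offset + length > len(buf):
                    raise ValueError("payload does not fit the tagged buffer")
                self._target = memoryview(buf)[offset:offset + length]
                self._filled = 0
            # Placement: write payload straight into the named region.
            take = min(len(view), len(self._target) - self._filled)
            self._target[self._filled:self._filled + take] = view[:take]
            self._filled += take
            view = view[take:]
            if self._filled == len(self._target):
                self._target = None  # region complete, expect a header next
```

Note that the placement logic needs the header fields before it can touch a single payload byte, which is the coupling between transport and upper layer that the paragraph above objects to. In Python the slice assignment is still a copy; the sketch models the bookkeeping only, not the DMA.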
    
    Even in an SCTP stack, I don't see how this would work. You do
    have more flexibility with the streams and could do some sort
    of stream negotiation to say that stream N is going to supply
    data for this buffer, but again there is no provision for the
    SCTP stack itself to do this in the API yet. We have no way
    of doing a "threaded blocking read of a stream number", which
    is what would be required. Now I know that this is not disallowed
    by RFC 2960, but I don't know of anyone's stack heading this way,
    nor did we put it in the sockets mapping draft...
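
As a thought experiment only, the "stream N supplies data for this buffer" idea could look like the Python sketch below. Everything here is an assumption: `StreamBoundReceiver`, `bind` and `deliver` are made-up names, no SCTP sockets API is being described, and `deliver` stands in for whatever the stack would do with each in-order data chunk on a stream.

```python
class StreamBoundReceiver:
    """Hypothetical per-stream buffer binding; not a real SCTP API."""

    def __init__(self):
        self._bindings = {}  # stream number -> [memoryview, bytes filled]

    def bind(self, stream_no, buffer):
        """Upper layer declares: data on this stream belongs in this buffer."""
        self._bindings[stream_no] = [memoryview(buffer), 0]

    def deliver(self, stream_no, chunk):
        """Place one in-order chunk; return True once the buffer is full."""
        binding = self._bindings.get(stream_no)
        if binding is None:
            raise KeyError("no buffer bound to stream %d" % stream_no)
        view, filled = binding
        end = filled + len(chunk)
        if end > len(view):
            raise ValueError("chunk overruns the bound buffer")
        view[filled:end] = chunk
        binding[1] = end
        if end == len(view):
            del self._bindings[stream_no]  # binding is used up
            return True
        return False


# Chunks from two streams may interleave on the wire.
rx = StreamBoundReceiver()
block_a, block_b = bytearray(6), bytearray(4)
rx.bind(3, block_a)
rx.bind(5, block_b)
rx.deliver(3, b"abc")
rx.deliver(5, b"wxyz")
rx.deliver(3, b"def")
```

Here the stream number plays the role of the buffer tag, so the stack needs no knowledge of the upper-layer header; the cost is the negotiation step that establishes each binding.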
    
    Hmm this is a very interesting problem.
    
    > Absent such wire-speed parsing of the headers, I think we're really talking
    > about a "copy-once" approach on receive, where the packets land in anonymous
    > buffers (possibly located on the ethernet PCI adapter), and then software
    > (possibly running on a processor located on the ethernet PCI adapter) parses
    > the IP, TCP, and iSCSI headers and then sets up a hardware DMA engine to
    > copy the payload to a buffer in main memory, and simultaneously perform the
    > checksum checking. Think of an Alteon Tigon ethernet chip on steroids,
    > running the TCP/IP fast-path code and some iSCSI application-specific code.
    > 
    > I'd appreciate comments, critiques, and info on other approaches to the
    > problem :-)
    > 
    > Regards,
    > -Steve
    > 
    > Steve Byan
    > <stephen.byan@quantum.com>
    > Design Engineer
    > MS 1-3/E23
    > 333 South Street
    > Shrewsbury, MA 01545
    > (508)770-3414
    > fax: (508)770-2604
    
    -- 
    Randall R. Stewart
    randall@stewart.chicago.il.us or rrs@cisco.com
    815-342-5222 (cell) 815-477-2127 (work)
    



Last updated: Tue Sep 04 01:06:11 2001