    Re: More notes from Haifa



    
    
    I agree that it would be nice to have all the good things of both
    TCP and UDP.
    But there are good reasons why this is unlikely to happen even if we
    start redoing TCP over UDP.  For a long time we had assumed that a TCP
    option (like an expanded RDMA) that adds message boundaries to bulk data
    could solve most of the issues we are talking about on this thread - the
    effects of packet loss on memory requirements and throughput.
    At a certain point we even considered making this option part of iSCSI -
    as a mandated segment header (i.e. build the TCP segment payload one
    segment at a time).
    I wonder if this or another equivalent scheme might not fit what we are
    looking for.
    
    Julo
    
    Matt Wakeley <matt_wakeley@agilent.com> on 01/07/2000 03:10:55
    
    Please respond to Matt Wakeley <matt_wakeley@agilent.com>
    
    To:   ips@ece.cmu.edu
    cc:    (bcc: Julian Satran/Haifa/IBM)
    Subject:  Re: More notes from Haifa
    
    
    
    
    Costa Sapuntzakis wrote:
    
    > On Fri, 30 Jun 2000, Matt Wakeley wrote:
    >
    > > > An optimization to iSCSI was discussed. The suggestion
    > > > was that each TCP segment be the start of a new iSCSI PDU.
    > > > This would be especially valuable for data transfers, as
    > > > each segment would have enough information to place the data
    > > > in memory at the remote end.
    > >
    > > You didn't note the results of this discussion.  I thought I remember
    > > something that this could not be enforced upon TCP.  Something about it
    > > could "send" a segment whenever it wants (thus that segment would not
    > > have a header) and that the push bit wouldn't work either, because tcp
    > > could accept a few more bytes after the push bit.
    >
    > Your recollections match mine. Thanks for filling in the gap.
    >
    > ---------------------------
    >
    > The motivations for the discussion are 1) to describe
    > a simpler fast-path for iSCSI, and 2) to minimize data buffering needs
    > in the presence of out-of-order reception.
    >
    > The bulk of the traffic on an iSCSI TCP connection will be SCSI data.
    > SCSI data can be delivered to the buffer at each end in pretty much
    > any order, just as long as it is all delivered. So, as an
    > optimization, instead of keeping tons of SCSI data in a TCP receive
    > queue, an iSCSI w/optimized TCP could parse out-of-order segments
    > and deliver the SCSI data in them to the right SCSI buffer.
    
    Yes - but only if there is an iSCSI header in the frame received.  If
    there is no header, it can't be determined where to put the data. Yes,
    it's possible to detect that this tcp segment is part of an iSCSI message
    *if* the header was previously received, by inspecting the length field
    in that header and placing the tcp segment in the proper buffer.  But
    this does not cover *all* cases.  So, in the case where the frame with
    the iSCSI header is lost (for example, for a 64k data pdu), all of that
    64K of data must now be buffered and copied.  It would be *so* much
    easier if each tcp segment contained an iSCSI header immediately after
    the TCP header, and each TCP segment fit into exactly one ethernet frame.
    This is basically how Fibre Channel works.  A frame comes in, and from
    the information in the FC header, the payload can be placed into memory.
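
A minimal sketch of the placement described above, assuming every segment
starts with a small header. The header layout here (task tag, buffer offset,
payload length) is hypothetical and is not the iSCSI wire format; the names
`make_segment` and `place_segment` are made up for illustration. It only shows
that, with such a header in each segment, payloads can be copied to their
final location in any arrival order without a reassembly queue.

```python
import struct

# Hypothetical per-segment header, NOT the iSCSI format:
# 4-byte task tag, 4-byte buffer offset, 2-byte payload length.
HDR = struct.Struct("!IIH")


def make_segment(tag, offset, payload):
    """Build one self-describing segment: header followed by payload."""
    return HDR.pack(tag, offset, len(payload)) + payload


def place_segment(buffers, segment):
    """Copy a segment's payload straight into its destination buffer."""
    tag, offset, length = HDR.unpack_from(segment)
    payload = segment[HDR.size:HDR.size + length]
    if len(payload) != length:
        raise ValueError("truncated segment")
    # The header alone says where the data belongs, so arrival order
    # does not matter.
    buffers[tag][offset:offset + length] = payload
    return length
```

If the segments of a transfer are built with `make_segment` and fed to
`place_segment` in reverse order, the destination buffer still ends up
holding the original data.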
    
    There is also the issue of "getting out of sync".  I know that this
    "should not happen" in a good implementation, but there are lots of
    people nervous about this issue.

    Also, as someone else posted, if every ethernet frame contained an iSCSI
    header, it would be real easy to implement a protocol analyzer.  How is a
    protocol analyzer supposed to figure out what's in a TCP stream?
    
    These issues are a real detriment to the use of TCP as an iSCSI
    transport.  The benefits of TCP are its "reliable" delivery and
    congestion avoidance.  The disadvantage is there is no "framing".

    I hate to say it, but if another transport was utilized (say UDP), the
    same proven algorithms that make up TCP's reliability and congestion
    control could be implemented in the iSCSI layer along with the benefits
    of UDP framing.

    (I wonder why all the proprietary start-ups seem to be using UDP...?)

    TCP is great for things like telnet and email, but I'm less and less
    convinced it's appropriate for mass storage.
    
    > This would
    > decrease the amount of buffering needed for TCP receive. In the
    > limit, this optimization would decrease memory requirements by up to a
    > factor of 2.
    >
    > As you point out, implementations can't rely on alignment in a TCP stream
    > - to be interoperable they would have to implement unaligned parsing
    > too. However, it would be reasonable to implement aligned parsing
    > in the fast-path and kick unaligned parsing to the slow path.
    >
    > TCP stacks don't know the alignment requirements of applications
    > because most applications fail to communicate message boundaries to the
    > TCP stacks. Historically, this is because the TCP stack ignores
    > message boundaries.
    >
    > However, it is possible to communicate message boundaries through
    > the sockets interface! With the current TCP sockets interface, the
    > sendmsg command, coupled with the MSG_EOR (end of record) flag, could be
    > used to communicate iSCSI PDU boundaries to the TCP stack. The getsockopt
    > command could be used to get the current path MTU of the connection.
    >
    > To communicate message boundaries over the wire, it
    > would be expedient to add a bit to the TCP header saying:
    > this segment is also the start of a new higher-layer PDU. This
    > could help out-of-order parsing of the TCP stream.
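
The sockets-interface idea quoted above could look roughly like this from the
application side. This is only a sketch: `send_pdu` and `max_segment_size`
are hypothetical helper names, the socket-option query is assumed to be
`getsockopt`, and `TCP_MAXSEG` (the segment size) is used as a stand-in for
the path MTU. Whether a given TCP stack does anything with `MSG_EOR` on a
stream socket is implementation-dependent, so nothing here guarantees
alignment on the wire.

```python
import socket


def send_pdu(sock, header, payload):
    """Hand one PDU to the stack as a single record, flagged with MSG_EOR."""
    # sendmsg gathers header and payload in one call; MSG_EOR marks the
    # end of the record. The stack is free to ignore the hint.
    return sock.sendmsg([header, payload], [], socket.MSG_EOR)


def max_segment_size(sock):
    """Ask the stack for the connection's current maximum segment size."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
```

A sender could call `max_segment_size` on a connected socket and size each
PDU so that header plus payload does not exceed it before calling `send_pdu`.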
    
    Ok, but would this require a "change/enhancement" to the existing TCP
    standard?
    
    >
    >
    > As for the PSH bit, RFC 1122, section 4.2.2.2, states "The PSH bit is not
    > a record marker..."
    >
    > There are many constraints that can be envisioned
    >
    > 1) say nothing about alignment
    > 2) iSCSI PDU headers do not span segments
    
    With existing TCP implementations, how can this be enforced?
    
    
    > 3) if more than one iSCSI PDU header appears in a segment, one
    >    iSCSI PDU header always appears at the start of the segment
    
    See #2.
    
    
    > 4) zero or one iSCSI PDU header per segment and header always
    >    aligned with start of segment.
    >       - potentially creates lots of small segments
    
    >
    > 5) new segment always starts new iSCSI PDU
    >       - implies iSCSI MTU <= TCP path MTU
    >       - creates lots of small segments
    >
    > My preference is #3.
    
    My preference is #5.  But the problem is my comment on #2.
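
To make the "lots of small segments" cost of option #5 concrete, here is a
rough sketch of how a sender might cut one data transfer into PDUs that each
fit in a single segment. The function name and the 48-byte header size used
in the usage note are assumptions for illustration only.

```python
def pdus_for_option_5(total_len, mss, header_len):
    """Return (offset, length) pairs, one per PDU, each fitting one segment."""
    room = mss - header_len  # payload bytes left after the per-PDU header
    if room <= 0:
        raise ValueError("header does not fit in one segment")
    return [(off, min(room, total_len - off))
            for off in range(0, total_len, room)]
```

For a 64K transfer with a 1460-byte segment size and an assumed 48-byte
header, `pdus_for_option_5(65536, 1460, 48)` yields 47 PDUs, so the header is
repeated 47 times instead of once.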
    
    >
    >
    > -Costa
    
    -Matt
    
    
    
    
    

