
    Re: TCP Framing (considered helpful?)



    
    Replies in text below (between [Huff] and  [/Huff]  ).
    
    .
    .
    .
    John L. Hufferd
    Senior Technical Staff Member (STSM)
    IBM/SSG San Jose Ca
    (408) 256-0403, Tie: 276-0403,  eFax: (408) 904-4688
    Internet address: hufferd@us.ibm.com
    
    
    Stephen Bailey <steph@cs.uchicago.edu>@ece.cmu.edu on 05/21/2001 06:50:41 AM
    
    Sent by:  owner-ips@ece.cmu.edu
    
    
    To:   ips@ece.cmu.edu
    cc:
    Subject:  Re: TCP Framing (considered helpful?)
    
    
    
    John,
    
    > I think we must depend on Markers to insure that everything can operate
    > at top speed, and at the lowest cost.
    
    A key question is whether markers actually ensure that everything
    operates at `top speed, and at the lowest cost'.
    
    Matt thinks so.  I (and, presumably those who wrote the framing
    document) think not.
    
    [Huff] I do not think you can say that.  I also support framing (Warp) as
    a much more elegant solution, but I find it inappropriate to depend on it
    actually happening and being made available in the various OSs in the time
    frame we need.  I do believe that over time it will be made available, and
    that it is a better approach for all TCP/IP applications that can use it. [/Huff]
    
    My issue is not even with `lowest cost'.  I don't believe markers will
    allow you to run at top speed.  Specifically:
      1) I doubt the feasibility of implementing the control required for
         an eddy buffer (where you store data you can't place) at 10G.
         Admittedly, the validity of this claim can't really be assessed
         without actually working the implementation, so for 99% of the
         list participants (myself included) this is a `yes it is, no it
         isn't' point.
    
       [Huff] I believe this has had much more work done on it than you
          think.  I have personally stepped through the proposals from
          several vendors that are working on this option for their HW HBAs.
          Because of the iSCSI PDU headers, the data/commands can usually
          be placed directly into the SCSI Host buffers, almost every
          time.  Only when the PDU headers arrive slightly out of order
          (due to normal routing) are the packets unable to be placed
          directly into the Host buffer.  That requires some, but only
          a small amount of, buffering space.
          It is the packet drops that occur on PDU headers, and the resultant
          error retries, that cause the need for large amounts of "on
          HBA/chip" buffering.
          So by using Markers, these HW iSCSI HBAs can limit the amount of
          buffering on the chip/HBAs. [/Huff]
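
    To make the marker argument above concrete, here is a minimal Python
    sketch of fixed-interval markers.  Everything about the layout is an
    assumption made for illustration (the 16-byte interval, the 4-byte
    marker, the 2-byte length header on each PDU, and the names frame,
    resync and payload_at); it is not the marker format of the iSCSI
    draft.  Each marker holds the distance to the next PDU header, so a
    receiver that lost the segment carrying a header can pick up at the
    next marker instead of buffering until the retransmission arrives.

```python
import struct

CHUNK = 16           # payload bytes between markers (hypothetical value)
MARKER = 4           # marker size in bytes (hypothetical value)
BLOCK = CHUNK + MARKER
NONE = 0xFFFFFFFF    # marker value meaning "no PDU header follows"


def make_pdu(body: bytes) -> bytes:
    """Hypothetical PDU: 2-byte big-endian body length, then the body."""
    return struct.pack(">H", len(body)) + body


def to_wire(p: int) -> int:
    """Map a payload offset to its offset in the marked byte stream."""
    return (p // CHUNK) * BLOCK + MARKER + p % CHUNK


def frame(pdus):
    """Concatenate PDUs and put a marker in front of every CHUNK bytes."""
    payload = b"".join(pdus)
    starts, off = [], 0
    for pdu in pdus:
        starts.append(off)
        off += len(pdu)
    wire = bytearray()
    for k in range(0, len(payload), CHUNK):
        # Distance from this marker to the next PDU header at or after it.
        nxt = next((s for s in starts if s >= k), None)
        value = NONE if nxt is None else nxt - k
        wire += struct.pack(">I", value) + payload[k:k + CHUNK]
    return bytes(wire)


def resync(wire: bytes, resume_at: int):
    """Return the payload offset of the first PDU header that the next
    marker at or after wire offset `resume_at` points to, or None."""
    k = -(-resume_at // BLOCK)            # ceiling division: marker index
    if k * BLOCK + MARKER > len(wire):
        return None                       # no complete marker left
    (value,) = struct.unpack_from(">I", wire, k * BLOCK)
    return None if value == NONE else k * CHUNK + value


def payload_at(wire: bytes, p: int, n: int) -> bytes:
    """Read n payload bytes starting at payload offset p, skipping markers."""
    return bytes(wire[to_wire(i)] for i in range(p, p + n))


pdus = [make_pdu(b"A" * 10), make_pdu(b"B" * 30), make_pdu(b"C" * 5)]
wire = frame(pdus)

# Pretend everything before wire offset 25 was lost, including the header
# of the second PDU.  The next marker still locates the third PDU header.
start = resync(wire, 25)
(length,) = struct.unpack(">H", payload_at(wire, start, 2))
body = payload_at(wire, start + 2, length)
```

    The receiver in this sketch only needs the stream offset (which TCP
    sequence numbers already give it) and the fixed interval to find the
    next marker, which is why the amount of on-HBA buffering stays bounded.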
    
      2) an eddy buffer solution requires some substantial speed-up in
         both the NIC data path, and MOST IMPORTANTLY: the host bus.  In
         order to unload the eddy buffer while still handling incoming
         traffic at line rate, clearly the host bus bandwidth must be >
         line rate.
    
         [Huff] This is not an effect of an eddy buffer solution; it is a
         fact that every TCP/IP NIC has to deal with, especially at the
         new speeds.  Our current PCI bus will not support 10 Gigabit;
         PCI-X will not support it either, and even PCI-DDR does not support
         the full data rate.  So the NIC needs to rely on TCP/IP window
         management.  The only other thing you can do is drop the packets,
         which clearly makes the problem worse. [/Huff]
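
    The bus arithmetic in this exchange can be checked with a short
    calculation.  The sketch below is only illustrative: it uses nominal
    peak figures (bus width times nominal clock, ignoring all protocol
    overhead), and the 12 Gb/s bus and the 1.25 MB backlog in the drain
    example are hypothetical numbers chosen for the demonstration, not
    figures from either author.

```python
LINE_RATE_GBPS = 10.0


def peak_gbps(width_bits: int, clock_mhz: float) -> float:
    """Nominal peak bus bandwidth in Gb/s: width times clock, no overhead."""
    return width_bits * clock_mhz * 1e6 / 1e9


def drain_time_s(backlog_bytes: int, bus_gbps: float,
                 line_gbps: float = LINE_RATE_GBPS):
    """Seconds needed to empty a backlog while traffic keeps arriving at
    line rate.  Returns None when the bus has no spare bandwidth, in which
    case the backlog never drains."""
    spare = bus_gbps - line_gbps
    if spare <= 0:
        return None
    return backlog_bytes * 8 / (spare * 1e9)


buses = {
    "PCI 32-bit/33 MHz": peak_gbps(32, 33),
    "PCI 64-bit/66 MHz": peak_gbps(64, 66),
    "PCI-X 64-bit/133 MHz": peak_gbps(64, 133),
}

for name, gbps in buses.items():
    verdict = "below" if gbps < LINE_RATE_GBPS else "at or above"
    print(f"{name}: {gbps:.2f} Gb/s peak, {verdict} 10G line rate")

# Hypothetical 12 Gb/s host bus draining a 1.25 MB backlog at 10G line rate.
print(drain_time_s(1_250_000, 12.0))
```

    Both positions show up in the numbers: a backlog drains only when
    the host bus is faster than the line, and none of the nominal peak
    figures listed here reaches 10 Gb/s.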
    
    
    I know of at least one general purpose framed solution operating at
    10G which has been available for >3 years (SGI's GSN/ST/XIO NIC).  I'm
    sure there are others.
    
    I can't imagine there's any argument that a framed solution would be
    voted `most likely to run fast and be cheap'.  Every storage network
    and cluster interconnect has been designed that way since antiquity.
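
    As an illustration of why a framed design is simple to run fast,
    here is a minimal Python sketch in which every segment starts with a
    self-describing header, so each one can be placed into its
    destination buffer on arrival, in any order, with no reassembly
    buffer.  The header layout (a 2-byte buffer id and a 4-byte offset)
    and the names make_segments and place are assumptions made for this
    example; they are not taken from the framing proposal or from any
    existing NIC.

```python
import random
import struct

# Hypothetical per-segment header: destination buffer id, byte offset.
HEADER = struct.Struct(">HI")


def make_segments(buffer_id: int, data: bytes, mss: int):
    """Cut data into segments that each carry their own placement header."""
    return [
        HEADER.pack(buffer_id, off) + data[off:off + mss]
        for off in range(0, len(data), mss)
    ]


def place(segment: bytes, buffers: dict) -> None:
    """Copy one segment straight into its destination buffer."""
    buffer_id, off = HEADER.unpack_from(segment)
    body = segment[HEADER.size:]
    buffers[buffer_id][off:off + len(body)] = body


data = bytes(range(256)) * 4
segments = make_segments(7, data, mss=100)
random.shuffle(segments)                 # simulate out-of-order arrival

buffers = {7: bytearray(len(data))}
for seg in segments:
    place(seg, buffers)                  # no segment waits for another
```

    Because placement depends only on the segment in hand, a reordered
    or retransmitted segment costs nothing extra on the receive side in
    this sketch.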
    
    The key tradeoff involves the OS vendors, and I'm wondering why we're
    speaking for them.  The question IS, how much more work is it to
    introduce TCP framing over and above what is required to insert iSCSI
    into their network framework.  My experience from writing NIC and
    storage drivers for many commercial UNIX-family OSes is:
      1) it's an easy and well defined process to insert a new SCSI
         transport driver into the SCSI stack.
      2) it's a hard and poorly defined process to insert ANYTHING into the
         network stack.
    [Huff] I think you are making my point.  This is the problem with SW
    Stacks.  That is why I believe that it will take a very long time for
    the various vendors to include such changes in their "bet your business"
    TCP/IP SW Stacks.  The point that Matt and I have been trying to make
    is that most OS vendors are NOT creating the iSCSI HW HBAs (NICs).
    These iSCSI HW HBAs (NICs) have the TCP/IP completely on the HBA, and
    they have also added the iSCSI processing so that they can steer the
    packets directly into the appropriate SCSI Host buffers.  Adding either
    Markers or Framing to the iSCSI HW HBAs is not a big problem.  The
    only problem is getting Framing into Host TCP/IP Stacks in a timely way.
    [/Huff]
    
    Networking has historically been a user-mode activity.  Architected
    services are only provided to user mode programs.  Kernel clients have
    been few and far between and so are handled on a case-by-case basis.
    For example NFS.  Every OS has hacks to make NFS run fast, but they
    are not stable interfaces for general purpose use.
    
    Even Solaris' SysV-derived STREAMS stack, which is intended precisely
    to provide flexible, crisp interfaces for kernel network clients, does
    not document the relevant (IP stack) intermodule interfaces.
    
    I know that there are more and more kernel network clients, but they
    are coming either on fluid platforms (e.g. linux), in which case the
    argument of `it'll take too long to get OS support' doesn't apply, or
    they are vendor-supplied, in which case a performance iSCSI solution
    in ANY form may take a while, and the choice of framing or markers
    isn't going to make a difference.
    
    [Huff] I think you are saying something I agree with and something I
    do not agree with.  I agree that software changes to TCP/IP in the
    various "bet your business" OSs will take some time.  However, it is
    not true that new iSCSI device drivers will take very long.  Two types
    are being created today by Cisco, IBM, Intel, etc.: iSCSI DDs that
    make calls to normal TCP/IP stacks, and the DDs that are being
    written by the iSCSI HW HBA vendors.  Neither requires
    the OS vendor to do anything special.  This is happening NOW
    (check with Cisco, Intel, and IBM (me?)).  The last thing we want
    is for a dependency on a TCP/IP change to get in the
    way of our momentum. [/Huff]
    
    I can't say squat about the architecture of Winsock, but the fact that
    there is a Microsoft author of the framing proposal who seems very
    serious about supporting framing and RDMA as quickly as possible
    suggests that framing support should be available on Windows very
    soon.
    
    [Huff] My following statements are not meant as a criticism of Microsoft.
    However, they, like all producers of key, complicated new software, do
    not bring it to the general market as quickly as HW vendors would
    like.
    
    I believe that Microsoft's heart is in the right place on this issue,
    and that they will do the right thing with framing, over time.
    But it is not clear in what release that will be shipped, nor in which
    service pack it will be included.  It is also not clear how support
    will be handled for current Win2k, WinNT, etc.
    
    This is why I think we should make Framing a "must implement"
    and an "optional to use".  It is the easiest thing for SW to
    create, it brings the needed cost reduction to iSCSI HW, and
    it is completely under our (iSCSI protocol) control.
    [/Huff]
    
    
    
    Steph
    
    
    
    



Last updated: Tue Sep 04 01:04:38 2001