
    Re: is 1 Gbps a MUST?



    
    Thanks for the clarification. Something still bothers me, however.
    If IPSec is a bottleneck (because the policy lookup is done in software),
    then the receiver may be forced to drop packets quite frequently. Such
    behavior could have a dramatic effect on performance, as explained in a
    memo that Jonathan Stone posted on 2/5/02 (attached) and in my
    interpretation of 2/6/02, which I did not post (attached). Comments? Thanks.
    
    Vince Cavanna
    Agilent Technologies
    
     <<Re: iSCSI: No Framing >>  <<RE: iSCSI: No Framing >> 
    


    In message <ED8EDD517E0AA84FA2C36C8D6D205C1301CBF2C5@alfred.xiotech.com>,
    "Peglar, Robert" writes:
    
    >The original thread began with a question (paraphrased) about '...what
    >applications could consume a 10G pipe for long periods of time'.  I
    >answered that question - disk-disk backup and subsystem replication.
    
    Even disk-to-disk applications or backup applications really want
    approximately BW*RTT worth of buffering.  Hugh Holbrook's recent
    Stanford PhD thesis traces the conventional wisdom back to an email
    from Van Jacobson to the e2e list in 1990.
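    (For concreteness, a back-of-the-envelope Python sketch of what
    BW*RTT implies for buffer sizes; the link speeds and RTTs below are
    illustrative assumptions, not figures from the thread.)

        # Bandwidth-delay product: bytes in flight at full link utilization.
        def bdp_bytes(bandwidth_bps, rtt_seconds):
            return bandwidth_bps / 8 * rtt_seconds

        # Illustrative links: a short-haul 1 Gb/s LAN and a 10 Gb/s MAN.
        for name, bw, rtt in [("1 Gb/s LAN, 0.5 ms RTT", 1e9, 0.5e-3),
                              ("10 Gb/s MAN, 5 ms RTT ", 10e9, 5e-3)]:
            print(f"{name}: ~{bdp_bytes(bw, rtt) / 1024:.0f} KiB of buffering")

        # 1 Gb/s LAN, 0.5 ms RTT:   ~61 KiB  (plausibly on-chip)
        # 10 Gb/s MAN, 5 ms RTT : ~6104 KiB  (well beyond on-chip RAM)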
    
    It's reasonably well known in the TCP community that TCP slow-start
    generates spiky traffic. It leads to bursts of high buffer occupancy
    (e.g., at the point where the exponential ramp-up switches to
    congestion avoidance). Indeed, that was the motivation behind
    TCP Vegas, and the recent work on TCP pacing.
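    (A minimal sketch of that ramp-up, assuming idealized per-RTT
    doubling and an invented bottleneck queue size; real TCP stacks are
    ACK-clocked and messier.)

        # Idealized slow start: cwnd doubles each RTT until ssthresh,
        # then grows linearly (congestion avoidance).  Units: segments.
        cwnd, ssthresh = 1, 64        # illustrative values
        queue_limit = 48              # invented bottleneck queue size

        for rtt in range(1, 10):
            phase = "slow start" if cwnd < ssthresh else "cong. avoid"
            print(f"RTT {rtt}: cwnd = {cwnd:3d} segments ({phase})")
            if cwnd > queue_limit:
                print("  -> burst overflows the bottleneck queue: drops")
                break
            cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1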
    
    The whole debate over framing/marking only makes sense if one views
    outboard NIC buffering of BW*RTT as very expensive (e.g., forcing a
    design from on-chip RAM to external SRAM). Adding framing of iSCSI PDUs
    allows the NIC to continue doing direct data placement into host
    buffers, accommodating the BW*RTT of TCP buffering in "cheap" host RAM
    rather than "expensive" NIC RAM.
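    (A toy illustration of that placement step, assuming an invented
    length-prefixed PDU format with a buffer tag -- NOT the actual iSCSI
    or marker proposals on the table -- just to show why known PDU
    boundaries let a NIC scatter payloads straight into host buffers.)

        import struct

        HDR = ">II"                     # (host buffer tag, payload length)
        HDR_LEN = struct.calcsize(HDR)

        def place(stream: bytes, host_buffers: dict):
            """Walk framed PDUs, copying each payload to its tagged buffer."""
            offset = 0
            while offset + HDR_LEN <= len(stream):
                tag, length = struct.unpack_from(HDR, stream, offset)
                start = offset + HDR_LEN
                host_buffers.setdefault(tag, bytearray()).extend(
                    stream[start : start + length])
                offset = start + length

        pdu = lambda tag, data: struct.pack(HDR, tag, len(data)) + data
        bufs = {}
        place(pdu(7, b"READ data A") + pdu(9, b"READ data B"), bufs)
        print(bufs)   # payloads landed in per-command buffers 7 and 9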
    
    But you can't get away from providing the buffers, not unless you are
    also willing to artificially restrict throughput.  If iSCSI doesn't
    provide some form of framing, then what can a NIC on a MAN with medium
    BW*RTT do if it sees a drop? It has only a few choices:

      1. Start buffering data outboard, hoping that TCP fast-retransmit will
         send the missing segment(s) before the outboard buffers are exhausted;

      2. Give up on direct data placement, and start delivering packets to
         host memory any old how -- at the cost of SW reassembly and alignment
         problems, and a software CRC, once the missing segment is recovered;

      3. Start dropping packets, and pay a huge performance cost.
    
    There are some important caveats around the BW*RTT: if we can
    *guarantee* that the iSCSI NICs are never the bottleneck point, or
    that TCP never tries to reach the true link BW*RTT (due to undersized
    windows), then one can get away with less. (See Hugh Holbrook's thesis
    for more concrete details.)
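    (A small sketch of the undersized-window caveat: steady-state TCP
    throughput is roughly min(link BW, window/RTT), so a window smaller
    than BW*RTT caps throughput -- and the buffering needed along with it.
    The numbers below are illustrative assumptions.)

        # Idealized steady-state throughput limit for a single TCP flow.
        def achievable_bps(window_bytes, rtt_seconds, link_bps):
            return min(link_bps, window_bytes * 8 / rtt_seconds)

        link, rtt = 1e9, 5e-3                    # 1 Gb/s link, 5 ms RTT
        for window in (64 * 1024, 640 * 1024):   # 64 KiB vs. 640 KiB
            mbps = achievable_bps(window, rtt, link) / 1e6
            print(f"{window // 1024} KiB window -> ~{mbps:.0f} Mb/s")

        # 64 KiB  ->  ~105 Mb/s: never fills the pipe; less buffering needed
        # 640 KiB -> ~1000 Mb/s: reaches link speed; the full BW*RTT applies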
    
    But the lesson to take away is that even in relatively well-behaved
    LANs, TCP *by design* is always oscillating around overloading the
    available buffers, causing a drop, then backing off.  See, for
    example, Figure 2 of the paper by Janey Hoe, which introduced "New
    Reno"; or Figs. 2 and 3 of the paper by Floyd and Fall. New Reno avoids
    the long timeouts between each drop, but the drops themselves still
    occur.
    
    Moral: TCP can require significant buffering even on quite modest
    networks.  It __may__ be worth keeping framing, so that host NICs can
    do more of that buffering in host memory rather than outboard; and so
    they can continue performing DDP rather than software reassembly and
    software CRC checking. Storage devices are another issue again.
    
    
    
    
    
    References:
    
    Van Jacobson. Modified TCP congestion avoidance algorithm.
    Email to end2end@isi.edu, April 1990.

    L. Brakmo, S. O'Malley, and L. Peterson. TCP Vegas: new techniques
    for congestion detection and avoidance. ACM SIGCOMM 1994.

    J. Kulik, R. Coulter, D. Rockwell, and C. Partridge. A simulation
    study of paced TCP. BBN Technical Memorandum 1218, BBN, August 1999.

    J. Hoe. Improving the start-up behavior of a congestion control
    scheme for TCP. ACM SIGCOMM 1996.

    S. Floyd and K. Fall. Simulation-based comparisons of Tahoe, Reno,
    and SACK TCP. Computer Communication Review, vol. 26, no. 3, 1996.

    H. Holbrook. A Channel Model for Multicast. PhD dissertation,
    Department of Computer Science, Stanford University, August 2001.
    http://dsg.stanford.edu/~holbrook/thesis.ps{,.gz}. (See Chapter 5.)

    (Holbrook cites Aggarwal, Savage, and Anderson, INFOCOM 2000, on the
    downsides of TCP pacing; but I haven't read that.  The PILC draft on
    link designs touches the same issue, but the throughput equations
    cited there factor out buffer size.)
    
    
    
    >FC is not sufficient.  Storage-to-storage needs all the advantages as well
    >as that which iSCSI has to offer the host-storage model.
    
    But it will still need approximately BW*RTT of buffering, even for
    low-delay LANs. Otherwise performance will fall off a cliff under
    "congestion" -- e.g., each time some other iSCSI flow starts up,
    begins competing for the same TCP endpoint buffers on the same iSCSI
    device, and triggers a burst of TCP loss events for the
    storage-to-storage flow.
    




    Hello Jonathan,
    
    Interesting and useful points!
    
    I would appreciate your opinion on the following observation.
    
    It seems to me that the cliff-like drop in performance that follows
    from dropping packets is likely to result, as well, from any other
    bottleneck in the path to the buffers, such as an IPSec engine that is
    not capable of link-speed throughput, or large (even if occasional)
    latency in any internal shared medium that lies in the path to the
    buffers. It is easy for the IPSec engine to become a bottleneck (even
    without considering the crypto algorithms), since it has to perform a
    complex policy database lookup on every received packet, secured or
    not, to confirm that the packet was afforded the appropriate security
    as indicated by the configured security policy. The moral I took from
    your memo is that link-speed throughput is necessary all the way to
    the buffers.
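    
    (To put rough numbers on that, a sketch of the per-packet time budget
    an IPSec engine has for its policy lookup at line rate; the link
    speeds and packet sizes are illustrative assumptions, and crypto cost
    is ignored.)

        # Time available per packet if every received packet needs an
        # SPD (security policy database) lookup, secured or not.
        def lookup_budget_us(link_bps, packet_bytes):
            packets_per_sec = link_bps / (packet_bytes * 8)
            return 1e6 / packets_per_sec

        for bw, size in [(1e9, 1500), (1e9, 64), (10e9, 1500)]:
            print(f"{bw / 1e9:.0f} Gb/s, {size:4d}B packets: "
                  f"{lookup_budget_us(bw, size):5.2f} us per lookup")

        # 1 Gb/s,  1500B: 12.00 us -- maybe feasible in software
        # 1 Gb/s,    64B:  0.51 us -- already hard in software
        # 10 Gb/s, 1500B:  1.20 us -- likely needs hardware assist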
    
    Thanks,
    Vince Cavanna
    Agilent Technologies
    
    |-----Original Message-----
    |From: Jonathan Stone [mailto:jonathan@dsg.stanford.edu]
    |Sent: Tuesday, February 05, 2002 7:10 PM
    |To: Peglar, Robert
    |Cc: ips@ece.cmu.edu
    |Subject: Re: iSCSI: No Framing 
    |[...]



