
    Re: TCP (and SCTP) sucks on high speed networks



    
    
    Matt,

    I think you are assuming (maybe rightly) in this example that the
    advertised window size exactly matches the bandwidth-delay product
    (only true if CWND is the limiting factor, because of loss due to
    congestion).
    
    However, if the advertised window was, say, four times the BDP, and
    CWND was similar because there had been no recent packet loss due to
    congestion, then losing one packet will cut the window to half of what
    it was, and a new packet is sent for each duplicate ack received. The
    window is now twice the BDP and throughput is not reduced, since CWND
    is still twice the BDP and duplicate acks are arriving.

    Losing another packet, however, causes the data rate on the line to
    drop; hence congestion avoidance.
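
    To make that arithmetic concrete, a minimal Python sketch (the link
    rate and RTT are illustrative numbers of my own, not anything from
    this thread):

        # Illustrative numbers, not from any spec: a 1 Gb/s link with a
        # 50 ms round-trip time.
        LINK_RATE_BPS = 1_000_000_000
        RTT_S = 0.050

        bdp = LINK_RATE_BPS / 8 * RTT_S   # bandwidth-delay product, bytes
        window = 4 * bdp                  # advertised window = 4 x BDP

        for loss in range(1, 4):
            window /= 2                   # each fast-recovery event halves it
            state = "pipe still full" if window >= bdp else "line underutilized"
            print(f"after loss {loss}: window = {window / bdp:.1f} x BDP "
                  f"-> {state}")

    With these numbers the first loss leaves the window at twice the BDP,
    the second at exactly the BDP, and only the third pushes it below the
    BDP and underutilizes the line.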
    
    So I think losing one packet to a CRC-type error can be handled. It is
    when slow start kicks in that the line gets underutilized, and if slow
    start kicks in it most likely means you are in a genuine congestion
    situation.
    
    I have not seen any document mandating that the maximum receiver-side
    advertised window must match the bandwidth-delay product implied by
    the RTT, or at least that it should not be greater than it, or that it
    should track it in some way.
    
    You're right, though, in stating that TCP did assume every duplicate
    ack indicates packet loss due to CONGESTION. In effect, my
    understanding is that the fast retransmit / fast recovery / congestion
    avoidance algorithm kind of assumes that the missing segment is due to
    re-ordering (not congestion). Thus we can use this scheme to handle
    any single packet loss without affecting throughput. If I understand
    its workings correctly, this can defeat the congestion avoidance
    algorithm in the single-packet-loss case.
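
    For reference, here is a rough Python sketch of those fast retransmit
    / fast recovery steps as RFC 2581 describes them (simplified on my
    part: no retransmission timer, no SACK, and byte counting rather than
    segment counting):

        MSS = 1460  # bytes; a typical Ethernet segment size

        class RenoSender:
            """Rough sketch of RFC 2581 fast retransmit / fast recovery."""

            def __init__(self, cwnd, flight_size):
                self.cwnd = cwnd                # congestion window, bytes
                self.flight_size = flight_size  # data outstanding, bytes
                self.ssthresh = cwnd            # initial ssthresh, arbitrary
                self.dupacks = 0
                self.in_recovery = False

            def on_dup_ack(self):
                self.dupacks += 1
                if self.dupacks == 3 and not self.in_recovery:
                    # Fast retransmit: resend the missing segment and
                    # halve the window.
                    self.ssthresh = max(self.flight_size // 2, 2 * MSS)
                    self.cwnd = self.ssthresh + 3 * MSS
                    self.in_recovery = True
                elif self.in_recovery:
                    # Each further dup ack means a segment has left the
                    # network, so inflate cwnd to let a new one out.
                    self.cwnd += MSS

            def on_new_ack(self):
                if self.in_recovery:
                    self.cwnd = self.ssthresh   # deflate: half the old rate
                    self.in_recovery = False
                self.dupacks = 0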
    
    I think the advertised window, in the gigabit, vast-range-of-RTT world
    we are entering, requires an algorithm that links the RTT estimate to
    the maximum advertised window size. I have not seen any such algorithm
    discussed, but then I may not have read the right RFCs/publications.
    Part of the problem is that the RTTs in the two directions may differ.
    In some instances this might be detected by looking at the line
    utilization percentage on the receive port.
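
    Purely as a strawman of my own (no RFC specifies such a rule, and the
    rate and RTT estimators are assumed to exist), a receiver-side cap of
    the kind I mean might look like:

        HEADROOM = 2.0  # assumed: let the window exceed the estimated BDP
                        # by this factor, to absorb RTT estimation error

        def max_advertised_window(est_rate_bps, est_rtt_s, buffer_limit):
            """Cap the advertised window near the estimated bandwidth-delay
            product instead of at whatever buffer memory happens to exist."""
            bdp = est_rate_bps / 8 * est_rtt_s    # bytes
            return min(int(HEADROOM * bdp), buffer_limit)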
    
    This is an area of TCP that I would like to see examined, researched,
    and discussed w.r.t. iSCSI and the requirement for efficient memory
    usage on TOE-type implementations.
    
    Of course, if CWND is sitting right at the BDP for the line because of
    congestion, then what you say is true. But that is not what we are
    talking about here.
    
         If I've misunderstood something here then please correct me.
    
    
    URG POINTER / Framing

    What is the basis for leaving this in the spec? Surely you would want
    something better. Again, I believe there is not a big memory issue on
    the LAN, and thus not a big cost issue. If iSCSI is not successful in
    the LAN, I fail to see how it will be successful at all. The
    disaster-recovery MAN link does not have a huge memory requirement
    either.
    
    As for the general problem of 10G links halfway around the world...
    let's solve that once iSCSI is successful in the LAN and customers
    have a real problem paying $100/$200 for memory for their 10G iSCSI
    adaptor connecting their clear-channel link between the US and Europe.
    
    
    Dick Gahan
    3Com
    
    
    
    
    
    
    Matt Wakeley <matt_wakeley@agilent.com> on 01/12/2000 07:44:09
    
    Please respond to Matt Wakeley <matt_wakeley@agilent.com>
    
    Sent by:  Matt Wakeley <matt_wakeley@agilent.com>
    
    
    To:   end2end-interest@ISI.EDU, ips@ece.cmu.edu
    cc:    (Dick Gahan/IE/3Com)
    Subject:  TCP (and SCTP) sucks on high speed networks
    
    
    
    
    TCP's "congestion avoidance" algorithms are not compatible with high speed,
    long distance networks.  The "cut transmit rate in half on packet loss and
    increase the rate additively" algorithm will simply not work.
    
    Consider a 10 Gb/s link to a destination halfway around the world.  A
    packet drop due to link errors (not congestion or infrastructure
    products) can be expected about every 20 seconds.  However, with an
    RTT of 100 ms (not even across the continent), if a TCP connection is
    operating at 10 Gb/s, the packet drop (due to link error) will cut the
    rate to 5 Gb/s.  It will take 4 *MINUTES* for TCP to ramp back up to
    10 Gb/s.
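
    A back-of-the-envelope sketch of that ramp-up in Python, assuming the
    standard Reno additive increase of one segment per RTT and a
    1460-byte MSS (both assumptions here; the figure scales inversely
    with the assumed segment size, and with these numbers it comes out
    even longer):

        LINK_RATE_BPS = 10_000_000_000   # 10 Gb/s
        RTT_S = 0.100                    # 100 ms round trip
        MSS = 1460                       # bytes, assumed

        bdp = LINK_RATE_BPS / 8 * RTT_S   # bytes in flight at full rate
        deficit = bdp / 2                 # window lost to the halving
        rtts = deficit / MSS              # +1 MSS of cwnd per RTT
        print(f"~{rtts * RTT_S / 60:.0f} minutes to ramp back to full rate")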
    
    Therefore, there needs to be a change to TCP's congestion avoidance algorithm
    for future high speed networks.  Since SCTP is based on the same algorithms,
    it is doomed to the same fate.
    
    -Matt
    
    
    
    
    
    
    
    
    

