Re: TCP (and SCTP) sucks on high speed networks
Matt,
I think you are assuming (maybe rightly) in this example that the
advertized window size exactly matches the bandwidth-delay product (BDP).
(That is only true if CWND is the limiting factor because of loss due to
congestion.)
However, if the advertized window was, say, four times the BDP, and CWND
was similar because there had been no recent packet loss due to
congestion, then losing one packet will cut the window down to one half
of what it was, and a new packet will be sent for each duplicate ACK
received. The window size should now be twice the BDP and throughput
should not be reduced, since CWND is twice the BDP and duplicate ACKs
are arriving.
Losing another packet, however, causes the data rate on the line to
drop; hence congestion avoidance.
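
A rough way to check the arithmetic above is the idealized steady-state model below. It is only a sketch: it assumes throughput is the smaller of the link rate and window/RTT, ignores queueing and the details of fast recovery, and uses an illustrative 1 Gbps link with a 100 ms RTT. The function names are made up for this example.

```python
def throughput_bps(cwnd_bytes, rwnd_bytes, rtt_s, link_bps):
    """Idealized rate: the usable window per RTT, capped at the link rate."""
    window_bytes = min(cwnd_bytes, rwnd_bytes)
    return min(link_bps, window_bytes * 8 / rtt_s)


def cwnd_after_loss(cwnd_bytes):
    """Multiplicative decrease: one detected loss halves CWND."""
    return cwnd_bytes / 2


# Illustrative numbers only (assumptions, not taken from the thread).
link_bps = 1e9
rtt_s = 0.1
bdp_bytes = link_bps * rtt_s / 8      # bandwidth-delay product in bytes

rwnd = 4 * bdp_bytes                  # advertized window at four times the BDP
cwnd = 4 * bdp_bytes                  # CWND similar, no recent congestion loss

before = throughput_bps(cwnd, rwnd, rtt_s, link_bps)
cwnd = cwnd_after_loss(cwnd)          # single loss: CWND is now twice the BDP
after = throughput_bps(cwnd, rwnd, rtt_s, link_bps)

print(f"CWND after one loss: {cwnd / bdp_bytes:.1f} x BDP")
print(f"rate before: {before / 1e9:.2f} Gbps, after: {after / 1e9:.2f} Gbps")
```

In this simplified model the line stays full after a single loss because the halved window still covers the BDP. By contrast, a sender whose window was exactly one BDP would be left at half rate.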
So I think losing one packet due to a CRC-type error can be handled. It's
when slow start kicks in that the line gets underutilized, and if it
kicks in, it most likely means that you are in a congestion situation.
I have not seen any document mandating that the max receiver side
advertized window size must match the RTT, or at
least mandating that it should not be greater than it, or that it should
track it in some way.
You're right, though, in stating that TCP did assume every duplicate ACK
to indicate packet loss due to CONGESTION. In effect, my understanding is
that the fast retransmit, fast recovery, congestion avoidance algorithm
kind of assumes that the lost frame is due to re-ordering (not
congestion).
Thus we can use this scheme to handle any single packet loss and not
affect throughput. This can defeat the congestion avoidance algorithm in
the single packet loss case, if I understand its workings correctly in
this case.
I think that the advertized window in the gigabit, vast range of RTT world
we're entering requires that
there be an algorithm which links the RTT estimate to the max advertized
window size.
I have not seen any such algorithm discussed. But then I may not have
read the right RFCs/publications etc. Part of the problem is that the
RTTs in both directions may be different.
This might be detected though by looking at the line utilization % on the
receive port in some instances.
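
A purely hypothetical sketch of what linking the RTT estimate to the maximum advertized window might look like follows. It is not taken from any RFC or existing implementation. The function name, the headroom factor of 2, and the 64 MiB buffer cap are all assumptions chosen for illustration.

```python
def advertised_window_bytes(srtt_s, link_bps, headroom=2.0,
                            max_buffer_bytes=64 * 1024 * 1024):
    """Hypothetical rule: advertise a multiple of the estimated BDP,
    bounded by the receive memory the adapter can actually spare."""
    bdp_bytes = link_bps * srtt_s / 8
    return int(min(max_buffer_bytes, headroom * bdp_bytes))


# LAN-like case: 1 Gbps link, 1 ms smoothed RTT estimate.
print(advertised_window_bytes(srtt_s=0.001, link_bps=1e9))

# Long-haul case: 10 Gbps link, 100 ms RTT; the memory cap takes over.
print(advertised_window_bytes(srtt_s=0.1, link_bps=10e9))
```

The cap is what ties this back to memory usage: on a short-RTT LAN the window stays small, while on a long, fast path the adapter's memory, not the BDP, sets the limit.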
This is an area of TCP that I would like to see
examined/researched/discussed w.r.t. iSCSI and the requirement for
efficient memory usage on TOE-type implementations.
Of course, if CWND is just at the BDP for the line due to congestion,
then what you say is true. But that's not what we're talking about here.
If I've misunderstood something here then please correct me.
URG POINTER / Framing.
What's the basis for leaving this in the spec? Surely you would want
something better.
Again I say that I believe that there is not a big memory issue on the LAN,
and thus not a big cost
issue. If iSCSI is not successful in the LAN I fail to see how it will be
successful at all.
The disaster recovery MAN link has not got a huge memory requirement
either.
The general problem of 10G links half way around the world.....let's solve
that when iSCSI
is successful in the LAN and customers have a real problem paying $100/$200
for memory
for their 10G iSCSI adaptor connecting their clear channel link between US
and Europe.
Dick Gahan
3Com
Matt Wakeley <matt_wakeley@agilent.com> on 01/12/2000 07:44:09
Please respond to Matt Wakeley <matt_wakeley@agilent.com>
Sent by: Matt Wakeley <matt_wakeley@agilent.com>
To: end2end-interest@ISI.EDU, ips@ece.cmu.edu
cc: (Dick Gahan/IE/3Com)
Subject: TCP (and SCTP) sucks on high speed networks
TCP's "congestion avoidance" algorithms are not compatible with high speed,
long distance networks. The "cut transmit rate in half on packet loss and
increase the rate additively" algorithm will simply not work.
Consider a 10Gbps link to a destination half way around the world. A packet
drop due to link errors (not congestion or infrastructure products) can be
expected about every 20 seconds. However, with an RTT of 100ms (not even
across the continent), if a TCP connection is operating at 10Gbps, the packet
drop (due to link error) will drop the rate to 5Gbps. It will take 4 *MINUTES*
for TCP to ramp back up to 10Gbps.
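
The ramp-up time can be estimated with a back-of-the-envelope sketch like the one below. It assumes textbook additive increase (CWND grows by one segment per RTT), no further losses, and a segment size that the message does not state, so the helper name and the MSS values are assumptions. The result is very sensitive to segment size and need not reproduce the 4-minute figure above.

```python
def recovery_time_s(link_bps, rtt_s, mss_bytes):
    """Seconds for additive increase to grow CWND from half the BDP
    back to the full BDP, gaining one segment per round trip."""
    bdp_segments = link_bps * rtt_s / (8 * mss_bytes)
    rtts_needed = bdp_segments / 2    # segments lost by halving the window
    return rtts_needed * rtt_s


# 10 Gbps and 100 ms RTT as in the example; MSS values are assumed.
for mss in (1460, 9000):
    seconds = recovery_time_s(10e9, 0.1, mss)
    print(f"MSS {mss} bytes: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

Whatever segment size is assumed, the recovery time in this model grows with the square of the RTT and linearly with link speed. That scaling is the core of the problem on fast, long paths.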
Therefore, there needs to be a change to TCP's congestion avoidance algorithm
for future high speed networks. Since SCTP is based on the same algorithms,
it is doomed to the same fate.
-Matt