
Congestion control in TCP
- The first thing to note about this topic is that it might be
a bit surprising that congestion control is part of TCP.
- If congestion does occur it will impact all IP traffic,
not just TCP traffic.
- Congestion can be caused by any source of IP traffic (UDP
for example), not just TCP.
- The logic behind implementing congestion control techniques within
TCP seems to have two components.
- A definitely not "first principles" reason --
TCP is the source of the vast majority of IP traffic.
- A somewhat deeper justification --
- Congestion control is about allocating the bandwidth
available within each of the paths through the
network. This requires some notion of end-to-end
flow of packets. If each packet sent is viewed
as an independent event, there is no notion of
"data rate" or bandwidth requirements.
- While IP and UDP have no explicit notion of a
connection, TCP provides one.
- First, we should observe that TCP already tends to have one good
property as far as congestion control is concerned. The limits
imposed by end-to-end flow control tend to smooth out the data
rate when large quantities of data are transmitted.
- Recall that for flow control the amount of data a sender
can transmit before receiving an ACK is limited by the
window size chosen by the receiver.
- Imagine a machine connected to a high speed network
(100 Mb/s Ethernet) sending packets through the
Internet to a remote host. Assume the actual bandwidth
the Internet can provide over this path is 50 KB/s.
- Initially, the sender is likely to fire off packets
at something close to the full 100 Mb/s.
- Once the sender fills the window, the rate at
which new packets are sent is limited by the
arrival rate of acks.
- If there isn't too much congestion, acks arrive at
roughly the rate at which the packets they acknowledge
are being delivered. At steady state, the transmission
rate will therefore tend toward the delivery rate,
i.e. 50 KB/s.
- This is known as self-clocking.
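- As a small illustration, here is a hedged Python simulation
of self-clocking (the window size, port rate, and bottleneck
rate are made-up values chosen only for the demonstration):
after an initial burst fills the window, the sender's rate
locks onto the bottleneck's forwarding rate, because each new
transmission must wait for a returning ack.

    from collections import deque

    WINDOW = 20          # flow-control window, in packets (assumed)
    PORT_RATE = 100      # packets/tick the sender's port can emit (assumed)
    BOTTLENECK_RATE = 5  # packets/tick the bottleneck forwards (assumed)

    queue = deque()      # packets queued at the bottleneck router
    in_flight = 0        # packets sent but not yet acked

    for tick in range(8):
        # The bottleneck forwards up to BOTTLENECK_RATE packets; each
        # delivery (ignoring propagation delay) immediately yields an ack.
        acked = 0
        while queue and acked < BOTTLENECK_RATE:
            queue.popleft()
            acked += 1
        in_flight -= acked
        # The sender emits as fast as its port allows, but never with
        # more than WINDOW packets outstanding, so once the window is
        # full, the arriving acks become the transmission clock.
        emitted = 0
        while in_flight < WINDOW and emitted < PORT_RATE:
            queue.append(tick)
            in_flight += 1
            emitted += 1
        print(f"tick {tick}: emitted {emitted:3d}, bottleneck queue {len(queue):2d}")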
- Recall from our discussion of sliding window protocols at the
data link layer that if errors are highly unlikely, the perfect
buffer size is the one that gives the sender just enough
leeway to keep sending until the ack for the first packet
returns. (A worked example follows this list.)
- If we have less buffer space than this, we will have
to periodically pause and wait for acks.
- In the standard data link protocol, if we have more buffer
space, the sender will accept more packets from the
higher level application, but it won't be able to send
any faster.
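- As a hedged worked example of this "perfect" window (the
bandwidth-delay product), here is a quick Python calculation
using the 50 KB/s path from the example above and an assumed
200 ms round-trip time:

    bandwidth = 50 * 1024   # bytes/second the path actually delivers
    rtt = 0.200             # seconds until the first packet's ack returns (assumed)
    perfect_window = bandwidth * rtt   # the bandwidth-delay product
    print(f"perfect window ~= {perfect_window:.0f} bytes")
    # A smaller window forces periodic stalls waiting for acks; a larger
    # one cannot raise the rate -- the excess just queues along the path.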
- Consider what happens at the transport layer if the flow-control
window size allows us to send more data than the system can
actually deliver in the time required for an ack to return.
- The key issue here is that the transmission rate on our
outgoing port may be higher than that of the underlying
network.
- When the sender first begins transmission, it will be able
to send at a rate higher than that at which packets can
actually be delivered. When these packets reach the
bottleneck in the path (the network/router that is actually
limiting the data rate), they will end up buffered and
queued at the router.
- Eventually, the sender will reach the window size limit and
be forced to slow down its transmission to match the
arrival rate of ACKs. When this occurs, it will add
packets to the network at the same rate at which the
bottleneck can process packets.
- As a result, it will add packets to the queues at the
bottleneck just fast enough to prevent the bottleneck
from reducing its queue lengths.
- If instead, the sender somehow knew enough to send
packets at just the rate the bottleneck could handle,
the overall rate of the connection would be the same,
but the amount of data the bottleneck would need to buffer
would be reduced to one packet.
- Alternately, if the receiver somehow knew enough to set
the window size just big enough to handle one round-trip
time worth of data, the bottleneck would be able to
empty the initial build-up before the first ack allowed
the sender to start the steady flow.
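- To make the last two observations concrete, here is a toy
Python calculation (all numbers are assumptions invented for
illustration): once the window reaches the bandwidth-delay
product of the path, a larger window adds nothing to
throughput and only creates a standing queue at the
bottleneck.

    BOTTLENECK_RATE = 5   # packets per tick (assumed)
    RTT = 4               # round-trip time in ticks, excluding queueing (assumed)
    BDP = BOTTLENECK_RATE * RTT   # packets a "just full" pipe holds

    for window in (BDP // 2, BDP, 2 * BDP):
        throughput = min(window / RTT, BOTTLENECK_RATE)  # packets/tick
        standing_queue = max(window - BDP, 0)   # excess parked at the bottleneck
        print(f"window={window:2d}: throughput={throughput:.2f}/tick, "
              f"standing queue={standing_queue}")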
- This illustrates where and how congestion arises:
- If nodes
send faster than some links in the network can handle their
data, queues build up.
- As more connections become active, the throughput for
a given connection at a router may decrease causing additional
queuing.
- At some point, routers run out of room for buffers and
begin discarding packets.
- It also suggests that congestion might be completely avoidable:
- If all connections limited their data rate so that no
router had to buffer more than one packet...
but this is misleading:
- When the load on a router lightens up temporarily, we
would like to have packets on hand to soak up the
available bandwidth. If the packets are all sitting
back at the source being sent out at what was just the
right rate, opportunity may be lost.
- So, the trick is to overburden the routers somewhat but not too much.
- The other trick is that end-hosts can infer that "too much"
has occurred by monitoring "lost" packets:
- It turns out that most of the Internet is reliable
enough that the main reason packets get lost is
that they are discarded by overloaded routers.
- Therefore, it is somewhat reasonable to conclude that
congestion has occurred if a packet's timeout expires.
- TCP uses this simple feedback to limit congestion by placing
an additional "congestion window" on its transmissions
and adjusting this window's size based on when timeouts occur.
- The limit actually placed on outstanding unacknowledged
packets will be the smaller of the flow-control window
specified by the receiver and the congestion-control
window determined by the sender.
- At steady state, TCP employs something like Ethernet's
exponential backoff algorithm:
- When a packet times out, TCP assumes congestion has
set in and halves the size of the congestion-control
window to reduce the load on the network.
- If this were the only adjustment made to the congestion
control window, it would eventually shrink to nothing,
so in addition:
- TCP places a lower bound of one packet
on the window.
- TCP incrementally adds one packet's worth to the
congestion window size every time a full window's
worth of packets is acked.
- These policies are known as additive increase/
multiplicative decrease (AIMD); a small sketch follows.
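- A minimal Python sketch of these two rules (whole-packet
units and the event hooks are simplifying assumptions; real
TCPs count bytes and add many refinements):

    def effective_window(cwnd, rwnd):
        # Outstanding data is bounded by BOTH the receiver's
        # flow-control window and the congestion window.
        return min(cwnd, rwnd)

    def on_timeout(cwnd):
        # Multiplicative decrease: assume congestion has set in and
        # halve the window, but never below one packet.
        return max(cwnd // 2, 1)

    def on_window_acked(cwnd):
        # Additive increase: one extra packet per fully acked window.
        return cwnd + 1

    cwnd, rwnd = 16, 64
    for event in ("ack", "ack", "timeout", "ack", "timeout"):
        cwnd = on_timeout(cwnd) if event == "timeout" else on_window_acked(cwnd)
        print(f"{event:>7}: cwnd={cwnd:2d}, usable window={effective_window(cwnd, rwnd)}")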
- If you think carefully about what actually happens when a packet
gets lost, you will see that we still have a problem. Basically,
the whole connection goes dead so we lose our self-clocking property.
- When one packet is lost, the packets that follow it can't
be acked because the acks are cumulative.
- By the time the retransmitted packet arrives, a full
window worth of unackable packets may have been received.
- The retransmitted packet is therefore likely to produce
an ACK covering a full window's worth of data. This
frees the sender to transmit a full window of new packets
as fast as its outgoing port will allow, which is likely
to exceed the capacity of some poor router downstream,
causing it to throw away more packets.
- TCP solves this with an approach called "slow start". This
approach is used (with slight differences) both when a connection
is first established and after a packet is lost.
- In the case of a timeout, the old window size (divided by
two) is saved. In either case the congestion window
is set to 1.
- During the "slow start" period, each time an ack is
received, the congestion window is increased by 1.
That is, each time a full window's worth of packets
is correctly acked, the window size is doubled.
- In the case that slow start was brought on by a timeout,
it stops if the window size reaches the saved window size
from before the timeout.
- Otherwise, it continues until a timeout occurs.
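- Here is a hedged Python sketch of slow start after a
timeout (again in whole-packet units as a simplification):

    def slow_start(ssthresh):
        # ssthresh is the saved pre-timeout window size, halved.
        cwnd, round_trips = 1, 0
        while cwnd < ssthresh:
            # One ack arrives per outstanding packet and each ack adds
            # one packet, so the window doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
            round_trips += 1
            print(f"after round trip {round_trips}: cwnd = {cwnd}")
        return cwnd

    # Suppose the window was 32 packets when the timeout hit; the
    # saved threshold is then 16, and the window climbs back quickly.
    slow_start(ssthresh=32 // 2)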
- The mechanisms I have described so far represent a (somewhat
incomplete) description of most current TCP implementations.
- If you think about them for a bit, you should at least conclude
that Berners-Lee wasn't thinking much about how TCP works
when the web protocol HTTP was developed. (Early HTTP opened
a new TCP connection for every request, so each short transfer
paid the slow-start penalty all over again.)
- There are some odd tricks routers can perform to help schemes
like this work better.
- One approach is the use of "Random Early Detection" (RED) gateways.
- The idea is to have routers randomly throw away
packets with a probability that increases as the
router's available buffer space decreases.
- The idea is to "trick" the TCP senders whose packets
get lost to slow down before the router is overwhelmed
and forced to drop many packets.
- This is an interesting example of "non-critical state".
That is, while the IP philosophy would rule out any scheme
in which routers kept per-connection state whose loss on
a router failure would destroy those connections, RED
routers just play a helpful but non-essential role in
congestion control.
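- A toy version of the RED drop decision might look like the
following Python sketch (the thresholds and the linear drop
curve are assumptions; real RED gateways track an averaged
queue length and space their drops more carefully):

    import random

    MIN_THRESH = 5    # below this queue length, never drop (assumed)
    MAX_THRESH = 15   # at or above this, always drop (assumed)
    MAX_PROB = 0.1    # drop probability near MAX_THRESH (assumed)

    def red_should_drop(queue_len):
        if queue_len < MIN_THRESH:
            return False
        if queue_len >= MAX_THRESH:
            return True
        # The drop probability rises as the buffer fills, nudging a
        # few randomly chosen senders to back off before the router
        # is swamped.
        p = MAX_PROB * (queue_len - MIN_THRESH) / (MAX_THRESH - MIN_THRESH)
        return random.random() < p

    for q in (3, 8, 12, 20):
        print(f"queue length {q:2d}: drop arriving packet? {red_should_drop(q)}")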
- Another issue with which routers can help is fairness.
- If you think about how TCP handles congestion, it
might strike you that someone sending lots of data
using UDP could convince all the TCP users
to back off simply by hogging the network, and end
up benefiting.
- One can combat this by using a technique called fair
queueing in the routers.
- The basic idea is to allocate each connection
passing through a router an equal percentage
of the router's capacity.
- Roughly speaking, this is done by taking turns
forwarding packets for connections in a round-robin
fashion (a toy sketch appears below).
- At the very least, this is another example of
non-critical state.
- In fact, given the nature of IP, it is not clear
how a router would even identify connections.
Any reasonable heuristic, however, should be
usable since the function is not essential.
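- The toy round-robin scheduler below is in the spirit of fair
queueing (the flow labels and per-flow queues are invented for
illustration; practical schemes such as weighted or deficit
round robin also account for packet sizes):

    from collections import deque

    # Per-connection queues of packets waiting at the router.
    flows = {
        "tcp-1": deque(["a1", "a2", "a3"]),
        "tcp-2": deque(["b1"]),
        "udp-x": deque(["c1", "c2", "c3", "c4", "c5"]),  # would-be hog
    }

    # Forward one packet per non-empty flow per round: however fast
    # the UDP source sends, it gets no more than its share.
    while any(flows.values()):
        for name, q in flows.items():
            if q:
                print(f"forward {q.popleft()} from {name}")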