Open TCP Problems
The dominant transport protocol in the contemporary Internet is TCP. Although it works effectively in practice (the Internet works, after all), it has a number of issues. Some of these become a particular concern in environments such as Wireless Mesh Networks (WMNs), where they degrade TCP's performance heavily. This undermines the advantages of WMNs, such as ease of deployment and low cost.
Since TCP has accumulated numerous improvements and changes, the term "TCP" is somewhat blurry. To avoid this ambiguity, we first define baseline TCP. Baseline TCP roughly matches what current operating systems (like Windows XP, Linux, MacOS) implement and enable by default. Baseline TCP is defined to implement:
- RFC793: Transmission Control Protocol
- RFC813: Window and Acknowledgment Strategy in TCP
- RFC1122: Requirements for Internet Hosts - Communication Layers
- RFC1191: Path MTU discovery
- RFC1323: TCP Extensions for High Performance
- RFC2018: TCP Selective Acknowledgment Options
- RFC2140: TCP Control Block Interdependence
- RFC2883: An Extension to the Selective Acknowledgment (SACK) Option for TCP
- RFC2581: TCP Congestion Control
- RFC2861: TCP Congestion Window Validation
The following paragraphs describe problems with baseline TCP.
Loss Ambiguity
Baseline TCP suffers from the old and well-known loss ambiguity: TCP takes segment loss as an indication of congestion along the path to the peer it is communicating with. However, packet loss is not necessarily due to congestion. Some networks drop a non-negligible number of packets because their "best effort" is simply not as good as, for instance, Ethernet's. Well-known examples of such networks are 802.11, GPRS, UMTS and satellite links. 802.11 combined with multi-hop forwarding is an even worse case.
Still, corruption-based loss does not necessarily mean that no congestion-based loss is involved. A common case where corruption-based loss signals actual congestion-based loss occurs just before the 802.11 MAC layer readjusts its data rate after too many failed retransmissions, e.g., due to changes in the environment that render the air link less usable. In general, there is no clear solution for how to react to corruption-based loss. One way to tackle this dilemma in the aforementioned example is to have the involved node decide, after the data rate reduction has been set, which corruption-based losses were actually due to (incipient) congestion, and to record this. These records may then be utilized by the techniques mentioned in EddOstAll2004a.
Even slight corruption-based loss may cause TCP to perform suboptimally. As soon as the packet loss probability is so high that a full congestion epoch (the time from one congestion loss event to the next in steady state) cannot elapse, TCP is unable to utilize the full bandwidth.
Start-up behavior
Each connection (unless it goes to a host whose state is cached as described in RFC2140) needs to find the appropriate sending rate. Currently, TCP allows an initial window of between one and four segments of maximum segment size (MSS) [RFC2581, RFC3390]. The TCP connection then probes the network for available bandwidth using the slow-start procedure, doubling cwnd during each congestion-free round-trip time (RTT). This can be time-consuming, especially over networks with large bandwidth or long delays. It may take a number of RTTs in slow-start before the TCP connection begins to fully use the available bandwidth of the network. For instance, it takes log_2(N) - 2 round-trip times to build cwnd up to N segments, assuming an initial congestion window of 4 segments. This time in slow-start is not a problem for large file transfers, where the slow-start stage is only a fraction of the total transfer time. However, in the case of moderate-sized transfers, the connection might carry out its entire transfer in the slow-start phase, taking many round-trip times, where one or two RTTs might have been sufficient when using the currently available bandwidth along the path.
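The log_2(N) - 2 figure can be checked with a minimal sketch. The function name `slow_start_rtts` and the idealized model (cwnd doubles exactly once per RTT, no loss, no delayed ACKs) are assumptions made for illustration, not part of any TCP implementation.

```python
import math


def slow_start_rtts(target_segments, initial_window=4):
    """Count the RTTs idealized slow-start needs to grow cwnd to the target.

    Assumes cwnd doubles once per congestion-free RTT, without loss or
    delayed ACKs (a simplification for illustration).
    """
    cwnd = initial_window
    rtts = 0
    while cwnd < target_segments:
        cwnd *= 2  # one congestion-free RTT doubles the window
        rtts += 1
    return rtts


# For a power-of-two target N and an initial window of 4 segments,
# the count matches log_2(N) - 2.
n = 1024
print(slow_start_rtts(n), math.log2(n) - 2)
```

For targets that are not a power of two, the loop rounds up to the next full RTT.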
Another problem of slow-start is that its exponential growth may cause a vast amount of packet loss in networks with a high bandwidth-delay product. When slow-start results in a large increase in the congestion window within one round-trip time, a large number of packets might be dropped in the network (so-called slow-start overshoot). Dropping this many packets can result in unnecessary retransmit timeouts for the TCP connection. The TCP connection could end up in the congestion avoidance phase with a very small congestion window, and could take a large number of round-trip times to recover its old congestion window.
RTT unfairness
In the congestion avoidance phase TCP increases its congestion window by one MSS per round-trip time. Thus, when flows with different round-trip times compete for a single bottleneck link, flows with a lower round-trip time will on average obtain more bandwidth from the bottleneck link than flows with a higher round-trip time, because the latter increase their congestion window less often.
Two possible approaches to tackle this issue are to use real time instead of round-trip time, or to have the amount of increment per round-trip time depend on the round-trip time itself.
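The effect can be illustrated with a small, deliberately simplified simulation. Everything in it is an assumption for illustration: the function name `simulate_aimd`, the 1 ms time step, the fluid throughput model (cwnd / RTT), and the synchronized halving of all windows whenever the aggregate rate exceeds the bottleneck capacity.

```python
def simulate_aimd(rtts_ms, capacity_seg_per_ms, duration_ms):
    """Toy model of congestion avoidance on one shared bottleneck.

    Each flow adds one segment to cwnd per own RTT. When the aggregate
    sending rate exceeds the capacity, every flow halves its window
    (synchronized loss, an assumption). Returns segments delivered per flow.
    """
    cwnd = [1.0 for _ in rtts_ms]
    delivered = [0.0 for _ in rtts_ms]
    for t in range(1, duration_ms + 1):
        for i, rtt in enumerate(rtts_ms):
            delivered[i] += cwnd[i] / rtt  # fluid approximation of throughput
            if t % rtt == 0:
                cwnd[i] += 1.0  # additive increase, once per RTT
        rate = sum(w / rtt for w, rtt in zip(cwnd, rtts_ms))
        if rate > capacity_seg_per_ms:
            cwnd = [max(1.0, w / 2.0) for w in cwnd]  # multiplicative decrease
    return delivered


# Two flows with 10 ms and 100 ms RTT sharing one bottleneck for 60 s.
short_rtt, long_rtt = simulate_aimd([10, 100], 10.0, 60000)
print(short_rtt > long_rtt)
```

In this model the short-RTT flow ends up with the larger share because its window grows ten times as often between loss events.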
Reordering
Segments received out of order [BenParShe1999a] cause the receiver to send immediate DUPACKs. Three DUPACKs trigger fast retransmit/fast recovery at the sender, which is spurious in this case and degrades performance and link utilization. Possible reasons for segment reordering include:
- Link layer recovery
- Handovers
- Priority blocking
- Route changes
- Multipath routing
- Buggy participants replicating segments
- DiffServ
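The interaction between reordering and duplicate ACKs can be sketched as follows. The helper names `receiver_acks` and `triggers_fast_retransmit` are hypothetical, segments are numbered by index rather than by byte sequence number, and SACK and delayed ACKs are ignored.

```python
def receiver_acks(arrivals, first_seq=1):
    """Return the cumulative ACK (next expected segment) sent per arrival."""
    expected = first_seq
    buffered = set()
    acks = []
    for seq in arrivals:
        if seq >= expected:
            buffered.add(seq)
        # Advance over every segment that is now in order.
        while expected in buffered:
            buffered.remove(expected)
            expected += 1
        acks.append(expected)
    return acks


def triggers_fast_retransmit(acks, first_seq=1, dupthresh=3):
    """True if the ACK stream contains dupthresh duplicate ACKs in a row."""
    last_ack = first_seq  # the sender has already seen an ACK for first_seq
    dups = 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups >= dupthresh:
                return True
        else:
            last_ack = ack
            dups = 0
    return False


# Segment 1 is overtaken by three later segments: nothing was lost,
# yet the sender sees three DUPACKs and retransmits spuriously.
print(triggers_fast_retransmit(receiver_acks([2, 3, 4, 1, 5])))
# Mild reordering (a displacement of one) stays below the threshold.
print(triggers_fast_retransmit(receiver_acks([2, 1, 3, 4, 5])))
```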
Delay spikes
As long as the round-trip time varies within the boundaries of the RTO, baseline TCP does not have any issues. If a delay spike does exceed the RTO, though, TCP will retransmit the oldest outstanding segment and enter slow-start with cwnd set to one MSS. This is harmful for two reasons. First, starting over with slow-start causes poor link utilization. Second, the spurious retransmissions cause the receiver to send DUPACKs, triggering fast retransmit/recovery at the sender, which in turn degrades performance even further. Reasons for delay spikes include:
- Link layer recovery
- Handovers
- Priority blocking
- Temporary link outages
- Route changes
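How a delay spike overruns the retransmission timer can be sketched with an RTO estimator. The sketch assumes the smoothed-RTT scheme standardized in RFC 6298 (gains 1/8 and 1/4, RTO = SRTT + 4 * RTTVAR, 1 s minimum), which is not among the RFCs listed for baseline TCP above. Clock granularity is ignored, and the names `RtoEstimator` and `spurious_timeouts` are hypothetical.

```python
class RtoEstimator:
    """RTO estimation in the style of RFC 6298 (clock granularity ignored)."""

    def __init__(self, min_rto=1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto

    def update(self, sample):
        """Feed one RTT sample (in seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
            self.srtt = 0.875 * self.srtt + 0.125 * sample
        return self.rto()

    def rto(self):
        return max(self.min_rto, self.srtt + 4 * self.rttvar)


def spurious_timeouts(samples, min_rto=1.0):
    """Indices of RTT samples that exceed the RTO in force when they occur."""
    est = RtoEstimator(min_rto)
    hits = []
    for i, sample in enumerate(samples):
        if est.srtt is not None and sample > est.rto():
            hits.append(i)  # the timer fires before the ACK arrives
        est.update(sample)
    return hits


# A stable 100 ms path with one 1.5 s delay spike (assumed values).
print(spurious_timeouts([0.1] * 20 + [1.5] + [0.1] * 5))
```

In this example only the spike sample is flagged. The segment was merely delayed, so the resulting retransmission and the fall back to slow-start are both unnecessary.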
Scalability
Baseline TCP does not scale well with the growth in bandwidth that modern link technologies provide. The problem stems from the fact that the congestion window oscillates between the bandwidth-delay product (plus some space in the router queues along the path) and half of it, growing at a rate of only one MSS per round-trip time. As the bandwidth-delay product increases, the oscillation period increases. For typical gigabit links and baseline TCP this period can already be on the order of one hour. The obvious solution, a larger increment rate, has the downside of aggravating RTT unfairness.
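The oscillation period can be estimated with a back-of-the-envelope sketch. The function name `sawtooth_period_s`, the 1500-byte MSS and the example RTTs are assumptions. Router queue space is ignored, so the window peaks exactly at the bandwidth-delay product.

```python
def sawtooth_period_s(bandwidth_bps, rtt_s, mss_bytes=1500):
    """Approximate time for cwnd to grow from BDP/2 back to BDP.

    The window grows by one segment per RTT, so recovering half the
    window takes (window / 2) round-trip times.
    """
    window_segments = bandwidth_bps * rtt_s / (8 * mss_bytes)
    rtts_to_recover = window_segments / 2
    return rtts_to_recover * rtt_s


# The result grows with the square of the RTT and linearly with bandwidth.
for bandwidth_bps, rtt_s in [(1e9, 0.1), (1e9, 0.3), (10e9, 0.1)]:
    print(bandwidth_bps, rtt_s, sawtooth_period_s(bandwidth_bps, rtt_s))
```

Under these assumptions, 1 Gbit/s gives roughly 417 seconds at 100 ms RTT and roughly 3750 seconds at 300 ms RTT, so whether the one-hour mark is reached depends strongly on the path's RTT.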
Weak Interactivity
To detect packet loss, baseline TCP relies on either the RTO expiring or three DUPACKs arriving. For non-bulk-data connections the number of outstanding segments in the network is frequently too small to generate three DUPACKs after a packet loss. In that case, packet loss can be detected by the RTO only. Widely varying round-trip times, coarse system timers or lost retransmissions (on very error-prone links) may cause high RTO values, rendering interactive shells less responsive.
No Holistic View
The TCP congestion avoidance mechanisms [RFC2581], while necessary and powerful, are not sufficient to provide good service in all circumstances. Basically, there is a limit to how much control can be accomplished from the edges of the network. Some mechanisms are needed in the routers to complement the endpoint congestion avoidance mechanisms. See [FloJac1991a] for details.
Good Faith
Baseline TCP expects everyone to be well behaved, i.e., to comply with the RFCs. For that reason it is very susceptible to misbehavior by others. There are numerous attacks that enable an attacker to gain more than its fair share of the bandwidth, create local or even global congestion, etc. [SavCarWet1999a, SheBhaBra2005a]
Designed To Congest
Baseline TCP is designed to fill router queues and even to make them overflow. Filling queues increases latency and overflowing queues wastes bandwidth.
Symmetry Assumption
Networks may exhibit arbitrarily large differences between the forward and reverse directions with regard to properties like bandwidth, latency, bit error rate, queue size, etc. The general problem here is that for unidirectional traffic the superior link might not get fully utilized because the weak link does not deliver ACKs fast enough. [BalKatPad199910a]


