Video is an interesting network application. Many users consume hours of video per day, and an hour of HD video is easily more than a gigabyte in size. Those two factors alone make delivering video a non-trivial exercise. But the real challenge is that most video delivered over the Internet is streaming video.
Transferring large files across the Internet has been routine for decades: FTP and HTTP simply take whatever bandwidth they can get, and the download takes as long as it takes. But that leaves the user waiting an indefinite amount of time for the video to start, and the downloaded video must be stored somewhere. Downloading obviously doesn’t work for live video, and it’s also at odds with certain rights restrictions, such as release windows that open and close at different times.
So for both users and content producers, streaming video is much better: prerecorded or live video starts playing almost immediately, takes up no storage on the user’s device, and can be made available or unavailable as needed.
However, streaming video is much harder on the network.
The typical way video is delivered these days is as a continuous sequence of small HTTP downloads. Under perfect conditions, the video is transferred over the network at exactly the rate at which it’s played back. If the network can’t keep up, the player usually switches to a lower-bandwidth, lower-quality version of the video stream. But lack of raw bandwidth is rarely the problem: reasonable video quality can be achieved at one or two megabits per second. The real challenges with streaming video are packet loss and excessive buffering.
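The quality switching mentioned above can be sketched as a simple rate-selection rule: pick the highest rendition whose bitrate fits under the measured throughput, with a safety margin. A minimal sketch, where the bitrate ladder and the 0.8 margin are illustrative values rather than any particular player’s:

```python
# Hypothetical bitrate ladder, in bits per second.
RENDITIONS_BPS = [500_000, 1_000_000, 2_000_000, 4_000_000]

def pick_rendition(measured_throughput_bps, safety_margin=0.8):
    """Pick the highest rendition that fits under a fraction of the
    measured throughput; fall back to the lowest one otherwise."""
    budget = measured_throughput_bps * safety_margin
    candidates = [r for r in RENDITIONS_BPS if r <= budget]
    return max(candidates) if candidates else min(RENDITIONS_BPS)

# A 3 Mbit/s connection comfortably carries the 2 Mbit/s rendition,
# while a congested 900 kbit/s connection forces a downswitch.
print(pick_rendition(3_000_000))  # → 2000000
print(pick_rendition(900_000))    # → 500000
```

Real players smooth their throughput estimates over several segments to avoid oscillating between renditions, but the core decision looks like this.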
When a packet is lost, the video player stops receiving data until the lost packet has been retransmitted, even though later packets continue to arrive and are buffered by TCP. For instance, when packet 100 is lost, packets 101 – 150 may arrive before the retransmitted packet 100 does. When it finally arrives, the data from packet 100 as well as that from the buffered packets 101 – 150 is delivered to the player all at once. Without further measures, this would mean that the video stops playing after packet 99 and only continues when the retransmitted packet 100 arrives.
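This head-of-line blocking behavior can be illustrated with a toy model of TCP’s in-order delivery (not real TCP, just the reassembly logic):

```python
class InOrderReceiver:
    """Toy model of TCP's in-order delivery: out-of-order segments
    are buffered and only released to the application once the gap
    has been filled."""
    def __init__(self, next_expected=0):
        self.next_expected = next_expected
        self.buffered = {}

    def receive(self, seq, data):
        """Accept one segment; return whatever can now be delivered
        to the application (possibly nothing, possibly a big batch)."""
        self.buffered[seq] = data
        delivered = []
        while self.next_expected in self.buffered:
            delivered.append(self.buffered.pop(self.next_expected))
            self.next_expected += 1
        return delivered

rx = InOrderReceiver(next_expected=100)
# Packet 100 is lost; 101-103 arrive but nothing reaches the player...
assert rx.receive(101, "d101") == []
assert rx.receive(102, "d102") == []
assert rx.receive(103, "d103") == []
# ...until the retransmitted packet 100 arrives, releasing everything at once.
print(rx.receive(100, "d100"))  # → ['d100', 'd101', 'd102', 'd103']
```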
To avoid such hiccups, video players create a buffer for themselves, allowing them to continue playing video from the buffer while waiting for retransmissions and other lapses in network data delivery. It’s still possible for network performance to be so poor that new data doesn’t arrive before the buffer is empty, and then the video has to pause while the player waits for more data. To avoid having to pause again very quickly, the application first fills up its buffer again before it resumes playing. Some applications indicate that they’re “rebuffering” at this point.
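The buffer-then-stall-then-rebuffer behavior can be captured in a deliberately simplified model, where the 3-second startup threshold is an assumed value, not any particular player’s:

```python
def simulate_playback(arrivals, playback_rate=1.0, startup=3.0):
    """Toy playback-buffer model: each entry of `arrivals` is the
    seconds' worth of video received during one wall-clock second.
    Playback drains the buffer at `playback_rate`; when the buffer
    runs dry the player stalls and rebuffers until `startup` seconds
    are queued again. Returns the number of rebuffering events."""
    buffered = 0.0
    playing = False
    rebuffer_events = 0
    for received in arrivals:
        buffered += received
        if not playing:
            if buffered >= startup:
                playing = True          # enough buffered: (re)start
        elif buffered >= playback_rate:
            buffered -= playback_rate   # normal playback
        else:
            playing = False             # buffer ran dry: stall
            rebuffer_events += 1
    return rebuffer_events

# Steady delivery plays cleanly, but a four-second network outage
# drains the three-second buffer and forces one rebuffering event.
print(simulate_playback([1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]))  # → 1
```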
So what can we do at the network level to avoid rebuffering events?
Obviously, having more bandwidth available helps, if only because catching up after a hiccup is much faster. But increasing bandwidth doesn’t necessarily fix poor video streaming performance. Losing a single packet usually doesn’t lead to a user-visible rebuffering event: TCP’s fast retransmit algorithm, coupled with the video player’s buffer, is sufficient to recover from the temporary stall in video data delivery.
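Fast retransmit works because every segment that arrives after a gap makes the receiver repeat its last cumulative ACK; after three such duplicates the sender retransmits without waiting for a timeout (RFC 5681). A minimal sketch of just the trigger condition:

```python
DUP_ACK_THRESHOLD = 3  # three duplicate ACKs trigger fast retransmit

def fast_retransmit_trigger(acks):
    """Given the cumulative ACK numbers seen by a sender, return the
    ACK value on which fast retransmit fires (three duplicates after
    the original), or None if it never does."""
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                return ack
        else:
            last_ack = ack
            dup_count = 0
    return None

# Segment 100 is lost: each later segment elicits another ACK for 100,
# and the fourth ACK of 100 (third duplicate) triggers the retransmit.
print(fast_retransmit_trigger([99, 100, 100, 100, 100]))  # → 100
```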
However, once a second packet is lost before the first lost packet has been retransmitted, bad things start to happen. First of all, TCP reduces its transmission rate with each lost packet, so after a few losses in quick succession it is sending well below the data rate of the video, and a rebuffering event becomes unavoidable. Second, fast retransmit has its limitations, and recovering from lost packets the non-fast way is, well, not very fast.
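A back-of-the-envelope calculation shows why a few quick losses are so damaging. TCP throughput is roughly the congestion window divided by the round-trip time, and each loss event halves the window (multiplicative decrease); the numbers below (64 KB window, 100 ms RTT) are illustrative assumptions:

```python
def throughput_after_losses(cwnd_bytes, rtt_s, loss_events):
    """Rough TCP throughput estimate (cwnd / RTT, in bits per second)
    after a number of loss events, each halving the congestion window.
    Real TCP recovery is more nuanced; this only shows the trend."""
    for _ in range(loss_events):
        cwnd_bytes /= 2
    return cwnd_bytes * 8 / rtt_s

# A 64 KB window over a 100 ms path yields ~5.2 Mbit/s, plenty for a
# 2 Mbit/s video stream...
print(round(throughput_after_losses(65536, 0.1, 0) / 1e6, 1))  # → 5.2
# ...but three loss events in quick succession cut that to ~0.7 Mbit/s,
# below the video's rate, so the player's buffer starts draining.
print(round(throughput_after_losses(65536, 0.1, 3) / 1e6, 1))  # → 0.7
```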
A compounding issue can be a NAT, load balancer, or firewall that blocks the TCP SACK (selective acknowledgment) or window scale options. SACK allows fast retransmit to work more effectively, and window scale is necessary to reach high transfer rates as network latency grows. Some middleboxes let the window scale option through but don’t track it themselves, and then drop retransmitted packets that look invalid because window scaling wasn’t taken into account. If you’re experiencing frequent video streaming issues, check your NATs, load balancers, and firewalls for these problems.
The Noction IRP can help mitigate video streaming issues by selecting network paths with low packet loss and low latency. Latency doesn’t directly cause video streaming issues, but as it increases, retransmissions take longer, and there are more opportunities to lose additional packets before the original lost packet has been successfully retransmitted.
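The latency effect is easy to quantify: recovering a lost packet takes at least one round-trip time, and during that window the sender keeps transmitting, so roughly rate × RTT packets are “exposed” to a second loss. Assuming 1500-byte packets:

```python
def packets_at_risk(rate_bps, rtt_s, packet_bytes=1500):
    """How many packets are sent during the ~1 RTT it takes, at
    minimum, to recover a lost packet; each one is a chance for a
    second loss before the first has been repaired."""
    return int(rate_bps * rtt_s / (packet_bytes * 8))

# For a 2 Mbit/s stream: on a 20 ms path only a few packets are
# exposed, on a 200 ms intercontinental path ten times as many.
print(packets_at_risk(2_000_000, 0.020))  # → 3
print(packets_at_risk(2_000_000, 0.200))  # → 33
```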
However, if the packet loss happens in the “last mile” towards the user, none of the paths the IRP can choose between will avoid the issue. A common source of loss and delay, especially in the last mile, is buffer bloat. TCP’s strategy for dealing with latency is to push as many packets into the network as the network will accept, making sure there are always packets “in flight”. The trouble with that approach is that routers, switches, and other devices buffer packets when they encounter a bottleneck, and all those buffered packets increase latency. Eventually the buffers overflow, multiple packets are lost, and TCP slows its transmission rate significantly.
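The latency cost of those standing queues is just buffer size divided by the drain rate of the bottleneck link. The 256 KB buffer and 2 Mbit/s link below are illustrative figures:

```python
def buffer_delay_ms(buffer_bytes, bottleneck_bps):
    """Worst-case queueing delay added by a full buffer in front of a
    bottleneck link: buffer size divided by the link's drain rate."""
    return buffer_bytes * 8 / bottleneck_bps * 1000

# A modest 256 KB buffer in front of a 2 Mbit/s last-mile link adds
# about a full second of latency once TCP has filled it up.
print(round(buffer_delay_ms(256 * 1024, 2_000_000)))  # → 1049
```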
A good overview of the impact of buffer bloat on video streaming can be seen in this video from Apple’s Worldwide Developers Conference, starting at the 15:00 mark. Buffer bloat can be addressed very effectively with active queue management (AQM) mechanisms, especially CoDel. Unfortunately, AQM needs to be enabled per hop. So enabling it on your own routers is helpful, but to really solve the problem it must also be enabled at each (bottleneck) hop along the way to the user.
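CoDel’s key insight is to watch how long packets sit in the queue rather than how full the queue is: brief bursts are fine, but delay that stays above a small target (5 ms by default) for a whole interval (100 ms by default) indicates a standing queue, and CoDel starts dropping to signal TCP to back off. A greatly simplified sketch of just that entry condition (real CoDel also paces its drops and adapts the interval, per RFC 8289):

```python
TARGET_MS = 5.0      # CoDel's default target sojourn time
INTERVAL_MS = 100.0  # CoDel's default interval

def codel_should_drop(samples_ms):
    """Greatly simplified CoDel decision: given per-packet queue
    sojourn times (ms) sampled 1 ms apart, report whether the delay
    stayed above TARGET_MS continuously for a full INTERVAL_MS,
    which is when real CoDel enters its dropping state."""
    above_for = 0.0
    for s in samples_ms:
        if s > TARGET_MS:
            above_for += 1.0  # one sample == one millisecond here
            if above_for >= INTERVAL_MS:
                return True
        else:
            above_for = 0.0   # queue drained: not a standing queue
    return False

# Brief spikes are tolerated; a sustained standing queue is not.
print(codel_should_drop([20] * 50 + [1] + [20] * 50))  # → False
print(codel_should_drop([20] * 120))                   # → True
```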
Additionally, it’s useful to enable explicit congestion notification (ECN), which lets routers signal congestion by setting a bit in the IP header rather than dropping a packet, allowing TCP to adjust its transmission speed more gracefully when needed. ECN has been supported in routers for a long time, but it needs to be enabled by the systems at both ends of the connection. More operating systems are doing this now, so enabling ECN on streaming servers may be helpful.
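Concretely, ECN lives in the two low-order bits of the IP TOS/traffic-class byte (RFC 3168): the endpoints mark packets as ECN-capable, and a congested router flips the field to “congestion experienced” instead of dropping the packet. A small sketch of decoding that field:

```python
# ECN codepoints in the two low-order bits of the IP TOS byte (RFC 3168).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # sender doesn't support ECN
    0b01: "ECT(1)",   # ECN-capable transport
    0b10: "ECT(0)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced: set by a router
}                     #   instead of dropping the packet

def ecn_field(tos_byte):
    """Extract the ECN codepoint from an IP TOS/traffic-class byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]

# An ECN-capable packet, and the same packet after a congested
# router has marked it instead of dropping it.
print(ecn_field(0b00000010))  # → ECT(0)
print(ecn_field(0b00000011))  # → CE
```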