Internet traffic varies widely at all times. However, the fourth quarter of the year is typically the busiest for many businesses. That is especially true for content providers and e-commerce retailers, as online activity usually increases during the holiday season. Failing to plan for increased load on servers and network infrastructure, or neglecting latency and packet loss during this period, can lead to a poor end-user experience, damage to business reputation, and steep bandwidth overage charges.
Let’s look at the average packet loss and latency computed for US Internet transit providers, including major Tier 1 carriers (Century, Cogent, GTT, Level 3, NTT, Telia, XO, Zayo, Hurricane Electric), from December 2016 to November 2017. The results are based on millions of probes (ICMP, TCP SYN, UDP) performed by Noction Intelligent Routing Platform (IRP) Lite instances throughout the period.
According to the results (Graph 1), December 2016 was the month with the highest average packet loss: 3.25%. Packet loss is a network performance issue that can be caused by various factors. For instance, a network device can drop traffic due to software bugs or insufficient device resources such as CPU, RAM, and NIC throughput. Faulty hardware or cabling may also contribute to packet loss. However, the most common reason for packet loss is link congestion, which occurs when the amount of traffic exceeds the link’s capacity.
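As a rough illustration of how an average packet loss figure can be derived from probe counts, here is a minimal sketch. The `ProbeStats` type, the provider names, and the probe counts are hypothetical and are not taken from the IRP data; the average is an unweighted mean across providers, which is only one of several reasonable choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeStats:
    """Probe counters for one provider (hypothetical structure)."""
    provider: str
    sent: int
    received: int


def loss_percent(stats: ProbeStats) -> float:
    """Share of probes that got no reply, as a percentage."""
    if stats.sent == 0:
        raise ValueError("no probes sent")
    return 100.0 * (stats.sent - stats.received) / stats.sent


def average_loss(all_stats: List[ProbeStats]) -> float:
    """Unweighted mean of the per-provider loss percentages."""
    losses = [loss_percent(s) for s in all_stats]
    return sum(losses) / len(losses)


# Made-up counters, for illustration only.
samples = [
    ProbeStats("provider_a", sent=1000, received=990),
    ProbeStats("provider_b", sent=2000, received=1960),
]
print(f"{average_loss(samples):.2f}%")
```

Weighting each provider by the number of probes sent would give a different average, so the method should be stated alongside any published figure.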
There may be different reasons for the high level of packet loss in December, but the increase is most likely connected with the holiday season. Users turn more to streaming media and online gaming at this time of year. The resulting traffic spikes can produce “buffering” messages that make customers quickly lose patience and, potentially, their ongoing loyalty.
Graph 1: Average Packet Loss for the US Transit Providers
Latency denotes the time packets need to traverse a network from source to destination. As with packet loss, lower latency reflects better network performance. The average latency computed for each quarter is highest in both Q3 and October 2017, at 124 ms, followed by Q2 at 123 ms. At first glance, the differences in latency values are not significant, and the values appear to stay clear of the maximum one-way latency requirement of 150 ms for gaming or VoIP calls. However, they still affect the overall end-user experience.
Graph 2: Average Latency for the US Transit Providers
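To put the latency figures in perspective, the short sketch below expresses an average latency as a share of the 150 ms one-way budget mentioned above. The function names and the sample values passed to `average_latency_ms` are assumptions made for illustration; only the 124 ms and 150 ms figures come from the text.

```python
from statistics import mean

# One-way latency budget for gaming or VoIP, as cited in the text.
ONE_WAY_BUDGET_MS = 150.0


def average_latency_ms(samples_ms):
    """Arithmetic mean of latency samples, in milliseconds."""
    return mean(samples_ms)


def budget_used(avg_ms, budget_ms=ONE_WAY_BUDGET_MS):
    """Fraction of the latency budget consumed by the average latency."""
    return avg_ms / budget_ms


# Hypothetical samples that happen to average 124 ms.
avg = average_latency_ms([120.0, 124.0, 128.0])
print(f"average {avg:.0f} ms uses {budget_used(avg):.0%} of the budget")
```

An average hides the tail: individual paths can exceed the budget even when the mean stays under it.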
Routing in the global Internet is based on routing tables populated by the BGP4 routing protocol, which does not take into account attributes such as packet loss or latency. Internet Service Providers (ISPs) rely on manual BGP reconfiguration to avoid paths through Autonomous Systems with congested links. This approach is not scalable and cannot be done in real time. Moreover, a manual method is susceptible to human error.
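The gap can be illustrated with a deliberately simplified sketch. Real BGP best-path selection evaluates several attributes (local preference, AS-path length, origin, MED, and others); here it is reduced to AS-path length alone, and the `Route` type, provider names, and measurements are hypothetical. The AS numbers come from the range reserved for documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    """One candidate path to a prefix (hypothetical structure)."""
    provider: str
    as_path: List[int]
    loss_pct: float     # measured out of band; BGP never sees this
    latency_ms: float   # measured out of band; BGP never sees this


def shortest_as_path(routes: List[Route]) -> Route:
    """Simplified stand-in for BGP: prefer the shortest AS path."""
    return min(routes, key=lambda r: len(r.as_path))


def best_measured(routes: List[Route]) -> Route:
    """Performance-aware choice: lowest loss, then lowest latency."""
    return min(routes, key=lambda r: (r.loss_pct, r.latency_ms))


routes = [
    Route("provider_a", [64496, 64499], loss_pct=4.0, latency_ms=140.0),
    Route("provider_b", [64500, 64501, 64499], loss_pct=0.2, latency_ms=95.0),
]
print(shortest_as_path(routes).provider)  # provider_a
print(best_measured(routes).provider)     # provider_b
```

The shorter path wins under the simplified BGP rule even though it is the lossier and slower of the two.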
Fortunately, Noction IRP is designed to help Service Providers (SPs) with assigned Autonomous Systems optimize a multi-homed BGP network, improving overall network performance.
Noction IRP mitigates congestion and outages by rerouting all prefixes for affected ASes to the best-performing alternate provider. Remote prefixes are probed using all available providers, and hops are mapped to AS numbers. If a congested link or outage is detected between certain hops, the affected ASes are declared a problematic AS-pattern. Once the issue is confirmed by reprobing remote prefixes, IRP reroutes the prefixes, omitting the affected AS-pattern. IRP also gathers network traffic via mirrored ports on ISP edge routers or using NetFlow/sFlow. The platform passively analyzes captured network traffic and calculates improvements for prefixes based on the obtained data. Improved prefixes with updated next-hops are announced to the edge router via BGP updates and subsequently inserted into the router’s routing table.
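The AS-pattern idea described above can be sketched as follows. This is not IRP's actual algorithm; it is a minimal illustration that assumes each probed hop is already mapped to an AS number and carries a loss measurement, and that a jump in loss between two consecutive hops marks the problem. The function names, the 5% threshold, and the sample data are hypothetical, and the AS numbers are from the documentation range.

```python
from typing import Dict, List, Optional, Tuple

Hop = Tuple[int, float]  # (AS number, loss percent observed at this hop)


def find_problem_pattern(hops: List[Hop],
                         threshold: float = 5.0) -> Optional[Tuple[int, ...]]:
    """Return the AS pattern around the first hop pair where loss jumps."""
    for (prev_as, prev_loss), (next_as, next_loss) in zip(hops, hops[1:]):
        if next_loss - prev_loss >= threshold:
            # A jump inside one AS yields a single-AS pattern.
            return (prev_as,) if prev_as == next_as else (prev_as, next_as)
    return None


def contains_pattern(as_path: List[int], pattern: Tuple[int, ...]) -> bool:
    """True if the pattern appears as a contiguous run in the AS path."""
    n = len(pattern)
    return any(tuple(as_path[i:i + n]) == pattern
               for i in range(len(as_path) - n + 1))


def usable_providers(paths: Dict[str, List[int]],
                     pattern: Tuple[int, ...]) -> List[str]:
    """Providers whose AS path to the prefix avoids the pattern."""
    return [name for name, path in paths.items()
            if not contains_pattern(path, pattern)]


hops = [(64496, 0.0), (64497, 0.5), (64498, 12.0), (64499, 12.5)]
paths = {
    "provider_a": [64496, 64497, 64498, 64499],
    "provider_b": [64500, 64501, 64499],
}
pattern = find_problem_pattern(hops)
print(pattern)                           # (64497, 64498)
print(usable_providers(paths, pattern))  # ['provider_b']
```

A production system would reprobe before acting, as the text notes, so that a single noisy measurement does not trigger a reroute.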
IRP reduces costs by keeping bandwidth usage below predefined commit levels for each provider connection. Based on these settings, traffic can be rerouted through alternative providers, avoiding higher bandwidth prices at peak times.
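A commit-level policy of this kind might be sketched as below. The sketch assumes per-provider usage and commit values in Mbps and uses a simple greedy plan that moves excess traffic to the provider with the most headroom first; the `plan_shifts` function and all figures are hypothetical and do not describe how IRP itself decides.

```python
from typing import Dict, List, Tuple


def plan_shifts(usage: Dict[str, int],
                commit: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Plan (source, destination, Mbps) moves that bring providers
    back under their commit level where headroom allows."""
    over = {p: usage[p] - commit[p] for p in usage if usage[p] > commit[p]}
    room = {p: commit[p] - usage[p] for p in usage if usage[p] < commit[p]}
    shifts = []
    for src, excess in over.items():
        # Fill the provider with the most spare capacity first.
        for dst in sorted(room, key=room.get, reverse=True):
            if excess <= 0:
                break
            moved = min(excess, room[dst])
            if moved > 0:
                shifts.append((src, dst, moved))
                room[dst] -= moved
                excess -= moved
    return shifts


usage = {"provider_a": 1200, "provider_b": 600, "provider_c": 900}
commit = {"provider_a": 1000, "provider_b": 1000, "provider_c": 1000}
print(plan_shifts(usage, commit))  # [('provider_a', 'provider_b', 200)]
```

If total demand exceeds the combined commit levels, some excess stays unplaced; the sketch simply leaves it on the original provider.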
IRP deployment is a controlled process, carefully planned in advance, that typically goes through two stages. First, IRP is deployed in non-intrusive mode. In this mode, IRP suggests improvements for prefixes, but no improvements are advertised to the edge routers. Non-intrusive mode provides real-time and historical measurement statistics in the form of reports and graphs, along with the suggested improvements. After several propagation tests, IRP is switched to intrusive mode. This mode unlocks the full functionality of IRP, advertising computed improvements to the edge routers.
Network performance issues are typically an ordeal for IT departments. They can disrupt a company’s entire operations, flooding support lines with hundreds of tickets from dissatisfied clients and keeping the phones ringing off the hook.
BGP multihoming helps achieve redundancy and avoid downtime. However, BGP itself has no ability to discover packet loss, latency, throughput, link capacity, congestion, or historical reliability. Therefore, routers relying on BGP cannot account for these characteristics in their routing decisions. For this reason, deploying Noction IRP is crucial, as it adds the necessary intelligence to routing based on dynamic network parameters.