Nowadays, the Internet is made up of more than 45,000 active Autonomous Systems (ASes), each with its own level of complexity and specific configurations. For traffic to flow efficiently, a packet must traverse reliable paths through several network nodes before it reaches its final destination. Because the number of such nodes grows with the size of the Internet, network-level issues become more likely, causing traffic to flow through unreliable paths. As a result, end users may experience severe degradation of network performance, which can in turn lead to consequences such as SLA penalties or customer churn.
Network performance anomalies of any kind can disrupt or degrade end-to-end data transmission. Although the Internet is designed to be self-repairing, end users may still run into unreachable websites or experience noticeable drops in service speed.
Congestion, one of the most common network performance anomalies, occurs when the amount of traffic exceeds a link's capacity. Congestion often originates from worm propagation, routing instabilities, and DDoS attacks. Congested links typically introduce queueing delay and packet loss, which can cause communication protocols to fail. For instance, TCP throughput decreases as the loss rate increases, because lost packets must be retransmitted. On a congested link, these retransmissions add further load and can push the loss rate as high as 30%, at which point TCP becomes practically unusable because it spends most of its time in retransmission timeouts.
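To make the relationship between loss rate and TCP throughput concrete, the sketch below uses the well-known Mathis et al. approximation, throughput ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p), where p is the packet loss rate. The function name mathis_throughput_bps and the path parameters (1460-byte MSS, 100 ms RTT) are illustrative assumptions, not values taken from this work. Note that this model ignores retransmission timeouts, so at high loss rates such as the 30% mentioned above it overestimates the throughput TCP actually achieves.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput in bits/s using the
    Mathis et al. model: BW ~ (MSS / RTT) * sqrt(3/2) / sqrt(p).
    The model ignores retransmission timeouts, so it overestimates
    throughput when the loss rate is high."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be in (0, 1)")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    # Hypothetical path parameters: 1460-byte MSS, 100 ms round-trip time.
    for p in (0.0001, 0.001, 0.01, 0.1, 0.3):
        mbps = mathis_throughput_bps(1460, 0.1, p) / 1e6
        print(f"loss={p:>6.2%}  throughput~{mbps:8.2f} Mbit/s")
```

Because throughput scales with 1/sqrt(p), quadrupling the loss rate halves the achievable throughput, even before timeouts are taken into account.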
Routing instability, another well-known network performance anomaly, is commonly defined as the rapid change of network reachability and topology information. It has various origins, such as human error (router misconfigurations), router software bugs, and transient physical or data-link problems.
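One common way to quantify "rapid change" of reachability is to assign each prefix a penalty that grows with every flap (announcement or withdrawal) and decays exponentially over time, as BGP route flap damping (RFC 2439) does. The sketch below is loosely modeled on that mechanism. The class name FlapDampener and the parameter values (penalty of 1000 per flap, suppress threshold 2000, reuse threshold 750, 15-minute half-life) are illustrative assumptions, not the configuration of any particular router.

```python
import math


class FlapDampener:
    """Per-prefix, exponentially decaying flap penalty, loosely modeled on
    BGP route flap damping (RFC 2439). All parameter values are illustrative."""

    def __init__(self, penalty_per_flap=1000.0, suppress=2000.0,
                 reuse=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress = suppress
        self.reuse = reuse
        self.decay = math.log(2) / half_life_s  # decay rate from half-life
        self.state = {}  # prefix -> (penalty, last_update_time, suppressed)

    def _decayed(self, prefix, now):
        """Return the penalty decayed to time `now` and the suppression flag."""
        penalty, last, suppressed = self.state.get(prefix, (0.0, now, False))
        return penalty * math.exp(-self.decay * (now - last)), suppressed

    def record_flap(self, prefix, now):
        """Add a flap penalty; suppress the prefix once the threshold is hit."""
        penalty, suppressed = self._decayed(prefix, now)
        penalty += self.penalty_per_flap
        if penalty >= self.suppress:
            suppressed = True
        self.state[prefix] = (penalty, now, suppressed)
        return suppressed

    def is_suppressed(self, prefix, now):
        """Release a suppressed prefix once its penalty decays below `reuse`."""
        penalty, suppressed = self._decayed(prefix, now)
        if suppressed and penalty < self.reuse:
            suppressed = False
        self.state[prefix] = (penalty, now, suppressed)
        return suppressed


if __name__ == "__main__":
    d = FlapDampener()
    for t in (0, 10, 20):  # three flaps of a hypothetical prefix, 10 s apart
        print(t, d.record_flap("192.0.2.0/24", t))
    print(d.is_suppressed("192.0.2.0/24", 1900))
```

A prefix that flaps once is tolerated, while several flaps in quick succession push its penalty over the threshold and mark it as unstable until the penalty decays.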
Routing instabilities frequently arise from human error when routing protocols are adapted to changes in the network's policies or topology. Within an AS, link outages may occur due to hardware failures, maintenance, or power cuts. Routing protocols respond to a link outage by diverting traffic through other available paths. This kind of rerouting, as well as traffic engineering within the network, results in route changes. At the AS level, outages are often caused by eBGP session resets or peering-link failures, which lead to inter-domain route changes. Such changes can also arise from policy modifications, which BGP eventually incorporates into its best-path selection process.
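The intra-AS rerouting described above can be illustrated with the shortest-path computation that link-state IGPs such as OSPF perform. The sketch below runs Dijkstra's algorithm on a small hypothetical four-router topology (R1 to R4, with assumed link costs), removes one link to simulate an outage, and recomputes the path. The helper names shortest_path and fail_link are hypothetical and chosen for illustration only.

```python
import heapq

# Hypothetical topology: {router: {neighbor: link_cost}}, links are bidirectional.
TOPOLOGY = {
    "R1": {"R2": 1, "R3": 2},
    "R2": {"R1": 1, "R4": 1},
    "R3": {"R1": 2, "R4": 2},
    "R4": {"R2": 1, "R3": 2},
}


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; returns (cost, path) or (inf, []) if unreachable."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in visited:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:  # walk predecessors back to the source
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def fail_link(graph, a, b):
    """Return a copy of the graph with the bidirectional link a-b removed."""
    g = {n: dict(nbrs) for n, nbrs in graph.items()}
    g[a].pop(b, None)
    g[b].pop(a, None)
    return g


if __name__ == "__main__":
    print("before outage:", shortest_path(TOPOLOGY, "R1", "R4"))
    print("after R2-R4 outage:", shortest_path(fail_link(TOPOLOGY, "R2", "R4"), "R1", "R4"))
```

Traffic from R1 to R4 initially follows R1, R2, R4 at cost 2; once the R2-R4 link fails it is diverted through R1, R3, R4 at cost 4, which is exactly the kind of route change the paragraph describes.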
Routing instabilities can also give rise to routing loops. During a routing instability, loops form because different routers hold inconsistent routing states while routers within the same AS exchange their latest reachability data, or communicate it to other ASes via routing updates. The time this exchange takes before all routers converge on a consistent view of the network topology ranges from hundreds of milliseconds for IGPs to tens of minutes for BGP.
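How inconsistent routing states produce a transient loop can be shown by following each router's next-hop entry for a destination. In the hypothetical scenario below, the R2-R4 link has failed: R2 has already switched its next hop to R1, but R1 still holds the stale entry pointing to R2, so packets bounce between them until R1 converges. The forwarding-table layout and the function name trace_forwarding are assumptions made for this sketch.

```python
def trace_forwarding(fibs, src, dst):
    """Follow per-router next-hop entries toward `dst`, starting at `src`.

    fibs: {router: {destination: next_hop}} (hypothetical layout).
    Returns (status, path) where status is 'delivered', 'loop', or 'blackhole'.
    """
    path = [src]
    seen = {src}
    node = src
    while node != dst:
        nxt = fibs.get(node, {}).get(dst)
        if nxt is None:  # no route: packet is dropped
            return "blackhole", path
        path.append(nxt)
        if nxt in seen:  # revisiting a router means a forwarding loop
            return "loop", path
        seen.add(nxt)
        node = nxt
    return "delivered", path


# Consistent state before the R2-R4 link fails.
CONSISTENT = {"R1": {"R4": "R2"}, "R2": {"R4": "R4"}, "R3": {"R4": "R4"}}
# Transient state: R2 has reconverged via R1, but R1 still points to R2.
TRANSIENT = {"R1": {"R4": "R2"}, "R2": {"R4": "R1"}, "R3": {"R4": "R4"}}
# Converged state: R1 has learned the alternative path via R3.
CONVERGED = {"R1": {"R4": "R3"}, "R2": {"R4": "R1"}, "R3": {"R4": "R4"}}

if __name__ == "__main__":
    for name, fibs in (("consistent", CONSISTENT), ("transient", TRANSIENT),
                       ("converged", CONVERGED)):
        print(name, trace_forwarding(fibs, "R1", "R4"))
```

The loop exists only in the window between R2's update and R1's update, which is why its duration tracks the convergence times of the routing protocol in use.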
Because routing protocols are intricate, misconfigurations caused by human error are a frequent cause of routing instabilities. Studies show that almost 6% of BGP updates are inconsistent, in that they do not reflect actual topological changes in the network. Moreover, some 70% of mis-advertised prefixes result from BGP misconfigurations, which cause routing anomalies such as invalid routes, persistent oscillations, and routing loops, and can ultimately lead to SLA penalties.
Since routing anomalies remain challenging to handle, several solutions have been implemented to help manage networks effectively. Some of them require partial user interaction; however, the most efficient ones detect and fix routing anomalies autonomously. Because these systems do not involve direct user interaction, they avoid misconfigurations caused by human factors and contribute to making the Internet self-repairing.