BGP over GRE Tunnel

BGP over GRE Tunnel

    In this blog post we are going to explain how Generic Routing Encapsulation (GRE) tunnel might be used in a situation when the Border Gateway Protocol (BGP) speaking routers are connected via the non BGP-speaking routers. We will also discuss the problems with MTU size reduction due to tunnels and the Path MTU discovery technique, as well as possible issues with recursive routes through a tunnel.

    Let’s take a look at the following basic scenario:

    Company A runs Cisco routers CE1, Core1 and CE2 and is a customer of two Internet Service Providers (ISPs) – ISP1 and ISP2. The company is assigned an Autonomous System Number (ASN) 64500 and the BGP is enabled on its routers CE1 and CE2. The router CE1 is connected to the router PE_ISP1 (ASN 64501) and the router CE2 is connected to the router PE_ISP2 (ASN 64502). In addition to the BGP routing protocol, Open Shortest Path First (OSPF) is enabled on all company routers.

    BGP over GRE tunnel

    Picture 1 – Network topology

    Let’s check the configuration of the two ISP routers.

    1. ISP Routers Configuration

    The two ISP routers contain the basic BGP configuration that is needed to establish an eBGP session with CE1 and CE2 in AS 64500. The ISPs and the customer routers peer using their loopback IP addresses. Therefore, the command update-source Loopback0 is used in BGP configuration. In addition to that, a static route on the providers’ and customers’ routers towards their eBGP peers must be configured to initiate the eBGP peer session.  

    The command disable-connected-check is required on both ISPs and customer routers. It disables the connection verification process for eBGP peering sessions that are reachable by a single hop but are configured on a loopback interface.

    The redistribute connected command does what it says. It redistributes all connected routes via the BGP protocol.

    1.1. PE-ISP1 Configuration

    router bgp 64501
    bgp router-id 1.1.1.4
    bgp log-neighbor-changes
    redistribute connected
    neighbor 1.1.1.1 remote-as 64500
    neighbor 1.1.1.1 disable-connected-check
    neighbor 1.1.1.1 update-source Loopback0ip route 1.1.1.1 255.255.255.255 2.2.2.1

    interface Loopback0
    ip address 1.1.1.4 255.255.255.255

    interface GigabitEthernet0/0
    description Link to CORE AS64500
    ip address 2.2.2.2 255.255.255.252

    interface GigabitEthernet0/1
    ip address 8.8.8.1 255.255.255.0

    1.2. PE-ISP2 Configuration

    router bgp 64502
    bgp router-id 1.1.1.5
    bgp log-neighbor-changes
    redistribute connected
    neighbor 1.1.1.2 remote-as 64500
    neighbor 1.1.1.2 disable-connected-check
    neighbor 1.1.1.2 update-source Loopback0ip route 1.1.1.2 255.255.255.255 2.2.2.13

    interface Loopback0
    ip address 1.1.1.5 255.255.255.255

    interface GigabitEthernet0/0
    description Link to CE2 AS64500
    ip address 2.2.2.14 255.255.255.252

    interface GigabitEthernet0/1
    ip address 9.9.9.1 255.255.255.0

    2. Customer Routers Configuration

    Customer’s CE1 and CE2 routers are BGP peers with the providers’ routers. In addition, the routers CE1 and CE2 are configured to establish internal BGP (iBGP) neighborship to each other. To change the next hop attribute from 1.1.1.4 to 1.1.1.1 for the BGP advertisement sent from CE1 to CE2, we must add the command neighbor 1.1.1.2 next-hop-self  to the BGP configuration of the CE1 router. If not, the router CE2 would not add the routes with the next hop 1.1.1.4 to its routing table as it does not have a route to the network 1.1.1.4/32 in its routing table. A similar command must be added under BGP configuration of the CE2 to configure CE2 router as a next hop (1.1.1.2) for advertisement received from ISP2 and sent towards CE1.

    As we have mentioned before, the OSPF protocol is enabled on routers CE1, Core and CE2. To prevent sending OSPF hello messages towards ISP routers, the command passive-interface GigabitEthernet0/0 is configured under OSPF protocol configuration.

    2.1. Router CE1 Configuration

    router bgp 64500
    bgp router-id 1.1.1.1
    bgp log-neighbor-changes
    redistribute connected
    neighbor 1.1.1.2 remote-as 64500
    neighbor 1.1.1.2 update-source Loopback0
    neighbor 1.1.1.2 next-hop-self
    neighbor 1.1.1.4 remote-as 64501
    neighbor 1.1.1.4 disable-connected-check
    neighbor 1.1.1.4 update-source Loopback0ip route 1.1.1.4 255.255.255.255 2.2.2.2

    interface Loopback0
    ip address 1.1.1.1 255.255.255.255

    interface GigabitEthernet0/0
    description Link to PE-ISP1 AS64501
    ip address 2.2.2.1 255.255.255.252

    interface GigabitEthernet1/0
    description Link to CORE
    ip address 2.2.2.6 255.255.255.252

    router ospf 1
    router-id 1.1.1.1
    passive-interface GigabitEthernet0/0
    network 1.1.1.1 0.0.0.0 area 0
    network 2.2.2.0 0.0.0.3 area 0
    network 2.2.2.4 0.0.0.3 area 0

    2.2. Router CE2 Configuration

    router bgp 64500
    bgp router-id 1.1.1.2
    bgp log-neighbor-changes
    redistribute connected
    neighbor 1.1.1.1 remote-as 64500
    neighbor 1.1.1.1 update-source Loopback0
    neighbor 1.1.1.1 next-hop-self
    neighbor 1.1.1.5 remote-as 64502
    neighbor 1.1.1.5 disable-connected-check
    neighbor 1.1.1.5 update-source Loopback0ip route 1.1.1.5 255.255.255.255 2.2.2.14

    interface Loopback0
    ip address 1.1.1.2 255.255.255.255

    interface GigabitEthernet0/0
    description Link to PE-ISP2 AS64502
    ip address 2.2.2.13 255.255.255.252

    interface GigabitEthernet2/0
    description Link to CORE
    ip address 2.2.2.10 255.255.255.252

    router ospf 1
    router-id 1.1.1.2
    passive-interface GigabitEthernet0/0
    network 1.1.1.2 0.0.0.0 area 0
    network 2.2.2.8 0.0.0.3 area 0
    network 2.2.2.12 0.0.0.3 area 0

    2.3. Core Router Configuration

    interface Loopback0
    ip address 1.1.1.3 255.255.255.255interface GigabitEthernet0/1
    description Link to CE1
    ip address 2.2.2.5 255.255.255.252

    interface GigabitEthernet0/2
    description Link to CE2
    ip address 2.2.2.9 255.255.255.252

    router ospf 1
    router-id 1.1.1.3
    network 1.1.1.3 0.0.0.0 area 0
    network 2.2.2.4 0.0.0.3 area 0
    network 2.2.2.8 0.0.0.3 area 0

    3. GRE Tunnel Configuration

    In this part we will discuss where the GRE tunnel configuration is in our scenario. Let’s check a routing table on the router PE-ISP2 first.

    PE-ISP2# show ip route | begin Gateway
    Gateway of last resort is not set1.0.0.0/32 is subnetted, 4 subnets
    B1.1.1.1 [20/0] via 1.1.1.2, 00:02:56
    S1.1.1.2 [1/0] via 2.2.2.13
    B1.1.1.4 [20/0] via 1.1.1.2, 00:02:56
    C1.1.1.5 is directly connected, Loopback0
    2.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
    B2.2.2.0/30 [20/0] via 1.1.1.2, 00:02:56
    B2.2.2.4/30 [20/0] via 1.1.1.2, 00:02:56
    B2.2.2.8/30 [20/0] via 1.1.1.2, 00:02:56
    C2.2.2.12/30 is directly connected, GigabitEthernet0/0
    L2.2.2.14/32 is directly connected, GigabitEthernet0/0
    8.0.0.0/24 is subnetted, 1 subnets
    B8.8.8.0 [20/0] via 1.1.1.2, 00:02:56
    9.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
    C9.9.9.0/24 is directly connected, GigabitEthernet0/1
    L9.9.9.1/32 is directly connected, GigabitEthernet0/1

     
    As we see the router PE-ISP2 successfully put all the presented networks into its routing table. However ping from PE-ISP2 to 8.8.8.1 does not work. When we enable debugging of ICMP packets with debug ip icmp command on the Core router we will see the following message.

    *Apr 25 15:25:30.501: ICMP: dst (8.8.8.1) host unreachable sent to 2.2.2.14

     
    The error message reveals that the Core router does not know where to send a packet destined for the IP address 8.8.8.1. It also sends ICMP message back to the PE-ISP2 router. There are several solutions to this problem. For instance, we can redistribute BGP routes into OSPF protocol on routers CE1 and CE2. Another solution would be to enable BGP on the Core router and peer it with CE1 and CE2 routers. Since we want to discuss the GRE tunnel usage in our scenario we will focus on the second one. First we are going to configure tunnel interfaces on CE1 and CE2 routers.

    Note: We do not need to put the command tunnel mode gre ip into the tunnel configuration as it is a default mode for a tunnel interface.

     

    CE1

    interface Tunnel0
    ip address 192.168.1.1 255.255.255.252
    tunnel source Loopback0
    tunnel destination 1.1.1.2
    tunnel mode gre ip

     
    CE2

    interface Tunnel0
    ip address 192.168.1.2 255.255.255.252
    tunnel source Loopback0
    tunnel destination 1.1.1.1
    tunnel mode gre ip

     
    At this point the tunnel interface is up and we should be able to ping between tunnel IP addresses. The next step is the reconfiguration of CE1 and CE2. We will configure them  to set a particular tunnel IP address as a next-hop path attribute for Network Layer Reachability Information (NLRI) received in BGP update messages. We will create a route-map on CE1 router that changes the next-hop IP address 1.1.1.1 to 192.168.1.1 for updates sent towards the router CE2. Afterwards we will apply the route map for the iBGP neighbor 1.1.1.2.

    CE1

    route-map nexthop_tun_out permit 10
    set ip next-hop 192.168.1.1router bgp 64500
    neighbor 1.1.1.2 route-map nexthop_tun_out out

     
    A brief look at the BGP table for route 8.8.8.0 on the CE2 router proves that a tunnel interface IP 192.168.1.1 is now used as the next hop for the route 8.8.8.0/24.

    CE2# show ip bgp | include 8.8.8.0
    *>i 8.8.8.0/24 192.168.1.1 0 100 0 64501 ?

     
    We will do the same in the opposite direction on the CE2 router.

    CE2

    route-map nexthop_tun_out permit 10
    set ip next-hop 192.168.1.2router bgp 64500
    neighbor 1.1.1.1 route-map nexthop_tun_out out

     
    Let’s check the next-hop for the route 9.9.9.0/24 on the CE1 router.

    CE1# show ip bgp | include 9.9.9.0
    *>i 9.9.9.0/24 192.168.1.2 0 100 0 64502 ?

     
    Now we should be able to ping from ISP2 to ISP1 and vice versa.

    PE-ISP2# ping 8.8.8.1
    Type escape sequence to abort.
    Send!!!!!
    Success rate is 100 percent (5/5), round-trip min/avg/max = 2/2/3 ms

     
    And the output from the traceroute command follows.

    PE-ISP2# traceroute 8.8.8.1
    Type escape sequence to abort.
    Tracing the route to 8.8.8.1
    VRF info: (vrf in name/id, vrf out name/id)
    1 2.2.2.13 1 msec 1 msec 1 msec
    2 192.168.1.1 [AS 64500] 3 msec 2 msec 2 msec
    3 2.2.2.2 [AS 64500] 3 msec 3 msec *
    4. GRE Tunnel and MTU Issues

    The maximum length of an IP packet can be 65 535 bytes. It includes 20 Bytes of IP header plus 65515 Bytes of payload. However the smaller size of IP packet is normally used on links depending on the type of layer 2 protocol. This length is set by the Maximum Transmission Unit (MTU) value. In other words, the MTU says how much data the sending host can put into a packet.

    MTU for Ethernet Link is 1500 Bytes, not including the Ethernet header and trailer. It includes 20 Bytes for TCP header, 20 Bytes for IP header and there are 1460 Bytes left for payload. Once a tunnel interface is created, MTU for the tunnel interface is automatically set to the 1476 Bytes. It is set to this value because the GRE tunnel overhead is 24 Bytes.

    If the packet with MTU more than 1476 Bytes is received, a router needs to fragment this packet into small chunks. Those chunks have MTU set to 1476 Bytes and they are sent with this MTU over the GRE tunnel. The downside of fragmentation is obvious – a router has to generate new headers and it takes its resources (CPU and memory) plus additional requirements for bandwidth. On the other side of the GRE tunnel, router reassemble chunks into an original full size packet.

    The configured MTU on the path from point A to point B can be quickly tested with the ping command with no fragmentation set and using selected MTU.  

    PE-ISP1# ping 9.9.9.1 df-bit size 1477

    *Apr 25 21:38:39.926: ICMP: dst (2.2.2.2) frag. needed and DF set unreachable rcv from 2.2.2.1 mtu:1476.M

     
    The ICMP message received on PE-ISP1 says that a router with IP address 2.2.2.1 (CE1) has the MTU set to 1476 Bytes on one of its links. However we strictly disallow fragmentation so the ping command is not successful.

    The question appears: Why a router bothers to send an ICMP message back to the sending host?

    In case DF (Don’t fragment) flag is set to 1, a router is not allowed to fragment a packet. If the packet exceeds 1476 Bytes, ICMP message type 3 Destination unreachable and code 4 Fragmentation needed is sent to the sending host, containing the information about MTU of next hop 1476 Bytes. The sending host learns this information and sets MTU to 1476 Bytes. This mechanism is called Path MTU Discovery (PMTUD) and is described in RFC 1191. PMTUD is supported by TCP and UDP protocols and it is commonly enabled on the hosts with DF bit set to 1. Thanks to PMTUD a sending host sets the correct MTU (the lowest on a link) and the packets do not get fragmented.
    BGP in Large Networks eBook

    The problem occurs when ICMP messages are blocked somewhere on the path by an access-list. A router with configured lower MTU than the one  of a sent IP packet sends ICMP Destination Unreachable, Fragmentation needed message back to the sending host. However the sending host cannot receive it and learn the correct MTU. The sending host keeps using DF set to 1 in IP headers of the sent packets.  As a result, the router with the configured lower MTU on the link represents an MTU bottleneck on the path. Since it is not allowed to fragment packets, the connection fails.

    5. Recursive Routing

    Recursive routing certainly represents an unwanted issue. It occurs when a router tries to route to the tunnel destination address using the tunnel interface itself. Once a router recognizes a recursive routing, it willingly shuts down a tunnel interface for a certain period of time. Afterwards the tunnel interface is brought back up by the router and the process happens again. As a result, the tunnel interface OSPF and BGP routes are flapping.

    In order to simulate recursive routing, we will enable OSPF protocol on tunnel interfaces of CE1 and CE2 routers.

    router ospf 1
    network 192.168.1.0 0.0.0.3 area 0

     
    Now we will check OSPF neighbors.

    CE1# show ip ospf neighbor

    Neighbor ID     Pri   State           Dead Time   Address         Interface
    1.1.1.2           0   FULL/  –        00:00:34    192.168.1.2     Tunnel0
    1.1.1.3           1   FULL/DR         00:00:38    2.2.2.5         GigabitEthernet1/0

     

    CE2# show ip ospf neighbor

    Neighbor ID     Pri   State           Dead Time   Address         Interface
    1.1.1.1           0   FULL/  –        00:00:36    192.168.1.1     Tunnel0
    1.1.1.3           1   FULL/DR         00:00:32    2.2.2.9         GigabitEthernet2/0

     
    There are two OSPF neighbors presented in the command output. The first OSPF adjacency is formed over the tunnel interface and the second one is formed over Gigabit Ethernet interface. So far, the routing table of the CE1 router does not contain any recursive routes. All the OSPF routes are learned over the interface GigabitEthernet1/0.

    CE1# show ip route | begin Gateway
    Gateway of last resort is not set1.0.0.0/32 is subnetted, 5 subnets
    C1.1.1.1 is directly connected, Loopback0
    O1.1.1.2 [110/3] via 2.2.2.5, 02:38:43, GigabitEthernet1/0
    O1.1.1.3 [110/2] via 2.2.2.5, 04:21:21, GigabitEthernet1/0
    S1.1.1.4 [1/0] via 2.2.2.2
    B1.1.1.5 [200/0] via 192.168.1.2, 02:37:44
    2.0.0.0/8 is variably subnetted, 6 subnets, 2 masks
    C2.2.2.0/30 is directly connected, GigabitEthernet0/0
    L2.2.2.1/32 is directly connected, GigabitEthernet0/0
    C2.2.2.4/30 is directly connected, GigabitEthernet1/0
    L2.2.2.6/32 is directly connected, GigabitEthernet1/0
    O2.2.2.8/30 [110/2] via 2.2.2.5, 04:21:21, GigabitEthernet1/0
    O2.2.2.12/30 [110/3] via 2.2.2.5, 02:38:43, GigabitEthernet1/0
    8.0.0.0/24 is subnetted, 1 subnets
    B8.8.8.0 [20/0] via 1.1.1.4, 04:21:36
    9.0.0.0/24 is subnetted, 1 subnets
    B9.9.9.0 [200/0] via 192.168.1.2, 02:37:44
    192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks
    C192.168.1.0/30 is directly connected, Tunnel0
    L192.168.1.1/32 is directly connected, Tunnel0

     
    Let’s investigate the OSPF cost for the tunnel and Gigabit Ethernet interfaces on the CE1 router

    CE1# show ip ospf interface tunnel 0 | incl Cost
    Process ID 1, Router ID 1.1.1.1, Network Type POINT_TO_POINT, Cost: 1000
    Topology-MTID Cost Disabled Shutdown Topology Name

     

    CE1# show ip ospf interface gi1/0 | incl Cost
    Process ID 1, Router ID 1.1.1.1, Network Type BROADCAST, Cost: 1
    Topology-MTID Cost Disabled Shutdown Topology Name

     
    The Gigabit Ethernet interface is preferred over tunnel interface by OSPF protocol since its cost is 1 compared to the cost 1000, generated for the tunnel interface. To simulate recursive routing we need to modify cost of the tunnel interface to 1 on both CE routers.

    interface tunnel 0
    ip ospf cost 1

     
    After a while we can see the following log messages.

    *May  1 18:44:51.911: %ADJ-5-PARENT: Midchain parent maintenance for IP midchain out of Tunnel0 – looped chain attempting to stack

    *May  1 18:44:59.095: %TUN-5-RECURDOWN: Tunnel0 temporarily disabled due to recursive routing

    *May  1 18:44:59.095: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel0, changed state to down

     
    From this point the tunnel interface and routing tables of routers CE1 and CE2 are flapping indefinitely. Some routes in the routing table of CE1 are now available through the tunnel interface.

    CE1# show ip route | begin Gateway
    Gateway of last resort is not set1.0.0.0/32 is subnetted, 5 subnets
    C1.1.1.1 is directly connected, Loopback0
    O1.1.1.2 [110/2] via 192.168.1.2, 00:00:08, Tunnel0
    O1.1.1.3 [110/2] via 2.2.2.5, 04:32:36, GigabitEthernet1/0
    S1.1.1.4 [1/0] via 2.2.2.2
    B1.1.1.5 [200/0] via 192.168.1.2, 02:48:59
    2.0.0.0/8 is variably subnetted, 6 subnets, 2 masks
    C2.2.2.0/30 is directly connected, GigabitEthernet0/0
    L2.2.2.1/32 is directly connected, GigabitEthernet0/0
    C2.2.2.4/30 is directly connected, GigabitEthernet1/0
    L2.2.2.6/32 is directly connected, GigabitEthernet1/0
    O2.2.2.8/30 [110/2] via 192.168.1.2, 00:00:08, Tunnel0
    [110/2] via 2.2.2.5, 04:32:36, GigabitEthernet1/0
    O2.2.2.12/30 [110/2] via 192.168.1.2, 00:00:08, Tunnel0
    8.0.0.0/24 is subnetted, 1 subnets
    B8.8.8.0 [20/0] via 1.1.1.4, 04:32:51
    9.0.0.0/24 is subnetted, 1 subnets
    B9.9.9.0 [200/0] via 192.168.1.2, 02:48:59
    192.168.1.0/30 is subnetted, 1 subnets
    B192.168.1.0 [200/0] via 192.168.1.2, 00:00:03

     
    What does exactly happen? We will explain it in the following example.

    Previously, we enabled OSPF protocol on the tunnel interface 192.168.1.1 of the router CE1. Let’s say that the CE1 sends OSPF hello message with a source IP address 192.168.1.1 to the OSPF multicast IP address 224.0.0.5. The tunnel interface adds GRE header to the packet and the outer IP header with the source IP address 1.1.1.1 and the destination IP address 1.1.1.2.  The route 1.1.1.2/32  is looked up in the routing table of CE1. However the route was learned via tunnel interface. For this reason, the GRE header and outer IP header would be added again to the previously encapsulated GRE packet. The process would repeat again and again in case the CE1 router did not bring the tunnel interface down beforehand.   

    To avoid recursive routing we will increase the ospf cost for tunnel interface back to the 1000 value by configuring the command no ip ospf cost 1 on CE1 and CE2 routers. It would prevent routers to use routes learned via the tunnel interface.  


    Here at Noction, BGP over GRE might be implemented during the deployment of our Intelligent Routing Platform (IRP) within certain customer’s networks. In specific complex scenarios, traffic from the IRP server should pass multiple routers before getting to the provider. If a separate probing Vlan cannot be configured across all routers, GRE tunnels from IRP to the Edge routers become a solution.

    Boost BGP Preformance

    Automate BGP Routing optimization with Noction IRP

    NO COMMENTS

    Leave a Reply