Overview #
IRP offers failover capabilities that ensure Improvements are preserved in case of planned or unplanned downtime of the IRP server.
IRP’s failover feature uses a master-slave configuration. A second instance of IRP needs to be deployed in order to enable failover. For details about failover configuration and troubleshooting refer to Failover Configuration.
A failover license is required for the second node. Check with Noction’s sales team for details.
- slave node running the same version of IRP as the master node,
- MySQL Multi-Master replication of ‘irp’ database,
- announcement of the replicated improvements with different LocalPref and/or communities by both nodes,
- monitoring by the slave node of BGP announcements originating from the master node, relying on the higher precedence of the master’s announced prefixes,
- activating/deactivating slave IRP components when the master fails or resumes work,
- syncing master configuration to slave node.
For exact details about the IRP failover solution refer to the configuration guides (Failover Configuration, Setup Failover wizard), template files, and (if available) working IRP configurations. For example, some ‘irp’ database tables are not replicated, the ‘mysql’ system database is replicated too, and some IRP components are stopped.
IRP versions 3.5 and earlier do not offer failover capabilities for Inbound improvements. It is advised that in these versions only one of the IRP instances is configured to perform inbound optimization in order to avoid contradictory decisions. In case of a failure of this instance inbound improvements are withdrawn.
- two IRP nodes – Master and Slave,
- grayed-out components are in stand-by mode – services are stopped or operating in limited ways. For example, the Frontend detects that it runs on the slave node and prohibits any changes to configuration while still offering access to reports, graphs or dashboards.
- configuration changes are pushed by master to slave during synchronization. SSH is used to connect to the slave.
- MySQL Multi-Master replication is set up for the ‘irp’ database between the master and slave nodes. Existing MySQL Multi-Master replication functionality is used.
- master IRP node is fully functional and collects statistics, queues prefixes for probing, probes them and eventually makes Improvements. All the intermediate and final results are stored in MySQL and, due to replication, will make it into the slave’s database as well.
- Bgpd works on both master and slave IRP nodes. They make the same announcements with different LocalPref/communities.
- Bgpd on the slave node monitors the number of master announcements on the router (master announcements have higher priority than the slave’s),
- Timers are used to prevent flapping between failover and failback (a minimal sketch of this logic follows the list).
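The flap-prevention behaviour can be pictured as a small state machine: the slave only changes role after the observed condition has persisted for a hold interval. The sketch below is a hypothetical illustration of that logic; the class, parameter names and timer values are assumptions made for the example, not actual IRP settings.

```python
import time

# Hypothetical sketch of the slave-side decision described above: the slave's
# monitoring counts the master's announcements seen on the router and only
# changes role after a hold timer expires, preventing failover/failback flapping.
# Class, parameter names and timer values are illustrative, not IRP settings.

FAILOVER_HOLD_SEC = 180   # master must be absent this long before the slave takes over
FAILBACK_HOLD_SEC = 300   # master must be back this long before the slave stands down


class SlaveFailoverMonitor:
    def __init__(self):
        self.active = False            # True once the slave has taken over
        self.condition_since = None    # when the contradicting condition was first observed

    def observe(self, master_announcement_count, now=None):
        """Process one polling cycle; return True if the slave should be active."""
        now = time.time() if now is None else now
        master_present = master_announcement_count > 0

        if self.active != master_present:
            # Observation agrees with the current role (standby while the master
            # announces, or active while it does not); reset the hold timer.
            self.condition_since = None
            return self.active

        # Observation contradicts the current role; wait out the hold timer.
        if self.condition_since is None:
            self.condition_since = now
        hold = FAILBACK_HOLD_SEC if self.active else FAILOVER_HOLD_SEC
        if now - self.condition_since >= hold:
            self.active = not self.active   # fail over or fail back
            self.condition_since = None
        return self.active
```

A monitoring loop on the slave would call observe() periodically with the number of master announcements it sees on the router and start or stop the local services accordingly.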
Requirements #
- a second server to install the slave on,
- MySQL Multi-Master replication for the irp database.
MySQL replication is not configured by default. Configuration of MySQL Multi-Master replication is a mandatory requirement for a failover IRP configuration. Failover setup, and specifically MySQL Multi-Master replication, should follow the provided failover script. Only a subset of tables in the ‘irp’ database is replicated. Replication requires extra storage space for replication logs on both failover nodes, depending on the overall traffic and platform activity.
- a second set of BGP sessions will be established,
- a second set of PBR IP addresses is required for the slave node in order to perform probing,
- a second set of improvements will be announced to the router,
- a failover license for the slave node,
- key-based SSH authentication from master to slave is required. It is used to synchronize IRP configuration from master to slave (a pre-flight check sketch follows this list),
- MySQL Multi-Master replication of ‘irp’ database,
- IRP set up in Intrusive mode on the master node.
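Two of these requirements, key-based SSH authentication from master to slave and healthy MySQL replication, lend themselves to a simple pre-flight check. The sketch below is only an illustration under assumptions: the host name, credentials, thresholds and the pymysql driver are placeholders and not part of IRP; the failover script shipped with IRP remains the authoritative procedure.

```python
import subprocess

import pymysql  # third-party MySQL driver, used here only for illustration

# Hypothetical pre-flight check for two of the requirements listed above:
# key-based SSH access from master to slave, and MySQL replication health.
# Host names, credentials and thresholds are placeholders, not IRP defaults.

SLAVE_HOST = "irp-slave.example.net"


def ssh_key_auth_works(host):
    """Return True if passwordless (key-based) SSH to the slave succeeds."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"],
        capture_output=True,
    )
    return result.returncode == 0


def replication_healthy(host, user, password):
    """Return True if the replication threads on `host` are running and caught up."""
    conn = pymysql.connect(host=host, user=user, password=password,
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
    finally:
        conn.close()
    if not status:
        return False
    return (status["Slave_IO_Running"] == "Yes"
            and status["Slave_SQL_Running"] == "Yes"
            and (status["Seconds_Behind_Master"] or 0) < 60)


if __name__ == "__main__":
    print("SSH key auth to slave:", ssh_key_auth_works(SLAVE_HOST))
    print("Replication healthy:  ", replication_healthy(SLAVE_HOST, "monitor", "secret"))
```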
In case IRP failover is set up in a multiple Routing Domain configuration and the IRP instances are hosted by different RDs, this must be specified in the IRP configuration too. Refer to Optimization for Multiple Routing Domains, global.master_rd, global.slave_rd.
Failover #
In order for this mechanism to work IRP needs to operate in Intrusive mode and the master node’s announcements must have higher priority than the slave’s.
- master synchronizes its configuration to slave. An SSH channel is used to sync configuration files from master to slave and to restart the necessary services,
- MySQL Multi-Master replication is configured on relevant irp database tables so that the data is available immediately in case of emergency,
- components of IRP such as Core, Explorer, Irppushd are stopped or standing by on slave to prevent split-brain or duplicate probing and notifications,
- slave node runs Bgpd and makes exactly the same announcements with a lower BGP LocalPref and/or other communities thus replicating Improvements too.
It is imperative that the master’s LocalPref value is greater than the slave’s value. This ensures that the master’s announcements are preferred and enables the slave to also observe them as part of monitoring.
In networks with multiple edge routers the slave node considers the master down and takes over only if the master’s Improvements are withdrawn from all edge routers.
This is true only if the LocalPref and/or communities assigned to the slave node make its announcements preferred once the master’s are gone. If more preferable announcements are sent by other network elements, the slave node’s announcements will no longer be best. This defeats the purpose of using IRP failover.
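As a hypothetical illustration of the multi-router condition above, the slave treats the master as down only when master-originated announcements have disappeared from every edge router, not just one of them. The function and data structure below are assumptions made for the example.

```python
# Hypothetical illustration: the slave takes over only when the master's
# Improvements are withdrawn from ALL edge routers, not just one of them.

def master_considered_down(master_announcements_per_router):
    """Map of edge router name -> number of master-originated Improvements
    still visible on it; True means the slave may take over."""
    return all(count == 0 for count in master_announcements_per_router.values())


print(master_considered_down({"edge1": 0, "edge2": 42}))   # False: edge2 still sees the master
print(master_considered_down({"edge1": 0, "edge2": 0}))    # True: withdrawn everywhere
```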
Failback #
During failback it is recommended that both IRP nodes are monitored by network administrators to confirm the system is stable.
Recovery of failed node #
Recovery speed is constrained by restoring replication of MySQL databases. On 1Gbps non-congested links replication for a full day of downtime takes approximately 30-45 minutes with 200-250Mbps network bandwidth utilization between the two IRP nodes. During this time the operational node continues running IRP services too.
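For a rough sense of these numbers, catch-up time is essentially the replication backlog divided by the available bandwidth. The calculation below assumes a hypothetical backlog of about 60 GB for a day of downtime, which is consistent with the 30-45 minute estimate above but will vary with traffic and platform activity.

```python
# Back-of-envelope catch-up estimate: backlog divided by link bandwidth.
# The 60 GB/day backlog is an assumed example, not a guaranteed figure.

def catchup_minutes(backlog_gb, link_mbps):
    backlog_megabits = backlog_gb * 8 * 1000   # GB -> megabits (decimal units)
    return backlog_megabits / link_mbps / 60   # seconds -> minutes

print(round(catchup_minutes(60, 250)))   # ~32 minutes at 250 Mbps
print(round(catchup_minutes(60, 200)))   # ~40 minutes at 200 Mbps
```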
Upgrades #
It is imperative that the master and slave nodes are not upgraded at the same time. Update one node first, give the system some time to stabilize and only after that update the second node.