JNCIS-SP: High Availability.

Junos OS offers:
- Graceful Restart (GR)
- Graceful Routing Engine Switchover (GRES)
- Nonstop Routing (NSR)
- Unified In-Service Software Upgrade (ISSU): requires GRES and NSR to be enabled


Graceful Restart (GR) allows uninterrupted packet forwarding and temporary suppression of routing updates while rpd restarts. The restarting router informs its neighbors of the restart event and requests a grace period. It keeps forwarding traffic during the restart, so convergence in the network is not disrupted. Helper routers hide the restart event from routers that are not directly connected to the restarting router.
GR has two modes: restarting and helper. Helper mode is enabled by default on all Junos routers. It can be disabled for all protocols using:

set routing-options graceful-restart disable

GR can also be enabled/disabled per protocol.
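
As a minimal sketch of the global and per-protocol knobs, assuming a router that runs OSPF and IS-IS (the choice of protocols is only illustrative):

(enable GR for the restarting role, globally)
set routing-options graceful-restart
(stop acting as a GR helper for OSPF neighbors only)
set protocols ospf graceful-restart helper-disable
(turn GR off completely for IS-IS)
set protocols isis graceful-restart disable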


During an RE mastership change without GRES:
The PFE restarts, and all hardware and interfaces are rediscovered by the new RE. The new RE restarts rpd, so all adjacencies are aware of the change.

During an RE mastership change with GRES:
The PFE is not restarted; interface and kernel information is preserved. The new RE still restarts rpd, so adjacencies are aware of the change.

To preserve routing information during switchover, use GRES with GR or NSR.

GRES switchover operation:
1. After GRES is configured, the REs synchronize and exchange keepalives.
2. If the backup RE does not receive a keepalive (typically within 2 seconds), it determines that the master has failed.
3. The PFE disconnects from the old master RE and connects to the new master RE; the PFE remains operational.
4. The new master RE and the PFE synchronize. The master RE then sends updates to the PFE as needed.

To verify GRES, use 'show system switchover' (issued from the backup RE).

Three states must be replicated for GRES to function properly:
- configuration database
- kernel and related entries
- PFE state
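
The notes do not show how GRES is switched on. A minimal sketch for a dual-RE chassis could look like this; 'commit synchronize' is added on the assumption that both REs should always hold the same configuration database:

set chassis redundancy graceful-switchover
set system commit synchronize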


Nonstop Routing (NSR) uses GRES and, in addition, runs rpd on the backup RE.
NSR is self-contained and does not rely on helper routers. NSR and GR cannot be configured together.

To configure NSR, enable GRES and add:

set routing-options nonstop-routing

To verify:

show task replication
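
Put together, a hedged sketch of the NSR change on a router where GRES is already enabled might be the following. The 'delete' line assumes GR was configured earlier (NSR and GR are mutually exclusive), and NSR also expects 'commit synchronize' to be present:

(remove GR first, if it was configured)
delete routing-options graceful-restart
set routing-options nonstop-routing
set system commit synchronize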


BFD rapidly detects failures through simple hello packets. It gives all protocols a single failure-detection method and relieves them of that task, so their own timers can even be increased.
BFD supports OSPF, IS-IS, RIP, BGP, RSVP, PIM and static routes.
BFD is configured under each protocol, where you can set the multiplier and the transmit interval. To monitor it, use 'show bfd session' or 'show bgp neighbor'.
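
A minimal sketch of BFD under OSPF and BGP follows. The interface name, BGP group name, neighbor address and timer values are hypothetical and only illustrate where the statements go:

(interface ge-0/0/1.0, group ibgp and neighbor 192.0.2.2 are made-up examples)
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 bfd-liveness-detection minimum-interval 300
set protocols ospf area 0.0.0.0 interface ge-0/0/1.0 bfd-liveness-detection multiplier 3
set protocols bgp group ibgp neighbor 192.0.2.2 bfd-liveness-detection minimum-interval 300

With these values a failure should be declared after roughly 300 ms x 3 = 900 ms without hellos, assuming the neighbor negotiates the same interval.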


VRRP is defined in RFC 2338 and multicasts its advertisements to 224.0.0.18.
VRRP router: any router participating in VRRP.
Master router: performs packet forwarding and responds to ARP requests.
Backup router: available to assume the master role.
Virtual router: the entity that functions as the default router. It consists of a virtual router ID (VRID) and a VIP.

Virtual MAC: 00-00-5E-00-01-VRID
The router with the highest priority is master. The default priority is 100. Preemption is enabled by default.

VRRP states:
- Initialize: router negotiates VRRP roles; no forwarding is performed
- Master
- Backup
- Transition: router is switching states; no forwarding is performed

set interfaces ge-0/0/4 unit 0 family inet address <address/prefix> vrrp-group 10 virtual-address <virtual-address>
set interfaces ge-0/0/4 unit 0 family inet address <address/prefix> vrrp-group 10 priority 200

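Other vrrp-group options:
- track: track an interface or route and lower the priority when it fails
- accept-data: lets the master respond to packets addressed to the VIP (for example ICMP)
- authentication-type / authentication-key
- no-preempt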

'vrrp-inherit-from' can be used to simplify the configuration and reduce VRRP traffic, because the group also inherits its state from the group it follows.
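
The VRRP statements above could be combined as in the following sketch. All addresses and the tracked interface are hypothetical (192.0.2.0/24 is a documentation prefix) and are not taken from the original notes:

(192.0.2.2/24, VIP 192.0.2.1 and ge-0/0/5.0 are made-up examples)
set interfaces ge-0/0/4 unit 0 family inet address 192.0.2.2/24 vrrp-group 10 virtual-address 192.0.2.1
set interfaces ge-0/0/4 unit 0 family inet address 192.0.2.2/24 vrrp-group 10 priority 200
set interfaces ge-0/0/4 unit 0 family inet address 192.0.2.2/24 vrrp-group 10 accept-data
set interfaces ge-0/0/4 unit 0 family inet address 192.0.2.2/24 vrrp-group 10 track interface ge-0/0/5.0 priority-cost 150

For group 10 the virtual MAC would be 00-00-5E-00-01-0A (10 decimal is 0A hexadecimal). If the tracked interface goes down, the priority drops from 200 to 50, which is below the default of 100, so a backup with default priority can take over. 'show vrrp summary' shows the resulting state.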


To perform ISSU:
1. Enable GRES and NSR.
2. Verify that the master and backup REs are running the same software version.
3. Download the new software and copy it to the router.
4. Run 'request system software in-service-upgrade' with the new package.
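
The steps above can be sketched as the following operational commands, run from the master RE. The package path is a placeholder, and the 'reboot' option is an assumption about the desired behavior (it reboots the old master once the upgrade finishes):

(step 2: compare the software version on both REs)
show version invoke-on all-routing-engines
(confirm GRES and NSR replication are healthy before starting)
show task replication
(step 4: <package-name> stands for the image copied in step 3)
request system software in-service-upgrade /var/tmp/<package-name> reboot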

Link Aggregation Group (LAG), IEEE 802.3ad.

- increases bandwidth
- provides link efficiency (traffic is balanced across links)
- creates physical layer redundancy

Interface requirements:
- duplex and speed must match
- up to 8 members per LAG
- ports need not be contiguous; with MC-LAG they may even be on different switches

Processing and forwarding considerations:
- RE-generated traffic traverses the lowest member link
- IP traffic hashing uses layer 2, 3 and 4 details
- non-IP traffic hashing uses source and destination MAC

In Junos, using 802.3ad on an MX causes traffic to be balanced across the member links of an AE interface. This load balancing is based on the Layer 3 information carried in the packet.

LACP:
- performs link monitoring
- controls the member links that form a single logical channel
- LACP mode active: initiates transmission of LACP packets
- LACP mode passive: only responds to LACP packets

Junos OS provides LACP-link monitoring but not automatic addition and deletion of links.


Multichassis LAG (MC-LAG) enables a LAG between two or more devices. MC-LAG offers node-level redundancy and multihoming support.
MC-LAG uses ICCP (Inter-Chassis Control Protocol) to exchange control messages between the two network devices.
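
A rough sketch of the ICCP and mc-ae statements on one of the two MC-LAG peers is shown below. Every value (addresses, IDs, system-id) is hypothetical, and a complete setup needs more than this, for example the matching configuration on the peer with chassis-id 1 and status-control standby:

(10.0.0.1 is this node, 10.0.0.2 the ICCP peer; both are made-up examples)
set protocols iccp local-ip-addr 10.0.0.1
set protocols iccp peer 10.0.0.2 redundancy-group-id-list 1
set protocols iccp peer 10.0.0.2 liveness-detection minimum-interval 1000
(both peers present the same LACP system-id and admin-key to the attached device)
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:00:00:01
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae redundancy-group 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control active

'show iccp' and 'show interfaces mc-ae' can be used to check the ICCP session and the mc-ae state.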

Implementing LAG:

set chassis aggregated-devices ethernet device-count 2
set interfaces ae0 unit 0 family bridge
set interfaces ae0 aggregated-ether-options lacp active
(LACP periodic can be fast (1 s) or slow (30 s); the default is fast)
set interfaces ge-0/0/8 gigether-options 802.3ad ae0
set interfaces ge-0/0/3 gigether-options 802.3ad ae0
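
As an optional extension of the LAG above, the LACP timer and the minimum number of active members can be set explicitly. The values are illustrative assumptions, not recommendations:

(send LACP packets every 30 s instead of the default 1 s)
set interfaces ae0 aggregated-ether-options lacp periodic slow
(keep ae0 up as long as at least one member link is up)
set interfaces ae0 aggregated-ether-options minimum-links 1

'show lacp interfaces ae0' shows the LACP state of each member link.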


Ethernet Ring Protection (ERP):
- provides sub-50 ms, loop-free protection for an Ethernet network
- must be in a ring topology
- can replace STP
- use CFM on copper links
- defined in ITU-T G.8032

RPL: Ring Protection Link
APS: Automatic Protection Switching

picture will follow.

RPL-owner node:
- controls state of RPL
- initiates Ring-APS messages

Other nodes have no special role. They listen to and forward R-APS messages, and generate R-APS messages when a local link fails.

APS coordinates protection actions through a dedicated VLAN and uses the CFM frame format. R-APS frames use destination MAC 01-19-A7-00-00-01 and opcode 40.
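
A minimal MX-style sketch of an RPL-owner node follows, assuming a ring named ring1 and control-channel units on a dedicated VLAN. The ring name and interfaces are hypothetical, and other platforms may use a different syntax:

(ring1, ge-0/0/1.100 and ge-0/0/2.100 are made-up examples)
set protocols protection-group ethernet-ring ring1 ring-protection-link-owner
set protocols protection-group ethernet-ring ring1 east-interface control-channel ge-0/0/1.100
set protocols protection-group ethernet-ring ring1 east-interface ring-protection-link-end
set protocols protection-group ethernet-ring ring1 west-interface control-channel ge-0/0/2.100

On the other ring nodes the 'ring-protection-link-owner' and 'ring-protection-link-end' statements would be left out. 'show protection-group ethernet-ring aps' and 'show protection-group ethernet-ring node-state' show the R-APS and node state.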

R-APS has the following fields:
- request/state: 1011 = signal fail, 0000 = no request
- RPL blocked (RPL owner only): 1 = RPL is blocked, 0 = RPL is unblocked
- do not flush bit
- node ID: MAC address of the node (informational only)

ERP idle state.

pic will follow.

- RPL owner (1) sends R-APS messages out all ring ports every 5 seconds
- Other switches flush their MAC table once (upon first R-APS receipt) while unblocking their ring ports

ERP signal failure

pic will follow.

- occurs when a failure is detected on an unblocked link
- nodes 2 and 3:
    • wait for hold-time expiration
    • switch from idle to protection state
    • block the failed port and flush the MAC table
    • send 3 R-APS messages in the first 10 ms, followed by 1 every 5 seconds until the failure condition clears
- all switches except 2 and 3:
    • switch from idle state to protection state
    • flush the MAC table and stop sending R-APS messages
- RPL owner:
    • unblocks the RPL
    • listens for R-APS messages from nodes 2 and 3

Upon restoration, nodes 2 and 3:
- send out new R-APS messages, telling the other nodes that the failure is no longer present and that they should flush their MAC tables
- keep the restored port blocked until node 1 (the RPL owner) blocks the RPL and signals everyone to return to the idle state