BGP Route Reflection and BGP confederations.
BGP, the inter-autonomous-system path vector routing protocol, uses the AS path attribute to detect loops. When an Update is received, a BGP-speaking router checks the AS path attribute to see whether its own AS is already listed. If it is, the routing update is discarded. Since the AS path attribute is only updated across EBGP sessions, this mechanism does not work for IBGP. Because no AS-based loop detection is possible, the default IBGP behavior is that routes learned from IBGP neighbors are not advertised to other IBGP neighbors. As a result, a full mesh of IBGP peerings is required; without it, some routers would miss out on routing information.
This default behavior does not scale very well.
In a network with dozens or hundreds of routers, every router has to maintain an IBGP session with every other router, which puts a lot of unnecessary load on the routers.
To mitigate these problems, scaling mechanisms were developed for BGP. There are two primary scaling mechanisms:
• route reflection (RFC 4456)
• confederations (RFC 3065, since obsoleted by RFC 5065)
BGP Route Reflection.
The goal of route reflection is made clear by the appropriately titled RFC 4456: ‘BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)’. By eliminating the full mesh requirement, BGP becomes far more efficient and scalable.
A total of n*(n-1)/2 unique IBGP sessions are needed to build a full mesh in a network of n routers. A network with 20 routers would require 190 IBGP sessions and a network with 100 routers would require a total of 4950 IBGP sessions.
By introducing route reflectors into a network, the number of IBGP sessions required per client router can easily be reduced to two. This assumes that you want at least two route reflectors, since having a single route reflector is quite a bad idea.
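The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the route reflector count assumes that every client peers with each route reflector and that the route reflectors are fully meshed among themselves.

```python
def full_mesh_sessions(n: int) -> int:
    """Unique IBGP sessions in a full mesh of n routers: n*(n-1)/2."""
    return n * (n - 1) // 2


def route_reflector_sessions(n: int, reflectors: int = 2) -> int:
    """Total sessions when `reflectors` of the n routers are route reflectors.

    Assumption: each client peers with every reflector, and the
    reflectors are fully meshed among themselves.
    """
    clients = n - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (20, 100):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n))
# 20 190 37
# 100 4950 197
```

Under these assumptions the total grows linearly with the number of routers instead of quadratically.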
The basic idea of route reflection is very simple. Take a look at the example below:
In this basic example, R1 sends an Update message to the route reflector across an IBGP session. Since the normal rule of not advertising IBGP-learned routes across IBGP sessions does not apply to the route reflector, the route is ‘reflected’ to R2. The net result is that the full mesh requirement is lifted.
Without any policy, a route reflector will not alter any attributes of a BGP route. Because the BGP next-hop attribute is only updated across EBGP sessions, a route reflector will not attract any additional traffic.
A route reflector receiving multiple routes to the same destination will start by applying the normal BGP path selection. A route reflector will only advertise or reflect the best path, just as in normal BGP operation.
For operations inside the autonomous system, the RFC states that the route reflector will operate according to the following rules:
• a route received from a client is reflected to non-client and client peers (of course, the route is not reflected back to the originator)
• a route received from a non-client peer is reflected to clients only
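The two rules above can be sketched as a small decision function. This is an illustrative Python model, not how any particular BGP implementation is structured; the `Peer` class and `reflect_targets` function are hypothetical names, and the route is assumed to have already won best path selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    is_client: bool


def reflect_targets(source: Peer, peers: list[Peer]) -> list[str]:
    """Return the names of the IBGP peers a best route is reflected to."""
    targets = []
    for peer in peers:
        if peer == source:
            continue  # never send the route back to the peer it came from
        if source.is_client or peer.is_client:
            # route from a client: goes to clients and non-clients
            # route from a non-client: goes to clients only
            targets.append(peer.name)
    return targets


r1 = Peer("R1", is_client=True)
r2 = Peer("R2", is_client=True)
r3 = Peer("R3", is_client=False)
peers = [r1, r2, r3]

print(reflect_targets(r1, peers))  # ['R2', 'R3']
print(reflect_targets(r3, peers))  # ['R1', 'R2']
```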
A route reflector exchanging prefix information across an EBGP session will act as a ‘normal’ IBGP router. This means that all EBGP-learned prefixes are sent across all IBGP sessions (client and non-client alike) and that a route reflector will always advertise routing information towards EBGP peers.
As mentioned before, a route reflector does not change existing IBGP attributes. It does, however, introduce two new BGP attributes: the Originator ID and the Cluster list. Both of these attributes were introduced to prevent routing information from looping.
The Originator ID attribute indicates which router originated a reflected route. It is set by a route reflector that reflects a route. A BGP-speaking router should ignore routing information when the Originator ID value matches its own BGP identifier.
The Cluster list is a sequence of cluster IDs that represents the reflection path. Whenever a route reflector reflects a route, it has to prepend its own cluster ID to the Cluster list. If the Cluster list is empty, it must create a new one. By using this attribute, a route reflector can detect a loop: whenever it sees that its own cluster ID is already present in the Cluster list, it should ignore the route.
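To make the two loop checks concrete, here is a minimal Python sketch. The `Route` class, the function names and all identifiers in the example are hypothetical. `sender_router_id` stands for the BGP identifier of the router that injected the route into the AS, and it is only used when the route does not carry an Originator ID yet.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: list[str] = field(default_factory=list)


def reflect(route: Route, rr_cluster_id: str, sender_router_id: str) -> Route:
    """Return the copy of `route` that a route reflector would send on."""
    return Route(
        prefix=route.prefix,
        # keep an existing Originator ID, otherwise set it for the first time
        originator_id=route.originator_id or sender_router_id,
        # prepend the reflector's own cluster ID to the Cluster list
        cluster_list=[rr_cluster_id] + route.cluster_list,
    )


def accept(route: Route, my_router_id: str,
           my_cluster_id: Optional[str] = None) -> bool:
    """Apply both loop checks; my_cluster_id is only set on route reflectors."""
    if route.originator_id == my_router_id:
        return False  # this router originated the route itself
    if my_cluster_id is not None and my_cluster_id in route.cluster_list:
        return False  # the route has already passed through this cluster
    return True


reflected = reflect(Route("203.0.113.0/24"),
                    rr_cluster_id="10.0.0.1", sender_router_id="10.0.0.11")
print(reflected.originator_id, reflected.cluster_list)  # 10.0.0.11 ['10.0.0.1']
print(accept(reflected, my_router_id="10.0.0.11"))      # False (originator)
print(accept(reflected, my_router_id="10.0.0.2",
             my_cluster_id="10.0.0.1"))                 # False (same cluster)
print(accept(reflected, my_router_id="10.0.0.12"))      # True
```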
The BGP protocol does not offer a means for a BGP speaking router to identify itself as either a client or a route reflector. In the BGP implementations I know of, it is only necessary to configure the route reflector. On the client side, a normal IBGP neighbor relationship is configured.
Many, if not all, networks that use route reflection have a redundant route reflector. This could look as follows:
In this example, there are two route reflectors that have an IBGP session with all of the clients. The route reflectors can peer with each other, either as IBGP peers or as each other’s clients. In either case, a loop is prevented by the Cluster list attribute.
There is a debate on how to configure two route reflectors. It is possible to configure them with the same cluster ID or to configure both route reflectors with different cluster IDs. When the route reflectors are configured with different cluster IDs and they are configured to reflect routes to each other, they will both carry more routing information. In general, assigning a unique cluster ID to each route reflector seems to be the preferred design.
It is also possible to introduce a hierarchy of route reflectors:
The previous example shows a route reflector at the top. That route reflector reflects the routes of its clients, who are route reflectors themselves. It’s possible to extend this design with more levels and with redundancy on each level as well.
The second BGP scaling mechanism is the BGP confederation. The BGP confederation takes a different approach to scaling BGP than route reflection. Instead of relaxing the rule that IBGP-learned routing information should not be advertised to other IBGP peers, the AS is simply divided into multiple sub-ASes.
The following picture is an example of BGP confederation:
In the example, AS199 is a BGP confederation that consists of 12 routers. Instead of having all 12 routers maintain IBGP sessions with each other in a full mesh, four sub-ASes have been configured. The autonomous system numbers used for these sub-ASes can be private. In every sub-AS, there are three routers that form an IBGP full mesh. Connectivity between the sub-ASes is achieved by confederation BGP sessions (cBGP).
Even though these cBGP sessions are established between two different autonomous systems, they do differ from normal EBGP sessions. In order to prevent routing information from looping, BGP normally uses the AS-path. Even though the approach is a little different, this is the case with BGP confederations as well. RFC 3065 defines two additional AS path segment types: AS_CONFED_SEQUENCE and AS_CONFED_SET. BGP can distinguish between the AS path segment types because each AS path segment is a triple (path segment type, path segment length, path segment value). The AS_CONFED_SEQUENCE and the AS_CONFED_SET simply have a different type value from the two original or ‘normal’ segment types.
Whenever a BGP speaker sends a routing update across a cBGP session, it will prepend its sub-AS to the AS-path attribute, and it will do so using the AS_CONFED_SEQUENCE segment type. Whenever routing information is learned across a cBGP session and the AS-path contains the BGP speaker's own AS, the route is treated the same as when an EBGP speaker sees its own AS listed: the route is ignored. This way, routing loops are prevented.
Whenever a BGP speaker inside a confederation advertises routing information across an EBGP session to a neighbor that is not a member of the confederation, the AS-path is ‘cleaned up’ and updated in the normal way. By cleaning up, I mean that the segment types related to the confederation are stripped and the confederation ID is prepended. The fact that a network is configured as a confederation is not known to the Internet. BGP speakers in other autonomous systems have no way of telling whether a network is a confederation; they view a confederation as a single, whole AS.
The picture above is an illustration of this process. R1 will prepend its sub-AS to the AS-path attribute when sending Updates to routers in sub-AS 65001. When router R2 advertises information to another AS that is not part of the confederation, all AS information used by the confederation internally is removed and the confederation ID is prepended.
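The AS-path handling described here can be sketched in Python by modelling the AS path as a list of (segment type, AS numbers) pairs. This is a simplified illustration, not a wire-format implementation. The function names, the external AS 64500 and the sub-AS 65002 assigned to R1 are assumptions made for the example; only the confederation ID 199 and sub-AS 65001 come from the text.

```python
AS_SEQUENCE = "AS_SEQUENCE"
AS_CONFED_SEQUENCE = "AS_CONFED_SEQUENCE"
AS_CONFED_SET = "AS_CONFED_SET"
CONFED_TYPES = {AS_CONFED_SEQUENCE, AS_CONFED_SET}

Segment = tuple[str, list[int]]


def prepend(path: list[Segment], seg_type: str, asn: int) -> list[Segment]:
    """Prepend asn, extending the leading segment if it has the same type."""
    if path and path[0][0] == seg_type:
        return [(seg_type, [asn] + path[0][1])] + path[1:]
    return [(seg_type, [asn])] + path


def send_over_cbgp(path: list[Segment], my_sub_as: int) -> list[Segment]:
    """Towards another sub-AS: prepend the sub-AS as AS_CONFED_SEQUENCE."""
    return prepend(path, AS_CONFED_SEQUENCE, my_sub_as)


def send_over_ebgp(path: list[Segment], confed_id: int) -> list[Segment]:
    """Towards an outside AS: strip confederation segments, prepend the ID."""
    cleaned = [seg for seg in path if seg[0] not in CONFED_TYPES]
    return prepend(cleaned, AS_SEQUENCE, confed_id)


def has_loop(path: list[Segment], my_as: int) -> bool:
    """True if my_as already appears in any segment of the path."""
    return any(my_as in asns for _, asns in path)


learned = [(AS_SEQUENCE, [64500])]         # hypothetical external route
inside = send_over_cbgp(learned, 65002)    # R1 (assumed sub-AS 65002) to sub-AS 65001
outside = send_over_ebgp(inside, 199)      # R2 to an AS outside the confederation

print(inside)    # [('AS_CONFED_SEQUENCE', [65002]), ('AS_SEQUENCE', [64500])]
print(outside)   # [('AS_SEQUENCE', [199, 64500])]
print(has_loop(inside, 65002))  # True: sub-AS 65002 would ignore this route
```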
Apart from the different use of the AS-path attribute, there are a few more differences between a cBGP and an EBGP session. They concern the next-hop, MED and local-preference attributes. The next-hop and MED attributes remain unchanged throughout the entire confederation after being learned from an EBGP peer; they are not updated when they are advertised across a cBGP session. The local-preference attribute is preserved as well: once it is set, anywhere, its value is propagated unchanged throughout the entire BGP confederation.
Take the following example:
The example above shows R3 sending a prefix to R2. Because this exchange takes place across an EBGP session, the next-hop attribute is updated. R2 receives a MED value from R3 and, following the default rules, leaves that attribute intact. R2 also has an inbound policy configured that sets the local-preference attribute to 200.
R2 will advertise the routing information throughout the sub-AS and eventually, the routing information will be passed over to R1. This exchange will happen across a cBGP session. When information is sent across this cBGP session, the next-hop, the MED and the local preference all remain the same.
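The contrast between the two session types can be summarised in a short Python sketch. It models default behavior only, without any outbound policy. The `Attributes` class, the function names, the address 192.0.2.3 and the MED value 50 are hypothetical; the local-preference value 200 comes from the example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Attributes:
    next_hop: str
    med: Optional[int]
    local_pref: Optional[int]


def inbound_policy(attrs: Attributes, local_pref: int) -> Attributes:
    """Model R2's inbound policy: set local preference on a received route."""
    return replace(attrs, local_pref=local_pref)


def advertise_over_cbgp(attrs: Attributes) -> Attributes:
    """Across a cBGP session next-hop, MED and local preference are kept."""
    return attrs


def advertise_over_ebgp(attrs: Attributes, my_address: str) -> Attributes:
    """For contrast, a regular EBGP session by default: the next-hop is
    rewritten, local preference is not sent, and a MED learned from one
    neighboring AS is not passed on to another."""
    return Attributes(next_hop=my_address, med=None, local_pref=None)


from_r3 = Attributes(next_hop="192.0.2.3", med=50, local_pref=None)
at_r2 = inbound_policy(from_r3, 200)
at_r1 = advertise_over_cbgp(at_r2)

print(at_r1 == at_r2)  # True: nothing changed across the cBGP session
print(advertise_over_ebgp(at_r2, "198.51.100.2"))
```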
Combining BGP Route Reflection and BGP confederations.
The two scaling mechanisms, route reflection and BGP confederation, are not mutually exclusive. You can use both at the same time. If a sub-AS contains a lot of routers, it is possible to create a route reflector cluster for that particular sub-AS. This could look something like this:
It is also possible to combine the BGP confederation with route reflection while retaining the possibility to use BGP for multiple address families.
I suppose that most operators' networks will use a hierarchy of route reflector clusters. This is because there is quite some pain involved in deploying a BGP confederation: migrating to or from a confederation usually requires a complete BGP shutdown. With a hierarchy of route reflector clusters, any ISP network can grow to hundreds and hundreds of routers.