When adding a new ACI Fabric as a DCI Fabric, the question often comes up whether the new fabric should be added as a MultiPod Fabric or a MultiSite Fabric.
Though MultiPod is still a perfectly valid way to add a new fabric, MultiSite has many benefits. To name a few:
- Totally Separate Fabrics, so totally Separate Fault Domains
- No need to worry about APIC Placement
- Supports much larger distances than MultiPod
However, prior to the 4.2 release, some customers were forced to implement MultiPod instead of MultiSite for the following reasons:
- Needed Transit Routing between L3Outs of different Sites
- Needed the L3Out of one Site to be backed up by the L3Outs of other Sites
- vPOD does not work with MultiSite (it will work in a lab as long as MultiSite is configured first; however, any modification at the Pod level, such as bringing up a new Pod or deleting an existing Pod, will fail)
- Needed a Fabric Integrated stretched VMM domain
The vPOD and Fabric Integrated stretched VMM domain cases are still valid reasons for choosing ACI MultiPod over ACI MultiSite. The VMM domain case is really not a showstopper, because from vSphere 6.0 onwards you can still do vMotion between different VMM Domains. Please see: “A basic question about VMM domains, Multipod and Multisite“
As for the first 2 items, “Transit Routing between L3Outs” and “L3Outs of different Sites backing up each other”, both are now possible from ACI 4.2 onwards using MSO 2.2 and higher.
From my observation, the main reason people could not move to MultiSite before the ACI 4.2 release is that they had a requirement to bring IP-based IBM Mainframe/FEP connectivity into ACI. These IBM Mainframe/FEPs generally speak IP with OSPF, and the Mainframes needed to talk to each other across different Sites. In other words, Transit Routing between different Fabrics/Sites was a hard requirement.
How To Configure InterSite L3Out with ACI Fabric
The configuration is really straight forward and just involves a few steps.
- Configure Routable TEP Pools for the Fabrics that will have the Shared L3Outs. In case of a Transit L3Out requirement, that means configuring a Routable TEP Pool on every Fabric that has the requirement.
- Use MSO to configure the objects and apply the appropriate contracts
Below I will show a quick example of how to do the configuration and how it works under the hood. I will also show how to enable HBR (Host Based Routing) to achieve ingress path optimization, which is often a requirement for DCI Fabrics.
Step 1: Configure the Routable TEP Pools as needed. In my example, I will only do transit routing between 2 Fabrics, so I just put the Routable TEP Pools in those 2 Fabrics. Note that Site 1 also happens to be a MultiPod Fabric with 2 Pods, so I need to put a separate Routable TEP Pool on each of the Pods. These configurations are done from the Site/Configure Infra level of MSO.
Now all you have to do is go to your MSO and configure the required Schema/Template objects. In the example below, we configure the Schema/Templates based on the diagram below. I’m not going to show the step-by-step config in MSO, because by now everyone is very familiar with that.
Main Things to Note in the diagram above:
- VRF is stretched
- Please pay attention to the L3Out External EPG Prefix Scopes. For an explanation please read “Understanding Scope of Prefixes in L3 Out External EPG in ACI“
- In the Transit Only Case, no BD/EPG is needed
- In my setup, the L3Out of Site 1 goes to an N3K router in a VRF where I put an IP address of 188.8.131.52/24, and the Site 2 L3Out goes to the same N3K in another VRF, which has an IP of 184.108.40.206/24. I’m doing this for testing, since I don’t have access to an IBM Mainframe/FEP
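Since both test VRFs live on one N3K, the lab side can be sketched as below. The interface names, VRF names, and the documentation prefixes 192.0.2.0/24 and 198.51.100.0/24 are all stand-ins for my actual setup, not values from the fabric:

```
! N3K test box - one VRF per simulated Mainframe/FEP side (all names/prefixes are examples)
vrf context SITE1-TEST
vrf context SITE2-TEST
!
interface Ethernet1/1            ! link toward the Site 1 border leaf
  vrf member SITE1-TEST
  ip address 192.0.2.1/24
  ip router ospf 1 area 0.0.0.0
!
interface Ethernet1/2            ! link toward the Site 2 border leaf
  vrf member SITE2-TEST
  ip address 198.51.100.1/24
  ip router ospf 1 area 0.0.0.0
!
router ospf 1
  vrf SITE1-TEST
  vrf SITE2-TEST
```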
That’s it !
Let’s test from the N3K peering router of Site 1.
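Assuming the test VRFs on the N3K are named SITE1-TEST and SITE2-TEST (names are my own, not from the fabric), the checks would look something like:

```
! From the VRF peering with Site 1: Site 2's prefixes should arrive as OSPF external routes
show ip route ospf-1 vrf SITE1-TEST
! End-to-end transit check across both fabrics
ping <site2-side-address> vrf SITE1-TEST
```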
How Does it Work Under the Covers?
Let’s look at Site 1’s Border Leaf. You will notice that 220.127.116.11/24 of Site 2 is reachable through the next hop of 10.6.0.232, which happens to be the Routable TEP IP assigned to Site 2’s border leaf. When you configured the Routable TEP Pool for Site 2, an IP from that pool was assigned to the Border Leaf of Site 2 automatically.
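One way to check this on the Site 1 border leaf CLI (the tenant and VRF names are placeholders for your environment):

```
# ACI border leaf - the Site 2 prefix should resolve to the remote Routable TEP
show ip route vrf <Tenant>:<VRF>
```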
Now, looking from the Spine of Site 1, we observe that BGP VPNv4 address-family peerings have been formed with the Spines of Site 2. Recall that before this feature, all address-family neighbors with the neighboring spines were BGP L2VPN EVPN peers. Now, in addition to those, VPNv4 BGP peerings are also formed.
We also observe that the BGP VPNv4 address family on the Spine in Site 1 shows the next hop to reach 18.104.22.168/24 as 10.6.0.232, which is the Routable TEP IP of the Border Leaf in Site 2.
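These peerings and next hops can be inspected from the spine CLI; the infra routing lives in VRF overlay-1 (the prefix below is a placeholder):

```
# Site 1 spine - VPNv4 sessions toward the Site 2 spines
show bgp vpnv4 unicast summary vrf overlay-1
# Per-prefix detail: the next hop should be the Site 2 BL's Routable TEP
show bgp vpnv4 unicast <prefix> vrf overlay-1
```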
Looking from the Spine of Site 2, we notice that 22.214.171.124/24 shows a next hop of 10.21.120.64, and not the Routable TEP IP of 10.6.0.232. 10.21.120.64 happens to be the internal TEP IP of the Border Leaf of Site 2.
So, how does the Spine of Site 1 believe that the next hop for 126.96.36.199/24 is the Routable TEP IP of the BL of Site 2, while the Spine of Site 2 knows that it’s the actual internal TEP IP of the BL in Site 2?
The answer, of course, is that the Spine of Site 2 applies a route-map when advertising VPNv4 prefixes to the Spine of Site 1, as you can see below.
Lastly, let’s verify that the BL of Site 2 does indeed have both the Internal TEP IP and the Routable TEP IP. It does, as you can see below.
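On the border leaf itself, both addresses show up as loopback interfaces in the infra VRF, so a simple interface listing is enough to confirm:

```
# Site 2 border leaf - the internal TEP and the Routable TEP both appear as loopbacks
show ip interface vrf overlay-1
```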
Deploying HBR (Host Based Routing).
Ingress path optimization is very often a requirement for DCI Fabrics. In the past this could get quite complicated and required complex technologies such as LISP. With ACI, you can enable HBR with just one click!
Egress path optimization is rather simple and can be done on the external peering router or on the ACI border leaf. For example, with OSPF you can change the link cost, or use a route-map to change OSPF external type 2 to external type 1.
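As a sketch of the route-map approach on an NX-OS peering router (the process ID, AS number, and route-map name are examples, not from my setup): with external type-1, downstream routers add their internal cost toward the ASBR to the metric, so each site’s closer exit wins.

```
! Example only: advertise redistributed routes as OSPF external type-1
route-map SET-TYPE1 permit 10
  set metric-type type-1
!
router ospf 1
  redistribute bgp 65001 route-map SET-TYPE1
```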
For our example, let’s use the same topology, create the BD/EPG, and do a VMM binding on Site 2. Then bring up an endpoint in Site 2 and assign it the IP 10.10.10.100/24.
Deploying Host Based Routing is just a one-click operation. In MSO, go to the Site Local config for the BD and turn it on for the desired sites, as shown below.
In my case, I turned on HBR at the site-local level for the BD on both Site 1 and Site 2. Since I have a host of 10.10.10.100 in Site 2, I can see:
- on the N3K peering side of Site 2, 10.10.10.100/32 shows up in addition to 10.10.10.0/24
- on the N3K peering side of Site 1, there are no hosts, so only 10.10.10.0/24 shows up in the routing table
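The ingress optimization works because longest-prefix match prefers the /32 host route over the /24 aggregate: traffic for 10.10.10.100 enters through Site 2, while the rest of the subnet still follows the /24. A small Python illustration of that selection logic (the site labels are just for the example):

```python
import ipaddress

# Routes as a remote router sees them after HBR is enabled:
# the /24 aggregate from both sites, plus the /32 host route from Site 2 only.
routes = {
    ipaddress.ip_network("10.10.10.0/24"): "Site 1 or Site 2",
    ipaddress.ip_network("10.10.10.100/32"): "Site 2",
}

def lookup(dst: str) -> str:
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in routes if addr in net),
               key=lambda net: net.prefixlen)
    return routes[best]

print(lookup("10.10.10.100"))  # -> Site 2 (the /32 host route wins)
print(lookup("10.10.10.50"))   # -> Site 1 or Site 2 (only the /24 matches)
```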
Note: By just turning on HBR:
- The Aggregate Route shows up in the peering router
- The Host Route shows up in the peering router
There have been times where customers wanted to just see the HBR advertisement without the aggregate route for some particular scenarios.
This can easily be achieved by deploying a simple route-map on the L3Out.
In this example, both the aggregate and the host route show up in the peering router of Site 2, as you can see in Figure 14 above. Let’s say I did not want to see the aggregate route of 10.10.10.0/24, but only the 10.10.10.100/32 host route.
To do this, we first create 2 route-map Match Rules (prefix-lists) in the Tenant space from the APIC:
- prefix-list: match10: match 10.10.10.0/24
- prefix-list: permitAll: match 0.0.0.0/0 with Aggregate = True
Next, apply a default-export Route Map with those 2 prefix-list matches to the L3Out, as shown below:
- route-map Deny10.10.10.0 matching prefix-list match10 with action of deny, sequence order 0
- route-map permitAll matching prefix-list permitAll with action of permit, sequence order 1
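On the border leaf, ACI renders this policy into NX-OS-style configuration roughly like the following. The route-map and prefix-list names are auto-generated by ACI and will differ; the `le 32` on the catch-all entry is what the Aggregate = True flag produces:

```
ip prefix-list match10 seq 5 permit 10.10.10.0/24
ip prefix-list permitAll seq 5 permit 0.0.0.0/0 le 32
!
route-map <default-export-map> deny 0        ! sequence 0: block the /24 aggregate
  match ip address prefix-list match10
route-map <default-export-map> permit 1      ! sequence 1: allow everything else, incl. /32 host routes
  match ip address prefix-list permitAll
```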
Let’s check the routing table on the N3K side of the Site 2 BL. You will notice that we now see just the 10.10.10.100/32 route; 10.10.10.0/24 is no longer showing!