Questions:
Q1: In a Multipod scenario, I have two L3Outs with OSPF for the same Tenant/VRF. Why does a packet going to an external prefix always prefer the locally connected L3Out of its POD?
Q2: How can I influence particular prefixes to go through the L3Out of another POD?

Answer to Q1:
On POD2, let's look at the prefix 41.41.41.41/32, which is learnt from outside in VRF CDC.
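For example, all three BGP paths can be seen with the command below (a sketch: Leaf-301 and <Tenant> are placeholders, since the switch CLI renders the VRF as Tenant:VRF):
Leaf-301# show bgp vpnv4 unicast 41.41.41.41/32 vrf <Tenant>:CDC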
You will notice that we have 3 paths for the prefix 41.41.41.41/32.
10.1.240.34 is the border leaf on POD2
10.0.200.67 is Border Leaf 1 on POD1 (OSPF peering on POD1 is done over a vPC)
10.0.200.64 is Border Leaf 2 on POD1 (OSPF peering on POD1 is done over a vPC)
So, you see that the preferred route for 41.41.41.41/32 is via the local POD's (POD2) L3Out.

Let's check who the next hops are:
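One way to map each next-hop TEP address to its fabric node is acidiag fnvread, the same command we use later in this post (the leaf name is a placeholder):
Leaf-301# acidiag fnvread | grep 10.0.200.67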

Let's check the cost to the next hop:
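The cost in question is the IS-IS metric to the next-hop TEP in the infra VRF (overlay-1). A quick way to check it, again with a placeholder leaf name:
Leaf-301# show ip route 10.1.240.34 vrf overlay-1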

Notice that the IGP cost to the local border leaf is 3.

Now, let’s look at the Best Path Selection:
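Each path in the show bgp output carries an "is best path" or "not best reason" field, so a quick filter such as the sketch below surfaces the decision (a full sample output appears later in this post):
Leaf-301# show bgp vpnv4 unicast 41.41.41.41/32 vrf <Tenant>:CDC | grep best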

So, you can see what's happening: the route goes through the BGP best-path selection algorithm, and the tie is finally broken at item #8 (lowest IGP metric).
- Weight: not set = 0 on all paths
- Local Preference: 100 on all paths
- Locally originated/aggregate: not applicable
- Shortest AS path: not applicable, both learnt from OSPF on the border leaves
- Lowest origin type: ? = incomplete on all paths
- Lowest MED: equal on all paths (104)
- eBGP over iBGP (admin distance 20 vs. 200): both paths are iBGP
- Lowest IGP metric to the next hop: 3 vs. 64, so 3 is chosen
Answer to Q2:
If you want to influence the exit point, you can configure a “Route Profile for Interleak” at the L3Out level and use it to set the MED. A lower MED is preferred.
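Under the hood, the interleak profile renders as a route-map applied at the OSPF-to-BGP redistribution point on the border leaf. Conceptually it is equivalent to this classic NX-OS snippet (the route-map name is illustrative; ACI generates its own imp-ctx-* names, as we verify further down):
route-map INTERLEAK-SET-MED permit 10
  set metric 10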
Let’s take a look at this with an example:
We have the following topology:
Pod1 has one L3Out and Pod2 has 2 L3Outs.
10.10.10.10/32 is being learnt from all three L3Outs via OSPF.
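To confirm the OSPF adjacency toward the external router on each border leaf, something like the following can be used (the leaf name is a placeholder; the VRF is shown as it is rendered on the leaf):
Leaf-303# show ip ospf neighbors vrf SM-McastRecreate:RCT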

If we did not manipulate any BGP attributes, you would see that in each POD 10.10.10.10/32 is preferred through that POD's local L3Out.
Looking at Compute Leaf-101 on POD1:
Leaf-101# show ip route vrf SM-McastRecreate:RCT | grep -A 2 10.10.10.10
10.10.10.10/32, ubest/mbest: 1/0
*via 10.0.200.64%overlay-1, [200/42], 00:01:29, bgp-12121, internal, tag 12121
20.20.20.0/24, ubest/mbest: 1/0, attached, direct, pervasive
Looking at Compute Leaf-301 on POD2:
Leaf-301# show ip route vrf SM-McastRecreate:RCT | grep -A 2 10.10.10.10
10.10.10.10/32, ubest/mbest: 2/0
*via 10.1.208.64%overlay-1, [200/42], 00:05:08, bgp-12121, internal, tag 12121
*via 10.1.176.64%overlay-1, [200/42], 00:05:08, bgp-12121, internal, tag 12121
So, if you look at the above, you will see:
- POD1 is choosing its local L3Out and POD2 is choosing its local L3Out to reach 10.10.10.10/32.
- The prefix 10.10.10.10/32 is learnt via iBGP, hence the admin distance of 200. The MED is 42 because we have associated an OSPF interface policy with every L3Out, changing the link cost to 41, as shown below.
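You can confirm where the 42 comes from on a border leaf with a quick route lookup (sketch below, placeholder leaf name): the OSPF route's metric is what gets copied into the BGP MED at redistribution, unless an interleak policy overrides it.
Leaf-303# show ip route 10.10.10.10/32 vrf SM-McastRecreate:RCT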

Now we will add an interleak route policy and decrease the BGP MED on BL-303 to a number lower than 42. That way the BL-303 L3Out will be chosen regardless of which POD the endpoint resides in.
The interleak route map is applied on the L3Out, as you can see below.

The interleak route policy is defined under the External Routed Networks / Route Maps/Profiles section.
In this policy we match on everything (all prefixes) and set the MED to 10, as you can see below:

To check that the route-map has been applied on Leaf-303 on POD2, we do the following:
Leaf-303# show bgp process vrf SM-McastRecreate:RCT | grep route-map
direct, route-map permit-all
static, route-map imp-ctx-bgp-st-interleak-2162695
ospf, route-map imp-ctx-proto-interleak-2162695
direct, route-map permit-all
static, route-map imp-ctx-bgp-st-interleak-2162695
Leaf-303# show route-map imp-ctx-proto-interleak-2162695
route-map imp-ctx-proto-interleak-2162695, permit, sequence 201
Match clauses:
Set clauses:
metric 10
The above verifies that the MED is set to 10 when redistributing prefixes from OSPF into the VPNv4 address family.
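As an extra sanity check, you can look at the prefix on BL-303 itself and confirm the resulting metric before checking a remote leaf (command sketch):
Leaf-303# show bgp vpnv4 unicast 10.10.10.10/32 vrf SM-McastRecreate:RCT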
Now, looking at the prefix 10.10.10.10/32 from compute Leaf-101 on POD1, we see that the next hop is 10.1.176.64, which is Leaf-303 on POD2.
Also, notice that the MED is 10 and the route is learnt through iBGP (admin distance 200):
Leaf-101# show ip route vrf SM-McastRecreate:RCT | grep -A 2 10.10.10.10
10.10.10.10/32, ubest/mbest: 1/0
*via 10.1.176.64%overlay-1, [200/10], 00:14:34, bgp-12121, internal, tag 12121
20.20.20.0/24, ubest/mbest: 1/0, attached, direct, pervasive
Leaf-101# acidiag fnvread | grep 10.1.176.64
303 2 Leaf-303 FDO21110KVD 10.1.176.64/32 leaf active 0
Looking at the details of the best-path selection below, we see that the Leaf-303 L3Out path is selected because it has a lower MED than the others.
Leaf-101# show bgp vpnv4 unicast 10.10.10.10/32 vrf SM-McastRecreate:RCT
BGP routing table information for VRF overlay-1, address family VPNv4 Unicast
Route Distinguisher: 10.0.200.67:35 (VRF SM-McastRecreate:RCT)
BGP routing table entry for 10.10.10.10/32, version 27 dest ptr 0xaa448c50
Paths: (3 available, best #3)
Flags: (0x08001a 00000000) on xmit-list, is in urib, is best urib route
vpn: version 817, (0x100002) on xmit-list
Multipath: eBGP iBGP
VPN AF advertised path-id 3
Path type: internal 0xc0000018 0x40 ref 56506, path is valid, not best reason: MED
Imported from 10.0.200.64:35:10.10.10.10/32
AS-Path: NONE, path sourced internal to AS
10.0.200.64 (metric 3) from 10.0.200.65 (10.149.194.101)
Origin incomplete, MED 42, localpref 100, weight 0
Received label 0
Received path-id 1
Extcommunity:
RT:12121:2162695
VNID:2162695
COST:pre-bestpath:162:110
Originator: 10.0.200.64 Cluster list: 10.149.194.101
VPN AF advertised path-id 2
Path type: internal 0xc0000018 0x40 ref 56506, path is valid, not best reason: NH metric
Imported from 10.1.208.64:1:10.10.10.10/32
AS-Path: NONE, path sourced internal to AS
10.1.208.64 (metric 64) from 10.0.200.65 (10.149.194.101)
Origin incomplete, MED 42, localpref 100, weight 0
Received label 0
Received path-id 2
Extcommunity:
RT:12121:2162695
COST:pre-bestpath:165:2415919104
VNID:2162695
COST:pre-bestpath:162:110
Originator: 10.1.208.64 Cluster list: 10.149.194.101 10.149.195.254
Advertised path-id 1, VPN AF advertised path-id 1
Path type: internal 0xc0000018 0x40 ref 56506, path is valid, is best path
Imported from 10.1.176.64:4:10.10.10.10/32
AS-Path: NONE, path sourced internal to AS
10.1.176.64 (metric 64) from 10.0.200.65 (10.149.194.101)
Origin incomplete, MED 10, localpref 100, weight 0
Received label 0
Received path-id 2
Extcommunity:
RT:12121:2162695
COST:pre-bestpath:165:2415919104
VNID:2162695
COST:pre-bestpath:162:110
Originator: 10.1.176.64 Cluster list: 10.149.194.101 10.149.195.254
VRF advertise information:
Path-id 1 not advertised to any peer
VPN AF advertise information:
Path-id 1 not advertised to any peer
Path-id 2 not advertised to any peer
Path-id 3 not advertised to any peer
Conclusion:
ACI L3Out route maps can be manipulated to obtain the desired results. This is no different from the regular routing techniques that we have been using for decades.
We influenced path selection by changing the BGP MED, which is #6 in the BGP best-path selection process. By doing this, the decision is made before ever reaching “Lowest IGP metric”, which is #8 in the list:
- Weight: not set = 0 on all paths
- Local Preference: 100 on all paths
- Locally originated/aggregate: not applicable
- Shortest AS path: not applicable, both learnt from OSPF on the border leaves
- Lowest origin type: ? = incomplete on all paths
- Lowest MED: 10 vs. 42, so the BL-303 path with MED 10 wins and selection stops here
- eBGP over iBGP: not evaluated
- Lowest IGP metric: not evaluated (this is where selection ended without the interleak policy, 3 vs. 64)
Hi,
I migrated from Dual Fabric to Multipod, and each POD has its own subnets and its own L3Out, as shown below:
POD-1
1. POD-1 has its own subnets
2. POD-1 has its own L3Out, running OSPF with the FW
POD-2
1. POD-2 has its own subnets
2. POD-2 has its own L3Out, running OSPF with the FW
What I want to achieve: is there a way to attach both L3Outs (POD-1 & POD-2) under all BDs so that the local L3Out is always preferred over the other L3Out? For example:
POD-2 BD: always prefer the POD-2 L3Out, but if there is a failure on the POD-2 FW, fail over to the POD-1 L3Out.
Thanks
Mohammed, so sorry for the late reply. First of all, congratulations on moving over from Dual Fabric to Multipod! Way to go!!! The behavior you ask about is the default behavior in Multipod. If a BD has multiple L3Outs from different PODs, it will always choose its own POD's L3Out first. If the prefixes are not learnt from that L3Out (for whatever reason), then it will use another POD's L3Out. Of course, you can always manipulate BGP attributes to do whatever you desire. Nothing different from the normal routing that we are used to.
Hi Soumitra,
The challenge, or my concern here: when I do a vMotion from POD-1 to POD-2, I want this VM to keep using the POD-1 L3Out, not the local L3Out (the POD-2 L3Out).
How can I achieve it?
Regards,