Common BGP Multi-Pod Questions That Come Up from Customers

Questions:

Q1: In a Multi-Pod scenario, I have two L3Outs with OSPF for the same Tenant/VRF. Why does a packet going to an external prefix always prefer the locally connected L3Out of a Pod?

Q2: How can I influence particular prefixes to go through the L3Out of another Pod?

Answer to Q1:

On POD2, let's look at prefix 41.41.41.41/32, which is learned from outside in VRF CDC.
You will notice that we have three paths for the prefix 41.41.41.41/32:
10.1.240.34 is the border leaf on POD2
10.0.200.67 is Border Leaf 1 on POD1 (OSPF peering on POD1 is done over vPC)
10.0.200.64 is Border Leaf 2 on POD1 (OSPF peering on POD1 is done over vPC)

So you see that the preferred route for 41.41.41.41/32 is via the local Pod's (POD2) L3Out.

Let's check who the next hops are:

Let's check the cost to the next hop:

Notice that the IGP cost is 3.

Now, let's look at the best-path selection:

So what's happening is that the route goes through the BGP best-path selection algorithm and the decision is made at item #8:

  1. Weight: not set = 0
  2. Local Preference: 100 on both
  3. Aggregate: not applicable
  4. Shortest AS Path: not applicable; both learned from OSPF on the border leaf
  5. Lowest Origin type: ? = incomplete/unknown on both
  6. Lowest MED: not set; default = 104 on both
  7. Admin Distance, eBGP (20) over iBGP (200): both iBGP
  8. Lowest IGP metric: 3 vs. 64; 3 is chosen
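
The tie-break above can be sketched in a few lines of Python. This is an illustrative model only (the attribute names and values are taken from this write-up, not from any ACI/NX-OS API); with weight, local preference, origin, and MED equal, the comparison falls through to the IGP metric to the next hop:

```python
def best_path(paths):
    """Pick the preferred path. min() walks the tie-breakers in order:
    highest weight, highest local-pref, lowest origin (0=IGP, 1=EGP,
    2=incomplete), lowest MED, then lowest IGP metric to the next hop."""
    return min(
        paths,
        key=lambda p: (
            -p["weight"],        # 1. highest weight
            -p["local_pref"],    # 2. highest local preference
            p["origin"],         # 5. lowest origin type
            p["med"],            # 6. lowest MED
            p["igp_metric"],     # 8. lowest IGP metric (both paths are iBGP,
        ),                       #    so step 7 never differentiates them)
    )

# Two of the three paths seen on POD2 for 41.41.41.41/32:
local_pod = {"next_hop": "10.1.240.34", "weight": 0, "local_pref": 100,
             "origin": 2, "med": 104, "igp_metric": 3}
remote_pod = {"next_hop": "10.0.200.67", "weight": 0, "local_pref": 100,
              "origin": 2, "med": 104, "igp_metric": 64}

print(best_path([local_pod, remote_pod])["next_hop"])  # → 10.1.240.34
```

Everything is equal until step 8, so the local Pod's border leaf (IGP metric 3) wins over the remote Pod's (metric 64).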

Answer to Q2:

If you want to influence the exit point, you can configure a "Route Profile for Interleak" at the L3Out level to set the MED. The lower MED is preferred.

Let’s take a look at this with an example:

We have the following topology:

Pod1 has one L3Out and Pod2 has two L3Outs.
10.10.10.10/32 is learned from all L3Outs via OSPF.

If we did not manipulate any BGP attributes, you would see that in each Pod 10.10.10.10/32 is preferred through that Pod's local L3Out.

Looking at Compute Leaf-101 on POD1:

Leaf-101# show ip route vrf SM-McastRecreate:RCT | grep -A 2 10.10.10.10
10.10.10.10/32, ubest/mbest: 1/0
*via 10.0.200.64%overlay-1, [200/42], 00:01:29, bgp-12121, internal, tag 12121
20.20.20.0/24, ubest/mbest: 1/0, attached, direct, pervasive

Looking at Compute Leaf-301 on POD2:

Leaf-301# show ip route vrf SM-McastRecreate:RCT | grep -A 2 10.10.10.10
10.10.10.10/32, ubest/mbest: 2/0
*via 10.1.208.64%overlay-1, [200/42], 00:05:08, bgp-12121, internal, tag 12121
*via 10.1.176.64%overlay-1, [200/42], 00:05:08, bgp-12121, internal, tag 12121


So, if you look at the above, you will see:

  • POD1 is choosing its local L3Out and POD2 is choosing its local L3Out to reach 10.10.10.10/32.
  • The prefix 10.10.10.10/32 is learned via iBGP, hence the admin distance of 200. The MED is 42 because we have associated an OSPF interface policy with every L3Out in which we changed the link cost to 41; the resulting OSPF route metric (42) is carried into BGP as the MED on redistribution.

Now we will add an interleak route policy on BL-303 and decrease its BGP MED to a number lower than 42. That way BL-303's L3Out will be chosen regardless of which Pod the endpoint resides in.

The interleak route map is applied on the L3Out.

The interleak route policy is defined under the External Routed Networks / Route Maps/Profiles section.

In this policy we match on everything (all prefixes) and set the MED to 10.
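
The match-all/set-metric behavior of that route map can be modeled as a tiny function. This is a sketch only; the function and field names are hypothetical, not an APIC or NX-OS API:

```python
def apply_interleak(route):
    """Model of the interleak route map: the match clause is empty
    (so every prefix matches) and the set clause sets metric 10,
    which becomes the BGP MED on redistribution."""
    redistributed = dict(route)   # leave the original OSPF route untouched
    redistributed["med"] = 10     # 'set metric 10'
    return redistributed

ospf_route = {"prefix": "10.10.10.10/32", "med": 42}
print(apply_interleak(ospf_route))  # {'prefix': '10.10.10.10/32', 'med': 10}
```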

To check that the route map has been applied on Leaf-303 in POD2, we do the following:

Leaf-303# show bgp process vrf SM-McastRecreate:RCT | grep route-map
        direct, route-map permit-all
        static, route-map imp-ctx-bgp-st-interleak-2162695
        ospf, route-map imp-ctx-proto-interleak-2162695
        direct, route-map permit-all
        static, route-map imp-ctx-bgp-st-interleak-2162695

Leaf-303#  show route-map imp-ctx-proto-interleak-2162695           
route-map imp-ctx-proto-interleak-2162695, permit, sequence 201
  Match clauses:
  Set clauses:
    metric 10

The above verifies that the MED is set to 10 when redistributing prefixes from OSPF into the VPNv4 address family.

Now, looking at the prefix 10.10.10.10/32 from compute Leaf-101 on POD1, we see that the next hop is 10.1.176.64, which is Leaf-303 in POD2.

Also, notice that the MED is 10 and the route is learned through iBGP (admin distance 200):

Leaf-101# show ip route vrf SM-McastRecreate:RCT | grep -A 2 10.10.10.10
10.10.10.10/32, ubest/mbest: 1/0
    *via 10.1.176.64%overlay-1, [200/10], 00:14:34, bgp-12121, internal, tag 12121
20.20.20.0/24, ubest/mbest: 1/0, attached, direct, pervasive

Leaf-101# acidiag fnvread | grep 10.1.176.64
     303        2             Leaf-303      FDO21110KVD     10.1.176.64/32    leaf         active   0

The detailed best-path output below shows that the path through the Leaf-303 L3Out is selected because it has a lower MED than the others:

Leaf-101# show bgp vpnv4 unicast 10.10.10.10/32 vrf SM-McastRecreate:RCT
BGP routing table information for VRF overlay-1, address family VPNv4 Unicast
Route Distinguisher: 10.0.200.67:35    (VRF SM-McastRecreate:RCT)
BGP routing table entry for 10.10.10.10/32, version 27 dest ptr 0xaa448c50
Paths: (3 available, best #3)
Flags: (0x08001a 00000000) on xmit-list, is in urib, is best urib route
  vpn: version 817, (0x100002) on xmit-list
Multipath: eBGP iBGP

VPN AF advertised path-id 3
  Path type: internal 0xc0000018 0x40 ref 56506, path is valid, not best reason: MED
             Imported from 10.0.200.64:35:10.10.10.10/32
  AS-Path: NONE, path sourced internal to AS
    10.0.200.64 (metric 3) from 10.0.200.65 (10.149.194.101)
      Origin incomplete, MED 42, localpref 100, weight 0
      Received label 0
      Received path-id 1
      Extcommunity:
          RT:12121:2162695
          VNID:2162695
          COST:pre-bestpath:162:110
      Originator: 10.0.200.64 Cluster list: 10.149.194.101

VPN AF advertised path-id 2
  Path type: internal 0xc0000018 0x40 ref 56506, path is valid, not best reason: NH metric
             Imported from 10.1.208.64:1:10.10.10.10/32
  AS-Path: NONE, path sourced internal to AS
    10.1.208.64 (metric 64) from 10.0.200.65 (10.149.194.101)
      Origin incomplete, MED 42, localpref 100, weight 0
      Received label 0
      Received path-id 2
      Extcommunity:
          RT:12121:2162695
          COST:pre-bestpath:165:2415919104
          VNID:2162695
          COST:pre-bestpath:162:110
      Originator: 10.1.208.64 Cluster list: 10.149.194.101 10.149.195.254

Advertised path-id 1, VPN AF advertised path-id 1
  Path type: internal 0xc0000018 0x40 ref 56506, path is valid, is best path
             Imported from 10.1.176.64:4:10.10.10.10/32
  AS-Path: NONE, path sourced internal to AS
    10.1.176.64 (metric 64) from 10.0.200.65 (10.149.194.101)
      Origin incomplete, MED 10, localpref 100, weight 0
      Received label 0
      Received path-id 2
      Extcommunity:
          RT:12121:2162695
          COST:pre-bestpath:165:2415919104
          VNID:2162695
          COST:pre-bestpath:162:110
      Originator: 10.1.176.64 Cluster list: 10.149.194.101 10.149.195.254

VRF advertise information:
Path-id 1 not advertised to any peer

VPN AF advertise information:
Path-id 1 not advertised to any peer
Path-id 2 not advertised to any peer
Path-id 3 not advertised to any peer

Conclusion:  

ACI L3Out route maps can be manipulated to obtain the desired results. This is no different from the regular routing techniques we have been using for decades.

We influenced path selection by changing the BGP MED, which is #6 in the BGP best-path selection process. By doing this, the decision is made before "Lowest IGP metric", which is #8 in the list, is ever consulted:

  1. Weight: not set = 0
  2. Local Preference: 100 on both
  3. Aggregate: not applicable
  4. Shortest AS Path: not applicable; both learned from OSPF on the border leaf
  5. Lowest Origin type: ? = incomplete/unknown on both
  6. Lowest MED: 10 vs. 42; 10 is chosen, and selection stops here
  7. Admin Distance, eBGP (20) over iBGP (200): not reached (both iBGP)
  8. Lowest IGP metric: not reached
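
The before/after effect can be summarized with a minimal comparator (illustrative only; path values are taken from this write-up, and only the MED and IGP-metric steps are modeled since the earlier tie-breakers are equal throughout):

```python
def select(paths):
    """Compare MED first (step #6), then IGP metric to the next hop (#8)."""
    return min(paths, key=lambda p: (p["med"], p["igp_metric"]))

# Before the interleak policy: MEDs are equal, so IGP metric decides.
before = [{"l3out": "BL-303", "med": 42, "igp_metric": 64},
          {"l3out": "local",  "med": 42, "igp_metric": 3}]

# After the policy: BL-303's MED drops to 10, so MED decides first.
after = [{"l3out": "BL-303", "med": 10, "igp_metric": 64},
         {"l3out": "local",  "med": 42, "igp_metric": 3}]

print(select(before)["l3out"])  # → local   (IGP metric decides)
print(select(after)["l3out"])   # → BL-303  (MED decides first)
```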
