ACI Best Practice Configurations

The top question all new ACI customers have (or should have) is: which configurations should be enabled on my fabric from the beginning? With that in mind, we’re going to use this post as a living document with configurations that are considered “Best Practice” to have enabled. We will keep this document updated as new versions come out, so don’t forget to bookmark this page! Wherever possible, we will include links to the Cisco documentation, or at the very least, a detailed explanation of our reasoning.

 

Global Settings Best Practices:

  1. MCP (per Vlan) should be enabled – MisCabling Protocol (or MCP) detects loops from external sources (i.e., misbehaving servers, external networking equipment running STP) and will err-disable the interface on which ACI receives its own packet.
    1. This can be enabled by going to Fabric > Access Policies > Global Policies > MCP Instance Policy default
    2. Make sure to enable the “Enable MCP PDU per VLAN” option (available after 2.0(2)), which enables MCP to send packets on a per-EPG basis; otherwise, these packets will only be sent on untagged EPGs (which basically makes it useless from a loop-detection perspective).
    3. If you want to read more about MCP, go check out this post!
  2. Disable Remote EP Learn – This will disable remote IP learning on border leaf switches.
    1. Prior to 3.0, this can be enabled by going to Fabric > Access Policies > Global Policies > Fabric Wide Setting Policy
    2. After 3.0, this can be enabled by going to System > System Settings > Fabric Wide Setting
    3. This is first available starting with 2.2(2e) and all code after
    4. Be aware of CSCvi11291 (fixed in 3.2(1l) and later). This bug will allow remote EP learns on border leaf switches even if Disable Remote EP learn is configured when the switch receives packets with src/dst of tcp 179.
  3. Enforce Subnet Check (will only work on -EX and -FX based leafs)
    1. Prior to 3.0, this can be enabled by going to Fabric > Access Policies > Global Policies > Fabric Wide Setting Policy
    2. After 3.0, this can be enabled by going to System > System Settings > Fabric Wide Setting
    3. This is first available starting with 2.2(2q) and all 2.2(x) code after
      1. Not available in 2.3(x)
      2. First available for 3.0 starting with 3.0(2k) and after
    4. Enforce Subnet Check is somewhat like “Limit IP Learning to subnet”, but on steroids. You might remember that the “Limit IP Learning to subnet” BD configuration option prevents the learning of IP endpoints if they are not in a subnet configured on the BD. “Limit IP Learning to subnet” does NOT drop the packet; it just stops the IP from being learned on the BD. The IP can still be learned on a leaf that does not have the BD configured (i.e., a border leaf). This can be problematic, hence the need for the Enforce Subnet Check configuration option. When it is enabled, the IP component is not learned at the VRF level either.
    5. Be aware of CSCvh17285 (fixed in 3.2(1l) and later). When Enforce Subnet is enabled, any Bridge Domains which are configured as L2-only -AND- have L2 Unknown unicast set to proxy will result in mac addresses not being learned from ARP/GARP packets. The workaround is to have the L2 BD configured for L2 Unknown Unicast = Flood.
  4. EP Loop Detection
    1. While the EP Loop Detection configuration has good intentions (i.e., finding a loop and killing it), I have found that it is triggered by false positives, such as vMotions of VMs, as often as (or more often than) by true loops. For this reason, while I would leave it enabled, I would make sure that both of the actions (i.e., BD learn disable, Port disable) are disabled (not checked). With both of the actions disabled, EP loops will still generate faults and be sent to your Syslog/SNMP Trap server, if configured.
    2. Prior to 3.0, this can be enabled (or disabled) by going to Fabric > Access Policies > Global Policies > EP Loop Detection Policy
    3. After 3.0, this can be enabled by going to System > System Settings > Endpoint Controls > EP Loop Detection
  5. IP Aging should be enabled
    1. When IP Aging is not enabled (which is the default), if multiple IPs are learned on a single MAC, then all of those IPs stay learned on the fabric for as long as the MAC is active. Cosmetically, this is undesirable in scenarios where DHCP-enabled hosts get a new IP address but both IPs are still shown within the EPG operational tab as tied to that MAC. This feature will age each IP separately to address that scenario. At 75% of the endpoint retention timer, a directed ARP is sent to the IP component of the endpoint, and if unanswered, ACI will allow the IP endpoint to age out.
    2. Prior to 3.0, this can be enabled by going to Fabric > Access Policies > Global Policies > IP Aging Policy
    3. After 3.0, this can be enabled by going to System > System Settings > Endpoint Controls > IP Aging (look to the right for this tab)
    4. This is first available starting with 2.1(1h) and all code after
  6. Rogue Endpoint Detection should be enabled.
    1. Starting with 3.2, Rogue Endpoint detection will lessen the impact from flapping endpoints.
    2. When Rogue Endpoint detection is enabled, the misbehaving endpoint (MAC/IP) will be quarantined and a fault will be generated to allow for easy identification.
    3. After 3.2, this can be enabled by going to System > System Settings > Endpoint Controls > Rogue EP Control
    4. Recommended Values:
      1. Rogue EP Detection Interval = 30
      2. Rogue EP Detection Multiplication Factor = 6
  7. Enable Strict COOP Group Policy
    1. The APIC provides a managed object (fabric:SecurityToken), that includes an attribute to be used for the MD5 password. An attribute in this managed object, called “token”, is a string that changes every hour. COOP obtains the notification from the DME to update the password for ZMQ authentication. The attribute token value is not displayed.
      There are two choices: Compatible Type and Strict Type. Compatible Type accepts both MD5 authenticated and non-authenticated ZMQ connections, whereas Strict Type only allows MD5 authenticated ZMQ connections.
    2. This can be enabled by going to System > System Settings > COOP Group
  8. Enable BFD for Fabric Facing Interfaces
    1. BFD for the fabric-to-fabric interfaces (Leaf to Spine) will speed up convergence during failure scenarios.
    2. This can be enabled by going to Fabric > Fabric Policies > Policies > L3 Interface > default > BFD ISIS Policy Configuration
  9. Preserve COS through the ACI Fabric.
    1. This can be enabled by going to Fabric > Access Policies > Policies > Global > QOS Class > Preserve COS
    2. APIC enables the preservation of 802.1P class of service (CoS) settings within the fabric. Enable the fabric global QoS policy dot1p-preserve option to guarantee that the CoS value in packets which enter and transit the ACI fabric is preserved. 802.1P CoS preservation is supported in single pod and Multi-Pod topologies.
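
For readers who prefer to script items 2 and 3 rather than click through the GUI, here is a minimal sketch of the JSON body such a change might use. The class name `infraSetPol`, the DN `uni/infra/settings`, and the attribute names `unicastXrEpLearnDisable` and `enforceSubnetCheck` are assumptions on my part; confirm them with the APIC API Inspector on your own code version before relying on them. The sketch only builds and prints the body; it does not contact an APIC.

```python
import json


def fabric_wide_settings_payload(disable_remote_ep_learn=True,
                                 enforce_subnet_check=True):
    """Build a JSON-serializable body for the Fabric Wide Setting policy.

    The class, DN, and attribute names below are assumptions to be verified
    with the API Inspector; they are not taken from this post.
    """
    def yes_no(flag):
        # APIC boolean-style attributes are commonly expressed as "yes"/"no".
        return "yes" if flag else "no"

    return {
        "infraSetPol": {
            "attributes": {
                "dn": "uni/infra/settings",  # assumed DN
                "unicastXrEpLearnDisable": yes_no(disable_remote_ep_learn),
                "enforceSubnetCheck": yes_no(enforce_subnet_check),
            }
        }
    }


payload = fabric_wide_settings_payload()
body = json.dumps(payload, indent=2)
print(body)
```

Building the body separately from sending it makes it easy to review the exact change in a maintenance-window plan before anything touches the fabric.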

At a high level, options 2 and 3 will prevent the mis-learning of IP endpoints that can occur on your fabric. Mis-learning of endpoints leads to things like black-holed packets, as remote IP endpoints can get stuck on a border leaf (for example). The process of clearing such events is cumbersome and causes a lot of heartburn. For detailed examples of use cases for each of the endpoint configuration knobs, please check out the ACI Endpoint Learning Whitepaper (below). While I always recommend that these changes are performed in a maintenance window, the impact from enabling these options would be basically non-existent (the only side effect is a flush of remote IP endpoints in the VRF).

ACI Endpoint Learning WhitePaper

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.pdf
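
To make the difference between “Limit IP Learning to subnet” and “Enforce Subnet Check” concrete, here is a deliberately simplified model of the learning decision, based on my reading of the whitepaper. The function name, its parameters, and the example subnets are hypothetical, and real leaf behavior has more inputs than this; treat it as a thinking aid, not as how the switch is implemented.

```python
import ipaddress


def learns_ip(src_ip, bd_subnets, vrf_subnets, *, local,
              limit_ip_learning=False, enforce_subnet_check=False):
    """Return True if this simplified leaf model would learn src_ip.

    local=True  models a local learn on the ingress BD.
    local=False models a remote learn (e.g. on a border leaf).
    """
    ip = ipaddress.ip_address(src_ip)
    in_bd = any(ip in ipaddress.ip_network(s) for s in bd_subnets)
    in_vrf = any(ip in ipaddress.ip_network(s) for s in vrf_subnets)

    if local:
        # Either knob restricts local learning to the BD's own subnets.
        if limit_ip_learning or enforce_subnet_check:
            return in_bd
        return True

    # Only Enforce Subnet Check restricts remote learning, and it checks
    # against every BD subnet in the VRF.
    if enforce_subnet_check:
        return in_vrf
    return True


bd = ["10.1.1.0/24"]
vrf = ["10.1.1.0/24", "10.1.2.0/24"]

# A stray 192.168.5.5 is blocked locally by the BD option...
print(learns_ip("192.168.5.5", bd, vrf, local=True, limit_ip_learning=True))
# ...but can still be learned remotely unless Enforce Subnet Check is on.
print(learns_ip("192.168.5.5", bd, vrf, local=False, limit_ip_learning=True))
print(learns_ip("192.168.5.5", bd, vrf, local=False, enforce_subnet_check=True))
```

The second call is the border-leaf gap described above: the BD-level option alone does nothing for remote learns.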

 

Bridge Domain Best Practices:

For Bridge Domains, there is a wide mixture of use cases, and many different configurations are perfectly valid. So, in general, best practice is in the eye of the beholder. With that being said, I’ll try a few blanket recommendations, with appropriate caveats.

  1. Do not enable Unicast Routing if ACI is not the L3 Gateway for your Subnet.  Why would you ever enable unicast routing if ACI is not the L3 Gateway?  Without Unicast routing enabled, ACI will not learn the IP address for Endpoints. This leads some customers to enable Unicast routing, because (understandably) they want to learn the IP endpoint and not just the MAC address of connected devices. The problem with this is that it can lead to asymmetric routing, which can result in packets being dropped or mis-routed.
  2. Configure a single subnet for each Bridge Domain. ACI will only forward DHCP requests on the primary subnet for each BD. If you have a second subnet configured on the same BD, DHCP will not work for the second subnet and beyond.
  3. In Network-Centric Mode (i.e., VLAN=EPG=BD), do not configure multiple EPGs on a BD. When you map VLANs to EPGs and BDs in ACI, the external STP and HSRP multicasts are flooded in the same BD. For example, if you have Vlan 11 (EPG11) and Vlan 12 (EPG12) attached to the same BD, HSRP hellos for both Vlans will intermingle in the BD and cause problems in your external (non-ACI) environment.
  4. Enable Limit IP Learning to Subnet – This should be enabled 99.999% of the time. It limits the IP learning of endpoints based on the subnets configured on the respective bridge domains. Note – If you have -EX or -FX based leafs and have configured “Enforce Subnet Check” Globally, this is turned on whether you have enabled it or not. 🙂
  5. Consider ARP Flooding + GARP-based detection – This is a 50/50 recommendation. I could go either way, but if it is my datacenter, I’m probably going to enable this configuration option. The pro for GARP-based detection is that it will prevent IP learning issues in specific situations. The con is that you have to enable ARP Flooding on the BD before you can configure GARP-based detection. From the Cisco ACI Fabric Endpoint Learning Whitepaper: “Although Cisco ACI can detect MAC and IP address movement between leaf switch ports, leaf switches, bridge domains, and EPGs, it does not detect the movement of an IP address to a new MAC address if the new MAC address is from the same interface and same EPG as the old MAC address. When the GARP based detection option is enabled, Cisco ACI will trigger an endpoint move based on GARP packets if the move occurs on the same interface and same EPG. If a GARP packet comes from the same interface and same EPG, then endpoint learning is triggered only when Unicast Routing, ARP Flooding, and “GARP based detection” are all enabled for the bridge domain. Although this scenario has not been widely seen across our customer base, in some cases customers do change their IP to MAC bindings and need to enable GARP-based detection.”
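
Recommendations 2 and 3 are easy to audit if you keep an inventory of your Bridge Domains. The sketch below assumes a hypothetical inventory format (a dict of BD name to its subnets and EPGs) that you would populate yourself; it is not an APIC data structure. The EPG check only makes sense for fabrics run in Network-Centric Mode.

```python
def lint_bridge_domains(bridge_domains):
    """Return warnings for BDs with more than one subnet or more than one EPG.

    bridge_domains is a hypothetical inventory:
        {"BD name": {"subnets": [...], "epgs": [...]}}
    """
    warnings = []
    for name, bd in sorted(bridge_domains.items()):
        subnets = bd.get("subnets", [])
        epgs = bd.get("epgs", [])
        if len(subnets) > 1:
            warnings.append(
                f"{name}: {len(subnets)} subnets; DHCP is only forwarded "
                "on the primary subnet"
            )
        if len(epgs) > 1:
            warnings.append(
                f"{name}: {len(epgs)} EPGs share this BD; external STP/HSRP "
                "floods will intermingle"
            )
    return warnings


inventory = {
    "BD11": {"subnets": ["10.1.11.1/24"], "epgs": ["EPG11"]},
    "BD12": {"subnets": ["10.1.12.1/24", "10.1.13.1/24"],
             "epgs": ["EPG12", "EPG13"]},
}
for line in lint_bridge_domains(inventory):
    print(line)
```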

 

Fabric Provisioning Best Practices

Performing an ACI Fabric Setup is one of the best things about ACI. However, proper planning for your fabric setup values is critical. When considering the values for your ACI fabric, it is important to remember that changing either the infrastructure IP address (TEP IP pool) range or the infra VLAN after the initial provisioning setup process is not possible without rebuilding the fabric.

When performing your initial Fabric Setup, you are required to input a “TEP address range”. This range of IP addresses is used primarily to provide TEP addresses for Leaf and Spine nodes in the fabric. While the default value for this is 10.0.0.0/16, it is considered best practice to provide a unique address block for your TEP pool for a couple of reasons:

  1. If you want to extend your TEP pool to AVE (ACI Virtual Edge) switches in the future, you want a unique address that does not overlap with existing routing in your network.
  2. If you want the APIC to communicate with external devices (e.g., vCenter for VMM integration), you want infra TEP pool addressing that is unique, to avoid IP address / routing conflicts for traffic coming back to the APIC from your vCenter device.
  3. Note – Changing the infrastructure IP address range or the VLAN after initial provisioning is not possible without rebuilding the fabric.

The Infra Subnet should not overlap with any other routed subnets in your network. If this subnet does overlap with another subnet, change this subnet to a different /16 subnet.

  • Beginning with APIC 2.2 code, the minimum supported subnet for a 3-APIC cluster is a /23.
  • If you are using APIC 2.0(1) code up until APIC 2.2 code, the minimum is /22.
  • Infra TEP IP should be unused and unique. However, if you do not have any spare RFC 1918 addresses, consider using the RFC 6598 range (100.64.0.0/10, reserved for CGN use). This range is not routed on the public internet, so it will never conflict with an internet destination.
  • Every Fabric / POD infra TEP pool should come from a unique IP subnet range.
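
Because the TEP pool cannot be changed later without a rebuild, it is worth checking a candidate range before you run the setup script. Here is a minimal sketch using Python's standard `ipaddress` module. The function name and the example subnets are hypothetical, and the /23 limit is the 3-APIC minimum quoted above for 2.2 and later, so adjust `max_prefixlen` for your release.

```python
import ipaddress


def check_tep_pool(candidate, routed_subnets, max_prefixlen=23):
    """Return a list of problems with a candidate TEP pool (empty if none)."""
    pool = ipaddress.ip_network(candidate)
    problems = []

    # A longer prefix means a smaller pool than the stated minimum.
    if pool.prefixlen > max_prefixlen:
        problems.append(f"{pool} is smaller than the /{max_prefixlen} minimum")

    # The pool must not overlap anything already routed in the network.
    for subnet in routed_subnets:
        other = ipaddress.ip_network(subnet)
        if pool.overlaps(other):
            problems.append(f"{pool} overlaps routed subnet {other}")

    return problems


existing = ["10.0.0.0/8", "172.16.0.0/12"]
print(check_tep_pool("10.0.0.0/16", existing))    # default pool, overlaps
print(check_tep_pool("100.64.0.0/16", existing))  # RFC 6598 range, clean
```

Feeding the function an export of your routing table's prefixes gives a quick sanity check against the overlap cases described above.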

For more information about this, check out the Cisco APIC Getting Started Guide, Release 3.x, on CCO.

Infra Vlan ID – Set your Infra Vlan to 3967

During fabric setup, ACI requires a VLAN to be used as the infrastructure VLAN. This VLAN is used for control traffic between devices that make up the fabric (i.e., leafs, spines, and APICs).

Because this vlan can be extended outside of the fabric (Openstack integration, AVS/AVE), it is a best practice to have this as a unique Vlan in your environment. In addition, many Cisco devices have reserved Vlan ranges that are hard to modify (i.e., you have to reboot the switches for changes to take effect). Vlan 3967 is a Vlan which is not reserved on any Cisco switching platform and ideal for ACI.

Node ID Settings – Spines should be numbered between 101-199; Leafs should be numbered 200 and above.
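
If you register nodes from a spreadsheet or script, a small check keeps the numbering convention honest. This is a sketch of the convention stated above only; the function name and role strings are hypothetical, and it does not enforce any platform limits on node IDs.

```python
def follows_node_id_convention(role, node_id):
    """Check a node ID against the convention: spines 101-199, leafs 200+."""
    if role == "spine":
        return 101 <= node_id <= 199
    if role == "leaf":
        return node_id >= 200
    raise ValueError(f"unknown role: {role!r}")


planned = [("spine", 101), ("spine", 201), ("leaf", 201), ("leaf", 150)]
for role, node_id in planned:
    status = "ok" if follows_node_id_convention(role, node_id) else "violates convention"
    print(f"{role} {node_id}: {status}")
```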

For more detailed information, check out the Cisco ACI Best Practices Guide for Fabric Provisioning.

 

ACI Fabric Naming Best Practices

Need a good primer on ACI Fabric Naming best practices? Check out this post for suggested tips on naming your objects in both the Tenant and Fabric Access Section of your fabric!

Also – please check out the Official Cisco ACI Best Practices guide on CCO!


7 thoughts on “ACI Best Practice Configurations”

    1. Thanks! When enabled, the Disable Remote EP Learn and Enforce Subnet Check config knobs do not allow remote leafs to learn IPs. This removes the possibility for the remote leafs to black hole traffic due to mislearned or stuck IP EPs. By disabling the remote learning of IPs on remote leafs, the remote leafs do not look up the IP component of the EP on the leaf, but punt the traffic to the Spines, which already have knowledge of all endpoints. This is explained in detail in the ACI Endpoint Learning Whitepaper – https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.pdf

  1. Hello,

    Great article, too bad I did not find it earlier.

    This leads to my question:
    – What happens if you activate the 4 global options on a live fabric?
    – What kind of disruption can we expect? (None? And I mean in the best possible scenario, when everything is working and it does not trigger something unsupported.)

    Those are enhancements, but what really happens? Do processes restart, or are caches cleared, leading to an unresponsive fabric or blocked servers for a few seconds?

    1. Laurent – Thanks for checking out the blog! To answer your questions, I always recommend these changes be done in a maintenance window (just to be safe) – but in actuality, the impact from enabling them should be minimal. For the Endpoint enhancements (Disable remote EP learn / Enforce Subnet Check) – ACI flushes all local IP endpoints outside bridge domain subnets and all remote IP endpoints. For IP aging, there should be no impact. For MCP, no impact (other than stopping a loop).

    2. Caveat – All changes should be made in a maintenance window. With that being said, MCP is non-disruptive (nothing is cleared or bounced). The only exception to this is if you had a loop in place, it would shut down the loop, but I think we can agree that would be a good thing. For Disable Remote EP Learn, this is non-disruptive. For Enforce Subnet Check, this should clear all remote IP learns when it is enabled. I do not believe there would be any real impact from that process, as the traffic should revert to using the Spine-Proxy for L3 routing at that point.

  2. Hi Jody,

    Thanks a lot for the great blog,

    1. If I am using the GOLF feature, do you prefer to disable “Disable Remote EP Learn”?
    2. Do you prefer to enable endpoint dataplane learning under the BD, since this is an L3 BD?
    3. If I have faced a fault regarding an IP address with multiple MAC addresses, is there any feature I need to enable to solve this issue?

    Thanks

    1. Mohammed – 1. yes, you can have disable remote ep learn enabled if you use golf. 2 – Do not disable dataplane ep learning on the bd unless you are using pbr. 3 – There is a feature coming in 4.0 (disable dataplane learning for the vrf) that would allow you to workaround the issue you are describing; more on that later!! 😉
