The mere mention of that word starts an immediate flurry of word association. Things like “broadcast storm”, “loops”, and “outage” spring to mind, and palms get sweaty at the memory of bridge calls with angry and impatient end users. To be honest, the Spanning-tree protocol gets a bad rap. Its whole purpose is to prevent the aforementioned network plagues by working holistically with all devices in the Layer-2 domain to develop a fully formed picture of the environment, and then decide where traffic will be forwarded and which interfaces will go into a blocking state. When it comes to migrating workloads from your legacy environment to ACI, you’ll need to be prepared for those two worlds to co-exist.
Now that I’ve dedicated a whole paragraph to my old friend (and nemesis), we’ve got to move forward with today’s discussion around how Spanning-tree and ACI interact. The main difference between ACI and Legacy switches is that ACI switches do not run the Spanning-tree Protocol (not 802.1d, not 802.1w, not RPVST+, none of the above). ACI will allow STP BPDUs to cross the fabric, acting as a hub or cable, but that is it.
Notables about ACI and Spanning-tree co-existence
- ACI switches do not actively participate in spanning-tree
- To be specific, there are no Spanning-tree protocol processes running on any ACI switches in the fabric
- This means that ACI switches do not craft and send their own STP BPDUs
- Even though ACI does not generate STP BPDUs, ACI switches will forward STP BPDUs across EPGs on which they are received.
- In fact, we want ACI to forward the BPDUs across to other switches! This allows the externally connected switches to maintain a loop-free topology and avoid broadcast storms and other nastiness that goes hand-in-hand when layer-2 loops form!
- STP BPDUs from Legacy switches are flooded within the EPG, not the BD. This is a change from almost all other flooded traffic in ACI. Most of the time when we talk about traffic being flooded inside of ACI, the flooding is occurring at the BD level**.
- TCNs generated from switches running Spanning-tree will cause ACI to flush endpoints from EPGs in which they are received. This can result in intermittent traffic loss for devices on those EPGs. If you want to know more about this, check out this article – STP and ACI: Intermittent packet loss due to TCNs.
- **NOTE – By default, there are protocols that are flooded within a Bridge Domain. These protocols are OSPF/OSPFv3, BGP, EIGRP, CDP, LACP, LLDP, ISIS, IGMP, PIM, ARP/GARP, RARP, and ND (refer to the APIC L2 Config Guide on CCO).
- ACI FEX ports have BPDUguard enabled by default, and this cannot be turned off. ACI never expects to see a BPDU from a device attached to a FEX.
- The best way to sum up how ACI operates with STP: ACI acts as a hub.
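To make the list above a little more concrete, here is a minimal Python sketch of the behavior it describes. It is a toy model and not how ACI is implemented: the `Port` and `Fabric` classes, the port names, and the MAC strings are all hypothetical, and the model simply treats the EPG as the BPDU flooding scope, as the list does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    name: str
    bd: str   # bridge domain the port's EPG belongs to
    epg: str  # endpoint group the port is bound to


@dataclass
class Fabric:
    ports: list
    # Endpoint table (hypothetical layout): EPG name -> learned MAC addresses
    endpoints: dict = field(default_factory=dict)

    def learn(self, port, mac):
        self.endpoints.setdefault(port.epg, set()).add(mac)

    def flood_bpdu(self, ingress):
        """Relay a BPDU, unchanged, to the other ports in the same EPG."""
        return [p.name for p in self.ports
                if p is not ingress and p.epg == ingress.epg]

    def flood_bd(self, ingress):
        """Most other flooded traffic (ARP, for example) goes BD-wide."""
        return [p.name for p in self.ports
                if p is not ingress and p.bd == ingress.bd]

    def receive_tcn(self, ingress):
        """A TCN flushes the endpoints learned in the receiving EPG."""
        return self.endpoints.pop(ingress.epg, set())
```

With two legacy switch uplinks in one EPG and a server in a second EPG of the same BD, `flood_bpdu` reaches only the other uplink, `flood_bd` reaches both other ports, and `receive_tcn` empties only the first EPG's endpoint set.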
Best Practices when connecting External Switches to your ACI Fabric via Layer-2
When you are connecting an external environment to your ACI Fabric via a Layer-2 connection, you’re probably doing so for one of two reasons: Number-1, you are establishing a link between the existing environment and your ACI Fabric in order to migrate workloads. Number-2, you’re in this for the long haul, and perhaps your ACI Fabric will co-exist with your existing environment for months or years to come.
Either way, you need to be armed with the right information that will allow you to design your Layer-2 connection between the two environments in such a way that allows you to effectively troubleshoot and operate both environments, while ensuring maximum uptime.
For your Legacy Switches (i.e., switches which run Spanning-tree)
- Configure the Spanning-tree link type as Shared on the external switch interfaces which connect to ACI.
- As explained in this article, ACI Operation with L2 Switches and Spanning-tree Link Types, by default, the STP link type on Legacy switches is P2P. An STP link type using P2P ensures a rapid transition from blocking to forwarding, which is great until you remember that ACI acts as a hub for BPDUs. By configuring the STP link type as Shared for your external switch interfaces which connect to ACI, you ensure that you allow the switches to take their time with the STP transition process, thereby protecting your environment from potential STP loop formation.
- Limit TCNs coming into ACI. Limiting TCNs can be done primarily by ensuring you have best practices configured for STP in your Legacy environment. We want to avoid STP TCNs in ACI, because they can lead to intermittent traffic loss when the endpoint tables are flushed. For more information on this, check out the article – STP and ACI: Intermittent packet loss due to TCNs. Below are recommendations which will help to limit the impact of STP TCNs on ACI:
- Use vPC connections from Legacy switches to the ACI Fabric. The idea here is that the fewer forwarding STP ports connecting to ACI, the better.
- BPDUguard and Portfast should be enabled on EVERY non-switch port. Portfast ensures that TCNs are not generated when host ports go up and down.
- Limit the number of switches below the first layer of external switches that connect to ACI (i.e., only host ports on switches) wherever possible. This limits the STP radius.
- Prune unnecessary Vlans from Trunk ports.
- Don’t enable all 4094 Vlans on your External switches. Only configure what is needed.
- And if all else fails, consider Outside/Inside EPGs for your Vlan. The Outside/Inside EPG design for Vlans allows us to isolate Outside devices and their TCNs from the Inside devices already migrated to ACI. We’ll explain more about this in the article STP and ACI: Intermittent packet loss due to TCNs.
- (optional) – Understand the design differences between MST and RPVST/PVST. This is optional because you may not be using MST at all. However, if you are using MST, or plan on it, you need to be aware of the fundamental differences between it and the more commonly used RPVST and PVST.
- BPDU frames in MST are sent on the Native Vlan (these are sent untagged), and not on a per-vlan basis as is the case with RPVST and PVST. This means we will have to make configuration changes specifically to accommodate these differences in the ACI Fabric.
- To deal with the untagged MST BPDUs, you’ll need to configure a couple of things differently than what you do when dealing with RPVST or PVST.
- Create a Switch Policy Group for your MST region – Under Fabric Access Policies, you’ll need to create a Switch Policy Group (note – I’m not talking about Interface Policy Groups). For this Policy Group, you’ll create a Spanning-tree policy and add in information about your MST region.
- Native Vlan EPG – For ACI, in the Tenant where your Layer-2 connection resides, you will need to create a specific EPG to carry the MST BPDUs. The static path bindings for your Legacy switches will be configured as dot1p (native). Failure to do this could very likely result in a loop.
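As a rough illustration of the legacy-side recommendations above (Shared link type towards ACI, pruned trunks, and Portfast plus BPDUguard on host ports), the Python sketch below renders the matching interface configuration lines. It assumes Catalyst IOS-style syntax; the helper names, interface names, and VLAN numbers are hypothetical, and the exact keywords differ by platform (NX-OS, for instance, uses `spanning-tree port type edge` rather than `portfast`), so check your platform's command reference before pasting anything.

```python
def render_uplink(interface, allowed_vlans):
    """Config lines for a legacy switch port that faces the ACI fabric."""
    vlans = ",".join(str(v) for v in sorted(set(allowed_vlans)))
    return [
        f"interface {interface}",
        " switchport mode trunk",
        # Prune the trunk down to the VLANs that are actually needed.
        f" switchport trunk allowed vlan {vlans}",
        # Shared link type: skip the rapid P2P transition towards the "hub".
        " spanning-tree link-type shared",
    ]


def render_host_port(interface, vlan):
    """Config lines for a non-switch (host) port."""
    return [
        f"interface {interface}",
        " switchport mode access",
        f" switchport access vlan {vlan}",
        # Portfast keeps host link flaps from generating TCNs.
        " spanning-tree portfast",
        # BPDUguard shuts the port down if a switch is plugged in.
        " spanning-tree bpduguard enable",
    ]


if __name__ == "__main__":
    print("\n".join(render_uplink("Port-channel10", [10, 20])))
    print("\n".join(render_host_port("GigabitEthernet1/0/5", 10)))
```

Generating the lines from one small helper per port role is just a convenient way to keep every uplink and every host port consistent; the same settings can of course be typed in by hand.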
For your ACI Fabric
- Enable MisCabling Protocol (MCP) Fabric-wide, and on all interfaces. MCP detects loops from external sources (i.e., misbehaving servers, external networking equipment running STP) and will err-disable the interface on which ACI receives its own packet. Enabling this feature is a best practice and it should be enabled globally and on all interfaces, regardless of the end device.
- NOTE – Because MCP works to stop Layer-2 loops, it should be enabled right away on an ACI Fabric prior to connecting Layer-2 devices for migration purposes.
- If you want a detailed explanation of MCP, please take a look at the article here, Using MCP for ACI.
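The core idea behind MCP, as described above, can be sketched in a few lines of Python: the fabric sends a probe, and if it receives its own probe back on a port, that port is err-disabled. This is a simplified model under stated assumptions, not the real protocol: the `McpFabric` class, the `key` field, and the dictionary frame format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class McpFabric:
    key: str  # hypothetical value that marks a probe as this fabric's own
    err_disabled: set = field(default_factory=set)

    def send_probe(self, port):
        """Build the probe a fabric port would transmit."""
        return {"key": self.key, "sent_from": port}

    def receive(self, port, frame):
        """Err-disable the port if the fabric sees its own probe come back."""
        if port in self.err_disabled:
            return False  # port is already down, nothing to do
        if frame.get("key") == self.key:
            # Our own probe returned via the external network: that is a loop.
            self.err_disabled.add(port)
            return True
        return False
```

In this model, a probe built on `eth1/1` that shows up on `eth1/2` err-disables `eth1/2`, while a frame carrying a different key is ignored.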
4 thoughts on “Spanning-tree (STP) and ACI”
Great article as usual Jody! Thank you for sharing this! One question: “STP BPDUs from Legacy switches are flooded within the EPG, not the BD”. I thought the BPDUs are flooded within the “VLAN Encapsulation”? In a single EPG, we can configure multiple VLAN encapsulations. One of the recommendations in the Cisco whitepaper mentioned that using a different VLAN encap between External L2 and ACI connected servers can limit the impact.
Peter – you are correct. The encap vlan is truly where the bpdu is flooded. An example that highlights this is an EPG with a static path binding of vlan 10 for bare metal servers, but a different (dynamic) vlan encap for a VMM domain. In this case, the TCNs from the bare metal vlan would not be forwarded to the servers in the VMM domain.
I have a ring topology: a pair of 3850s connected back-to-back, and said 3850s connected with PortChannels via vPC to two leafs, in the same EPG/BD. I am trunking vlan 100, “show span vlan 100” shows the Root port as PC on both 3850s, meaning it works as expected – ACI does not add up any cost and relays BPDUs. However, when I ping the other 3850’s SVI100 off SVI100 of its counterpart, the ping is dropped. If I disable PC or withdraw vlan 100 from a PC towards the leaves, i.e. making the back-to-back link between 3850s as root port, ping works. Somehow the hub breaks the communication. Any ideas?
Jet – Do you have the dot1q tag of 100 set for the EPG where your 3850s are connected? I’d check and see if that was properly configured, and then verify that you can see the endpoints in the endpoint table for that EPG. Because this is intra-EPG, we don’t have to worry about contracts, so this should be straight-forward L2 adjacency.