STP and ACI: Intermittent packet loss due to TCNs

One issue that can arise when we connect ACI and Legacy STP environments is intermittent packet loss to ACI endpoints due to Spanning-tree TCNs. TCNs will trigger ACI to flush endpoints in the EPG on which the TCN was received. ACI does re-learn the endpoints based on normal data-plane learning, but if the TCNs are frequent, this can result in traffic disruption. In this article, we will look at how to follow the clues regarding Spanning-tree TCNs to effectively troubleshoot TCN-related issues, and better yet, how to design around the problem to prevent issues in the first place!

Now – before we get started, if you are unfamiliar with Spanning-tree TCNs, or if you just need a refresher in Spanning-tree TCNs, take a look at this article on CCO – Understanding Spanning-tree Topology Changes.

Also – make sure you are comfortable with how the ACI Fabric handles STP (i.e., it doesn’t participate, it only forwards). If you are not familiar with how ACI interoperates with STP, please take a look at this article – Spanning Tree (STP) and ACI.

Assumptions

We assume PVST+ or RPVST+ is in use on the switches externally connected to ACI (MST will be addressed later in a different post).

TCNs in ACI

For PVST and RPVST, if ACI receives a TCN in an EPG, it will flush the learned entries in the endpoint table for the corresponding EPG.
Why do we respond to TCNs in this manner? This is done because we want to make sure that ACI has the optimum path to the endpoint entries. Flushing and re-learning the topology after a TCN is how we do this.
NOTE – For MST, TCNs are treated differently. MST BPDUs contain TCNs for multiple Vlans, thus, we would flush endpoints for multiple EPGs.

The Problem

So what does this problem look like conceptually?

Just imagine we have a port flapping on a host, which is connected to a classic STP switch (SW2). If Spanning-tree Portfast is not enabled on this host-facing port on SW2, every time the port transitions from up-to-down, and vice-versa, SW2 will generate a topology change notification (TCN) and a TCN will be sent to all ports in the Vlan.
Once the TCN is received on SW1, it will be re-flooded to all ports in the Vlan, including the trunk connection going towards the ACI Leaf switch to which SW1 is connected.
ACI will receive the TCN on the port connected to EPG-101 (i.e., Vlan101), flood it to all ports in the EPG, and then perform a flush of all endpoints in the EPG. When the flush occurs, packets destined to the endpoints in the EPG could be lost until data-plane traffic from each endpoint causes that endpoint to be learned again.
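
To make the flush-and-relearn behavior concrete, here is a toy model in Python. It is only a sketch and not how the leaf is implemented: the `EndpointTable` class, its method names, and the MAC addresses are all made up for illustration.

```python
from collections import defaultdict


class EndpointTable:
    """Toy model of a leaf's endpoint table, keyed by EPG (hypothetical)."""

    def __init__(self):
        self.endpoints = defaultdict(dict)  # epg -> {mac: interface}

    def learn(self, epg, mac, interface):
        # Data-plane learning: a frame sourced from `mac` arrives on `interface`.
        self.endpoints[epg][mac] = interface

    def receive_tcn(self, epg):
        # A TCN received in an EPG flushes every endpoint learned in that EPG.
        flushed = len(self.endpoints[epg])
        self.endpoints[epg].clear()
        return flushed

    def can_forward(self, epg, mac):
        # Simplification: traffic to an unknown endpoint is at risk until
        # the endpoint sends traffic and is learned again.
        return mac in self.endpoints[epg]


table = EndpointTable()
table.learn("EPG-101", "00:00:00:00:00:0a", "Ethernet1/13")
table.learn("EPG-101", "00:00:00:00:00:0b", "Ethernet1/13")

print(table.receive_tcn("EPG-101"))                       # 2
print(table.can_forward("EPG-101", "00:00:00:00:00:0a"))  # False

# The endpoint sends a frame, so it is learned again.
table.learn("EPG-101", "00:00:00:00:00:0a", "Ethernet1/13")
print(table.can_forward("EPG-101", "00:00:00:00:00:0a"))  # True
```

If TCNs keep arriving faster than the endpoints talk, the table spends most of its time empty, which is the intermittent loss described above.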

Tracking down the problem

In general, the first symptom that you are having this problem might be a report of intermittent packet loss to endpoints on a fabric.

If you believe you are encountering problems on your ACI Fabric due to TCNs coming in from external devices, there are a few steps to check to track down the potential problem!

Review MCP on your ACI Fabric – Log in to the ACI Leaf switches which have a Layer-2 connection to your external switches. Use MCP commands to narrow down where the TCN is being received.
Review STP on your legacy devices – Once you have narrowed down the ACI Fabric interface where the TCNs are being received with the MCP commands on ACI, log in to the external (STP running) switches and interrogate Spanning-tree.

Step 1 – Using MCP to find the TCN

The show mcp internal info vlan XXX command will display the following info:

The number of STP Topology Change Notifications
Timestamp of last BD Flush due to STP TCN
The interface the last TCN was received on
- A physical interface indicates the TCN was locally received (i.e., directly connected)
- An SVI interface indicates the TCN was received via the BD interface (i.e., the source of the TCN is not local)

Leaf204# show mcp internal info vlan 101
------------------------------------------------- 
PI VLAN: 14 Up 
Encap VLAN: 101 
PVRSTP TC Count: 11 
RSTP TC Count: 0 
Last TC flush at Thu Feb 28 16:54:53 2019 
on Vlan13 << TCN received on BD and the source is not local

Leaf204# show vlan extended

Vlan Name                   Encap           Ports
---- ---------------------  --------------  ------------------
9    infra:default          vxlan-16777209, E1/1, E1/2
                            vlan-3967
13   COAST:bd1              vxlan-15269818  E1/3, E1/13, Eth1/14
Vlan 13 = bd1 in Tenant COAST


Leaf201# show mcp internal info vlan 101
-------------------------------------------------
PI VLAN: 28 Up
Encap VLAN: 101 << Encap Vlan defined in EPG
PVRSTP TC Count: 34 << # of STP Topology change packets
RSTP TC Count: 0
Last TC flush at Thu Feb 28 16:54:53 2019 << Last TCN packet received
on Ethernet1/13 << Interface last TCN received
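
If you collect this output often, the fields of interest are easy to pull out with a script. The following is a minimal sketch that assumes the output looks exactly like the samples above; the function name and dictionary keys are my own and not part of any Cisco tooling.

```python
import re

SAMPLE = """\
PI VLAN: 28 Up
Encap VLAN: 101
PVRSTP TC Count: 34
RSTP TC Count: 0
Last TC flush at Thu Feb 28 16:54:53 2019
on Ethernet1/13
"""


def parse_mcp_vlan(text):
    """Extract the TCN-related fields from one block of MCP output."""
    info = {}
    for line in text.splitlines():
        line = line.split("<<")[0].strip()  # drop any trailing annotation
        if m := re.match(r"Encap VLAN:\s*(\d+)", line):
            info["encap_vlan"] = int(m.group(1))
        elif m := re.match(r"PVRSTP TC Count:\s*(\d+)", line):
            info["pvrstp_tc_count"] = int(m.group(1))
        elif m := re.match(r"Last TC flush at\s+(.+)", line):
            info["last_flush"] = m.group(1)
        elif m := re.match(r"on\s+(\S+)", line):
            info["last_interface"] = m.group(1)
            # An SVI (VlanN) means the TCN arrived via the BD, not a local port.
            info["locally_received"] = not m.group(1).lower().startswith("vlan")
    return info


print(parse_mcp_vlan(SAMPLE))
```

The `locally_received` flag encodes the rule from the list above: a physical interface means the TCN came in on this leaf, while an SVI means the source is elsewhere.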

Now – the above command executed at the leaf level is nice, but if you really want to gain an appreciation for how awesome ACI is, let’s bring the APIC into the picture and execute this command for ALL LEAFS on the fabric from the APIC itself. This will allow us to quickly track down which interface on which Leaf switch the TCN is being received!

apic1# fabric 201-206 show mcp internal info vlan 101

-------------------------------------------------
Node 201 (Leaf201)
-------------------------------------------------
PI VLAN: 28 Up
Encap VLAN: 101
PVRSTP TC Count: 34
RSTP TC Count: 0
Last TC flush at Thu Feb 28 16:54:53 2019
on Ethernet1/13 << Interface last TCN received

-------------------------------------------------
Node 202 (Leaf202)
-------------------------------------------------
PI VLAN: 26 Up
Encap VLAN: 101
PVRSTP TC Count: 11
RSTP TC Count: 0
Last TC flush at Thu Feb 28 16:54:53 2019
on port-channel1 << Interface last TCN received

-------------------------------------------------
Node 203 (Leaf203)
-------------------------------------------------
PI VLAN: 14 Up
Encap VLAN: 101
PVRSTP TC Count: 11
RSTP TC Count: 0
Last TC flush at Thu Feb 28 16:54:53 2019
on Vlan13

-------------------------------------------------
Node 204 (Leaf204)
-------------------------------------------------

-------------------------------------------------
Node 205 (Leaf205)
-------------------------------------------------

-------------------------------------------------
Node 206 (Leaf206)
-------------------------------------------------
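
With output from several leaves in one place, a few lines of Python can shortlist where to look first. This is a sketch under assumptions: the sample text is a trimmed copy of the output above, the function name is hypothetical, and I treat anything that is not an SVI (including a port-channel) as locally received.

```python
import re

FABRIC_OUTPUT = """\
-------------------------------------------------
Node 201 (Leaf201)
-------------------------------------------------
PVRSTP TC Count: 34
Last TC flush at Thu Feb 28 16:54:53 2019
on Ethernet1/13
-------------------------------------------------
Node 202 (Leaf202)
-------------------------------------------------
PVRSTP TC Count: 11
Last TC flush at Thu Feb 28 16:54:53 2019
on port-channel1
-------------------------------------------------
Node 203 (Leaf203)
-------------------------------------------------
PVRSTP TC Count: 11
Last TC flush at Thu Feb 28 16:54:53 2019
on Vlan13
-------------------------------------------------
Node 204 (Leaf204)
-------------------------------------------------
"""


def local_tcn_sources(text):
    """Return (node, interface, tc_count) for leaves that received TCNs locally."""
    results = []
    node, count = None, 0
    for line in text.splitlines():
        line = line.split("<<")[0].strip()  # drop any trailing annotation
        if m := re.match(r"Node (\d+)", line):
            node, count = int(m.group(1)), 0
        elif m := re.match(r"PVRSTP TC Count:\s*(\d+)", line):
            count = int(m.group(1))
        elif m := re.match(r"on\s+(\S+)", line):
            interface = m.group(1)
            # Skip SVIs: the TCN reached that leaf over the BD, not a local port.
            if not interface.lower().startswith("vlan"):
                results.append((node, interface, count))
    # Highest TC count first.
    return sorted(results, key=lambda r: r[2], reverse=True)


for node, interface, count in local_tcn_sources(FABRIC_OUTPUT):
    print(f"node {node}: {count} TCs, last on {interface}")
```

In this sample, nodes 201 and 202 are the ones with external Layer-2 links to chase, while node 203 only saw the TCN by way of the BD.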

Step 2 – Using STP to find the TCN

By reviewing LLDP / CDP information, I can clearly see that the TCNs are being received from my Nexus 7000. Once I log in to my Nexus 7000, I issue the “show spanning-tree vlan 101 detail” command, and continue to follow the trail.

LabCore01# show spanning-tree vlan 101 detail

VLAN0101 is executing the rstp compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 101, address 00de.fb79.8bc3
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 35 last change occurred 00:14:32 ago
from Ethernet1/40  << End host on port Ethernet 1/40
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0
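
The two lines that matter in this output are the topology change counter and the "from" interface. As a rough sketch (the function name is mine, and the regular expression assumes the wording shown above), they can be extracted like this:

```python
import re

SAMPLE = """\
VLAN0101 is executing the rstp compatible Spanning Tree protocol
  Number of topology changes 35 last change occurred 00:14:32 ago
          from Ethernet1/40
"""


def parse_topology_changes(text):
    """Return (count, time_since_last, source_interface) or None if absent."""
    m = re.search(
        r"Number of topology changes (\d+) last change occurred (\S+) ago\s+"
        r"from (\S+)",
        text,
    )
    if m is None:
        return None
    return int(m.group(1)), m.group(2), m.group(3)


print(parse_topology_changes(SAMPLE))  # (35, '00:14:32', 'Ethernet1/40')
```

Running this against each switch along the path gives you the next hop to investigate until the "from" interface turns out to be a host-facing port.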

When I look through the logfiles for my Nexus7000, I see that Ethernet1/40 has been bouncing up and down.

After examining the Nexus7000 interface configuration for interface e1/40, I can see that I do NOT have spanning-tree portfast enabled. Even though this is a trunk port, this is a trunk going to an end-host.

interface Ethernet1/40
description UCS-C Series COAST_UCS1
switchport
switchport mode trunk
switchport trunk allowed vlan 1-1101,1103-4094
mtu 9000
no shutdown

By adding the “spanning-tree port type edge trunk” command, I enable portfast, thereby ensuring that the port going up / down will not generate TCNs.
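
A quick way to audit for this is to check each host-facing interface block for an edge port type line. The sketch below only inspects text you hand it; the helper name is hypothetical, and deciding which ports are actually host-facing is still up to you.

```python
def has_edge_port_type(interface_config):
    """True if the block contains a 'spanning-tree port type edge' line."""
    return any(
        line.strip().startswith("spanning-tree port type edge")
        for line in interface_config.splitlines()
    )


BEFORE = """\
interface Ethernet1/40
  description UCS-C Series COAST_UCS1
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 1-1101,1103-4094
  mtu 9000
  no shutdown
"""

# Same interface after applying the fix described above.
AFTER = BEFORE + "  spanning-tree port type edge trunk\n"

print(has_edge_port_type(BEFORE))  # False
print(has_edge_port_type(AFTER))   # True
```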

Design Considerations to mitigate exposure to TCNs

While ACI does not generate STP BPDUs, STP BPDUs from connected devices should be allowed to traverse the ACI Fabric to ensure that external switches can process and respond accordingly to prevent STP loops from occurring. Remember, the ACI Fabric acts like a hub between traditional switches as they send their STP BPDUs. Below are a few guidelines for ensuring that you reduce the risk of STP loops or TCNs impacting devices that connect to your ACI Fabric.

Enable BPDUguard and Portfast on your External switches

For your Legacy switches, BPDUguard and Portfast should be enabled on EVERY non-switch port. Portfast ensures that TCNs are not generated when host ports go up and down.

Remember that less is more.

Use vPC from Legacy switches to ACI

Use vPC connections from legacy switches to the ACI Fabric in order to minimize the potential impacts of TCNs coming in and out of the ACI Fabric. The fewer STP logical ports, the better. Remember, STP is trying to find redundant connections and block them in order to prevent a loop. Using mechanisms such as vPC will reduce the number of STP logical ports while keeping the redundancy you need.

Looking at the diagram below, you can see that with a little thought, you can reduce the number of STP blocking ports without having to sacrifice redundancy. Going from left to right, I would argue that the rightmost designs (Designs 3 and 4) are really the only valid options you should consider. With Designs 3 and 4, you are using vPC to reduce the number of logical interfaces that STP will have to account for. With Design 4, assuming there is no other switching infrastructure below SW1 and SW2, there is not even the possibility of an STP loop, because there is only one STP logical path, which is forwarding. You should never use Design 1 or Design 2, simply because they do not make use of vPC at all.

Figure: Using vPC to reduce the amount of Logical STP Ports

Less is more – Where possible, reduce the number of external switches connected to ACI.

Limit the number of switches below the first layer of external switches that connect to the ACI Fabric. For this recommendation, the fewer switches running STP, the better. Fewer switches mean that we have fewer chances for TCNs.

As you can see below, even if you cannot eliminate layers of switches which will ultimately hang off of your ACI Fabric, even subtle changes such as using vPC to connect from inside of the STP domain will continue to reduce the number of STP logical ports.

Figure: Reduce the amount of Layer-2 connected switches to the ACI Fabric

Less is more – Prune Trunk interfaces to the Vlans needed

If a Vlan is not allowed on a trunk going to ACI, even if STP events (including TCNs) are occurring in that Vlan, the TCN can never make it to ACI.
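
To see the effect of pruning, it helps to expand an allowed-Vlan string and test membership. This is a small sketch with made-up function names; the first allowed list is borrowed from the interface configuration earlier in this article purely as sample input.

```python
def parse_vlan_list(spec):
    """Expand a string such as '1-1101,1103-4094' into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            vlans.update(range(int(start), int(end) + 1))
        elif part:
            vlans.add(int(part))
    return vlans


def tcn_can_reach_aci(vlan, allowed_spec):
    # With PVST+/RPVST+ the TCN travels in that Vlan's own BPDUs, so it only
    # crosses the trunk if the Vlan is allowed on it.
    return vlan in parse_vlan_list(allowed_spec)


print(tcn_can_reach_aci(101, "1-1101,1103-4094"))   # True
print(tcn_can_reach_aci(1102, "1-1101,1103-4094"))  # False
print(tcn_can_reach_aci(300, "101,102"))            # False
```

The last call is the pruned case: with only Vlans 101 and 102 allowed towards ACI, a flapping port in Vlan 300 cannot cause a flush in the fabric.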

Less is more – Only enable the Layer-2 Vlans you need on your switches

In a similar vein as the recommendation above, configure the Vlans you need on your Layer-2 switch. You do not have to configure all 4094.

K.I.S.S for migration – Use the design principles of Vlan = EPG = BD.

K.I.S.S. for migration – Keep it simple Sonny :). When it comes to migration, stick with the tried and true Vlan = EPG = BD (1:1:1). You’ve probably heard this referred to as Network Centric Mode. While there is no “mode” per se, this is a design principle that allows you to avoid headaches. Why is it important to stick with this approach?

In the diagram on the left, we’ve connected two external Vlans, Vlan-101 and Vlan-102 to the same Layer-2 Bridge Domain. The L3GW is still outside the fabric, and we’re assuming HSRP is in use. It’s important to remind everyone that the Bridge Domain is a Layer-2 flooding domain, and by default, if we were to connect two externally available Vlans together like this, then the BPDUs from multiple Vlans will mix, causing STP issues. In addition, HSRP Multicast hello messages will bump into each other, again, resulting in a mess. Now with 3.1 and the ability to limit flooding to the EPG, we have other options, but we’re trying to K.I.S.S.

The option on the right is the simplest way to achieve what you want when migrating from your Legacy environment to ACI. Each Vlan = EPG = BD (1:1:1).

LLDP / CDP Native Vlan Mismatches

Pay Attention to LLDP/CDP mismatches for the native Vlan. These will show up as critical faults in ACI, as they could lead to loops.

(optional) Outside/Inside EPGs for your Vlan

Normally, the steps listed above are more than enough to limit impact from TCNs. However, there is another design option that will allow you to limit the impact that TCNs have on ACI almost completely. I’ve listed this as optional, because very few customers have to go this route. While it is a completely supported design, it is more complicated than the standard Network-Centric Vlan=EPG=BD.

The Outside/Inside EPG design for Vlans allows us to isolate the Inside devices already migrated to ACI from TCNs generated by Outside devices.

In the design below we see the following:
- One Bridge Domain, Vlan101_BD, is configured for ACI. There are two EPGs attached to this BD.
- Vlan101_Outside_EPG is for External devices (i.e., external compute, external network devices), which have not been migrated to the ACI Fabric. In this example, we would use a dot1q vlan encap of 101 for this EPG to match up with the expected Vlan 101 dot1q tag.
- Vlan101_Inside_EPG is for all of the servers (non-TCN generating devices) that have been migrated directly to the ACI Fabric. For the inside EPG, I’ve suggested a Vlan tag of Vlan 1101. This is the dot1q tag that ACI expects from the directly connected devices (however, this could be any Vlan other than 101).

In this design, any TCNs from the external network are constrained to EPG Vlan101_Outside_EPG. This means that this EPG is flushed whenever TCNs are received. However, devices internal to ACI on Vlan101_Inside_EPG are not flushed, because the TCNs are not flooded across the BD.
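
The scope of the flush in this design can be sketched with a small Python model. The EPG names and encap Vlans come from the design above; the endpoint names and the `receive_tcn` helper are invented for illustration and do not reflect how the fabric stores this state.

```python
# Toy model: one bridge domain, two EPGs with different encap Vlans.
bridge_domain = {
    "Vlan101_Outside_EPG": {
        "encap": 101,
        "endpoints": {"ext-server", "ext-gateway"},
    },
    "Vlan101_Inside_EPG": {
        "encap": 1101,
        "endpoints": {"aci-server-1", "aci-server-2"},
    },
}


def receive_tcn(bd, encap_vlan):
    """Flush endpoints only in the EPG whose encap Vlan carried the TCN."""
    flushed = []
    for name, epg in bd.items():
        if epg["encap"] == encap_vlan:
            epg["endpoints"] = set()
            flushed.append(name)
    return flushed


print(receive_tcn(bridge_domain, 101))  # ['Vlan101_Outside_EPG']
print(sorted(bridge_domain["Vlan101_Inside_EPG"]["endpoints"]))
# ['aci-server-1', 'aci-server-2']
```

The Inside endpoints survive the TCN, but anything they need to reach in the Outside EPG has to be learned again first.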

Pros

Impact on servers directly connected to the ACI Fabric from TCNs is almost completely removed. I say “almost completely”, because, while it is true that we will not flush the Endpoints in the Inside EPG, connectivity to the Outside devices on the same subnet (via the Vlan101_Outside_EPG) would still be impacted as the EPs on that EPG are flushed. This could be especially problematic if the GW for Vlan 101’s subnet remained external to ACI.

Cons

An extra EPG is needed for every Vlan (note – you wouldn’t necessarily need this design for every Vlan, only the ones experiencing issues from excessive TCNs)
Contracts are needed between the Outside and Inside EPGs, even though they are on the same subnet.

5 thoughts on “STP and ACI: Intermittent packet loss due to TCNs”

jachbr says:

March 29, 2019 at 3:02 pm

Great and informative post! Hoping for some clarification on the K.I.S.S recommended diagram. The diagram on the right shows both BDs with the same name (VLAN101_BD). Shouldn’t it be VLAN101_BD and VLAN102_BD? If that’s not the case, can you explain how to create BDs with the same name? Thanks!


1. Jody says:
  
  March 29, 2019 at 4:54 pm
  
  Thanks! That was a typo and has been fixed! Appreciate the warning!
  
  
vijay says:

April 3, 2019 at 2:25 am

enjoyed this doc thanks


Marko says:

November 21, 2019 at 2:28 am

You sir are a legend, thanks for the really enjoyable writeup.


Mohamed Badr says:

June 21, 2021 at 5:14 am

Thanks a lot for this perfect demonstration … really appreciated
