Simple Troubleshooting Steps for Cloud ACI/AWS To OnPrem ACI For End Point Reachability Issues

Table of Contents:

  1. Introduction
  2. Two Items to Check: Control Plane & Data Plane
    2.a. Verifying Control Plane
    2.b. Verifying Data Plane
  3. References

Introduction

In this writeup I will go through some very simple troubleshooting steps that you can follow if you are having Tenant endpoint reachability issues between the Cloud ACI/AWS fabric and the onPrem fabric.

Regardless of whether you are using Internet connectivity or DX (Direct Connect) connectivity for the connection, these troubleshooting steps should help. The assumption is that you have already brought up the cAPIC fabric on AWS, done the necessary configurations from NDO, and brought up Tenants and objects, but you cannot reach the onPrem endpoint (let’s say a VM) from the AWS endpoint (EC2).

Two Items to Check: Control Plane & Data Plane

The important thing to note for troubleshooting is that you have to look at two aspects:

  • Control Plane
  • Data Plane

The Control Plane is BGP EVPN between the onPrem ACI Spine and the C8KV routers.

The Data Plane is VXLAN.

If either of them is broken you will have reachability problems.

Verifying Control Plane

a) Check the onPrem leaf route table:
Verify that you can see the cloud CIDR with a next hop of the CSR Gig4 interface:
show ip route vrf userVRF
b) Check the cloud VPC egress route tables (on the AWS Tenant account):
onPrem subnet -> TGW
c) Check the TGW route table to see if it learns the 0/0 route through the CSR (on the AWS Infra account)
d) Check the NSG rules
e) Check the zoning rules on onPrem
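Check (a) can be scripted against saved CLI output. Below is a minimal Python sketch, where the `SAMPLE` route output and the `route_next_hops` helper are my own (hand-written to approximate NX-OS `show ip route vrf` format, using the lab addresses from the figures below), not an actual ACI API:

```python
import re

# Hypothetical sample trimmed to approximate "show ip route vrf userVRF"
# output on an onPrem leaf; addresses match the lab values shown below.
SAMPLE = """\
10.180.1.0/24, ubest/mbest: 1/0
    *via 10.22.0.52%overlay-1, [200/0], 01:23:45, bgp-65002, internal
"""

def route_next_hops(output, prefix):
    """Return the next-hop IPs listed under `prefix` in the route output."""
    hops, in_prefix = [], False
    for line in output.splitlines():
        if line.startswith(prefix):
            in_prefix = True
        elif in_prefix and "via" in line:
            m = re.search(r"via (\d+\.\d+\.\d+\.\d+)", line)
            if m:
                hops.append(m.group(1))
        elif not line.startswith(" "):
            in_prefix = False
    return hops

# The cloud VPC CIDR should show with the C8KV Gig4 IP as its next hop:
assert "10.22.0.52" in route_next_hops(SAMPLE, "10.180.1.0/24")
```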

Below are screenshots from my lab on the items mentioned above.

a) Verify that the Tenant VRF shows the Cloud CIDR being advertised with a next hop of Gig4 of the C8KV

The C8KV Gig4 interface has an IP of 10.22.0.52.
Figure 1: Looking at the Gig4 IP of the C8KV

Verifying that my Tenant VRF shows the VPC CIDR of 10.180.1.0/24 with a next hop of 10.22.0.52:
Figure 2: Verifying on the ACI Leaf that the VPC CIDR shows up with the correct next hop

b) On the AWS Tenant account, check the route table of the EC2 subnet to verify that the onPrem subnet shows up with a next hop of the TGW

Figure 3: Route table of the EC2 subnet shows the onPrem prefix with a next hop of the TGW
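Check (b) can also be done programmatically. The sketch below works on a response shaped like boto3's `ec2.describe_route_tables`; the route-table ID, TGW ID, and the 10.70.5.0/24 onPrem subnet are hand-written sample values, not from a real account:

```python
# Hand-written sample in the shape of an EC2 DescribeRouteTables response.
SAMPLE_RESPONSE = {
    "RouteTables": [{
        "RouteTableId": "rtb-0123456789abcdef0",
        "Routes": [
            {"DestinationCidrBlock": "10.180.1.0/24", "GatewayId": "local"},
            {"DestinationCidrBlock": "10.70.5.0/24",
             "TransitGatewayId": "tgw-0fedcba9876543210"},
        ],
    }],
}

def onprem_route_via_tgw(response, onprem_cidr):
    """True if any route table sends onprem_cidr to a transit gateway."""
    return any(
        route.get("DestinationCidrBlock") == onprem_cidr
        and "TransitGatewayId" in route
        for table in response["RouteTables"]
        for route in table["Routes"]
    )

assert onprem_route_via_tgw(SAMPLE_RESPONSE, "10.70.5.0/24")
```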

c) Checking the TGW route table on the AWS Infra account to verify that 0/0 is learned through the Connect attachment
Figure 4: Checking the TGW route table on the Infra account for 0/0

Also verify that the Connect attachment peers with Gig2 of the CSR.

Figure 5: TGW Connect Attachment peers with C8KV Gig2
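Check (c) can be sketched the same way. The response below is hand-written in the shape returned by boto3's `ec2.search_transit_gateway_routes` (attachment IDs are placeholders); the point is simply that the 0/0 route should resolve through a Connect attachment:

```python
# Hand-written sample in the shape of a SearchTransitGatewayRoutes response.
SAMPLE = {
    "Routes": [{
        "DestinationCidrBlock": "0.0.0.0/0",
        "Type": "propagated",
        "State": "active",
        "TransitGatewayAttachments": [
            {"TransitGatewayAttachmentId": "tgw-attach-0123456789abcdef0",
             "ResourceType": "connect"},
        ],
    }],
}

def default_via_connect(response):
    """True if the 0/0 route resolves through a Connect attachment."""
    for route in response["Routes"]:
        if route["DestinationCidrBlock"] != "0.0.0.0/0":
            continue
        return any(a.get("ResourceType") == "connect"
                   for a in route.get("TransitGatewayAttachments", []))
    return False

assert default_via_connect(SAMPLE)
```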

d) Check the NSG rules
Make sure your ingress and egress rules are configured correctly for the EC2 instance (based on your contract configurations from NDO).

Figure 6: Checking NSG rules
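For check (d), the logic of "does this rule set permit the flow" can be expressed directly. The sketch below evaluates ingress rules in the shape returned by boto3's `ec2.describe_security_groups`; the group ID, rule, and addresses are hand-written sample data:

```python
import ipaddress

# Hand-written sample security group allowing SSH from the onPrem subnet.
SAMPLE_SG = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [{
        "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
        "IpRanges": [{"CidrIp": "10.70.5.0/24"}],
    }],
}

def ingress_allows(sg, protocol, port, src_ip):
    """True if any ingress rule permits protocol/port from src_ip."""
    src = ipaddress.ip_address(src_ip)
    for perm in sg["IpPermissions"]:
        if perm["IpProtocol"] not in (protocol, "-1"):
            continue  # wrong protocol ("-1" means all protocols)
        if perm["IpProtocol"] != "-1" and not (
                perm["FromPort"] <= port <= perm["ToPort"]):
            continue  # port outside the rule's range
        if any(src in ipaddress.ip_network(r["CidrIp"])
               for r in perm.get("IpRanges", [])):
            return True
    return False

assert ingress_allows(SAMPLE_SG, "tcp", 22, "10.70.5.4")
assert not ingress_allows(SAMPLE_SG, "tcp", 443, "10.70.5.4")
```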

e) Check the zoning rules on onPrem
Please see the writeup below for how to verify the zoning rules (normally this should not be a problem unless it’s misconfigured):
https://unofficialaciguide.com/2021/01/06/understanding-aci-tcam-utilization-optimization/

Verifying Data Plane

If the Control Plane looks good, but your reachability is still not working, chances are that the problem is in the Data Plane.

Below is a quick list of items to verify:

a. Try to ping CSR Gig4 from the onPrem leaf
b. From the CSR: ping the BGP EVPN RID of the spine
c. From the CSR: show ip route
-> you should see the onPrem Infra TEP pool
d. Show NVE peers on the Cat8kv; all of these should be up (these are the VXLAN tunnels)
e. Ping from onPrem and capture packets on the C8KV using packet-trace to verify that packets are coming in

Below are screenshots from my lab on the items mentioned above.

a) Try to ping CSR Gig4 from the onPrem leaf

Looking for the Gig4 IP on the C8KV:
Figure 7: Looking at the Gig4 IP on the C8KV

Pinging the Gig4 IP from the leaf:

Figure 8: Pinging the Gig4 IP from the leaf

b) From the CSR: ping the BGP EVPN RID of the spine

First get the BGP RID of the spine:

Figure 9: Obtaining the BGP RID of the Spine

Pinging the RID of the spine from the C8KV:
Figure 10: Pinging the BGP RID of the Spine from the C8KV

c) Checking for the TEP pool in the routing table of the C8KV

A quick way to look at the TEP pool is to run the following on the onPrem APIC:

cat /data/data_admin/sam_exported.config

Figure 11: Determining the TEP Pool configured on onPrem APIC

You can also check the TEP pools used for the destination TEPs from the leaf:

show isis dtep vrf overlay-1

Figure 12: show isis dtep vrf overlay-1

Now check that those TEP routes show up on the cloud C8KV:

show ip route | i 10.7   # (notice it's coming from Tu6 in this case)

Figure 13: TEP Pool Prefixes show on routing table of C8KV
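A quick way to cross-check step (c): the destination TEPs learned on the leaf (`show isis dtep vrf overlay-1`) should all fall inside the TEP pool from `sam_exported.config`. In the sketch below, both the 10.7.0.0/16 pool and the DTEP addresses are placeholder values, not taken from a real fabric:

```python
import ipaddress

# Placeholder TEP pool and DTEP addresses (substitute your own values).
tep_pool = ipaddress.ip_network("10.7.0.0/16")
dteps = ["10.7.88.64", "10.7.88.65", "10.7.200.66"]

# Any DTEP outside the pool points at a TEP-pool misconfiguration.
outside = [t for t in dteps if ipaddress.ip_address(t) not in tep_pool]
assert not outside, f"DTEPs outside the TEP pool: {outside}"
```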

d) Show NVE peers (to verify that the VXLAN tunnels have been established)
Figure 14: Looking at NVE peers
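If you save the `show nve peers` output, step (d) reduces to "is any peer not UP". The sample table below is hand-written to approximate the Cat8kv output (column positions can differ between releases), so treat this as a sketch:

```python
# Hand-written sample approximating "show nve peers" output on a Cat8kv.
SAMPLE = """\
Interface  VNI      Type Peer-IP          RMAC/Num_RTs    eVNI     state flags UP time
nve1       5000     L3CP 10.7.88.64       2               5000     UP   A/M   1d02h
nve1       5000     L3CP 10.7.88.65       2               5000     UP   A/M   1d02h
"""

def down_peers(output):
    """Return the Peer-IP of every row whose state column is not 'UP'."""
    down = []
    for line in output.splitlines()[1:]:   # skip the header row
        cols = line.split()
        if len(cols) >= 7 and cols[6] != "UP":
            down.append(cols[3])           # Peer-IP column
    return down

# All VXLAN tunnels should be up before testing the data plane further:
assert down_peers(SAMPLE) == []
```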

e) Ping from onPrem and capture packets on the C8KV to verify that packets are coming in

I show how to use this at: https://unofficialaciguide.com/2020/07/17/aci-cloud-extension-usage-primer-azure-a-practical-guide-to-using-azure-vnet-peering-with-cloud-aci/
Please see right after Figure 41 in the writeup linked above.

The commands to be used on the CSRs are as follows:

    • debug platform condition ipv4 10.70.5.4/32 both
    • debug platform packet-trace packet 128
    • debug platform condition start
 To View:
    • show platform packet-trace statistics
    • show platform packet-trace summary
 To Stop:
    • debug platform condition stop
 Other Useful Commands:
    • show platform packet-trace code              # Show packet-trace drop, inject or punt codes
    • show platform packet-trace configuration     # Show packet-trace debug configuration
    • show platform packet-trace packet            # Per-packet details for traced packets
    • show platform packet-trace statistics        # Statistics for packets traced and packet disposition
    • show platform packet-trace summary           # Per-packet summary information for traced packets
    • clear platform packet-trace configuration
    • clear platform packet-trace statistics
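When you have a `show platform packet-trace summary` capture saved off-box, a quick tally of dispositions tells you immediately whether packets are being forwarded or dropped. The sample lines below are hand-written to approximate IOS-XE output (exact columns and reason codes vary by release):

```python
from collections import Counter

# Hand-written sample approximating "show platform packet-trace summary".
SAMPLE = """\
Pkt   Input            Output                    State  Reason
0     Gi2              Tu6                       FWD
1     Gi2              internal0/0/rp:0          PUNT   11  (For-us data)
2     Gi2              Gi2                       DROP   8   (Ipv4NoRoute)
"""

# Column 4 is the disposition (FWD / PUNT / DROP); skip the header row.
counts = Counter(line.split()[3] for line in SAMPLE.splitlines()[1:])

# Healthy pings show FWD; DROP rows carry the reason for the failure.
print(dict(counts))
```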
References

Cloud ACI Documentation

