ACI NetCentric 2 AppCentric Using MicroSegmentation

In this blog post, I’ll discuss a method of moving from a Network Centric ACI deployment to an App Centric ACI deployment using a combination of ACI features: MicroSegmentation, Preferred Groups, and Contract Masters.

When an ACI fabric is initially deployed, it is deployed using either a Greenfield or a Brownfield approach. Greenfield means it’s a brand new deployment. Brownfield means that the fabric is stood up and then the existing DC apps are moved to ACI.

Obviously, Greenfield is simple. You can design the fabric with best practices from day 0, and you don’t have to worry about interoperability. You can enable all the fancy features of ACI, such as using a large subnet in the BD and putting the different application tiers into different EPGs within that large subnet. In a classical network we have learned to keep subnets limited in size, i.e. a /24 or maybe a /23. Going to a /22 or larger subnets is not a good idea because it adversely affects compute CPU: every compute node on the segment receives the ARP broadcast traffic and has to interrupt its CPU to read it. ACI has the capability of turning off ARP flooding in the BD, which in essence means that ARP is treated as unicast inside the fabric and delivered only to the real endpoint that needs to reply to it. Thus ARP traffic is no longer an issue in network segments with large subnets. Additionally, there are options for turning on Hardware Proxy for L2 Unknown Unicast and Optimized Flood for L3 Unknown Multicast.
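
As a rough illustration, here is what pushing a BD with those optimized settings through the APIC REST API could look like. This is only a sketch: the APIC address, credentials, tenant, VRF, and BD names are placeholders, and the attribute names (arpFlood, unkMacUcastAct, unkMcastAct) are the ones I recall from recent APIC releases, so verify them against your version (the API Inspector in the GUI is handy for that).

  import requests

  APIC = "https://apic.example.com"   # placeholder APIC address
  session = requests.Session()
  session.verify = False              # lab only; use proper certificates in production

  # Standard APIC REST login
  session.post(f"{APIC}/api/aaaLogin.json",
               json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}})

  # Sketch of a "greenfield-style" BD with the optimized flooding knobs
  optimized_bd = {
      "fvBD": {
          "attributes": {
              "name": "BD-Big",
              "arpFlood": "no",            # ARP treated as unicast inside the fabric
              "unkMacUcastAct": "proxy",   # L2 Unknown Unicast: Hardware Proxy
              "unkMcastAct": "opt-flood",  # L3 Unknown Multicast: Optimized Flood
              "unicastRoute": "yes",       # gateway lives in the fabric
          },
          "children": [
              {"fvRsCtx": {"attributes": {"tnFvCtxName": "VRF1"}}},
              {"fvSubnet": {"attributes": {"ip": "10.0.0.1/22", "scope": "private"}}},
          ],
      }
  }
  session.post(f"{APIC}/api/mo/uni/tn-Greenfield.json", json=optimized_bd)

The later sketches in this post reuse the same login and post pattern, so from here on I’ll only show the payloads.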

If you deployed ACI as a Greenfield, you have the luxury of moving endpoints from one EPG to another EPG with minimal disruption. Since all the endpoints (compute) already live in the same large subnets, this is not an issue. Once you move the application tiers to the correct EPGs, you can use contracts and go to App Centric mode easily.

Brownfield is a different animal. It’s more complex and requires the expertise and field knowledge of an experienced ACI practitioner if you want minimal downtime during migration. Generally migration is done over an extended period of time, one VLAN or group of VLANs at a time. Moving over firewalls and load balancers gracefully makes it even more complex.

For one thing, you can’t use the fancy features of the ACI BD until everything has been migrated over. You want to keep everything in flood mode in the BD. You also want to bring in each VLAN from the classical side into its own EPG/BD. The rule of thumb for migration (to be on the safe side) is 1 VLAN == 1 EPG/BD. Of course, once all your endpoints are moved over to the ACI fabric, the fancy features can be turned on, but you are still in 1 EPG/BD mode per migrated VLAN.

Furthermore, going to Application Centric mode once migration is finished is not an easy task. It requires careful planning and execution. Generally in the classical network, the tiers of each application were not segmented by VLAN and could reside all over the place. As an example, a web farm may have some web server members on VLAN 10 and others on VLAN 20. Obviously no customer wants to change IP addresses of production servers (if they did, app centric migration would be easy, but that is not practical: it is highly disruptive and a fool’s errand that only works well on PowerPoint).

Typical BD configuration during the migration would look like this:
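
As a rough sketch, a migration-mode BD (posted the same way as the earlier example) might look like this; again the names are placeholders and the attribute names are my best recollection, so check them on your release:

  # Migration-mode BD: keep everything flooding, one BD per migrated VLAN
  migration_bd = {
      "fvBD": {
          "attributes": {
              "name": "BD-Vlan10",
              "arpFlood": "yes",           # flood ARP, just like the classical network
              "unkMacUcastAct": "flood",   # L2 Unknown Unicast: Flood (no Hardware Proxy yet)
              "unkMcastAct": "flood",      # L3 Unknown Multicast: Flood
              "unicastRoute": "no",        # assuming the gateway still lives on the classical side
          },
          "children": [{"fvRsCtx": {"attributes": {"tnFvCtxName": "VRF1"}}}],
      }
  }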

What is Network Centric and What is Application Centric?

In ACI, you can configure your endpoint groups (EPGs) in either Network Centric mode or Application Centric mode.

Network Centric mode is where application tiers are not clearly allocated to different EPGs. So, ACI contracts cannot be applied on the basis of application tiers.

Application Centric mode is where application tiers are clearly allocated, one tier per EPG. So, ACI contracts can be applied on the basis of application tiers (the whitelist model). Note that each EPG can still contain workloads from different IP subnets, as long as they belong to the same application tier.

Now that we’ve got the Basics covered, let’s talk about moving from Network Centric to Application Centric Mode.

  • By this time, you have finished your migration. All endpoints are residing on the ACI fabric.
  • You have migrated in Network Centric mode: every VLAN has an equivalent EPG/BD in ACI.
  • You have done some sort of analysis and figured out your application flow matrix, i.e. what tier talks to what tier and on what TCP ports (or UDP or others). You don’t need to do all applications at one time; you can do one application at a time. Of course, getting this ADM data is another story. If they are commonly used applications, like WordPress, they are well documented, but chances are that during install the application person changed the port numbers, so you can’t take the documentation as gospel. If you have custom applications, getting the ADM data becomes really challenging. Tools like Tetration, AppDynamics or other third-party tools can help, but that is beyond the scope of this writeup.

 

Flood In Encap Method:

One rather neat way of achieving this goal is to use the “Flood in Encap” method.

  • Generally, migration from a classical DC to the ACI fabric is done with a 1:1 mapping of VLAN to EPG/BD
  • This is to prevent VLANs on the classical side from getting merged together (remember, the flood domain is the BD)
  • Merging all VLANs from the classical side into a single BD on the ACI side will cause problems with MAC explosion and HSRP flapping and can cause major disruption
  • From ACI release 3.1, merging of all VLANs can be done on one BD with the Flood in Encap feature enabled
  • This feature limits flooding to the VLAN encapsulation. Proxy ARP, sourced from the BD MAC, is used to send BUM traffic to other encapsulations
  • This helps you migrate from Network Centric to App Centric very easily

This method can be used during Migration to prepare for App Centric mode later when you understand the application flows.

Instead of bringing in VLANs as 1 VLAN == 1 EPG/BD, you put all the EPGs on the ACI side in the same BD, and you put all your default gateways on that BD.

Very Important: If you are using this approach, you need to turn on “Flood in Encap” on the BD before attempting this, or else you will melt down your classical network.

 

The diagram above shows the “Flood in Encap” setting on the BD.
The diagram above is a simple depiction of how to do this sort of migration.
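
For reference, here is a rough sketch of that “Flood in Encap” setting expressed as an APIC payload. multiDstPktAct="encap-flood" is my recollection of the attribute behind the “Flood in Encapsulation” option; confirm it on your release before relying on it.

  # Sketch: the single large BD used for the Flood in Encap approach
  flood_in_encap_bd = {
      "fvBD": {
          "attributes": {
              "name": "BD-AllVlans",
              "arpFlood": "yes",
              "unkMacUcastAct": "flood",
              "multiDstPktAct": "encap-flood",   # limit flooding to the VLAN encapsulation
          }
      }
  }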

Of course, there are some caveats/limitations to this method which you should be aware of, so decide whether you want to do this before you actually start on it. To read more about the “Flood in Encap” feature and its limitations, please read the following:

Flood In Encap Feature on CCO

I’ll highlight the limitations of “Flood in Encap” below. (I’m omitting the last one, “pervasive gateway”, because it’s a deprecated feature and hardly anyone uses it any more.)

  • Flood in encapsulation does not work in ARP unicast mode.
  • Neighbor Solicitation (Proxy NS/ND) is not supported for this release.
  • Because proxy Address Resolution Protocol (ARP) is enabled implicitly, ARP traffic can go to the CPU for communication between different encapsulations.
  • To ensure even distribution to different ports to process ARP traffic, enable per-port Control Plane Policing (CoPP) for ARP with flood in encapsulation.
  • Flood in encapsulation is supported only in bridge domain in flood mode and ARP in flood mode. Bridge domain spine proxy mode is not supported.
  • IPv4 L3 multicast is not supported.
  • IPv6 NS/ND proxy is not supported when flood in encapsulation is enabled. As a result, the connection between two endpoints that are under same IPv6 subnet but resident in EPGs with different encapsulation may not work.
  • VM migration to a different VLAN or VXLAN has momentary issues (60 seconds).
  • Setting up communication between VMs through a firewall, as a gateway, is not recommended because if the VM IP address changes to the gateway IP address instead of the firewall IP address, then the firewall can be bypassed.
  • Prior releases are not supported (even interoperating between prior and current releases).
  • A mixed-mode topology with older-generation Application Leaf Engine (ALE) and Application Spine Engine (ASE) is not recommended and is not supported with flood in encapsulation.
  • Enabling them together can prevent QoS priorities from being enforced.
  • Flood in encapsulation is not supported for EPG and bridge domains that are extended across Cisco ACI fabrics that are part of the same Multi-Site domain.
  • However, flood in encapsulation is still working and fully supported, and works for EPGs or bridge domains that are locally defined in Cisco ACI fabrics, independently from the fact those fabrics may be configured for Multi-Site.
  • The same considerations apply for EPGs or bridge domains that are stretched between Cisco ACI fabric and remote leaf switches that are associated to that fabric

 

Now to the topic of this writeup.

Micro Segment Method Example (a simplistic example)

  • In this example scenario, we have 3 VLANs that have already been migrated to ACI
  • Migration has been done in Network Centric mode (1 VLAN = 1 EPG/BD)
  • VLANs 10, 20, and 30 happen to have web servers (belonging to the same web farm)
  • VLAN 20 also happens to have a DB server
  • VLAN 30 also happens to have an app server

Requirements:

  • We need to go to App Centric
  • All servers in the web farm need to be able to talk to each other
  • The web servers and the app server need to be able to talk to each other (on some given TCP port)
  • The app server and the DB server need to be able to talk to each other (on some given TCP port)
  • In other words, we want to group all WEB servers in one group, all APP servers in one group, and all DB servers in one group. That way we can apply the necessary contracts between the groups (the definition of App Centric). Keep in mind that the grouping we are doing is a logical grouping, regardless of what subnets the servers belong to. For example, even though the web servers are in different VLANs (EPG/BDs), we still do a logical grouping on top of that, meaning we don’t need to change the IPs of the web servers to bring them into the same physical EPG to do this grouping.

So, logically they would look like the below.

The setup below is done in a lab using multiple vNICs of a Pagent router, each NIC emulating an application-tier endpoint. The reason we are using a Pagent router instead of real servers is that we also want to do a convergence test and get a feel for how much downtime is expected during this move to App Centric.

Step 1:

  • Migrate your classical DC VLANs to the ACI fabric in Network Centric mode
  • One classical VLAN == one EPG/BD in ACI
  • Make sure all EPGs are set to Preferred Group “Include”
  • Make sure micro-segmentation is enabled on the domain binding (VMM or static)

Also, please make sure that in the above EPGs, when you do your VMM domain binding to attach the servers (VMs), you check the “Allow Micro-Segmentation” box (otherwise micro-segmentation will not work). This part is often overlooked and people get frustrated. A sketch of such an EPG follows.
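
To make Step 1 concrete, here is a sketch of what one of those network-centric EPGs could look like as a payload: Preferred Group set to include, plus a VMM domain binding with the “Allow Micro-Segmentation” flag. The prefGrMemb and classPref attribute names are my recollection of what backs those GUI options, and the VMM domain DN is a placeholder.

  # Sketch of a Step 1 EPG: Preferred Group include + Allow Micro-Segmentation on the binding
  epg_vlan10 = {
      "fvAEPg": {
          "attributes": {"name": "EPG-Vlan10", "prefGrMemb": "include"},
          "children": [
              {"fvRsBd": {"attributes": {"tnFvBDName": "BD-Vlan10"}}},
              {"fvRsDomAtt": {"attributes": {
                  "tDn": "uni/vmmp-VMware/dom-DVS1",   # placeholder VMM domain
                  "classPref": "useg",                 # GUI: "Allow Micro-Segmentation"
                  "instrImedcy": "immediate",
              }}},
          ],
      }
  }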

Side Note:

What is a Preferred Group?

Preferred Group is a feature that you enable on the VRF and then apply to EPGs as needed (Preferred Group “Include”). All EPGs that are in the Preferred Group Include (regardless of BD membership) can talk freely to each other. This is a very useful feature, because it allows you to move to App Centric in a granular fashion.

Prior to this feature being available, folks used to turn off policy enforcement at the VRF level. The problem with that is that when you do want to go to App Centric (say you have all your contracts applied between EPGs) and you re-enable enforcement expecting your EPGs to follow the policy intent expressed by the contracts, it is an all-or-nothing switch: all EPGs in that VRF are affected at once. It is not a granular approach. You have to do it all at one time, and if you make a mistake in your contract filters you have to back it all out again.

With Preferred Groups you can do this in a very granular fashion, one EPG at a time.

Enabling Preferred Group in the VRF

Putting EPGs in “Preferred Group Include”
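
As a sketch, enabling the Preferred Group at the VRF level maps (as far as I recall) to the vzAny object under the VRF, while individual EPGs opt in with prefGrMemb="include" as shown in the Step 1 sketch above; treat the attribute names as assumptions to be verified.

  # Sketch: enable the Preferred Group on the VRF (vzAny), then opt EPGs in individually
  vrf_pref_group = {
      "fvCtx": {
          "attributes": {"name": "VRF1"},
          "children": [
              {"vzAny": {"attributes": {"prefGrMemb": "enabled"}}}
          ],
      }
  }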

Results of putting all the EPGs (web, app, db) in the Preferred Group in our example:

All endpoints can reach each other…

Step 2:

  • Create an application profile called ContractMasters with a CM-WEB EPG (CM = Contract Master), which ties to a ContractMasters BD
  • The ContractMasters BD does not need any IP address
  • The CM-WEB EPG does not need to have PG Include
  • The CM-WEB EPG does not need any VMM binding or static binding
  • Put an ANY/ANY contract, both consumed and provided, on that EPG (a sketch follows this list)
  • Repeat the steps for the APP tier
  • Repeat the steps for the DB tier
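
A sketch of Step 2 as a payload is below: a permit-any contract (using the default filter from tenant common) that the CM-WEB EPG both provides and consumes. Tenant, BD, and contract names are placeholders, and the class/attribute names are my recollection of the object model.

  # Sketch: an ANY/ANY contract plus the ContractMasters app with the CM-WEB EPG
  any_any_contract = {
      "vzBrCP": {
          "attributes": {"name": "ANY-ANY", "scope": "context"},
          "children": [
              {"vzSubj": {
                  "attributes": {"name": "any"},
                  "children": [
                      {"vzRsSubjFiltAtt": {"attributes": {"tnVzFilterName": "default"}}}
                  ],
              }}
          ],
      }
  }
  contract_masters_app = {
      "fvAp": {
          "attributes": {"name": "ContractMasters"},
          "children": [
              {"fvAEPg": {
                  "attributes": {"name": "CM-WEB"},   # no PG include, no domain binding needed
                  "children": [
                      {"fvRsBd": {"attributes": {"tnFvBDName": "ContractMasters"}}},
                      {"fvRsProv": {"attributes": {"tnVzBrCPName": "ANY-ANY"}}},
                      {"fvRsCons": {"attributes": {"tnVzBrCPName": "ANY-ANY"}}},
                  ],
              }},
              # CM-APP and CM-DB are built the same way
          ],
      }
  }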

Step 3:

  • Create micro-segment (uSeg) EPGs for the application tiers, i.e. create uSeg EPGs for each BD that the tier’s EPGs belong to (BD10, BD20, BD30 in this example)
  • Use Preferred Group Include on these uSeg EPGs
  • Tie each of the uSeg EPGs to the correct BD (BD10, BD20, BD30 in this example)
  • Do not configure the domain binding of the uSeg EPGs at this time (a sketch follows this list)
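
A sketch of one of those uSeg EPGs is below; isAttrBasedEPg is my recollection of the flag that makes an EPG attribute-based (micro-segmented). Note the deliberate absence of a domain binding at this stage.

  # Sketch of Step 3: a uSeg EPG per tier per BD, still in the Preferred Group
  useg_web_bd10 = {
      "fvAEPg": {
          "attributes": {
              "name": "uSeg-WEB-BD10",
              "isAttrBasedEPg": "yes",   # attribute-based (micro-segment) EPG
              "prefGrMemb": "include",   # keep it reachable while we finish the setup
          },
          "children": [
              {"fvRsBd": {"attributes": {"tnFvBDName": "BD-Vlan10"}}},
              # no fvRsDomAtt yet; the domain binding comes later (Step 6)
          ],
      }
  }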

Step 4:

  • For each of the uSeg EPGs, associate the “EPG Contract Master” defined in Step 2 (as sketched below)
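
As a sketch, the contract-master association is a single child object on each uSeg EPG; fvRsSecInherited is, to the best of my recollection, the relation behind “EPG Contract Master”, and the DN below is a placeholder.

  # Sketch of Step 4: point the uSeg EPG at its Contract Master
  inherit_from_cm_web = {
      "fvRsSecInherited": {
          "attributes": {"tDn": "uni/tn-Migration/ap-ContractMasters/epg-CM-WEB"}
      }
  }
  # POSTed under the uSeg EPG, e.g. .../api/mo/uni/tn-Migration/ap-App1/epg-uSeg-WEB-BD10.json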

Step 5:

  • For each of the uSeg EPGs, define some attribute to pull in the appropriate endpoints
  • In this example we are using MAC addresses (see the sketch below)
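
A sketch of the MAC attribute is below; fvCrtrn and fvMacAttr are the classes I believe carry uSeg matching criteria, and the MAC address is of course a placeholder.

  # Sketch of Step 5: a MAC-based matching criterion on the uSeg EPG
  mac_criterion = {
      "fvCrtrn": {
          "attributes": {"name": "default", "match": "any"},
          "children": [
              {"fvMacAttr": {"attributes": {"name": "web1", "mac": "00:50:56:AA:BB:01"}}}
          ],
      }
  }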

Step 6:

  • Tie in the domain on each uSeg EPG (as sketched below)
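
A sketch of that last piece: once the domain is attached, the uSeg EPG starts pulling in the endpoints whose attributes match. Same placeholder VMM domain as before.

  # Sketch of Step 6: attach the domain to the uSeg EPG
  useg_dom_att = {
      "fvRsDomAtt": {
          "attributes": {
              "tDn": "uni/vmmp-VMware/dom-DVS1",   # placeholder VMM domain
              "instrImedcy": "immediate",
          }
      }
  }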

Results 1:

  • Endpoints have moved to the appropriate uSeg EPGs (uSeg EPGs use private isolated VLANs)

Results 2:

All endpoints across EPGs can still communicate with each other even though they have been micro-segmented. That is because the uSeg EPGs also have Preferred Group Include at this point.

Step 7:

  • Apply the required contracts on the Master EPGs (the dummy EPGs); see the sketch after this list
  • In our case, we apply WEB-APP contract between WEB and APP
  • We apply APP-DB contract between APP and DB
  • Notice in the below depiction how the uSeg EPGs (where the endpoints now reside) have inherited the corresponding contracts from their Contract Master EPGs.
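
A sketch of one of those contracts is below; the tcp-8080 filter name is hypothetical, standing in for whatever port your ADM data says the tiers actually use.

  # Sketch of Step 7: the real contracts live on the Contract Master EPGs only
  web_app_contract = {
      "vzBrCP": {
          "attributes": {"name": "WEB-APP", "scope": "context"},
          "children": [
              {"vzSubj": {
                  "attributes": {"name": "app-port"},
                  "children": [
                      {"vzRsSubjFiltAtt": {"attributes": {"tnVzFilterName": "tcp-8080"}}}   # hypothetical filter
                  ],
              }}
          ],
      }
  }
  # CM-WEB consumes WEB-APP and CM-APP provides it (fvRsCons / fvRsProv on those EPGs);
  # APP-DB is wired the same way between CM-APP and CM-DB. The uSeg EPGs inherit all of it.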

Step 8:

  • You are almost done.
  • Change the Preferred Group setting to “exclude”, uSeg EPG by uSeg EPG. In effect, you are now relying on the contracts to kick in (a sketch follows this list).
  • Test as you go to make sure your intended policy is met.
  • If you have issues, your contracts are probably not correct; put back the Preferred Group Include and analyze.
  • Adjust your contract filters accordingly and try again.
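
As a sketch, the per-EPG cutover is just a one-attribute change, which is also your rollback lever if a flow breaks:

  # Sketch of Step 8: flip one uSeg EPG at a time out of the Preferred Group
  exclude_useg = {
      "fvAEPg": {
          "attributes": {"name": "uSeg-WEB-BD10", "prefGrMemb": "exclude"}
      }
  }
  # set prefGrMemb back to "include" to roll back while you fix the contract filters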

Convergence Testing Results:

How much traffic loss is expected?

In this test we’ve set up two running traffic streams (UDP):

Stream 1: from WEB to APP

Stream 2: from APP to DB

We enabled the streams after the initial step of migration was completed in Network Centric Mode (after step 1)

Now we complete Steps 2 through 8 and measure for traffic loss.

Below you see that no packet drops are seen through this entire migration from Network Centric to App Centric!

Conclusion:

  • If done correctly, no traffic loss is expected
  • It is still recommended to do this sort of change during a maintenance window

Things to Consider:

  • This method shows how to go from Network Centric to App Centric using micro-segmentation
  • TCAM utilization still needs to be analyzed for the contracts
  • Endpoints should be spread out (distributed) across leaves so as not to exhaust TCAM on any particular leaf
  • Other features, such as contract compression, should be considered
  • Policy enforcement is on the ingress leaf by default. This helps distribute TCAM resources. Think about L3Out contracts: many EPGs will have contracts with the L3Out EPG (L3OutInstP). Since the user EPGs are on the user leaves and policy is enforced at ingress (by default), the border leaves don’t run out of TCAM. Don’t change policy enforcement to egress unless you have a special use case (like multi-domain integration, e.g. ACI to SDA).
