Workload Mobility with Cloud Network Controller on AWS

Table of Contents:

  1. Introduction
  2. How to implement Active/Standby VPCs
  3. Workflow Needed to Accomplish Active to Standby Switchover
  4. References

Introduction

For critical applications running in AWS, you would normally place your workload in multiple availability zones and even multiple regions. Depending on the application, you could use Application Load Balancers, Network Load Balancers, or Global Accelerators to front-end the application.

If you want an Active/Standby scenario for applications running in certain VPCs, you can achieve this quite easily with a CNC fabric in the AWS cloud. The idea is that you would have a standby application tier, pre-configured with the same IP subnet and residing in a VPC in a different region. If, for whatever reason, a problem or major failure occurs with the primary region/application, the path for the affected application tier can be switched over to the standby VPC's region. Of course, realize that you would also need some mechanism to keep the Active/Standby applications synced and updated.

📙 The procedure should be exactly the same for all supported cloud providers (AWS/Azure/GCP), since Cisco already supports brownfield integration in AWS and Azure (GCP support for brownfield environments is coming in the following release). In a multicloud environment, using this procedure, you could switch over from an Active VPC in one cloud provider (for example, AWS) to a Standby VPC in a different cloud provider (for example, Azure or GCP).

How to implement Active/Standby VPCs

Active/Standby VPCs can be achieved in the AWS cloud fabric by utilizing the brownfield integration method, which was described in a previous writeup.

Variations of this method can be utilized as needed. One way to do this would be to create the standby VPC in AWS from the Console and then deploy the standby application tier there. The standby application tier/VPC would have the same IP prefixes as the Active application/VPC tier.
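
If you would rather script that step than click through the Console, a minimal boto3 sketch along the following lines could create the standby VPC and subnet. The region names, CIDRs, and tag values here are illustrative assumptions, not values taken from the topology below.

```python
# Sketch: create a standby VPC in a second region with the SAME prefixes as
# the active VPC. Region names, CIDRs, and tags are illustrative assumptions.
import boto3

STANDBY_REGION = "us-west-2"      # assumption: the standby region
VPC_CIDR = "10.80.0.0/16"         # assumption: must match the active VPC CIDR
SUBNET_CIDR = "10.80.1.0/24"      # assumption: must match the active subnet

ec2 = boto3.client("ec2", region_name=STANDBY_REGION)

# Create the standby VPC with the same prefix as the active VPC.
vpc = ec2.create_vpc(
    CidrBlock=VPC_CIDR,
    TagSpecifications=[{
        "ResourceType": "vpc",
        "Tags": [{"Key": "Name", "Value": "standby-vpc"}],
    }],
)
vpc_id = vpc["Vpc"]["VpcId"]

# Create the subnet that will host the standby application tier.
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock=SUBNET_CIDR)
print(f"Standby VPC {vpc_id}, subnet {subnet['Subnet']['SubnetId']}")
```

From there you would launch the standby VM into that subnet, with the same private IP and tags as the active VM, as described below.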

As an example, the figure below depicts a manually created Standby VPC that has the same IP prefixes as the Active one. The topology for this can easily be built using NDO (or just CNC if you have only one site in the AWS cloud fabric). Under normal conditions, VM1 will talk to VM2 in the us-east-2 region. Note that VM1 could be in a different site, such as an on-premises site or another cloud site (AWS, Azure, GCP).

Figure 1: Topology of Active / Standby before failure

Please observe from the figure above that:

  • In the AWS infra account, a TGW will be spun up that will peer with the TGW of the home region.
  • The infra region TGWs will be shared with the tenant account for the corresponding regions.
  • TGW attachments will be built from the tenant VPCs to the shared TGWs in the corresponding regions.
  • The Standby VPC is configured and the application tier is deployed there (in this case, just a VM). The Standby VPC has the same IP prefixes; in fact, in this case the Standby VPC's VM even has the same IP address as the Active VPC's VM (which may be useful if the application logic uses IPs instead of DNS).
  • The tags used for the Active VM and the Standby VM are the same. This ensures that the VM will have the proper security group configured after the switchover from Active to Standby (see the tagging sketch after this list).
  • Under normal conditions, the application tier in the Standby VPC is not used.
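
To satisfy that matching-tags requirement, one option is to copy the Active VM's tags over to the Standby VM programmatically. Below is a hedged boto3 sketch; the instance IDs and regions are placeholders, and which tag keys actually matter depends on the endpoint selectors configured for your EPG.

```python
# Sketch: mirror the Active VM's tags onto the Standby VM so that CNC's
# endpoint selectors match the standby instance after switchover.
# Instance IDs and regions are placeholder assumptions.
import boto3

ACTIVE_REGION, STANDBY_REGION = "us-east-2", "us-west-2"              # assumptions
ACTIVE_VM, STANDBY_VM = "i-0123456789abcdef0", "i-0fedcba9876543210"  # placeholders

src = boto3.client("ec2", region_name=ACTIVE_REGION)
dst = boto3.client("ec2", region_name=STANDBY_REGION)

# Read the tags currently applied to the Active VM.
resp = src.describe_tags(
    Filters=[{"Name": "resource-id", "Values": [ACTIVE_VM]}]
)
tags = [
    {"Key": t["Key"], "Value": t["Value"]}
    for t in resp["Tags"]
    if not t["Key"].startswith("aws:")  # AWS-reserved tags cannot be copied
]

# Apply the identical tags to the Standby VM.
dst.create_tags(Resources=[STANDBY_VM], Tags=tags)
print(f"Copied {len(tags)} tag(s) to {STANDBY_VM}")
```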

It’s obvious, but I want to point out that all the configurations above, including route tables and security groups, will be configured automatically by NDO/CNC.

Once there is a major failure and the decision is made to switch the application tier over to the standby VPC, all you have to do is follow a workflow to switch connectivity from the Active VPC to the Standby VPC. After the switchover, the topology will look as shown below.

Figure 2: Topology after switching over from Active VPC to Standby VPC

Observe the changes that will be made to the topology after the switchover from the Primary VPC to the Standby VPC:

  • In the infra account, a new TGW will be spun up, and TGW peering will be established to the home region TGW and also to the Active region TGW (depending on whether there are still other NDO/CNC-created VPCs in the Active region). A verification sketch for the new peerings follows this list.
  • The TGW will be shared with the tenant account in the corresponding region.
  • The security groups needed for the Standby VM will be configured automatically.
  • Since you still have the Primary region VM tied to the Primary region subnet, that will remain intact. You could also switch back to the Primary VPC again if needed.
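
As a sanity check after the switchover, you can confirm that the new region's TGW peering attachments have come up. A minimal boto3 sketch, assuming the standby region name and run with credentials for the AWS infra account:

```python
# Sketch: verify from the AWS infra account that the TGW peering attachments
# in the standby region are up after switchover. The region name is an
# illustrative assumption.
import boto3

STANDBY_REGION = "us-west-2"   # assumption: region of the standby VPC

ec2 = boto3.client("ec2", region_name=STANDBY_REGION)

resp = ec2.describe_transit_gateway_peering_attachments(
    Filters=[{"Name": "state", "Values": ["pendingAcceptance", "available"]}]
)
for att in resp["TransitGatewayPeeringAttachments"]:
    print(att["TransitGatewayAttachmentId"], att["State"],
          att["RequesterTgwInfo"]["Region"], "->",
          att["AccepterTgwInfo"]["Region"])
```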

⚠️ After switchover, the total time for communication to start back up was observed to be around 5 to 7 minutes. For that reason, in a production environment, switchover should be done only when needed.
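
If you want to measure that recovery window in your own environment, a simple poll of the application endpoint from VM1 is enough. A minimal sketch, assuming a hypothetical URL and poll interval:

```python
# Sketch: measure how long after switchover the application tier becomes
# reachable again. The URL and poll interval are illustrative assumptions;
# run it from VM1 (or any client of the application tier).
import time
import urllib.request

APP_URL = "http://10.80.1.10/"   # assumption: VM2's address
POLL_SECONDS = 5

start = time.time()
while True:
    try:
        with urllib.request.urlopen(APP_URL, timeout=3) as resp:
            print(f"Recovered after {time.time() - start:.0f}s (HTTP {resp.status})")
            break
    except OSError:              # URLError/timeouts are OSError subclasses
        time.sleep(POLL_SECONDS)
```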

Workflow Needed to Accomplish Active to Standby Switchover

It is important that the workflow below is followed for an Active to Standby switchover:

  1. From CNC, delete the CTX profile for the Active VPC (VRF). Since the original VM2 in the original Active VPC is still tied to the subnets, the VPC/subnets will remain in AWS (a quick verification sketch follows this list).
  2. Now, use the brownfield integration method to import the Standby (brownfield) VPC into the same VRF, in this example VRF-2.
    📙 Since the CTX profile was deleted and replaced by the imported VPC, the new CTX profile now points to the Standby VPC's region. However, recall that we did not modify the EPGs and the associated contracts. The Standby VPC's route table will automatically be merged into a new route table with the appropriate entries and associated with the Standby VPC's subnet. The same concept applies to the security groups.
  3. If using NDO, go to NDO, import the VRF (in this example, VRF-2) from CNC back into NDO, and hit Deploy. This will replace the old region and pull the new region into NDO.
  4. If you want to switch back to the original VPC at some point, you can delete the CTX profile from CNC and create the CTX profile (in the original region, with the original CIDRs and subnets) again for the original Active region. Follow this by importing the VRF from CNC back into NDO again (if using NDO).
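
To back up the claim in step 1, you can verify from the tenant account that the original VPC and its subnets are still present in AWS after the CTX profile is deleted. A hedged boto3 sketch, where the region and CIDR are assumptions:

```python
# Sketch: after step 1 (CTX profile deleted), confirm from the tenant account
# that the original VPC and its subnets still exist in AWS. The region and
# CIDR are illustrative assumptions.
import boto3

ACTIVE_REGION = "us-east-2"    # assumption: the original active region
VPC_CIDR = "10.80.0.0/16"      # assumption: the VRF's CIDR

ec2 = boto3.client("ec2", region_name=ACTIVE_REGION)

vpcs = ec2.describe_vpcs(
    Filters=[{"Name": "cidr-block-association.cidr-block", "Values": [VPC_CIDR]}]
)["Vpcs"]

for vpc in vpcs:
    subnets = ec2.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc["VpcId"]]}]
    )["Subnets"]
    print(vpc["VpcId"], "still present; subnets:",
          [s["CidrBlock"] for s in subnets])
```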

The figure below shows the results of a curl from VM1 to VM2 before and after the switchover.
Figure 3: Curl to application tier before and after switchover


References

Importing Existing Brownfield AWS Cloud VPCs Into Cisco Cloud Fabric

