ACI/Cloud Extension Usage Primer (Azure) – Simple Service Graph with Azure Network Load Balancer & vNET Peering

This is a continuation of the previous article, Deploying a Simple Service Graph with Azure ALB with vNET Peering. To get the full picture, please first work through the Azure vNET Peering article and then the Simple Service Graph with Azure ALB with vNET Peering article.

In this exercise, we will pick up where we previously left off with the ALB deployment and replace the ALB Service Graph with an NLB. This will highlight the differences between deploying an ALB and deploying an NLB.

Note that we still will not be using the overlay-2 Hub vNET in this exercise; that will come in the following article. Going through these two articles will give you the basics of deploying Service Graphs with the ALB and the NLB. Understanding these basics will make it much easier to understand how to deploy multinode Service Graphs that utilize overlay-2, how to read the UDRs (User Defined Routes) involved, and why that method is useful for so many use cases.

As a recap, this is part of the series of three articles on Azure Service Graphs that I had planned to write.

Let’s again recap the differences between the Azure ALB and the Azure NLB:

Azure Application Gateways:

This is also known as the Azure Application Load Balancer, or ALB. It is essentially a specialized Layer 7 load balancer that balances web traffic, namely HTTP and HTTPS. It can also do URL filtering, redirection, and forwarding based on user-defined rules.

ALBs can be deployed in 2 ways:

  • Internet-facing:  Also known as North/South
  • Internal-facing:  Also known as East/West

In the ACI implementation model, the ALB is deployed by associating it with a Service Graph and then tying the Service Graph to an ACI contract. Servers in the provider EPG are dynamically added to the backend pool.

Azure Network Load Balancers:

This is also known as the Azure NLB. It is a Layer 4 device that distributes inbound packets based on L4 ports; in other words, it can load balance any TCP/UDP port. NLBs can also be deployed Internet-facing or internal-facing.

There are 2 modes of operation for the NLB:

  • Forward Mode: Here you specifically list which ports you want forwarded to the backend server farm.
  • HA Port Mode: This mode forwards all TCP/UDP ports to the backend server farm (see the Azure CLI sketch below).
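Under the hood, Azure implements HA ports as a load-balancing rule with protocol All and frontend/backend port 0 on a Standard internal load balancer. cAPIC programs this rule for you, but for reference, a hand-built sketch with the Azure CLI would look roughly like this (the resource group, load balancer, frontend, and backend pool names below are hypothetical):

# Hypothetical sketch only: cAPIC normally creates this rule on your behalf
az network lb rule create \
  --resource-group my-rg \
  --lb-name my-internal-nlb \
  --name ha-ports-rule \
  --protocol All \
  --frontend-port 0 \
  --backend-port 0 \
  --frontend-ip-name myFrontEnd \
  --backend-pool-name myBackendPool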

In the ACI implementation model, the NLB is deployed by associating it with a Service Graph and then tying the Service Graph to an ACI contract. Servers in the provider EPG are dynamically added to the backend pool.

Let’s also recap the three rules I pointed out that you need to follow when deploying Service Graphs in Azure.

  1. If the Hub vNET (overlay-2) is used to host the service devices, the Service Graph devices should always be built in the provider vNET region where the cloud CSRs are present. In our first example we did not use overlay-2 to host the ALB, and in this example we will not use overlay-2 to host the NLB either. So, technically, we could place the NLB in either the West US region or the East US region (regardless of whether the region has CSRs or not), as long as the vNET in that region is on the provider side of the contract (the service device still has to be on the provider side). However, to keep the flow consistent for this example, we will build the NLB in the eastus region, in the tenant (VRF-APP) vNET itself. Remember, our provider was EPG-APP, which is in VRF-APP in the East US region.
  2. The service devices should not use the IP subnet used by the workloads themselves. Bring up a new, unique subnet for the service devices within the same vNET CIDR.
  3. If a VM in the provider region (the one with the Service Graph) also has Internet connectivity through the Azure IGW, then that VM cannot have an Azure Basic public IP; it must have an Azure Standard SKU public IP (a quick way to check the SKU is shown below).
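If you want to verify which SKU a VM’s public IP is using, one quick check from the Azure CLI (the resource group name below is a hypothetical placeholder) is:

# Hypothetical sketch: list public IPs in a resource group with their SKU (Basic vs. Standard)
az network public-ip list --resource-group my-rg \
  --query "[].{name:name, sku:sku.name, ip:ipAddress}" --output table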

Where we left off in the last Azure ALB Build:

We had successfully built an Azure ALB device and attached it to a Service Graph, which we then associated with the EW-C1 contract. The WEB EPG was the consumer and the APP EPG was the provider. We also went through multiple screenshots, step by step, looking at the details of the implementation from the cAPIC, the Azure Console, and debugs on the cloud CSRs, so we could get an understanding of what happens under the hood.

One more important item to bring up, based on rule 2 above: we had to create a separate subnet (in the same CIDR as the provider) to host the ALB. When deploying the NLB, the same rule of course applies. Since we already followed the rules before and we are just replacing the ALB with the NLB, we are already compliant. There is one more item related to rule 2 that I will point out in a few moments as you follow along.

The Figure below shows where we had left off when we finished with the ALB deployment.

Figure 1

Now, all we need to do is replace the ALB with the NLB. Simple enough. The only thing to note here is that I bring up a brand-new subnet for the NLB. Remember (from the figure above) that the ALB subnet was 10.80.254.0/24; for the NLB, I’m bringing up 10.80.255.0/24. The question you will probably ask is: why? Why can’t I use the same subnet for hosting the NLB that I used for hosting the ALB? The answer is that it’s the way I’m doing the implementation. If I deleted the ALB device entirely and then created the NLB device, I could put the NLB in that subnet. However, I have chosen not to delete the ALB device (in case I want to use it later for something).

You cannot host an ALB and an NLB in the same subnet. For that reason, I brought up a brand-new subnet for the NLB. If you tried to put the NLB and the ALB in the same subnet, you would see the following error message.

Figure 2

Below is a diagram of what we want to achieve in this follow-along exercise.

Figure 3

The logical representation of that is shown below.   Please note that in this case we don’t need to configure any redirects on the NLB device. 

Also, note that the figure below shows the components of the Hub vNET (overlay-1 and overlay-2) just for completeness, because we are using vNET Peering for traffic to go from the consumer region to the provider region and vice versa (using the built-in NLB in overlay-1 and the cloud CSRs for routing). However, this Service Graph could just as well have been built using VGW peering instead of vNET Peering. In that sort of scenario, there is no requirement that the provider region have cloud CSRs; you only need to pay attention to that rule if the service devices are placed in the Hub vNET (overlay-2).

However, please keep in mind the benefits of using vNET Peering instead of VGW. VGW peering is IPsec over the regular Internet, which means you are subject to limited bandwidth (around 1.25 Gbps) and higher, unpredictable latency. vNET Peering, on the other hand, is static peering that rides entirely on the Azure backbone; your packets don’t traverse the Internet to go from region to region. This gives you much higher bandwidth (around 20 Gbps) and much lower, more predictable latency.
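If you want to confirm from the Azure side that the vNET peerings cAPIC built are up, one way to check (the resource group and vNET names below are hypothetical placeholders for the cAPIC-created ones) is:

# Hypothetical sketch: list peerings on a vNET and confirm peeringState is "Connected"
az network vnet peering list --resource-group my-rg \
  --vnet-name my-provider-vnet \
  --query "[].{name:name, state:peeringState}" --output table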

Figure 4

Let’s also look at what the packet flow should look like when we finish.  We will do packet captures / debugs to verify this later during the exercise.  

Figure 5

Please study the figure above carefully and you will notice that the NLB packet flow differs from the ALB packet flow.

Below is a diagram of the ALB packet flow that we studied in the ALB writeup.

Figure 6

Let me point out the differences with the aid of a diagram. The orange boxes with the dashed outline highlight the differences. We will do packet captures after deploying the NLB to show this.

  • You will notice that in the NLB case, on the provider side (APP-EPG), the packet source is not changed to the VIP IP; the original source IP of the consumer (WEB-VM) is preserved. That is because we are not doing SNAT on the NLB (it’s not that the Azure NLB cannot do that, it’s just that cAPIC does not support it with the Azure NLB). However, there is no loss of functionality due to this.
  • Based on the packet received, the reply is sent back from the provider (APP-VM) with the destination set to the consumer’s IP (not the VIP IP, as in the ALB case).
  • When the packet is received by the consumer (WEB-VM), you will notice that, magically, the source IP has been changed to the NLB VIP IP! The Azure network does that for you.

Please refer to the diagram below, which highlights the differences in packet flow between the NLB and the ALB.

Figure 7

Now that we’ve spoken about this enough, let’s start on the implementation.

First, let’s go ahead and remove the ALB that was tied to EW-C1 (East-West Contract 1).

Note: As mentioned above, I am simply removing the Service Graph from EW-C1 (the East-West contract). This will raise some faults in the cAPIC. If you do not want to see any faults raised, you should also delete the old Service Graph and the ALB device.

To do this, simply go to the template and remove the associated Service Graph from the EW contract. The steps are shown in the figure below.

Figure 8

After removing the Service Graph from EW-C1, let’s verify that we can still ping from the WEB-VM to the APP-VM.

Figure 9

Now, let’s go ahead and create a new subnet in the same CIDR for the NLB.

Figure 10

To create the new subnet, go to MSO, open the Site Local Template, and on the provider VRF (VRF-APP) create the new subnet of 10.80.255.0/24, as shown below.

Figure 11
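Once the configuration is pushed, you can sanity-check from the Azure CLI that the new subnet now exists in the provider vNET alongside the workload and ALB subnets (the resource group and vNET names below are hypothetical placeholders for the cAPIC-created ones):

# Hypothetical sketch: confirm 10.80.255.0/24 shows up in the provider vNET
az network vnet subnet list --resource-group my-rg \
  --vnet-name vrf-app-vnet \
  --query "[].{name:name, prefix:addressPrefix}" --output table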

Now, let’s create the new NLB device. To do this, go to the cAPIC, navigate to Application Management/Services, and click on Actions/Create Device.

Figure 12

Now, fill in the NLB information as shown below:

  1. Name the NLB
  2. Choose the Provider Tenant
  3. Select NLB from the list of devices
  4. Select Internal (since this is for East/West traffic)
  5. Use a static IP for the VIP
  6. Choose:
    1. Region (eastus in my case)
    2. Provider VRF
    3. LB subnet (the one you created; 10.80.255.0/24 in my case)
    4. A static VIP from that subnet (10.80.255.10 in my case)
Figure 13

Now, go to MSO and create the new Service Graph that will use this NLB device.  Follow the instructions in the figure below.

Figure 14

Now, on MSO, go to the Site Local instantiation of the template and tie the NLB into the Service Graph.

Figure 15

Now, on MSO, go back to the main template and tie that Service Graph into the contract EW-C1.

Figure 16

Now, it’s time to create the listener rules on the NLB. To do this, on the Site Local Template, click on the contract EW-C1 and then click on NLB Properties.

Figure 17

Click on “Add Listener”

Figure 18

Put in the listener name (listener1 in my case). We will also use HA Port mode for this NLB. Remember from the discussion earlier in this writeup that HA Port mode forwards every TCP/UDP port 1:1 to the backend. We have chosen not to use Forward mode.

Figure 19

In the rule settings, change from the default Forward mode to HA Port mode.

Figure 20
  • Choose the Provider EPG (EPG-APP in this example).
  • I’ve also decided to use port 22 for the health check, since an sshd server is running by default on those Azure Ubuntu VMs (a CLI check of the resulting rule and probe is sketched after the figure).
Figure 21
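If you want to double-check what cAPIC programmed, the listener should appear in Azure as a load-balancing rule with protocol All and frontend/backend port 0 (HA ports), and the health probe should be on TCP 22. A rough check from the Azure CLI (the resource group and load balancer names below are hypothetical placeholders for the ones cAPIC created in the tenant subscription):

# Hypothetical sketch: inspect the rules and probes on the NLB
az network lb rule list --resource-group my-rg --lb-name my-nlb --output table
az network lb probe list --resource-group my-rg --lb-name my-nlb --output table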

We are done with the implementation!

Time to Test and do some packet captures and debugs

For testing purposes, we will make things easier for ourselves and change ssh on the APP-VMs from key-based to password-based authentication. If you wish, you could scp your private key to the WEB-VM instead. Please do this for both APP-VMs; in the diagram below I only show APP-VM1.

Figure 22

Now, ssh to the WEB-VM and from there ssh to the VIP IP of the NLB (10.80.255.10 in my case). We land on APP-VM2.

Figure 23

Exit from APP-VM2 and try your ssh to the NLB VIP of 10.80.255.10 again. You will notice that you get an ssh host identification error. This is expected, because this time the NLB decided to forward to APP-VM1, while the host key for 10.80.255.10 saved in the WEB-VM’s known_hosts file was APP-VM2’s.

On the WEB-VM, remove the saved host key for the VIP IP (10.80.255.10). You can use ssh-keygen -R 10.80.255.10 to do this.

Figure 24

Try your ssh again (from the WEB-VM) to the VIP IP of 10.80.255.10. This time you will land on APP-VM1. You have confirmed that the NLB is working! (You may have to try the ssh a few times until you land on APP-VM1.)

Figure 25

Of course, if you wish (for testing purposes), you could turn off ssh strict host key checking, and then you won’t have to bother with these identity errors (the warning will still show up, but it will let you in regardless). This is not good practice, but for testing this sort of scenario in a POC it makes sense.

To do this, just create a file at “~/.ssh/config” with the contents shown in the figure below (a sample config is also sketched after the figure).

Figure 26
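For reference, a minimal config along those lines (assuming you only want to relax checking for the NLB VIP; testing only, not for production) would look something like this:

# ~/.ssh/config on the WEB-VM: relax host key checking for the VIP only (testing only)
Host 10.80.255.10
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null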

One more NLB-related test. Since we configured the NLB in HA Port mode (not Forward mode), the NLB should forward packets sent to the VIP (from the WEB-VM) on a 1:1 basis for every TCP/UDP port. Let’s do a quick test to see if that’s working. Let’s choose an arbitrary TCP port, for instance TCP 1234. We’ll use netcat to test that out.

Figure 26b

First, let’s install what’s needed. This should take a couple of minutes at most:

Install Items:

  • On both APP-VMs, install netcat:
    • sudo apt-get install -y netcat
  • On the WEB-VM, install the telnet client:
    • sudo apt-get install -y telnet

Procedure to Test:

  • On both APP-VMs:
    • netcat -l 1234
  • On the WEB-VM:
    • telnet 10.80.255.10 1234
Figure 26c

Test Results:

  • On the 1st try, the echo came to APP-VM2.
  • On the 2nd try, the echo came to APP-VM1.

The figure below shows the test results:

Figure 26d

Next question: are the packets still going through the infra (overlay-1) NLB and the CSRs? Since we are using vNET Peering, the answer is yes.

The debug on the CSRs will prove this point.

Figure 26a
Figure 27

Time to do some basic checks from the Azure Console.

Figure 28

On the Azure Console, go look at the Load Balancer (NLB) that you created (in the tenant subscription) and (roughly equivalent CLI checks are sketched after the figure):

  • Check out the frontend VIP IP
  • Check out the backend pool members
Figure 29
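The same information can be pulled from the Azure CLI (the resource group and load balancer names below are hypothetical placeholders for the cAPIC-created ones):

# Hypothetical sketch: frontend IP configuration (should show the static VIP 10.80.255.10)
az network lb frontend-ip list --resource-group my-rg --lb-name my-nlb --output table
# Hypothetical sketch: backend pool (the APP-VM NICs should show up as members)
az network lb address-pool list --resource-group my-rg --lb-name my-nlb --output table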

Check out the Data Path Availability metric on the NLB; it’s at 100%.

Figure 30

Check out the Health Probe Status results on the NLB; they are at 100%.

Figure 31

Let’s now do some tcpdump captures to check the packet flow through the NLB. You could also use Network Watcher to pull captures down to your local machine and decode them with Wireshark; that method was illustrated in the ALB writeup.
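If you go the tcpdump route, captures along these lines will show the flow from both ends (assuming the NIC is eth0, as on these Ubuntu VMs, and that we are watching the ssh traffic from the earlier test):

# On the WEB-VM (consumer): traffic to/from the NLB VIP; return packets come back sourced from the VIP
sudo tcpdump -i eth0 -n host 10.80.255.10 and port 22
# On APP-VM1/APP-VM2 (provider): the source is the WEB-VM's original IP, not the VIP (no SNAT)
sudo tcpdump -i eth0 -n port 22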

Figure 32
Figure 33

From the captures, we can confirm that the NLB packet flow is different from the ALB case (as discussed earlier in this writeup).

Figure 34

Before ending this writeup, there is one last item that I want to point out. If you recall, we configured the NLB to send health probes to the backend pool members on port 22. It is important to remember that these probes are always sent by the Azure NLB with a source IP of 168.63.129.16. This is documented in the Azure documentation on Load Balancer health probes.

This is important to remember because when you configure a firewall behind an NLB (which we will do in the next writeup), you must remember to configure the firewall to allow the probes from 168.63.129.16 in. If they are blocked by the firewall, the NLB will deem the FW to be in an unhealthy state and will not forward packets to it.

Let’s verify this by doing a tcpdump on APP-VM1, filtering on the probe IP 168.63.129.16.

Figure 35

As you can see in the figure below, we do see the probes coming in on APP-VM1 from source IP 168.63.129.16:

sudo tcpdump -i eth0 -n -s 150 -vv host 168.63.129.16

Figure 36

