Aug 9, 2023

A Gateway Load Balancer setup with Palo Alto

Hello community - today I want to share my experience with the AWS Gateway Load Balancer (GWLB). I was able to understand the service and implement it successfully in a proof of concept. Since the setup is not trivial, IMHO, this post should help you understand what you can do and what doesn't work.

Which problems does a GWLB solve?

The AWS Gateway Load Balancer was launched in November 2020 and can be seen as an "almost" transparent traffic insertion service for AWS enterprise networking that operates at layer 3. Similar to service chaining technologies, it lets you reroute traffic to a target you own for packet processing and ensures that the packet continues toward its destination once your processing is done.

Typical use cases for packet processing include a centralized firewalling service. Another use case is keeping track of network activity in a third-party firewall - without having to enable VPC Flow Logs or duplicate the network traffic. The traffic reaching your firewall can be seen as stateless, which means that the firewall is "just" aware of the packet information. However, some vendors can enrich the zoning information with endpoint visibility: the firewall receives the packets encapsulated in a protocol called Geneve, which carries information about the originating endpoint.

GWLB vs classical centralized deployments

Before the Gateway Load Balancer was introduced, a standard way to implement centralized firewall inspection was to put the firewall into your routing path using VPN or Connect attachments, with BGP or static routing controlling the traffic flow. This kind of setup has the following disadvantages:

The firewall has routing responsibilities. Since the firewall sits in the middle of your network path, it has to participate actively in routing. As a result, you may face a big challenge when your whole firewall cluster becomes unavailable.
The firewall is not attached to VPCs running outside of your corporate backbone. Enterprises use the Transit Gateway service to connect data center workloads. However, if a feature team decides to build a purely public web service, you typically do not want to connect those workloads to any kind of corporate backbone.
Moving VPCs to different zones can be very difficult. To route traffic efficiently, it is recommended to build summary IP blocks for the VPCs inside a zone. If you break up the summary blocks by moving a VPC to another zone, you also need to update the routing.
Additional cost and compute effort due to expensive VPN attachments. If you are using VPN attachments, the firewall has to spend extra effort decrypting and encrypting data. In addition, the throughput is limited to 1.25 Gbps (VPN) or 5 Gbps (Connect attachments).
Source NAT. Firewalls are stateful, so you need to ensure that both request and response pass through the same firewall instance. If you are using the TGW as your route reflector, you need multiple route tables and SNAT on the firewall. This is necessary because typical "transitive" network paths aren't supported by TGW. The SNAT ensures that response packets are sent to the firewall instance first instead of going directly to your workload.

GWLB and Cloud WAN

The GWLB alone will not help you manage your backend routing. In my PoC I saw that the effort to maintain your Transit Gateway and its route tables shouldn't be underestimated! The AWS Cloud WAN service can fill this gap for the lazy ones :). Cloud WAN lets you scale your setup by describing your backend in a policy, which gives you a faster and more reliable way to manage the different zones in your corporate network. If you only need a handful of zones and want to send all of your traffic through the firewall, I would recommend the classical Transit Gateway setup. For all other use cases, I would go with Cloud WAN.

Where GWLB falls short

As always, people tend to highlight all the super cool features when a new service is launched. I try to add visibility by also showing the problems and missing features.

Missing Metadata

Technology has evolved, and many network vendors are changing the semantics of their networking and firewalling products. Instead of classical IP-based decisions, vendors are trying to control traffic based on identities (DNS names, users, security groups). For AWS workloads we have the concept of security groups (SGs), which allow rules like "my workloads will only receive connections from other workloads attached to security group xyz". VPC Flow Logs also add AWS-centric information to build more transparency. In comparison, a Gateway Load Balancer only adds the GWLBE/VPCE ID. What I am definitely missing here is a way to enrich the IP information in order to build a scalable solution for controlling traffic... and not only with AWS, but also with other providers such as Cisco SD-WAN. These are the Geneve options that AWS currently adds to the outer header:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Option Class = 0x0108 (AWS)|    Type = 1   |R|R|R| Len = 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                      64-bit GWLBE/VPCE ID                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Option Class = 0x0108 (AWS)|    Type = 2   |R|R|R| Len = 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|              64-bit Customer Visible Attachment ID            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Option Class = 0x0108 (AWS)|    Type = 3   |R|R|R| Len = 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     32-bit Flow Cookie                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
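
To make the layout above more tangible, here is a minimal Python sketch of how these options could be read from a Geneve payload (the UDP payload received on port 6081). It assumes the standard Geneve header layout from RFC 8926 (8-byte base header, option lengths counted in 4-byte words). The function name and the dictionary keys are made up for this sketch and are not part of any AWS or Palo Alto API.

```python
import struct

AWS_OPTION_CLASS = 0x0108
# Option types from the diagram above; the key names are made up for this sketch.
AWS_OPTION_NAMES = {1: "gwlbe_vpce_id", 2: "attachment_id", 3: "flow_cookie"}
GENEVE_BASE_HEADER_LEN = 8  # fixed part of the Geneve header in bytes


def parse_aws_geneve_options(payload: bytes) -> dict:
    """Return the AWS options found in a Geneve header as {name: int}."""
    if len(payload) < GENEVE_BASE_HEADER_LEN:
        raise ValueError("payload shorter than the Geneve base header")
    # Lower 6 bits of byte 0: total length of all options in 4-byte words.
    options_end = GENEVE_BASE_HEADER_LEN + (payload[0] & 0x3F) * 4
    if len(payload) < options_end:
        raise ValueError("payload shorter than the announced options length")

    found = {}
    offset = GENEVE_BASE_HEADER_LEN
    while offset + 4 <= options_end:
        # Each option starts with: class (16 bit), type (8 bit), R|R|R|Len (8 bit).
        opt_class, opt_type, flags_len = struct.unpack_from("!HBB", payload, offset)
        data_len = (flags_len & 0x1F) * 4  # lower 5 bits: data length in 4-byte words
        data_start = offset + 4
        if data_start + data_len > options_end:
            raise ValueError("option overruns the options area")
        if opt_class == AWS_OPTION_CLASS and opt_type in AWS_OPTION_NAMES:
            data = payload[data_start:data_start + data_len]
            found[AWS_OPTION_NAMES[opt_type]] = int.from_bytes(data, "big")
        offset = data_start + data_len
    return found
```

Options with another class or an unknown type are skipped, so the sketch keeps working if more options show up in the header later.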

All-or-nothing routing

The decision to send traffic through a GWLB endpoint to your centralized security appliance is made at the scope of a route table. Because route tables in AWS do not support source-based routing, you have to go with an all-or-nothing approach: you have to decide whether to send all of your VPC traffic through the centralized security appliance or none of it. I see a different trend in the SD-WAN area, where vendors are trying to make routing more efficient. For example, if you have some trusted services, there may be no need to inspect their packets. Every packet that isn't sent through your centralized security appliance saves you some cost and also improves latency and resiliency because fewer hops are involved.

Understand Zoning possibilities

Your firewall administrator will most probably want to apply a zoning concept as soon as you start using the GWLB. In the end, you are sending traffic from all kinds of sources to your centralized security appliance. Without a proper understanding of the zones, your ruleset will become hard to read, maintain and troubleshoot. With GWLB, zoning can be built on the origin of the traffic in the form of the GWLBE/VPCE ID. This means that every endpoint can be mapped to a zone. In general, the positioning of your endpoint defines your zoning approach. The following examples illustrate different concepts:
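
As a rough illustration of "every endpoint can be mapped to a zone", here is a minimal Python sketch of such a mapping. The endpoint IDs and zone names are hypothetical placeholders, and the way a real firewall stores this mapping is vendor-specific and not shown here.

```python
# Hypothetical endpoint IDs and zone names; replace them with your own.
ZONE_BY_ENDPOINT = {
    "vpce-example-eastwest": "east-west",
    "vpce-example-internet": "outbound",
    "vpce-example-publicapp": "public-app",
}


def zone_for(endpoint_id: str, default: str = "unmapped") -> str:
    """Return the zone for an endpoint; unknown endpoints get a visible default."""
    return ZONE_BY_ENDPOINT.get(endpoint_id, default)
```

Giving unknown endpoints their own default zone instead of reusing an existing one keeps traffic from a forgotten endpoint easy to spot in the ruleset.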

Centralized Zoning

Centralized zoning means that you place your endpoint near the security appliance - for example in the same VPC as the security appliance. This kind of setup enables you to "hide" the security appliance from your customers. Customers don't need to think about it when they design their backbone-connected VPC. As the network provider, a platform team takes responsibility for rerouting the traffic to the GWLB at the Transit Gateway/Cloud WAN level. Since the endpoint sits behind the Transit Gateway, you can scale your setup globally or for a given set of Transit Gateway attachments. Typical examples of centralized zoning are:

North-South communication/Internet breakout: Catch the default route at the TGW and send all traffic through the Gateway Load Balancer to your security appliances.
East-West communication/Internal inspection: Catch private routes (RFC 1918) in order to inspect traffic passing inside your corporate network.
East-West communication for SD-WAN/DX: SD-WAN or DX attachments typically need to be placed into separate zones. In the case of SD-WAN, you are finally able to stretch your zones (in the form of VRFs) into your DC firewall. Since you cannot put a VPC endpoint into your on-premises network environment, this is a good opportunity to classify the traffic coming from your on-premises locations.

Decentralized Zoning

Decentralized zoning means that you place your endpoint in the VPC where your application is hosted. From a holistic point of view, this can be called a source zone. The advantage of this solution is that the VPC owner has more visibility and control over the traffic flow. The disadvantages are that you have to modify the routing of your local VPC to guarantee the correct traffic flow and that costs may be higher because scaling horizontally increases the number of endpoints. In addition, you must ensure that you don't repeat the inspection in the backend. It's not efficient to inspect the same traffic via both the decentralized and the centralized approach.

I recommend using this kind of zoning for the following use cases:

VPCs with no connection to your backbone
Very important VPCs for which you want a clearly readable firewall ruleset by giving them a separate zone.

A sample implementation

I've tested my setup using the following architecture:

The architecture uses centralized zoning for east-west and outbound traffic and decentralized zoning for networks that are not attached to the Transit Gateway. You really need the diagram to understand the traffic flow. The setup looks pretty scary at first glance. However, after building the whole solution I have to say that once you understand the underlying routing, things get simpler to read and troubleshoot. Let us think through some routing use cases:

Example network flow for East-West traffic

Source: Testworkload_spoke1a

Destination: Testworkload_spoke2a

Traffic hits route table RTB-SPOKE-1 and is forwarded through the Transit Gateway attachment to the Transit Gateway.
The attachment's Transit Gateway association points to route table TGW-RTB-EUC1:GWLB. This route table forwards all traffic to our Gateway Load Balancer VPC.
All ingress traffic to the Gateway Load Balancer VPC arrives at the TGW_ENI in subnet snet-ec1a-tgw. The subnet's routing forwards the traffic toward the GWLB endpoints. In our case, we are trying to reach a private IP, which means the traffic is sent to GWLBe-eastwest-1A.
All traffic that hits the GWLBe is encapsulated (Geneve) and forwarded to the GWLB. In our case the GWLB forwards the traffic to one of the two Palo Alto firewalls.
The firewalls can apply zoning based on the endpoint information provided in the Geneve header. This is a feature that isn't supported by all firewall vendors!
The packets are processed by the firewall and sent back to the GWLBe.
The route table of the endpoint subnet sends the traffic back to the TGW. In contrast to the initial ingress traffic from Testworkload_spoke1a, the packet is now processed in a different TGW route table which has full visibility of all networks attached to the TGW.
The packet is forwarded to its destination: Testworkload_spoke2a
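
The hops above can be replayed with a small Python sketch that models each route table as a list of (CIDR, target) pairs and always follows the most specific match. The CIDR ranges, the names rtb-gwlbe-eastwest, tgw-rtb-post-inspection and GWLBe-internet-1A, and the table contents are assumptions made for this illustration; only the remaining names come from the walkthrough.

```python
import ipaddress

# Route tables as (destination CIDR, target). All CIDRs are assumed values.
ROUTE_TABLES = {
    "RTB-SPOKE-1": [("10.1.0.0/16", "local"), ("0.0.0.0/0", "TGW-RTB-EUC1:GWLB")],
    "TGW-RTB-EUC1:GWLB": [("0.0.0.0/0", "snet-ec1a-tgw")],
    "snet-ec1a-tgw": [
        ("10.0.0.0/8", "GWLBe-eastwest-1A"),
        ("0.0.0.0/0", "GWLBe-internet-1A"),  # hypothetical name
    ],
    "rtb-gwlbe-eastwest": [("0.0.0.0/0", "tgw-rtb-post-inspection")],  # hypothetical
    "tgw-rtb-post-inspection": [  # hypothetical: knows all attached networks
        ("10.1.0.0/16", "Testworkload_spoke1a"),
        ("10.2.0.0/16", "Testworkload_spoke2a"),
    ],
}

# Route table that applies once the inspected packet leaves the endpoint again.
TABLE_AFTER_INSPECTION = {"GWLBe-eastwest-1A": "rtb-gwlbe-eastwest"}


def lookup(table: str, destination: str) -> str:
    """Longest-prefix match on the destination only (no source-based routing)."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLES[table]
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination} in {table}")
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def trace(start_table: str, destination: str, max_hops: int = 16) -> list:
    """Follow the route tables hop by hop and return the visited elements."""
    path = [start_table]
    table = start_table
    for _ in range(max_hops):
        hop = lookup(table, destination)
        path.append(hop)
        if hop in TABLE_AFTER_INSPECTION:
            # Endpoint -> GWLB -> firewall -> back to the same endpoint.
            path += ["GWLB", "Palo Alto firewall", hop]
            table = TABLE_AFTER_INSPECTION[hop]
        elif hop in ROUTE_TABLES:
            table = hop
        else:
            return path  # reached a target that is not a route table
    raise RuntimeError("routing loop suspected")


if __name__ == "__main__":
    print(" -> ".join(trace("RTB-SPOKE-1", "10.2.0.10")))
```

Because the lookup only ever receives the destination address, the sketch also shows why the routing is all-or-nothing per route table.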

Return traffic follows the same pattern. The GWLB keeps track of all connections and ensures that traffic of a connection is always sent to the same firewall instance in order to enable stateful packet processing.
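
How the GWLB does this internally is not described here. The following Python sketch only illustrates the property that matters for a stateful firewall: both directions of a connection end up on the same instance. It uses a symmetric hash over the 5-tuple as a stand-in technique, and the firewall names are hypothetical.

```python
import hashlib

FIREWALLS = ["palo-alto-1", "palo-alto-2"]  # hypothetical instance names


def pick_firewall(src_ip, src_port, dst_ip, dst_port, protocol):
    """Pick a firewall so that request and response map to the same instance."""
    # Sort the two endpoints so that swapping source and destination
    # produces the same key.
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{protocol}|{first[0]}:{first[1]}|{second[0]}:{second[1]}".encode()
    digest = hashlib.sha256(key).digest()
    return FIREWALLS[int.from_bytes(digest[:8], "big") % len(FIREWALLS)]
```

A plain hash like this would remap existing connections whenever the list of firewalls changes, which is one reason a real load balancer also has to keep track of established connections.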

Internet Traffic

Internet traffic is handled differently from internal traffic. Internet connectivity can be established in three ways:

Route the traffic back to the TGW and provide a separate firewall cluster which provides internet access
Use the internet link of the Palo Alto firewall
Use a local NAT GW in the Security VPC and route traffic from the GWLBe-Internet subnet to the NAT GW

My example illustrates option 2. Again, this is a feature of Palo Alto firewalls that may not be supported by other vendors. In our case, the Palo Alto firewall receives packets from the GWLB and provides internet access through local NAT. Please keep in mind that you may run into an internet bottleneck, since all internet traffic from the GWLB is sent to the firewall.

Wrap up

The Gateway Load Balancer is definitely a great help for enterprises and can be used in various ways. I think that in combination with Cloud WAN the solution has real potential to support you in almost any use case. Even though the setup is not easy and understanding the traffic flow takes some time, I am convinced that this solution scales better and is less error-prone than all other setups I've seen so far on AWS.

Here are some additional resources which can help if you plan to use the gateway load balancer with Palo Alto Firewalls: