I think I'm in a somewhat similar situation: we have a UTM in an AWS VPC and want separate private subnets (probably /28s) for each of a number of customers.
The additional wrinkle is that I want to use the new UTM AWS image that supports HA using a warm failover. In the event that I *do* have multiple ENIs, is there any way to make sure those get transferred over to the failover instance when the primary instance goes down? I haven't tested extensively yet, but it appears that the failover just swaps the primary's single elastic IP over to the secondary. How do the various AWS routing tables get modified so that traffic from the various subnets behind the UTM gets routed properly in that case?
Hi lprikockis,
You raise two questions in your message, and I'll try to answer them both.
First, regarding multiple ENIs: we only support a single interface for HA and autoscaling. There's no need for more, and worse, using more makes for a more fragile AWS environment. The UTM doesn't need an interface in every subnet in order to act as the outbound gateway for each subnet. AWS strongly prefers that your instances use the default gateway IP within each subnet, which lets the AWS routing table do its job. If you want a stable and reproducible environment, that's the best way to do it.
Second, you asked about outbound routing failover. Currently, HA and autoscaling can only fail over inbound traffic. We are working on an outbound gateway feature at the moment, which will let you balance outbound traffic across an autoscaling group of firewalls, with full failover redundancy. Keep an eye out for this later this summer.
Thanks for the detailed answers Alan!
I agree that a single ENI should be fine for most cases. So are you saying that using the UTM as the default gateway for a number of subnets is *not* the most "stable and reproducible" way to architect things? In many ways, it would be simpler to do that, but we have requirements to be able to monitor all traffic in and out of the VPC, and having it all enter and leave through a single point that we control greatly simplifies this task.
Of course it also introduces a single point of failure, which is why the HA and autoscaling options are of such interest to me.
Glad to hear that full outbound HA is in the pipeline. I guess in the near term it's not tremendously difficult for us to update the AWS routing tables, either manually or via some external script, in the case of a failover event. A rough sketch of what that script could do is below.
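As an illustration only (the route table and interface IDs here are hypothetical placeholders, not values from our setup), the external script could be as small as one boto3 call per route table:

```python
# One-off sketch: after a failover, repoint a private subnet's default route
# at the now-active UTM's interface. All IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.replace_route(
    RouteTableId="rtb-0123456789abcdef0",        # hypothetical private route table
    DestinationCidrBlock="0.0.0.0/0",
    NetworkInterfaceId="eni-0abcdef1234567890",  # ENI of the new active UTM
)
```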
lprikockis said: So are you saying that using the UTM as the default gateway for a number of subnets is *not* the most "stable and reproducible" way to architect things? In many ways, it would be simpler to do that, but we have requirements to be able to monitor all traffic in and out of the VPC, and having it all enter and leave through a single point that we control greatly simplifies this task.
This can be a confusing topic, so I'll try to elaborate a bit more. AWS expects instances to receive their IP address details via DHCP, including the gateway they will use. That gateway is always the first usable address in the instance's subnet (the base of the subnet CIDR plus one, e.g. 10.0.1.1 in 10.0.1.0/24). It isn't a real host; it's a point where AWS can seamlessly insert itself into the routing process and apply the routes you create in your AWS console, without having to directly control routing tables on each instance or require them to run more advanced routing protocols.
Now, if you set your firewall's interface as the default gateway for a subnet in the AWS route table applied to that subnet, then all traffic sent from any host in that subnet will effectively use your firewall as it tries to reach the internet, regardless of whether the firewall is in the same subnet as that host or not. In fact, a common design is to put your firewall in a "public" subnet that has access to an Internet Gateway (IGW), then place all of your hosts in "private" subnets that cannot directly access the internet. This allows the firewall to use a single interface and act as the internet gateway for all of your internal hosts.
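For illustration, here is a minimal sketch of that route-table setup using boto3. The instance, ENI, and route table IDs are hypothetical placeholders, and this is just one way to wire it up, not an official Sophos procedure:

```python
# Rough sketch: point a private subnet's default route at the firewall's
# interface and disable source/dest checking so the firewall can forward
# traffic on behalf of other hosts. All IDs below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

UTM_INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical UTM instance
UTM_ENI_ID = "eni-0123456789abcdef0"      # hypothetical UTM interface
PRIVATE_RTB_ID = "rtb-0123456789abcdef0"  # route table of a private subnet

# A forwarding/NAT instance must not drop traffic that isn't addressed to it.
ec2.modify_instance_attribute(
    InstanceId=UTM_INSTANCE_ID,
    SourceDestCheck={"Value": False},
)

# Send all internet-bound traffic from the private subnet through the UTM.
ec2.create_route(
    RouteTableId=PRIVATE_RTB_ID,
    DestinationCidrBlock="0.0.0.0/0",
    NetworkInterfaceId=UTM_ENI_ID,
)
```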
One of the primary limitations of using multiple NICs is that all of your subnets must then sit in the same AZ. This is what makes your setup more fragile when you use a single host with multiple NICs as the direct gateway for your subnets: Amazon itself can't provide redundancy for any of it if you're bound entirely to a single availability zone. As soon as you move to the model I described, you're free to set up redundancy for your servers across multiple AZs, and you're also free from the need to use multiple NICs, yet you can still use the firewall as the outbound gateway for all of your instances.
Of course, until we offer outbound gateway redundancy support, acting as an outbound gateway can still introduce a single point of failure, and you would need to manually point your routes at the HA replacement node in the event of a failure. This could be automated, possibly via Lambda or some other automation method, but it isn't something Sophos provides just yet. We're working on it, though!
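To make that concrete, here is a rough sketch of what such a Lambda function could look like, assuming it is triggered by an EC2 instance state-change event and that the route tables to repoint carry a hypothetical tag (utm:managed = true). This is one possible approach, not a Sophos-provided mechanism:

```python
# Hypothetical failover helper: when the standby UTM comes up, repoint every
# tagged route table's default route at the surviving instance's primary ENI.
# The tag name and event wiring are assumptions for this sketch.
import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # Expects an EventBridge "EC2 Instance State-change Notification" event.
    detail = event.get("detail", {})
    if detail.get("state") != "running":
        return {"status": "ignored", "reason": "instance not in running state"}

    new_utm_id = detail["instance-id"]

    # Find the primary ENI of the instance that just became active.
    reservations = ec2.describe_instances(InstanceIds=[new_utm_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    primary_eni = instance["NetworkInterfaces"][0]["NetworkInterfaceId"]

    # Repoint the default route in every route table tagged for UTM management.
    route_tables = ec2.describe_route_tables(
        Filters=[{"Name": "tag:utm:managed", "Values": ["true"]}]
    )["RouteTables"]

    for rtb in route_tables:
        ec2.replace_route(
            RouteTableId=rtb["RouteTableId"],
            DestinationCidrBlock="0.0.0.0/0",
            NetworkInterfaceId=primary_eni,
        )

    return {"status": "updated", "route_tables": len(route_tables), "eni": primary_eni}
```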