UPDATE: It turns out this is a known issue with the 1.5-to-5.1 VSM upgrade, and a fix should be released in an upcoming patch.
That's about the shortest title I could think of that still describes this issue. TL;DR: NAT rules on vShield Edge appliances appear to cause unexpected behavior with VPN traffic
after a vCloud Director upgrade from 1.5 to 5.1.
Background: We recently upgraded from 1.5 to 5.1. For most of our vDCs, we simply have a
single vSE/Routed network that connects a private subnet to a "WAN" network and
pulls a public IP from a pool. We forward (NAT) and allow (firewall) selected
ports (e.g. 3389 for RDP) to virtual machines. Most of these networks also have
a site-to-site VPN tunnel to a physical firewall across the internet. After
the upgrade, we converted our rules to match on original IP and then
enabled "multiple interfaces", effectively taking the appliances out of compatibility
mode. Everything looked good (even for the vSE devices still in compatibility
mode).
Issue: We first noticed this when a client reported that
they could not access a virtual machine via RDP using its internal (vSE-protected)
IP across a VPN tunnel, but could access the VM via RDP using its
public hostname/IP address. We allow all traffic across the VPN (the firewall has an
any:any rule for VPN traffic). When we logged in to troubleshoot (assuming
the VPN was simply down), we found that we could connect to any port on the
remote VM across the VPN tunnel except 3389. I could ping the troubled VM on the
vApp network from the local subnet with no problem, and I could connect to the
other open ports on the remote VM with no problem. I could not connect to 3389
across the VPN.
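The port-by-port check we did by hand can be scripted. Below is a minimal Python sketch, meant to be run from a machine on the local side of the tunnel, that attempts a TCP connection to each listed port on the VM's internal address and reports which ones succeed. It uses only the standard `socket` module; the function name `check_tcp_ports` is hypothetical, not part of any VMware tooling.

```python
import socket


def check_tcp_ports(host, ports, timeout=3.0):
    """Return {port: bool} indicating whether a TCP connection to host:port succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers connection refused, timeouts (e.g. traffic silently dropped
            # by a firewall rule), and unreachable hosts.
            results[port] = False
    return results
```

For example, `check_tcp_ports("192.168.10.25", [22, 80, 443, 3389])`, where the address is a placeholder for the VM's internal IP. A DNAT'd port that fails here while the other open ports succeed, and that still answers on the public IP, fits the symptom described above.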
We thought it might be an isolated case, but found the issue on every vSE we have: if
a DNAT rule translated inbound traffic for a particular port,
that port was unresponsive to traffic that traversed the VPN tunnel destined
for the target of the DNAT rule.
While vCloud Director doesn't show anything unusual in the firewall section of the vSE configuration, if you log in to vShield Manager and look at the firewall rules there, a "Deny"
rule for the private/internal/translated IP has been added for every NAT rule
that exists.
I'm assuming this is for security
reasons during the upgrade, but the rule does not show up in vCloud Director (hence our
confusion). The rules were still there even after we took our appliances out of
compatibility mode post-upgrade.
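To spot which NAT rules are affected, you can cross-reference the NAT rules against the Deny rules visible in vShield Manager. The Python sketch below assumes you have transcribed both rule lists into simple records; the `NatRule`/`FirewallRule` structures and the `find_shadowing_denies` helper are hypothetical and do not reflect any vShield Manager export format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NatRule:
    original_ip: str       # public IP the DNAT rule matches on
    original_port: int
    translated_ip: str     # private/internal IP of the target VM
    translated_port: int


@dataclass
class FirewallRule:
    action: str                # "allow" or "deny"
    dest_ip: str
    dest_port: Optional[int]   # None means "any port"


def find_shadowing_denies(
    nat_rules: List[NatRule], firewall_rules: List[FirewallRule]
) -> List[Tuple[NatRule, FirewallRule]]:
    """Return (nat, deny) pairs where a Deny rule targets a NAT rule's translated IP/port."""
    hits = []
    for nat in nat_rules:
        for fw in firewall_rules:
            # Only Deny rules aimed at the translated (internal) address matter here.
            if fw.action.lower() != "deny" or fw.dest_ip != nat.translated_ip:
                continue
            # A Deny on "any port" or on the translated port blocks that DNAT target.
            if fw.dest_port is None or fw.dest_port == nat.translated_port:
                hits.append((nat, fw))
    return hits
```

Any pair returned is a candidate for the hidden rule described above; an empty result after the fix below is a reasonable sanity check.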
Solution: Once the vSE is out of compatibility mode (see pg. 49 of the
vCD 5.1 Install Guide), re-apply the service configuration (right-click the vShield Edge appliance in vCloud Director and select "Re-Apply Service Configuration"). You can also re-deploy the appliance or add an arbitrary rule to the firewall list; both appear to have the same effect.