OpenVPN DCO: Traffic to RA clients above the first available tunnel IP is incorrectly routed
When DCO is enabled, traffic from hosts in the local subnet, for example a server on LAN, can only reach the first assigned RA client. This includes reply traffic.
Traffic to subsequent clients is incorrectly routed. For example:
steve@steve-Standard-PC-i440FX-PIIX-1996:~$ ping 10.1.9.2
PING 10.1.9.2 (10.1.9.2) 56(84) bytes of data.
64 bytes from 10.1.9.2: icmp_seq=1 ttl=63 time=0.482 ms
^C
--- 10.1.9.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.482/0.482/0.482/0.000 ms
steve@steve-Standard-PC-i440FX-PIIX-1996:~$ ping 10.1.9.3
PING 10.1.9.3 (10.1.9.3) 56(84) bytes of data.
From 10.1.9.1 icmp_seq=1 Redirect Host(New nexthop: 0.0.0.0)
From 10.1.9.1 icmp_seq=1 Redirect Host(New nexthop: 0.0.0.0)
From 10.1.9.1 icmp_seq=1 Redirect Host(New nexthop: 0.0.0.0)
More details to follow.
Updated by Steve Wheeler 2 months ago
Nothing special is required to recreate this beyond enabling DCO:
Install 22.09 clean. Tested: 22.09.a.20220725.0600
Configure a remote access server using the OpenVPN wizard. Accept the default values. I configured it to pass only the LAN subnet as a route.
Create at least 2 valid clients.
The resulting openvpn config looks like:
dev ovpns1
verb 1
dev-type tun
dev-node /dev/tun1
writepid /var/run/openvpn_server1.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp4
auth SHA256
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
client-connect /usr/local/sbin/openvpn.attributes.sh
client-disconnect /usr/local/sbin/openvpn.attributes.sh
local 172.21.16.160
tls-server
server 10.160.9.0 255.255.255.0
client-config-dir /var/etc/openvpn/server1/csc
username-as-common-name
plugin /usr/local/lib/openvpn/plugins/openvpn-plugin-auth-script.so /usr/local/sbin/ovpn_auth_verify_async user TG9jYWwgRGF0YWJhc2U= false server1 1194
tls-verify "/usr/local/sbin/ovpn_auth_verify tls 'OpenVPN_Server_5' 1"
lport 1194
management /var/etc/openvpn/server1/sock unix
push "route 192.168.160.0 255.255.255.0"
capath /var/etc/openvpn/server1/ca
cert /var/etc/openvpn/server1/cert
key /var/etc/openvpn/server1/key
dh /etc/dh-parameters.2048
tls-auth /var/etc/openvpn/server1/tls-auth 0
data-ciphers AES-256-GCM
data-ciphers-fallback AES-256-GCM
allow-compression no
persist-remote-ip
float
topology subnet
explicit-exit-notify 1
Connect two or more clients. Only the client receiving the first tunnel subnet IP (10.160.9.2 here) will be able to reach hosts in the LAN subnet.
Other clients will time out trying to reach, for example, 192.168.160.10 but will be able to reach the pfSense LAN IP at 192.168.160.1.
From a host on the LAN subnet, the server IP (10.160.9.1) and the first client IP are reachable, but traffic to any other client results in the routing error:
steve@steve-Standard-PC-i440FX-PIIX-1996:~$ ping 10.160.9.3
PING 10.160.9.3 (10.160.9.3) 56(84) bytes of data.
From 10.160.9.1 icmp_seq=1 Redirect Host(New nexthop: 0.0.0.0)
From 10.160.9.1 icmp_seq=1 Redirect Host(New nexthop: 0.0.0.0)
From 10.160.9.1 icmp_seq=1 Redirect Host(New nexthop: 0.0.0.0)
That is the case for all other IPs in the tunnel subnet, whether or not a client exists there.
Updated by Steve Wheeler 2 months ago
[22.09-DEVELOPMENT][email@example.com]/root: netstat -rn4
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            172.21.16.1        UGS      vtnet0
10.160.9.0/24      10.160.9.2         UGS      ovpns1
10.160.9.1         link#8             UHS         lo0
10.160.9.2         link#8             UH       ovpns1
127.0.0.1          link#5             UH          lo0
172.21.16.0/24     link#1             U        vtnet0
172.21.16.1        3e:81:34:ce:48:36  UHS      vtnet0
172.21.16.160      link#1             UHS         lo0
192.168.160.0/24   link#2             U        vtnet1
192.168.160.1      link#2             UHS         lo0
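To make the table above easier to read, here is a minimal Python sketch of a plain longest-prefix-match lookup over a subset of those routes. It is only a reading aid, not FreeBSD's routing code: the `ROUTES` list and the `lookup` helper are hypothetical names introduced here, and host routes are written as /32 for the lookup. It shows that only the server IP and the first client have host entries, while every other tunnel address falls through to the 10.160.9.0/24 route whose gateway is 10.160.9.2.

```python
import ipaddress

# Subset of the netstat output above: (destination, gateway, interface).
# Host routes are expressed as /32 so they can be compared by prefix length.
ROUTES = [
    ("0.0.0.0/0", "172.21.16.1", "vtnet0"),
    ("10.160.9.0/24", "10.160.9.2", "ovpns1"),
    ("10.160.9.1/32", "link#8", "lo0"),
    ("10.160.9.2/32", "link#8", "ovpns1"),
    ("192.168.160.0/24", "link#2", "vtnet1"),
]


def lookup(dst):
    """Return (destination, gateway, interface) of the most specific match.

    Hypothetical helper: a simplified longest-prefix match, not the
    kernel's actual lookup.
    """
    addr = ipaddress.ip_address(dst)
    matches = []
    for net, gateway, ifname in ROUTES:
        network = ipaddress.ip_network(net)
        if addr in network:
            matches.append((network, gateway, ifname))
    # The most specific route is the one with the longest prefix.
    network, gateway, ifname = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), gateway, ifname


if __name__ == "__main__":
    for dst in ("10.160.9.2", "10.160.9.3", "10.160.9.200"):
        print(dst, "->", lookup(dst))
```

In this model 10.160.9.3 and 10.160.9.200 both resolve to the /24 entry with gateway 10.160.9.2, whether or not a client holds that address.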
Updated by Kristof Provost about 2 months ago
- Status changed from Confirmed to Pull Request Review
The issue here is that one of the assumptions in the if_ovpn code, namely that we always get an 'ro' argument to ovpn_output() unless called from pf's route-to, was incorrect.
This is the case in main, but not on stable/12. As a result we assumed that the packet was being route-to'd, and we had to send it to the first peer. That peer immediately sent the packet back to the server (according to its routing table), which wound up looping until the TTL expired.
This only affected the fast forwarding code, so I would not expect the problem to manifest when IPsec is in use (because then we don't use ip_fastfwd()).
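The loop described above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the if_ovpn code: `select_peer`, `simulate`, and the `route_known` flag are hypothetical names standing in for whether the output path received a route argument, and the model assumes the TTL drops by one each time the server or a peer forwards the packet.

```python
PEERS = ["10.160.9.2", "10.160.9.3"]  # first entry plays the "first peer"


def select_peer(dst, peers, route_known):
    """Pick the peer a packet is handed to (hypothetical model).

    With no route information the packet is treated as if it had been
    route-to'd and goes to the first peer, as described in the comment above.
    """
    if not route_known:
        return peers[0]
    return dst if dst in peers else None


def simulate(dst, peers, route_known, ttl=64):
    """Return (status, number of times the server handled the packet)."""
    server_passes = 0
    while ttl > 0:
        server_passes += 1
        ttl -= 1  # assumption: the server's forward costs one hop
        target = select_peer(dst, peers, route_known)
        if target is None:
            return ("no peer", server_passes)
        if target == dst:
            return ("delivered", server_passes)
        if ttl == 0:
            break
        ttl -= 1  # assumption: the wrong peer forwards it back to the server
    return ("ttl expired", server_passes)


if __name__ == "__main__":
    print(simulate("10.160.9.2", PEERS, route_known=False))
    print(simulate("10.160.9.3", PEERS, route_known=False))
    print(simulate("10.160.9.3", PEERS, route_known=True))
```

In this model the first client is always reachable, which matches the reported symptom, while traffic for any other client bounces between the server and the first peer until the TTL runs out.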