OpenVPN Authentication Against Backend Stalls All Server Traffic
When an OpenVPN Remote Access server authenticates against a backend such as RADIUS, all traffic on the server halts while that authentication request is processed. If the backend is slow to answer, for example RADIUS relayed to an MFA service such as Duo, the delay can be significant even under normal circumstances.
Created and tested a RADIUS Authentication backend with a 30-second authentication timeout
Created an SSL/TLS + User Auth OpenVPN Server backed by the RADIUS server
Created two different user certificates and corresponding RADIUS accounts
Successfully logged in using the first account and started a ping
Changed the RADIUS server IP address so the request would "hang" for the configured 30 seconds
Attempted to log in to the second account
While that was timing out I changed the RADIUS server back to the proper IP address to see if Viscosity (the first client) recovered automatically. It did time out and reconnect successfully.
Pings 5 seconds apart:
$ ping -i 5 172.25.228.1
PING 172.25.228.1 (172.25.228.1): 56 data bytes
64 bytes from 172.25.228.1: icmp_seq=0 ttl=63 time=4.658 ms
64 bytes from 172.25.228.1: icmp_seq=1 ttl=63 time=5.461 ms
64 bytes from 172.25.228.1: icmp_seq=2 ttl=63 time=4.718 ms
64 bytes from 172.25.228.1: icmp_seq=3 ttl=63 time=4.881 ms
64 bytes from 172.25.228.1: icmp_seq=4 ttl=63 time=2.311 ms
64 bytes from 172.25.228.1: icmp_seq=5 ttl=63 time=4.756 ms
64 bytes from 172.25.228.1: icmp_seq=6 ttl=63 time=4.851 ms
64 bytes from 172.25.228.1: icmp_seq=7 ttl=63 time=5.014 ms
64 bytes from 172.25.228.1: icmp_seq=8 ttl=63 time=4.892 ms
64 bytes from 172.25.228.1: icmp_seq=9 ttl=63 time=2.622 ms
Request timeout for icmp_seq 10
Request timeout for icmp_seq 11
Request timeout for icmp_seq 12
Request timeout for icmp_seq 13
Request timeout for icmp_seq 14
Request timeout for icmp_seq 15
Request timeout for icmp_seq 16
Request timeout for icmp_seq 17
Request timeout for icmp_seq 18
Request timeout for icmp_seq 19
Request timeout for icmp_seq 20
Request timeout for icmp_seq 21
64 bytes from 172.25.228.1: icmp_seq=22 ttl=63 time=2.731 ms
64 bytes from 172.25.228.1: icmp_seq=23 ttl=63 time=4.933 ms
64 bytes from 172.25.228.1: icmp_seq=24 ttl=63 time=5.390 ms
64 bytes from 172.25.228.1: icmp_seq=25 ttl=63 time=5.429 ms
64 bytes from 172.25.228.1: icmp_seq=26 ttl=63 time=5.014 ms
64 bytes from 172.25.228.1: icmp_seq=27 ttl=63 time=5.074 ms
Same behavior on 2.3.4_1 and most-recent 2.4.0-RC. I did not test 2.4.1 since it uses the same OpenVPN package as 2.4.0.
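The length of the stall can be estimated from the ping output above. The following is a minimal sketch, assuming the `Request timeout for icmp_seq N` line format shown in this report and the 5-second interval from `ping -i 5`; the function name `outage_bounds` is illustrative only.

```python
import re


def outage_bounds(ping_output, interval_s):
    """Return (lost, min_s, max_s) for the longest run of consecutive lost pings.

    With n consecutive probes lost at a fixed interval, the outage lasted
    at least (n - 1) intervals and less than (n + 1) intervals.
    """
    lost_seqs = [
        int(m.group(1))
        for m in re.finditer(r"Request timeout for icmp_seq (\d+)", ping_output)
    ]
    if not lost_seqs:
        return 0, 0, 0

    # Find the longest run of consecutive sequence numbers.
    best = run = 1
    for prev, cur in zip(lost_seqs, lost_seqs[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)

    return best, (best - 1) * interval_s, (best + 1) * interval_s


# The report shows icmp_seq 10 through 21 timing out.
sample = "\n".join(f"Request timeout for icmp_seq {n}" for n in range(10, 22))
print(outage_bounds(sample, 5))
```

For the output in this report, that is 12 lost pings, which puts the interruption for the already-connected client at roughly 55 to 65 seconds.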
Updated by Jim Pingle over 4 years ago
- Affected Architecture All added
- Affected Architecture deleted (
Looks like it's a known issue with the nature of auth-user-pass-verify that OpenVPN does not plan to address: https://community.openvpn.net/openvpn/ticket/222
Looks like that would need to be converted into a FreeBSD port or added to the OpenVPN port and then the OpenVPN code and auth script would have to be adapted to use it instead, assuming it can do everything we need it to do.
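To illustrate why a synchronous verification hook stalls every client, here is a toy model of a single-threaded event loop with a virtual clock. It is not OpenVPN code; the event list, the 30-second cost (taken from the timeout in the original report), and all function names are assumptions made for illustration.

```python
def run_loop(events, blocking_auth):
    """Simulate a single-threaded server loop.

    events: list of (arrival_time_s, kind, cost_s), kind is "data" or "auth".
    Returns a list of (kind, arrival_time_s, handled_time_s).
    """
    clock = 0.0
    handled = []
    for arrival, kind, cost in sorted(events):
        clock = max(clock, arrival)
        if kind == "auth" and not blocking_auth:
            # Deferred model: a helper process waits on the backend,
            # so the loop itself does not advance by the auth cost.
            handled.append((kind, arrival, clock + cost))
            continue
        if kind == "auth":
            # Blocking model: the loop waits for the backend to answer.
            clock += cost
        handled.append((kind, arrival, clock))
    return handled


def max_data_delay(results):
    """Largest delay between arrival and handling of any data packet."""
    return max(done - arrival for kind, arrival, done in results if kind == "data")


# One client pings every 5 s; a second client starts a 30 s auth at t=6.
events = [(0, "data", 0), (5, "data", 0), (6, "auth", 30), (10, "data", 0), (15, "data", 0)]
print(max_data_delay(run_loop(events, blocking_auth=True)))
print(max_data_delay(run_loop(events, blocking_auth=False)))
```

In the blocking model the data packet arriving at t=10 is not handled until the auth call returns at t=36; in the deferred model it is handled immediately.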
Updated by Jim Pingle over 4 years ago
- Status changed from New to Confirmed
- Target version changed from 2.4.2 to 2.4.3
I was finally able to confirm the problem. I'm looking at that auth_script plugin now, but it will require some significant changes to the auth script and it doesn't compile out of the box on FreeBSD. We will probably need more time to analyze this than we have for 2.4.2, given the nature of how it operates and what is required, so I'm bumping it forward for now.
Updated by John Tikis over 4 years ago
I can also add that when two RADIUS servers are declared as backend authenticators and the first on the list fails (e.g. it is switched off), OpenVPN never falls back to the second RADIUS authentication server on the list.
A similar issue has been opened for LDAP backends -> https://redmine.pfsense.org/issues/3022
pfSense version: 2.4.2-Release-p1
Updated by Phil DeMonaco over 4 years ago
I'm actively working on implementing this in my pfSense environment. Currently I have a fork of the workaround linked above that compiles successfully both on FreeBSD 10.3 and FreeBSD 11.1:
Note that the Makefile is using gmake features.
FreeBSD isn't my primary development platform so I'm still tinkering a bit trying to get a pfSense build environment configured. Once I have that working I'm going to look at at least doing a backend implementation of this on my appliance.
When this is solved in the mainline implementation is there a plan to incorporate GUI changes to the VPN server page?
Additionally, is there a good way for me to contribute these changes directly to the project?
Updated by Phil DeMonaco about 4 years ago
- File server2.conf server2.conf added
- File openvpn-auth-script-test.log openvpn-auth-script-test.log added
- File openvpn-auth-script-test-client.log openvpn-auth-script-test-client.log added
- File client.ovpn client.ovpn added
I'm really close to having this working on the 2.4.2-RELEASE code base. However, I'm running into an issue, and I'm hoping someone a bit more familiar with the integration of pfSense and OpenVPN might be able to help. In brief, I've modified two small parts of the existing pfSense environment:
- Added a new asynchronous script designed to be called via an OpenVPN plugin: https://github.com/pdemonaco/pfsense/blob/master/src/usr/local/sbin/ovpn_auth_verify_async
- Updated the config file generation such that it used the plugin directive instead of auth-user-pass-verify: https://github.com/pdemonaco/pfsense/blob/master/src/etc/inc/openvpn.inc
- Modified the auth-script plugin to allow arguments to be passed to the target script: https://github.com/pdemonaco/auth-script-openvpn/tree/pass-script-arguments
Note that I made the same changes to the 2.4.2-RELEASE version of the openvpn.inc file to actually execute my tests as it differs from the current master.
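For readers unfamiliar with the deferred approach, the sketch below shows the general shape such an asynchronous script might take. It is not the linked `ovpn_auth_verify_async`. It assumes OpenVPN's deferred-authentication convention, where the script writes `1` (success) or `0` (failure) to the file named by `auth_control_file`, and it assumes the plugin passes `username` and `password` in the environment. `check_credentials` is a hypothetical stand-in for the real backend query.

```python
import os
import tempfile


def check_credentials(username, password):
    """Hypothetical placeholder for the slow backend call (e.g. RADIUS)."""
    return bool(username) and password == "example-password"


def deferred_auth(env):
    """Write the auth verdict to the control file named in the environment.

    Returns 0 if a verdict was written, 1 if no control file was provided.
    """
    control_file = env.get("auth_control_file")  # assumed to be set by OpenVPN
    if not control_file:
        return 1
    ok = check_credentials(env.get("username", ""), env.get("password", ""))
    with open(control_file, "w") as fh:
        fh.write("1" if ok else "0")
    return 0


# Demonstration with a temporary control file instead of a real OpenVPN run.
# A real script would call deferred_auth(os.environ) and exit with its result.
demo_path = os.path.join(tempfile.mkdtemp(), "auth_control")
demo_env = {"username": "alice", "password": "example-password",
            "auth_control_file": demo_path}
print(deferred_auth(demo_env), open(demo_path).read())
```

Because the verdict is delivered through the control file rather than the script's return to the main process, the server can keep servicing other clients while the backend query is in flight.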
While the call to the authentication module occurs correctly and returns the proper result something goes wrong after the plugin notifies the core process of the result. The log excerpt below begins at the point where the authentication process completes with a successful auth.
Feb 23 21:11:28 d0-nfw2533 openvpn: event_wait : Interrupted system call (code=4)
Feb 23 21:11:29 d0-nfw2533 openvpn: <client-ip>:43518 UDPv4 READ  from [AF_INET]<client-ip>:43518: P_CONTROL_V1 kid=0 [ ] pid=5 DATA len=42
Feb 23 21:11:29 d0-nfw2533 openvpn: <client-ip>:43518 PUSH: Received control message: 'PUSH_REQUEST'
Feb 23 21:11:29 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 MULTI_sva: pool returned IPv4=172.29.128.2, IPv6=(Not enabled)
Feb 23 21:11:29 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 WARNING: Failed running command (--client-connect): external program fork failed
Feb 23 21:11:29 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 UDPv4 WRITE  to [AF_INET]<client-ip>:43518: P_ACK_V1 kid=0 [ 5 ]
Feb 23 21:11:35 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 UDPv4 READ  from [AF_INET]<client-ip>:43518: P_CONTROL_V1 kid=0 [ ] pid=6 DATA len=42
Feb 23 21:11:35 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 PUSH: Received control message: 'PUSH_REQUEST'
Feb 23 21:11:35 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 Delayed exit in 5 seconds
Feb 23 21:11:35 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 SENT CONTROL [<client-user>]: 'AUTH_FAILED' (status=1)
Feb 23 21:11:35 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 UDPv4 WRITE  to [AF_INET]<client-ip>:43518: P_ACK_V1 kid=0 [ 6 ]
Feb 23 21:11:35 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 UDPv4 WRITE  to [AF_INET]<client-ip>:43518: P_CONTROL_V1 kid=0 [ ] pid=7 DATA len=41
Feb 23 21:11:37 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 UDPv4 WRITE  to [AF_INET]<client-ip>:43518: P_CONTROL_V1 kid=0 [ ] pid=7 DATA len=41
Feb 23 21:11:40 d0-nfw2533 openvpn: <client-user>/<client-ip>:43518 SIGTERM[soft,delayed-exit] received, client-instance exiting
I believe the "event_wait" message may be the normal indication that the auth request has completed, as it is immediately followed by an attempt to allocate an IP to the client. The part that confuses me is the fork failure message on the client-connect command. From what I can tell, the script referenced in the configuration file is not actually being called.
Any suggestions would be appreciated - thanks!
Updated by Phil DeMonaco almost 4 years ago
I've added another pull request which includes the new plugin port as a dependency to the main pfSense port.
It's been over a month since the plugin was accepted into FreeBSD upstream and I've seen no action on the requests. What can I do to expedite this process?
Updated by John Tikis over 3 years ago
Unfortunately, with pfSense version 2.4.4, the fallback to an alternative RADIUS server is still not operational.
I have two RADIUS servers active. If I declare them both (Srv1 & Srv2) within the OpenVPN backend authentication and then stop the RADIUS service on Srv1 (i.e. server unreachable), pfSense never falls back to the RADIUS service on Srv2. I also set the RADIUS server timeouts within the Users -> Auth menu to below 30 secs so that TLS negotiation does not time out.
If only one of the two RADIUS servers is declared within the OpenVPN backend settings, then the authentication process completes correctly.
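The fallback behavior expected here can be sketched as follows. This is an illustration of the desired logic only, not pfSense's implementation; the server entries are hypothetical callables that either return a verdict or raise `TimeoutError` when the server is unreachable.

```python
def authenticate_with_fallback(servers, username, password):
    """Try each server in order and return (verdict, server_name).

    servers: list of (name, query) pairs, where query(username, password)
    returns True/False, or raises TimeoutError if the server does not answer.
    An unreachable server is skipped; an explicit reject is final.
    """
    for name, query in servers:
        try:
            return query(username, password), name
        except TimeoutError:
            continue  # server did not answer, try the next one
    return False, None


def srv1_down(username, password):
    """Stand-in for Srv1 with its RADIUS service stopped."""
    raise TimeoutError("no response")


def srv2_up(username, password):
    """Stand-in for a healthy Srv2 that accepts one known account."""
    return (username, password) == ("alice", "example-password")


servers = [("Srv1", srv1_down), ("Srv2", srv2_up)]
print(authenticate_with_fallback(servers, "alice", "example-password"))
```

Note that with this approach the worst-case wait is the sum of the per-server timeouts, so each timeout has to be short enough that the total still fits within the client's TLS negotiation window.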