OpenVPN client started multiple times when connecting to FQDN where connectivity to server is delayed
Precondition: the WAN connection type must be something other than Static or DHCP.
Steps to reproduce:
1. Create an OpenVPN client instance with a domain name as the "server address" (for example: vpn.contoso.com).
2. Check "Server host name resolution" option.
3. Save and restart the router.
If the WAN connection takes long enough to come up, the newly created OpenVPN instance becomes "detached" from the system.
Once the WAN interface comes up, the OpenVPN client daemon resolves the domain name and connects to the server. Good! But you can no longer control that OpenVPN instance: you can't stop, start, restart, disable, or enable it. The OpenVPN interface stays up and working forever.
Fixes #3894. --resolv-retry is infinite by default. To avoid this locking the persistent tun device, retry only two times by default. People can enable resolv-retry infinite themselves for the previous behaviour.
In some circumstances, OpenVPN doesn't exit on SIGTERM. SIGKILL it when that happens. Ticket #3894
#2 Updated by Dmitriy K over 4 years ago
The bug is 100% reproducible. My guess is that the BIND server is being restarted right after OpenVPN finishes restarting, so name resolution is not available when OpenVPN tries to resolve the domain name. Once BIND is up on the interface, OpenVPN resolves the domain name and connects to the server, but it is detached from the GUI.
Maybe I'm wrong, maybe not...
#3 Updated by Dmitriy K over 4 years ago
After some research I found that the system can't connect to the "detached" OpenVPN instance's socket.
I added some logging to openvpn_get_client_status() in openvpn.inc, and here is the output:
/index.php: openvpn_get_client_status(Array, unix:///var/etc/openvpn/client3.sock) = 61;
The file (unix:///var/etc/openvpn/client3.sock) itself exists but is not accessible.
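The return value 61 in the log line is ECONNREFUSED on FreeBSD: the socket file is present, but no process is accepting connections on it. A minimal Python sketch for reproducing that check outside the GUI follows, assuming you only want to classify a socket path as missing, listening, or refused. The function name `probe_unix_socket` is hypothetical and not part of openvpn.inc; Python maps ECONNREFUSED to `ConnectionRefusedError` on every platform, so the sketch does not hardcode 61.

```python
import errno
import os
import socket


def probe_unix_socket(path: str) -> str:
    """Try to connect to a Unix-domain socket and classify the outcome (hypothetical helper)."""
    # Accept the unix:// prefix used in the pfSense log line.
    if path.startswith("unix://"):
        path = path[len("unix://"):]
    if not os.path.exists(path):
        return "missing"
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(2.0)
    try:
        s.connect(path)
        return "listening"
    except ConnectionRefusedError:
        # The socket file exists, but nothing is listening on it.
        return "refused"
    except OSError as exc:
        return f"error {exc.errno} ({errno.errorcode.get(exc.errno, '?')})"
    finally:
        s.close()
```

A "refused" result on a path that belongs to a running OpenVPN process points to a stale socket file left behind by an earlier instance.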
#11 Updated by Chris Buechler about 4 years ago
- Subject changed from System looses control over specifically configured ovpn client instance after reboot to OpenVPN client started multiple times when connecting to FQDN where connectivity to server is delayed
- Assignee set to Chris Buechler
- Affected Documentation 0 added
The specific issue here is that the OpenVPN client is launched multiple times when connecting to an FQDN with "resolv-retry infinite" when the Internet connection is slow to come up, or when network connectivity to the VPN server and/or DNS is unavailable. I have a good test case for this and will look into it further.
#12 Updated by Michael Schefczyk about 4 years ago
On a server with two OpenVPN clients in Peer to Peer (SSL/TLS) mode, I have the same issue, even though "Infinitely resolve server" is NOT checked. The issue occurs after every reboot. It can be cured by finding the OpenVPN clients' PIDs, then killing and restarting the processes. Usually only one of the two clients is affected. Of course, I would very much welcome it if the server could reboot to full functionality without manual intervention.
The setup is: 2.1.5-RELEASE (amd64), Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs: 1 package(s) x 8 core(s), two WAN gateways, two OpenVPN clients in Peer to Peer (SSL/TLS) mode, Quagga OSPF package, Unbound package.
#14 Updated by Ermal Luçi about 4 years ago
- Status changed from Confirmed to Feedback
The issue here is that resolv-retry infinite is on by default.
I pushed a fix to do only 2 retries by default, which should fix the issue at hand.
People who want the previous behaviour can just enable resolv-retry infinite.
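Ermal's change amounts to writing an explicit `resolv-retry` line into the generated client config when "Infinitely resolve server" is unchecked, instead of emitting nothing and inheriting OpenVPN's infinite default. A minimal Python sketch of that decision follows. pfSense generates its config in PHP, so the function `resolv_retry_line` and its parameters are hypothetical; passing `finite_value=None` models the earlier 2.1.5 behaviour of leaving the directive out.

```python
from typing import Optional


def resolv_retry_line(infinite_resolve: bool, finite_value: Optional[int] = 2) -> str:
    """Return the resolv-retry line for a generated OpenVPN client config (hypothetical helper)."""
    if infinite_resolve:
        # The "Infinitely resolve server" checkbox is ticked.
        return "resolv-retry infinite\n"
    if finite_value is None:
        # Emit nothing: OpenVPN then falls back to its own default (infinite).
        return ""
    # The change discussed in this ticket: an explicit finite value by default.
    return f"resolv-retry {finite_value}\n"
```

The trade-off is visible in the two non-checkbox branches: the explicit finite value stops endless resolution attempts, while omitting the line keeps OpenVPN retrying forever.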
#16 Updated by Chris Buechler about 4 years ago
- Status changed from Feedback to Resolved
Ermal's change is good, but doesn't help this circumstance. The root cause here is OpenVPN doesn't exit when sent a SIGTERM in this circumstance, and then we start it again while it's still running. Changed to send a SIGKILL if it doesn't exit after SIGTERM. Confirmed this resolves the circumstance described here.
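The fix described here is a standard escalation: send SIGTERM, wait, and send SIGKILL (which cannot be caught or ignored) if the process is still alive, so a new instance is never started alongside the old one. A minimal Python sketch follows, assuming a PID is already known. The helper names and the 5-second grace period are assumptions; the ticket does not say how long pfSense waits or how its PHP code checks liveness.

```python
import os
import signal
import time


def is_alive(pid: int) -> bool:
    """Best-effort liveness check for a PID."""
    try:
        # If pid is our own child that already exited, reap it; otherwise the
        # zombie would still answer kill(pid, 0) and look alive.
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child (the usual case for a daemon); probe below
    try:
        os.kill(pid, 0)  # signal 0: checks existence/permission, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def wait_gone(pid: int, timeout: float, poll: float = 0.1) -> bool:
    """Poll until the process disappears or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(poll)
    return not is_alive(pid)


def stop_process(pid: int, grace: float = 5.0) -> str:
    """SIGTERM first; escalate to SIGKILL if the process outlives `grace` seconds."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not running"
    if wait_gone(pid, grace):
        return "terminated"
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"  # exited just after the grace period ended
    # Don't return (and let the caller restart) while the old process lingers.
    wait_gone(pid, 5.0)
    return "killed"
```

Waiting after SIGKILL is the important part: returning as soon as the signal is sent would reopen the window in which a second instance starts while the first still holds the socket.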
#17 Updated by Phillip Davis about 4 years ago
I have systems where Internet connectivity somewhere upstream goes away quite regularly. The actual pfSense WAN interface to the upstream device (ISP, whatever) is fine, so there is no link down/link up event for pfSense to see.
OpenVPN site-to-site clients time out after a while and then try to find their server again. To do that, they try to resolve the server's FQDN again. However, if the ISP issue lasts more than a few minutes, DNS resolution fails and, with the now-default "resolv-retry 2", the OpenVPN client simply gives up and exits.
Then there is nothing in the system to try and start it again, either when ISP internet is better, or every so often. The clients stay down.
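The failure mode above is a retry budget that runs out. The OpenVPN manual describes `--resolv-retry n` as retrying hostname resolution for n seconds before failing, with `infinite` retrying indefinitely. A minimal Python sketch of that behaviour follows. It is not OpenVPN's code; `resolve_with_budget`, the injectable `resolver`, and the one-second pause between attempts are assumptions, and `budget_s=None` stands in for `infinite`.

```python
import socket
import time
from typing import Callable, Optional


def resolve_with_budget(
    host: str,
    budget_s: Optional[float],
    resolver: Callable[[str], str] = socket.gethostbyname,
    pause_s: float = 1.0,
) -> Optional[str]:
    """Retry resolving `host` until it succeeds or `budget_s` seconds elapse.

    budget_s=None models "resolv-retry infinite" (retry forever).
    """
    deadline = None if budget_s is None else time.monotonic() + budget_s
    while True:
        try:
            return resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if deadline is not None and time.monotonic() >= deadline:
                return None  # budget exhausted: the client gives up
            time.sleep(pause_s)
```

With a finite budget measured in seconds, a DNS outage lasting minutes always exhausts it, and nothing calls the function again; with `budget_s=None` the same outage only delays the result.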
I have noticed this happen quite a few times recently and now realise the "resolv-retry 2" change is the reason for the new behaviour. It seems odd to have a config that simply exits in a reasonably expected situation (DNS resolution gone for a few minutes), with the client process never restarted.
I can select "Infinitely resolve server" and that will put things back the way they were. But it will be a hassle for lots of users to discover this after upgrading to 2.2.
But given Chris's comment above about the SIGTERM/SIGKILL change, if that really resolves the underlying issue, wouldn't it be best to revert the "resolv-retry 2" commit?
#19 Updated by Phillip Davis about 4 years ago
I understand that, and I will now go to all my site-to-site clients on 2.1.5 and turn on that setting so it carries over into 2.2.
At the moment in 2.1.5, no resolv-retry line goes in the config by default, so the OpenVPN default is in effect:
"By default, --resolv-retry infinite is enabled."
I am thinking that there might be quite a few people who experience this after upgrading to 2.2. Or is my situation an unusual edge case? Just thought I would raise the issue so others can think and comment.
#22 Updated by Chris Buechler about 4 years ago
The last update has nothing to do with your issue Dmitriy, the fix I put in a couple weeks ago is fine for that. Ermal's other change in this ticket is what broke Phil's setup and would end up breaking a lot of others, which was undone today. Everything related here is all good.