Bug #2915
closed
OpenVPN server/client not started after WAN physical hotplug event
Added by Phillip Davis over 11 years ago.
Updated over 11 years ago.
Description
Easy to reproduce - set up an OpenVPN client on interface WAN and have it running, connected to a server somewhere. Physically unplug the WAN cable. The client dies (this is OK - OpenVPN does not like to remain bound to an IP of a physical interface that has gone away). Plug the WAN cable back in; the WAN interface comes up and internet access is available again, but the OpenVPN client is never restarted. The same problem is expected to happen with a server.
If the clients or servers use a gateway group as their interface, then they do get started.
The problem is in /etc/rc.openvpn - it only tries to start OpenVPN servers and clients that use a gateway group.
The easy fix is to make rc.openvpn restart every OpenVPN instance (like it used to about 9 months ago). However, in complex configs it would be nice NOT to restart every OpenVPN instance when some crappy little interface goes down and up. If rc.openvpn had some knowledge of which interface had come up, then it could be smart and restart just:
- OpenVPN instances that use that interface
- OpenVPN instances of any gateway group that now has that interface as its highest tier
As long as it only impacts OpenVPN instances on the interface where the event occurred (or on a gateway group where that interface is involved), it should be fine to call resync on all of those instances.
/etc/rc.openvpn does not get passed the interface, so it has no idea what is going on. I can't see an easy way to get all that check_reload_status/send_event stuff to pass the interface name all the way through.
interface_configure() in /etc/inc/interfaces.inc gets called by rc.linkup and other good things. It looks like a routine openvpn_resync_interface($interface) could be added to openvpn.inc with all the smarts to resync every OpenVPN instance related to the given interface; interface_configure() could then call it. rc.newwanip could also call openvpn_resync_interface() instead of openvpn_resync_all(), as it also does not need to reset the whole universe of OpenVPN. The same goes for anywhere else that handles interface changes that affect OpenVPN. A rough sketch of such a routine follows the next question.
Is that a reasonable way to go?
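To make the idea concrete, here is a minimal sketch of what openvpn_resync_interface() might look like. It assumes the usual config layout ($config['openvpn']['openvpn-server'] / ['openvpn-client'] arrays with an 'interface' field, and gateway group items stored as "GWNAME|tier|vip" strings, as gwlb.inc builds them), plus the existing openvpn_resync($mode, $settings) and lookup_gateway_interface_by_name() helpers. openvpn_interface_in_gwgroup() is a hypothetical helper invented for this sketch.

/* A minimal sketch, not actual pfSense code. Resync every OpenVPN
 * instance related to the given interface. */
function openvpn_resync_interface($interface) {
	global $config;

	if (empty($interface) || !is_array($config['openvpn'])) {
		return;
	}

	foreach (array('server', 'client') as $mode) {
		if (!is_array($config['openvpn']["openvpn-{$mode}"])) {
			continue;
		}
		foreach ($config['openvpn']["openvpn-{$mode}"] as $settings) {
			/* Resync instances bound directly to this interface, or
			 * bound to a gateway group that involves this interface. */
			if ($settings['interface'] == $interface ||
			    openvpn_interface_in_gwgroup($interface, $settings['interface'])) {
				openvpn_resync($mode, $settings);
			}
		}
	}
}

/* Hypothetical helper: true if the named gateway group contains a
 * gateway that rides on $interface. Group items are assumed to be
 * "GWNAME|tier|vip" strings. */
function openvpn_interface_in_gwgroup($interface, $groupname) {
	global $config;

	if (!is_array($config['gateways']['gateway_group'])) {
		return false;
	}
	foreach ($config['gateways']['gateway_group'] as $group) {
		if ($group['name'] != $groupname) {
			continue;
		}
		foreach ($group['item'] as $item) {
			list($gwname, $tier, $vip) = explode("|", $item);
			if (lookup_gateway_interface_by_name($gwname) == $interface) {
				return true;
			}
		}
	}
	return false;
}

interface_configure() and rc.newwanip would then call openvpn_resync_interface() with just the interface they are handling, instead of openvpn_resync_all().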
Note: gwlb.inc setup_gateways_monitor() also configures apinger to send "service reload openvpn", which causes check_reload_status to run /etc/rc.openvpn and reload the whole OpenVPN universe. That seems to happen every time some dodgy gateway gets a bit of packet loss or latency. Should that be the thing that is improved, to somehow pass the interface through from apinger?
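If someone wanted to try that, one sketch - with big assumptions: that check_reload_status were taught to accept an interface argument after "service reload openvpn" (it currently is not), and that apinger honours command on/command off inside a per-gateway alarm block the way it does inside alarm default - would be for setup_gateways_monitor() to embed the gateway's interface in the pfSctl command it writes into apinger.conf:

/* Sketch only - not the current gwlb.inc code. $name is the gateway
 * name and $gwifname its pfSense interface (e.g. "wan"). */
$cmd = "/usr/local/sbin/pfSctl -c 'service reload openvpn {$gwifname}'";
$apingerconfig .= "alarm down \"{$name}down\" {\n";
$apingerconfig .= "\ttime 10s\n";
$apingerconfig .= "\tcommand on \"{$cmd}\"\n";
$apingerconfig .= "\tcommand off \"{$cmd}\"\n";
$apingerconfig .= "}\n";

check_reload_status would then need to pass that extra argument through to /etc/rc.openvpn.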
- Status changed from New to Feedback
Several fixes committed related to this.
- Status changed from Feedback to New
Still not started on hot plug whether using a gateway group or not.
Chris Buechler wrote:
Still not started on hot plug whether using a gateway group or not.
Could you try it with a more recent snapshot (>= April 15)? A binary upgrade is necessary because check_reload_status was updated.
- Status changed from New to Feedback
- Status changed from Feedback to New
This still seems to be a problem on a snapshot from the 24th, especially with VPN instances bound to Gateway Groups.
This pull request makes it all work as far as I can see: https://github.com/pfsense/pfsense/pull/625
There is some more optimization that can be done. For a gateway group using multiple tiers, if one of the lower-priority gateways/interfaces goes down or up, then there is no need to restart OpenVPN instances that are happily running on a working higher-priority (e.g. Tier 1) interface. I will have a look at that, since I think it is worth doing. People tend to "play" with the backup links, thinking that they can plug/unplug them with no consequences as long as the primary WAN is not touched - it would be nice if that were true.
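For illustration, the tier check could look something like this sketch (same "GWNAME|tier|vip" config assumption as the earlier sketch; gateway_is_up() is hypothetical shorthand for consulting the gateway status that apinger maintains):

/* Sketch only: should OpenVPN instances on this gateway group be
 * restarted after a link event on $interface? */
function openvpn_gwgroup_needs_restart($groupname, $interface) {
	global $config;

	$changed_tier = PHP_INT_MAX;  /* best tier of the flapped interface in this group */
	$best_up_tier = PHP_INT_MAX;  /* best tier still up on some other interface */

	if (!is_array($config['gateways']['gateway_group'])) {
		return true;  /* no group info - play it safe and restart */
	}
	foreach ($config['gateways']['gateway_group'] as $group) {
		if ($group['name'] != $groupname) {
			continue;
		}
		foreach ($group['item'] as $item) {
			list($gwname, $tier, $vip) = explode("|", $item);
			$tier = (int)$tier;
			if (lookup_gateway_interface_by_name($gwname) == $interface) {
				$changed_tier = min($changed_tier, $tier);
			} elseif (gateway_is_up($gwname)) {
				$best_up_tier = min($best_up_tier, $tier);
			}
		}
	}
	if ($changed_tier == PHP_INT_MAX) {
		return false;  /* the flapped interface is not in this group */
	}
	/* A Tier 3 link flapping while Tier 1 is up changes nothing; an
	 * event at or above the best working tier can change the group's
	 * chosen gateway, so those instances do need a restart. */
	return ($changed_tier <= $best_up_tier);
}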
The optimized version is pull request https://github.com/pfsense/pfsense/pull/627
I believe this is all working nicely now. Once people have had a chance to try/test it for real in upcoming snapshots, this bug report can be closed.
- Status changed from New to Feedback
Should be fixed now that the pull request has been merged.
- Status changed from Feedback to Resolved