OpenVPN uses 100% CPU after experiencing packet loss
I have two OpenVPN clients set up in a gateway group. When I was running 2.4.5p1 this was fine and I had zero problems, but ever since the upgrade, whenever one or both clients start experiencing packet loss, they start using 100% CPU.
71320 root 103 0 16M 16M CPU1 1 40.4H 100.00% /usr/local/sbin/openvpn --config /var/etc/openvpn/client1/config.ovpn
This puts a lot of load on my router for absolutely no reason, and it is even worse when both clients are doing it. I have tried to do some due diligence before posting this issue, so hopefully I have not filed a duplicate. I have been through most of the log files, but I am unfamiliar with FreeBSD (my experience is mostly with Linux). Even so, besides the packet loss, I'm not finding anything in the logs that might cause this.
I started a Reddit post a few weeks back, and I guess I'm not the only one experiencing this issue: <https://old.reddit.com/r/PFSENSE/comments/m7ooku/openvpn_connection_at_100_cpu_after_gateway_group/>. As in that post, the things I've tried to resolve this include removing extra settings, disabling gateway monitoring, etc. Restarting the OpenVPN clients usually fixes it for a few days, but eventually they go back to overusing my CPU.
Which logs would you like? I will provide them.
#1 Updated by Jim Pingle about 1 month ago
- Subject changed from OpenVPN Uses 100% CPU to OpenVPN uses 100% CPU after experiencing packet loss
I'm not sure there is anything pfSense could do about that. If OpenVPN itself is using the CPU, it's likely a problem in OpenVPN. If you can manage to reproduce it with just FreeBSD+OpenVPN (install via `pkg` should be easy, similar to Linux) then you could report it upstream to OpenVPN directly.
There are a few changes in the most recent release of OpenVPN but I don't see anything that immediately jumps out as being one that would cause that kind of CPU load over time. I don't see anything in their bug tracker recently about 100% CPU usage either.
#2 Updated by Jason NA about 1 month ago
When I updated to 2.5 I changed a few more things based on these VPN guides, <https://nguvu.org/pfsense/pfsense-baseline-setup/> and <https://nguvu.org/pfsense/pfsense-multi-vpn-wan/>, which I used to set this up in the first place. The most notable change was under System -> Advanced -> Miscellaneous -> Cryptographic & Thermal Hardware, where I changed AES-NI CPU based Acceleration to AES-NI and BSD based Crypto Device (aes-ni, cryptodev). I have a board with an Intel(R) Celeron(R) CPU J3355, which does have AES-NI, but I have no idea about the BSD crypto device; apparently it shouldn't be used if the hardware doesn't have it.
I did have ntopng running but have since disabled it (I didn't uninstall it) because it was also using too much CPU. I switched from pfblockerng to pfblockerng-devel. Other packages I'm using are bandwidthd, nut, service_watchdog, and status_traffic_totals. I also have a multi-WAN traffic shaper running to balance traffic. I have no idea if any of that could be causing my problems; none of it seemed to with 2.4.5p1.
The funny thing about this is that even though it's using 100% of one CPU, it doesn't seem to be affecting performance in any way; there is just an increase in CPU temperature. I do have a router rebuild coming up in the next few weeks, which is going to require a complete reinstall, so maybe that will fix it when I restore the config from backup.
If I feel adventurous I may try to install OpenVPN with `pkg`, but as you say, if this is a problem with OpenVPN I may take this issue to their issue tracker, although it could take months (or a year) for a fix to trickle down to pfSense.
#4 Updated by Jason NA about 1 month ago
I changed the verbosity on client1, waited a couple of minutes, then changed the verbosity on client2. When I hit save, unbound started misbehaving; I have no idea what it was doing, but it wouldn't start. I added unbound to Service Watchdog, which didn't seem to make a difference. I had to stop client2 on the dashboard before I could connect to the internet, wait a couple of minutes, and then start it again. At least I didn't get the bug where both client1 and client2 get the same IP address, which also leads to no internet.
Here are the log files for both openvpn and unbound for that misadventure, which will probably repeat when I change the verbosity back to normal. I will try to post more log files later when one of the clients starts pegging the CPU, but it usually takes hours, perhaps days. Differentiating between client1 and client2 in the logs is also impossible if I keep restarting them (the PID keeps changing), and I don't think you want to read through a huge log file.
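For anyone trying to read a combined log like this, one possible approach is to group the lines by PID so that each run of each client ends up in its own bucket. This is only a sketch: it assumes syslog-style lines of the form `... openvpn[PID]: message`, and the function name `group_by_pid` is made up for illustration. Mapping a PID back to client1 or client2 still has to be done by hand, for example from a process listing that shows the config path.

```python
import re
import sys
from collections import defaultdict

# Assumed line format (not confirmed by the attached logs):
#   "Apr  6 22:11:03 router openvpn[71320]: some message"
PID_RE = re.compile(r"\bopenvpn\[(\d+)\]:")


def group_by_pid(lines):
    """Return {pid: [lines]}, keeping the original order within each PID."""
    groups = defaultdict(list)
    for line in lines:
        match = PID_RE.search(line)
        if match:  # ignore lines from other daemons such as unbound
            groups[int(match.group(1))].append(line.rstrip("\n"))
    return dict(groups)


if __name__ == "__main__":
    # Usage: python split_openvpn_log.py openvpn.log
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for pid, entries in group_by_pid(fh).items():
                print(f"== openvpn[{pid}]: {len(entries)} lines ==")
                for entry in entries:
                    print(entry)
```

Each restart produces a new PID, so a client that was restarted three times shows up as three separate groups.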
#5 Updated by Jason NA about 1 month ago
According to my email, VPN1_WAN/client1 was suffering packet loss at Apr 6, 2021, 10:11 PM, and soon after, at 10:16 PM, VPN2_WAN/client2 started experiencing packet loss as well. Attached is the log until 11:30 PM or so. Client2 is currently using 100% CPU.
For the past week I've been testing with the traffic shaper disabled, and the shaper is what seems to be causing this issue. Maybe it was the way I had it set up, giving priority to HTTP and HTTPS and limiting torrent traffic. I configured it using the multiple WAN/LAN wizard. Either way, I disabled the traffic shaper and now OpenVPN doesn't use 100% of my CPU anymore when one or both of the gateways go down, so that part of the problem is solved. However, when I download torrents (Linux ISOs) I still get emails about one or both gateways being down, and this happens whether I have it set to member down or to packet loss and high latency. I am unsure whether I am just configuring this improperly or whether I should disable gateway monitoring entirely, because I am getting way too many of these emails.
#8 Updated by Troy Emmerson 1 day ago
OpenVPN is historically notorious for high CPU usage, to the point that other mission-critical services and I/O are delayed by OpenVPN's excessive CPU usage.
This typically shows up with multiple VPN links and/or a limited number of available cores (and often includes issues related to network interface and CPU pinning as well), in situations where OpenVPN goes into a recovery state, often depending on DNS and/or kernel network I/O that is being blocked by heavy CPU usage. I've also seen dpinger cause gateway flapping because it is not getting time on the CPU (check the gateway / dpinger logs to see if this is occurring).
The suggested mitigation strategy for this type of CPU loading by OpenVPN is to add the OpenVPN config option "nice X", to reduce OpenVPN's CPU priority, so that other mission critical processes can get CPU time without being blocked by OpenVPN.
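As a rough illustration of that option: OpenVPN's `nice` directive takes a single integer, and values above 0 lower the process priority. The value 10 below is an arbitrary example, not a tested recommendation, and on pfSense the line would presumably be added through the client's custom options rather than by editing the generated config file (that placement is an assumption).

```
# Lower OpenVPN's scheduling priority after initialization.
# 10 is an illustrative value only; n > 0 means lower priority.
nice 10
```

This only changes scheduling priority so that other processes get CPU time first. It does not stop OpenVPN from using an otherwise idle core.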