Bug #11778
OpenVPN uses 100% CPU after experiencing packet loss
Description
I have two OpenVPN clients set up in a gateway group. When I was running 2.4.5p1 this was fine and I had zero problems, but ever since the upgrade, whenever one or both clients start experiencing packet loss, they start using 100% CPU:
71320 root 103 0 16M 16M CPU1 1 40.4H 100.00% /usr/local/sbin/openvpn --config /var/etc/openvpn/client1/config.ovpn
This puts a lot of load on my router for absolutely no reason, and it is even worse when both of them are doing it. I have tried to do some due diligence before posting, so hopefully this is not a duplicate. I have been through most of the log files, but I am unfamiliar with FreeBSD (I mostly use Linux). Even so, apart from the packet loss, I'm not finding anything in the logs that might cause this.
I started a Reddit post a few weeks back, and it seems I'm not the only one experiencing this issue: <https://old.reddit.com/r/PFSENSE/comments/m7ooku/openvpn_connection_at_100_cpu_after_gateway_group/>. As in that post, the things I've tried to resolve this include removing extra settings, disabling gateway monitoring, etc. Restarting the OpenVPN clients usually fixes it for a few days, but eventually it's back to overusing my CPU.
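Since restarting the client clears the condition for a while, a stuck process could at least be flagged automatically. The following is a minimal Python sketch, not a pfSense feature: it assumes FreeBSD `top` rows in the layout shown above (PID as the first field, WCPU as the only field ending in `%`), and the function names and the 90% threshold are hypothetical choices for illustration. The sample rows mirror `top` output posted in this thread.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class ProcRow(NamedTuple):
    pid: int
    wcpu: float  # percent, as shown in the WCPU column
    command: str


# A WCPU field such as "100.00%" or "0.57%".
_WCPU = re.compile(r"^(\d+(?:\.\d+)?)%$")


def parse_top_row(line: str) -> Optional[ProcRow]:
    """Parse one process row from FreeBSD `top` output.

    Assumes PID is the first field and WCPU is the first field ending in '%';
    everything after WCPU is taken as the command. Rows with or without the
    THR column both work. Header and non-process lines return None.
    """
    fields = line.split()
    if not fields or not fields[0].isdigit():
        return None
    for i, field in enumerate(fields):
        match = _WCPU.match(field)
        if match:
            return ProcRow(int(fields[0]), float(match.group(1)), " ".join(fields[i + 1:]))
    return None


def busy_openvpn(lines: Iterable[str], threshold: float = 90.0) -> List[ProcRow]:
    """Return openvpn rows at or above `threshold` percent WCPU (hypothetical cutoff)."""
    rows = (parse_top_row(line) for line in lines)
    return [r for r in rows if r and "openvpn" in r.command and r.wcpu >= threshold]


if __name__ == "__main__":
    sample = [
        "PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND",
        "79510 root 1 103 0 16M 7268K CPU0 0 711:37 99.74% openvpn",
        "5974 root 1 27 0 132M 41M accept 1 0:02 0.57% php-fpm",
        "59257 root 2 20 0 19M 7224K select 1 1:04 0.09% openvpn",
    ]
    for row in busy_openvpn(sample):
        print(row.pid, row.wcpu)  # prints: 79510 99.74
```

Checking several snapshots taken some minutes apart, rather than a single one, would help separate a process that is genuinely stuck from a short burst of legitimate load.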
Which logs would you like? I will provide them.
Updated by Jim Pingle over 3 years ago
- Subject changed from OpenVPN Uses 100% CPU to OpenVPN uses 100% CPU after experiencing packet loss
I'm not sure there is anything pfSense could do about that. If OpenVPN itself is using the CPU, it's likely a problem in OpenVPN. If you can manage to reproduce it with just FreeBSD+OpenVPN (installing it via `pkg` should be easy, similar to Linux), then you could report it upstream to OpenVPN directly.
There are a few changes in the most recent release of OpenVPN but I don't see anything that immediately jumps out as being one that would cause that kind of CPU load over time. I don't see anything in their bug tracker recently about 100% CPU usage either.
Updated by Jason NA over 3 years ago
When I updated to 2.5, I changed a few more things from these VPN guides <https://nguvu.org/pfsense/pfsense-baseline-setup/> and <https://nguvu.org/pfsense/pfsense-multi-vpn-wan/>, which I used to set this up in the first place. The most notable change was under System -> Advanced -> Miscellaneous -> Cryptographic & Thermal Hardware, where I changed AES-NI CPU based Acceleration to AES-NI and BSD based Crypto Device (aes-ni, cryptodev). My board has an Intel(R) Celeron(R) CPU J3355, which does have AES-NI, but I have no idea about the BSD crypto device; the setting says it shouldn't be used if the hardware doesn't have it, though.
I did have ntopng running but have since disabled it (didn't uninstall it) because it was also using too much CPU. I switched from pfblockerng to pfblockerng-devel. Other packages I'm using are bandwidthd, nut, service_watchdog, and status_traffic_totals. I also have a multi-WAN traffic shaper running to balance traffic. I have no idea if any of that could be causing my problems; it didn't seem to with 2.4.5p1.
The funny thing is that even though it's using 100% of one CPU, it doesn't seem to affect performance in any way, apart from an increase in CPU temperature. I have a router rebuild coming up in the next few weeks that will require a complete reinstall, so maybe that will fix it when I restore the config from backup.
If I feel adventurous I may try installing OpenVPN with `pkg`, and, as you say, if this is a problem with OpenVPN I may take this issue to their issue tracker, although it could take months (or a year) for a fix to trickle down to pfSense.
Updated by Pippin MMD over 3 years ago
since the upgrade whenever one or both clients start experiencing packet loss they start using 100% CPU
An OpenVPN log at `verb 4` might help.
Updated by Jason NA over 3 years ago
- File openvpn.log added
- File unbound.log added
I changed the verbosity on client1, waited a couple of minutes, then changed the verbosity on client2, and when I hit Save, unbound started misbehaving. I have no idea what it was doing, but it wouldn't start. I added unbound to service watchdog, which didn't seem to make a difference. I had to stop client2 on the dashboard before I could connect to the internet, wait a couple of minutes, and then start it again. At least I didn't hit the bug where client1 and client2 both get the same IP address, which also leads to no internet.
Here are the openvpn and unbound log files from that misadventure, which will probably repeat when I change the verbosity back to normal. I will try to post more log files later when one of the clients starts pegging the CPU, but that usually takes hours, perhaps days. It is also impossible to tell client1 and client2 apart in the logs if I keep restarting them (the PID keeps changing), and I don't think you want to read through a huge log file.
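One way to separate the two clients in a shared log might be to group lines by the PID in the syslog tag and then map each PID to a client using the process list, whose command line includes the config path (for example `/var/etc/openvpn/client1/config.ovpn`). The Python sketch below assumes log lines of the form `... openvpn[PID]: message`; the function names are hypothetical, and a mapping only holds until the next restart, since the PID changes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed syslog-style line: "Apr  6 22:11:00 host openvpn[71320]: message"
_LOG_TAG = re.compile(r"\bopenvpn\[(\d+)\]:\s?(.*)$")

# Client config path as it appears on the openvpn command line in `top`/`ps`.
_CLIENT_CONF = re.compile(r"/var/etc/openvpn/([^/\s]+)/config\.ovpn")


def split_by_pid(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group OpenVPN log messages by the PID in the syslog tag.

    Lines without an openvpn[PID] tag (e.g. unbound entries) are skipped.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        match = _LOG_TAG.search(line)
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return dict(groups)


def pid_to_client(process_lines: Iterable[str]) -> Dict[int, str]:
    """Map PIDs to client names (e.g. "client1") using process-list rows
    whose command line contains --config /var/etc/openvpn/<name>/config.ovpn."""
    mapping: Dict[int, str] = {}
    for line in process_lines:
        fields = line.split()
        match = _CLIENT_CONF.search(line)
        if fields and fields[0].isdigit() and match:
            mapping[int(fields[0])] = match.group(1)
    return mapping


if __name__ == "__main__":
    logs = [
        "Apr  6 22:11:00 pfSense openvpn[71320]: example message 1",
        "Apr  6 22:11:01 pfSense unbound[123]: not an openvpn line",
        "Apr  6 22:11:02 pfSense openvpn[71320]: example message 2",
    ]
    procs = ["71320 root 103 0 16M 16M CPU1 1 40.4H 100.00% "
             "/usr/local/sbin/openvpn --config /var/etc/openvpn/client1/config.ovpn"]
    clients = pid_to_client(procs)
    for pid, messages in split_by_pid(logs).items():
        print(clients.get(pid, f"unknown pid {pid}"), len(messages))
```

Capturing the process list at the same time as the log keeps the PID-to-client mapping valid for that run.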
Updated by Jason NA over 3 years ago
- File openvpn_2.log added
According to my email, VPN1_WAN/client1 was suffering packet loss at Apr 6, 2021, 10:11 PM, and soon after, VPN2_WAN/client2 started experiencing packet loss at Apr 6, 10:16 PM. Attached is the log until about 11:30 PM. Client2 is currently using 100% CPU.
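To cut a large log down to the period around these alerts, a small filter on the leading timestamp may be enough. This Python sketch assumes BSD-syslog style lines that begin with `Mon DD HH:MM:SS` and carry no year (so the year is passed in); the function name is hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator


def in_window(lines: Iterable[str], start: datetime, end: datetime, year: int) -> Iterator[str]:
    """Yield log lines whose leading timestamp falls within [start, end].

    Assumes lines start with "Mon DD HH:MM:SS" and have no year, so the caller
    supplies it. Lines whose first three fields do not parse are skipped.
    """
    for line in lines:
        stamp = " ".join(line.split()[:3])
        try:
            # %b expects English month abbreviations (default C locale).
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        if start <= when <= end:
            yield line
```

For example, `in_window(open("openvpn.log"), datetime(2021, 4, 6, 22, 11), datetime(2021, 4, 6, 23, 30), 2021)` would yield only the lines between the first packet-loss alert and roughly 11:30 PM.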
Updated by Jason NA over 3 years ago
I did a router rebuild this morning, which required a complete reinstall (2.5.1), and restored the configuration from backup; the issue persists.
Updated by Jason NA over 3 years ago
For the past week I've been testing with the traffic shaper disabled, and that is what seems to be causing this issue. Maybe it was the way I had it set up, giving priority to HTTP and HTTPS and limiting torrent traffic; I configured it using the multiple WAN/LAN wizard. Either way, with the traffic shaper disabled, OpenVPN no longer uses 100% of my CPU when one or both of the gateways go down, so that part of the problem is solved. However, when I download torrents (Linux ISOs) I still get emails about one or both gateways being down, and this happens whether I have the trigger set to member down or to packet loss and high latency. I am unsure whether I am just configuring this improperly or should disable gateway monitoring entirely, because I am getting far too many of these emails.
Updated by Anonymous over 3 years ago
OpenVPN is historically notorious for high CPU usage, to the point that it can starve other mission-critical services and I/O of CPU time.
This typically shows up with multiple VPN links and/or limited available cores (often together with issues related to network interface and CPU pinning), in situations where OpenVPN goes into a recovery state that depends on DNS and/or kernel network I/O, which is itself being blocked by the heavy CPU usage. I've also seen dpinger cause gateway flapping because it doesn't get time on the CPU; check the gateway/dpinger logs to see if this is occurring.
The suggested mitigation strategy for this type of CPU loading by OpenVPN is to add the OpenVPN config option "nice X" to reduce OpenVPN's CPU priority, so that other mission-critical processes can get CPU time without being blocked by OpenVPN.
Updated by M B over 3 years ago
Jason NA wrote:
For the past week I've been testing with the traffic shaper disabled and that is what seems to be causing this issue. Maybe it was the way that I had it set up giving priority to HTTP and HTTPS and limiting torrent traffic. I configured it using the multiple wan/lan wizard. Either way I disabled the traffic shaper and now OpenVPN doesn't use 100% of my CPU anymore when one or both of the gateways goes down, so that part of the problem is solved. However when I download torrents (linux iso's) I still get emails about the gateway (or both) being down and this happens if I have it set to member down or packet loss and high latency. I am unsure if I am just configuring this improperly or if I should just disable gateway monitoring entirely because I am getting way too many of these emails.
I am experiencing this issue as well with OpenVPN and the traffic shaper on 2.5.0 CE. It seems that after the traffic shaper kicks in, OpenVPN gets stuck at 100% CPU and my box starts to overheat. Restarting the OpenVPN service when that happens resolves the issue and temperatures go back down, so I suspect OpenVPN is not handling the limits gracefully.
Updated by Jim Pingle over 3 years ago
- Has duplicate Bug #12163: WAN interface throughput degradation after send high volume through OpenVPN site-to-site Tunnel added
Updated by Gavin Owen almost 3 years ago
Troy Emmerson wrote in #note-8:
OpenVPN is historically notorious for high CPU usage to the extent that it can clog up CPU usage to point that other mission critical services and I/O are delayed by OpenVPN's excessive CPU usage.
I am getting 100% CPU usage on a single core while passing 2 kbps of almost-nothing trickle traffic. OpenVPN can be bad, but not that bad! In 2.4.5p1 it was fine: it got nowhere near CPU saturation even at 100 Mbps. I upgraded to 2.6.0 on Saturday afternoon, and for the entire rest of the weekend the CPU was near zero, just like in 2.4.5p1. At 8:30am on Monday the CPU bugged out to 100% with only a few Mbps of usage. After hours, now (midnight), the PCs at the remote site are all off and the link has basically no traffic, but the openvpn process for the site-to-site tunnel is still maxing out at 100% CPU. It's still bugged out, so it is definitely not usage-related. If I restart the process, it goes back to normal. It is likely to bug out again at 8:30am tomorrow.
I had the same issue in 2.5.2 but rolled back to 2.4.5p1. That version is getting too old now, so I can't avoid this issue any longer.
I don't know if packet loss or something else is the trigger. I am using a traffic shaper.
"top" outputs
2.6.0 box (head office)
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
79510 root 1 103 0 16M 7268K CPU0 0 711:37 99.74% openvpn <<<<< this is for the site-to-site OpenVPN process (box up 2 days 7hrs)
5974 root 1 27 0 132M 41M accept 1 0:02 0.57% php-fpm
59257 root 2 20 0 19M 7224K select 1 1:04 0.09% openvpn <<<<< this is for my remote dial-in connection from desktop PCs (for remote management / work-from-home users).
2.4.5p1 box (remote site)
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
24336 root 1 20 0 10308K 6340K select 1 172:33 0.08% openvpn <<<< site-to-site OpenVPN process. (box up 56 days)
82591 nobody 1 20 0 11596K 5172K select 0 0:17 0.05% dnsmasq
34567 root 1 20 0 12964K 7856K select 3 0:00 0.02% sshd
11008 root 5 52 0 6904K 2340K uwait 3 0:26 0.02% dpinger
- /usr/local/sbin/openvpn --version (2.6.0 head office box)
OpenVPN 2.5.4 amd64-portbld-freebsd12.3 [SSL (OpenSSL)] [LZO] [LZ4] [MH/RECVDA] [AEAD] built on Jan 12 2022
library versions: OpenSSL 1.1.1l-freebsd 24 Aug 2021, LZO 2.10
Originally developed by James Yonan
Copyright (C) 2002-2021 OpenVPN Inc <sales@openvpn.net>
Compile time defines: enable_async_push=yes enable_comp_stub=no enable_crypto_ofb_cfb=yes enable_debug=yes enable_def_auth=yes enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_fast_install=needless enable_fragment=yes enable_iproute2=no enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_management=yes enable_multihome=yes enable_pam_dlopen=no enable_pedantic=no enable_pf=yes enable_pkcs11=no enable_plugin_auth_pam=yes enable_plugin_down_root=yes enable_plugins=yes enable_port_share=yes enable_selinux=no enable_shared=yes enable_shared_with_static_runtimes=no enable_silent_rules=no enable_small=no enable_static=yes enable_strict=yes enable_strict_options=no enable_systemd=no enable_unit_tests=no enable_werror=no enable_win32_dll=yes enable_x509_alt_username=yes with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_sysroot=no
- /usr/local/sbin/openvpn --version (2.4.5p1 remote site box)
OpenVPN 2.4.9 amd64-portbld-freebsd11.3 [SSL (OpenSSL)] [LZO] [LZ4] [MH/RECVDA] [AEAD] built on May 4 2020
library versions: OpenSSL 1.0.2u-freebsd 20 Dec 2019, LZO 2.10
Originally developed by James Yonan
Copyright (C) 2002-2018 OpenVPN Inc <sales@openvpn.net>
Compile time defines: enable_async_push=no enable_comp_stub=no enable_crypto=yes enable_crypto_ofb_cfb=yes enable_debug=yes enable_def_auth=yes enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_fast_install=needless enable_fragment=yes enable_iproute2=no enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_management=yes enable_multihome=yes enable_pam_dlopen=no enable_pedantic=no enable_pf=yes enable_pkcs11=no enable_plugin_auth_pam=yes enable_plugin_down_root=yes enable_plugins=yes enable_port_share=yes enable_selinux=no enable_server=yes enable_shared=yes enable_shared_with_static_runtimes=no enable_silent_rules=no enable_small=no enable_static=yes enable_strict=yes enable_strict_options=no enable_systemd=no enable_werror=no enable_win32_dll=yes enable_x509_alt_username=no with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_sysroot=no
Is it the compile options? The differences are:
New (2.5.4) has:
- enable_async_push=yes
- enable_x509_alt_username=yes
- enable_unit_tests=no
Old (2.4.9) has:
- enable_async_push=no
- enable_x509_alt_username=no
- enable_crypto=yes
- enable_server=yes
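To redo this comparison mechanically rather than by eye, a minimal Python sketch like the following could diff the two "Compile time defines:" lines from `openvpn --version`. The function names are hypothetical, and a difference in build flags does not by itself show that any of them causes the CPU spin.

```python
from typing import Dict, Tuple

Defines = Dict[str, str]


def parse_defines(version_output: str) -> Defines:
    """Extract key=value pairs from the "Compile time defines:" line of
    `openvpn --version` output. If the marker is absent, parse the whole text."""
    _, marker, body = version_output.partition("Compile time defines:")
    text = body if marker else version_output
    defines: Defines = {}
    for token in text.split():
        if "=" in token:
            key, value = token.split("=", 1)
            defines[key] = value
    return defines


def diff_defines(old: Defines, new: Defines) -> Tuple[Defines, Defines, Dict[str, Tuple[str, str]]]:
    """Return (only in old, only in new, changed as (old_value, new_value))."""
    only_old = {k: old[k] for k in sorted(old.keys() - new.keys())}
    only_new = {k: new[k] for k in sorted(new.keys() - old.keys())}
    changed = {k: (old[k], new[k]) for k in sorted(old.keys() & new.keys()) if old[k] != new[k]}
    return only_old, only_new, changed


if __name__ == "__main__":
    # Shortened excerpts of the two lines above; paste the full lines in practice.
    old = parse_defines("Compile time defines: enable_async_push=no enable_crypto=yes "
                        "enable_server=yes enable_x509_alt_username=no")
    new = parse_defines("Compile time defines: enable_async_push=yes enable_unit_tests=no "
                        "enable_x509_alt_username=yes")
    only_old, only_new, changed = diff_defines(old, new)
    print("only in 2.4.9:", only_old)
    print("only in 2.5.4:", only_new)
    print("changed:", changed)
```

Run against the full lines posted above, this should reproduce the same five differences listed by hand.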
I should note that only the site-to-site OpenVPN process bugs out. The remote access "roadwarrior" openvpn process stays at low CPU. Not sure where we go from here.