Project

General

Profile

Actions

Bug #14386

closed

``openvpn.auth-user.php`` gets stuck at 100% CPU usage when RADIUS authentication times out

Added by Orion Poplawski 12 months ago. Updated 26 days ago.

Status:
Resolved
Priority:
Normal
Category:
OpenVPN
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
24.03
Release Notes:
Default
Affected Version:
Affected Architecture:

Description

This seems to be new behavior on pfsense plus 23.01-RELEASE. I'm seeing openvpn.auth-user.php processes stuck consuming 100% cpu. They seem to be correlated to openvpn AUTH_FAILED events as well as:

May 14 16:09:32 openvpn[8028]: /openvpn.auth-user.php: Error during RADIUS authentication : Operation timed out

Process:
26834  -  R     1124:57.09 /usr/local/sbin/fcgicli -f /etc/inc/openvpn.auth-user.php -d username=USERNAME&password=PASSWORD&cn=CN&strictcn=false&authcfg=AUTHCFGE=&dev=ovpns3&untrusted_port=55994&modeid=server3&nas_port=1196

truss shows the process continually running:

recvfrom(0,0x820ea2280,8,0,NULL,0x0)             = 0 (0x0)

Actions #1

Updated by Kris Phillips 11 months ago

Hello,

Do you see this same behavior in 23.05?

Actions #2

Updated by aleksei prokofiev 9 months ago

Can not reproduce on 23.05.1
23.05.1-RELEASE (amd64)
built on Wed Jun 28 03:57:27 UTC 2023
FreeBSD 14.0-CURRENT

Actions #3

Updated by Orion Poplawski 9 months ago

I just checked my 23.05 box and I see 5 of these runaways. I guess we'll need to update to 23.05.1 to test that...

Actions #4

Updated by Marcos M 7 months ago

I'm not sure that the Operation timed out log is related because the existence of the log means that the authentication process completes and the script returns, hence fcgicli should exit. The truss output seems to indicate that fcgicli is waiting on openvpn.auth-user.php to finish which implies the issue is with the .php file. I did manage to reproduce this issue once (unfortunately I did not have any debugging output applied); as expected no output was returned from openvpn.attributes.php. I also noticed the following in the OpenVPN logs rather than in the system logs (reversed):

Oct 11 14:08:17     openvpn     81755     /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - -> 172.25.1.1 - Restarting packages.
Oct 11 14:08:17     openvpn     81755     /rc.newwanip: rc.newwanip called with empty interface.
Oct 11 14:08:17     openvpn     81755     /rc.newwanip: rc.newwanip: on (IP address: 172.25.1.1) (interface: []) (real interface: ovpns1).
Oct 11 14:08:17     openvpn     81755     /rc.newwanip: rc.newwanip: Info: starting on ovpns1. 

Perhaps the script isn't ending due to some issue around openlog()? A quick way to test that would be to simply omit openlog() and closelog() altogether.

To narrow down the issue further, it'd be useful to know if this can be reproduced after doing any or all of the following:
  1. Disabling Duplicate Connection in the OpenVPN server config, or
  2. Renaming /etc/inc/openvpn.attributes.php to /etc/inc/openvpn.attributes.php.bak (used for Cisco-AVPair rules)
  3. Apply the following patch to omit openlog() Show
Actions #5

Updated by Orion Poplawski 6 months ago

Duplicate Connection was already disabled (multiple connections from the same user are not allowed - check box not checked.

I moved /etc/inc/openvpn.attributes.php out of the way.

Just had a re-occurrence, so I will now apply the patch.

Actions #6

Updated by Danilo Zrenjanin 6 months ago

The patch fixed the issue on a firewall with the same symptoms. There were no side effects after applying the patch.

23.05.1-RELEASE (amd64)
built on Wed Jun 28 03:57:27 UTC 2023
FreeBSD 14.0-CURRENT
Actions #7

Updated by Marcos M 6 months ago

Preferably we can get more testing/feedback on this. Given the inconsistent nature of the issue, it may simply be the issue has not appeared yet.

Actions #8

Updated by Orion Poplawski 6 months ago

Just had another reproducer. However, this time it did not appear to the associated with the auth timeout message initially reported, but a different variant:

Oct 27 10:04:22 PLUGIN_CALL: POST /usr/local/lib/openvpn/plugins/openvpn-plugin-auth-script.so/PLUGIN_AUTH_USER_PASS_VERIFY status=2
Oct 27 10:04:22 TLS: Username/Password authentication deferred for username 'USER'
...
Oct 27 10:05:18 PUSH: Received control message: 'PUSH_REQUEST'
Oct 27 10:05:34 Delayed exit in 5 seconds
Oct 27 10:05:34 SENT CONTROL [UNDEF]: 'AUTH_FAILED' (status=1)
Oct 27 10:05:34 SENT CONTROL [akialoa.cora.nwra.com]: 'AUTH_FAILED' (status=1)

Although now that I check I see that the "Duplicate Connection" box is checked on this particular VPN configuration, so I need to disable that now. Sigh
Actions #9

Updated by Marcos M 6 months ago

Orion Poplawski wrote in #note-8:

Just had another reproducer. However, this time it did not appear to the associated with the auth timeout message initially reported, but a different variant:

[...]
Although now that I check I see that the "Duplicate Connection" box is checked on this particular VPN configuration, so I need to disable that now. Sigh

Is that with the patch applied?

Actions #10

Updated by Orion Poplawski 6 months ago

Marcos M wrote in #note-9:

Orion Poplawski wrote in #note-8:

Just had another reproducer. However, this time it did not appear to the associated with the auth timeout message initially reported, but a different variant:

[...]
Although now that I check I see that the "Duplicate Connection" box is checked on this particular VPN configuration, so I need to disable that now. Sigh

Is that with the patch applied?

Yes - patch applied and /etc/inc/openvpn.attributes.php moved out of the way.

Actions #11

Updated by Marcos M 6 months ago

In that case, it seems more likely that the issue is with fcgicli. Given this issue seems to have appeared in 23.01, it may be due to reintroducing the use of fcgicli here:
https://github.com/pfsense/pfsense/commit/758ee42ae096fee8436efc89f2c9bcc4ae7ea23d

Please try the following patch to use php-cgi instead (updated): Show

Actions #12

Updated by Orion Poplawski 6 months ago

The patched version of ovpn_auth_verify seems to work okay, but if I patch ovpn_auth_verify_async I get auth failures:

SENT CONTROL [UNDEF]: 'AUTH_FAILED' (status=1)

not much else in the log to know why.

Actions #13

Updated by Orion Poplawski 3 months ago

FWIW - still present in 2.7.2 and 23.09.1.

Actions #14

Updated by Marcos M 3 months ago

Orion Poplawski wrote in #note-13:

FWIW - still present in 2.7.2 and 23.09.1.

Does the patch from #note-11 work around the issue? It should avoid the auth failure now.

Actions #15

Updated by Orion Poplawski 3 months ago

As I noted, patching ovpn_auth_verify_async gave me auth failures. I could try it again an see if that is still the case. Or just patch ovpn_auth_verify, but I wasn't sure if patching just that was sufficient.

Actions #16

Updated by Marcos M 3 months ago

I updated the patch which should deal with the auth failure.

Actions #17

Updated by Orion Poplawski 3 months ago

Okay, I have that applied and working now. And I don't seem to see a runaway with a simple auth failure. We'll see if anything else triggers it. Thanks.

Actions #18

Updated by Reid Linnemann 3 months ago

  • Assignee set to Reid Linnemann
  • Target version set to 24.03

Since fcgicli is continually calling recv() for 8 bytes, I think it's stuck in its read_packet loop. The socket's been closed by php-fpm but no fastcgi message has been received indicating the request is complete. Probably the socket is closed because php-fpm hits a terminal condition when accepting the connection, like running out of FDs, and just immediately closes the socket without sending any fastcgi messages back. fcgicli then loops on recv() because the return value is >= 0, interpreted as a short read. I'll correct this in devel and add some more foot-shooting protection. Do you have the ability to test this against a devel snapshot?

Actions #19

Updated by Orion Poplawski 3 months ago

I don't have a test instance at the moment, but I've thought about spinning one up for quite a while. I'll see what I can do. Thanks for taking a look at this - it sounds promising.

Actions #20

Updated by Reid Linnemann 3 months ago

What version(s) are you currently running and encountering this? Perhaps a test build of fcgicli would be appropriate.

Actions #21

Updated by Orion Poplawski 3 months ago

pfSense 2.7.2 and pfSense Plus 23.09.1

Actions #22

Updated by Reid Linnemann 3 months ago

  • Status changed from New to Feedback

Should be fixed in https://github.com/pfsense/FreeBSD-ports/commit/c0a12f594ba2a873ffd5ec8974c5582e6283fbdf. The 0 byte read handling is definitely the culprit.

Actions #23

Updated by Jim Pingle 3 months ago

  • Project changed from pfSense Plus to pfSense
  • Subject changed from openvpn.auth-user.php gets stuck at 100% cpu usage after RADIUS auth timed out to ``openvpn.auth-user.php`` gets stuck at 100% CPU usage when RADIUS authentication times out
  • Category changed from OpenVPN to OpenVPN
  • Target version changed from 24.03 to 2.8.0
  • % Done changed from 0 to 100
  • Affected Plus Version deleted (23.01)
  • Plus Target Version set to 24.03
Actions #24

Updated by Jim Pingle 26 days ago

  • Status changed from Feedback to Resolved

No reports of failures since this went in. Can always reopen it if someone can reproduce the problem on current builds.

Actions

Also available in: Atom PDF