Bug #15612

open

Captive Portal with a large number of passthrough MAC addresses causes web GUI gateway timeouts, 50x errors, and HA-sync XMLRPC errors

Added by Thomas Hohm about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: Captive Portal
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Release Notes: Default
Affected Plus Version: 24.03
Affected Architecture:
Actions #1

Updated by Thomas Hohm about 1 month ago

Sorry, I submitted this by accident without details. Here they are:

The problematic behaviours:

1. Editing firewall rules: editing or saving firewall rules takes a long time to complete, and saving often ends in an nginx gateway timeout.
2. Editing the captive portal zone: when we edit the zone with the large number of passthrough MAC addresses, saving takes a very long time and causes a 50x error. The crash reporter does not show any error (see output below); the syslog shows an "upstream timed out" message (see below).
3. HA sync fails with an XMLRPC default socket timeout (see below).

In some cases the web UI becomes accessible again after a few minutes; in other cases I have to use the SSH CLI menu to restart php-fpm to make the web UI accessible again.

Crash Reporter:

Crash report begins.  Anonymous machine information:

amd64
15.0-CURRENT
FreeBSD 15.0-CURRENT #0 plus-RELENG_24_03-n256311-e71f834dd81: Fri Apr 19 00:28:14 UTC 2024     root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/obj/amd64/Y4MAEJ2R/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/sources/FreeBS

Crash report details:

No PHP errors found.

No FreeBSD crash data found.

XMLRPC alert:

A communications error occurred while attempting to call XMLRPC method restore_config_section: Request timed out due to default_socket_timeout php.ini setting @ 2024-06-26 11:47:28

Syslog entry:

2024/06/28 08:07:14 [error] 18824#101717: *3816 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.10.100.11, server: , request: "POST /services_captiveportal.php?zone=mconweb_premium HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.100.64:8080", referrer: "https://10.10.100.64:8080/services_captiveportal.php?zone=mconweb_premium" 
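
For anyone trying to correlate these timeouts with specific pages, the following is a minimal sketch that pulls the timestamp, request, and upstream out of an nginx error-log line like the one above. The regular expression and the function name `parse_upstream_timeout` are illustrative assumptions based only on this single sample line, not on any pfSense tooling.

```python
import re

# Pattern for nginx error-log lines that report an upstream timeout.
# Assumption: the line layout matches the sample from this report.
UPSTREAM_TIMEOUT = re.compile(
    r'^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] '
    r'.*?upstream timed out .*?'
    r'request: "(?P<method>\S+) (?P<path>\S+) [^"]*", '
    r'upstream: "(?P<upstream>[^"]+)"'
)


def parse_upstream_timeout(line):
    """Return timestamp, method, path and upstream as a dict, or None."""
    match = UPSTREAM_TIMEOUT.search(line)
    return match.groupdict() if match else None


# The syslog entry quoted in this report (referrer field omitted for brevity).
SAMPLE = (
    '2024/06/28 08:07:14 [error] 18824#101717: *3816 upstream timed out '
    '(60: Operation timed out) while reading response header from upstream, '
    'client: 10.10.100.11, server: , '
    'request: "POST /services_captiveportal.php?zone=mconweb_premium HTTP/2.0", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket", '
    'host: "10.10.100.64:8080"'
)

if __name__ == "__main__":
    print(parse_upstream_timeout(SAMPLE))
```

Running it over the full log and grouping by `path` would show whether the timeouts are limited to `services_captiveportal.php` or also hit the firewall rule pages.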

- The behaviour is the same in version 23.05 (tested) and, as far as I remember, in at least one version before it.
- We are using an HA cluster of 2x Netgate 1537 with 32 GB RAM and a 500 GB SSD each.
- We have 600+ MAC addresses in the captive portal zone for automatic passthrough. The problems do not occur below 100 addresses.
- We have 2 captive portal zones in total, one with 600+ MAC addresses, the other with 0 MAC addresses.
- We are not using captive portal vouchers (we use RADIUS authentication with a RADIUS server on a separate non-pfSense system).
- Captive portal zones are included in the HA XMLRPC sync settings.
- Usually we have 1000+ users logged in to the captive portal.
- As soon as we delete the captive portal zone, all problems are gone.
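
To reproduce the threshold described above (fine below 100 addresses, failing at 600+) on a test system, a set of unique dummy MAC addresses is needed. The following is a minimal sketch that generates them; the function name `generate_test_macs` and the `02:00:00` prefix (a locally administered, unicast range) are assumptions chosen for illustration. How the addresses are loaded into the zone (GUI or config import) is left open here.

```python
def generate_test_macs(count, prefix="02:00:00"):
    """Return `count` unique MAC addresses that share the given 3-octet prefix.

    The default prefix has the locally-administered bit set, so the
    addresses should not collide with real vendor-assigned hardware.
    """
    if not 0 <= count <= 0x1000000:
        raise ValueError("count must be between 0 and 16777216")
    macs = []
    for i in range(count):
        # Spread the counter over the last three octets.
        macs.append(
            "{}:{:02x}:{:02x}:{:02x}".format(
                prefix, (i >> 16) & 0xFF, (i >> 8) & 0xFF, i & 0xFF
            )
        )
    return macs


if __name__ == "__main__":
    # 600 addresses, matching the zone size reported in this issue.
    for mac in generate_test_macs(600):
        print(mac)
```

Generating lists of different sizes (for example 100, 300, 600) would help narrow down where saving the zone starts to time out.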

Actions #2

Updated by Thomas Hohm about 1 month ago

Addition:
- Even excluding the captive portal from XMLRPC HA sync does not fix the problem.
- I can also export the captive portal settings to XML and import them into a freshly installed system. Even during the import, the web UI responds with a 50x error or an nginx gateway timeout (I've seen both, possibly a difference in behaviour between 24.03 and 23.05).
