Bug #15612
open
Captive Portal with a large number of passthrough MAC addresses is causing webgui gateway timeouts, Error 50x, and HA-sync XMLRPC Error
Added by Thomas Hohm 5 months ago. Updated 24 days ago.
0%
Updated by Thomas Hohm 5 months ago
Sorry, submitted by accident without details, here are the details to it:
The problematic behaviours:
1. Editing firewall rules: when I try to edit/save firewall rules, it takes a long time to complete, and we often get an nginx gateway timeout during saving.
2. Editing captive portal zone: when we edit the zone with the high number of passthrough MAC addresses, saving takes a very long time and causes 50x error. The crash reporter does not show any error (see output below), the syslog shows a message about "upstream timed out" (see below).
3. HA sync is failing with xmlrpc default socket timeout (see below)
In some cases the web UI is accessible again after a few minutes; in other cases I have to use the SSH CLI menu to restart php-fpm to make the web UI accessible again.
Crash Reporter:
Crash report begins. Anonymous machine information: amd64 15.0-CURRENT FreeBSD 15.0-CURRENT #0 plus-RELENG_24_03-n256311-e71f834dd81: Fri Apr 19 00:28:14 UTC 2024 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/obj/amd64/Y4MAEJ2R/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/sources/FreeBS Crash report details: No PHP errors found. No FreeBSD crash data found.
XMLRPC alert:
A communications error occurred while attempting to call XMLRPC method restore_config_section: Request timed out due to default_socket_timeout php.ini setting @ 2024-06-26 11:47:28
Syslog entry:
2024/06/28 08:07:14 [error] 18824#101717: *3816 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.10.100.11, server: , request: "POST /services_captiveportal.php?zone=mconweb_premium HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.100.64:8080", referrer: "https://10.10.100.64:8080/services_captiveportal.php?zone=mconweb_premium"
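To gauge how often these timeouts occur and which requests trigger them, the nginx error log can be scanned for matching entries. The following is only a minimal sketch, assuming the log lines have the same layout as the entry above; the function name `parse_upstream_timeout` is hypothetical and not part of pfSense.

```python
import re

# Pattern for nginx "upstream timed out" error lines.
# Assumption: the layout matches the syslog entry quoted in this report.
TIMEOUT_RE = re.compile(
    r'^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] .*?'
    r'upstream timed out .*?'
    r'client: (?P<client>[^,]+), .*?'
    r'request: "(?P<method>\S+) (?P<path>\S+) [^"]*"'
)

def parse_upstream_timeout(line):
    """Return a dict with time, client, method and path, or None if no match."""
    match = TIMEOUT_RE.search(line)
    return match.groupdict() if match else None

sample = (
    '2024/06/28 08:07:14 [error] 18824#101717: *3816 upstream timed out '
    '(60: Operation timed out) while reading response header from upstream, '
    'client: 10.10.100.11, server: , request: '
    '"POST /services_captiveportal.php?zone=mconweb_premium HTTP/2.0", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.100.64:8080"'
)
print(parse_upstream_timeout(sample))
```

Grouping the parsed entries by `path` would show whether the timeouts cluster on the captive portal page or also hit the firewall rule pages.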
- the behaviour is the same in version 23.05 (tested) and also in at least 1 version prior to it (as far as I can remember)
- we are using an HA cluster of 2x Netgate 1537 with 32 GB RAM & 500 GB SSD each
- we have 600+ MAC addresses in the captive portal zone for automatic passthrough. The problems do not occur below 100 addresses.
- we have 2 captive portal zones in total, one with 600+ MAC addresses, the other with 0 MAC addresses
- we are not using captive portal vouchers (we are using RADIUS authentication with a RADIUS server on a separate non-pfSense system)
- captive portal zones are included in the HA XMLRPC sync settings
- usually we have 1000+ users logged in to the captive portal
- as soon as we delete the captive portal zone, all problems are gone
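For anyone comparing setups, the number of passthrough MAC entries per zone can be counted from an exported configuration. This is only a sketch under an assumption about the XML layout: each zone is taken to be a child element of `<captiveportal>`, and each passthrough entry a `<passthrumac>` child of its zone. The function name and the sample zone names are hypothetical.

```python
import xml.etree.ElementTree as ET

def count_passthrough_macs(config_xml):
    """Map each captive portal zone name to its number of passthrough MAC entries.

    Assumed layout (not confirmed in this report): zones are children of
    <captiveportal>, and each entry is a <passthrumac> child of its zone.
    """
    root = ET.fromstring(config_xml)
    # Accept either a full config or an exported <captiveportal> section.
    portal = root if root.tag == "captiveportal" else root.find("captiveportal")
    if portal is None:
        return {}
    return {zone.tag: len(zone.findall("passthrumac")) for zone in portal}

# Hypothetical sample data, only for illustration.
sample = """<pfsense>
  <captiveportal>
    <zone_a>
      <passthrumac><mac>00:11:22:33:44:55</mac></passthrumac>
      <passthrumac><mac>00:11:22:33:44:56</mac></passthrumac>
    </zone_a>
    <zone_b></zone_b>
  </captiveportal>
</pfsense>"""
print(count_passthrough_macs(sample))
```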
Updated by Thomas Hohm 5 months ago
addition:
- even excluding captive portal from xmlrpc ha sync does not fix the problem.
- I can also export the captive portal settings to XML and import them into a freshly installed system. Even during the import the web UI responds with error 50x or an nginx gateway timeout (I've seen both, possibly different behaviour between 24.03 and 23.05)
Updated by Karl Ruskowski 4 months ago
We've been having a similar problem.
Main XMLRPC Error:
A communications error occurred while attempting to call XMLRPC method captive_portal_sync: Unable to connect to tls://172.16.1.252:4444. Error: Operation timed out @ 2024-08-22 07:54:18
Syslog:
Aug 2 11:33:44 pfSense01 php-fpm[45974]: /rc.carpmaster: A communications error occurred while attempting to call XMLRPC method captive_portal_sync: Unable to connect to tls://172.16.1.252:4444. Error: Operation timed out
2x Netgate hardware, version 23.09.1-RELEASE on both
Any change in the configuration results in many of these error messages.
Updated by Karl Ruskowski 3 months ago
I was able to solve our problem. On closer inspection, our firewalls weren't syncing at all. I set the same options on both under System -> advanced settings -> Webconfigurator and the sync began working again.
Updated by Danilo Zrenjanin about 1 month ago
- Priority changed from Normal to High
I successfully replicated the observed behavior. Both High Availability (HA) nodes were operating on the 24.03 release. Initially, there were two zones with a total of 345 MAC address pass-through entries. The XML-RPC was failing, as indicated by the following logs:
Nov 21 15:12:35 php-fpm 4777 /rc.filter_synchronize: Retrying XMLRPC Request due to error: A communications error occurred while attempting to call XMLRPC method host_firmware_version: Request timed out due to default_socket_timeout php.ini setting.
Upon removing the second zone, which contained 88 entries, the XML-RPC functioned without issues. It is noteworthy that the firewall had no additional packages installed and was configured with only two interfaces during the testing phase.
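To reproduce this with a controlled number of entries, a set of synthetic MAC addresses can be generated first. A minimal sketch: the helper `synthetic_macs` is hypothetical, and how the addresses are then loaded into a zone (GUI or config import) is left open here.

```python
def synthetic_macs(count):
    """Return `count` unique, locally administered unicast MAC addresses."""
    if not 0 <= count <= 2**24:
        raise ValueError("count must be between 0 and 2**24")
    macs = []
    for i in range(count):
        # First octet 0x02 sets the locally administered bit and keeps the
        # address unicast, so it cannot collide with vendor-assigned MACs.
        octets = [0x02, 0x00, 0x00, (i >> 16) & 0xFF, (i >> 8) & 0xFF, i & 0xFF]
        macs.append(":".join(f"{o:02x}" for o in octets))
    return macs

# 345 matches the total number of entries in the replication above.
macs = synthetic_macs(345)
print(len(macs), macs[0], macs[-1])
```

Varying `count` between runs (for example below 100, around 345, and above 600) would help narrow down where the timeouts start.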
Updated by Timo C 24 days ago
Subject: Ongoing Issues with pfSense+ Following Update
Hello,
We are still encountering the same issues exclusively with pfSense+. Has there been any progress or changes on this matter? The project was migrated from pfSense Plus to pfSense.
Recently, we updated from 24.03 to 24.11-RELEASE (amd64), built on Fri Nov 22, 05:34:00 CET 2024. However, the update continues to cause significant disruptions to the GUI, with erratic behavior persisting.
Additionally, we've observed that one Phase 2 IKEv2 tunnel is no longer syncing properly via HA, which is particularly concerning.
Could you let us know if a fix is in the works or if there's a timeline for a resolution?
Looking forward to your response.
Kind regards,
Timo