Bug #6406
Closed
Web process becomes unresponsive, producing 502 Bad Gateway nginx
Added by Alex Vergilis over 8 years ago. Updated over 6 years ago.
Description
Eventually the web process becomes unresponsive and produces
502 Bad Gateway nginx
A restart of PHP-FPM addresses the issue, until it happens again about 12 hours later.
Updated by Kill Bill over 8 years ago
+1; seems pretty replicable here when you leave the dashboard page open in a browser for a couple of hours. (Not 2.3.1 specific, was there with 2.3.0 as well.) Note that restarting the webserver alone (console option 11) does not help at all.
Updated by Chris Buechler over 8 years ago
- Affected Version changed from 2.3.1 to 2.3.x
Kill Bill: 2.3.1_1 fixed the bulk of remaining things there that 2.3.1 didn't. There's still something to this on occasion, but upgrade.
Updated by Kill Bill over 8 years ago
Well, while the original issue with the dashboard seems indeed gone, I managed to make the GUI completely unresponsive when upgrading pfBlockerNG package on two different boxes (2.3.1_1 full install, amd64). After the upgrade completed, the GUI did not return until PHP-FPM restart.
Updated by Xander Venterus over 8 years ago
I am now experiencing this issue on 2.3.1-RELEASE-p1 (i386).
I've been having intermittent Layer 7 DDoS attacks for a day or so now, and it seems that after each wave of the attack the web configurator returns only 502s, whether accessed from inside or outside.
Restarting the web configurator does nothing; I have to restart PHP-FPM to fix the issue.
The flood causing this is the recent WordPress pingback amplification attack. I have mitigated the flooding nodes via Cloudflare now, but it would be nice if the web configurator didn't return 502 errors every time a new set of IPs tries the same attack.
Updated by BBcan177 . over 8 years ago
Kill Bill wrote:
Well, while the original issue with the dashboard seems indeed gone, I managed to make the GUI completely unresponsive when upgrading pfBlockerNG package on two different boxes (2.3.1_1 full install, amd64). After the upgrade completed, the GUI did not return until PHP-FPM restart.
By any chance, did you use the "view" button in the Update Tab? Something has recently changed that is affecting that button's "End View" function...
Updated by Kill Bill over 8 years ago
BBcan177 . wrote:
By any chance, did you use the "view" button in the Update Tab? Something has recently changed that is affecting that button's "End View" function...
Yeah, not sure it's related to this issue but I noticed that button got broken as well.
Updated by Xander Venterus over 8 years ago
Confirming this has happened again on my unit, this time without any attacks having hit us. I just had to restart PHP-FPM again.
Updated by Kill Bill over 8 years ago
Xander Venterus wrote:
Confirming this has happened again on my unit, this time without any attacks having hit us. I just had to restart PHP-FPM again.
Yeah, seen again multiple times on multiple 2.3.1_1 boxes.
Updated by Chris Buechler over 8 years ago
- Target version changed from 2.3.2 to 2.4.0
Updated by Alex Vergilis about 8 years ago
FYI - Still happening on 2.3.2-RELEASE-p1 systems.
Updated by Anonymous about 8 years ago
Sorry to re-hash this, but since it has just been assigned to me I need an update.
Some of the above responses would indicate this issue is pfBlockerNG-specific. Is that the case? Is the problem present if pfB is not installed/active?
Updated by Jim Pingle about 8 years ago
There is no known consistent single cause. Some have it with nothing else installed, some with pfBlocker, some with the IPsec widget, and others hit it during HA XMLRPC sync. It's possible there is one root cause, or several, but so far it has not been simple to reproduce reliably under controlled conditions.
Updated by Michele Di Maria almost 8 years ago
Well, for me it started happening when I re-added the "Traffic Graphs" widget. It never happened before without that.
Updated by Romain Cabassot almost 8 years ago
We upgraded 2 days ago from 2.2.x to 2.3.1p1.
Same issue and no pfB installed.
We have upgraded only one of our two pfSense boxes (the 2.2.x one is halted), so we have many sync errors.
List of installed packages:
- Cron
- freeradius2
- Lightsquid
- nmap
- nrpe
- openvpn-client-export
- snort
- squid
- squidGuard
Updated by Bryan Fehl almost 8 years ago
Steve Beaver wrote:
Sorry to re-hash this, but since it has just been assigned to me I need an update.
Some of the above responses would indicate this issue is pfBlockerNG-specific. Is that the case? Is the problem present if pfB is not installed/active?
I just ran into this myself. Strangely, this issue causes all clients who try to connect with OpenVPN to hang indefinitely. Restarting the web configurator and PHP fixes the issue. It seems to happen only when I leave the pfSense web GUI open in my browser for an extended period, for example if I leave the Dashboard tab open overnight. The only package I have installed is openvpn-client-export. I hope this helps.
Edit: This is on version 2.3.2-RELEASE
Updated by Jim Pingle almost 8 years ago
Bryan Fehl wrote:
I just ran into this myself. Strangely, this issue causes all clients who try to connect with OpenVPN to just hang indefinitely.
That's normal, OpenVPN uses PHP scripts for authentication and some certificate verification. So if PHP is wedged, then OpenVPN can't authenticate.
Restarting the web configurator and PHP fixes the issue. It seems to happen only when I leave the pfSense web GUI open in my browser for an extended period, for example if I leave the Dashboard tab open overnight. The only package I have installed is openvpn-client-export. I hope this helps.
Edit: This is on version 2.3.2-RELEASE
Which dashboard widgets do you have visible?
Updated by Bryan Fehl almost 8 years ago
Jim Pingle wrote:
Which dashboard widgets do you have visible?
Right now I have the following widgets open:
- System Information
- Picture
- Interfaces
- S.M.A.R.T. Status
- Gateways
- Thermal Sensors
- CARP Status
- NTP Status
- Services Status
- Traffic Graphs
- IPSec
I'm removing the IPsec widget based on recommendations I've seen in the forum where people had similar issues. Hopefully that prevents this from reoccurring.
Updated by Alex Vergilis almost 8 years ago
Just Restarted PHP-FPM on a system with the following (no pfblocker installed):
- System Information
- Traffic Graphs
- Interfaces
- Gateways
- IPsec
- Interface Statistics
Updated by John Silva almost 8 years ago
I've seen this symptom frequently with pfBlockerNG and large lists. I also don't run the IPsec widget.
The common thread I noticed is that there is a php process consuming 100% CPU for an extended length of time. It seems like there is some resource blocking somewhere.
What I've done that appears to help is increase the webUI process limit from 2 to 4. It's not perfect, but the instances of the webUI becoming totally unresponsive (and returning the 502 gateway error) have been fewer since making this change.
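For anyone who wants to check for the same pattern while the GUI is wedged, here is a minimal sketch that picks out php-fpm processes at or near 100% CPU from `ps` output. It assumes the three-column layout produced by `ps -axo pid,pcpu,command`; the sample rows, the pool name, and the 90% threshold are hypothetical and not taken from any system in this thread.

```python
def busy_php_workers(ps_output, cpu_threshold=90.0):
    """Return (pid, cpu) pairs for php-fpm processes at or above cpu_threshold."""
    busy = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # pid, %cpu, rest of the command line
        if len(parts) < 3:
            continue
        pid, cpu, command = parts
        if "php-fpm" in command and float(cpu) >= cpu_threshold:
            busy.append((int(pid), float(cpu)))
    return busy


# Hypothetical sample in the layout of: ps -axo pid,pcpu,command
PS_SAMPLE = """\
  PID  %CPU COMMAND
  312   0.0 php-fpm: master process
  313 100.0 php-fpm: pool nginx
  314   0.3 php-fpm: pool nginx
  401   2.1 nginx: worker process
"""

print(busy_php_workers(PS_SAMPLE))
```

If every worker in a small pool shows up in this list at once, that would be consistent with the resource blocking described above, although it does not by itself identify which script is responsible.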
Updated by IT IGP over 7 years ago
- File Screenshot_36.png added
We are also getting this randomly every few days, and have been for a few months now. We always run the latest stable release.
Reproduction: leave the dashboard page open. We have widgets added for Gateways, Interfaces, Interface Statistics, Traffic Graphs, and IPsec.
Workaround: console/SSH option "Restart PHP-FPM".
I'm not sure when it started this time, or whether there is a specific initial message when it starts, but the following is what repeats in the logs while in the "bad gateway" state:
/var/log/nginx.log
...
Jun 15 12:12:48 pfs1 pfs1.xxx nginx: 2017/06/15 12:12:48 [error] 27069#100118: *322688 connect() to unix:/var/run/php-fpm.socket failed (61: Connection refused) while connecting to upstream, client: 192.168.0.5, server: , request: "GET /getstats.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket:", host: "192.168.0.19", referrer: "https://192.168.0.19/"
Jun 15 12:12:48 pfs1 pfs1.xxx nginx: 192.168.0.5 - - [15/Jun/2017:12:12:48 +0200] "GET /getstats.php HTTP/1.1" 502 568 "https://192.168.0.19/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
...
/var/log/system.log
...
Jun 15 12:46:39 pfs1 check_reload_status: Could not connect to /var/run/php-fpm.socket
Jun 15 12:46:40 pfs1 check_reload_status: Could not connect to /var/run/php-fpm.socket
Jun 15 12:46:40 pfs1 kernel: sonewconn: pcb 0xfffff800d61ccc30: Listen queue overflow: 193 already in queue awaiting acceptance (480 occurrences)
Jun 15 12:46:40 pfs1 check_reload_status: Could not connect to /var/run/php-fpm.socket
...
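The three signatures in these excerpts (nginx being refused on the PHP-FPM socket, check_reload_status failing to connect, and the kernel reporting a listen queue overflow) can be counted with a short script. This is only a sketch: the patterns are written against the exact wording of the lines above, the function and key names are hypothetical, and the sample is an abridged copy of those lines rather than a full log.

```python
import re

# Patterns written against the wording of the log lines quoted above.
SOCKET_REFUSED = re.compile(r"connect\(\) to unix:\S+ failed \(61: Connection refused\)")
RELOAD_FAILED = re.compile(r"check_reload_status: Could not connect to \S+")
QUEUE_OVERFLOW = re.compile(r"Listen queue overflow: (\d+) already in queue")


def summarize_log(log_text):
    """Count each failure signature and report the deepest listen queue seen."""
    depths = [int(n) for n in QUEUE_OVERFLOW.findall(log_text)]
    return {
        "socket_refused": len(SOCKET_REFUSED.findall(log_text)),
        "reload_failures": len(RELOAD_FAILED.findall(log_text)),
        "max_queue_depth": max(depths, default=0),
    }


# Abridged copy of the lines above (the nginx line is cut after "upstream").
LOG_SAMPLE = """\
Jun 15 12:12:48 pfs1 pfs1.xxx nginx: 2017/06/15 12:12:48 [error] 27069#100118: *322688 connect() to unix:/var/run/php-fpm.socket failed (61: Connection refused) while connecting to upstream
Jun 15 12:46:39 pfs1 check_reload_status: Could not connect to /var/run/php-fpm.socket
Jun 15 12:46:40 pfs1 kernel: sonewconn: pcb 0xfffff800d61ccc30: Listen queue overflow: 193 already in queue awaiting acceptance (480 occurrences)
"""

print(summarize_log(LOG_SAMPLE))
```

Because the patterns are matched against the whole text rather than line by line, the function also works on log text that has been pasted as a single run-on line.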
Updated by Christoffer Öhman over 7 years ago
I can not even use it before it locks.
As soon as I try to change something, it loads a really long time before it locks.
Updated by Bryan Fehl over 7 years ago
Christoffer Öhman wrote:
I can not even use it before it locks.
As soon as I try to change something, it loads a really long time before it locks.
Do you have the IPsec widget on your dashboard? I removed that widget months ago and I haven't had this issue pop up since.
Updated by Christoffer Öhman over 7 years ago
Bryan Fehl wrote:
Christoffer Öhman wrote:
I can not even use it before it locks.
As soon as I try to change something, it loads a really long time before it locks.
Do you have the IPsec widget on your dashboard? I removed that widget months ago and I haven't had this issue pop up since.
I'm sure I do not have the IPsec widget on the dashboard.
Updated by Anonymous over 7 years ago
- Target version changed from 2.4.0 to 2.4.1
Updated by Alex Vergilis over 7 years ago
pfSense team:
Why is this bug being pushed back yet again, to another release with no determined date? This issue causes an outage every day. Lots of people have been reporting it here and in the forums for over a year now.
I will be more than happy to volunteer my time to assist you to get to the bottom of this.
Please let me know how I can help.
Updated by Anonymous over 7 years ago
Thanks for your offer. I have been working on this issue all week, sadly without getting very far because each diagnostic step takes so long.
What I am doing is cranking the polling frequency up to maximum (General Setup page) and adding widgets one at a time. I'm trying to learn whether this issue is caused by a particular widget, a combination of widgets, the polling frequency, or something else. I'm also watching the memory state, CPU load, etc. as I make each test.
If you have the ability to do that type of testing, or have any other ideas on the subject, I would love to hear them.
Thanks!
Updated by Alex Vergilis over 7 years ago
All I have to do to cause this is just leave the dashboard web page open. The problem happens anywhere from 1 hour to a day or so - across about 75 firewalls. I have started to close the web page to minimize the burden of the outages.
I have 3 columns with 5 second updates. The following widgets are in the dashboard for majority of the systems: System Information, Interfaces, Gateways, Interface Statistics, IPSec, NTP Status, Traffic Graph (1 sec updates)
Updated by Chris Collins about 7 years ago
Having a fair amount of experience managing PHP hosting systems, I can offer some thoughts.
I have seen this behaviour on my own pfSense unit. When I watched the system while it was happening, it appeared to be caused by busy background PHP scripts, perhaps tying up the PHP-FPM server processes.
As an experiment, I manually adjusted the PHP-FPM server configuration so that more children are running, and the problem has not returned since.
It can be adjusted in /usr/local/etc/php-fpm.conf if anyone wants to experiment.
Given that 2.4.1 won't support nano-type systems anymore, I expect the settings that restrict resource usage to save memory can be loosened up a bit.
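Before experimenting, it may help to see what the process manager is currently set to. The sketch below pulls the `pm` and `pm.*` directives out of php-fpm configuration text. The directive names (`pm`, `pm.max_children`, `pm.max_requests`) are standard PHP-FPM settings, but the sample pool name and values are hypothetical and are not claimed to be pfSense defaults; to run it against a real box you would read the file mentioned above and pass its contents in.

```python
def pool_settings(conf_text):
    """Return the pm / pm.* directives found in php-fpm configuration text."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # ';' starts a comment in php-fpm.conf
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "pm" or key.startswith("pm."):
            settings[key] = value
    return settings


# Hypothetical pool section; the values are illustrative only.
CONF_SAMPLE = """\
[nginx]
listen = /var/run/php-fpm.socket
; process manager settings
pm = static
pm.max_children = 2
pm.max_requests = 500
"""

print(pool_settings(CONF_SAMPLE))
```

With a static pool, `pm.max_children` is the number of PHP requests that can run at once, so a couple of long-running scripts are enough to leave nothing free for the GUI. Hand edits may be overwritten when the system regenerates its configuration, so treat any change as an experiment.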
Updated by Kill Bill about 7 years ago
Chris Collins wrote:
As an experiment, I manually adjusted the PHP-FPM server configuration so that more children are running, and the problem has not returned since.
It can be adjusted in /usr/local/etc/php-fpm.conf if anyone wants to experiment.
Given that 2.4.1 won't support nano-type systems anymore, I expect the settings that restrict resource usage to save memory can be loosened up a bit.
The low number of processes/children apparently also is an issue with busy captive portals: https://forum.pfsense.org/index.php?topic=136847
So yeah, this should be relaxed by default since nano + i386 are gone, plus a GUI knob to have this configurable would be useful.
Updated by Jim Pingle about 7 years ago
- Target version changed from 2.4.1 to 2.4.2
There have been some fixes here in the IPsec widget and pfBlocker which may help. Moving this forward in case any other issues linger.
Updated by Anonymous about 7 years ago
Running for 12+hours, dashboard up, IPSec widget (and many others including pfBlocker) loaded, no issues. (2.4.2.a.20171103.1355)
Update: been running for well over a day, still no problems.
Updated by Serrjo Downe over 6 years ago
Chris Collins wrote:
As an experiment, I manually adjusted the PHP-FPM server configuration so that more children are running, and the problem has not returned since.
It can be adjusted in /usr/local/etc/php-fpm.conf if anyone wants to experiment.
Given that 2.4.1 won't support nano-type systems anymore, I expect the settings that restrict resource usage to save memory can be loosened up a bit.
I was having this problem and changing max children to 10 fixed my issue - Thank you kind sir!