Bug #6406

Web process becomes unresponsive producing 502 Bad Gateway nginx

Added by Alex Vergilis about 1 year ago. Updated 9 days ago.

Status: New
Priority: Normal
Assignee:
Category: Web Interface
Target version:
Start date: 05/26/2016
Due date:
% Done: 0%
Affected version: 2.3.x
Affected Architecture:

Description

Eventually the web process becomes unresponsive and produces

502 Bad Gateway

nginx

A restart of PHP-FPM addresses the issue, until it happens again about 12 hours later.

Screenshot_36.png (4.97 KB) IT IGP, 06/15/2017 06:07 AM

History

#1 Updated by Kill Bill about 1 year ago

+1; seems pretty replicable here when you leave the dashboard page open in a browser for a couple of hours. (Not 2.3.1 specific, was there with 2.3.0 as well.) Note that restarting the webserver alone (console option 11) does not help at all.

#2 Updated by Chris Buechler about 1 year ago

  • Affected version changed from 2.3.1 to 2.3.x

Kill Bill: 2.3.1_1 fixed the bulk of the remaining problems there that 2.3.1 didn't. Something still triggers this on occasion, but please upgrade.

#3 Updated by Kill Bill about 1 year ago

Well, while the original issue with the dashboard seems indeed gone, I managed to make the GUI completely unresponsive when upgrading pfBlockerNG package on two different boxes (2.3.1_1 full install, amd64). After the upgrade completed, the GUI did not return until PHP-FPM restart.

#4 Updated by Xander Venterus about 1 year ago

I am now experiencing this issue on 2.3.1-RELEASE-p1 (i386).

I've been having intermittent Layer 7 DDoS attacks for a day or so now, and it seems that after each wave of the attack the web configurator returns only 502s, from inside or outside.

Restarting the web configurator does nothing; I have to restart PHP-FPM to fix the issue.

The flood causing this is the recent WordPress pingback amplification attack. I have mitigated the flooding nodes via Cloudflare now, but it would be nice if the web configurator didn't go all 502 every time a new set of IPs tries the same attack...

#5 Updated by BBcan177 . about 1 year ago

Kill Bill wrote:

Well, while the original issue with the dashboard seems indeed gone, I managed to make the GUI completely unresponsive when upgrading pfBlockerNG package on two different boxes (2.3.1_1 full install, amd64). After the upgrade completed, the GUI did not return until PHP-FPM restart.

By any chance, did you use the "view" button in the Update Tab? Something has recently changed that is affecting that button's "End View" function...

#6 Updated by Kill Bill about 1 year ago

BBcan177 . wrote:

By any chance, did you use the "view" button in the Update Tab? Something has recently changed that is affecting that button's "End View" function...

Yeah, not sure it's related to this issue but I noticed that button got broken as well.

#7 Updated by Xander Venterus about 1 year ago

Confirming this has happened again on my unit, this time without any attacks having hit us. I just had to restart PHP-FPM again.

#8 Updated by Kill Bill about 1 year ago

Xander Venterus wrote:

Confirming this has happened again on my unit, this time without any attacks having hit us. I just had to restart PHP-FPM again.

Yeah, seen again multiple times on multiple 2.3.1_1 boxes.

#9 Updated by Chris Buechler 12 months ago

  • Target version changed from 2.3.2 to 2.4.0

#10 Updated by Jim Thompson 8 months ago

  • Assignee set to Steve Beaver

#11 Updated by Alex Vergilis 8 months ago

FYI - Still happening on 2.3.2-RELEASE-p1 systems.

#12 Updated by Steve Beaver 8 months ago

Sorry to re-hash this, but since it has just been assigned to me I need an update.

Some of the above responses would indicate this issue is PfBlockerNG specific. Is that the case? Is the problem present if pfB is not installed/active?

#13 Updated by Jim Pingle 8 months ago

There is no known consistent single cause. Some have it with nothing else installed, some with pfBlocker, some with the IPsec widget, and others hit it during HA XMLRPC sync. It's possible there is one root cause, or several, but so far it has not been simple to reproduce reliably under controlled conditions.

#14 Updated by Michele Di Maria 6 months ago

Well, for me it started to happen when I re-added the "Traffic Graphs" widget. It never happened before without that.

#15 Updated by Romain Cabassot 5 months ago

We upgraded 2 days ago from 2.2.x to 2.3.1p1.
Same issue, and no pfB installed.
We have upgraded only one of our two pfSense boxes (the 2.2.x one is halted), so we see many sync errors.

List of installed packages:
- Cron
- freeradius2
- Lightsquid
- nmap
- nrpe
- openvpn-client-export
- snort
- squid
- squidGuard

#16 Updated by Bryan Fehl 5 months ago

Steve Beaver wrote:

Sorry to re-hash this, but since it has just been assigned to me I need an update.

Some of the above responses would indicate this issue is PfBlockerNG specific. Is that the case? Is the problem present if pfB is not installed/active?

I just ran into this myself. Strangely, this issue causes all clients who try to connect with OpenVPN to just hang indefinitely. Restarting Web Configurator & PHP fixes the issue. It seems to only happen when I leave the pfSense web GUI open in my browser for an extended period of time, like if I leave the Dashboard tab open overnight. The only package I have installed is openvpn-client-export. I hope this helps.

Edit: This is on version 2.3.2-RELEASE

#17 Updated by Jim Pingle 5 months ago

Bryan Fehl wrote:

I just ran into this myself. Strangely, this issue causes all clients who try to connect with OpenVPN to just hang indefinitely.

That's normal, OpenVPN uses PHP scripts for authentication and some certificate verification. So if PHP is wedged, then OpenVPN can't authenticate.

Restarting Web Configurator & PHP fixes the issue. It seems to only happen when I leave the pfSense web GUI open in my browser for an extended period of time, like if I leave the Dashboard tab open overnight. The only package I have installed is openvpn-client-export. I hope this helps.

Edit: This is on version 2.3.2-RELEASE

Which dashboard widgets do you have visible?

#18 Updated by Bryan Fehl 5 months ago

Jim Pingle wrote:

Which dashboard widgets do you have visible?

Right now I have the following widgets open:
  • System Information
  • Picture
  • Interfaces
  • S.M.A.R.T. Status
  • Gateways
  • Thermal Sensors
  • CARP Status
  • NTP Status
  • Services Status
  • Traffic Graphs
  • IPsec

I'm removing the IPsec widget based on recommendations I've seen in the forum where people had similar issues. Hopefully that prevents this from reoccurring.

#19 Updated by Alex Vergilis 5 months ago

Just Restarted PHP-FPM on a system with the following (no pfblocker installed):

  • System Information
  • Traffic Graphs
  • Interfaces
  • Gateways
  • IPsec
  • Interface Statistics

#20 Updated by John Silva 5 months ago

I've seen this symptom frequently with pfBlockerNG and large lists. I also don't run the IPsec widget.

The common thread I noticed is that there is a php process consuming 100% CPU for an extended length of time. It seems like there is some resource blocking somewhere.

What I've done that appears to help is increase the webUI process limit from 2 to 4. It's not perfect, but the instances of the webUI becoming totally unresponsive (and returning the 502 gateway error) have been fewer since making this change.
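
For anyone trying to confirm the "PHP process pinned at 100% CPU" pattern on their own box, here is a minimal Python sketch that filters process-listing output for busy PHP processes. It assumes the output of something like `ps -axo pid,%cpu,command` has been captured as text; the function name, the 90% threshold, and the sample rows are made up for illustration and are not from any system in this ticket.

```python
from typing import List, Tuple


def busy_php_processes(ps_output: str, threshold: float = 90.0) -> List[Tuple[int, float, str]]:
    """Return (pid, cpu_percent, command) for PHP processes at or above threshold."""
    busy = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, %cpu, rest of the line as command
        if len(parts) < 3:
            continue
        pid_text, cpu_text, command = parts
        try:
            pid, cpu = int(pid_text), float(cpu_text)
        except ValueError:
            continue  # not a process row
        if "php" in command and cpu >= threshold:
            busy.append((pid, cpu, command))
    return busy


# Hypothetical sample in "pid %cpu command" layout (invented values).
SAMPLE = (
    "  PID  %CPU COMMAND\n"
    "27069   0.1 nginx\n"
    "27100 100.0 php-fpm\n"
    "27101   0.0 php-fpm\n"
)

if __name__ == "__main__":
    for pid, cpu, command in busy_php_processes(SAMPLE):
        print(pid, cpu, command)
```

Running this a few times a minute apart would show whether the same PID stays pinned, which is the "extended length of time" part of the observation above.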

#21 Updated by IT IGP 9 days ago

We are also hitting this randomly every few days, and have been for a few months now, always running the latest stable release.
Reproduction: leave the dashboard page open. We have widgets added for GW, Interfaces, Interface Stats, Traffic Graphs, IPsec.
Workaround: console/SSH option "Restart PHP-FPM".

Not sure when it started this time, or whether there is a specific or different initial message when it starts, but the following is what repeats in the logs while in the "bad gateway" state:

/var/log/nginx.log

...
Jun 15 12:12:48 pfs1 pfs1.xxx nginx: 2017/06/15 12:12:48 [error] 27069#100118: *322688 connect() to unix:/var/run/php-fpm.socket failed (61: Connection refused) while connecting to upstream, client: 192.168.0.5, server: , request: "GET /getstats.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket:", host: "192.168.0.19", referrer: "https://192.168.0.19/" 
Jun 15 12:12:48 pfs1 pfs1.xxx nginx: 192.168.0.5 - - [15/Jun/2017:12:12:48 +0200] "GET /getstats.php HTTP/1.1" 502 568 "https://192.168.0.19/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" 
...
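
To see which requests are being refused, and how often, the nginx error lines above can be tallied with a short Python sketch. The regular expression is written against the one error line quoted in this comment; other nginx versions or configurations may format it differently, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the upstream connect failure shown above and captures the
# socket path, errno, reason, and the request path that triggered it.
UPSTREAM_FAIL = re.compile(
    r'connect\(\) to unix:(?P<socket>\S+) failed '
    r'\((?P<errno>\d+): (?P<reason>[^)]+)\)'
    r'.*?request: "(?P<method>\S+) (?P<path>\S+)'
)


def count_upstream_failures(lines):
    """Count failures per (request path, reason) across log lines."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_FAIL.search(line)
        if match:
            counts[(match.group("path"), match.group("reason"))] += 1
    return counts


# The two lines from the excerpt above (syslog prefix trimmed).
LINES = [
    '2017/06/15 12:12:48 [error] 27069#100118: *322688 connect() to '
    'unix:/var/run/php-fpm.socket failed (61: Connection refused) while '
    'connecting to upstream, client: 192.168.0.5, server: , request: '
    '"GET /getstats.php HTTP/1.1", upstream: '
    '"fastcgi://unix:/var/run/php-fpm.socket:", host: "192.168.0.19"',
    '192.168.0.5 - - [15/Jun/2017:12:12:48 +0200] '
    '"GET /getstats.php HTTP/1.1" 502 568',
]

if __name__ == "__main__":
    for key, count in count_upstream_failures(LINES).items():
        print(key, count)
```

Only the `[error]` line is counted; the access-log line carrying the 502 status does not match the pattern and is skipped.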

/var/log/system.log

...
Jun 15 12:46:39 pfs1 check_reload_status: Could not connect to /var/run/php-fpm.socket
Jun 15 12:46:40 pfs1 check_reload_status: Could not connect to /var/run/php-fpm.socket
Jun 15 12:46:40 pfs1 kernel: sonewconn: pcb 0xfffff800d61ccc30: Listen queue overflow: 193 already in queue awaiting acceptance (480 occurrences)
Jun 15 12:46:40 pfs1 check_reload_status: Could not connect to /var/run/php-fpm.socket
...
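
Both logs point at the same symptom: connections to /var/run/php-fpm.socket are refused. A minimal Python sketch of a probe for that condition is below. It only checks whether the socket accepts a connection; it does not speak FastCGI and does not restart anything. The function name and the two-second timeout are assumptions for illustration, and whether Python is available on a given pfSense install is not something this ticket establishes.

```python
import socket


def fpm_socket_accepts(path: str = "/var/run/php-fpm.socket", timeout: float = 2.0) -> bool:
    """Return True if a Unix stream socket at `path` accepts a connection."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a refused connection (as in the logs above), a missing
        # socket file, and a timeout.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    state = "accepting" if fpm_socket_accepts() else "not accepting"
    print("php-fpm socket is", state, "connections")
```

Run from cron or a monitoring agent, a probe like this could record the time the socket first stops accepting connections, which may help narrow down the initial message in the logs.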
