Bug #6318

IPsec dashboard widget causes GUI failure

Added by Rick Strangman 12 months ago. Updated 16 days ago.

Status: Confirmed
Priority: Normal
Assignee:
Category: Web Interface
Target version:
Start date: 05/05/2016
Due date:
% Done: 0%
Affected version: 2.3.x
Affected Architecture:

Description

Since 2.3_1 the webConfigurator repeatedly becomes unresponsive. When I try to access my HTTPS web interface on port 444, the page hangs and eventually responds with "504 Gateway Time-out - nginx" in both IE and Firefox. The nginx error log shows the following:

2016/05/05 18:27:54 [error] 30498#0: *254302 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.xxx.10, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "pf.xxxxx.biz:444"
2016/05/05 19:17:32 [alert] 30180#0: close() socket failed (9: Bad file descriptor)
2016/05/05 19:20:44 [error] 87811#0: *12 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.xxx.10, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "pf.xxxxx.biz:444"
2016/05/05 19:27:58 [error] 87811#0: *447 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.xxx.10, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "pf.xxxxx.biz:444"

Restarting the webconfigurator from the console does not resolve the issue.
Other than the web not functioning, the firewall is performing as normal.
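
For anyone comparing logs, a minimal sketch of how the "upstream timed out" entries could be pulled out of an nginx error log and counted per upstream. The function name `upstream_timeouts` and the regular expression are illustrative assumptions based only on the lines quoted above, not on a complete description of the nginx log format.

```python
import re
from collections import Counter

# Assumed line shape, taken from the entries quoted in this ticket:
# "<date> <time> [<level>] <pid>#<tid>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#\d+: (?P<msg>.*)$"
)

def upstream_timeouts(lines):
    """Return (timestamp, upstream) for every 'upstream timed out' entry."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or "upstream timed out" not in m.group("msg"):
            continue  # other errors/alerts are ignored
        up = re.search(r'upstream: "([^"]+)"', m.group("msg"))
        hits.append((m.group("ts"), up.group(1) if up else None))
    return hits

# Three of the log lines from the description, copied verbatim.
SAMPLE = [
    '2016/05/05 18:27:54 [error] 30498#0: *254302 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.xxx.10, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "pf.xxxxx.biz:444"',
    '2016/05/05 19:17:32 [alert] 30180#0: close() socket failed (9: Bad file descriptor)',
    '2016/05/05 19:20:44 [error] 87811#0: *12 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.xxx.10, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "pf.xxxxx.biz:444"',
]

if __name__ == "__main__":
    hits = upstream_timeouts(SAMPLE)
    print(Counter(upstream for _, upstream in hits))
```

In the quoted log every timeout points at the php-fpm socket, which is consistent with PHP-FPM rather than nginx itself being stuck.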

php-stuck-truss-04.txt (44.7 KB) Jim Pingle, 12/22/2016 02:39 PM

Associated revisions

Revision fa01d062
Added by Chris Buechler 11 months ago

Set request_terminate_timeout to the same as max_execution_time in case something executed externally doesn't respond, to avoid hanging up all of php-fpm eventually. Ticket #6318 among other similar potential issues.

Revision 42d2f11a
Added by Chris Buechler 11 months ago

Set request_terminate_timeout to the same as max_execution_time in case something executed externally doesn't respond, to avoid hanging up all of php-fpm eventually. Ticket #6318 among other similar potential issues.

Revision 062a5434
Added by Chris Buechler 11 months ago

Set request_terminate_timeout to the same as max_execution_time in case something executed externally doesn't respond, to avoid hanging up all of php-fpm eventually. Ticket #6318 among other similar potential issues.

History

#1 Updated by Chris Buechler 12 months ago

  • Status changed from New to Feedback
  • Affected version deleted (2.3.1)

That's 2.3(.0)_1 rather than 2.3.1. It wasn't the 2.3->2.3_1 update that did it, since that only upgraded ntpd; rather, it's something that would have happened on 2.3 as well. I'm guessing it's one of two things: either something related to #6177, or some kind of issue with the IPsec dashboard widget, which seems to be causing this for a few people.

If this is replicable for you and you have the IPsec dashboard widget enabled, please try removing it and see if that fixes the problem. That'll at least tell us where the issue resides.

#2 Updated by Brent Kerlin 11 months ago

Chris Buechler wrote:

That's 2.3(.0)_1 rather than 2.3.1. It wasn't the 2.3->2.3_1 update that did it, since that only upgraded ntpd; rather, it's something that would have happened on 2.3 as well. I'm guessing it's one of two things: either something related to #6177, or some kind of issue with the IPsec dashboard widget, which seems to be causing this for a few people.

If this is replicable for you and you have the IPsec dashboard widget enabled, please try removing it and see if that fixes the problem. That'll at least tell us where the issue resides.

I have seen this issue frequently on clients since 2.3 rolled. I was more concerned with #6296, which was causing me many headaches, but I will try removing the IPsec widget on a few sites and report back (I have one with the webgui locked up right now that is a prime candidate). Are there any log dumps that would be helpful?

#3 Updated by Brent Kerlin 11 months ago

Restarting the webconfigurator from the console does not resolve the issue.
Other than the web not functioning, the firewall is performing as normal.

Try restarting PHP-FPM from the console. That seems to clear up the issue for me...

#4 Updated by Brent Kerlin 11 months ago

Brent Kerlin wrote:

I have seen this issue frequently on clients since 2.3 rolled. I was more concerned with #6296, which was causing me many headaches, but I will try removing the IPsec widget on a few sites and report back (I have one with the webgui locked up right now that is a prime candidate). Are there any log dumps that would be helpful?

I have removed the IPsec widget from all the sites at which I was having this PHP-FPM issue. I'll report back in a couple of days, or sooner if the problem persists.

#5 Updated by Rick Strangman 11 months ago

I have had no issues since removing the IPsec widget. I am now on 2.3.1 and have not seen a lockup.

#6 Updated by Chris Buechler 11 months ago

  • Subject changed from pfsense webconfigurator to IPsec dashboard widget causes GUI failure
  • Status changed from Feedback to Confirmed
  • Target version set to 2.3.2
  • Affected version set to 2.3.x
  • Affected Architecture deleted (amd64)

#7 Updated by Steve Beaver 11 months ago

I have looked through the code again and nothing really stands out.

It would be helpful to know:

  • How many tunnels do people have in cases where the issue is seen?
  • Does it make any difference if the widget is set to show Overview, Tunnels, or Mobile?

Thanks!

#8 Updated by Chris Buechler 11 months ago

Steve Beaver wrote:

I have looked through the code again and nothing really stands out.

Ditto. Heard of roughly a handful of reports of this, but never seen it myself. Additional details would be appreciated.

#9 Updated by Chris Buechler 11 months ago

Thanks to Alex for getting me into an affected system. It's occasionally getting stuck in pfSense_ipsec_list_sa, without triggering any of the printfs there.

PHP_FUNCTION(pfSense_ipsec_list_sa) {

        vici_conn_t *conn;
        vici_req_t *req;
        vici_res_t *res;

        array_init(return_value);

        vici_init();
        conn = vici_connect(NULL);
        if (conn) {
                if (vici_register(conn, "list-sa", build_ipsec_sa_array, (void *) return_value) != 0) {
                        php_printf("VICI registration failed: %s\n", strerror(errno));
                } else {
                        req = vici_begin("list-sas");
                        res = vici_submit(req, conn);
                        if (res) {
                                vici_free_res(res);
                        }
                }
                vici_disconnect(conn);
        } else {
                php_printf("VICI connection failed: %s\n", strerror(errno));
        }

        vici_deinit();

}

What I committed on this ticket should prevent this (and many other possible failure scenarios with commands that don't return) from killing the GUI. request_terminate_timeout will kill such stuck requests after 900 seconds. The hang only happens once every few minutes when continually refreshing a page that uses that function, so that's been enough to keep Alex's system from killing the GUI again.
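
The idea behind that commit is a hard wall-clock limit enforced from outside the stuck worker. A minimal sketch of the same pattern, by analogy only: this is not PHP-FPM code, the helper name `run_with_hard_timeout` is hypothetical, and the child process here merely stands in for a request that never returns.

```python
import subprocess
import sys

def run_with_hard_timeout(cmd, timeout_s):
    """Run cmd; kill it if it has not finished within timeout_s seconds.

    Returns the exit code, or None if the command had to be killed.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising,
        # so nothing is left behind to block later work.
        return None

if __name__ == "__main__":
    # A child that would sleep for a minute stands in for a hung request.
    hang = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_hard_timeout(hang, 0.5))
```

As in the commit, the limit does not fix whatever caused the hang; it only bounds how long one stuck call can occupy a worker.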

#10 Updated by Chris Buechler 10 months ago

  • Target version changed from 2.3.2 to 2.4.0

#11 Updated by Jim Pingle 4 months ago

This also affects Status > IPsec

We have access to a customer system that has 70 tunnels defined, and it happens every 5-20 minutes (timing varies) while a browser is left on Status > IPsec. The requests are not piling up; they only take about 300 ms to complete. With a browser left open on Status > IPsec and Firebug or similar running, it's easy to spot when it stops responding.
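
Instead of watching a browser, the same observation could be scripted. A rough sketch, assuming any PHP-rendered page on the firewall is a fair proxy for the page that hangs; `probe` and `watch` are hypothetical helpers, the URL is whatever the reader's GUI answers on, and a self-signed certificate would need an `ssl` context passed as `context`.

```python
import time
import urllib.error
import urllib.request

def probe(url, timeout_s=5.0, context=None):
    """Fetch url once; return elapsed seconds, or None on timeout or error.

    HTTP error statuses (such as the 504 seen in this ticket) also count
    as "no response", because urlopen raises HTTPError for them.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s, context=context) as resp:
            resp.read()
    except (urllib.error.URLError, OSError):
        return None
    return time.monotonic() - start

def watch(url, interval_s=1.0, count=10, timeout_s=5.0):
    """Print one timing line per request so a stall is easy to spot."""
    for _ in range(count):
        elapsed = probe(url, timeout_s)
        status = "no response" if elapsed is None else f"{elapsed * 1000:.0f} ms"
        print(time.strftime("%H:%M:%S"), status)
        time.sleep(interval_s)
```

A run of lines around 300 ms followed by "no response" would correspond to the stall described above.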

When it happens, there are always two PHP child processes:

: ps uxawww | grep '[p]hp'
root   64113   0.5  0.9 272496 38300  -  S     1:43PM    0:02.26 php-fpm: pool nginx (php-fpm)
root     267   0.0  0.6 268400 25140  -  Ss    3:01AM    0:02.49 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root   64043   0.0  0.9 285304 38604  -  I     1:43PM    0:00.19 php-fpm: pool nginx (php-fpm)

Attempting to run truss on the top process (in state "S", sleeping) shows no output at all.

Running truss on the other process (In state "I", idle) outputs info and then the browser gets a response. So long as the truss happens before the browser times out, everything keeps running. The truss output is attached. I have several more copies of truss output from other times I reproduced the issue, but they are all very close if not identical. I find it odd that merely attaching to the process with truss is somehow waking it up and causing it to proceed. I've tried hitting the process with other signals like kill -HUP but so far nothing brings it back to life but touching it with truss, or killing/restarting PHP-FPM.
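
To compare process states across reproductions, the ps output can be reduced to PID and state per worker. A small sketch, assuming the `ps uxawww` column layout shown above (ten fixed columns followed by the command) and a STARTED column without embedded spaces; `fpm_workers` is a hypothetical helper.

```python
# The three lines from the ps output above, copied verbatim.
PS_SAMPLE = """\
root   64113   0.5  0.9 272496 38300  -  S     1:43PM    0:02.26 php-fpm: pool nginx (php-fpm)
root     267   0.0  0.6 268400 25140  -  Ss    3:01AM    0:02.49 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root   64043   0.0  0.9 285304 38604  -  I     1:43PM    0:00.19 php-fpm: pool nginx (php-fpm)
"""

def fpm_workers(ps_output):
    """Return [(pid, state)] for php-fpm pool workers in `ps uxawww` output."""
    workers = []
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header or malformed line
        if fields[10].startswith("php-fpm: pool"):
            workers.append((int(fields[1]), fields[7]))
    return workers

if __name__ == "__main__":
    for pid, state in fpm_workers(PS_SAMPLE):
        print(pid, state)
```

The master process is skipped on purpose, since only the pool workers serve requests.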

There isn't much that happens in the AJAX request being made for Status > IPsec or the IPsec widget; it could be getting stuck in the vici interaction.
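
If the hang is indeed a read on the vici socket that never gets an answer (a guess at this point, not something the truss output proves), the general failure mode looks like the sketch below. It does not use libvici; a local socket pair stands in for the client/daemon link, and `read_reply` is a hypothetical helper showing how a receive timeout turns an indefinite block into a detectable error.

```python
import socket

def read_reply(sock, timeout_s):
    """Wait for a reply on sock; return bytes, or None if nothing arrives in time."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # Without the timeout above, recv() would block here indefinitely.
        return None

if __name__ == "__main__":
    # A connected socket pair stands in for the link to the daemon.
    client, daemon = socket.socketpair()
    print(read_reply(client, 0.2))  # the "daemon" has not answered yet
    daemon.sendall(b"ok")
    print(read_reply(client, 0.2))  # now a reply is waiting
    client.close()
    daemon.close()
```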

#12 Updated by Jim Thompson 4 months ago

  • Assignee set to Steve Beaver

#13 Updated by Nick Wenos 2 months ago

We are also having what appears to be the same issue on version 2.3.2. As a side effect of php-fpm going down, our OpenVPN clients also lose the ability to connect until we restart php-fpm and openvpn. I don't know whether this would affect all OpenVPN setups or just those using SSL certificate authentication, as is the case with our setup.

#14 Updated by Eric Machabert 2 months ago

Nick Wenos wrote:

We are also having what appears to be the same issue on version 2.3.2. As a side effect of php-fpm going down, our OpenVPN clients also lose the ability to connect until we restart php-fpm and openvpn. I don't know whether this would affect all OpenVPN setups or just those using SSL certificate authentication, as is the case with our setup.

We are also seeing this on 2.3.3.
Running netstat -an shows requests filling up the Recv-Q for the IPC connection to /var/run/php-fpm.socket.

#15 Updated by Chris Baker 16 days ago

I am also seeing this on 2.3.3. Is there any known workaround other than removing the IPsec widget? Maybe changing the polling frequency?
