Bug #3986

closed

BandwidthD can break php-fpm in unknown rare edge case

Added by Russell Morris over 9 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: bandwidthd
Target version: -
Start date: 11/04/2014
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Affected Version: 2.2
Affected Plus Version:
Affected Architecture:

Description

Hi,

I am having a lot of trouble with BandwidthD in v2.2. More info here:
https://forum.pfsense.org/index.php?topic=78175.msg456588#msg456588

There are many recent observations in the last few posts. It also breaks the WebGUI, but the thread includes a fix for that as well (from another user!).

Thanks!


Files

bandwidthd-215-CPU-loop.txt (2.6 KB) Phillip Davis, 11/04/2014 08:39 PM

Actions #1

Updated by Phillip Davis over 9 years ago

In addition to the 2.2 issue of it somehow taking over php-fpm and thus breaking webGUI and...
I will note here that it goes into a CPU loop on 2.1.5 systems. It does not happen all the time, but can happen directly on bootup, like in the attached text file. I just rebooted an Alix 2D13 this morning and bandwidthd was chewing up all CPU for ages, so I killed it. It may be a different problem, or may prove to be related.
Since we can't really use bandwidthd on 2.2 at this time, it is not possible to say whether the CPU loop behaviour also happens on 2.2.

Actions #2

Updated by Chris Buechler over 9 years ago

  • Status changed from New to Confirmed
  • Target version set to 2.2
Actions #3

Updated by Chris Buechler over 9 years ago

  • Subject changed from BandwidthD broken in v2.2 to BandwidthD can break php-fpm in unknown rare edge case
  • Status changed from Confirmed to Feedback
  • Target version deleted (2.2)

bandwidthd works in general on 2.2 now. The issue Phil noted with php-fpm may still be a problem in some circumstances.
Discussion of any remaining issues has been split to this forum thread:
https://forum.pfsense.org/index.php?topic=84642.0

Actions #4

Updated by Phillip Davis over 9 years ago

I will try a few variations again to see if I can break anything. I did see it in a CPU loop on a 2.1.5 system last week and disabled it (didn't have time to play then).
Past issues that I have never been able to reproduce and get to the bottom of are:

1) After an install it sometimes gives some errors about a lib*.so not being found and the binary will not start. Only noticed that myself on 2.1.5 and some time ago.

2) Seen it go into a CPU loop on 2.1.5 systems. Have not run it enough with realistic throughput... on a 2.2 system to know if the CPU loop can happen there. My 2.1.5 production systems are logging bandwidthd to a PostgreSQL database at the main office, so some are over OpenVPN links. It could be an issue that happens if the OpenVPN site-to-site link is down or the database server is otherwise off-line. That is really just a thought, not even a tested theory.

3) Was known to cause the webGUI to become unavailable, and php-fpm not to respond in general on 2.2. Have not seen that recently.

If anyone experiences, or better still can reproduce, any of the above issues (or other issues - hope there are no more), please post here or to the forum thread that Chris referenced. I would like to know all the details: what type of install (32/64 bit, nanoBSD or full, real hardware or VM, logging the data to itself or an SQL server, is the SQL server on a local subnet or across a link...)

Actions #5

Updated by Russell Morris over 9 years ago

Hi,

Yes, I can definitely reproduce this - I just installed the latest version of pfSense (v2.2, from today), and I find that the GUI is broken after the upgrade (I have to restart it manually). After boot I also find two copies of bandwidthd running; I have to kill them and then start bandwidthd manually as well.
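
One way to check for duplicate instances after boot is to count matching rows in `ps aux` output. The sketch below is an illustration only, not part of the package: the helper name `count_daemon_instances` is hypothetical, and it assumes the usual 11-column `ps aux` layout. bandwidthd may also fork worker processes of its own, so a count above one is a hint rather than proof of a double start.

```python
def count_daemon_instances(ps_output, name="bandwidthd"):
    """Count `ps aux` rows whose COMMAND column mentions `name`.

    Hypothetical helper for illustration; assumes 11 whitespace-separated
    columns (USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND).
    """
    count = 0
    for line in ps_output.splitlines():
        fields = line.rstrip().split(None, 10)  # keep COMMAND as one field
        if len(fields) < 11:
            continue  # blank or truncated row
        command = fields[10]
        # Ignore the grep process that a `ps aux | grep ...` pipeline adds.
        if name in command and not command.startswith("grep "):
            count += 1
    return count
```

Passing it the captured text of `ps aux` right after boot gives a number that can be compared against what a clean manual start produces on the same system.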

To your questions - this is a full install on real HW, logging to an SQL server (on another machine).

Is there anything I can check to help debug this?

Thanks!

Actions #6

Updated by Russell Morris over 9 years ago

Hi,

FYI, I just added some new observations to the forum post, https://forum.pfsense.org/index.php?topic=84642.0

But it seems to me that there is a deeper interaction here - not limited to bandwidthd, but also including OpenVPN, ntopng and php-fpm (and others?).

Thanks!

Actions #7

Updated by Tom Peeters about 9 years ago

I'm having the same issues with bandwidthd and would like to help solve this.

Running release 2.2.1

  1. Installed the package
  2. Went to Services >>> bandwidthd >>> enable >>> save >>> GUI crash
  3. Restarted PHP-FPM from the shell to fix the GUI

config.xml:

        <bandwidthd>
            <config>
                <enable>on</enable>
                <active_interface>opt6</active_interface>
                <subnets_custom>192.168.128.0/21</subnets_custom>
                <skipintervals/>
                <graphcutoff>512</graphcutoff>
                <promiscuous/>
                <outputcdf/>
                <recovercdf/>
                <outputpostgresql/>
                <postgresqlhost/>
                <postgresqldatabase/>
                <postgresqlusername/>
                <postgresqlpassword/>
                <sensorid/>
                <filter/>
                <drawgraphs/>
                <meta_refresh/>
                <graph_log_info/>
            </config>
        </bandwidthd>

system.log:

Mar 24 10:13:32 firewall php-fpm[60155]: /pkg_mgr_install.php: Beginning package installation for bandwidthd .
Mar 24 10:13:33 firewall check_reload_status: Syncing firewall
Mar 24 10:13:45 firewall check_reload_status: Syncing firewall
Mar 24 10:13:45 firewall php-fpm[60155]: /pkg_mgr_install.php: Successfully installed package: bandwidthd.
Mar 24 10:13:46 firewall check_reload_status: Reloading filter
Mar 24 10:14:01 firewall check_reload_status: Syncing firewall
Mar 24 10:14:01 firewall php-fpm[16579]: /pkg_edit.php: The command '/usr/local/etc/rc.d/bandwidthd.sh stop' returned exit code '1', the output was 'No matching processes were found' 
Mar 24 10:14:01 firewall bandwidthd: Monitoring subnet 255.255.255.252 with netmask 255.255.255.252
Mar 24 10:14:01 firewall bandwidthd: Monitoring subnet 255.255.248.0 with netmask 255.255.248.0
Mar 24 10:14:01 firewall bandwidthd: Opening em1_vlan70
Mar 24 10:14:01 firewall bandwidthd: Packet Encoding: Ethernet
Mar 24 10:14:29 firewall check_reload_status: Syncing firewall
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: unix:/var/run/php-fpm.socket 
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.3346) response not received, request sent: 1434 on socket: unix:/var/run/php-fpm.socket for /pkg_edit.php?, closing connection 
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket 
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1 
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.3587) all handlers for /ifstats.php?if=em5_vlan131 on .php are down. 
Mar 24 10:14:30 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:31 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:32 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.2848) fcgi-server re-enabled: unix:/var/run/php-fpm.socket 
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket 
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1 
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.3587) all handlers for /ifstats.php?if=em4_vlan99 on .php are down. 
Mar 24 10:14:33 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:34 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:35 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.2848) fcgi-server re-enabled: unix:/var/run/php-fpm.socket 
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket 
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1 
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.3587) all handlers for /ifstats.php?if=em2_vlan98 on .php are down. 
Mar 24 10:14:36 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:37 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:38 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:38 firewall lighttpd[42020]: (mod_fastcgi.c.2848) fcgi-server re-enabled: unix:/var/run/php-fpm.socket 
Mar 24 10:14:38 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket 
Mar 24 10:14:38 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1 
Mar 24 10:14:39 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:39 firewall rc.php-fpm_restart[85880]: >>> Restarting php-fpm

The logged "check_reload_status: Could not connect to /var/run/php-fpm.socket" errors disappear when I disable bandwidthd.
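
To compare runs with bandwidthd enabled and disabled, the socket errors can be tallied per minute. This is a minimal sketch, assuming the syslog line format shown in the excerpt above (a 15-character timestamp such as "Mar 24 10:14:30" at the start of each line); the function name `failures_per_minute` is made up for illustration.

```python
from collections import Counter

# Message text exactly as it appears in the system.log excerpt.
NEEDLE = "Could not connect to /var/run/php-fpm.socket"

def failures_per_minute(log_text):
    """Count php-fpm socket connection failures, grouped by minute."""
    counts = Counter()
    for line in log_text.splitlines():
        if NEEDLE in line:
            # "Mar 24 10:14:30" -> "Mar 24 10:14" (drop the seconds).
            counts[line[:12]] += 1
    return counts
```

Running it over `/var/log/system.log` contents from each configuration gives a quick before/after comparison without reading the log by hand.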

Feel free to ask me to test anything else.

Actions #8

Updated by Gabor Tjong A Hung about 9 years ago

I can confirm the issue: I have bandwidthd installed, but it isn't enabled!

My syslog is filled with these:

$ cat /var/log/system.log
Apr 15 10:20:54 pfSense check_reload_status: Could not connect to /var/run/php-fpm.socket

After running (See https://forum.pfsense.org/index.php?topic=84642.0):
/usr/local/sbin/php-fpm -c /usr/local/lib/php.ini -y /usr/local/lib/php-fpm.conf -RD

The Web GUI works again.
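
Whether the GUI backend is reachable can also be probed directly, by trying to connect to the Unix socket named in the log messages. The sketch below uses only the Python standard library; the function name `unix_socket_accepts` is hypothetical, and a successful connect only shows that something is listening, not that php-fpm is answering requests correctly.

```python
import socket

def unix_socket_accepts(path, timeout=1.0):
    """Return True if a listener accepts connections on the Unix socket at `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing socket file, a refused connection and a timeout.
        return False
    finally:
        sock.close()
```

For the situation in this ticket the call would be `unix_socket_accepts("/var/run/php-fpm.socket")`, checked before and after restarting php-fpm.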

$ uname -a
FreeBSD pfSense.amplify.local 10.1-RELEASE-p4 FreeBSD 10.1-RELEASE-p4 #0 36d7dec(releng/10.1)-dirty: Thu Jan 22 15:27:13 CST 2015     root@pfsense-22-i386-builder:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_wrap_vga.10.i386  i386

config.xml:

<package>
                        <name>bandwidthd</name>
                        <website>http://bandwidthd.sourceforge.net/</website>
                        <descr><![CDATA[BandwidthD tracks usage of TCP/IP network subnets and builds html files with graphs to display utilization. Charts are built by individual IPs, and by default display utilization over 2 day, 8 day, 40 day, and 400 day periods. Furthermore, each ip address's utilization can be logged out at intervals of 3.3 minutes, 10 minutes, 1 hour or 12 hours in cdf format, or to a backend database server. HTTP, TCP, UDP, ICMP, VPN, and P2P traffic are color coded.]]></descr>
                        <category>System</category>
                        <version>2.0.1_6 pkg v.0.5</version>
                        <status>BETA</status>
                        <required_version>2.2</required_version>
                        <depends_on_package_pbi>bandwidthd-2.0.1_6-i386.pbi</depends_on_package_pbi>
                        <config_file>https://packages.pfsense.org/packages/config/bandwidthd/bandwidthd.xml</config_file>
                        <configurationfile>bandwidthd.xml</configurationfile>
                        <build_pbi>
                                <ports_before>net/libpcap databases/postgresql91-client graphics/gd</ports_before>
                                <port>net-mgmt/bandwidthd</port>
                        </build_pbi>
                        <build_options>WITH_NLS=true;WITHOUT_PAM=true;WITHOUT_LDAP=true;WITHOUT_MIT_KRB5=true;WITHOUT_HEIMDAL_KRB5=true;WITHOUT_OPTIMIZED_CFLAGS=true;WITHOUT_XML=true;WITHOUT_TZDATA=true;WITHOUT_DEBUG=true;WITHOUT_GSSAPI=true;WITHOUT_ICU=true;WITH_INTDATE=true</build_options>
                        <depends_on_package_base_url>https://files.pfsense.org/packages/10/All/</depends_on_package_base_url>
                </package>

                <service>
                        <name>bandwidthd</name>
                        <rcfile>bandwidthd.sh</rcfile>
                        <executable>bandwidthd</executable>
                        <description><![CDATA[BandwidthD bandwidth monitoring daemon]]></description>
                </service>

Before the command, there are no php-fpm processes:

ps aux |grep php
root    57205  0.0  8.8 31292 20080  -  Is   Tue09AM    0:00.20 /usr/local/bin/php
root    58193  0.0  8.8 31292 20036  -  Is   Tue09AM    0:00.19 /usr/local/bin/php
root    58535  0.0  8.8 31292 20080  -  Is   Tue09AM    0:00.21 /usr/local/bin/php
root    58659  0.0  8.8 31292 20080  -  Is   Tue09AM    0:00.22 /usr/local/bin/php
root    58950  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59247  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59356  0.0  8.8 31292 20036  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59662  0.0  8.8 31292 20036  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59850  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59862  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    60128  0.0 10.6 35388 24336  -  I    Tue09AM    0:01.61 /usr/local/bin/php
root    60381  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    93518  0.0  0.8  2204  1876  0  RL+  10:45AM    0:00.00 grep php

After the command, the php-fpm master process is present:

 ps aux | grep php
root    57205  0.0  8.8 31292 20080  -  Is   Tue09AM    0:00.20 /usr/local/bin/php
root    58193  0.0  8.8 31292 20036  -  Is   Tue09AM    0:00.19 /usr/local/bin/php
root    58535  0.0  8.8 31292 20080  -  Is   Tue09AM    0:00.21 /usr/local/bin/php
root    58659  0.0  8.8 31292 20080  -  Is   Tue09AM    0:00.22 /usr/local/bin/php
root    58950  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59247  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59356  0.0  8.8 31292 20036  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59662  0.0  8.8 31292 20036  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59850  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    59862  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    60128  0.0 10.6 35388 24336  -  I    Tue09AM    0:01.61 /usr/local/bin/php
root    60381  0.0  8.8 31292 20080  -  I    Tue09AM    0:00.00 /usr/local/bin/php
root    93859  0.0  9.4 31384 21476  -  Ss   10:46AM    0:00.02 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root     1249  0.0  0.8  2204  1900  0  R+   10:47AM    0:00.00 grep php
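
The difference between the two listings can be picked out mechanically by comparing PIDs. This is a small sketch under the assumption that both captures use the 11-column `ps aux` layout shown above; `new_processes` is a hypothetical helper name.

```python
def new_processes(before, after):
    """Return (pid, command) for rows in `after` whose PID is absent from `before`."""
    def parse(text):
        rows = {}
        for line in text.splitlines():
            fields = line.rstrip().split(None, 10)  # COMMAND stays in one piece
            if len(fields) == 11 and fields[1].isdigit():
                rows[int(fields[1])] = fields[10]
        return rows

    old, new = parse(before), parse(after)
    # Skip the grep process that the `ps aux | grep php` pipeline itself adds.
    return [(pid, cmd) for pid, cmd in new.items()
            if pid not in old and not cmd.startswith("grep ")]
```

Applied to the two listings above, the only new entry is PID 93859, the `php-fpm: master process` row.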

Actions #9

Updated by Russell Morris over 8 years ago

Hi,

Just following up - this still seems to cause a lot of grief in v2.2.5. Are others still seeing this?

Thanks!

Actions #10

Updated by Tom Peeters over 8 years ago

I have updated to 2.2.5 and reinstalled bandwidthd. The problem still occurs.

Actions #11

Updated by Russell Morris over 8 years ago

FYI, updated to v2.2.6 -> still have the issue ... :(.

On boot, two copies of bandwidthd are running, which messes up the database. Killing bandwidthd also kills the Web GUI.

Thanks.

Actions #12

Updated by Chris Buechler about 8 years ago

  • Status changed from Feedback to Confirmed
Actions #13

Updated by Phillip Davis about 8 years ago

I will be traveling for a while, so there is not much chance for me to do any work on this. But if someone has time to port bandwidthd to work on 2.3, it would be nice to be able to run it and see if this problem still happens.

Actions #14

Updated by Kill Bill over 7 years ago

Package gone, please close.

Actions #15

Updated by Jim Pingle over 7 years ago

  • Status changed from Confirmed to Closed