Bug #3986
closed
BandwidthD can break php-fpm in unknown rare edge case
Added by Russell Morris about 10 years ago. Updated almost 8 years ago.
Description
Hi,
I am having a lot of trouble with BandwidthD in v2.2. More info here:
https://forum.pfsense.org/index.php?topic=78175.msg456588#msg456588
The last few posts contain many recent observations. It also breaks the WebGUI, but a fix for that (from another user!) is in the thread as well.
Thanks!
Files
bandwidthd-215-CPU-loop.txt (2.6 KB) | Phillip Davis, 11/04/2014 08:39 PM
Updated by Phillip Davis about 10 years ago
In addition to the 2.2 issue of it somehow taking over php-fpm and thus breaking webGUI and...
I will note here that it goes into a CPU loop on 2.1.5 systems. It does not happen all the time, but can happen directly on bootup, like in the attached text file. I just rebooted an Alix 2D13 this morning and bandwidthd was chewing up all CPU for ages, so I killed it. It may be a different problem, or may prove to be related.
Since we can't really use bandwidthd on 2.2 at this time, it is not possible to say if the CPU loop behaviour also happens on 2.2
Updated by Chris Buechler almost 10 years ago
- Status changed from New to Confirmed
- Target version set to 2.2
Updated by Chris Buechler almost 10 years ago
- Subject changed from BandwidthD broken in v2.2 to BandwidthD can break php-fpm in unknown rare edge case
- Status changed from Confirmed to Feedback
- Target version deleted (2.2)
bandwidthd works in general on 2.2 now. The issue Phil noted with php-fpm may still be a problem in some circumstances.
I have split discussion of any remaining issues to this forum thread:
https://forum.pfsense.org/index.php?topic=84642.0
Updated by Phillip Davis almost 10 years ago
I will try a few variations again to see if I can break anything. I did see it in a CPU loop on a 2.1.5 system last week and disabled it (didn't have time to play then).
Past issues that I have never been able to reproduce and really get to the bottom of are:
1) After an install it sometimes gives some errors about a lib*.so not being found and the binary will not start. Only noticed that myself on 2.1.5 and some time ago.
2) Seen it go into a CPU loop on 2.1.5 systems. Have not run it enough with realistic throughput... on a 2.2 system to know if the CPU loop can happen there. My 2.1.5 production systems are logging bandwidthd to a postgreSQL database at the main office, so some are over OpenVPN links. It could be an issue that happens if the OpenVPN site-to-site link is down or the database server is otherwise off-line. That is really just a thought, not even a tested theory.
3) Was known to cause the webGUI to become unavailable, and php-fpm not to respond in general on 2.2. Have not seen that recently.
If anyone experiences, and even can reproduce, any of the above issues (or other issues - hope there are no more) please post to here or the forum thread that Chris referenced. Would like to know all the details, what type of install (32/64 bit, nanoBSD or full, real hardware or VM, logging the data to itself or an SQL server, is the SQL server on a local subnet or across a link...)
Updated by Russell Morris almost 10 years ago
Hi,
Yes, I can definitely reproduce this - just installed the latest version of pfSense (v2.2, from today), and I do find that the GUI is broken after the upgrade (have to manually restart it). And after boot I find two copies of bandwidthd running ... have to kill them, manually start this also.
To your questions - this is a full install on real HW, logging to an SQL server (on another machine).
Is there anything I can check to help debug this?
Thanks!
Updated by Russell Morris almost 10 years ago
Hi,
FYI, I just added some new observations to the forum post, https://forum.pfsense.org/index.php?topic=84642.0
It seems to me that there is a deeper interaction here: not limited to bandwidthd, but also including OpenVPN, ntopng and php-fpm (and others?).
Thanks!
Updated by Tom Peeters over 9 years ago
I'm having the same issues with bandwidthd and would like to help solve this.
Running release 2.2.1
- Installed package
- Going to Services >>> bandwidthd >>> enable >>> save >>> GUI crash
- Restarting PHP-FPM from shell to fix GUI
config.xml:
<bandwidthd>
  <config>
    <enable>on</enable>
    <active_interface>opt6</active_interface>
    <subnets_custom>192.168.128.0/21</subnets_custom>
    <skipintervals/>
    <graphcutoff>512</graphcutoff>
    <promiscuous/>
    <outputcdf/>
    <recovercdf/>
    <outputpostgresql/>
    <postgresqlhost/>
    <postgresqldatabase/>
    <postgresqlusername/>
    <postgresqlpassword/>
    <sensorid/>
    <filter/>
    <drawgraphs/>
    <meta_refresh/>
    <graph_log_info/>
  </config>
</bandwidthd>
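To see at a glance which options in a fragment like the one above actually carry a value, a small script can help. This is only a sketch meant to run on a workstation, not on the firewall: the `CONFIG` string is an abbreviated copy of the fragment above, and `set_options` is a hypothetical helper name, not part of pfSense or bandwidthd.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <bandwidthd> fragment from this report (assumption:
# the fragment has been saved as a standalone, well-formed XML string).
CONFIG = (
    "<bandwidthd><config>"
    "<enable>on</enable>"
    "<active_interface>opt6</active_interface>"
    "<subnets_custom>192.168.128.0/21</subnets_custom>"
    "<skipintervals/>"
    "<graphcutoff>512</graphcutoff>"
    "<outputpostgresql/>"
    "<postgresqlhost/>"
    "</config></bandwidthd>"
)


def set_options(xml_text):
    """Return {tag: value} for every option that has a non-empty value."""
    config = ET.fromstring(xml_text).find("config")
    return {
        child.tag: child.text.strip()
        for child in config
        if child.text and child.text.strip()
    }


for tag, value in set_options(CONFIG).items():
    print(tag, "=", value)
```

For this fragment only `enable`, `active_interface`, `subnets_custom` and `graphcutoff` are set, so the PostgreSQL output options are all empty in this particular setup.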
system.log:
Mar 24 10:13:32 firewall php-fpm[60155]: /pkg_mgr_install.php: Beginning package installation for bandwidthd .
Mar 24 10:13:33 firewall check_reload_status: Syncing firewall
Mar 24 10:13:45 firewall check_reload_status: Syncing firewall
Mar 24 10:13:45 firewall php-fpm[60155]: /pkg_mgr_install.php: Successfully installed package: bandwidthd.
Mar 24 10:13:46 firewall check_reload_status: Reloading filter
Mar 24 10:14:01 firewall check_reload_status: Syncing firewall
Mar 24 10:14:01 firewall php-fpm[16579]: /pkg_edit.php: The command '/usr/local/etc/rc.d/bandwidthd.sh stop' returned exit code '1', the output was 'No matching processes were found'
Mar 24 10:14:01 firewall bandwidthd: Monitoring subnet 255.255.255.252 with netmask 255.255.255.252
Mar 24 10:14:01 firewall bandwidthd: Monitoring subnet 255.255.248.0 with netmask 255.255.248.0
Mar 24 10:14:01 firewall bandwidthd: Opening em1_vlan70
Mar 24 10:14:01 firewall bandwidthd: Packet Encoding: Ethernet
Mar 24 10:14:29 firewall check_reload_status: Syncing firewall
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: unix:/var/run/php-fpm.socket
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.3346) response not received, request sent: 1434 on socket: unix:/var/run/php-fpm.socket for /pkg_edit.php?, closing connection
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
Mar 24 10:14:29 firewall lighttpd[42020]: (mod_fastcgi.c.3587) all handlers for /ifstats.php?if=em5_vlan131 on .php are down.
Mar 24 10:14:30 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:31 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:32 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.2848) fcgi-server re-enabled: unix:/var/run/php-fpm.socket
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
Mar 24 10:14:32 firewall lighttpd[42020]: (mod_fastcgi.c.3587) all handlers for /ifstats.php?if=em4_vlan99 on .php are down.
Mar 24 10:14:33 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:34 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:35 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.2848) fcgi-server re-enabled: unix:/var/run/php-fpm.socket
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
Mar 24 10:14:35 firewall lighttpd[42020]: (mod_fastcgi.c.3587) all handlers for /ifstats.php?if=em2_vlan98 on .php are down.
Mar 24 10:14:36 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:37 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:38 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:38 firewall lighttpd[42020]: (mod_fastcgi.c.2848) fcgi-server re-enabled: unix:/var/run/php-fpm.socket
Mar 24 10:14:38 firewall lighttpd[42020]: (mod_fastcgi.c.1754) connect failed: No such file or directory on unix:/var/run/php-fpm.socket
Mar 24 10:14:38 firewall lighttpd[42020]: (mod_fastcgi.c.3021) backend died; we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
Mar 24 10:14:39 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket
Mar 24 10:14:39 firewall rc.php-fpm_restart[85880]: >>> Restarting php-fpm
The "check_reload_status: Could not connect to /var/run/php-fpm.socket" log errors disappear when I disable bandwidthd.
Feel free to ask me to test anything else.
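For anyone comparing logs with bandwidthd enabled and disabled, counting the socket errors per minute makes the difference easier to report. The following is a rough sketch, assuming the log has been exported as plain text with one entry per line. `count_socket_errors` is a made-up helper name, and the sample lines are copied from the excerpt in this report.

```python
import re
from collections import Counter

# Message text copied from the system.log excerpt in this report.
SOCKET_ERROR = "check_reload_status: Could not connect to /var/run/php-fpm.socket"

# BSD syslog timestamp such as "Mar 24 10:14:30"; the capture group keeps
# everything up to the minute so entries can be grouped per minute.
TIMESTAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}):\d{2} ")


def count_socket_errors(lines):
    """Return a Counter mapping 'Mon DD HH:MM' to the number of socket errors."""
    per_minute = Counter()
    for line in lines:
        if SOCKET_ERROR not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            per_minute[match.group(1)] += 1
    return per_minute


sample = [
    "Mar 24 10:14:30 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket",
    "Mar 24 10:14:31 firewall check_reload_status: Could not connect to /var/run/php-fpm.socket",
    "Mar 24 10:14:39 firewall rc.php-fpm_restart[85880]: >>> Restarting php-fpm",
]
for minute, count in sorted(count_socket_errors(sample).items()):
    print(minute, count)
```

Running the same function over a log taken with bandwidthd disabled should then show whether the errors really stop, rather than relying on eyeballing the file.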
Updated by Gabor Tjong A Hung over 9 years ago
I can confirm this: I have bandwidthd installed, but it isn't enabled!
My syslog is filled with these:
$ cat /var/log/system.log
Apr 15 10:20:54 pfSense check_reload_status: Could not connect to /var/run/php-fpm.socket
After running (See https://forum.pfsense.org/index.php?topic=84642.0):
/usr/local/sbin/php-fpm -c /usr/local/lib/php.ini -y /usr/local/lib/php-fpm.conf -RD
The Web GUI works again.
$ uname -a
FreeBSD pfSense.amplify.local 10.1-RELEASE-p4 FreeBSD 10.1-RELEASE-p4 #0 36d7dec(releng/10.1)-dirty: Thu Jan 22 15:27:13 CST 2015 root@pfsense-22-i386-builder:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_wrap_vga.10.i386 i386
config.xml:
<package>
  <name>bandwidthd</name>
  <website>http://bandwidthd.sourceforge.net/</website>
  <descr><![CDATA[BandwidthD tracks usage of TCP/IP network subnets and builds html files with graphs to display utilization. Charts are built by individual IPs, and by default display utilization over 2 day, 8 day, 40 day, and 400 day periods. Furthermore, each ip address's utilization can be logged out at intervals of 3.3 minutes, 10 minutes, 1 hour or 12 hours in cdf format, or to a backend database server. HTTP, TCP, UDP, ICMP, VPN, and P2P traffic are color coded.]]></descr>
  <category>System</category>
  <version>2.0.1_6 pkg v.0.5</version>
  <status>BETA</status>
  <required_version>2.2</required_version>
  <depends_on_package_pbi>bandwidthd-2.0.1_6-i386.pbi</depends_on_package_pbi>
  <config_file>https://packages.pfsense.org/packages/config/bandwidthd/bandwidthd.xml</config_file>
  <configurationfile>bandwidthd.xml</configurationfile>
  <build_pbi>
    <ports_before>net/libpcap databases/postgresql91-client graphics/gd</ports_before>
    <port>net-mgmt/bandwidthd</port>
  </build_pbi>
  <build_options>WITH_NLS=true;WITHOUT_PAM=true;WITHOUT_LDAP=true;WITHOUT_MIT_KRB5=true;WITHOUT_HEIMDAL_KRB5=true;WITHOUT_OPTIMIZED_CFLAGS=true;WITHOUT_XML=true;WITHOUT_TZDATA=true;WITHOUT_DEBUG=true;WITHOUT_GSSAPI=true;WITHOUT_ICU=true;WITH_INTDATE=true</build_options>
  <depends_on_package_base_url>https://files.pfsense.org/packages/10/All/</depends_on_package_base_url>
</package>
<service>
  <name>bandwidthd</name>
  <rcfile>bandwidthd.sh</rcfile>
  <executable>bandwidthd</executable>
  <description><![CDATA[BandwidthD bandwidth monitoring daemon]]></description>
</service>
Before running the command, there are NO php-fpm processes:
ps aux |grep php
root 57205 0.0 8.8 31292 20080 - Is Tue09AM 0:00.20 /usr/local/bin/php
root 58193 0.0 8.8 31292 20036 - Is Tue09AM 0:00.19 /usr/local/bin/php
root 58535 0.0 8.8 31292 20080 - Is Tue09AM 0:00.21 /usr/local/bin/php
root 58659 0.0 8.8 31292 20080 - Is Tue09AM 0:00.22 /usr/local/bin/php
root 58950 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59247 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59356 0.0 8.8 31292 20036 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59662 0.0 8.8 31292 20036 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59850 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59862 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 60128 0.0 10.6 35388 24336 - I Tue09AM 0:01.61 /usr/local/bin/php
root 60381 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 93518 0.0 0.8 2204 1876 0 RL+ 10:45AM 0:00.00 grep php
After running the command, the php-fpm master process is present:
ps aux | grep php
root 57205 0.0 8.8 31292 20080 - Is Tue09AM 0:00.20 /usr/local/bin/php
root 58193 0.0 8.8 31292 20036 - Is Tue09AM 0:00.19 /usr/local/bin/php
root 58535 0.0 8.8 31292 20080 - Is Tue09AM 0:00.21 /usr/local/bin/php
root 58659 0.0 8.8 31292 20080 - Is Tue09AM 0:00.22 /usr/local/bin/php
root 58950 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59247 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59356 0.0 8.8 31292 20036 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59662 0.0 8.8 31292 20036 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59850 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 59862 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 60128 0.0 10.6 35388 24336 - I Tue09AM 0:01.61 /usr/local/bin/php
root 60381 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php
root 93859 0.0 9.4 31384 21476 - Ss 10:46AM 0:00.02 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root 1249 0.0 0.8 2204 1900 0 R+ 10:47AM 0:00.00 grep php
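The only difference between the two listings is the `php-fpm: master process` entry. If it helps others check their own systems, here is a minimal sketch that looks for that entry in saved `ps aux` output. It assumes the FreeBSD column layout seen above (ten fixed columns, then the command); `find_php_fpm_master` is a hypothetical name, and the two sample rows are copied from the listings in this comment.

```python
def find_php_fpm_master(ps_output):
    """Return the PID of the php-fpm master process, or None if it is absent."""
    for line in ps_output.splitlines():
        # ps aux prints 10 whitespace-separated columns followed by the
        # command, which may itself contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, command = fields[1], fields[10]
        if command.startswith("php-fpm: master process"):
            return int(pid)
    return None


# Rows copied from the "before" and "after" listings in this report.
before = "root 60381 0.0 8.8 31292 20080 - I Tue09AM 0:00.00 /usr/local/bin/php"
after = before + "\n" + (
    "root 93859 0.0 9.4 31384 21476 - Ss 10:46AM 0:00.02 "
    "php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)"
)

print("before:", find_php_fpm_master(before))
print("after:", find_php_fpm_master(after))
```

A `None` result matches the broken state described here, where plain `/usr/local/bin/php` processes are running but nothing is serving `/var/run/php-fpm.socket`.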
Updated by Russell Morris about 9 years ago
Hi,
Just following up - this still seems to cause a lot of grief in v2.2.5. Are others still seeing this?
Thanks!
Updated by Tom Peeters about 9 years ago
I have updated to 2.2.5 and reinstalled bandwidthd. The problem still occurs.
Russell Morris wrote:
Hi,
Just following up - this still seems to cause a lot of grief in v2.2.5. Are others still seeing this?
Thanks!
Updated by Russell Morris almost 9 years ago
FYI, updated to v2.2.6 -> still have the issue ... :(.
On boot, 2 copies of bandwidthd are running, which messes up the database. Killing bandwidthd also kills the Web GUI.
Thanks.
Updated by Chris Buechler almost 9 years ago
- Status changed from Feedback to Confirmed
Updated by Phillip Davis almost 9 years ago
I will be traveling for a while, so there is not much chance for me to do any work on this. But if someone has time to port bandwidthd to work on 2.3, it would be nice to be able to run it and see if this problem happens.
Updated by Jim Pingle almost 8 years ago
- Status changed from Confirmed to Closed