Bug #3045
Closed
NTPD crash / doesn't come up
Description
The NTP service crashes a lot; the reason is unknown to me.
The system log says:
kernel: pid 35663 (ntpd), uid 0: exited on signal 11 (core dumped)
Link to forum thread: http://forum.pfsense.org/index.php/topic,62099.0.html
Idea for fixing it from the forum:
-U number, --updateinterval=number
interval in seconds between scans for new or dropped interfaces.
This option takes an integer number as its argument.
Give the time in seconds between two scans for new or dropped
interfaces. For systems with routing socket support the scans
will be performed shortly after the interface change has been
detected by the system. Use 0 to disable scanning. 60 seconds
is the minimum time between scans.
Updated by Renato Botelho over 11 years ago
Is there an ntpd core dump that you can share? It could help us identify the issue.
Updated by B H over 11 years ago
If anyone tells me where the ntpd core crash dump is located, sure.
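One way to track the dump down, as a minimal sketch: by default FreeBSD names a core file after the process (the kern.corefile sysctl controls the pattern), so searching for files matching ntpd.core* usually finds it. The helper name find_ntpd_cores and the choice of search root are assumptions for illustration, not something stated in this ticket.

```sh
#!/bin/sh
# Hypothetical helper: list ntpd core files below a directory (default: /).
# Assumes the default core file naming, i.e. <process name>.core.
find_ntpd_cores() {
    search_root="${1:-/}"
    # Errors from unreadable directories are discarded.
    find "$search_root" -type f -name 'ntpd.core*' 2>/dev/null
}
```

Called as `find_ntpd_cores /`, it prints one path per line; on FreeBSD, `sysctl kern.corefile` shows the naming pattern actually in effect.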
Updated by David Williams over 11 years ago
- File ntpd.core.2 added
I'm also seeing this every couple of days and have attached the core file.
Updated by Renato Botelho over 11 years ago
I've built ntpd binaries with debug symbols. There are binaries for 2.0.3 and 2.1, for i386 and amd64:
ntpd-2.0.3-amd64
ntpd-2.0.3-i386
ntpd-2.1-amd64
ntpd-2.1-i386
Choose the one that matches your system version and architecture, then install it from the shell like so:
fetch -o /root/ntpd.new http://files.pfsense.org/garga/##NTPD_BINARY_NAME##
chmod a+x /root/ntpd.new
killall -9 ntpd
mv /usr/local/bin/ntpd /usr/local/bin/ntpd.old
mv /root/ntpd.new /usr/local/bin/ntpd
Then go to Status -> Services in the control panel and restart ntpd.
If you need to put the original binary back in place, just run:
mv /usr/local/bin/ntpd.old /usr/local/bin/ntpd
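The swap and rollback above can be wrapped so that a missing file is caught before anything is moved. This is only a sketch: the function names install_test_binary and restore_original_binary are made up for illustration, and stopping and restarting ntpd is still left to the steps described above.

```sh
#!/bin/sh
# Hypothetical wrappers around the manual steps above.

# install_test_binary NEW TARGET
# Back up TARGET as TARGET.old, then move NEW into place.
install_test_binary() {
    new="$1"; target="$2"
    [ -f "$new" ] || { echo "missing: $new" >&2; return 1; }
    [ -f "$target" ] || { echo "missing: $target" >&2; return 1; }
    chmod a+x "$new" || return 1
    mv "$target" "$target.old" || return 1
    # If the second move fails, put the original back.
    mv "$new" "$target" || { mv "$target.old" "$target"; return 1; }
}

# restore_original_binary TARGET
# Move TARGET.old back over TARGET.
restore_original_binary() {
    target="$1"
    [ -f "$target.old" ] || { echo "no backup: $target.old" >&2; return 1; }
    mv "$target.old" "$target"
}
```

With the paths used in this comment, that would be `install_test_binary /root/ntpd.new /usr/local/bin/ntpd`, and `restore_original_binary /usr/local/bin/ntpd` to go back.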
Updated by B H over 11 years ago
After this procedure I can't start the NTPD service.
Updated by B H over 11 years ago
OK, it has started now. I had to go to Services - NTP and press the Save button, then press the restart service button. Now it's running.
Updated by B H over 11 years ago
Since installing your new file yesterday, I have had no more ntpd crashes. I will report again at the end of the week.
Updated by Renato Botelho over 11 years ago
- Status changed from New to Feedback
Updated by B H over 11 years ago
Not a single crash with the new file. The OpenNTPD service is running rock-stable. No crashes, no errors in the system logs.
Good job!
It would be great if you could include it in the next snapshot.
Updated by Renato Botelho over 11 years ago
Could you confirm if the problem persists on more recent snapshots?
Updated by Jim Pingle over 11 years ago
- Status changed from Feedback to New
Still crashes easily.
I can reproduce it on a VM very easily. I have two test VMs with an OpenVPN tunnel in between them. All I have to do is bounce one of the boxes and NTPD will crash and not recover on the other one.
Updated by Jim Pingle over 11 years ago
- Subject changed from OpenNTPD crash / doesn't come up to NTPD crash / doesn't come up
Updated by Pierre POMES over 11 years ago
For information, here is the stack:
$ gdb ntpd ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
#0 0x283d2df3 in ?? ()
(gdb) where
#0 0x283d2df3 in ?? ()
#1 0x08066c5b in more_pkt () at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_request.c:327
#2 0x080519ff in ctlsettrap () at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_control.c:2593
#3 0x0805b130 in input_handler (cts=0x0) at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_io.c:3049
#4 0x0805b582 in input_handler (cts=0x0) at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_io.c:2882
#5 0x0804c3e7 in getnetnum (num=0xbfbfedc8 "������\004\b\006", addr=0x750, complain=-1077940716, a_type=671928416) at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_config.c:2289
#6 0x0804c358 in getnetnum (num=0x804c3e7 "", addr=0x5, complain=-1077940760, a_type=3217026836) at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_config.c:2229
#7 0x00000006 in ?? ()
Updated by B H over 11 years ago
I can confirm that NTPD doesn't come up. It happens every time after rebooting the firewall.
Updated by Pierre POMES over 11 years ago
Pierre POMES wrote:
For information, here is the stack :
[...]
Forget this, this stack is wrong.
Renato, I think the ntpd you built is from a BSD base system (ntp 4.2.4) and not from ports (4.2.6). The ntpd used by pfSense is in /usr/local/bin and is coming from ports.
So the stack I posted was wrong.
Pierre
Updated by Renato Botelho over 11 years ago
Yes, you are right. I apologize for the mistake. I'm going to provide new binaries.
Updated by Renato Botelho over 11 years ago
New binaries are available at http://files.pfsense.org/garga/ and the steps in comment 6 have been updated. Could you please try again?
Regards
Updated by B H over 11 years ago
Loaded the new ntpd binaries into my firewall.
Now I get: kernel: pid 21367 (ntpd), uid 0: exited on signal 11 (core dumped)
Updated by Renato Botelho over 11 years ago
Could you please let me know the pfSense version and architecture you are using, and also send me the ntpd.core?
Updated by B H over 11 years ago
Updated to the latest snapshot, 2.1 RC1 amd64, and replaced the NTPD file again with the file you provided in post #19.
After a reboot there are no more ntpd crash errors. But the main problem still exists: the NTPD service never comes up by itself.
The system log is misleading, because it says ntpd is up and running:
Aug 7 17:14:22 ntpd[76119]: ntpd 4.2.6p5@1.2349-o Mon Aug 5 15:15:14 UTC 2013 (1)
Aug 7 17:14:22 ntp: Starting NTP Daemon.
But in the dashboard and under Services it is shown as not running.
I have to go to Services - NTP and press Save. After this it's shown as running.
Updated by Renato Botelho over 11 years ago
NTPd is starting fine on all my systems. Could you share the ntpd section of config.xml with me?
Updated by Renato Botelho about 11 years ago
- Status changed from New to Feedback
It looks stable on the latest snapshots.
Updated by Chris Buechler about 11 years ago
- Status changed from Feedback to Resolved
None of us has seen any of the issues reported here in a while, so it seems fine.
Updated by Steve Jacobs about 11 years ago
Seeing this exact same issue on 2.1 Release i386 and amd64. I have attached my ntpd.core from an i386 install here. This is from the binary included in this thread.
--version shows ntpd 4.2.6p5@1.2349-o Mon Aug 5 18:13:54 UTC 2013 (1)
Updated by Steve Jacobs about 11 years ago
It is also incredibly inconsistent. If I wait an hour and try to start, perhaps ntpd will start without issue and run.
Updated by Fabio Giudici about 11 years ago
The issue just appeared for me as well, on both members of a cluster (pfSense 2.1 stable), server SunFire X3-2.
I can attach the ntpd core file.
Updated by Fabio Giudici about 11 years ago
The problem seems to be related to the line:
statsdir /var/log/ntp
in /var/etc/ntp.conf.
After commenting this line, ntpd is starting correctly (no more core dumps).
Could someone check whether it might be a permission, jail, chroot, or wrong-option issue?
Thank you
Updated by Fabio Giudici about 11 years ago
Also, the line
driftfile /var/db/ntpd.drift
seems to be involved.
These really look like directory permission issues to me: the ntpd process seems unable to read or write those directories and files, so it crashes!
I commented out both lines on my machines and ntpd is running right now.
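For anyone who wants to try the same workaround, here is a minimal sketch of a filter that comments out those two directives. The function name comment_out_ntp_paths is made up for illustration, and whether this helps at all is exactly what this ticket has not settled.

```sh
#!/bin/sh
# Hypothetical filter: comment out statsdir and driftfile lines in an
# ntp.conf read from stdin and write the result to stdout. Lines that are
# already commented, and all other directives, pass through unchanged.
comment_out_ntp_paths() {
    sed -E 's/^([[:space:]]*)(statsdir|driftfile)([[:space:]])/\1#\2\3/'
}
```

For example, `comment_out_ntp_paths < /var/etc/ntp.conf > /tmp/ntp.conf.test` writes a modified copy and leaves the original untouched; the output path here is arbitrary.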
Updated by Thomas Rieschl almost 11 years ago
Sorry for posting here again, but I still get the "exited on signal 11 (core dumped)" error sometimes.
NTP runs fine for a long time, but suddenly (probably when IPs change) it crashes and cannot be started again.
Various tips here and on the forums didn't help (making the logfile and driftfile world-writable, removing config lines, ...).
The only thing that helps is a reboot.
I can reproduce it very easily: I just have to restart the OpenVPN service, and NTP dies and never comes back to life (until the server is rebooted).
It's version 2.1 amd64 on a Dell PowerEdge 210 II, also running OpenVPN. But I think the problem also occurred before using OpenVPN.
Updated by Steve Jacobs almost 11 years ago
I can report that this happens on i386 and amd64. I've switched architectures trying to avoid this bug. I can also report that it occurs on boxes with OpenVPN running and on boxes that do not have OpenVPN running. It's definitely broken, but no word from anyone about re-opening this bug? It still shows as Status: Resolved. It most definitely isn't. Can we at least get an acknowledgment from the pfSense team that this issue is known?
Updated by Fabio Giudici almost 11 years ago
I just ran a series of tests, and the ntpd core dump seems strictly related to the presence of the file /var/db/ntpd.drift.
I deleted it, and the ntpd service started fine and has been stable.
What I was wondering is: could it be that the /var/db/ directory can only be written during the boot-up phase?
That could explain why, when you restart ntpd (and ntpd is restarted on many occasions, such as when restarting other network services), it starts to core dump...
Updated by Renato Botelho almost 11 years ago
/var/db is always read/write. On a full installation it's a regular directory on /, and on nanobsd /var is mounted on a memory partition, which is also always read/write.
Updated by Bill Meeks almost 11 years ago
I see a problem on my 2.1 64-bit system with NTPD that may be related to the issues reported here. Anytime the WAN interface bounces (I have cable modem service with DHCP on the pfSense WAN, and sometimes the cable signal goes out for a while), NTPD will crash/die and not restart. When I catch that it is not running, I can usually manually start it from the Status...Services menu.
Should it crash/stop running just because the WAN interface is bounced? I have it configured to actually listen on only the LAN interface since I have an internal NTP server on the network. I never saw this behavior with my old 2.0.3 box.
Bill
Updated by Fabio Giudici almost 11 years ago
Good morning
Just one more question: is ntpd running in a jail/chroot?
Just to narrow down the issue... but it seems to be a bug in ntpd to me: deleting /var/db/ntpd.drift always solves the issue.
Fabio
Renato Botelho wrote:
/var/db is always read/write, on Full installation it's a regular directory on /, and on nanobsd /var is mounted on a memory partition, that is also always read/write.
Updated by Renato Botelho almost 11 years ago
No, it's not chrooted.
What is the content of ntpd.drift when ntpd crashes?
Updated by Fabio Giudici almost 11 years ago
Simply one line containing:
-0.056
(or other numbers)
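To capture that state before deleting the file, a quick sanity check along these lines could be used. It is only a sketch: the function name drift_file_looks_sane is invented, and this ticket never established that malformed drift file content was the cause of the crash.

```sh
#!/bin/sh
# Hypothetical check: succeed only if the file holds exactly one line and
# that line is a plain decimal number such as -0.056.
drift_file_looks_sane() {
    file="$1"
    [ -f "$file" ] || return 1
    # grep -c '' counts lines, including a last line without a newline.
    [ "$(grep -c '' "$file")" -eq 1 ] || return 1
    grep -Eq '^[[:space:]]*[-+]?[0-9]+(\.[0-9]+)?[[:space:]]*$' "$file"
}
```

A call such as `drift_file_looks_sane /var/db/ntpd.drift || cp /var/db/ntpd.drift /root/ntpd.drift.bad` would keep a copy of an odd-looking file for the ticket; the destination path is arbitrary.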
Updated by Markus Brungs almost 11 years ago
- File ntpd1.core added
- File ntpd2.core added
I have several clusters running on 2.1 amd64 using LAGG (on igb based quad port cards) + VLAN + CARP + OpenVPN client.
On NONE of these systems does NTPD ever start on boot.
No matter whether NTPD is bound to ANY interface (the default) or to particular interfaces (e.g. WAN CARP VIP and/or LAN CARP VIP), it won't come up without manually forcing it by clicking "Save" under Services: NTP and then clicking Status: Services -> start service multiple times.
NTPD will finally start after clicking Save and/or start service often enough, but it won't stay up and crashes with the well-known:
Feb 19 17:59:12 php: /status_services.php: NTPD is starting up.
Feb 19 17:59:05 kernel: pid 95961 (ntpd), uid 0: exited on signal 11 (core dumped)
Feb 19 17:59:04 php: /services_ntpd.php: NTPD is starting up.
Feb 19 17:58:50 kernel: pid 92900 (ntpd), uid 0: exited on signal 11 (core dumped)
Feb 19 17:58:50 php: /status_services.php: NTPD is starting up.
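To put a number on how often this happens, the kernel lines can be counted with a small filter. This is a sketch only: the function name count_ntpd_segfaults is made up, and the pattern simply mirrors the message format shown in the log lines above.

```sh
#!/bin/sh
# Hypothetical filter: count "ntpd exited on signal 11" kernel messages in
# log text read from stdin. Prints the count; exits non-zero when it is 0.
count_ntpd_segfaults() {
    grep -c 'kernel: pid [0-9]* (ntpd), uid [0-9]*: exited on signal 11'
}
```

Pipe the output of whatever command prints the system log on your system into `count_ntpd_segfaults`.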
A core dump is generated on every cluster node I run. I tried removing the file /var/db/ntpd.drift as mentioned above, but this doesn't help in my case.
I have other clusters running on 2.1 amd64 WITHOUT using LAGG but also using VLAN + CARP + OpenVPN client + OpenVPN server.
Here NTPD just works fine.
Updated by Jim Pingle almost 11 years ago
Please test on 2.1.1 and report back. https://forum.pfsense.org/index.php/topic,71546.0.html
Updated by Markus Brungs almost 11 years ago
Upgraded one of my clusters to 2.1.1-PRERELEASE (amd64) built on Wed Feb 19 19:46:29 EST 2014.
Can confirm NTPD is working as it should. Starts on boot on both nodes. Can be manually stopped/started without an issue.
Well done guys!
Thank you.