Bug #3045

NTPD crash / doesn't come up

Added by B H almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: NTPD
Target version:
Start date: 06/14/2013
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.1
Affected Architecture:

Description

The NTP service crashes a lot; the reason is unknown to me.

The system log says:
kernel: pid 35663 (ntpd), uid 0: exited on signal 11 (core dumped)
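
A minimal sketch for pulling the PIDs of crashed ntpd processes out of log text like the line above. The function name `ntpd_segv_pids` is hypothetical, and it only matches the exact kernel message format quoted in this ticket. How the log text is obtained is an assumption: on pfSense 2.1 the system log is believed to be a circular log read with `clog /var/log/system.log`, so check that on your system.

```shell
#!/bin/sh
# Sketch: read log text on stdin and print the PID of every ntpd process
# that the kernel reports as "exited on signal 11".
# The function name is hypothetical; the pattern mirrors the log line quoted
# in this ticket and nothing else.
ntpd_segv_pids() {
    sed -n 's/.*kernel: pid \([0-9][0-9]*\) (ntpd), uid [0-9][0-9]*: exited on signal 11.*/\1/p'
}

# Example (assumption: the log is readable with clog on pfSense 2.1):
#   clog /var/log/system.log | ntpd_segv_pids
```

Counting the output lines (for example with `wc -l`) gives a rough crash count for the period covered by the log.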

Link to forum thread: http://forum.pfsense.org/index.php/topic,62099.0.html

Idea for fixing it from the forum:

-U number, --updateinterval=number
interval in seconds between scans for new or dropped interfaces.
This option takes an integer number as its argument.
Give the time in seconds between two scans for  new  or  dropped
interfaces. For systems with routing socket support the scans
will be performed shortly after the interface change has been
detected by the system. Use 0 to disable scanning. 60 seconds
is the minimum time between scans.
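
A small sketch of how the option described above might be validated before it is passed to ntpd. The helper name `update_interval_args` is hypothetical. It only encodes what the man page excerpt states (0 disables scanning, otherwise at least 60 seconds) and rejects anything else rather than guessing how ntpd itself treats smaller values.

```shell
#!/bin/sh
# Sketch: validate an interface-scan interval and print the matching ntpd
# option. Hypothetical helper; it does not start or configure ntpd.
update_interval_args() {
    interval=$1
    # Accept only non-negative integers.
    case $interval in
        ''|*[!0-9]*)
            echo "not a non-negative integer: $interval" >&2
            return 1
            ;;
    esac
    # Per the man page text: 0 disables scanning, 60 is the minimum otherwise.
    if [ "$interval" -ne 0 ] && [ "$interval" -lt 60 ]; then
        echo "interval must be 0 or at least 60 seconds" >&2
        return 1
    fi
    printf '%s\n' "-U $interval"
}

# Example: print the option that disables interface scanning.
#   update_interval_args 0
```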

ntpd.core (3.35 MB), B H, 06/15/2013 10:35 AM
ntpd.core.2 (3.35 MB), David Williams, 06/16/2013 01:39 PM
ntpd.core (3.35 MB), B H, 06/17/2013 03:33 AM
ntpd.core (2.59 MB), Steve Jacobs, 10/15/2013 01:36 PM
ntpd.core (3.35 MB), Fabio Giudici, 10/15/2013 06:52 PM
ntpd1.core (3.35 MB), Markus Brungs, 02/19/2014 11:31 AM
ntpd2.core (3.35 MB), Markus Brungs, 02/19/2014 11:31 AM

Associated revisions

Revision b8b78b9a (diff)
Added by Ermal Luçi over 5 years ago

Related to Ticket #3045 avoid races in the ntpdate_sync_one script due to killall returning without the process really exiting.

Revision c088fe72 (diff)
Added by Ermal Luçi over 5 years ago

Related to Ticket #3045 avoid races in the ntpdate_sync_one script due to killall returning without the process really exiting.
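
The race named in these commit messages (a kill command returning before the process has really exited) can be illustrated with a generic sketch. This is not the code from the commits: the function names `wait_for_exit` and `stop_and_wait` are hypothetical, and the approach shown (polling with `kill -0` until the PID is gone or a timeout expires) is only one common way to close such a race.

```shell
#!/bin/sh
# Sketch: signal a process, then wait until it has really exited before
# carrying on (for example, before starting a replacement).

# wait_for_exit PID [TIMEOUT_SECONDS]
# Returns 0 once the PID no longer exists, 1 if it is still there after the
# timeout (default 10 seconds).
wait_for_exit() {
    wfe_pid=$1
    wfe_timeout=${2:-10}
    wfe_waited=0
    # kill -0 delivers no signal; it only tests whether the PID still exists.
    while kill -0 "$wfe_pid" 2>/dev/null; do
        if [ "$wfe_waited" -ge "$wfe_timeout" ]; then
            return 1
        fi
        sleep 1
        wfe_waited=$((wfe_waited + 1))
    done
    return 0
}

# stop_and_wait PID [TIMEOUT_SECONDS]
# Sends SIGTERM, then blocks until the process is gone or the timeout hits.
stop_and_wait() {
    kill "$1" 2>/dev/null
    wait_for_exit "$1" "${2:-10}"
}
```

A caller would start the new process only when `stop_and_wait` returns 0, instead of assuming the old one is gone as soon as the kill command returns.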

History

#1 Updated by Renato Botelho almost 6 years ago

Is there an ntpd core dump from a crash that you can share? It could help us identify the issue.

#2 Updated by B H almost 6 years ago

If anyone tells me where the ntpd core crash dump is located, sure.

#3 Updated by B H almost 6 years ago

The file is attached.

#4 Updated by David Williams almost 6 years ago

I'm also seeing this every couple of days and have also attached a file.

#5 Updated by B H almost 6 years ago

Crashed again. File attached.

#6 Updated by Renato Botelho almost 6 years ago

I've built ntpd binaries with debug symbols. There are binaries for 2.0.3 and 2.1, i386 and amd64:

ntpd-2.0.3-amd64
ntpd-2.0.3-i386
ntpd-2.1-amd64
ntpd-2.1-i386

Choose the one that matches your system version and architecture. You can get it onto a system from the shell like so:

fetch -o /root/ntpd.new http://files.pfsense.org/garga/##NTPD_BINARY_NAME##
chmod a+x /root/ntpd.new
killall -9 ntpd
mv /usr/local/bin/ntpd /usr/local/bin/ntpd.old
mv /root/ntpd.new /usr/local/bin/ntpd

Then go to Status -> Services in the control panel and restart ntpd.

If you need to put the original binary back in place, just run:

mv /usr/local/bin/ntpd.old /usr/local/bin/ntpd
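
The swap and rollback steps above can be wrapped in two small functions so that a failed move does not leave the system without a binary. This is a sketch only: the function names `swap_binary` and `restore_binary` are hypothetical, it does not download anything or stop ntpd, and it assumes the replacement file has already been fetched.

```shell
#!/bin/sh
# Sketch: replace a binary while keeping the original as TARGET.old.

# swap_binary NEW TARGET
swap_binary() {
    new=$1
    target=$2
    [ -f "$new" ] || { echo "missing replacement: $new" >&2; return 1; }
    [ -f "$target" ] || { echo "missing target: $target" >&2; return 1; }
    chmod a+x "$new" || return 1
    mv "$target" "$target.old" || return 1
    # If the second move fails, put the original back.
    if ! mv "$new" "$target"; then
        mv "$target.old" "$target"
        return 1
    fi
}

# restore_binary TARGET
# Moves TARGET.old back over TARGET.
restore_binary() {
    target=$1
    [ -f "$target.old" ] || { echo "no backup: $target.old" >&2; return 1; }
    mv "$target.old" "$target"
}

# Example, using the paths from the steps above:
#   swap_binary /root/ntpd.new /usr/local/bin/ntpd
#   restore_binary /usr/local/bin/ntpd
```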

#7 Updated by B H almost 6 years ago

After this procedure I can't start the NTPD service.

#8 Updated by B H almost 6 years ago

OK, it's started now. I had to go to Services - NTP and press the Save button, then press the restart service button. Now it's running.

#9 Updated by B H almost 6 years ago

Since installing your new file yesterday, I have had no more ntpd crashes. I will report again at the end of the week.

#10 Updated by Renato Botelho almost 6 years ago

  • Status changed from New to Feedback

#11 Updated by B H almost 6 years ago

Not a single crash with the new file. The OpenNTPD service is running rock-stable. No crash, no error in the system logs.
Good job!

It would be great if you could include it in the next snapshot.

#12 Updated by Renato Botelho almost 6 years ago

Could you confirm if the problem persists on more recent snapshots?

#13 Updated by Jim Pingle almost 6 years ago

  • Status changed from Feedback to New

Still crashes easily.

I can reproduce it on a VM very easily. I have two test VMs with an OpenVPN tunnel in between them. All I have to do is bounce one of the boxes and NTPD will crash and not recover on the other one.

#14 Updated by Jim Pingle almost 6 years ago

  • Subject changed from OpenNTPD crash / doesn't come up to NTPD crash / doesn't come up

#15 Updated by Pierre POMES almost 6 years ago

For information, here is the stack :

$ gdb ntpd ntpd.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
#0  0x283d2df3 in ?? ()
(gdb) where
#0  0x283d2df3 in ?? ()
#1  0x08066c5b in more_pkt ()
    at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_request.c:327
#2  0x080519ff in ctlsettrap ()
    at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_control.c:2593
#3  0x0805b130 in input_handler (cts=0x0)
    at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_io.c:3049
#4  0x0805b582 in input_handler (cts=0x0)
    at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_io.c:2882
#5  0x0804c3e7 in getnetnum (num=0xbfbfedc8 "������\004\b\006", addr=0x750, 
    complain=-1077940716, a_type=671928416)
    at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_config.c:2289
#6  0x0804c358 in getnetnum (num=0x804c3e7 "", addr=0x5, complain=-1077940760, 
    a_type=3217026836)
    at /usr/src/usr.sbin/ntp/ntpd/../../../contrib/ntp/ntpd/ntp_config.c:2229
#7  0x00000006 in ?? ()

#16 Updated by B H almost 6 years ago

I can confirm that NTPD doesn't come up. It happens every time after rebooting the firewall.

#17 Updated by Pierre POMES almost 6 years ago

Pierre POMES wrote:

For information, here is the stack :

[...]

Forget this, this stack is wrong.

Renato, I think the ntpd you built is from a BSD base system (ntp 4.2.4) and not from ports (4.2.6). The ntpd used by pfSense is in /usr/local/bin and is coming from ports.

So the stack I posted was wrong.

Pierre

#18 Updated by Renato Botelho almost 6 years ago

Pierre POMES wrote:

Pierre POMES wrote:

For information, here is the stack :

[...]

Forget this, this stack is wrong.

Renato, I think the ntpd you built is from a BSD base system (ntp 4.2.4) and not from ports (4.2.6). The ntpd used by pfSense is in /usr/local/bin and is coming from ports.

So the stack I posted was wrong.

Yes, you are right. I apologize for the mistake. I'm going to provide new binaries.

#19 Updated by Renato Botelho almost 6 years ago

Pierre POMES wrote:

Pierre POMES wrote:

For information, here is the stack :

[...]

Forget this, this stack is wrong.

Renato, I think the ntpd you built is from a BSD base system (ntp 4.2.4) and not from ports (4.2.6). The ntpd used by pfSense is in /usr/local/bin and is coming from ports.

So the stack I posted was wrong.

Pierre

New binaries are available at http://files.pfsense.org/garga/ and the steps in comment 6 were updated. Could you please try again?

Regards

#20 Updated by B H almost 6 years ago

Loaded the new ntpd binaries into my firewall.
Now I get: kernel: pid 21367 (ntpd), uid 0: exited on signal 11 (core dumped)

#21 Updated by Renato Botelho almost 6 years ago

Benjamin H. wrote:

Loaded the new ntpd binaries into my firewall.
Now I get: kernel: pid 21367 (ntpd), uid 0: exited on signal 11 (core dumped)

Could you please let me know which pfSense version and architecture you are using, and also send me the ntpd.core?

#22 Updated by B H almost 6 years ago

Updated to the latest snapshot, 2.1 RC1 amd64, and replaced the NTPD file again with the file you provided in post #19.
After a reboot the ntpd crash error is gone. But the main problem still exists: the NTPD service never comes up by itself.

The system log lies, because it says ntpd is up and running.
Aug 7 17:14:22 ntpd76119: ntpd Mon Aug 5 15:15:14 UTC 2013 (1)
Aug 7 17:14:22 ntp: Starting NTP Daemon.

But in the dashboard and under Services it is not running.
I have to go to Services - NTP and press Save. After this it's shown as running.

#23 Updated by Renato Botelho almost 6 years ago

Benjamin H. wrote:

Updated to the latest snapshot, 2.1 RC1 amd64, and replaced the NTPD file again with the file you provided in post #19.
After a reboot the ntpd crash error is gone. But the main problem still exists: the NTPD service never comes up by itself.

The system log lies, because it says ntpd is up and running.
Aug 7 17:14:22 ntpd76119: ntpd Mon Aug 5 15:15:14 UTC 2013 (1)
Aug 7 17:14:22 ntp: Starting NTP Daemon.

But in the dashboard and under Services it is not running.
I have to go to Services - NTP and press Save. After this it's shown as running.

NTPd starts fine on all my systems. Could you share the ntpd section of config.xml with me?

#24 Updated by Renato Botelho over 5 years ago

  • Status changed from New to Feedback

It looks stable on latest snapshots

#25 Updated by Chris Buechler over 5 years ago

  • Status changed from Feedback to Resolved

None of us have seen any of the issues reported here in a while, so it seems fine.

#26 Updated by Steve Jacobs over 5 years ago

Seeing this exact same issue on 2.1 Release i386 and amd64. I have attached my ntpd.core from an i386 install here. This is from the binary included in this thread.

--version shows ntpd Mon Aug 5 18:13:54 UTC 2013 (1)

#27 Updated by Steve Jacobs over 5 years ago

It is also incredibly inconsistent. If I wait an hour and try to start, perhaps ntpd will start without issue and run.

#28 Updated by Fabio Giudici over 5 years ago

The issue just showed up for me as well, on both members of a cluster (pfSense 2.1 stable), on SunFire X3-2 servers.
I can attach a core file for ntpd.

#29 Updated by Fabio Giudici over 5 years ago

Problem seems to be related to line:
statsdir /var/log/ntp

in /var/etc/ntp.conf.

After commenting this line, ntpd is starting correctly (no more core dumps).

Could someone check whether it might be a permission, jail/chroot, or wrong-option issue?

thank you

#30 Updated by Fabio Giudici over 5 years ago

also line
driftfile /var/db/ntpd.drift

seems to be involved.

These really seem to be directory permission issues to me, as the ntpd process does not seem able to read/write those directories/files... so it crashes!

I commented out both lines on my machines and ntpd is running right now...
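
For anyone who wants to repeat this test, here is a sketch that prints a copy of a config file with the named directives commented out. The function name `comment_out_directives` is hypothetical. It writes to standard output and leaves the original file untouched. Whether pfSense regenerates /var/etc/ntp.conf when the service is reconfigured from the GUI is not established in this ticket, so treat any manual change as a diagnostic step only.

```shell
#!/bin/sh
# Sketch: print FILE with the given directives commented out.
# comment_out_directives FILE DIRECTIVE [DIRECTIVE ...]
# Directive names are assumed to be plain words (no regex metacharacters).
comment_out_directives() {
    conf=$1
    shift
    script=""
    # Build one sed substitution per directive, one per line.
    for directive in "$@"; do
        script="${script}s/^\\([[:space:]]*\\)\\(${directive}[[:space:]]\\)/\\1# \\2/
"
    done
    sed "$script" "$conf"
}

# Example: review the result before replacing anything.
#   comment_out_directives /var/etc/ntp.conf statsdir driftfile
```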

#31 Updated by Thomas Rieschl over 5 years ago

Sorry for posting here again, but I still get the "exited on signal 11 (core dumped)" error sometimes.
NTP runs fine for a long time, but suddenly (probably when IPs change) it crashes and cannot be started again.
Various tips here and on the forums didn't help (making the logfile and driftfile world-writable, removing config lines, ...).

The only thing that helps is a reboot.

I can reproduce it very easily: I just have to restart the OpenVPN service, and NTP dies and never comes back to life (until a reboot of the server).

It's version 2.1 amd64 on a Dell PowerEdge 210 II, also running OpenVPN. But I think the problem also occurred before I started using OpenVPN.

#32 Updated by Steve Jacobs over 5 years ago

I can report that this happens on i386 and amd64. I've switched architectures trying to avoid this bug. I can also report that it occurs on boxes with OpenVPN running and on boxes that do not have OpenVPN running. It's definitely broken, but no word from anyone about re-opening this bug? It still shows as Status: Resolved. It most definitely isn't. Can we at least get an acknowledgment from the pfSense team that this issue is known?

#33 Updated by Fabio Giudici over 5 years ago

I just did a series of tests, and the ntpd core dump seems strictly related to the presence of the file /var/db/ntpd.drift.
I deleted it, and the ntpd service started fine and has been stable.

What I was wondering is: could it be that the /var/db/ directory can only be written during the boot-up phase?

That could explain why, when you restart ntpd (and ntpd is restarted on many occasions, such as when restarting other network services), it starts to core dump...

#34 Updated by Renato Botelho over 5 years ago

Fabio Giudici wrote:

I just did a series of tests, and the ntpd core dump seems strictly related to the presence of the file /var/db/ntpd.drift.
I deleted it, and the ntpd service started fine and has been stable.

What I was wondering is: could it be that the /var/db/ directory can only be written during the boot-up phase?

That could explain why, when you restart ntpd (and ntpd is restarted on many occasions, such as when restarting other network services), it starts to core dump...

/var/db is always read/write. On a full installation it's a regular directory on /, and on nanobsd /var is mounted on a memory partition, which is also always read/write.

#35 Updated by Bill Meeks over 5 years ago

I see a problem on my 2.1 64-bit system with NTPD that may be related to the issues reported here. Anytime the WAN interface bounces (I have cable modem service with DHCP on the pfSense WAN, and sometimes the cable signal goes out for a while), NTPD will crash/die and not restart. When I catch that it is not running, I can usually manually start it from the Status...Services menu.

Should it crash/stop running just because the WAN interface is bounced? I have it configured to actually listen on only the LAN interface since I have an internal NTP server on the network. I never saw this behavior with my old 2.0.3 box.

Bill

#36 Updated by Fabio Giudici over 5 years ago

Good morning
Just one more question: is ntpd running in a jail/chroot?

Just to narrow down the issue... but it seems to be a bug in ntpd to me; deleting /var/db/ntpd.drift always solves the issue.

Fabio

Renato Botelho wrote:

/var/db is always read/write. On a full installation it's a regular directory on /, and on nanobsd /var is mounted on a memory partition, which is also always read/write.

#37 Updated by Renato Botelho over 5 years ago

Fabio Giudici wrote:

Good morning
Just one more question: is ntpd running in a jail/chroot?

Just to narrow down the issue... but it seems to be a bug in ntpd to me; deleting /var/db/ntpd.drift always solves the issue.

No, it's not chrooted.

What is the content of ntpd.drift when ntpd crashes?

#38 Updated by Fabio Giudici over 5 years ago

Simply one line containing:

-0.056

(or other numbers)
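
A quick sanity check for a drift file of the shape described above (a single line holding one signed decimal number) could look like the sketch below. The function name `drift_file_looks_valid` is hypothetical, and nothing in this ticket establishes that malformed drift content causes the crash; the check only rules that possibility in or out.

```shell
#!/bin/sh
# Sketch: succeed only if FILE exists and holds exactly one non-empty line
# that is a signed decimal number such as -0.056.
drift_file_looks_valid() {
    file=$1
    [ -f "$file" ] || return 1
    # Count non-empty lines; there must be exactly one.
    [ "$(grep -c . "$file")" -eq 1 ] || return 1
    # That line must be a number, optionally signed, optionally fractional.
    grep -Eq '^[[:space:]]*[-+]?[0-9]+(\.[0-9]+)?[[:space:]]*$' "$file"
}

# Example, using the path mentioned in this ticket:
#   drift_file_looks_valid /var/db/ntpd.drift && echo "drift file looks sane"
```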

#39 Updated by Markus Brungs over 5 years ago

I have several clusters running on 2.1 amd64 using LAGG (on igb based quad port cards) + VLAN + CARP + OpenVPN client.

On NONE of these systems does NTPD ever start on boot.

No matter whether NTPD is bound to ANY interface (default) or to particular interfaces (e.g. WAN CARP VIP and/or LAN CARP VIP),
it won't come up without manually forcing it by clicking "save" within Services: NTP + clicking Status: services -> start service
multiple times.

NTPD will finally start after clicking save and/or start service often enough, but it won't stay up and crashes with the well-known:

Feb 19 17:59:12 php: /status_services.php: NTPD is starting up.
Feb 19 17:59:05 kernel: pid 95961 (ntpd), uid 0: exited on signal 11 (core dumped)
Feb 19 17:59:04 php: /services_ntpd.php: NTPD is starting up.
Feb 19 17:58:50 kernel: pid 92900 (ntpd), uid 0: exited on signal 11 (core dumped)
Feb 19 17:58:50 php: /status_services.php: NTPD is starting up.

A core dump is generated on every cluster node I run. I tried removing the file /var/db/ntpd.drift as mentioned above, but this doesn't
help in my case.

I have other clusters running on 2.1 amd64 WITHOUT using LAGG but also using VLAN + CARP + OpenVPN client + OpenVPN server.

Here NTPD just works fine.

#40 Updated by Jim Pingle over 5 years ago

Please test on 2.1.1 and report back. https://forum.pfsense.org/index.php/topic,71546.0.html

#41 Updated by Markus Brungs over 5 years ago

Upgraded one of my clusters to 2.1.1-PRERELEASE (amd64) built on Wed Feb 19 19:46:29 EST 2014.

Can confirm NTPD is working as it should. Starts on boot on both nodes. Can be manually stopped/started without an issue.

Well done guys!

Thank you.
