Bug #10692

open

PIMD starts twice at boot

Added by Louis B over 3 years ago. Updated 9 months ago.

Status:
Confirmed
Priority:
Normal
Assignee:
-
Category:
PIMD
Target version:
-
Start date:
06/23/2020
Due date:
% Done:
100%
Estimated time:
Plus Target Version:
Affected Version:
Affected Plus Version:
Affected Architecture:
All

Description

Hello,

I just discovered a critical error in the pfSense boot sequence.
- Regardless of whether you have enabled the PIMD service, PIMD is started during boot (error 1)
- More severe: PIMD is started twice. As a consequence, those two PIMD instances communicate with the OS in parallel to get vifs for all interfaces, and starting PIMD in parallel like that causes fatal errors (error 2)

In the past I had already noticed that only 3 out of my 9 interfaces got a vif assigned (not OK!!). Two hours ago I had the "bright idea" to test what happens when I would:
- kill PIMD (pimd -q)
- disable two of the three interfaces that did get a vif
- start PIMD again (pimd -c /var/etc/pimd/pimd.conf -d -f -N), as sketched below
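
For reference, the sequence above condensed into shell commands. This is a sketch built only from the commands quoted in this report (the -q stop option and the config path /var/etc/pimd/pimd.conf are as given here); the interface toggling itself is done in the pfSense GUI:

# Stop the running pimd instance (stop option as quoted in this report)
pimd -q

# ...disable two of the three interfaces that received a vif, via the GUI...

# Start pimd again in the foreground with debugging, reading the generated config
pimd -c /var/etc/pimd/pimd.conf -d -f -N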

Guess what: now 7 out of the nine interfaces got a vif.
Then I stopped and started pimd again with the two interfaces enabled again ...
and guess what, all 9 interfaces got a vif (that is the way it should be).

Now a reboot and ..... Yep the problem is back again.

I think this is a very important one, but you never know if there is more .. :)

Please implement this simple fix soon, so that we can check it!

Louis


Files

20200623 Pimd_VIF-issue_FoundViaInterfaceExpiriment.txt (6.73 KB) Test_Expiriment_Description Louis B, 06/23/2020 12:56 PM
textdump.tar(1).0 (75 KB) Example of system crash (first one with this snapshot) Louis B, 06/25/2020 01:09 PM
20200714 FirstBootCleanInstallLatestRelease.txt (24.1 KB) Louis B, 07/14/2020 01:27 PM
20200714 InstallingAndTestingPIMD.txt (5.14 KB) Louis B, 07/14/2020 01:27 PM
20200714 RestartWithPIMD-Enabled.txt (27.7 KB) Louis B, 07/14/2020 01:27 PM
pimd.conf (584 Bytes) Louis B, 07/14/2020 01:27 PM
20200716 PIMD starting and starting NotOK.txt (108 KB) Louis B, 07/16/2020 02:37 AM
Actions #1

Updated by Jim Pingle over 3 years ago

  • Project changed from pfSense to pfSense Packages
  • Category set to PIMD
  • Priority changed from High to Normal
Actions #2

Updated by Jim Pingle over 3 years ago

  • Subject changed from Critical error in bootsequence related to PIMD (and IMGP-proxy I assume) to PIMD starts twice at boot
Actions #3

Updated by Louis B over 3 years ago

Hello,

I did some tests in advance of this patch. I could do that by forcing a re-read of the vifs, by disabling and enabling interfaces in combination with pimd -q and starting pimd again.

That led to the result below, which is of course a big step forward. However, it still does not work. That may be related to the way I tested, so please check again once the fix is applied.

The problem is the last few words in the log: "no enabled vifs".

As said, perhaps it is related to the way I am testing now; if not, we need to open another ticket to fix that.

Louis

debug level 0xffffffff (dvmrp_detail,dvmrp_prunes,dvmrp_routes,dvmrp_neighbors,dvmrp_timers,igmp_proto,igmp_timers,igmp_members,trace,timeout,packets,interfaces,kernel,cache,rsrr,pim_detail,pim_hello,pim_register,pim_join_prune,pim_bootstrap,pim_asserts,pim_cand_rp,pim_routes,pim_timers,pim_rpf)
11:47:01.553 pimd version 2.3.2 starting ...
11:47:01.553 Got 262144 byte send buffer size in 0 iterations
11:47:01.553 Got 262144 byte recv buffer size in 0 iterations
11:47:01.553 Got 262144 byte send buffer size in 0 iterations
11:47:01.553 Got 262144 byte recv buffer size in 0 iterations
11:47:01.553 Getting vifs from kernel
11:47:01.553 Installing lagg0.10 (192.168.10.1 on subnet 192.168.10) as vif #0 - rate 0
11:47:01.553 Installing lagg0.16 (192.168.1.1 on subnet 192.168.1) as vif #1 - rate 0
11:47:01.553 Installing lagg0.100 (192.168.100.1 on subnet 192.168.100) as vif #2 - rate 0
11:47:01.553 Installing lagg0.88 (192.168.88.3 on subnet 192.168.88) as vif #3 - rate 0
11:47:01.553 Installing ix0.14 (192.168.14.1 on subnet 192.168.14) as vif #4 - rate 0
11:47:01.553 Installing ix0.18 (192.168.18.1 on subnet 192.168.18) as vif #5 - rate 0
11:47:01.553 Installing ix1.116 (192.168.116.1 on subnet 192.168.116) as vif #6 - rate 0
11:47:01.553 Installing pppoe1 (83.162.43.161 -> 194.109.5.227) as vif #7 - rate=0
11:47:01.553 Installing lagg0.26 (192.168.2.1 on subnet 192.168.2) as vif #8 - rate 0
11:47:01.553 Installing lagg0.13 (192.168.13.1 on subnet 192.168.13) as vif #9 - rate 0
11:47:01.553 Disabling all vifs from kernel
11:47:01.553 Getting vifs from /var/etc/pimd/pimd.conf
11:47:01.553 Local static RP: 192.168.1.1, group 224.0.0.0/4
11:47:01.553 Local static RP: 192.168.116.1, group 224.0.0.0/4
11:47:01.553 Local static RP: 169.254.0.1, group 232.0.0.0/8
11:47:01.553 IGMP query interval : 12 sec
11:47:01.553 IGMP querier timeout : 41 sec
11:47:01.553 Cannot forward: no enabled vifs

Actions #4

Updated by Louis B over 3 years ago

I solved the "no enabled vifs" issue by changing "Interface binding" from the default to "Always bind" in the PIMD interfaces menu.

Of course I also changed the default TTL to 4 (I think 3 should do as well).

With Wireshark I now see IGMP messages, but for some reason it is not yet fully working (still testing).
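
For illustration, a rough sketch of what phyint lines with "Always bind" and a TTL threshold of 4 could look like in /var/etc/pimd/pimd.conf. The interface names are taken from the log above, and the exact output of the pfSense package is an assumption; the real generated file is attached to this issue:

# Hypothetical excerpt of /var/etc/pimd/pimd.conf, not the attached file
phyint lagg0.10 enable ttl-threshold 4
phyint lagg0.16 enable ttl-threshold 4
phyint ix0.14   enable ttl-threshold 4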

Louis

Actions #5

Updated by Louis B over 3 years ago

With "the patch emulated" (by stopping pimd, disabling and anabling interfaced, stating pimd again) it is working more or less
... to test more if the patch is there

However I just did a clean install of todays 650 snapshot and:
- installing of the packages did not work (not this problem), so I did use a backup without package info
- then configured pimd again (via the GUI)
- disabled one of the interfaces and did wait a minute
- enabled it again and did wait another minute
- asked the pimd status .... and yep crash
- did it another time and .... yep crash again
- and earlier I had another one ....
So in the current situation the OS (mrouted?) is compromised so severely that interface changes (together with pimd) are enough to crash the system.

Louis

Actions #6

Updated by Viktor Gurov over 3 years ago

Louis van Breda wrote:

Hello,

I just discovered a critical error in the pfSense boot sequence.
- Regardless of whether you have enabled the PIMD service, PIMD is started during boot (error 1)

Fix:
https://github.com/pfsense/FreeBSD-ports/pull/890

Actions #7

Updated by Louis B over 3 years ago

Viktor,

"many thanks!!"

I installed PIMD again and will test it as soon as the fix is in the snapshots.

Louis

Actions #8

Updated by Jim Pingle over 3 years ago

  • Status changed from New to Pull Request Review
  • Target version deleted (2.5.0)
  • Affected Version deleted (2.5.0)
Actions #9

Updated by Renato Botelho over 3 years ago

  • Status changed from Pull Request Review to Feedback
  • Assignee set to Renato Botelho
  • % Done changed from 0 to 100

PR has been merged. Thanks!

Actions #10

Updated by Louis B over 3 years ago

I just did a clean install (built on Tue Jul 14 09:23:16 EDT 2020) with PIMD 0.0.3. Significant issues remain, but the boot issue seems to be solved. I noted:
- OK: boot without the PIMD package
- NOK (but not this bug; a bug rejected by Jim, technical reason too vague): interface recognition and related PIMD behavior
- NOK (but not this bug; there is a bug related to interface crashes that might be the same ... or not!!): switching an interface off and on is 99% sure to cause a crash!!
- NEW NOK: first boot with PIMD installed: "pfSenseConfigurator: pfSense is restoring the configuration /tmp/mnt/cf/config/config.xml @ 2020-07-14 19:59:01". Strange as well: the PIMD package is still there but its config is gone (note: the USB stick with the previous config was still attached)
- NEW, strange!!: second boot with PIMD installed. pimd is started after "Starting packages", then stopped and started again after "Restarting/Starting all packages."

Louis

Actions #11

Updated by Louis B over 3 years ago

I retested today, again using a clean install. It is still bizarre how many times pimd is started and stopped during boot. One time should do :)

PIMD STOPPING AFTER KERNEL DONE
PIMD, SEE BEHAVIOR JUST BEFORE THE BOOT
PIMD STARTING TWICE (AGAIN!!??)
PIMD STOPPING AFTER KERNEL DONE
AND STARTING AGAIN
Etc.; see the attached file

One start at the right moment should do ..... IMHO

After the clean install the package was not automatically reinstalled.
When I started it by hand, the install process stopped before completing; the output ended with:

Cleaning up cache... done.

Success (the package config was still there)

I have seen this behavior before.

Furthermore, the other known issues are of course still there.

Louis

Actions #12

Updated by Louis B over 3 years ago

As already reported, the startup behavior is not yet OK (pimd starting 5 times!!). Here is some info from today's boot log:

2.5.0-DEVELOPMENT (amd64)
built on Sun Jul 19 01:04:31 EDT 2020

pimd        net     0.0.3

Jul 19 10:33:25 pfSense pimd[33412]: pimd version 2.3.2 starting ...
Jul 19 10:33:24 pfSense root[29125]: Bootup complete

Jul 19 10:33:24 pfSense pimd[26302]: pimd version 2.3.2 starting ...
Jul 19 10:33:21 pfSense pimd[43281]: pimd version 2.3.2 exiting.
Jul 19 10:33:21 pfSense kernel: done.
Jul 19 10:33:21 pfSense php-fpm[2213]: /rc.start_packages: Restarting/Starting all packages.
Jul 19 10:33:21 pfSense syslogd: kernel boot file is /boot/kernel/kernel
Jul 19 10:33:20 pfSense syslogd: exiting on signal 15
Jul 19 10:33:20 pfSense kernel: done.

Jul 19 10:33:18 pfSense pimd[42946]: pimd version 2.3.2 starting ...
Jul 19 10:33:18 pfSense pimd[42826]: pimd version 2.3.2 starting ...
Jul 19 10:33:18 pfSense php-fpm[2213]: /rc.start_packages: Restarting/Starting all packages.
Jul 19 10:33:17 pfSense check_reload_status[532]: Starting packages
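
A quick way to verify this behavior after a boot is to count the start messages in the system log and then look at what is actually running. A sketch, assuming the default pfSense log location /var/log/system.log:

# Count how many times pimd reported "starting" in the current system log
grep -c "pimd version .* starting" /var/log/system.log

# List the pimd processes currently running (there should be exactly one)
pgrep -l pimd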

Actions #13

Updated by Louis B over 3 years ago

Despite this issue having the state "Fixed", it is not fixed at all!! So please reopen this bug!

Actions #14

Updated by Renato Botelho almost 2 years ago

  • Assignee deleted (Renato Botelho)
Actions #15

Updated by Jordan G over 1 year ago

still seeing this start 2x on 22.05 following reboot

Actions #16

Updated by Jordan G over 1 year ago

this is still present on 22.11 20221007-1510 / pimd 0.0.3_5
just enabling the package shows it starting twice in the logs, each with a unique PID

Actions #17

Updated by Jordan G about 1 year ago

not seeing the previously noted behavior on 23.01.b.20221217.1429 w/ pimd 0.0.3_5

good so far!

Actions #18

Updated by Louis B about 1 year ago

Running a pimd beta build on top of the latest 2.7 pfSense build, I just checked the general system log. It looks better than in the past; however, I noticed the following message:

Dec 18 13:33:19 pfSense pimd[24]: Another multicast routing application is already running.

Which says it all.

Moreover, in the general system log I see this:

Dec 18 13:33:18 pfSense kernel: done.
Dec 18 13:33:19 pfSense php-fpm[436]: /rc.start_packages: Restarting/Starting all packages.
Dec 18 13:33:19 pfSense pimd[99787]: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Dec 18 13:33:19 pfSense pimd[24]: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Dec 18 13:33:19 pfSense pimd[24]: Another multicast routing application is already running.

So two different process numbers ...
Note that just above this is the line "/rc.start_packages: Restarting/Starting all packages" ...
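
The "Another multicast routing application is already running" message from pimd itself shows the second start attempt is redundant. A minimal sketch of the kind of guard the package start code could apply before launching pimd; the /usr/local/sbin/pimd path is an assumption and this is hypothetical, not the actual pfSense rc or PHP code:

#!/bin/sh
# Hypothetical guard: only start pimd if no instance is running yet
if pgrep -x pimd >/dev/null 2>&1; then
    echo "pimd already running, not starting a second instance"
else
    /usr/local/sbin/pimd -c /var/etc/pimd/pimd.conf
fi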

Actions #19

Updated by Jordan G about 1 year ago

seeing 2x PIMD start, each with unique PIDs running 0.0.3_6 on 23.01 release

Actions #20

Updated by Jordan G about 1 year ago

not seeing this happen with pimd 0.0.3_6 on 23.05.a.20230224.0600

Actions #21

Updated by Louis B about 1 year ago

I still see strange things running the current 2.7 build in combination with the latest pimd beta from GitHub. When I filter the log for 'pimd', I find the following lines:

Feb 27 19:38:50 pfSense pimd[35669]: Timeout waiting for reply from routing socket for 10.60.xxx.yy
Feb 27 19:38:55 pfSense pimd[35669]: pimd version 3.0-beta1 exiting.
Feb 27 19:40:44 pfSense pimd[83789]: pimd version 3.0-beta1 starting.
Feb 27 19:40:44 pfSense pimd[84108]: pimd version 3.0-beta1 starting.
Feb 27 19:40:44 pfSense pimd[83789]: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:40:44 pfSense pimd[84108]: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:40:44 pfSense pimd[84108]: Another multicast routing application is already running.
Feb 27 19:40:44 pfSense pimd[83789]: Timeout waiting for reply from routing socket for 169.254.0.1
Feb 27 19:40:45 pfSense pimd[83789]: pimd version 3.0-beta1 exiting.
Feb 27 19:40:48 pfSense pimd[94534]: pimd version 3.0-beta1 starting.
Feb 27 19:40:48 pfSense pimd[94534]: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:40:48 pfSense pimd[94534]: Timeout waiting for reply from routing socket for 169.254.0.1
Feb 27 19:40:49 pfSense pimd[14328]: pimd version 3.0-beta1 starting.
Feb 27 19:40:49 pfSense pimd[14328]: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:42:10 pfSense pimd[14328]: Timeout waiting for reply from routing socket for 169.254.0.1

Note that the lines:
- Timeout waiting for reply from routing socket for 169.254.0.1
- Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
are minor bugs related to pimd itself.

And I have no idea what is behind:
Feb 27 19:38:55 pfSense pimd[35669]: pimd version 3.0-beta1 exiting.
Feb 27 19:40:44 pfSense pimd[83789]: pimd version 3.0-beta1 starting.
It would be nice to understand this!!!

What comes closest to my original finding is:
Feb 27 19:40:44 pfSense pimd[83789]: pimd version 3.0-beta1 starting.
Feb 27 19:40:44 pfSense pimd[84108]: pimd version 3.0-beta1 starting.

Practically speaking, the current pimd beta is running fine here (I need it to reach my media server, and that does work). However, there seem to be some minor issues here (pfSense) and there (pimd).

Actions #22

Updated by Jordan G 11 months ago

double starting again with pfSense Plus 23.05.a.20230414.0600 and PIMD 0.0.3_6; at least it's self-aware:

pimd     84129     Another multicast routing application is already running.
pimd     83727     pimd version 2.3.2 starting ...
pimd     84129     pimd version 2.3.2 starting ... 

Actions #23

Updated by Jordan G 9 months ago

confirming, same thing as above with 23.05.1 and pimd 0.0.3_6

Actions #24

Updated by Jim Pingle 9 months ago

  • Status changed from Feedback to Confirmed