Bug #10692
openPIMD starts twice at boot
Added by Louis B over 4 years ago. Updated over 1 year ago.
100%
Description
Hello,
I just discoverd a critical error in the pfSense boot sequence.
- Independed if you have enabled the PIMD-service, PIMD is starting during boot (error-1)
- More sevire PIMD is started twice. With as consequence the those two PIMD-instances are parallel communicating with the OS to get vifs for all interfaces. To starting PIMD interfaces in parallel is causing fatal errors (error-2)
In the past I already notices that only 3 out of my 9 interfaces got a vif assigned (not ok!!). Two hours ago I had the "bright idea" to test what happens when I would
- kill PIMD (pimd q) disable two of the three interfaces that did get a vif
- starting PIMD again (pimd -c /var/etc/pimd/pimd.conf -d -f -N)
Guess what, now 7 out of the nine interfaces do get a vif
Then I did stop and start pimd again with the two interfaces enabeled again ...
and guess what all 9 interfaces did get a vif (that is the way it should be)
Now a reboot and ..... Yep the problem is back again.
I think this is a very important one, but you never now if there is more .. :)
Please implement this simple fix soon, so that we can check it!
Louis
Files
20200623 Pimd_VIF-issue_FoundViaInterfaceExpiriment.txt (6.73 KB) 20200623 Pimd_VIF-issue_FoundViaInterfaceExpiriment.txt | Test_Expiriment_Description | Louis B, 06/23/2020 12:56 PM | |
textdump.tar(1).0 (75 KB) textdump.tar(1).0 | Example of system crash (first one with this snapshot) | Louis B, 06/25/2020 01:09 PM | |
20200714 FirstBootCleanInstallLatestRelease.txt (24.1 KB) 20200714 FirstBootCleanInstallLatestRelease.txt | Louis B, 07/14/2020 01:27 PM | ||
20200714 InstallingAndTestingPIMD.txt (5.14 KB) 20200714 InstallingAndTestingPIMD.txt | Louis B, 07/14/2020 01:27 PM | ||
20200714 RestartWithPIMD-Enabled.txt (27.7 KB) 20200714 RestartWithPIMD-Enabled.txt | Louis B, 07/14/2020 01:27 PM | ||
pimd.conf (584 Bytes) pimd.conf | Louis B, 07/14/2020 01:27 PM | ||
20200716 PIMD starting and starting NotOK.txt (108 KB) 20200716 PIMD starting and starting NotOK.txt | Louis B, 07/16/2020 02:37 AM |
Updated by Jim Pingle over 4 years ago
- Project changed from pfSense to pfSense Packages
- Category set to PIMD
- Priority changed from High to Normal
Updated by Jim Pingle over 4 years ago
- Subject changed from Critical error in bootsequence related to PIMD (and IMGP-proxy I assume) to PIMD starts twice at boot
Updated by Louis B over 4 years ago
Hello,
I did some tests in advance of this patch. I could do that by forcing a reread of the vifs by disabling and enableing of interfaces in combination with pimd -q and pimg
That lead to the result below which is of course a big step forward. However it still does not work, which can be related to the way I tested, but check if the fix is applied.
Problem is the last few words in the log "no enabled vifs".
As said perhaps it is related to the way I am testing now, if not than we need to open another ticket to fix that.
Louis
debug level 0xffffffff (dvmrp_detail,dvmrp_prunes,dvmrp_routes,dvmrp_neighbors,dvmrp_timers,igmp_proto,igmp_timers,igmp_members,trace,timeout,packets,interfaces,kernel,cache,rsrr,pim_detail,pim_hello,pim_register,pim_join_prune,pim_bootstrap,pim_asserts,pim_cand_rp,pim_routes,pim_timers,pim_rpf)
11:47:01.553 pimd version 2.3.2 starting ...
11:47:01.553 Got 262144 byte send buffer size in 0 iterations
11:47:01.553 Got 262144 byte recv buffer size in 0 iterations
11:47:01.553 Got 262144 byte send buffer size in 0 iterations
11:47:01.553 Got 262144 byte recv buffer size in 0 iterations
11:47:01.553 Getting vifs from kernel
11:47:01.553 Installing lagg0.10 (192.168.10.1 on subnet 192.168.10) as vif #0 - rate 0
11:47:01.553 Installing lagg0.16 (192.168.1.1 on subnet 192.168.1) as vif #1 - rate 0
11:47:01.553 Installing lagg0.100 (192.168.100.1 on subnet 192.168.100) as vif #2 - rate 0
11:47:01.553 Installing lagg0.88 (192.168.88.3 on subnet 192.168.88) as vif #3 - rate 0
11:47:01.553 Installing ix0.14 (192.168.14.1 on subnet 192.168.14) as vif #4 - rate 0
11:47:01.553 Installing ix0.18 (192.168.18.1 on subnet 192.168.18) as vif #5 - rate 0
11:47:01.553 Installing ix1.116 (192.168.116.1 on subnet 192.168.116) as vif #6 - rate 0
11:47:01.553 Installing pppoe1 (83.162.43.161 -> 194.109.5.227) as vif #7 - rate=0
11:47:01.553 Installing lagg0.26 (192.168.2.1 on subnet 192.168.2) as vif #8 - rate 0
11:47:01.553 Installing lagg0.13 (192.168.13.1 on subnet 192.168.13) as vif #9 - rate 0
11:47:01.553 Disabling all vifs from kernel
11:47:01.553 Getting vifs from /var/etc/pimd/pimd.conf
11:47:01.553 Local static RP: 192.168.1.1, group 224.0.0.0/4
11:47:01.553 Local static RP: 192.168.116.1, group 224.0.0.0/4
11:47:01.553 Local static RP: 169.254.0.1, group 232.0.0.0/8
11:47:01.553 IGMP query interval : 12 sec
11:47:01.553 IGMP querier timeout : 41 sec
11:47:01.553 Cannot forward: no enabled vifs
Updated by Louis B over 4 years ago
I solved the "no enabled vifs" issue by changing in menu pimd/interfaces/interface binding from default to "Always bind"
Ofcourse I also changed the TTL default to 4 (I think 3 should do as well).
With wireshark I see IGMP messages now, but for some reason it is not yet fully working (testing)
Louis
Updated by Louis B over 4 years ago
- File textdump.tar(1).0 textdump.tar(1).0 added
With "the patch emulated" (by stopping pimd, disabling and anabling interfaced, stating pimd again) it is working more or less
... to test more if the patch is there
However I just did a clean install of todays 650 snapshot and:
- installing of the packages did not work (not this problem), so I did use a backup without package info
- then configured pimd again (via the GUI)
- disabled one of the interfaces and did wait a minute
- enabled it again and did wait another minute
- asked the pimd status .... and yep crash
- did it another time and .... yep crash again
- and earlier I had another one ....
So in the actual situation the OS is japanised (mrouted?) that sevire that interfaces changes (together with pimd) are enough to crash the system
Louis
Updated by Viktor Gurov over 4 years ago
Louis van Breda wrote:
Hello,
I just discoverd a critical error in the pfSense boot sequence.
- Independed if you have enabled the PIMD-service, PIMD is starting during boot (error-1)
Updated by Louis B over 4 years ago
Viktor,
"many thanks!!"
I did install PIMD again and will test it as soon as the fix is in the snapshots
Louis
Updated by Jim Pingle over 4 years ago
- Status changed from New to Pull Request Review
- Target version deleted (
2.5.0) - Affected Version deleted (
2.5.0)
Updated by Renato Botelho over 4 years ago
- Status changed from Pull Request Review to Feedback
- Assignee set to Renato Botelho
- % Done changed from 0 to 100
PR has been merged. Thanks!
Updated by Louis B over 4 years ago
- File 20200714 FirstBootCleanInstallLatestRelease.txt 20200714 FirstBootCleanInstallLatestRelease.txt added
- File 20200714 InstallingAndTestingPIMD.txt 20200714 InstallingAndTestingPIMD.txt added
- File 20200714 RestartWithPIMD-Enabled.txt 20200714 RestartWithPIMD-Enabled.txt added
- File pimd.conf pimd.conf added
I just did a clean install (built on Tue Jul 14 09:23:16 EDT 2020) and PIMD 0.03 Significant issues left, but the boot issue seems to be solved. I noted:
- OK: boot without PIMD-package
- NOK (but not this bug,(bug rejected by jim, technical reason to vague): Interface recognition and related PIMD behavoir
- NOK (but not this bug, there is a bug related to interface crashing might be the same .... or not!!): Switching off and on an interface is 99% sure a crash !!!!!!!!!!!
- NEW NOK: first boot with PIMD installed: pfSenseConfigurator pfSense is restoring the configuration /tmp/mnt/cf/config/config.xml @ 2020-07-14 19:59:01 strange as well PIMD package is still there config is gone (note USB with previous config was still attached)
- NEW Strange!! : second boot with PIMD installed. Pimd is startet after "Starting packages" and stoppend and started after "Restarting/Starting all packages."
Louis
Updated by Louis B over 4 years ago
- File 20200716 PIMD starting and starting NotOK.txt 20200716 PIMD starting and starting NotOK.txt added
I did retest today e.g. using a clean install. Still bizar how many times pimd is started and stopped during boot. One time should do :)
PIMD STOPPING AFTER KERNEL DONE
PIMD SEE BEHAVOIR JUST BEFORE THE BOOT
PIMD STARTING TWICE (AGAIN!!??)
PIMD STOPPING AFTER KERNEL DONE
AND STARTING AGAIN
Etc see attached file
One start at the right moment should do ..... IMHO
After the clean install the package was not automatic installed
when I did start by hand the install process stopped before
Cleaning up cache... done.
Success (Package config was still there)
I have seen this behavoir before.
Further on know other issues of course still there
Louis
Updated by Louis B over 4 years ago
As allready reported startup behavoir is not yet ok (pimd starting 5 times !!). Hereby some info form today bootlog
2.5.0-DEVELOPMENT (amd64)
built on Sun Jul 19 01:04:31 EDT 2020
pimd net 0.0.3
Jul 19 10:33:25 pfSense pimd33412: pimd version 2.3.2 starting ...
Jul 19 10:33:24 pfSense root29125: Bootup complete
Jul 19 10:33:24 pfSense pimd26302: pimd version 2.3.2 starting ...
Jul 19 10:33:21 pfSense pimd43281: pimd version 2.3.2 exiting.
Jul 19 10:33:21 pfSense kernel: done.
Jul 19 10:33:21 pfSense php-fpm2213: /rc.start_packages: Restarting/Starting all packages.
Jul 19 10:33:21 pfSense syslogd: kernel boot file is /boot/kernel/kernel
Jul 19 10:33:20 pfSense syslogd: exiting on signal 15
Jul 19 10:33:20 pfSense kernel: done.
Jul 19 10:33:18 pfSense pimd42946: pimd version 2.3.2 starting ...
Jul 19 10:33:18 pfSense pimd42826: pimd version 2.3.2 starting ...
Jul 19 10:33:18 pfSense php-fpm2213: /rc.start_packages: Restarting/Starting all packages.
Jul 19 10:33:17 pfSense check_reload_status532: Starting packages
Updated by Louis B over 4 years ago
Despite that this issue has state "Fixed". it is not fixed at all !! So please "reopen" this bug!
Updated by Jordan G over 2 years ago
still seeing this start 2x on 22.05 following reboot
Updated by Jordan G about 2 years ago
this is still present on 22.11 20221007-1510/pimd0.0.3_5
just enabling the package shows it starting twice in the logs each with unique PID
Updated by Jordan G about 2 years ago
not seeing the previously noted behavior on 23.01.b.20221217.1429 w/ pimd 0.0.3_5
good so far!
Updated by Louis B about 2 years ago
Running a pimd beta build on top of latest 2.7 pfSense build, Í just checked the general system log. Which does look better than in the past, however I noticed the following message:
Dec 18 13:33:19 pfSense pimd24: Another multicast routing application is already running.
Which is verdict.
The more I see systemlog general this
Dec 18 13:33:18 pfSense kernel: done.
Dec 18 13:33:19 pfSense php-fpm436: /rc.start_packages: Restarting/Starting all packages.
Dec 18 13:33:19 pfSense pimd99787: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Dec 18 13:33:19 pfSense pimd24: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Dec 18 13:33:19 pfSense pimd24: Another multicast routing application is already running.
So two different process numbers .....
Note that just above this the line /rc.start_packages: Restarting/Starting all packages ......
Updated by Jordan G almost 2 years ago
seeing 2x PIMD start, each with unique PIDs running 0.0.3_6 on 23.01 release
Updated by Jordan G almost 2 years ago
not seeing this happen with pimd 0.0.3_6 on 23.05.a.20230224.0600
Updated by Louis B almost 2 years ago
I still see things which are strange running the actual 2.7 build in combination with the latest pimd beta from github. When filtered the log for 'pimd', I find the following lines:
Feb 27 19:38:50 pfSense pimd35669: Timeout waiting for reply from routing socket for 10.60.xxx.yy
Feb 27 19:38:55 pfSense pimd35669: pimd version 3.0-beta1 exiting.
Feb 27 19:40:44 pfSense pimd83789: pimd version 3.0-beta1 starting.
Feb 27 19:40:44 pfSense pimd84108: pimd version 3.0-beta1 starting.
Feb 27 19:40:44 pfSense pimd83789: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:40:44 pfSense pimd84108: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:40:44 pfSense pimd84108: Another multicast routing application is already running.
Feb 27 19:40:44 pfSense pimd83789: Timeout waiting for reply from routing socket for 169.254.0.1
Feb 27 19:40:45 pfSense pimd83789: pimd version 3.0-beta1 exiting.
Feb 27 19:40:48 pfSense pimd94534: pimd version 3.0-beta1 starting.
Feb 27 19:40:48 pfSense pimd94534: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:40:48 pfSense pimd94534: Timeout waiting for reply from routing socket for 169.254.0.1
Feb 27 19:40:49 pfSense pimd14328: pimd version 3.0-beta1 starting.
Feb 27 19:40:49 pfSense pimd14328: Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Feb 27 19:42:10 pfSense pimd14328: Timeout waiting for reply from routing socket for 169.254.0.1
Note that the lines:
- Timeout waiting for reply from routing socket for 169.254.0.1
- Recommended querier timeout = Robustness x query-interval + response-time / 2 = 3 x 12 + 10 / 2 = 41
Are minor bugs related to pimd itself
And that I have no idea what is behind:
Feb 27 19:38:55 pfSense pimd35669: pimd version 3.0-beta1 exiting.
Feb 27 19:40:44 pfSense pimd83789: pimd version 3.0-beta1 starting.
It would be nice to understand this !!!
What comes close to my original finding is:
Feb 27 19:40:44 pfSense pimd83789: pimd version 3.0-beta1 starting.
Feb 27 19:40:44 pfSense pimd84108: pimd version 3.0-beta1 starting.
Practically seen the actual pimd beta is running fine here (I need it to reach my media server, and that does work). However, there seems to be some minor issues here (pfSense) and there (pimd).
Updated by Jordan G over 1 year ago
double starting again with pfSense Plus 23.05.a.20230414.0600 and PIMD 0.0.3_6, at least it's self aware
pimd 84129 Another multicast routing application is already running. pimd 83727 pimd version 2.3.2 starting ... pimd 84129 pimd version 2.3.2 starting ...
Updated by Jordan G over 1 year ago
confirming, same thing as above with 23.05.1 and pimd 0.0.3_6
Updated by Jim Pingle over 1 year ago
- Status changed from Feedback to Confirmed