Bug #15043
closedIGMP proxy works intermittently
100%
Description
The IGMP proxy no longer works reliably after the update to 2.7.1. Channels that are no longer watched continue to be streamed and one channel cannot be accessed at all. This means that switching between channels can consume a lot of bandwidth as the stream is not interrupted. After restarting the IGMP proxy, a random other channel no longer works, but the one that did not work before usually does.
Bandwidth consumption from switching between channels:
The routing log is full of such entries:
Nov 26 13:23:50 igmpproxy 81833 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:23:50 igmpproxy 81833 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:23:50 igmpproxy 81833 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:23:50 igmpproxy 81833 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:23:50 igmpproxy 81833 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:23:08 igmpproxy 32918 can't leave group 239.186.64.19 on interface igb0; Errno(49): Can't assign requested address Nov 26 13:23:08 igmpproxy 32918 can't leave group 239.186.64.18 on interface igb0; Errno(49): Can't assign requested address Nov 26 13:23:08 igmpproxy 32918 can't leave group 239.186.68.13 on interface igb0; Errno(49): Can't assign requested address Nov 26 13:23:08 igmpproxy 32918 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:23:08 igmpproxy 32918 select() failure; Errno(4): Interrupted system call Nov 26 13:22:17 igmpproxy 32918 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:22:17 igmpproxy 32918 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:22:17 igmpproxy 32918 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:21:35 igmpproxy 44865 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:21:35 igmpproxy 44865 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:21:35 igmpproxy 44865 select() failure; Errno(4): Interrupted system call Nov 26 13:07:37 igmpproxy 44865 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 13:01:24 igmpproxy 44865 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 12:51:17 igmpproxy 44865 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 12:44:20 igmpproxy 44865 MRT_DEL_MFC; Errno(49): Can't assign requested address Nov 26 11:42:09 igmpproxy 44865 MRT_DEL_MFC; Errno(49): Can't assign requested address
Files
Related issues
Updated by Martial G 10 months ago
Also applies to pfSense Plus version 23.09 according to this article: https://forum.netgate.com/topic/184360/igmp-proxy-no-longer-works-reliably-after-2-7-1-update?_=1701184819006
Updated by Remie van de Zande 10 months ago
Martial G wrote in #note-1:
Also applies to pfSense Plus version 23.09 according to this article: https://forum.netgate.com/topic/184360/igmp-proxy-no-longer-works-reliably-after-2-7-1-update?_=1701184819006
I've also posted in that topic and I am here to confirm that this has the same behaviour in 23.09.
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 can't leave group 224.0.252.123 on interface em0.4; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55 igmpproxy 99971 select() failure; Errno(4): Interrupted system call
Nov 30 18:51:39 igmpproxy 99971 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 18:46:19 igmpproxy 78906 select() failure; Errno(4): Interrupted system call
Nov 30 17:37:53 igmpproxy 78906 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 08:01:41 igmpproxy 78906 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 06:52:27 igmpproxy 78906 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 06:51:55 igmpproxy 78906 MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 06:09:22 igmpproxy 78906 MRT_DEL_MFC; Errno(49): Can't assign requested address
Updated by Jim Pingle 9 months ago
- Related to Bug #15065: igmpproxy works very unreliable, needs frequent restarts to watch television added
Updated by Jim Pingle 9 months ago
- Subject changed from IGMP proxy no longer works reliably after 2.7.1 update to IGMP proxy not working reliably
- Priority changed from Urgent to Normal
- Target version set to 2.8.0
- Plus Target Version set to 24.03
Updated by Rai Wol 9 months ago
- File igmp_wan.JPG igmp_wan.JPG added
- File igmp_lan.JPG igmp_lan.JPG added
I did a compare on the LAN and the WAN interface after changing TV channel. On the LAN interface I see a IGMP leave group. But on the WAN interface there is no leave group, only Join group. Eventually the old stream will timeout after 3-4 min.
Packet capture on LAN (192.168.5.X)
and
Packet capture on WAN (10.214.93.155)
Updated by Kristof Provost 9 months ago
MRT_DEL_MFC; Errno(49) is interesting. error 49 is EADDRNOTAVAIL, which can only be returned (for MRT_DEL_MFC at least) if the membership igmpproxy attempted to remove does not exist.
So as far as the kernel is concerned we're not joined to that group.
(Do note that the membership lookup takes both origin and group address into account, so it's possible that the group is correct but the membership address is not for example.)
We're going to need a bit more information to track this down.
Let's start with a debug log, because igmpproxy can log the relevant information before it attempts MRT_DEL_MFC. That does require it to run with debug verbosity, which can't be set through the webui. Terminate the running process and restart it as `/usr/local/sbin/igmpproxy -v -v /var/etc/igmpproxy.conf` (or /usr/local/sbin/igmpproxy -d -v -v /var/etc/igmpproxy.conf to run it in the foreground and send all output to stdout).
Once the error occurs (and igmpproxy is still running!) also export the kernel's view of the multicast state with netstat -i -a -n and netstat -g.
Updated by Remie van de Zande 9 months ago
- File routing.log.0.bz2 routing.log.0.bz2 added
- File routing.log.1.bz2 routing.log.1.bz2 added
- File routing.log.2.bz2 routing.log.2.bz2 added
- File routing.log.4.bz2 routing.log.4.bz2 added
- File routing.log.3.bz2 routing.log.3.bz2 added
- File routing.log.5.bz2 routing.log.5.bz2 added
- File routing.log routing.log added
- File routing.log.6.bz2 routing.log.6.bz2 added
- File netstat g.txt netstat g.txt added
- File netstat ian.txt netstat ian.txt added
I've ran the test.
The logs are included during my test time.
Also the netstat results are included.
Hope this helps.
Updated by Haraldinho D 9 months ago
If required I can deliver data too. I also suffer from this issue, but as I don't watch a lot of linear TV, I have 23.09.1 actively running, so no need to reconfigure before testing. Just let me know.
Updated by Kristof Provost 9 months ago
netstat -gs might be interesting too.
Looking at the igmpproxy log there appear to be two things going on. The first this this:
Dec 9 10:43:18 gateway igmpproxy65193: Got a interrupt signal. Exiting.
Along with the select() error above it looks like something or someone decided to terminate (either via SIGINT or SIGTERM) igmpproxy. It then tries to clean up all of the forwarding entries, which fails.
I believe I understand why the second issue might be happening. The error number and code mean that mfc_find() must have failed. Given the logged multicast group and origin are correct and present in the netstat output there's really only one way for that to happen, and that is that there are pending entries in mfc_stall_ring. That's not entirely unexpected for an active subscription (although it's a bit odd that all groups have this).
It seems fairly straightforward to teach the cleanup code to cope with that, and actually remove the entry even in that case. I'll build a test kernel with that change soon.
I do not have an explanation for why igmpproxy is being terminated, but it does seem to have been restarted (as evidenced by igmpproxy logs from a different pid). It looks like this new instance might be trying and failing to clean up the existing routes, which might explain why the system stays in the bad state. If that's the scenario the change I have in mind might fix the entire problem. If not it should at least give us further clues.
Updated by Kristof Provost 9 months ago
Here's that patched kernel: https://nc.netgate.com/nextcloud/index.php/s/L9ERQHXbtygQHrt
Can someone try that one (ideally with the verbose igmpproxy output enabled) and see if that makes a difference?
Updated by Haraldinho D 9 months ago
Hey Kristof, I just sent you some additional logs + the netstat -gs output as well in your email. Happy to try the patched kernel, but not sure what to do with it :-o
Please let me know and I'll try it right away. My wife is out playing tennis, so have some time to disrupt the tv :-D
Updated by Kristof Provost 9 months ago
Backup your device, download the pkg file to it, "pkg install -U pfSense-kernel-pfSense-2.7.2.r.20231211.1745.pkg" and reboot.
Updated by Rai Wol 9 months ago
- File netstat-g.zip netstat-g.zip added
- File routing.zip routing.zip added
Kristof Provost wrote in #note-12:
Backup your device, download the pkg file to it, "pkg install -U pfSense-kernel-pfSense-2.7.2.r.20231211.1745.pkg" and reboot.
I just tried the patched kernel. Unfortunately I see no difference. In the logging I don't see the error: MRT_DEL_MFC; Errno(49): Can't assign requested address. But there is still no leave group IGMP message on the outside interface, so all the streams keep active and consume all the bandwidth after fast zapping channels.
I attached my routing.log and netstat -g.
Updated by Haraldinho D 9 months ago
I have pfSense+ 23.09.1, not 2.7.2, so probably this patch is not for me... Can you build one for 23.09.1?
Updated by Kristof Provost 9 months ago
Haraldinho D wrote in #note-14:
I have pfSense+ 23.09.1, not 2.7.2, so probably this patch is not for me... Can you build one for 23.09.1?
Given that we've got an initial data point that this change is insufficient there's not much point to that. Let me look at the new log files first so we can run a more useful next test.
Updated by Kristof Provost 9 months ago
It's also not clear to me why igmpproxy shuts down.
The only thing I can think of right now is that there's a new wan ip (or at least, a DHCP renew that's fooling rc.newwanip). Can someone take a look at their DHCP logs to confirm that (i.e. see if the time for "igmpproxy[<pid>]: Got an interrupt signal. Exiting." matches a DHCP event?
I'd expect to see something like "pfSense check_reload_status[<pid>]: rc.newwanip starting <if>" in system.log, and for the timestamp to be the same as (or very close to) the igmpproxy exit message.
Updated by Haraldinho D 9 months ago
Kristof, I sent logs on rc.newwanip occurrence per email.
Updated by Kristof Provost 9 months ago
I believe I've found the reason we don't send an igmp leave when we should, along with the cleanup improvements for MRT_DEL_MFC.
2.7.2 and 23.09.1 kernel images are on https://nc.netgate.com/nextcloud/index.php/s/L9ERQHXbtygQHrt
Updated by Haraldinho D 9 months ago
Seems to work like a charm!! No more streams that stop and it also shows in the bandwidth graph. The peaks show the second stream kicking in when zapping and right after that the drop when the first stream is abandoned. Great work Kristof! Do you also understand how this regression happened?
Updated by Rai Wol 9 months ago
- File igmp_leave_group.JPG igmp_leave_group.JPG added
Kristof Provost wrote in #note-18:
I believe I've found the reason we don't send an igmp leave when we should, along with the cleanup improvements for MRT_DEL_MFC.
2.7.2 and 23.09.1 kernel images are on https://nc.netgate.com/nextcloud/index.php/s/L9ERQHXbtygQHrt
Great job Kristof! I can confirm its working again. Thank you very much! For my interest, what was the cause of this?
I see igmp leave group messages now.
Updated by Kristof Provost 9 months ago
Okay, thanks for confirming.
The relevant fixes are:
https://reviews.freebsd.org/D43031
https://reviews.freebsd.org/D43032
https://reviews.freebsd.org/D43033
The sending of igmp leave got broken when I fixed a memory leak, and failed to notice some of the subtleties in the igmp/multicast code. Once the reviews are approved (or time out) I'll push upstream and cherry-pick to the pfsense branches. I expect the fixes to be in snapshots around the middle of next week.
Updated by Haraldinho D 9 months ago
Hi Kristof, there have been quite some igmpproxy regressions in the past releases. I’m always anxious when applying a new release and my IPTV setup is usually the first I check when upgrading. Perhaps it is a good idea to increase the automated test coverage in this area.
In any case happy with all the effort that went into fixing this issue. Much appreciated!
Updated by Kristof Provost 9 months ago
The relevant fixes have been merged to the devel-main and plus-devel-main branches, and are included in recent snapshot builds.
Updated by Haraldinho D 9 months ago
When/how will this lead to an installable patch?
Updated by Kristof Provost 9 months ago
This requires a kernel change, so it'll be part of the next release. This issue cannot be fixed through the System_Patches package.
Updated by Arturo de Vries 9 months ago
That pretty much sucks.
So, we either have to wait for 2.8 final or install a snapshot/dev version to be able to use IGMP proxy?
How did this work out the last time with https://redmine.pfsense.org/issues/12079 where we also had to wait for the final Pf release, only to see the issue being "closed due to lack of feedback" just before 2.7.1 went stable.
In my case I've gone months without live television. And I certainly don't want to risk deploying a snapshot on a production system.
Updated by Haraldinho D 9 months ago
I agree completely. This is a very disruptive bug for those that use IPTV. You can simply not use it. I think this bug is bad enough to allow for a minor intermediate release.
Updated by Arturo de Vries 9 months ago
It would make a lot of us happy if the kernel patch was made available as a separate download.
There was a downloadable kernel patch, but this has since been removed, no idea why.
Please give users the possibility to download and fix their Pfsense installation.
Updated by Jim Pingle 7 months ago
- Subject changed from IGMP proxy works intermitently to IGMP proxy works intermittently
Updated by Martial G 7 months ago
This ticket can be closed and therefore included in the next release 2.8 or earlier (very important). Since the kernel fix from @Kristof Provost, IPTV with the IGMP proxy has been working perfectly for me.
I only wish that in future IPTV will always be tested before a new release. Because here in Switzerland and also in Europe, many ISPs rely on IPTV with multicast.