Project

General

Profile

Actions

Bug #15043

closed

IGMP proxy works intermittently

Added by Martial G about 1 year ago. Updated 11 days ago.

Status:
Resolved
Priority:
Normal
Category:
IGMP Proxy
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
24.03
Release Notes:
Default
Affected Version:
2.7.1
Affected Architecture:
amd64

Description

The IGMP proxy no longer works reliably after the update to 2.7.1. Channels that are no longer watched continue to be streamed and one channel cannot be accessed at all. This means that switching between channels can consume a lot of bandwidth as the stream is not interrupted. After restarting the IGMP proxy, a random other channel no longer works, but the one that did not work before usually does.

Bandwidth consumption from switching between channels:
WAN Traffic Graph

The routing log is full of such entries:

Nov 26 13:23:50    igmpproxy    81833    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:23:50    igmpproxy    81833    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:23:50    igmpproxy    81833    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:23:50    igmpproxy    81833    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:23:50    igmpproxy    81833    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:23:08    igmpproxy    32918    can't leave group 239.186.64.19 on interface igb0; Errno(49): Can't assign requested address
Nov 26 13:23:08    igmpproxy    32918    can't leave group 239.186.64.18 on interface igb0; Errno(49): Can't assign requested address
Nov 26 13:23:08    igmpproxy    32918    can't leave group 239.186.68.13 on interface igb0; Errno(49): Can't assign requested address
Nov 26 13:23:08    igmpproxy    32918    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:23:08    igmpproxy    32918    select() failure; Errno(4): Interrupted system call
Nov 26 13:22:17    igmpproxy    32918    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:22:17    igmpproxy    32918    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:22:17    igmpproxy    32918    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:21:35    igmpproxy    44865    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:21:35    igmpproxy    44865    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:21:35    igmpproxy    44865    select() failure; Errno(4): Interrupted system call
Nov 26 13:07:37    igmpproxy    44865    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 13:01:24    igmpproxy    44865    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 12:51:17    igmpproxy    44865    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 12:44:20    igmpproxy    44865    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 26 11:42:09    igmpproxy    44865    MRT_DEL_MFC; Errno(49): Can't assign requested address


Files

Screenshot 2023-11-28 191634.png (16.9 KB) Screenshot 2023-11-28 191634.png WAN Traffic Graph Martial G, 11/28/2023 06:27 PM
igmp_wan.JPG (80.9 KB) igmp_wan.JPG Rai Wol, 12/05/2023 07:58 PM
igmp_lan.JPG (195 KB) igmp_lan.JPG Rai Wol, 12/05/2023 07:58 PM
routing.log.0.bz2 (10.4 KB) routing.log.0.bz2 Remie van de Zande, 12/09/2023 09:49 AM
routing.log.1.bz2 (16.5 KB) routing.log.1.bz2 Remie van de Zande, 12/09/2023 09:49 AM
routing.log.2.bz2 (9.13 KB) routing.log.2.bz2 Remie van de Zande, 12/09/2023 09:49 AM
routing.log.4.bz2 (8.75 KB) routing.log.4.bz2 Remie van de Zande, 12/09/2023 09:49 AM
routing.log.3.bz2 (9.95 KB) routing.log.3.bz2 Remie van de Zande, 12/09/2023 09:49 AM
routing.log.5.bz2 (10.6 KB) routing.log.5.bz2 Remie van de Zande, 12/09/2023 09:49 AM
routing.log.6.bz2 (13.5 KB) routing.log.6.bz2 Remie van de Zande, 12/09/2023 09:49 AM
routing.log (476 KB) routing.log Remie van de Zande, 12/09/2023 09:49 AM
netstat g.txt (1.21 KB) netstat g.txt Remie van de Zande, 12/09/2023 09:49 AM
netstat ian.txt (11.7 KB) netstat ian.txt Remie van de Zande, 12/09/2023 09:49 AM
netstat-g.zip (493 Bytes) netstat-g.zip Rai Wol, 12/11/2023 07:55 PM
routing.zip (7.52 KB) routing.zip Rai Wol, 12/11/2023 07:55 PM
clipboard-202312121942-zfdcl.png (84.7 KB) clipboard-202312121942-zfdcl.png Haraldinho D, 12/12/2023 06:42 PM
igmp_leave_group.JPG (209 KB) igmp_leave_group.JPG Rai Wol, 12/12/2023 07:36 PM

Related issues

Related to Bug #15065: igmpproxy works very unreliable, needs frequent restarts to watch televisionDuplicate

Actions
Actions #1

Updated by Martial G about 1 year ago

Actions #2

Updated by Remie van de Zande about 1 year ago

Martial G wrote in #note-1:

Also applies to pfSense Plus version 23.09 according to this article: https://forum.netgate.com/topic/184360/igmp-proxy-no-longer-works-reliably-after-2-7-1-update?_=1701184819006

I've also posted in that topic and I am here to confirm that this has the same behaviour in 23.09.

Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    can't leave group 224.0.252.123 on interface em0.4; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 19:03:55    igmpproxy    99971    select() failure; Errno(4): Interrupted system call
Nov 30 18:51:39    igmpproxy    99971    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 18:46:19    igmpproxy    78906    select() failure; Errno(4): Interrupted system call
Nov 30 17:37:53    igmpproxy    78906    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 08:01:41    igmpproxy    78906    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 06:52:27    igmpproxy    78906    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 06:51:55    igmpproxy    78906    MRT_DEL_MFC; Errno(49): Can't assign requested address
Nov 30 06:09:22    igmpproxy    78906    MRT_DEL_MFC; Errno(49): Can't assign requested address

Actions #3

Updated by Jim Pingle about 1 year ago

  • Related to Bug #15065: igmpproxy works very unreliable, needs frequent restarts to watch television added
Actions #4

Updated by Jim Pingle about 1 year ago

  • Subject changed from IGMP proxy no longer works reliably after 2.7.1 update to IGMP proxy not working reliably
  • Priority changed from Urgent to Normal
  • Target version set to 2.8.0
  • Plus Target Version set to 24.03
Actions #5

Updated by Rai Wol about 1 year ago

I did a compare on the LAN and the WAN interface after changing TV channel. On the LAN interface I see a IGMP leave group. But on the WAN interface there is no leave group, only Join group. Eventually the old stream will timeout after 3-4 min.

Packet capture on LAN (192.168.5.X)

and

Packet capture on WAN (10.214.93.155)

Actions #6

Updated by Kristof Provost about 1 year ago

MRT_DEL_MFC; Errno(49) is interesting. error 49 is EADDRNOTAVAIL, which can only be returned (for MRT_DEL_MFC at least) if the membership igmpproxy attempted to remove does not exist.
So as far as the kernel is concerned we're not joined to that group.
(Do note that the membership lookup takes both origin and group address into account, so it's possible that the group is correct but the membership address is not for example.)

We're going to need a bit more information to track this down.
Let's start with a debug log, because igmpproxy can log the relevant information before it attempts MRT_DEL_MFC. That does require it to run with debug verbosity, which can't be set through the webui. Terminate the running process and restart it as `/usr/local/sbin/igmpproxy -v -v /var/etc/igmpproxy.conf` (or /usr/local/sbin/igmpproxy -d -v -v /var/etc/igmpproxy.conf to run it in the foreground and send all output to stdout).

Once the error occurs (and igmpproxy is still running!) also export the kernel's view of the multicast state with netstat -i -a -n and netstat -g.

Actions #7

Updated by Remie van de Zande about 1 year ago

I've ran the test.
The logs are included during my test time.

Also the netstat results are included.
Hope this helps.

Actions #8

Updated by Haraldinho D about 1 year ago

If required I can deliver data too. I also suffer from this issue, but as I don't watch a lot of linear TV, I have 23.09.1 actively running, so no need to reconfigure before testing. Just let me know.

Actions #9

Updated by Kristof Provost about 1 year ago

netstat -gs might be interesting too.

Looking at the igmpproxy log there appear to be two things going on. The first this this:

Dec 9 10:43:18 gateway igmpproxy65193: Got a interrupt signal. Exiting.

Along with the select() error above it looks like something or someone decided to terminate (either via SIGINT or SIGTERM) igmpproxy. It then tries to clean up all of the forwarding entries, which fails.

I believe I understand why the second issue might be happening. The error number and code mean that mfc_find() must have failed. Given the logged multicast group and origin are correct and present in the netstat output there's really only one way for that to happen, and that is that there are pending entries in mfc_stall_ring. That's not entirely unexpected for an active subscription (although it's a bit odd that all groups have this).

It seems fairly straightforward to teach the cleanup code to cope with that, and actually remove the entry even in that case. I'll build a test kernel with that change soon.

I do not have an explanation for why igmpproxy is being terminated, but it does seem to have been restarted (as evidenced by igmpproxy logs from a different pid). It looks like this new instance might be trying and failing to clean up the existing routes, which might explain why the system stays in the bad state. If that's the scenario the change I have in mind might fix the entire problem. If not it should at least give us further clues.

Actions #10

Updated by Kristof Provost about 1 year ago

Here's that patched kernel: https://nc.netgate.com/nextcloud/index.php/s/L9ERQHXbtygQHrt

Can someone try that one (ideally with the verbose igmpproxy output enabled) and see if that makes a difference?

Actions #11

Updated by Haraldinho D about 1 year ago

Hey Kristof, I just sent you some additional logs + the netstat -gs output as well in your email. Happy to try the patched kernel, but not sure what to do with it :-o
Please let me know and I'll try it right away. My wife is out playing tennis, so have some time to disrupt the tv :-D

Actions #12

Updated by Kristof Provost about 1 year ago

Backup your device, download the pkg file to it, "pkg install -U pfSense-kernel-pfSense-2.7.2.r.20231211.1745.pkg" and reboot.

Actions #13

Updated by Rai Wol about 1 year ago

Kristof Provost wrote in #note-12:

Backup your device, download the pkg file to it, "pkg install -U pfSense-kernel-pfSense-2.7.2.r.20231211.1745.pkg" and reboot.

I just tried the patched kernel. Unfortunately I see no difference. In the logging I don't see the error: MRT_DEL_MFC; Errno(49): Can't assign requested address. But there is still no leave group IGMP message on the outside interface, so all the streams keep active and consume all the bandwidth after fast zapping channels.

I attached my routing.log and netstat -g.

Actions #14

Updated by Haraldinho D about 1 year ago

I have pfSense+ 23.09.1, not 2.7.2, so probably this patch is not for me... Can you build one for 23.09.1?

Actions #15

Updated by Kristof Provost about 1 year ago

Haraldinho D wrote in #note-14:

I have pfSense+ 23.09.1, not 2.7.2, so probably this patch is not for me... Can you build one for 23.09.1?

Given that we've got an initial data point that this change is insufficient there's not much point to that. Let me look at the new log files first so we can run a more useful next test.

Actions #16

Updated by Kristof Provost about 1 year ago

It's also not clear to me why igmpproxy shuts down.

The only thing I can think of right now is that there's a new wan ip (or at least, a DHCP renew that's fooling rc.newwanip). Can someone take a look at their DHCP logs to confirm that (i.e. see if the time for "igmpproxy[<pid>]: Got an interrupt signal. Exiting." matches a DHCP event?

I'd expect to see something like "pfSense check_reload_status[<pid>]: rc.newwanip starting <if>" in system.log, and for the timestamp to be the same as (or very close to) the igmpproxy exit message.

Actions #17

Updated by Haraldinho D about 1 year ago

Kristof, I sent logs on rc.newwanip occurrence per email.

Actions #18

Updated by Kristof Provost about 1 year ago

I believe I've found the reason we don't send an igmp leave when we should, along with the cleanup improvements for MRT_DEL_MFC.

2.7.2 and 23.09.1 kernel images are on https://nc.netgate.com/nextcloud/index.php/s/L9ERQHXbtygQHrt

Actions #19

Updated by Haraldinho D about 1 year ago

Seems to work like a charm!! No more streams that stop and it also shows in the bandwidth graph. The peaks show the second stream kicking in when zapping and right after that the drop when the first stream is abandoned. Great work Kristof! Do you also understand how this regression happened?

Actions #20

Updated by Rai Wol about 1 year ago

Kristof Provost wrote in #note-18:

I believe I've found the reason we don't send an igmp leave when we should, along with the cleanup improvements for MRT_DEL_MFC.

2.7.2 and 23.09.1 kernel images are on https://nc.netgate.com/nextcloud/index.php/s/L9ERQHXbtygQHrt

Great job Kristof! I can confirm its working again. Thank you very much! For my interest, what was the cause of this?

I see igmp leave group messages now.

Actions #21

Updated by Kristof Provost about 1 year ago

Okay, thanks for confirming.

The relevant fixes are:
https://reviews.freebsd.org/D43031
https://reviews.freebsd.org/D43032
https://reviews.freebsd.org/D43033

The sending of igmp leave got broken when I fixed a memory leak, and failed to notice some of the subtleties in the igmp/multicast code. Once the reviews are approved (or time out) I'll push upstream and cherry-pick to the pfsense branches. I expect the fixes to be in snapshots around the middle of next week.

Actions #22

Updated by Haraldinho D about 1 year ago

Hi Kristof, there have been quite some igmpproxy regressions in the past releases. I’m always anxious when applying a new release and my IPTV setup is usually the first I check when upgrading. Perhaps it is a good idea to increase the automated test coverage in this area.
In any case happy with all the effort that went into fixing this issue. Much appreciated!

Actions #23

Updated by Haraldinho D about 1 year ago

Any news on an official patch yet?

Actions #24

Updated by Kristof Provost about 1 year ago

The relevant fixes have been merged to the devel-main and plus-devel-main branches, and are included in recent snapshot builds.

Actions #25

Updated by Haraldinho D about 1 year ago

When/how will this lead to an installable patch?

Actions #26

Updated by Kristof Provost about 1 year ago

This requires a kernel change, so it'll be part of the next release. This issue cannot be fixed through the System_Patches package.

Actions #27

Updated by Marcos M about 1 year ago

  • Subject changed from IGMP proxy not working reliably to IGMP proxy works intermitently
  • Status changed from New to Feedback
  • Assignee set to Kristof Provost
  • % Done changed from 0 to 100
Actions #28

Updated by Arturo de Vries about 1 year ago

That pretty much sucks.
So, we either have to wait for 2.8 final or install a snapshot/dev version to be able to use IGMP proxy?
How did this work out the last time with https://redmine.pfsense.org/issues/12079 where we also had to wait for the final Pf release, only to see the issue being "closed due to lack of feedback" just before 2.7.1 went stable.
In my case I've gone months without live television. And I certainly don't want to risk deploying a snapshot on a production system.

Actions #29

Updated by Haraldinho D about 1 year ago

I agree completely. This is a very disruptive bug for those that use IPTV. You can simply not use it. I think this bug is bad enough to allow for a minor intermediate release.

Actions #30

Updated by Arturo de Vries 12 months ago

It would make a lot of us happy if the kernel patch was made available as a separate download.
There was a downloadable kernel patch, but this has since been removed, no idea why.
Please give users the possibility to download and fix their Pfsense installation.

Actions #31

Updated by Jim Pingle 11 months ago

  • Subject changed from IGMP proxy works intermitently to IGMP proxy works intermittently
Actions #32

Updated by Martial G 10 months ago

This ticket can be closed and therefore included in the next release 2.8 or earlier (very important). Since the kernel fix from @Kristof Provost, IPTV with the IGMP proxy has been working perfectly for me.

I only wish that in future IPTV will always be tested before a new release. Because here in Switzerland and also in Europe, many ISPs rely on IPTV with multicast.

Actions #33

Updated by Jim Pingle 10 months ago

  • Status changed from Feedback to Resolved
Actions #34

Updated by Chris Collins 11 days ago

Can the kernel please be reuploaded for 2.7.2? as that link provided is dead and there is still no CE release since the fix.

Actions

Also available in: Atom PDF