Bug #9453

closed

Reconfiguring a parent LAGG interface breaks its VLANs

Added by Daniele Palumbo about 5 years ago. Updated 20 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: LAGG Interfaces
Target version:
Start date: 04/04/2019
Due date:
% Done: 100%
Estimated time:
Plus Target Version: 24.03
Release Notes: Default
Affected Version: 2.4.4_2
Affected Architecture: All

Description

Environment: SG-1000
I am not sure whether this also applies to other environments.

Upon boot, all the VLANs get orphaned.

The SG-1000 was working previously with a very similar config. I am not sure what caused the issue (the upgrade from 2.4.4 to 2.4.4-p1, or a config upgrade).
I tend to rule out the config; see the tests listed at the end of this report.
Seems similar to:
https://redmine.pfsense.org/issues/3976
https://redmine.pfsense.org/issues/8527

dmesg:
Trying to mount root from ufs:/dev/ufsid/5af4c96aa287b62c [rw,noatime]...
Warning: no time-of-day clock registered, system time will not be set accurately
random: unblocking device.
cpsw0: link state changed to UP
lagg0: IPv6 addresses on cpsw0 have been removed before adding it as a member to prevent IPv6 address scope violation.
lagg0: link state changed to UP
cpsw1: link state changed to UP
lagg0: IPv6 addresses on cpsw1 have been removed before adding it as a member to prevent IPv6 address scope violation.
vlan0: changing name to 'lagg0.7'
vlan1: changing name to 'lagg0.9'
vlan2: changing name to 'lagg0.10'
vlan3: changing name to 'lagg0.11'
vlan4: changing name to 'lagg0.12'
vlan5: changing name to 'lagg0.13'
vlan6: changing name to 'lagg0.8'
lagg0: link state changed to DOWN
lagg0.7: link state changed to DOWN
lagg0.8: link state changed to DOWN
lagg0.9: link state changed to DOWN
lagg0.10: link state changed to DOWN
lagg0.11: link state changed to DOWN
lagg0.12: link state changed to DOWN
lagg0.13: link state changed to DOWN
lagg0: link state changed to UP
cpsw0: promiscuous mode enabled
cpsw1: promiscuous mode enabled
lagg0: promiscuous mode enabled
carp: 1@lagg0: INIT -> BACKUP (initialization complete)
lagg0.11: promiscuous mode enabled
carp: demoted by 240 to 240 (interface down)
lagg0.12: promiscuous mode enabled
carp: demoted by 240 to 480 (interface down)
lagg0.10: promiscuous mode enabled
carp: demoted by 240 to 720 (interface down)
lagg0.9: promiscuous mode enabled
carp: demoted by 240 to 960 (interface down)
lagg0.7: promiscuous mode enabled
carp: demoted by 240 to 1200 (interface down)
carp: 7@lagg0: INIT -> BACKUP (initialization complete)
carp: 7@lagg0: BACKUP -> MASTER (master timed out)
pflog0: promiscuous mode enabled
carp: 7@lagg0: MASTER -> BACKUP (more frequent advertisement received)
ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
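In the log above, lagg0 returns to UP while the VLAN children are only ever reported going DOWN. A minimal sketch for pulling that out of a dmesg capture follows; the function names and the shortened sample are illustrative and not part of pfSense.

```python
import re

# Matches kernel messages such as "lagg0.7: link state changed to DOWN".
LINK_RE = re.compile(r"^(?P<ifname>\S+): link state changed to (?P<state>UP|DOWN)$")


def final_link_states(dmesg_text):
    """Return {interface: last reported link state}, in first-seen order."""
    states = {}
    for line in dmesg_text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            states[match.group("ifname")] = match.group("state")
    return states


def interfaces_left_down(dmesg_text):
    """List interfaces whose last reported link state is DOWN."""
    return [name for name, state in final_link_states(dmesg_text).items()
            if state == "DOWN"]


# Shortened excerpt of the dmesg output from this report.
SAMPLE = """\
cpsw0: link state changed to UP
lagg0: link state changed to UP
lagg0: link state changed to DOWN
lagg0.7: link state changed to DOWN
lagg0.8: link state changed to DOWN
lagg0: link state changed to UP
"""

if __name__ == "__main__":
    print(interfaces_left_down(SAMPLE))
```

This only reflects what the kernel logged; an interface that never logged a link state change does not appear at all.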

Example NICs.

Please note the following line in the VLAN interface output:
vlan: 0 vlanpcp: 0 parent interface: <none>

# ifconfig lagg0.7
lagg0.7: flags=8903<UP,BROADCAST,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=80000<LINKSTATE>
    ether c8:df:84:c1:16:37
    inet6 fe80::cadf:84ff:fec1:1637%lagg0.7 prefixlen 64 tentative scopeid 0x8
    inet 172.16.77.242 netmask 0xffffff00 broadcast 172.16.77.255
    inet 172.16.77.240 netmask 0xffffffff broadcast 172.16.77.240 vhid 6
    groups: vlan
    carp: INIT vhid 6 advbase 1 advskew 100
    vlan: 0 vlanpcp: 0 parent interface: <none>
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[root@pf2-tos ~]# ifconfig lagg0.8
lagg0.8: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=80000<LINKSTATE>
    ether c8:df:84:c1:16:37
    inet6 fe80::cadf:84ff:fec1:1637%lagg0.8 prefixlen 64 tentative scopeid 0xe
    inet 172.16.78.242 netmask 0xffffff00 broadcast 172.16.78.255
    groups: vlan
    vlan: 0 vlanpcp: 0 parent interface: <none>
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
# ifconfig lagg0
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1492
    options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
    ether c8:df:84:c1:16:37
    inet6 fe80::cadf:84ff:fec1:1637%lagg0 prefixlen 64 scopeid 0x7
    inet 172.16.8.242 netmask 0xffffff00 broadcast 172.16.8.255
    inet 172.16.8.240 netmask 0xffffffff broadcast 172.16.8.240 vhid 1
    inet 172.16.8.251 netmask 0xffffffff broadcast 172.16.8.251 vhid 7
    laggproto lacp lagghash l2,l3,l4
    laggport: cpsw0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: cpsw1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    groups: lagg
    carp: BACKUP vhid 1 advbase 7 advskew 101
    carp: BACKUP vhid 7 advbase 1 advskew 100
    media: Ethernet autoselect
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
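
The symptom in the output above is the `parent interface: <none>` line on each VLAN. As a rough sketch, assuming the usual `ifconfig` layout of a `name: flags=` header followed by detail lines, orphaned VLANs could be listed like this. The helper name is hypothetical, and the healthy `lagg0.9` entry in the sample is illustrative rather than taken from this report.

```python
import re

# An interface block starts with "name: flags=..."; detail lines never do.
HEADER_RE = re.compile(r"^(?P<ifname>[^\s:]+): flags=")
# The vlan line may also carry "vlanproto:", so only the parent part is matched.
PARENT_RE = re.compile(r"parent interface: (?P<parent>\S+)")


def orphaned_vlans(ifconfig_text):
    """Return names of interfaces whose vlan line reports no parent."""
    current = None
    orphans = []
    for raw in ifconfig_text.splitlines():
        line = raw.strip()  # tolerate indentation differences in pasted output
        header = HEADER_RE.match(line)
        if header:
            current = header.group("ifname")
            continue
        parent = PARENT_RE.search(line)
        if parent and current and parent.group("parent") == "<none>":
            orphans.append(current)
    return orphans


SAMPLE = """\
lagg0.7: flags=8903<UP,BROADCAST,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    groups: vlan
    vlan: 0 vlanpcp: 0 parent interface: <none>
lagg0.9: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    groups: vlan
    vlan: 9 vlanpcp: 0 parent interface: lagg0
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1492
    groups: lagg
"""

if __name__ == "__main__":
    print(orphaned_vlans(SAMPLE))
```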

Tests done:
1) Removed all but one VLAN (11). Rebooted. Issue still present.
2) Removed all firewall rules except the anti-lockout rule. Rebooted. Issue still present.
3) Factory reset, then restored a previous configuration (just in case).
4) Again removed all VLANs but a couple, changed the VLAN for WAN, and removed all firewall rules except anti-lockout. Rebooted. Issue still present.
5) Removed all CARP. Rebooted. Issue still present.
6) Removed cpsw0 from the LAGG. The VLANs came up (I would bet because of the network restart). Rebooted. Issue still present.
7) Removed the one package that showed as installed (nrpe) by editing the config file, as the web UI was not listing the installed packages.
8) Added cpsw0 back to the LAGG and removed cpsw1. The VLANs came up (I would bet because of the network restart). Rebooted. Issue still present.


Files

1000020721.jpg (320 KB) Screenshot of failure point. Steve N, 05/07/2024 05:13 PM
clipboard-202405081551-qgepl.png (73.1 KB) Steve N, 05/08/2024 10:51 PM

Related issues

Related to Bug #15452: Unexpected/Undefined behaviour of disabled interfaces (New)
Has duplicate Bug #12926: Changing LAGG type on CARP interfaces makes VIPs go to an "init" State (Duplicate)
Has duplicate Bug #13344: Vlan loses parent interface when changing LAGG mtu to jumbo frames (Duplicate)
Has duplicate Bug #14603: LAGG VLAN Interfaces report parent no longer exists (Duplicate)
Has duplicate Bug #14083: Adding MSS and MTU values on a LAGG VLAN interface breaks connectivity (Resolved, Marcos M)
Has duplicate Bug #13473: No IPv6 address acquired after reboot/dhcp6c not starting (Duplicate)

Actions #1

Updated by Jim Pingle almost 5 years ago

  • Category changed from Interfaces to LAGG Interfaces
Actions #2

Updated by Marcos M 5 months ago

  • Has duplicate Bug #12926: Changing LAGG type on CARP interfaces makes VIPs go to an "init" State added
Actions #3

Updated by Marcos M 5 months ago

  • Has duplicate Bug #13344: Vlan loses parent interface when changing LAGG mtu to jumbo frames added
Actions #4

Updated by Marcos M 5 months ago

  • Has duplicate Bug #14603: LAGG VLAN Interfaces report parent no longer exists added
Actions #5

Updated by Marcos M 5 months ago

  • Has duplicate Bug #14083: Adding MSS and MTU values on a LAGG VLAN interface breaks connectivity added
Actions #6

Updated by Marcos M 5 months ago

  • Has duplicate Bug #13473: No IPv6 address acquired after reboot/dhcp6c not starting added
Actions #7

Updated by Marcos M 5 months ago

  • Subject changed from VLAN Interfaces on LAGG get orphaned at boot to Reconfiguring the parent LAGG interface does not handle its child VLANs
  • Status changed from New to In Progress
  • Assignee set to Marcos M
  • Target version set to 2.8.0
  • % Done changed from 0 to 50
  • Plus Target Version set to 24.03
  • Release Notes set to Default
Actions #8

Updated by Marcos M 5 months ago

There have been various bug reports related to this issue which seem to share the same root cause - a fix is in progress.

Actions #9

Updated by Marcos M 5 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 50 to 100
Actions #11

Updated by Marcos M 5 months ago

  • Subject changed from Reconfiguring the parent LAGG interface does not handle its child VLANs to Reconfiguring a parent LAGG interface breaks its VLANs

Mike Moore wrote in #note-10:

Could the fix resolve https://redmine.pfsense.org/issues/14659 or https://redmine.pfsense.org/issues/14483

Nope. Those are essentially the same issue: interfaces are reconfigured rather than "updated" when a change is made.

Actions #12

Updated by Jordan G about 1 month ago

I'm still seeing these connectivity issues on 24.03 after changing anything about the parent LAGG interface. Network connectivity is lost completely, but my switch still shows the port as up and connected.

Actions #13

Updated by Jim Pingle about 1 month ago

  • Status changed from Feedback to Resolved
Actions #14

Updated by Jordan G about 1 month ago

  • Status changed from Resolved to Confirmed

Changing anything regarding the parent interface stops all communication:

lagg0.4091: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: LAN
        options=4000000<MEXTPG>
        ether
        inet 192.168.71.1 netmask 0xffffff00 broadcast 192.168.71.255
        inet6 fe80::208:a2ff:fe10:1176%lagg0.4091 prefixlen 64 tentative scopeid 0x15
        groups: vlan
        vlan: 0 vlanproto: 0x0000 vlanpcp: 0 parent interface: <none>
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Testing with a 7100 on the 24.03 release.

Actions #15

Updated by Jim Pingle 29 days ago

  • Plus Target Version changed from 24.03 to 24.07
Actions #16

Updated by Steve Wheeler 29 days ago

I can't replicate that in 24.03. Setting the lagg0 interface MTU (after assigning it) in a 7100 results in a ~30s outage while the lagg re-establishes. But after that it starts passing traffic again without intervention.

Actions #17

Updated by Marcos M 29 days ago

  • Status changed from Confirmed to Resolved

To reproduce the issue, the parent interface (lagg0) needs to be added to the configuration as disabled. When an interface is configured as disabled in the GUI, the interface is not added to the system, hence the child interfaces (e.g. lagg0.4091) have no parent and do not work. I believe this is the expected behavior; any inconsistencies that lead to lagg0 existing while disabled in the GUI should be detailed on a separate redmine.
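
The condition described in this note can be modeled as a simple check: flag any VLAN whose parent is assigned but disabled. This is only a sketch of that idea; the `Assignment` type and the input shapes are hypothetical and do not reflect how pfSense stores or processes its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """Hypothetical view of one assigned interface."""
    device: str    # e.g. "lagg0" or "lagg0.4091"
    enabled: bool


def vlans_with_disabled_parent(assignments, vlan_parents):
    """Return VLAN devices whose parent is assigned but disabled.

    vlan_parents maps a VLAN device name to its parent device name.
    """
    disabled = {a.device for a in assignments if not a.enabled}
    return sorted(child for child, parent in vlan_parents.items()
                  if parent in disabled)


if __name__ == "__main__":
    assignments = [
        Assignment("lagg0", enabled=False),      # parent assigned, but disabled
        Assignment("lagg0.4091", enabled=True),  # LAN
        Assignment("lagg0.4090", enabled=True),  # WAN
    ]
    vlan_parents = {"lagg0.4091": "lagg0", "lagg0.4090": "lagg0"}
    print(vlans_with_disabled_parent(assignments, vlan_parents))
```

A parent that is not assigned at all is deliberately not flagged here, since the note only describes the assigned-and-disabled case.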

Actions #18

Updated by Marcos M 29 days ago

  • Plus Target Version changed from 24.07 to 24.03
Actions #19

Updated by Steve N 21 days ago

Steve Wheeler wrote in #note-16:

I can't replicate that in 24.03. Setting the lagg0 interface MTU (after assigning it) in a 7100 results in a ~30s outage while the lagg re-establishes. But after that it starts passing traffic again without intervention.

Reboot the device. In my case, this is a surefire way to "break" it.

In fact, I recently updated to 24.03 after seeing this status as "Resolved", and it's worse on this version. My unit gets stuck in a loop at "Configuring VLAN interfaces" and never finishes coming up. I had to boot into single user mode and manually restore a configuration without the MTU/MSS configuration in order to get it to boot. Previously (23.09) it would boot but act dead, no traffic on LAN/WAN (lagg0.4091/lagg0.4090) but would at least boot into the normal serial console menu so I could easily reset config from there.

Steps to reproduce:

Configure MTU=1428 and MSS=1388 on the WAN (lagg0.4090) interface, then reboot.
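
For reference, the two values in these steps differ by 40 bytes, which matches the usual IPv4 plus TCP header overhead without options. A small sketch of that arithmetic follows; the function name is illustrative, and whether pfSense adjusts the entered MSS value further is not covered here.

```python
IPV4_HEADER = 20  # bytes, without IP options
IPV6_HEADER = 40  # bytes, fixed header only
TCP_HEADER = 20   # bytes, without TCP options


def tcp_mss(mtu, ipv6=False):
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


if __name__ == "__main__":
    print(tcp_mss(1428))             # 1388, the MSS used in the steps above
    print(tcp_mss(1428, ipv6=True))  # 1368
```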

Actions #20

Updated by Marcos M 20 days ago

@Steve N
Do you have the parent lagg interface assigned and disabled? See:
https://redmine.pfsense.org/issues/15452

Actions #21

Updated by Marcos M 20 days ago

  • Related to Bug #15452: Unexpected/Undefined behaviour of disabled interfaces added
Actions #22

Updated by Steve N 20 days ago

I don't even know how I would assign and disable the interface. My bug was actually https://redmine.pfsense.org/issues/14083, but it was marked as a duplicate of this one, so I responded here. The LAGG0 interface is not assigned to anything in the Assignments section of the web UI, if that answers the question.

Actions #23

Updated by Marcos M 20 days ago

Presumably you're running into this issue on a 7100; I've reopened that one for additional feedback. It would be helpful to get some additional info from your system - feel free to upload a status report here (get the report by going to <pfsenseip>/status.php).

Actions #24

Updated by Steve N 20 days ago

Correct, 7100. I have uploaded the status report as well.
