Bug #7119

Changing LAGG attributes results in a panic/crash

Added by Jim Pingle 5 months ago. Updated 5 months ago.

Status: Resolved
Priority: High
Category: Interfaces
Target version:
Start date: 01/13/2017
Due date:
% Done: 100%
Affected version: 2.4
Affected Architecture: amd64

Description

On 2.4, when changing attributes of an assigned LAGG such as the mode or membership, the firewall panics and reboots.

Tested on an 8860 and 4860, so it may be specific to igb. In this case, the lagg instance contained igb4,igb5 in LACP mode, and I attempted to change the mode to Failover. bjaffe encountered the same crash when changing member interfaces.

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address    = 0x0
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80e190c0
stack pointer            = 0x28:0xfffffe022c32fa30
frame pointer            = 0x28:0xfffffe022c32fa50
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 12 (swi6: task queue)
db:0:kdb.enter.default>  show pcpu
cpuid        = 2
dynamic pcpu = 0xfffffe02a9c86f00
curthread    = 0xfffff80006250500: pid 12 "swi6: task queue" 
curpcb       = 0xfffffe022c32fcc0
fpcurthread  = none
idlethread   = 0xfffff80006233500: tid 100005 "idle: cpu2" 
curpmap      = 0xffffffff829e5600
tssp         = 0xffffffff82a1dee0
commontssp   = 0xffffffff82a1dee0
rsp0         = 0xfffffe022c32fcc0
gs32p        = 0xffffffff82a24738
ldt          = 0xffffffff82a24778
tss          = 0xffffffff82a24768
db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100023 td 0xfffff80006250500
arp_iflladdr() at arp_iflladdr+0x10/frame 0xfffffe022c32fa50
lagg_port_setlladdr() at lagg_port_setlladdr+0x14e/frame 0xfffffe022c32faa0
taskqueue_run_locked() at taskqueue_run_locked+0x14a/frame 0xfffffe022c32fb00
taskqueue_run() at taskqueue_run+0xbf/frame 0xfffffe022c32fb20
intr_event_execute_handlers() at intr_event_execute_handlers+0x20f/frame 0xfffffe022c32fb60
ithread_loop() at ithread_loop+0xc6/frame 0xfffffe022c32fbb0
fork_exit() at fork_exit+0x85/frame 0xfffffe022c32fbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe022c32fbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
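
As an aid for reading a trace like the one above, here is a minimal sketch (not part of the original report) that extracts the function name and offset from each `bt` frame line. The helper name `parse_frames` and the regular expression are assumptions made for illustration; the sample lines are copied from the trace above. The innermost frame is listed first, and together with the fault virtual address of 0x0 it is consistent with a NULL pointer read inside arp_iflladdr(), reached from lagg_port_setlladdr().

```python
import re

# Matches ddb backtrace lines such as:
#   arp_iflladdr() at arp_iflladdr+0x10/frame 0xfffffe022c32fa50
# (pattern is an assumption based on the lines in this report)
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)$")


def parse_frames(text):
    """Return (function, offset) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(1), int(m.group(2), 16)))
    return frames


# First three frames, copied from the backtrace in this report.
SAMPLE = """\
arp_iflladdr() at arp_iflladdr+0x10/frame 0xfffffe022c32fa50
lagg_port_setlladdr() at lagg_port_setlladdr+0x14e/frame 0xfffffe022c32faa0
taskqueue_run_locked() at taskqueue_run_locked+0x14a/frame 0xfffffe022c32fb00
"""

frames = parse_frames(SAMPLE)
# frames[0] is the function that was executing when the page fault occurred.
print(frames[0])
```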

History

#1 Updated by Rolf Sommerhalder 5 months ago

Jim Pingle wrote:

On 2.4, when changing attributes of an assigned LAGG such as the mode or membership, the firewall panics and reboots.

Tested on an 8860 and 4860, so it may be specific to igb. In this case, the lagg instance contained igb4,igb5 in LACP mode, and I attempted to change the mode to Failover. bjaffe encountered the same crash when changing member interfaces.

With a 2.4 amd64 snapshot on Supermicro SuperServer 5018D-FN8T systems with X10SDV-TP8F motherboards, changing, for example, the IP address of a VLAN on a LAGG that uses LACP with members igb1,igb2,igb3 also panics, and the kernel subsequently hangs.

It requires a manual reset or power cycle, done remotely via BMC/IPMI. Fortunately it will restart, and the changes will then take effect.

For such situations it would be helpful to get the Watch Dog, which is available in the BIOS, to work...

#2 Updated by Renato Botelho 5 months ago

  • Status changed from New to Feedback
  • Assignee set to Renato Botelho
  • % Done changed from 0 to 100

#3 Updated by Jim Pingle 5 months ago

  • Status changed from Feedback to Confirmed

Still crashes on the latest factory snapshot: Wed Jan 18 19:49:46 CST 2017

#4 Updated by Renato Botelho 5 months ago

I couldn't reproduce it on a VM using the em driver, so it is probably something specific to igb, as mentioned.

#5 Updated by Rolf Sommerhalder 5 months ago

Snapshots from this morning still crash with igb hardware NICs.

#6 Updated by Rolf Sommerhalder 5 months ago

To be more precise: pfSense does not exactly "crash", as it still responds to ping. SSH sessions that were open before the "crash" stay connected and still accept typed commands, but the commands return no output.

Only a reset or power cycle gets it out of this state (I have not managed to get the Watch Dog working yet).
Thereafter, the changes made to LAGG right before the "crash" take effect.

#7 Updated by Jim Pingle 5 months ago

Here, it still panics + dumps + reboots same as it did originally.

#8 Updated by Renato Botelho 5 months ago

  • Assignee changed from Renato Botelho to Luiz Otavio O Souza

#10 Updated by Jim Pingle 5 months ago

Seems better now, it doesn't crash. Lots of activity in the log, though:

Jan 27 19:47:40 master snmpd[47102]: SIOCGIFDESCR (lagg0): Device not configured
Jan 27 19:47:40 master kernel: igb4: lagg_port_destroy: lp_ifflags unclean
Jan 27 19:47:40 master kernel: igb5: lagg_port_destroy: lp_ifflags unclean
Jan 27 19:47:40 master kernel: lagg0: promiscuous mode disabled
Jan 27 19:47:40 master check_reload_status: Linkup starting lagg0
Jan 27 19:47:40 master kernel: lagg0: link state changed to DOWN
Jan 27 19:47:40 master check_reload_status: Syncing firewall
Jan 27 19:47:40 master php-fpm[43135]: /interfaces_lagg_edit.php: Beginning https://portal.pfsense.org configuration backup.
Jan 27 19:47:41 master check_reload_status: Reloading filter
Jan 27 19:47:43 master php-fpm[43135]: /interfaces_lagg_edit.php: End of portal.pfsense.org configuration backup (success).
Jan 27 19:47:43 master snmpd[47102]: SIOCGIFDESCR (lagg0_vlan10): Device not configured
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: carp: demoted by -240 to 240 (vhid removed)
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: carp: demoted by -240 to 0 (vhid removed)
Jan 27 19:47:43 master kernel: lagg0_vlan10: promiscuous mode disabled
Jan 27 19:47:43 master kernel: vlan0: changing name to 'lagg0_vlan10'
Jan 27 19:47:43 master snmpd[47102]: SIOCGIFDESCR (lagg0_vlan10): Device not configured
Jan 27 19:47:43 master snmpd[47102]: SIOCGIFDESCR (vlan0): Device not configured
Jan 27 19:47:43 master kernel: lagg0: promiscuous mode enabled
Jan 27 19:47:43 master kernel: lagg0_vlan10: promiscuous mode enabled
Jan 27 19:47:43 master check_reload_status: Restarting ipsec tunnels
Jan 27 19:47:43 master kernel: carp: demoted by 240 to 240 (interface down)
Jan 27 19:47:43 master kernel: carp: demoted by 240 to 480 (interface down)
Jan 27 19:47:45 master check_reload_status: updating dyndns opt2
Jan 27 19:47:45 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:45 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:45 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:45 master kernel: carp: demoted by -240 to 240 (vhid removed)
Jan 27 19:47:45 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:45 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:45 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:45 master kernel: carp: demoted by -240 to 0 (vhid removed)
Jan 27 19:47:45 master kernel: lagg0: promiscuous mode disabled
Jan 27 19:47:45 master kernel: lagg0_vlan10: promiscuous mode disabled
Jan 27 19:47:46 master snmpd[47102]: SIOCGIFDESCR (lagg0_vlan20): Device not configured
Jan 27 19:47:46 master kernel: lagg0: promiscuous mode enabled
Jan 27 19:47:46 master kernel: lagg0_vlan10: promiscuous mode enabled
Jan 27 19:47:46 master kernel: carp: demoted by 240 to 240 (interface down)
Jan 27 19:47:46 master kernel: carp: demoted by 240 to 480 (interface down)
Jan 27 19:47:46 master kernel: vlan1: changing name to 'lagg0_vlan20'
Jan 27 19:47:46 master snmpd[47102]: SIOCGIFDESCR (vlan1): Device not configured
Jan 27 19:47:59 master php-fpm[94047]: /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
Jan 27 19:47:59 master check_reload_status: Reloading filter

If that is normal/expected then we can close this.
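
To get an overview of which messages dominate a log excerpt like the one above, a minimal sketch such as the following can tally identical messages per program. It is not part of the original report; the helper name `tally` and the syslog-prefix regular expression are assumptions for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed syslog layout: "Jan 27 19:47:40 master kernel: message"
# with an optional "[pid]" after the program name.
LINE_RE = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ ([^:\[]+)(?:\[\d+\])?: (.*)$")


def tally(lines):
    """Count identical (program, message) pairs."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


# A few lines copied from the log in this ticket.
SAMPLE_LOG = """\
Jan 27 19:47:43 master snmpd[47102]: SIOCGIFDESCR (lagg0_vlan10): Device not configured
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0_vlan10: 3
Jan 27 19:47:43 master kernel: carp: demoted by -240 to 240 (vhid removed)
"""

counts = tally(SAMPLE_LOG.splitlines())
# Print the most frequent messages first.
for (program, message), n in counts.most_common():
    print(n, program, message)
```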

#11 Updated by Luiz Otavio O Souza 5 months ago

Yes, the messages do not seem related to the original bug (crash at ifconfig laggX destroy).

Let's open a new ticket to track these warnings.

#12 Updated by Luiz Otavio O Souza 5 months ago

  • Status changed from Feedback to Resolved
