VT race condition panic at boot on ESXi 6.5.0U1 and FreeBSD 11.1 base
Some users occasionally encounter a panic in vga_bitblt_text() during OS hardware detection on 2.4 running under ESXi 6.5.0 U1 (Build 6765664). Because the panic occurs before the handoff to our code, DDB is not yet configured, so the VM drops to a db> prompt and waits for input. The crash is intermittent: it does not affect every VM or every boot, only a small number of reboot attempts. It happens before disks are mounted, so filesystem corruption is not a concern.
This appears to be a confirmed FreeBSD issue:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217282 (has a patch available, and it's in -CURRENT)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220923 (appears to be a duplicate of 217282)
From reading those bug reports, it appears to be a race condition in the VT code.
A possible workaround is to set debug.debugger_on_panic=0 in /boot/loader.conf.local and then configure a tunable in the pfSense GUI to set debug.debugger_on_panic=1 so that unrelated crash dumps can be collected afterward. That will not stop the panic, but it will allow the VM to reboot itself until it succeeds.
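As a rough sketch of the loader side of that workaround, the fragment below shows what the entry might look like. Only the file path and the setting come from this ticket; the comments are illustrative.

```
# /boot/loader.conf.local
# Do not drop to the db> prompt on panic, so a VM that hits the
# early-boot panic reboots itself instead of waiting for input.
debug.debugger_on_panic=0
```

The matching debug.debugger_on_panic=1 tunable would then be set from the pfSense GUI, as described above, so it takes effect later in boot.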
If the crash is in VT, another possible solution would be to switch affected VMs to the sc console by setting kern.vty=sc in /boot/loader.conf.local.
So far, across 12 VMs on 2.4.x and FreeBSD 11, I have only managed to make it happen once on my lab ESXi host. See the attached image for the backtrace.
#3 Updated by Jim Pingle about 2 years ago
For anyone experiencing this crash in the meantime, adding kern.vty=sc to /boot/loader.conf.local is confirmed to work around the issue. This can also be added to /boot/loader.conf.local before upgrade if someone is worried they may encounter this race condition.
Once a patched version is available in a release, that change will no longer be necessary.
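For reference, a minimal sketch of that entry, assuming the line in question is the kern.vty=sc setting named in the test notes on this ticket:

```
# /boot/loader.conf.local
# Use the older sc (syscons) console driver instead of vt,
# avoiding the VT code path where the race occurs.
kern.vty=sc
```

After a reboot, running sysctl kern.vty from a shell should report which console driver is active. Treat that check as an assumption about the running system, not something verified in this ticket.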
#12 Updated by Jim Pingle about 2 years ago
- Status changed from Assigned to Resolved
I ran some more tests:
kern.vty=sc ADDED to /boot/loader.conf.local: 72 reboots (6 VMs, 12 reboots each), no crashes
kern.vty=sc REMOVED from /boot/loader.conf.local: 72 reboots (6 VMs, 12 reboots each), no crashes
So that's 144 crash-free reboots total on 2.4.1, and half of those should have met the conditions to trigger the VT race if it was still a problem.
I was hoping to reproduce it again to see if it was related, but now I'm not seeing it either way.
If we can manage to reproduce the conditions for that swi/clock crash we can open a new ticket for it.