Bug #7925: VT race condition panic at boot on ESXi 6.5.0U1 and FreeBSD 11.1 base - pfSense - pfSense bugtracker

Bug #7925

closed

VT race condition panic at boot on ESXi 6.5.0U1 and FreeBSD 11.1 base

Added by Jim Pingle over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: Luiz Souza
Category: Operating System
Target version: 2.4.1
Start date: 10/11/2017
Due date:
% Done: 100%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.4.x
Affected Architecture: amd64

Description

Some users occasionally encounter a panic in vga_bitblt_text() during OS hardware detection on 2.4 running under ESXi 6.5.0 U1 (Build 6765664), before handoff to our code. Because it happens before that handoff, DDB is not yet configured, so the VM drops to a db> prompt and waits for input. The crash is unusual in that it does not happen to every VM at every boot: it is random and only affects a small number of reboot attempts. The crash happens before disks are mounted, so filesystem corruption is not a concern.

This appears to be a confirmed FreeBSD issue:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217282 (has a patch available, and it's in -CURRENT)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220923 (appears to be a duplicate of 217282)

From reading those bug reports, it appears to be a race condition in the VT code.

A possible workaround is to set debug.debugger_on_panic=0 in /boot/loader.conf.local and then configure a tunable in the pfSense GUI to set debug.debugger_on_panic=1 so that unrelated crash dumps can be collected afterward. That will not stop the panic, but it will allow the VM to reboot itself until it succeeds.
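As a rough sketch, the loader-side half of that workaround might look like the fragment below. The quoted value follows the usual loader.conf convention and is an assumption, since the report only gives the bare name=value form. The GUI tunable that sets the value back to 1 is configured separately and is not shown here.

```
# /boot/loader.conf.local
# Do not stop at the db> prompt on panic during early boot; let the VM reboot instead.
debug.debugger_on_panic="0"
```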

If the crash is in VT, another possible solution would be to switch affected VMs to the sc console by setting kern.vty=sc in /boot/loader.conf.local.
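For illustration, the console switch could be written as the following fragment. The quoting style is an assumption based on the usual loader.conf convention; the setting name and value are the ones given above.

```
# /boot/loader.conf.local
# Use the older syscons (sc) console driver instead of vt.
kern.vty="sc"
```

The setting is read by the loader, so it only takes effect on the next boot.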

So far, across 12 VMs on 2.4.x and FreeBSD 11, I have only managed to make it happen once on my lab ESXi host. See the attached image for the backtrace.

Files

Selection_704.png (15.6 KB), Jim Pingle, 10/11/2017 09:21 AM
pfsense_panic.png (332 KB), Gianluca Toso, 10/12/2017 05:47 PM
vm_bug.png (86.9 KB), Constantine Kormashev, 10/18/2017 03:37 AM
vm_bug2.png (122 KB), Constantine Kormashev, 10/18/2017 03:37 AM
Selection_709.png (14.6 KB), Jim Pingle, 10/18/2017 09:23 AM

Updated by Jim Pingle over 7 years ago

Description updated

Updated by Luiz Souza over 7 years ago

Status changed from Confirmed to Feedback
% Done changed from 0 to 100

The fix is already merged and will be available in the next snapshot.

Updated by Jim Pingle over 7 years ago

For anyone experiencing this crash in the meantime, adding kern.vty=sc to /boot/loader.conf.local is confirmed to work around the issue. This can also be added to /boot/loader.conf.local before upgrade if someone is worried they may encounter this race condition.

Once a patched version is available in a release, that change will no longer be necessary.

Updated by Gianluca Toso over 7 years ago

File pfsense_panic.png added

For information, the same problem occurs in Workstation 12.5.7 (build 5813279), VM hardware version 11.
It happened to me 3 consecutive times and then not again, despite several restarts.

Updated by Jim Pingle over 7 years ago

For reference, at least one person appears to have encountered it on ESX 5.5 as well, though the majority of users are only seeing it on 6.5.0 U1.

Updated by Jim Pingle over 7 years ago

I can't reproduce this on 2.4.1 snapshots, but it was so random before that this doesn't give me much confidence.

Anyone else experiencing the issue can try upgrading to a 2.4.1 snapshot to see if it still crashes.

Updated by Constantine Kormashev over 7 years ago

File vm_bug.png added
File vm_bug2.png added

Tried the latest 2.4.1 OVA on 2 different ESXi hosts, rebooting each VM 20 times. Got the error once, on the 2nd VM.

Updated by Jim Pingle over 7 years ago

File Selection_709.png added
Status changed from Feedback to Assigned

Ditto, I see a similar crash. I had to reboot 5 VMs a few times before one of them failed.

Updated by Luiz Souza over 7 years ago

The recent crashes seem unrelated to the original crash in VT.

They actually seem to happen too late in the kernel boot to be related to a VT crash.

We should open a new issue to track this new crash.

#10

Updated by Luiz Souza over 7 years ago

OK, I now see the two different crashes in the original post.

While I take back part of what I said before, it still doesn't look related to VT.

#11

Updated by Jim Pingle over 7 years ago

To rule that out, we should set up the kern.vty=sc workaround and continue testing for a bit to see if it still crashes. If it does, then it must be something new.

#12

Updated by Jim Pingle over 7 years ago

Status changed from Assigned to Resolved

I ran some more tests:

kern.vty=sc ADDED to /boot/loader.conf.local: 72 reboots (6 VMs, 12 reboots each), no crashes
kern.vty=sc REMOVED from /boot/loader.conf.local: 72 reboots (6 VMs, 12 reboots each), no crashes

So that's 144 crash-free reboots total on 2.4.1, and half of those should have met the conditions to trigger the VT race if it was still a problem.
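To put the 72 reboots without the workaround in perspective, here is a minimal Python sketch of how likely a crash-free run would be if the race were still present. It assumes each boot crashes independently with the same probability; the rates in the loop are hypothetical, since no measured crash rate is given in this ticket, and prob_no_crash is an illustrative helper name.

```python
def prob_no_crash(p: float, boots: int) -> float:
    """Probability of zero crashes in `boots` independent boots,
    given a per-boot crash probability `p`."""
    return (1.0 - p) ** boots


# Reboots run with kern.vty=sc removed: 6 VMs, 12 reboots each.
BOOTS_WITHOUT_WORKAROUND = 6 * 12

# Hypothetical per-boot crash rates (assumptions, not measurements).
for p in (0.01, 0.025, 0.05):
    chance = prob_no_crash(p, BOOTS_WITHOUT_WORKAROUND)
    print(f"p={p:.3f}: chance of {BOOTS_WITHOUT_WORKAROUND} clean boots = {chance:.3f}")
```

The lower the assumed crash rate, the less a clean run of this size can rule out, which is why a rare race like this is hard to declare fixed from reboot testing alone.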

I was hoping to reproduce it again to see if it was related, but now I'm not seeing it either way.

If we can manage to reproduce the conditions for that swi/clock crash we can open a new ticket for it.

#13

Updated by Nicolas Liaudat over 7 years ago

Jim Pingle wrote:

For reference, at least one person appears to have encountered it on ESX 5.5 as well, though the majority of users are only seeing it on 6.5.0 U1.

Problem confirmed on ESXi 6.0.
