Bug #4803

closed

config.xml is empty if power loss or panic happens shortly after config write

Added by dem co almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
High
Category:
Operating System
Target version:
Start date:
06/30/2015
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
Release Notes:
Affected Version:
All
Affected Architecture:

Description

On version 2.2.3 nanobsd with the filesystem kept permanently read-write (because conf_mount_ro() takes more than 3 minutes on a CF card), config.xml can get corrupted when power is lost after saving configuration changes.

The situation can be reproduced on an ESXi VM:
1. Convert the nanobsd image to a vmdk image and create a VM from that image. No installation is needed.
2. Starting from the factory default configuration, enable permanent read-write, then reboot using the pfSense command.
3. Make a change, such as adding a firewall rule or creating a captive portal.
4. Click Save. After iostat shows that disk activity has finished, reset power using the VM command.
5. The next boot shows:
- /conf/config.xml:1: parser error: Document is empty.
- config.xml is restored from backup, but the last changes are lost.
- warnings about missing timezone settings.
- warnings about wrong function parameters in config.lib.inc

Version 2.2.3 64-bit nanobsd is affected, while version 2.2.2 is not affected by either the file corruption or the slow conf_mount_ro().

Actions #1

Updated by Kill Bill almost 9 years ago

dem co wrote:

3 minutes+ waiting time when running conf_mount_ro() on CF card).

That's due to removal of this patch - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=176169 (Bug #2401). And yeah, this is a complete performance disaster on CF-based systems. Might as well reopen #2401.

Actions #2

Updated by Jim Pingle almost 9 years ago

  • Subject changed from Unsafe power off conrrupt config.xml to config.xml is empty if power loss or panic happens shortly after config write
  • Status changed from New to Confirmed
  • Target version set to 2.2.4

This does not appear to be specific to NanoBSD or even sync on the filesystem.

I can replicate this by causing a panic just after a config save on a full install with sync on the filesystem. If a minute or two has passed before the power loss or crash, the config is fine, so it does appear to be somewhat related to filesystem data not being fully flushed to disk at the time.

The timestamp on the 0-byte config.xml is from before the reboot.

Actions #3

Updated by Chris Buechler almost 9 years ago

  • Assignee set to Renato Botelho
  • Affected Version changed from 2.2.3 to All

Actions #4

Updated by Chris Buechler almost 9 years ago

#4814 opened re: the regression of #2401 for the slow ro->rw mount issue discussed here.

Actions #5

Updated by Jim Thompson almost 9 years ago

This needs similar work (and a PHP extension, because fsync() isn't possible via PHP) to what fixed the corruption of /etc/master.passwd and /etc/group.

Actions #6

Updated by Kill Bill almost 9 years ago

Jim Thompson wrote:

This needs similar work (and a PHP extension, because fsync() isn't possible via PHP) to what fixed the corruption of /etc/master.passwd and /etc/group.

Well, fsync()/fdatasync() is definitely possible with devel/pecl-eio (http://docs.php.net/manual/en/function.eio-fsync.php, http://docs.php.net/manual/en/function.eio-fdatasync.php)

Actions #7

Updated by Renato Botelho almost 9 years ago

  • Status changed from Confirmed to Feedback
  • % Done changed from 0 to 100

Please try the next round of snapshots. A pfSense_fsync was implemented and is being used to make the config.xml save operation safer.
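
For readers unfamiliar with the technique, here is a minimal sketch of a crash-safe file write built around fsync(). It is written in Python purely for illustration and is not the pfSense_fsync code, which is not shown in this ticket; the function name `write_file_durably` is hypothetical. It assumes a POSIX system where a directory can be opened and fsync'ed.

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves the old or the new file, not an empty one.

    Hypothetical helper for illustration only; not taken from pfSense.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffer into the kernel
            os.fsync(tmp.fileno())  # ask the kernel to put the file data on disk
        os.replace(tmp_path, path)  # atomically rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself is recorded on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `write_file_durably("/tmp/example/config.xml", b"<pfsense/>")` (an example path) would write the new content to a temporary file, flush it, and only then swap it into place.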

Actions #8

Updated by Chris Buechler almost 9 years ago

This looks to be fixed. It is up to 15 cycles with no issues in a circumstance that would fail at least 50% of the time before. Leaving the power cycle test rig running in a loop overnight.

Actions #9

Updated by Chris Buechler almost 9 years ago

The config.xml portion was fine with Renato's change, but it missed other parts of /cf/conf/. Jim T's earlier change gets the entire directory. It's been through over 500 power cycles with no issues. Just need to verify again with the latest snapshot build, as the tested system was patched. That's running now.
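
As an illustration of what syncing an entire directory can look like, the sketch below walks a directory tree and calls fsync() on every regular file and every directory in it. It is an assumption-laden Python example, not Jim T's actual change (which is not shown in this ticket); the name `fsync_tree` is hypothetical, and it assumes a POSIX system that allows fsync() on a descriptor opened read-only.

```python
import os


def fsync_tree(root: str) -> int:
    """fsync every regular file and directory under `root`; return how many were synced.

    Hypothetical helper for illustration only; not taken from pfSense.
    """
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            fd = os.open(full, os.O_RDONLY)
            try:
                os.fsync(fd)  # flush this file's data to disk
            finally:
                os.close(fd)
            synced += 1
        # Sync the directory itself so new or renamed entries are recorded.
        dir_fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
        synced += 1
    return synced
```

The return value is only a count for logging or testing; the useful effect is the fsync() calls themselves.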

Actions #10

Updated by Chris Buechler almost 9 years ago

  • Status changed from Feedback to Resolved

I'm confident in this. Snapshots including all relevant changes have been through the config_write loop torture test, which drops power in the middle of writing the config repeatedly in a loop, upwards of 600 times with no ill effects.
