Bug #4803

config.xml is empty if power loss or panic happens shortly after config write

Added by dem co over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Category: Operating System
Target version:
Start date: 06/30/2015
Due date:
% Done: 100%
Estimated time:
Affected Version: All
Affected Architecture:

Description

When running ver 2.2.3 nanobsd with the filesystem kept permanently read-write (due to 3 minutes+ waiting time when running conf_mount_ro() on CF card), config.xml can get corrupted when power is lost after saving configuration changes.

The situation can be reproduced on an ESXi VM:
1. Convert the nanobsd image to a vmdk image and create a VM using the image. No installation.
2. From the factory default configuration, enable permanent read-write. Reboot using the pfSense command.
3. Make changes such as adding a firewall rule or creating a captive portal.
4. Click Save. After iostat shows that disk activity has finished, power reset using the VM command.
5. The reboot will show:
- /conf/config.xml:1: parser error: Document is empty.
- config.xml gets restored from backup, but with the last changes lost.
- warnings about missing timezone settings.
- warnings about wrong function parameters in config.lib.inc

Version 2.2.3 64-bit nanobsd is affected, while version 2.2.2 is not affected by the file corruption or the slow conf_mount_ro().
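
The boot-time behavior in step 5 (an empty config.xml being replaced from a backup) can be pictured with a short sketch. This is not pfSense's code, which is PHP. The Python below only illustrates the idea, and the function names and the flat backup directory layout are assumptions made for the example.

```python
import os
import xml.etree.ElementTree as ET


def is_usable_config(path):
    """Return True if path exists, is non-empty, and parses as XML."""
    try:
        if os.path.getsize(path) == 0:
            return False  # the 0-byte case reported in this ticket
        ET.parse(path)
        return True
    except (OSError, ET.ParseError):
        return False


def newest_usable_backup(backup_dir):
    """Return the most recently modified usable backup, or None.

    Assumes (hypothetically) that backups are plain files in one directory.
    """
    try:
        names = os.listdir(backup_dir)
    except OSError:
        return None
    candidates = [os.path.join(backup_dir, n) for n in names]
    candidates = [p for p in candidates if os.path.isfile(p)]
    candidates.sort(key=os.path.getmtime, reverse=True)
    for path in candidates:
        if is_usable_config(path):
            return path
    return None
```

A fallback like this recovers a bootable configuration, but as the report notes, the most recent changes are still lost. It is a safety net rather than a fix for the underlying write problem.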

Associated revisions

Revision de7ae0bb (diff)
Added by Renato Botelho over 4 years ago

Use right function pfSense_fsync to make sure config file is safe on disk, ticket #4803

Revision d0577bd2 (diff)
Added by Renato Botelho over 4 years ago

Use right function pfSense_fsync to make sure config file is safe on disk, ticket #4803

Revision b318432e (diff)
Added by Renato Botelho over 4 years ago

Make sure temporary config file is safe on disk before rename, ticket #4803

Revision a83602e8 (diff)
Added by Renato Botelho over 4 years ago

Make sure temporary config file is safe on disk before rename, ticket #4803

Revision 38b35612 (diff)
Added by Renato Botelho over 4 years ago

Make sure config.xml is safe on disk when restoring a backup, ticket #4803

Revision 7c771d19 (diff)
Added by Renato Botelho over 4 years ago

Make sure config.xml is safe on disk when restoring a backup, ticket #4803

History

#1 Updated by Kill Bill over 4 years ago

dem co wrote:

3 minutes+ waiting time when running conf_mount_ro() on CF card).

That's due to removal of this patch - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=176169 (Bug #2401). And yeah, this is a complete performance disaster on CF-based systems. Might as well reopen #2401.

#2 Updated by Jim Pingle over 4 years ago

  • Subject changed from Unsafe power off conrrupt config.xml to config.xml is empty if power loss or panic happens shortly after config write
  • Status changed from New to Confirmed
  • Target version set to 2.2.4

This does not appear to be specific to NanoBSD or even sync on the filesystem.

I can replicate this by causing a panic just after config save on a full install with sync on the filesystem. If a minute or two has passed before the power loss or crash, the config is fine, so it does appear to be somewhat related to filesystem data not being fully flushed to disk at the time.

Checking the timestamp on the 0-byte config.xml, it is from before the reboot.

#3 Updated by Chris Buechler over 4 years ago

  • Assignee set to Renato Botelho
  • Affected Version changed from 2.2.3 to All

#4 Updated by Chris Buechler over 4 years ago

#4814 opened re: the regression of #2401 for the slow ro->rw mount issue discussed here.

#5 Updated by Jim Thompson over 4 years ago

This needs similar work (and a PHP extension, because fsync() isn't possible via PHP) to what fixed the corruption of /etc/master.passwd and /etc/group.

#6 Updated by Kill Bill over 4 years ago

Jim Thompson wrote:

This needs similar work (and a PHP extension, because fsync() isn't possible via PHP) to what fixed the corruption of /etc/master.passwd and /etc/group.

Well, fsync()/fdatasync() is definitely possible with devel/pecl-eio (http://docs.php.net/manual/en/function.eio-fsync.php, http://docs.php.net/manual/en/function.eio-fdatasync.php)

#7 Updated by Renato Botelho over 4 years ago

  • Status changed from Confirmed to Feedback
  • % Done changed from 0 to 100

Please try the next round of snapshots. A pfSense_fsync function was implemented and is being used to make the config.xml save operation safer.
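
For readers unfamiliar with the pattern the commit messages describe (make the temporary file safe on disk, then rename it over the real one), here is a minimal sketch in Python. pfSense_fsync is a PHP extension function whose implementation is not shown in this ticket. The name write_file_durably is hypothetical, and the final directory fsync is an assumption about what a careful implementation would do on a POSIX system.

```python
import os
import tempfile


def write_file_durably(path, data):
    """Write bytes to path so a crash leaves either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # drain Python's userspace buffer
            os.fsync(tmp.fileno())  # ask the kernel to push file data to disk
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: also fsync the directory so the rename itself is recorded.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync before the rename, the new name can reach the disk before the file's contents do, which is consistent with the 0-byte config.xml seen after a power loss in this ticket.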

#8 Updated by Chris Buechler over 4 years ago

This looks to be fixed. Up to 15 cycles with no issues in a circumstance that would fail at least 50% of the time before. Leaving the power cycle test rig running in a loop overnight.

#9 Updated by Chris Buechler over 4 years ago

The config.xml portion was fine with Renato's change, but missed other parts of /cf/conf/. Jim T's earlier change gets the entire directory. It's been through over 500 power cycles, with no issues. Just need to verify again with latest snapshot build, as the tested system was patched. That's running now.
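
The change that covers the entire directory is not shown in this ticket, so the following is only a guess at the general idea: flush every file under a directory, plus the directories themselves, rather than a single file. The function names are hypothetical, and the sketch assumes a POSIX system where fsync() is permitted on a read-only descriptor.

```python
import os


def _fsync_path(path):
    """Open path read-only, fsync it, and close it again."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root):
    """fsync every regular file and directory under root.

    Returns the number of paths synced. Symbolic links are skipped.
    """
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            _fsync_path(full)
            synced += 1
        _fsync_path(dirpath)  # record creations and renames in this directory
        synced += 1
    return synced
```

Calling fsync_tree("/cf/conf") after a configuration write would cover files other than config.xml, at the cost of more disk activity per save.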

#10 Updated by Chris Buechler over 4 years ago

  • Status changed from Feedback to Resolved

I'm confident in this. Snapshots including all relevant changes have been through the config_write loop torture test, dropping power in the middle of writing the config repeatedly in a loop, upwards of 600 times with no ill effects.
