Project

General

Profile

Actions

Bug #4822

closed

nanobsd corruption issues after unclean shut down when rw mounted and SU

Added by Kill Bill almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Very High
Assignee:
-
Category:
Operating System
Target version:
Start date:
07/09/2015
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
Release Notes:
Affected Version:
2.2.3
Affected Architecture:
All

Description

https://forum.pfsense.org/index.php?topic=96326.0

Trying to mount root from ufs:/dev/ufs/pfsense0 [ro,sync,noatime]...
WARNING: / was not properly dismounted
Configuring crash dumps...
** SU+J Recovering /dev/ufs/pfsense0
** Reading 7565312 byte journal from inode 4.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
/dev/ufs/pfsense0: Block missing from ino 8254 at lbn 18
/dWARNING: /cf was not properly dismounted
ev/ufs/pfsense0: UNEXPECTED SU+J INCONSISTENCY
/dev/ufs/pfsense0: INTERNAL ERROR: GOT TO reply()
/dev/ufs/pfsense0: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
Filesystems are clean, continuing...
Mounting filesystems...
mount: /dev/ufs/pfsense0: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted
mount: /dev/ufs/cf: R/W mount of /cf denied. Filesystem is not clean - run fsck.: Operation not permitted
Setting up memory disks... done.
Disabling APM on /dev/ad0
pwd_mkdb: /etc//pwd.db.tmp: Read-only file system

     ___
 ___/ f \
/ p \___/ Sense
\___/   \
    \___/

Welcome to pfSense 2.2.4-DEVELOPMENT  ...

cp: /var/db/pbi/keys/pfSense-BETA.ssl: Read-only file system
Creating symlinks......ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout
done.
cap_mkdb: /etc/login.conf.db: Read-only file system
>>> Under 512 megabytes of ram detected.  Not enabling APC.

There is freezes forever, until you CTRL+C out of it and force the fsck to run manually.

^CJul  9 11:04:50 init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
# /sbin/fsck -p -y /dev/ufs/pfsense0
** SU+J Recovering /dev/ufs/pfsense0
** Reading 7565312 byte journal from inode 4.
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
/dev/ufs/pfsense0: Block missing from ino 8254 at lbn 18
/dev/ufs/pfsense0: UNEXPECTED SU+J INCONSISTENCY
/dev/ufs/pfsense0: INTERNAL ERROR: GOT TO reply()
/dev/ufs/pfsense0: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
# /sbin/fsck -y /dev/ufs/pfsense0
** /dev/ufs/pfsense0

USE JOURNAL? yes

** SU+J Recovering /dev/ufs/pfsense0
** Reading 7565312 byte journal from inode 4.

RECOVER? yes

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
Block missing from ino 8254 at lbn 18
UNEXPECTED SU+J INCONSISTENCY

FALLBACK TO FULL FSCK? yes

** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
11510 files, 732309 used, 1127049 free (2049 frags, 140625 blocks, 0.1% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
# /sbin/reboot

Wonderful. :-(

Actions #1

Updated by Kill Bill almost 9 years ago

http://lists.freebsd.org/pipermail/freebsd-fs/2014-April/019253.html

Can we get rid of the journal "improvement" ASAP?

Actions #2

Updated by Phillip Davis almost 9 years ago

Here's one from my Alix at home that happened a while ago, but I thought it might have had a flakey CF card and I did not see the problem again. There had been a power failure (yes, we have them in Nepal, and my home UPS is not perfect at always transitioning the weird voltages and attempts at a sine wave that our mains power gives). Power feed from the UPS has been lost for some fraction of a second. The Alix powered up again. I waited... and no internet. I attached a cable to the console and press enter, the "#" prompt appeared and I dutifully typed "exit" here is the rest of the story:

@# exit
Configuring crash dumps...
  • SU+J Recovering /dev/ufs/pfsense1
  • Reading 7565312 byte journal from inode 4.
  • Building recovery table.
  • Resolving unreferenced inode list.
  • Processing journal entries.
    /dev/ufs/pfsense1: Block missing from ino 40918 at lbn 14
    /dev/ufs/pfsense1: UNEXPECTED SU+J INCONSISTENCY
    /dev/ufs/pfsense1: INTERNAL ERROR: GOT TO reply()
    /dev/ufs/pfsense1: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
    Filesystems are clean, continuing...
    Mounting filesystems...
    mount: /dev/ufs/pfsense1: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted
    mount: /dev/ufs/cf: R/W mount of /cf denied. Filesystem is not clean - run fsck.: Operation not permitted
    Setting up memory disks... done.
    Disabling APM on /dev/ad0
    (pass0:ata0:0:0:0): SETFEATURES. ACB: ef 85 00 00 00 40 00 00 00 00 00 00
    (pass0:ata0:0:0:0): CAM status: ATA Status Error
    (pass0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
    (pass0:ata0:0:0:0): RES: 51 04 00 00 00 00 00 00 00 00 00
    Failed to configure APM: No such file or directory
    pwd_mkdb: /etc//pwd.db.tmp: Read-only file system

_
/ f \
/ p \
__/ Sense
\___/ \
\___/

Welcome to pfSense 2.2.3-DEVELOPMENT ...

cp: /var/db/pbi/keys/pfSense-BETA.ssl: Read-only file system
Creating symlinks......ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout
done.
cap_mkdb: /etc/login.conf.db: Read-only file system

Under 512 megabytes of ram detected. Not enabling APC.

External config loader 1.0 is now starting...
Launching the init system...rm: /cf/conf/backup/backup.cache: Read-only file system
done.
Initializing...................... done.
Disk is dirty. Running fsck -y
Starting device manager (devd)...done.
Loading configuration......done.
Updating configuration...done.
Cleaning backup cache...done.
Setting up extended sysctls...done.
/usr/local/pkg/pfblockerng/pfblockerng.sh: not found
Setting timezone...done.
Configuring loopback interface...done.
Starting syslog...done.
Starting Secure Shell Services...done.
Setting up polling defaults...done.
Setting up interfaces microcode...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring QinQ interfaces...done.
Configuring WAN_NCELL interface...done.
Configuring LAN interface...done.
Configuring OPT1_LUMBINI interface...done.
Configuring OPT2WIFI interface...done.
Configuring OPT3VLAN20 interface...done.
Configuring OPT4VLAN30 interface...done.
Configuring OPT5VLAN40 interface...done.
Configuring OPT6VLAN50 interface...done.
Configuring OPT7VLAN60 interface...done.
Configuring CARP settings...done.
Syncing OpenVPN settings...route: writing to routing socket: No such process
done.
Configuring firewall..DUMMYNET 0 with IPv6 initialized (100409)
load_dn_sched dn_sched FIFO loaded
load_dn_sched dn_sched QFQ loaded
load_dn_sched dn_sched RR loaded
load_dn_sched dn_sched WF2Q+ loaded
load_dn_sched dn_sched PRIO loaded
....done.
Starting PFLOG...done.
Setting up gateway monitors...done.
Synchronizing user settings...done.
Starting webConfigurator...done.
Configuring CRON...done.
Starting DNS Resolver...
@

From memory, I had to press "enter" twice after the "Under 512 megabytes" message.

I have a system in Jumla (remote district - look it up) that got "stuck" last week. They could ping to pfSense LAN but no DHCP, no internet, no webGUI response, no SSH response... We sent them up another Alix with their config on it all ready to go. They put that in, powered it up, and have internet... again. I am waiting to receive the old Alix back so I can see what it is doing - I really expect it to be sitting in the boot somewhere complaining about some fsck + journal thing. I will post that when I know (might be a few days depending on monsoon rains and when flights actually go).

Actions #3

Updated by Chris Buechler almost 9 years ago

  • Subject changed from fsck broken, rendering box read-only, unable to boot and inaccessible via network to nanobsd corruption issues after unclean shut down when rw mounted and SU
  • Status changed from New to Feedback
  • Affected Version changed from 2.2.x to 2.2.3

updated subject to actual issue. SU+J was reverted in nanobsd today after verifying an APU made it through hundreds of power cycles when left rw in that circumstance with no ill effects. At 463 power cycles with that change at the moment and it's fine. Leaving for confirmation on latest snapshots.

Actions #4

Updated by Kill Bill almost 9 years ago

SU+J gone -> sanity restored. Good riddance.

Actions #5

Updated by Phillip Davis almost 9 years ago

I got the Alix back from Jumla. The replacement came up first time - thank goodness for AutoConfigBackup and being able to easily grab the latest version of the config to install on spare...
The old one has the trouble with the password database:
Immelasak Immelasak pfSense
2 pfSense

F6 PXE
Boot: 1
/boot/config: -S9600 -hConsoles: serial port
BIOS drive C: is disk0
BIOS 640kB/261120kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(root@pfs22-i386-builder, Tue Jun 23 16:57:02 CDT 2015)
Loading /boot/defaults/loader.conf
/boot/kernel/kernel text=0x12183a7 data=0x833648+0x279f60 syms=[0x4+0xf5430+0x4+0x16e5c1]

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.1-RELEASE-p13 #0 c77d1b2(releng/10.1)-dirty: Tue Jun 23 17:06:06 CDT 2015
root@pfs22-i386-builder:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_wrap.10.i386 i386
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
VT: running with driver "vga".
CPU: Geode(TM) Integrated Processor by AMD PCS (498.06-MHz 586-class CPU)
Origin = "AuthenticAMD" Id = 0x5a2 Family = 0x5 Model = 0xa Stepping = 2
Features=0x88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CLFLUSH,MMX>
AMD Features=0xc0400000<MMX+,3DNow!+,3DNow!>
real memory = 268435456 (256 MB)
avail memory = 226222080 (215 MB)
pnpbios: Bad PnP BIOS data checksum
random device not loaded; using insecure entropy
ipw_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw/.
ipw_bss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_bss_fw, 0xc08122b0, 0) error 1
ipw_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw/.
ipw_ibss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_ibss_fw, 0xc0812360, 0) error 1
ipw_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw/.
ipw_monitor: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_monitor_fw, 0xc0812410, 0) error 1
iwi_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi/.
iwi_bss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_bss_fw, 0xc083aa40, 0) error 1
iwi_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi/.
iwi_ibss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_ibss_fw, 0xc083aaf0, 0) error 1
iwi_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi/.
iwi_monitor: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_monitor_fw, 0xc083aba0, 0) error 1
wlan: mac acl policy registered
kbd0 at kbdmux0
K6-family MTRR support enabled (2 registers)
random: <Software, Yarrow> initialized
ACPI BIOS Error (bug): A valid RSDP was not found (20130823/tbxfroot-223)
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
cryptosoft0: <software crypto> on motherboard
padlock0: No ACE support.
pcib0 pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
Geode LX: PC Engines ALIX.2 v0.99h tinyBIOS V1.4a (C)1997-2007
pci0: <encrypt/decrypt, entertainment crypto> at device 1.2 (no driver attached)
vr0: <VIA VT6105M Rhine III 10/100BaseTX> port 0x1000-0x10ff mem 0xe0000000-0xe00000ff irq 10 at device 9.0 on pci0
vr0: Quirks: 0x2
vr0: Revision: 0x96
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> PHY 1 on miibus0
ukphy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
vr1: <VIA VT6105M Rhine III 10/100BaseTX> port 0x1400-0x14ff mem 0xe0040000-0xe00400ff irq 11 at device 10.0 on pci0
vr1: Quirks: 0x2
vr1: Revision: 0x96
miibus1: <MII bus> on vr1
ukphy1: <Generic IEEE 802.3u media interface> PHY 1 on miibus1
ukphy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
vr2: <VIA VT6105M Rhine III 10/100BaseTX> port 0x1800-0x18ff mem 0xe0080000-0xe00800ff irq 15 at device 11.0 on pci0
vr2: Quirks: 0x2
vr2: Revision: 0x96
miibus2: <MII bus> on vr2
ukphy2: <Generic IEEE 802.3u media interface> PHY 1 on miibus2
ukphy2: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
isab0: <PCI-ISA bridge> port 0x6000-0x6007,0x6100-0x61ff,0x6200-0x623f,0x9d00-0x9d7f,0x9c00-0x9c3f at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <AMD CS5536 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 15.2 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
ohci0: <OHCI (generic) USB controller> mem 0xefffe000-0xefffefff irq 12 at device 15.4 on pci0
usbus0 on ohci0
ehci0: <AMD CS5536 (Geode) USB 2.0 controller> mem 0xefffd000-0xefffdfff irq 12 at device 15.5 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
cpu0 on motherboard
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xe0000-0xea7ff pnpid ORM0000 on isa0
atrtc0: <AT realtime clock> at port 0x70 irq 8 on isa0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> at port 0x40 on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
ppc0: parallel port not found.
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
uart0: console (9600,n,8,1)
uart1: <16550 or compatible> at port 0x2f8-0x2ff irq 3 on isa0
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: <AMD> at usbus0
uhub0: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <AMD> at usbus1
uhub1: <AMD EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SDCFH2-002G HDX 4.32> CFA-4 device
ada0: Serial Number 012604J2208I1945
ada0: 16.700MB/s transfers (WDMA2, PIO 2048bytes)
ada0: 1953MB (4001760 512 byte sectors: 16H 63S/T 3970C)
ada0: Previously was known as ad0
random: unblocking device.
Timecounter "TSC" frequency 498060505 Hz quality 800
Root mount waiting for: usbus1 usbus0
uhub0: 4 ports with 4 removable, self powered
Root mount waiting for: usbus1
uhub1: 4 ports with 4 removable, self powered
Trying to mount root from ufs:/dev/ufs/pfsense0 [ro,sync,noatime]...
Configuring crash dumps...
/dev/ufs/pfsense0: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/pfsense0: clean, 1436513 free (1505 frags, 179376 blocks, 0.1% fragmentation)
/dev/ufs/cf: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/cf: clean, 97316 free (52 frags, 12158 blocks, 0.1% fragmentation)
Filesystems are clean, continuing...
Mounting filesystems...
Setting up memory disks...chown: dialer: illegal group name
done.


_
/ f \
/ p \
__/ Sense
\___/ \
\___/

Welcome to pfSense 2.2.3-RELEASE ...

Creating symlinks......ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout
done.

Under 512 megabytes of ram detected. Not enabling APC.

Jul 13 14:32:21 system279: [ERROR] [pool lighty] cannot get uid for user 'root'
[ERROR] [pool lighty] cannot get uid for user 'root'
Jul 13 14:32:21 system279: [ERROR] FPM initialization failed
[ERROR] FPM initialization failed
fcgicli: Could not connect to server(/var/run/php-fpm.socket).
Launching the init system... done.
Initializing...................... done.
Starting device manager (devd)...
Warning: chown(): Unable to find uid for root in /etc/inc/config.lib.inc on line 867

Warning: chgrp(): Unable to find gid for proxy in /etc/inc/config.lib.inc on line 868
done.
Loading configuration......done.
Updating configuration...done.
Cleaning backup cache........done.
Setting up extended sysctls...done.
Setting timezone...done.
Configuring loopback interface...done.
Starting syslog...done.
Starting Secure Shell Services...done.
Setting up polling defaults...done.
Setting up interfaces microcode...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring QinQ interfaces...done.
Configuring WAN interface...done.
Configuring LAN interface...done.
Configuring CARP settings...done.
Syncing OpenVPN settings...done.
Configuring firewall..DUMMYNET 0 with IPv6 initialized (100409)
load_dn_sched dn_sched FIFO loaded
load_dn_sched dn_sched QFQ loaded
load_dn_sched dn_sched RR loaded
load_dn_sched dn_sched WF2Q+ loaded
load_dn_sched dn_sched PRIO loaded
...0 addresses deleted.
.done.
Starting PFLOG...done.
Setting up gateway monitors...done.
Synchronizing user settings...done.
Starting webConfigurator...done.
Configuring CRON...done.
Starting DNS forwarder...done.
Starting NTP time client...done.
Starting DHCP service...done.
Configuring firewall.....0 addresses deleted.
0 addresses deleted.
.done.
Generating RRD graphs...
Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 289

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935

Warning: chown(): Unable to find uid for nobody in /etc/inc/rrd.inc on line 935
done.
Starting syslog...done.
Segmentation fault (core dumped)
Starting CRON... done.
fcgicli: Could not connect to server(/var/run/php-fpm.socket).
pfSense (nanobsd) 2.2.3-RELEASE i386 Tue Jun 23 16:37:33 CDT 2015
Bootup complete

FreeBSD/i386 (jumla-rt-01.np.net.inf.org) (ttyu0)

Password:
@

Actions #6

Updated by Phillip Davis almost 9 years ago

Today another remote site is reporting similar symptoms. I am in the process of turning around the old Jumla one, putting a Mugu config on it and sending it. But the dirt airstrip in Mugu is closed (we think) - it might have to take a walk over some high mountain passes.
I have never had trouble like this before. After doing a bunch of 2.2.4-DEVELOPMENT testing at the weekend I think it is time to upgrade to 2.2.4-DEVELOPMENT. This problem on 2.2.3 is just too bad for remote sites with bad power, which is what I have.

Actions #7

Updated by Chris Buechler almost 9 years ago

  • Status changed from Feedback to Resolved

ALIX and APU both made it through 1000 power cycles while rw mounted on the slowest SD/CF I could find with no problems.

Actions

Also available in: Atom PDF