hubertf's NetBSD Blog
 
[20080823] Trying out journaling
After NetBSD got journaling integrated into FFS recently, I built and installed -current and had a look. In short: it works just as expected. In other words: Yay! :-) :-) :-)

The wapbl(4) manpage gives more details: to enable journaling, you need to run a kernel built with "options WAPBL", which has been available in NetBSD-current since the end of July 2008. A userland from a similar date is useful, as the mount(8) command needs to know about the new "log" option. With the proper system, it's pretty much a no-brainer:

  1. In /etc/fstab, enable logging for the file system(s) you need; in my case it's just /:
         /dev/wd0a       /       ffs     rw,log        1 1 

    This is actually the only thing that needs to be done. All the rest written here just explains things in a bit more detail.

  2. Note that journaling is not active on the file system(s) at this point, so pressing the reset button for testing will result in a file system check (fsck) - don't do it right now. :)

  3. Reboot the system. Nothing special will show up in the boot messages:
         ...
         audio2 at pad0: half duplex
         boot device: wd0
         root on wd0a dumps on wd0b
         root file system type: ffs
         Fri Aug 22 20:45:55 CEST 2008
         swapctl: adding /dev/wd0b as swap device at priority 0
         Starting file system checks:
         /dev/rwd0a: file system is clean; not checking
         Setting tty flags.
         ...  

  4. Let's recall what happens here: after probing the hardware and initializing device drivers (audio, ...), the kernel searches the disk drives for a root partition (i.e. a disk with a BSD disklabel, an "a" partition, and a known file system in it). It will use the first root file system it finds, and mount it read-only.

    As the above output is from a multi-user boot (not a single-user boot), the kernel continues to run init(8), which in turn runs /etc/rc (which then runs all of /etc/rc.d/* etc.). The first steps of the boot process can be determined by using the rcorder(8) tool just like /etc/rc does:

         $ cd /etc/rc.d/
         $ rcorder * | head
         wdogctl
         raidframe
         cgd
         ccd
         swap1
         fsck
         root
         ... 

    Of the above scripts, raidframe, cgd and ccd configure additional disk devices, while wdogctl and swap1 are of minor interest here. The two interesting scripts are "fsck" and "root": "fsck" runs fsck(8), which in turn goes through the list of known file systems in /etc/fstab, and checks for each file system whether it was unmounted cleanly last time. If not, the file system will be checked, possibly repaired, and marked as clean. This is the much-hated, time-consuming process that prevents a fast reboot after the system has crashed.

    After ensuring all file systems are in a consistent state, the "root" script mounts the root (/) file system read-write.

    Following that, all other scripts run, create temporary files, configure network devices, enable login and whatnot. The important part here is the order: the kernel first mounts the root file system read-only, and writing is enabled only after the file system check.

  5. As we have marked the root file system for journaling, the log (journal) is created when mounting the file system read-write. For NetBSD, the log contains only meta-data, i.e. information on what changes were made to the file system's management data structures like directories, link counts, etc. No data blocks are journaled. This may not be 100% optimal from a user's point of view, but it ensures that the file system is in a consistent state with respect to meta-data.

  6. When the file system is mounted with journaling enabled, bad things are welcome (well, sort of :-) to happen, and the system will handle them gracefully: kernel panics, power failures, someone pressing the reset button - everything that disrupts system operation and gets the file system into an inconsistent state will be caught by replaying the journal on the next boot.

    Note that journaling will not help with user/admin errors, like when you accidentally remove a file!

  7. After the system went down in flames -- for research purposes and better predictability, let's assume we've pressed the reset button -- with the file system in an unclean state, this will be displayed on the next boot:
         ...
         audio2 at pad0: half duplex
         boot device: wd0
         root on wd0a dumps on wd0b
         /: replaying log to memory
         root file system type: ffs
         Fri Aug 22 20:49:55 CEST 2008
         swapctl: adding /dev/wd0b as swap device at priority 0
         Starting file system checks:
         /dev/rwd0a: file system is journaled; not checking
         /: replaying log to disk
         Setting tty flags.
         ...  

  8. After finding the root file system, the kernel first recognizes the journal, and assumes that the system crashed. The system doesn't know what's up with the disk so far, so it won't go and alter the disk by writing the changes from the log onto the disk. Instead, those changes are replayed to memory only. This leaves the disk as-is, but the in-memory view of the file system will be consistent.

    When fsck runs, it recognizes the file system as journaled and won't touch it, assuming that the log caught all the damage. Mounting the file system read-write in the next step then replays the changes in the journal onto the disk, which puts it into a consistent state permanently. After that, the regular boot process can proceed as usual.

    Please note that the messages "/: replaying log to memory/disk" are printed by the kernel, as it's the kernel that runs all the file system code.

  9. When the system is up and running, the mount(8) command can be used to determine if logging is enabled or not:
         # mount
         /dev/wd0a on / type ffs (log, local) 

    The "log" here in the mount options indicates that journaling is enabled.
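
For those who want to check steps 1 and 9 from a script, here is a minimal sketch. It only looks for the "log" option in fstab-style and mount(8)-style text; the function names (has_log_option, fstab_logged, mount_logged) are made up for this example and are not part of NetBSD, and the field layout is assumed to match the examples shown above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of NetBSD.

# has_log_option OPTIONS
# Succeeds if the comma-separated option list contains "log" as a whole word.
has_log_option() {
    case ",$1," in
        *,log,*) return 0 ;;
        *)       return 1 ;;
    esac
}

# fstab_logged FILE
# Prints the mount point of every ffs entry in FILE that has the "log" option.
# Assumed field order: device, mount point, type, options, dump, pass.
fstab_logged() {
    while read -r dev mnt type opts rest || [ -n "$dev" ]; do
        case "$dev" in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        [ "$type" = "ffs" ] || continue
        if has_log_option "$opts"; then
            printf '%s\n' "$mnt"
        fi
    done < "$1"
}

# mount_logged
# Reads mount(8)-style lines on stdin, e.g.
#   /dev/wd0a on / type ffs (log, local)
# and prints the mount point of every entry whose options include "log".
mount_logged() {
    while read -r dev on mnt word type opts || [ -n "$dev" ]; do
        # turn "(log, local)" into "log,local"
        opts=$(printf '%s' "$opts" | tr -d '() ')
        if has_log_option "$opts"; then
            printf '%s\n' "$mnt"
        fi
    done
}
```

Typical use would be "fstab_logged /etc/fstab" to see what is configured, and "mount | mount_logged" to see what is actually active right now. A file system that shows up in the first list but not in the second has probably not been remounted since fstab was edited.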

First impressions of journaling are pretty good: the journal needs no further maintenance, and the fact that it's placed inside the file system by default and doesn't need extra space is very nice, too. People who want to keep the log after the file system in the partition for some reason can do so, and can also specify a maximum journal size.

The end-user impact of this is that lengthy file system checks are (hopefully :-) a thing of the past now!
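
To tell after the fact which of the boot scenarios from steps 3 and 7 a given boot went through, the messages quoted there can be matched in a few lines of shell. This is only a sketch: the function name boot_journal_status is made up for this example, and it relies on nothing but the exact message texts shown above.

```shell
#!/bin/sh
# Hypothetical helper, not part of NetBSD.
#
# boot_journal_status
# Reads boot messages on stdin and prints one word:
#   replayed  - the kernel reported replaying a log
#   journaled - fsck skipped a journaled file system, no replay message seen
#   clean     - fsck reported a clean file system
#   unknown   - none of the messages above were found
boot_journal_status() {
    msgs=$(cat)
    case "$msgs" in
        *'replaying log to'*)
            echo replayed ;;
        *'file system is journaled; not checking'*)
            echo journaled ;;
        *'file system is clean; not checking'*)
            echo clean ;;
        *)
            echo unknown ;;
    esac
}
```

Feed it whatever capture of the boot messages you have. As the "replaying log" lines are printed by the kernel, "dmesg | boot_journal_status" should be enough to spot a replay; the fsck lines come from the rc scripts instead, so they will only be seen if that output was saved somewhere.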



Disclaimer: All opinion expressed here is purely my own. No responsibility is taken for anything.

Copyright (c) Hubert Feyrer