Recently I had a discussion about file corruption and corruption at the Oracle (database) level, and what journalling filesystems do to protect against it.
It seems the general belief is that journalling filesystems work the following way:
1. the write is done to the filesystem’s “intent log”
2. the write is done to the filesystem at the actual location
3. the previous entry in the “intent log” is flagged as truly written
This way, when disaster strikes, the system only has to redo all writes not flagged as written to become consistent again. In fact, that’s what I was taught in HP-UX system administration classes.
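The redo scheme described in the steps above can be sketched with a tiny in-memory model. Everything here (the `IntentLog` class, the `crash_after_log` flag, the `recover` method) is hypothetical and purely for illustration; it is not how any real filesystem is implemented.

```python
# A minimal sketch of the three-step intent-log scheme, assuming an
# in-memory model. All names here are made up for illustration.

class IntentLog:
    def __init__(self):
        self.entries = []   # log entries: block, data, "truly written" flag
        self.disk = {}      # the "actual location" on disk

    def write(self, block, data, crash_after_log=False):
        # Step 1: record the intent in the log.
        entry = {"block": block, "data": data, "done": False}
        self.entries.append(entry)
        if crash_after_log:
            return          # simulate a crash before step 2 happens
        # Step 2: write to the actual location.
        self.disk[block] = data
        # Step 3: flag the log entry as truly written.
        entry["done"] = True

    def recover(self):
        # After a crash, redo every logged write not flagged as written.
        for entry in self.entries:
            if not entry["done"]:
                self.disk[entry["block"]] = entry["data"]
                entry["done"] = True
```

In this model, a write that crashed between steps 1 and 2 is simply replayed during recovery, which is exactly the consistency argument made above.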
As you’ve probably guessed by now, that is not the case. In fact, a thesis by Vijayan Prabhakaran describes how journalling works in Ext3, ReiserFS, JFS, XFS and Windows NTFS.
The thesis investigated the failure-handling capabilities of the filesystems listed above. To do so, it examined how their journalling works. This investigation shows that, in their default setup, all these filesystems only journal the filesystem metadata, NOT the data (!!!). Ext3 and ReiserFS can be configured to journal data too, but that requires explicit reconfiguration.
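For Ext3, for example, data journalling can be switched on with the `data=journal` mount option (the device and mount point below are just placeholders):

```shell
# Default ext3 behaviour is data=ordered: only metadata is journalled.
# To journal file data as well, mount with data=journal:
mount -t ext3 -o data=journal /dev/sda1 /mnt/data

# Or make it permanent via an /etc/fstab entry:
# /dev/sda1  /mnt/data  ext3  data=journal  1 2
```

Note that journalling the data means every data block is written twice (once to the journal, once in place), so this trades write performance for the extra protection.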
This means that after a crash, your filesystem itself is in a consistent state after online recovery, but the data inside your files might not be…
Does anyone know how journalling is done on filesystems on AIX, Sun, HP-UX and Tru64, and on third-party filesystems like Veritas?