[NBLUG/talk] Missing ReiserFS superblock!
Lincoln Peters
petersl at sonoma.edu
Mon Jul 24 00:33:31 PDT 2006
(Yes, I *am* e-mailing from a different e-mail address, due to the
severity of the problem I'm having with my primary computer.)
I'm not sure exactly what triggered this failure, but if I had to
guess, I'd say that the heat wave we're experiencing caused the
computer to overheat, ultimately triggering some sort of failure. At
about 4:00pm, I noticed that the computer's screen was off
(presumably because I set it to go to sleep if idle for more than 40
minutes), but I couldn't get the screen to come back on. Next, I
tried using another computer to SSH into the misbehaving one, but I
couldn't connect. On further investigation, I couldn't even ping it!
Finally, I powered it down.
I opened the tower, and sure enough, the air temperature inside the
case was noticeably high (I don't know exactly how high, as I didn't
have a thermometer handy). After giving it time to cool, I turned it
back on. It went through the low-level boot process (POST,
loading the BIOS from my add-on disk controllers, etc.) without any
difficulty, but when it tried to mount my home directory (a ReiserFS
filesystem on a RAID-5 array), I got an error message indicating that
the superblock was either missing or corrupted.
After verifying that the RAID array was actually running (I've
previously encountered similar errors when, for whatever reason, the
array failed to start), I tried running reiserfsck with the
"--rebuild-sb" option. That seemed to work, although I neglected to
make note of exactly what output it generated (apparently I didn't
realize I might want to review it later). I then tried again to
mount the filesystem, and got the exact same superblock error!
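
Keeping a copy of a repair tool's output is easy to automate. The
following is only a minimal sketch: `run_and_log` and the log file name
are hypothetical names made up for illustration, not part of any ReiserFS
tooling, and the wrapper simply records whatever command it is given.

```python
import subprocess
import sys
from datetime import datetime


def run_and_log(cmd, log_path):
    """Run cmd (a list of arguments), echo its output, and append it to log_path.

    Returns (exit_code, captured_output). Hypothetical helper for illustration.
    """
    # Merge stderr into stdout so error messages land in the same log.
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    with open(log_path, "a") as log:
        log.write(f"=== {datetime.now().isoformat()} {' '.join(cmd)} ===\n")
        log.write(proc.stdout)
        log.write(f"=== exit code {proc.returncode} ===\n")
    # Still show the output on the terminal, as the tool itself would.
    sys.stdout.write(proc.stdout)
    return proc.returncode, proc.stdout
```

Calling something like `run_and_log(["echo", "test"], "fsck-session.log")`
first is a harmless way to confirm the log is written before pointing the
wrapper at a real repair command. Note that this sketch captures output
only after the command exits, so it is not suited to tools that prompt
interactively.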
I've searched the Internet for any clues as to what kind of error
might cause the superblock to be unreadable AND impossible to rebuild
using the "--rebuild-sb" option. So far, I've turned up absolutely
nothing.
Here's what I do know:
* One of the pages I *did* turn up indicates that, if you run
"reiserfsck --rebuild-sb" twice in a row, you can cause irreversible
filesystem corruption. If not for that, I might try it again, if for
no other reason than to re-examine its output for clues. (Now I'm
starting to think that the guys who used teletypewriters to interact
with the early UNIX mainframes may have been on to something...)
* I don't know if the high temperatures caused any hardware failures
that wouldn't have made themselves known during the POST sequence.
I'm scanning my hard drives for bad blocks as I write this; no
results yet, but since each hard drive is at least 250GB, this will
have to run overnight.
* I am using a rather unusual dm-crypt setup along with the RAID-5
array. While most people would set up dm-crypt on top of RAID, I
didn't want to have to rebuild my entire 500GB filesystem, so one by
one, I disconnected each drive from the array, applied dm-crypt to
it, and reattached it to the array (actually, I didn't quite finish
this process; one of the three disks is still unencrypted). This
does allow me to do a few interesting things, such as assign a
different key to each disk (though I'm not yet sure if that's any
better than using the same key on all of them) and periodically
change the key by repeating the process by which I originally set the
whole thing up. On the other hand, I think it has a higher
performance cost than running dm-crypt on top of RAID (especially
when rebuilding a disk!), and I had to rearrange the order in which
my initscripts were run so that cryptdisks would start *before*
mdadm. This probably doesn't have any bearing on this particular
problem, but I thought I should mention it, just in case. (I was
planning to throw this idea out to the list once I had it fully
operational, but it seems that fate had other plans!)
* I'm not sure if ReiserFS is inherently any better or worse than any
of the more popular high-performance journalled filesystems, but
since it's proving much more difficult to support, I *am* very
seriously considering going to the trouble of migrating to something
else (if I get started soon, I could burn the contents of the array
to rewritable DVDs, although I'd probably need about 125 of them).
But only after I've done what I can to recover the existing ReiserFS
filesystem.
* If I want to prevent a similar incident from happening at a
critical time (I seem to recall that I had a similar but apparently
less severe problem last year the day before a project was due), does
that mean I am going to have to come up with a way to do secure
backups to an offsite unit with a capacity of at least 500GB??? (I'm
starting to have strange visions of an operational, weatherproofed
NAS rig buried somewhere in or around my backyard, possibly connected
to my network over a WiFi link [encrypted with AES, of course] and
powered by a geothermal vent...)
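
The capacity figures in the list above can be sanity-checked with a few
lines of Python. This is a rough sketch under stated assumptions: the
function names are made up for illustration, sizes are in decimal GB, the
array is assumed to be full, 4.7GB is the nominal capacity of a
single-layer DVD, and the 4.0GB per-disc budget is an assumption that
leaves some headroom.

```python
import math


def raid5_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID-5 array: one disk's worth goes to parity."""
    return (disk_count - 1) * disk_size_gb


def discs_needed(data_gb, usable_gb_per_disc):
    """Number of discs needed to hold data_gb, rounded up to whole discs."""
    return math.ceil(data_gb / usable_gb_per_disc)


array_gb = raid5_usable_gb(3, 250)    # three 250GB disks
print(array_gb)                       # 500
print(discs_needed(array_gb, 4.7))    # 107 at nominal DVD capacity
print(discs_needed(array_gb, 4.0))    # 125 with a 4.0GB per-disc budget
```

The same `discs_needed` arithmetic applies to sizing an offsite backup
target: anything holding the whole array needs at least the 500GB that
`raid5_usable_gb` reports.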
--
Lincoln Peters
<petersl at sonoma.edu>
We secure our friends not by accepting favors but by doing them.
-- Thucydides