[NBLUG/talk] Missing ReiserFS superblock!

Lincoln Peters petersl at sonoma.edu
Mon Jul 24 00:33:31 PDT 2006


(Yes, I *am* e-mailing from a different e-mail address, due to the  
severity of the problem I'm having with my primary computer.)

I'm not sure exactly what triggered this failure, but if I had to  
guess, I'd say that the heat wave we're experiencing caused the  
computer to overheat, ultimately triggering some sort of failure.  At  
about 4:00pm, I noticed that the computer's screen was off  
(presumably because I set it to go to sleep if idle for more than 40  
minutes), but I couldn't get the screen to come back on.  Next, I  
tried using another computer to SSH into the misbehaving machine.  I  
couldn't connect.  On further investigation, I couldn't even ping  
it!  Finally, I powered it down.

I opened the tower, and sure enough, the air temperature inside the  
case was noticeably high (I don't know exactly how high, as I didn't  
have a thermometer handy).  After giving it time to cool, I turned it  
back on.  It went through the low-level boot process (POST,  
installing BIOS from my add-on disk controllers, etc.) without any  
difficulty, but when it tried to mount my home directory (a ReiserFS  
filesystem on a RAID-5 array), I got an error message indicating that  
the superblock was either missing or corrupted.

After verifying that the RAID array was actually running (I've  
previously encountered similar errors when, for whatever reason, the  
array failed to start), I tried running reiserfsck with the  
"--rebuild-sb" option.  That seemed to work, although I neglected to  
make note of exactly what output it generated (apparently I didn't  
realize I might want to review it later).  I then tried again to  
mount the filesystem, and got the exact same superblock error!
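
For anyone following along, the checks described above look roughly  
like this (device names such as /dev/md0 are hypothetical; substitute  
your actual array device):

```shell
# Confirm the RAID-5 array is assembled and all members are active:
cat /proc/mdstat
mdadm --detail /dev/md0

# Run a read-only consistency check first; it reports problems
# without changing anything on disk:
reiserfsck --check /dev/md0

# Only if the superblock really is damaged, rebuild it interactively.
# Run this at most once -- repeating it can make things worse:
reiserfsck --rebuild-sb /dev/md0
```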

I've searched the Internet for any clues as to what kind of error  
might cause the superblock to be unreadable AND impossible to rebuild  
using the "--rebuild-sb" option.  So far, I've turned up absolutely  
nothing.


Here's what I do know:

* One of the pages I *did* turn up indicates that, if you run  
"reiserfsck --rebuild-sb" twice in a row, you can cause irreversible  
filesystem corruption.  If not for that, I might try it again, if for  
no other reason than to re-examine its output for clues.  (Now I'm  
starting to think that the guys who used teletypewriters to interact  
with the early UNIX mainframes may have been on to something...)

* I don't know if the high temperatures caused any hardware failures  
that wouldn't have made themselves known during the POST sequence.   
I'm scanning my hard drives for bad blocks as I write this; no  
results yet, but since each hard drive is at least 250GB, this will  
have to run overnight.
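
The overnight scan is just a read-only pass over each member disk;  
something along these lines (disk names are hypothetical):

```shell
# Read-only surface scan; -s shows progress, -v reports error counts.
# Safe on a mounted disk, but expect hours per 250GB drive:
badblocks -sv /dev/sda
badblocks -sv /dev/sdb
badblocks -sv /dev/sdc

# If smartmontools is installed, SMART counters are a quicker first
# look for heat-induced damage:
smartctl -a /dev/sda | grep -i -e reallocated -e pending
```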

* I am using a rather unusual dm-crypt setup along with the RAID-5  
array.  While most people would set up dm-crypt on top of RAID, I  
didn't want to have to rebuild my entire 500GB filesystem, so one by  
one, I disconnected each drive from the array, applied dm-crypt to  
it, and reattached it to the array (actually, I didn't quite finish  
this process; one of the three disks is still unencrypted).  This  
does allow me to do a few interesting things, such as assign a  
different key to each disk (though I'm not yet sure if that's any  
better than using the same key on all of them) and periodically  
change the key by repeating the process by which I originally set the  
whole thing up.  On the other hand, I think it has a higher  
performance cost than running dm-crypt on top of RAID (especially  
when rebuilding a disk!), and I had to rearrange the order in which  
my initscripts were run so that cryptdisks would start *before*  
mdadm.  This probably doesn't have any bearing on this particular  
problem, but I thought I should mention it, just in case.  (I was  
planning to throw this idea out to the list once I had it fully  
operational, but it seems that fate had other plans!)
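
Since I was planning to describe the setup anyway: the one-disk-at-a-  
time process sketched above amounts to the following (array and  
partition names are hypothetical, and note the array runs degraded  
while each disk is out, so a second failure during the rebuild would  
lose data):

```shell
# Pull one member out of the running array:
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# Encrypt the freed disk and open the dm-crypt mapping:
cryptsetup luksFormat /dev/sdb1
cryptsetup luksOpen /dev/sdb1 crypt_sdb1

# Re-add the encrypted device and let the RAID rebuild onto it:
mdadm /dev/md0 --add /dev/mapper/crypt_sdb1

# Wait for the resync to finish before touching the next disk:
watch cat /proc/mdstat
```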

* I'm not sure if ReiserFS is inherently any better or worse than any  
of the more popular high-performance journalled filesystems, but  
since it's proving much more difficult to support, I *am* very  
seriously considering going to the trouble of migrating to something  
else (if I get started soon, I could burn the contents of the array  
to rewritable DVDs, although I'd probably need about 125 of them).  
But only after I've done what I can to recover the existing ReiserFS  
filesystem.

* If I want to prevent a similar incident from happening at a  
critical time (I seem to recall that I had a similar but apparently  
less severe problem last year the day before a project was due), does  
that mean I am going to have to come up with a way to do secure  
backups to an offsite unit with a capacity of at least 500GB???  (I'm  
starting to have strange visions of an operational, weatherproofed  
NAS rig buried somewhere in or around my backyard, possibly connected  
to my network over a WiFi link [encrypted with AES, of course] and  
powered by a geothermal vent...)
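
Short of the buried NAS, a more mundane sketch of "secure offsite  
backup" might be rsync over SSH (host and paths are hypothetical;  
after the first full copy, nightly deltas should be small even on a  
500GB filesystem):

```shell
# Push /home to an offsite box; only changed data crosses the wire:
rsync -az --delete -e ssh /home/ backup@offsite.example.org:/backup/home/

# If the remote end shouldn't hold plaintext, encrypt an archive
# locally first (symmetric GPG with AES as one simple option):
tar -cz /home | gpg --symmetric --cipher-algo AES256 \
    > home-$(date +%F).tar.gz.gpg
```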


--
Lincoln Peters
<petersl at sonoma.edu>

We secure our friends not by accepting favors but by doing them.
                 -- Thucydides



