Monday, June 8, 2009

Statistical outlier in the MTBF (a sad story)

It was late yesterday, just hours from the anticipated BleachBit 0.5.1 release. All the QA was done. The new web pages were ready. I finished manually building installation packages for Windows, Debian 5, Ubuntu 6.06, and Ubuntu 9.04. The openSUSE Build Service finished building the remaining installation packages except for just one. When OBS finished building SLE 9, I was ready to "push the button" for the release. I was happy with the progress in 0.5.1 (a good combination of features, enhancements, bug fixes, and translation updates) and looked forward to making the public announcement.

After all this work, it was time to take a break. On my main computer, where I develop BleachBit, I was starting to (legally) rip a new music CD in Sound Juicer. Suddenly, the screen went blank. Had I accidentally caused the system to hibernate? Maybe I had brushed the sleep key? I checked the caps lock key for a "pulse": no, it didn't respond. Eventually the dark Linux kernel console emerged to coldly repeat two lines of cryptic errors about sda (sda is the hard drive). "'Tis some visitor," I muttered, "tapping at my chamber door - Only this, and nothing more." I powered the machine down forcefully while reassuring myself it was merely a fluke I/O driver malfunction: surely a reboot would restore my precious system. I hadn't experienced any Linux driver stability issues in about five years, and back then it was only because I made the careless mistake of ejecting a DVD while it was burning. Lately this computer had been very stable and had actually been running continuously without a reboot for some 50-60 days. Disappointed to lose progress on my uptime record, I resigned myself to the need to reboot. Strangely, the BIOS POST hard drive detection took longer than usual. Then the BIOS flashed a message that caused my stomach to sink: INSERT BOOTABLE DISK, OPERATING SYSTEM NOT FOUND. I rebooted again: same results. I swapped cables and the SATA controller port: nothing.

So the six-month-old 500GB hard drive in a five-year-old Compaq desktop failed, and the BIOS will not acknowledge it exists. I'm still diagnosing whether it's actually the hard drive or the controller on the motherboard. I hope it's not the hard drive because some recent family photos were not backed up, but I hope it's not the motherboard because I don't have a budget for a new PC.

All that to say two things. First, BleachBit 0.5.1 is delayed (though all the source code is safe thanks to SourceForge SVN). Second, never become complacent about performing regular backups! In my day job, many times I've left the "operating room" to inform the worried "relatives" of the "patient's" terminal state without fully internalizing the potential for me to be the next person blindsided by a statistical outlier in the MTBF. Now that outlier is me.

1 comment:

  1. :(
    Too bad, man, that's bad news.

    I recommend RAID + LVM with mirroring across 2 HDDs,
    and if you have an old disk, use it with Back In Time,
    just like OS X's Time Machine.

    I don't have money for 2 big HDDs,
    but I use an old 20 GB IDE disk with Back In Time,
    mirroring my /home.

