best blog this so i don't forget.
the day: thursday, march 17, 2005.
under the influence of the off-season typhoon auring, rains sweep into our corner of the metro. now, in our old building, such rain was cause for concern, as the electrical grid there seemed unusually reactive to wet weather. it would rain, and then the power would go out. power outages in the middle of extensive hard-disk activity are not good for computers in general, and for servers in particular.
couldn't possibly happen in the new building. we're state of the art, so i've been told.
somewhere about an hour before going-home-time, lights flicker -- it had rained just a while before. the building's uninterruptible power supply kicks in -- computers all stay on as the building lights go dark. outside, the generator starts up -- and all the computers die. odd, but at the time no one was really concerned...
...until it was discovered that in the course of the outage, the main server had been hit. seems the power went out while the server was writing to the raid, scrambling a file somewhere in the vastness of the array's terabytes of storage, with the end result that the raid could no longer be accessed by the host computer.
odd, also, that something as seemingly inconsequential as a single file system error would bring the whole server down like a house of cards. small wonder that other hardware/software solutions make up the bulk of the internet hardware backbone.
for heaven's sake, not even the os-supplied raid administration or disk repair tools seem able to fix the error, and we're now approaching 24 hours since the event. for a fix, even the support line (so i've been told) could only suggest that we use a third-party disk repair tool -- and be prepared to wait three days to see whether we've lost anything.
so here we wait.
Friday, March 18, 2005