Dutch Power Cows RC5 addicts will have noticed that the Distributed stats server has been down regularly over the past few days. Nugget and Decibel report in a .plan update that Our Dear Supreme Cow is suffering from a SCSI drive going senile. The top priority at the moment is building a backup:
Nugget: First off, thanks for all your patience with statsbox's instability the past few days. It looks like we're losing the 9 GB drive that's in the machine. We're getting SCSI bus errors, timeouts, and system panics (the panics are, we think, because that drive housed some swap space in addition to database data). We're in the process of doing a backup of all the data to a few off-site machines, and then we're going to do some more detailed diagnostics to determine the source of the problem. The symptoms are still a bit sketchy at this point and we don't have a solid theory. Data integrity first, then we'll worry about replacing the faulty hardware.
It's unlikely that we'll bring stats up tonight, most likely it'll be tomorrow at the earliest.
Look for further updates from myself, decibel, or peter as we home in on the problem.
]:8)
Decibel: Statsbox update....
The box has definitely degraded... it's been down more than it's been up today. At this point, our priority is just to get a solid backup before we're totally dead in the water.
Based on the few useful error messages from /var/log/messages, this seems to be a virtual memory issue, and we can also tell that one of the drives is failing, so a safe bet would be that the swap partition on that drive is giving us fits. A donated RAID controller should be on its way, which would solve this *IF* it was in fact the drive that was to blame, but that's a pretty big *IF*. There's also been discussion of flat-out purchasing a box from a major manufacturer, with on-site tech support, etc. But it would take some time before such a box could be put into production, so hopefully we can stabilize statsbox in the meantime.
More info as available...