@matigo Meh. Simple drive mirror with a hot spare ready and waiting in the case. That's how THIS one was but someone apparently took that drive and repurposed it elsewhere. He doesn't work here now so I can't strangle him.
@matigo Nothing wrong with the machine. The problem is that it cannot be down during anyone's normal business hours as it contains authentication data for an application of ours. (A decision I was not fond of but I wasn't in charge when it was made)
The way I'm doing it means simply plugging the failing drive back in, and it's up in one minute.
The drive's bad spots are at the very stinking end of the partition. No data there.
@matigo … Hopefully, the fsck -c run will mark off the bad blocks on the failing logical volume and mdadm will know to ignore them when restoring to the new drive. That way, I can just remove the bad drive, add another at leisure and be done with it.
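For what it's worth, that plan as a minimal shell sketch. Every name here is a placeholder I made up (/dev/vg0/auth, /dev/md1, sda2/sdc2 are not the real server's volumes), and with DRY_RUN=1, the default, nothing is executed, the commands are only printed:

```shell
#!/bin/sh
# Sketch of the fsck -c plan. All device names are placeholders, NOT the
# real server's volumes. With DRY_RUN=1 (the default) nothing is executed.
DRY_RUN=${DRY_RUN:-1}
CMDS=""
run() { CMDS="$CMDS $*;"; if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

LV=/dev/vg0/auth        # hypothetical logical volume on the degraded array

run umount "$LV"
run e2fsck -f -c "$LV"  # -c: read-only badblocks(8) scan; hits go in the bad-block inode
run mount "$LV"

# Then retire the bad drive and let md rebuild onto the replacement:
run mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
run mdadm /dev/md1 --add /dev/sdc2
```

One caveat worth knowing: the bad-block list fsck builds lives in the filesystem, so it keeps the filesystem from using those blocks; whether the raw md resync still trips over them is a separate question.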
@matigo It's the straightforward way but so time-consuming: make appropriate partitions on the new drive, mount each logical volume on the failing RAID volume and cpio them over, install the bootloader, generate the initial ramdisk, hopefully boot from the new drive; then insert another, duplicate the partition layout, convert it to a new RAID volume, cpio everything over to THAT drive, do the bootloader and ramdisk dance again, and boot from THAT drive; finally, duplicate the layout on the prior drive, add it to the RAID array and let Linux mirror it.
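Spelled out as a dry-run shell sketch, every device name, mount point, and tool below is an assumption for illustration (the real box's layout differs, and distros vary on mkinitrd vs. dracut vs. update-initramfs):

```shell
#!/bin/sh
# Brute-force migration sketch, step by step. Every device name, mount
# point, and tool here is a placeholder for illustration only.
# DRY_RUN=1 (the default) just prints what each step would run.
DRY_RUN=${DRY_RUN:-1}
PLAN=""
run() { PLAN="$PLAN $*;"; if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

NEW=/dev/sdc                              # first replacement drive (assumed name)

# 1. Clone the partition layout and make a filesystem on the new drive
run sh -c "sfdisk -d /dev/sda | sfdisk $NEW"
run mkfs.ext3 "${NEW}2"

# 2. Copy each logical volume over with cpio, e.g. from inside each
#    mounted source volume:
#      find . -xdev -depth -print0 | cpio -0pdum /mnt/new
run mount "${NEW}2" /mnt/new

# 3. Bootloader and initial ramdisk, then boot from the new drive
run grub-install --root-directory=/mnt/new "$NEW"
run mkinitrd                              # or dracut / update-initramfs, per distro

# 4. With the second new drive in, build a one-legged RAID 1 on it,
#    cpio everything over again, redo bootloader + ramdisk, boot from it
run mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd2 missing

# 5. Re-partition the first new drive to match, add it, let md mirror it
run mdadm /dev/md1 --add "${NEW}2"
```

The `missing` keyword in step 4 is the trick that makes the two-hop dance possible: mdadm builds a degraded one-member mirror you can boot from, and the second leg gets filled in later.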
2:45am. At office. Round two. Round three will take place if a knockout is not scored this round. Round three is the brute force round where I will certainly win via knockout but I'll have to start at midnight as it will take seven hours for the brute force approach.
We'll have to be down for upwards of three hours in order to have fsck mark the bad blocks on the logical volume causing the headaches. THEN, maybe, I can get a reimage to the second drive to take. THEN pull the original out and reimage onto another new one.
Bad blocks appear to be at the very end of the filesystem where no data is written. This is good.
@matigo sdb failed utterly. Dead. Kaput. Replaced with new drive. Now, it turns out, that sda has errors. SMART reported nothing. Never once got a degraded raid event until sdb failed. I should have gotten one the moment sda started developing errors. Unless, of course, it picked JUST NOW to develop them.
I'm forcing a sync as best I can onto the new drive, booting from it and replacing sda too.
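The add-and-resync step looks roughly like this (device names are placeholders again, and DRY_RUN=1, the default, only echoes the commands):

```shell
#!/bin/sh
# Sketch: add the replacement to the degraded mirror and watch the resync.
# Device names are placeholders; DRY_RUN=1 (the default) only echoes.
DRY_RUN=${DRY_RUN:-1}
DID=""
run() { DID="$DID $*;"; if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

run mdadm /dev/md1 --add /dev/sdc2    # recovery onto the new member starts on its own
run cat /proc/mdstat                  # recovery percentage, speed, and ETA live here
run mdadm --detail /dev/md1           # per-member state: "spare rebuilding", etc.
```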
I may wind up having to put a bare drive in, cpio the filesystem over, go through the gyrations of making the new drive bootable and a raid device and going from there.
Aug 20 06:03:27 server kernel: md/raid1:md1: sda: unrecoverable I/O read error for block 4799744

Well, this isn't going very well at all.
[>………………..] recovery = 1.3% (13498240/975735676) finish=164.4min speed=97542K/sec

And while the RAID volume recovers, I will busy myself with other matters, I guess. Thank God that system is only 1TB RAID 1.
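That finish= figure checks out, by the way: blocks remaining divided by the reported speed. A quick awk one-liner against the same mdstat line:

```shell
# Cross-check md's finish= estimate from the mdstat line above:
# (975735676 - 13498240) KB remaining at 97542 K/sec.
line='recovery = 1.3% (13498240/975735676) finish=164.4min speed=97542K/sec'
echo "$line" | awk -F'[(/)]' '{split($0, s, "speed="); sub(/K\/sec.*/, "", s[2]);
  printf "ETA ~ %.1f min\n", ($3 - $2) / s[2] / 60}'
# prints: ETA ~ 164.4 min
```

(975735676 - 13498240) KB at 97542 K/sec is about 9865 seconds, which is the 164.4 minutes md reports.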