I’ve recently come into a set of large hard drives, cast-offs from a large RAID array. It sounds like hardware RAID controllers can be a bit picky with drives: if they start seeing bad blocks on a drive, they’ll kick it out of the array. This is fair enough, since the drive remaps bad blocks elsewhere on the disk, slowing it down much as data fragmentation does. Rather than scrap these drives (which are out of warranty anyway), I’m using them for a software RAID array. Since I’m more interested in storage space (these are 2.0TB each!) than in access times, I can live with the bad blocks.
First off is actually plugging them in. I’ve got a generic “RAID” controller that I’ve been using for a 3x500GB RAID5 array (1.0TB usable, with one drive’s worth of redundancy). Plugging in these new drives just hung the system on boot. The controller BIOS seemed not to understand such big drives and died trying. So I went hunting for a BIOS update.
The controller chip is made by Silicon Image, model SiI3114. Their website has BIOS updates for the chip, but only for Windows and DOS. Fortunately I managed to get FreeDOS working. I formatted a USB stick as FAT16 (after wasting hours on FAT32), and got UNetbootin to download and write FreeDOS 1.0. I then added the DOS flash tool, along with the BIOS and RAID binaries.
Booting into FreeDOS (and after remembering all the DOS syntax), I got the flash tool to write the BIOS-only binary. The documentation seems to suggest that this lets the controller access SATA drives but omits all the RAID code. This would be fine for doing software RAID, if it worked. Instead it detected two of the four HDDs as CD-ROMs, then hung if you tried to look at them. Finally I flashed the RAID binary instead, and the BIOS detected four 2TB HDDs as it booted.
So, now I could get all the way into Linux and take a look at the drives. First up, how to keep track of which is which? I used
smartctl -i /dev/sdX | grep Serial to get the serial number of each drive. Then I made a directory for each serial number, and a text file that maps each Linux device name to its serial number. This way, if I see a problem with a drive, I can match it to the physical disk and replace it. I then fired up a
screen session for each drive, and wrote out all the SMART info into a log file for each drive. All drives passed the basic SMART tests with
smartctl -d ata -H /dev/sdX.
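The bookkeeping above could be scripted roughly like this. It is only a sketch of the idea, not the exact commands I ran: the LOG_ROOT directory, the device-map.txt file name and the function names are all made up for illustration, and it assumes smartctl prints a "Serial Number:" line for the drive.

```shell
#!/bin/sh
# Sketch: map Linux device names to drive serial numbers and keep a
# per-drive SMART log. LOG_ROOT, device-map.txt and the function names
# are hypothetical choices, not anything smartctl requires.

LOG_ROOT="${LOG_ROOT:-./drive-logs}"

# Read `smartctl -i` output on stdin and print just the serial number.
extract_serial() {
    awk -F': *' '/^Serial [Nn]umber/ { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Record one drive: make a directory named after its serial number,
# append a "device serial" line to the map file, and save the full
# SMART report into that directory.
record_drive() {
    dev="$1"
    serial=$(smartctl -i "$dev" | extract_serial)
    if [ -z "$serial" ]; then
        echo "no serial number found for $dev" >&2
        return 1
    fi
    mkdir -p "$LOG_ROOT/$serial"
    printf '%s %s\n' "$dev" "$serial" >> "$LOG_ROOT/device-map.txt"
    smartctl -a "$dev" > "$LOG_ROOT/$serial/smart.log"
}

# Only touch hardware when device paths are given on the command line.
for dev in "$@"; do
    record_drive "$dev"
done
```

Saved as, say, map-drives.sh, it would be run as root with the devices as arguments, e.g. sh map-drives.sh /dev/sda /dev/sdb. Bear in mind that the /dev/sdX names can change between boots, so the serial number is the thing to trust when pulling a physical disk.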
I decided to take a log of bad blocks using, of course,