The Linux Page

Currently Unreadable Sectors detected by SMART...

Hard Drives for a RAID system

I got 4 x 10 TB HDDs -- HGST Ultrastar He10.

What Happened?

One of my RAID1 drives (the array holding my important data) started sending me error emails about 8 sectors being "unreadable".

So first I waited a bit to see whether the problem would resolve itself. The alerts did go silent for a few months, then came back...

At that point, I decided to purchase a replacement and retire the drive giving me errors. The new drive looks good so far!

Check Current SMART Status

To see what the SMART system discovered, you can use the smartctl command like so:

smartctl -a /dev/sdg1

The output is pretty long. The specific field I would look for is:

Current_Pending_Sector

The number of pending sectors it reports matched the number I would get in the emails.

To know whether the hard drive may need to go in the trash, check:

Reallocated_Sector_Ct
If that second number is really high, then there are issues on your drive and the SMART system is remapping data to spare blocks. It may just be a small area of the drive that is bad (it could even be a manufacturing defect, although drives are low-level formatted and verified before being sold, so it's unlikely you would receive such a drive).

The number of hours the drive has been powered on can also be of interest:

Power_On_Hours
There are about 8,766 hours in a year. An HDD can live for about 10 years, so once you reach around 87,000 hours, you may want to consider replacing the drive even if it still works just fine... I know less about SSDs; they likely have a similar rated lifetime, but the number of writes determines the life of an SSD, whereas an HDD wears with both reads and writes and simply with the number of hours powered on (especially if the motor runs nearly permanently).
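To pull just these three attributes out of the long smartctl report, you can filter the attribute table. This is a small sketch; /dev/sdg is this article's example device, adjust it for your drive:

```shell
# Print only the SMART attributes discussed above.
# -A limits the output to the vendor attribute table.
sudo smartctl -A /dev/sdg | grep -E 'Current_Pending_Sector|Reallocated_Sector_Ct|Power_On_Hours'
```

The last column of each matching line is the raw value, i.e. the actual sector or hour count.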

Using the Old Drive as a Scrap Drive

Since I do collect a lot of data with a few tools (spider-like, if you wish), I can still make use of a 10 TB drive: if it ends up failing, I won't lose anything of any remote importance to me.

But the problem is that the SMART daemon would continue to send me one email a day about those 8 sectors (on a 10 TB drive, 8 sectors is really nothing, and the count has not grown at all since these errors started!)
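Those daily emails come from the smartd daemon. A typical /etc/smartd.conf directive that produces them looks something like the following (the email address is a placeholder):

```
# Monitor all detected drives, enable all checks (-a),
# and mail warnings to the given address (-m).
DEVICESCAN -a -m admin@example.com
```

As long as the pending-sector attribute stays non-zero, smartd keeps reporting it on each check.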

So I decided to search for how to fix the error. That was complicated; no one had a good answer. I've seen all sorts of commands to do it, but none that I thought made sense on such a large hard drive. The main problem, in my view, is that these 8 sectors were not recorded anywhere. That is, SMART knows errors occurred, it just doesn't report where. That means you've got to check the entire drive, probably in read/write mode. There is a test you can run for that purpose and that's what I've done:

sudo fsck -C -f -c -c -y /dev/sdg1

This command (which is definitely not the default fsck invocation!) goes through every single block of the specified partition (/dev/sdg1 in my example), reads each block, and then writes the data back to it. Obviously, for a partition of some 9.7 TB plus all the inodes, etc., that means quite a bit of running time. For my hard drive, this took about 40 hours.

The -c option is what asks fsck to run the badblocks utility to verify all sectors. If you think the bad blocks could be outside of the partition, then you've got yet another problem and fsck won't suffice, but in my case that worked.
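If you do need to test the whole device rather than one partition, badblocks can be run directly. As a sketch, using this article's device name: -n selects the non-destructive read-write mode, -s shows progress, and -v is verbose. Only do this while the drive is unmounted:

```shell
# Non-destructive read-write scan of the entire disk (not just one partition).
# Each block is read, overwritten with test patterns, and restored.
# The drive must not be mounted while this runs.
sudo badblocks -nsv /dev/sdg
```

Like the fsck run described above, expect this to take many hours on a 10 TB drive.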

A single -c runs a read-only badblocks test; giving -c twice runs a non-destructive read-write test. Using both was important: I first tried with a single -c, which ran for only about 20 hours but fixed nothing at all.

So I think that what fixed the sectors is the write pass.

The good news for me is that in the end the disk did not reallocate the sectors anywhere else. That means the drive is probably just fine.
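After the fsck run, you can confirm that the pending count went back down by extracting just the raw value of that attribute. A small sketch, again using this article's example device name:

```shell
# Print the raw Current_Pending_Sector value; 0 means no pending sectors remain.
sudo smartctl -A /dev/sdg | awk '/Current_Pending_Sector/ { print $NF }'
```

Once that value reaches 0, smartd stops sending the daily warning emails.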

Adding/Removing Drives in a RAID1 System

I use a software RAID1 with an md0 device (i.e. the partitions are of type Linux RAID).

To add/remove drives, you use the mdadm command.

In my case, I did not want to remove the drive that was "breaking down" right away since it seemed to be mostly working just fine. I had never seen I/O errors on it up to that point, other than those 8 sectors reported by the SMART system.

So at first, I added my new drive:

sudo mdadm --grow /dev/md0 --raid-devices=3 --add /dev/sdh1

The drive needs to be partitioned first, with a partition of type Linux RAID, before it can be added.
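As a sketch of that preparation step (assuming a GPT label and that /dev/sdh is the new, empty drive), a single full-size partition flagged for RAID can be created with parted:

```shell
# Create a GPT label, one partition spanning the whole drive,
# and mark it with the RAID flag so mdadm tooling recognizes it.
# WARNING: this wipes the existing partition table of /dev/sdh.
sudo parted --script /dev/sdh mklabel gpt mkpart primary 0% 100% set 1 raid on
```

Using 0% and 100% lets parted pick properly aligned start and end sectors.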

Then I waited for the data to be duplicated, which took a few days. You can check the progress by looking at the status in the /proc folder. It will look like the following once done:

$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdg1[1] sdf1[0] sdd1[3]
      9766302720 blocks super 1.2 [3/3] [UUU]
      bitmap: 24/73 pages [96KB], 65536KB chunk

Before it's done, there is an extra line showing you the percent completion.

Now we want to remove the failing drive and reduce the device count so mdadm doesn't complain about a missing drive:

sudo mdadm /dev/md0 --fail /dev/sdg1
sudo mdadm /dev/md0 --remove /dev/sdg1
sudo mdadm --grow /dev/md0 --raid-devices=2

Notice how we use --grow to reduce the number of devices in the array.

We are required to mark the drive as failed (--fail) before we can remove it. This is a safety measure so you don't end up removing a drive that is still viewed as working in your array.

Now you can check the mdstat output again and see that your array is back to just 2 drives.

Can I Replace the Failing Drive with a Bigger Drive?


You can absolutely have a larger drive. However, when you format that new drive, make sure your partition is the same size as the existing array drives.
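One way to get an identically sized RAID partition on the bigger drive is to copy the partition layout from an existing array member. This sketch assumes /dev/sdf is a healthy member and /dev/sdh is the new, larger drive; the remaining space simply stays unpartitioned:

```shell
# Dump the partition table of an existing member and replay it on the new drive,
# so the RAID partition ends up exactly the same size and type.
# WARNING: this overwrites the partition table of /dev/sdh.
sudo sfdisk -d /dev/sdf | sudo sfdisk /dev/sdh
```

Note that this also copies the partition identifiers; since both drives stay in the same machine, you may want to randomize the GUIDs on the new drive afterwards (sgdisk -G can do that).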

The one thing you must keep in mind in this case is that the extra space will be "lost". That is, you can't use it without disturbing the proper functioning of the drive in your array.

For example, imagine that you create a separate partition in the extra space and mount it on /tmp. Now the new drive's head has to move to that partition each time something is read from or written to /tmp, and the RAID speed is impacted dearly.

So it's not that you can't use the extra space, but you will lose speed if you do. If you really need more space, you probably want to get yet another drive and use it separately. As of 2021 you could easily get 16 TB in one HDD, so saving just 1 or 2 TB on a RAID drive is not that useful. If you're really tight on money, it's still a solution for you, but again, the impact on your RAID performance is going to be really high.