smartctl notes

Submitted by sandip on Thu, 07/15/2010 - 10:25

Below is a list of smartctl commands I frequently use to quickly verify disk health and status, especially when smartd is logging errors to the messages log file.

Print all SMART (Self-Monitoring, Analysis and Reporting Technology) information for drive /dev/sda (typically the first disk).
smartctl -a /dev/sda
Enable SMART on device.
smartctl --smart=on /dev/sda
Get info about the device:
smartctl -i /dev/sda
Show the capabilities of the drive. This also reports progress while a test is running.
smartctl -c /dev/sda
Basic health status:
smartctl -H /dev/sda
Display attributes. The attributes to watch for signs of a failing disk are Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector and Offline_Uncorrectable. Their RAW_VALUE should normally be "0".
smartctl -A /dev/sda
Immediate offline test, which updates the attribute values. Good to run after a badblocks/fsck check, before looking at the attribute values.
smartctl -t offline /dev/sda
Run a thorough long test if the -A output mentioned above shows suspect attributes.
smartctl -t long /dev/sda
Examine the self-test log, which shows whether tests passed or failed.
smartctl -l selftest /dev/sda
Display the most recent error log.
smartctl -l error /dev/sda
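
The attribute check from the -A entry above can be scripted. What follows is only a rough sketch: the function name smart_failing_attrs is made up for illustration, and it assumes the usual ATA attribute table layout in which RAW_VALUE is the 10th column.

```shell
# Print any watched SMART attribute whose RAW_VALUE (10th column) is not 0.
# Reads `smartctl -A` output on stdin. The function name is illustrative,
# and the column position is an assumption about the ATA table layout.
smart_failing_attrs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 != 0 {
            print $2 "=" $10
        }
    '
}

# Usage (needs root and a SMART-capable drive):
#   smartctl -A /dev/sda | smart_failing_attrs
```

No output means all four raw values are 0. Any line printed names an attribute worth following up with a long test.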

There are more examples in man smartctl.


Resolving sector errors on raid partition

Submitted by Anonymous on Tue, 01/18/2011 - 12:22.

On software RAID partitions, CurrentPendingSector or OfflineUncorrectableSector errors logged in syslog can often be corrected by failing and removing the drive, then re-attaching it, so that the array is rebuilt and the problem sectors get overwritten.

Below, I have 4 CurrentPending and OfflineUncorrectable sectors:

# smartctl -A /dev/sdb | grep "Current_Pending_Sector\|Offline_Uncorrectable"
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4

The self-test log gives the LBA of the first sector that failed to read:

# smartctl -l selftest /dev/sdb
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     18654         3166126
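
If you want to pull that LBA out for use in a script, something like the following may do. It is a sketch only: first_error_lbas is a hypothetical name, and it assumes the log layout shown above, where LBA_of_first_error is the last column and entries without an error show "-".

```shell
# Print the LBA_of_first_error (last field) of every self-test log entry
# that recorded one. Reads `smartctl -l selftest` output on stdin.
# The function name is illustrative, not part of smartmontools.
first_error_lbas() {
    awk '/^# *[0-9]+ / && $NF ~ /^[0-9]+$/ { print $NF }'
}

# Usage (needs root):
#   smartctl -l selftest /dev/sdb | first_error_lbas
```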

Sector 3166126 lies in the second partition (/dev/sdb2), which spans sectors 401625 to 1465144064:

# fdisk -lu /dev/sdb

Disk /dev/sdb: 750.1 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders, total 1465149168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *          63      401624      200781   fd  Linux raid autodetect
/dev/sdb2          401625  1465144064   732371220   fd  Linux raid autodetect
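
The comparison against the Start and End columns can also be done mechanically. This is a minimal sketch, assuming the `fdisk -lu` column layout shown above (device, optional "*" boot flag, start, end). The name partition_for_lba is hypothetical.

```shell
# Print the partition whose Start..End range contains the given LBA.
# Reads `fdisk -lu` output on stdin; $1 is the sector number to look up.
# Assumes the column layout shown above. The function name is illustrative.
partition_for_lba() {
    awk -v lba="$1" '
        /^\/dev\// {
            # Skip the optional boot flag "*" so Start/End line up.
            i = ($2 == "*") ? 3 : 2
            first = $i + 0
            last = $(i + 1) + 0
            if (lba + 0 >= first && lba + 0 <= last) print $1
        }
    '
}

# Usage (needs root):
#   fdisk -lu /dev/sdb | partition_for_lba 3166126
```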

Find the RAID array that contains the partition:

# grep sdb2 /proc/mdstat 
md1 : active raid10 sdb2[4] sdd2[3] sdc2[2] sda2[0]

Mark the partition as faulty and remove it from the array:

# mdadm --manage /dev/md1 -f /dev/sdb2
# mdadm --manage /dev/md1 -r /dev/sdb2

Re-attach the partition and let it rebuild:

# mdadm --manage /dev/md1 -a /dev/sdb2
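
The rebuild can take hours, and its progress shows up in /proc/mdstat. Below is a rough sketch for waiting on it from a script. The name md_rebuilding is made up, and it assumes the usual mdstat layout in which an array that is rebuilding has a "recovery =" or "resync =" progress line under its header and arrays are separated by blank lines.

```shell
# Exit 0 while the named array shows a recovery/resync line, 1 otherwise.
# Reads /proc/mdstat-style text on stdin; $1 is the array name, e.g. md1.
# The function name and the assumed mdstat layout are illustrative.
md_rebuilding() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { in_md = 1; next }   # header of our array
        /^$/ { in_md = 0 }                           # blank line ends it
        in_md && /(recovery|resync) *=/ { found = 1 }
        END { exit !found }
    '
}

# Usage: poll once a minute until the rebuild of md1 is done.
#   while md_rebuilding md1 < /proc/mdstat; do sleep 60; done
```

Simply running cat /proc/mdstat by hand every so often works just as well.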

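Once the rebuild completes, rerun the self-test and check for errors (the long test runs in the background, so wait for it to finish before running the last two commands):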

# smartctl -t long /dev/sdb
# smartctl -A /dev/sdb | grep "Current_Pending_Sector\|Offline_Uncorrectable"
# smartctl -l selftest /dev/sdb

Drives keep spare space available to "remap" bad sectors, and this happens automatically. If uncorrectable sector errors do not clear, or keep coming back, it means the spare sectors are likely used up and the drive will probably fail soon, so it is best to just replace the drive.
