Replacing the (inevitable) failed hard disk

Here’s the procedure (c1t2d0 is the culprit disk):

*Note: this applies to a now-obsolete Solaris Nevada distribution from ca. 2008.*

    [root@solaris ~]$ zpool status -v
      pool: opel
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-2Q
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            opel        DEGRADED     0     0     0
              raidz2    DEGRADED     0     0     0
                c1t0d0  ONLINE       0     0     0
                c1t1d0  ONLINE       0     0     0
                c1t2d0  UNAVAIL      0     0     0  cannot open
                c1t3d0  ONLINE       0     0     0
                c3d1    ONLINE       0     0     0

    errors: No known data errors
    [root@solaris ~]$ zpool replace opel c1t2d0
    [root@solaris ~]$ zpool status -v
      pool: opel
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h0m, 0,00% done, 0h0m to go
    config:

            NAME              STATE     READ WRITE CKSUM
            opel              DEGRADED     0     0     0
              raidz2          DEGRADED     0     0     0
                c1t0d0        ONLINE       0     0     0
                c1t1d0        ONLINE       0     0     0
                replacing     DEGRADED     0     0     0
                  c1t2d0s0/o  FAULTED      0     0     0  corrupted data
                  c1t2d0      ONLINE       0     0     0
                c1t3d0        ONLINE       0     0     0
                c3d1          ONLINE       0     0     0

    errors: No known data errors
    [root@solaris ~]$
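
Spotting the culprit by eye is easy with five disks, less so with fifty. Here is a minimal sketch of a filter that pulls the failed device names out of `zpool status` text. The names `failed_devices` and `sample_status` are made up for this example and are not part of ZFS. The sample input is a shortened copy of the degraded output above, and the list of "bad" states (UNAVAIL, FAULTED, REMOVED) is an assumption that may need extending.

```shell
#!/bin/sh
# failed_devices: read `zpool status` text on stdin and print the first
# column of every line whose second column (STATE) is UNAVAIL, FAULTED
# or REMOVED. Hypothetical helper, not a ZFS command.
failed_devices() {
    awk '$2 == "UNAVAIL" || $2 == "FAULTED" || $2 == "REMOVED" { print $1 }'
}

# sample_status: shortened copy of the degraded pool output, so the
# filter can be tried without a real pool. Also hypothetical.
sample_status() {
    cat <<'EOF'
  pool: opel
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        opel        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  UNAVAIL      0     0     0  cannot open
            c1t3d0  ONLINE       0     0     0
            c3d1    ONLINE       0     0     0

errors: No known data errors
EOF
}

sample_status | failed_devices    # prints: c1t2d0
```

On a live system the same filter would presumably be fed from the real command, as in `zpool status opel | failed_devices`.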

Amazingly, the failed disk (a 1 TB Samsung) brought down the whole system! I suspect this has to do with the nature of the failure and with how the motherboard chipset handled it (a Celeron running on an Intel 915 GEV). Luckily, the cause wasn’t too hard to figure out, but it proved that even Solaris servers can be unforgiving.

One more thing – I didn’t go through the “detach” procedure mentioned in, say, this thread, but I guess that’s alright…
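
As far as the `zpool` man page goes, `zpool replace` is described as equivalent to attaching the new device, waiting for the resilver, and then detaching the old one, which would explain why no manual detach was needed. Either way, the replacement is only really finished once the resilver is done. Below is a rough sketch for checking that from a script. The function name `resilver_in_progress` is invented for this example, and it simply assumes the status text contains the phrase "resilver in progress", as it does in the output above.

```shell
#!/bin/sh
# resilver_in_progress: read `zpool status` text on stdin and succeed
# (exit 0) if it reports a running resilver. Hypothetical helper that
# relies only on the phrase seen in the status output above.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Try it on two lines taken from the session above.
if printf ' scrub: resilver in progress for 0h0m, 0,00%% done, 0h0m to go\n' | resilver_in_progress; then
    echo "still resilvering"
fi
if ! printf ' scrub: none requested\n' | resilver_in_progress; then
    echo "no resilver running"
fi
```

Against a real pool, one could poll with something like `while zpool status opel | resilver_in_progress; do sleep 60; done` and only then check that the old device has disappeared from the config.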

Reference in Solaris docs
