David Bakin’s programming blog.


Bad cables can masquerade as other errors

This is just a reminder:  A bad cable can masquerade as other errors.

In this case I was building a new server system (new motherboard, new drives, etc.), keeping only the splendid case (an Antec P180, as seen on Amazon, because Antec’s site no longer shows it).  One particular new 4TB HDD kept dropping out of a Windows Storage Spaces pool (“Retired”, as if it had done some hard work and was now taking a well-earned rest).  Sometimes it dropped out within 10 minutes after booting; sometimes it took an hour.

The event log before each of these dropouts would show a few bad commands followed by a “bad block” error.

A S.M.A.R.T. diagnostic utility showed excessive errors in a couple of categories, none of them bad blocks.  One such category was “command timeouts”, which was a clue, but I didn’t know how to interpret it.

Anyway, it was a brand-new disk.  And I buy high-capacity HDDs all the time (I have nearly 50), yet I’ve never had one die of infant mortality.  So I tried moving the disk to a different port on the same brand-new motherboard controller (bad socket?), then to an add-in card with the same type of controller, Marvell (bad motherboard chip?), then to a port on a different motherboard controller, an Intel chipset this time (bad driver?).  It failed each time!  So I ordered a new drive (same-day delivery!) and tried it…and it failed the same way too!

Wracked my brain…and finally…swapped cables with an adjacent hard drive in the same enclosure.  Now the other drive failed!

So it was the cable.  I replaced the cable and everything works fine.  I’ve now copied 8TB of data onto the new Storage Spaces array with no issues.
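The diagnosis was a process of elimination: the one thing common to every failure was the cable, and once the cables were swapped the failure followed the cable to a different drive.  Here is a minimal Python sketch of that reasoning, assuming each trial is recorded as a drive, port, and cable plus whether the drive dropped out.  The `Trial` type, the `suspects` function, and all drive, port, and cable labels are hypothetical stand-ins, not the actual hardware.  It reports the components present in every failing trial and absent from every passing one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One test configuration and whether the drive dropped out (hypothetical)."""
    drive: str
    port: str
    cable: str
    failed: bool


def parts(t: Trial) -> set[tuple[str, str]]:
    # Tag each component with its kind so labels can't collide across kinds.
    return {("drive", t.drive), ("port", t.port), ("cable", t.cable)}


def suspects(trials: list[Trial]) -> set[tuple[str, str]]:
    """Components present in every failing trial and in no passing trial."""
    failing = [t for t in trials if t.failed]
    if not failing:
        return set()
    # Start with what all failures share...
    common = set.intersection(*(parts(t) for t in failing))
    # ...then clear anything that also appeared in a configuration that worked.
    for t in trials:
        if not t.failed:
            common -= parts(t)
    return common


# Hypothetical trials loosely modeled on the story above.
TRIALS = [
    Trial("drive1", "marvell-mb-0", "cableX", True),
    Trial("drive1", "marvell-mb-1", "cableX", True),
    Trial("drive1", "marvell-card-0", "cableX", True),
    Trial("drive1", "intel-0", "cableX", True),
    Trial("drive2", "intel-0", "cableX", True),   # replacement drive
    Trial("drive3", "intel-1", "cableY", False),  # adjacent drive, before swap
    Trial("drive3", "intel-1", "cableX", True),   # adjacent drive, after swap
    Trial("drive2", "intel-0", "cableY", False),  # replacement drive, after swap
]

if __name__ == "__main__":
    print(suspects(TRIALS))
```

Even before the swap, the cable is the only component constant across the failures; the swap is what confirms it, because the failure moves with the cable while the drive that was "bad" starts working.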

I’ve never had a bad (internal case) cable before either.  And these were new cables.

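Update 2014-09-09:  And bad network cables can make your PC connect at 100Mbps instead of 1Gbps!  This just happened to a colleague at work.  (Gigabit Ethernet over copper uses all four wire pairs while 100Mbps uses only two, so a single damaged pair can leave a link running at the slower speed.)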