We just lost 3TB of data on a SanDisk Extreme SSD

@floofloof@lemmy.ca · 11 months ago

@jarfil@beehaw.org · 11 months ago

Can we agree that brand new drives aren’t supposed to fail?

No.

The typical failure rates for pretty much all electronics, even mechanical stuff, form a "bathtub curve": relatively many early failures, then very few failures for a long time, and finally an increasing number of failures tending toward 100%.
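A common way to model the bathtub is to add up the hazard rates of three independent failure modes: an infant-mortality term whose hazard falls over time (Weibull with shape < 1), a constant random-failure rate, and a wear-out term whose hazard rises (Weibull with shape > 1). This is a minimal sketch; every parameter below is invented purely to produce the shape and is not measured drive data.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (k/lambda) * (t/lambda)^(k-1), for t > 0 hours."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical parameters (in hours), chosen only to produce a bathtub shape.
INFANT_SHAPE, INFANT_SCALE = 0.5, 50_000.0    # shape < 1: hazard falls over time
RANDOM_RATE = 1e-5                            # constant "random" failures per hour
WEAROUT_SHAPE, WEAROUT_SCALE = 5.0, 60_000.0  # shape > 1: hazard rises over time

def bathtub_hazard(t: float) -> float:
    """Total hazard of independent competing failure modes is the sum of their hazards."""
    infant = weibull_hazard(t, INFANT_SHAPE, INFANT_SCALE)
    wear_out = weibull_hazard(t, WEAROUT_SHAPE, WEAROUT_SCALE)
    return infant + RANDOM_RATE + wear_out

if __name__ == "__main__":
    for hours in (1, 24, 336, 8_760, 20_000, 43_800, 70_000):
        print(f"{hours:>7} h: {bathtub_hazard(hours):.2e} failures/hour")
```

The printed rates drop steeply over the first weeks, bottom out, then climb again once the wear-out term takes over.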

That's why you're supposed to have a "burn-in" period for everything before you can trust it with reasonable confidence (still make backups), and to watch out for it reaching end of life (make sure the backups actually work).
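Why burn-in helps can be shown with conditional reliability: the chance a unit fails during its first year of real use, given it already survived the burn-in, is 1 − R(burn_in + year) / R(burn_in). A minimal sketch assuming a Weibull lifetime; the shape and scale values are hypothetical, picked only to represent an infant-mortality population.

```python
import math

def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R(t) = exp(-(t/scale)^shape): probability a unit is still working after t hours."""
    return math.exp(-((t / scale) ** shape))

def failure_prob(mission: float, burn_in: float, shape: float, scale: float) -> float:
    """P(failure within `mission` hours of real use | unit survived `burn_in` hours).

    Conditional reliability R(burn_in + mission) / R(burn_in);
    burn_in=0 corresponds to a brand-new, untested unit."""
    survive = (weibull_reliability(burn_in + mission, shape, scale)
               / weibull_reliability(burn_in, shape, scale))
    return 1.0 - survive

if __name__ == "__main__":
    # Hypothetical infant-mortality population: shape < 1 means a falling hazard.
    SHAPE, SCALE = 0.5, 1e7
    YEAR = 8_760  # hours
    for burn_in in (0, 24, 336):
        p = failure_prob(YEAR, burn_in, SHAPE, SCALE)
        print(f"burn-in {burn_in:>3} h -> P(fail in first year) = {p:.4f}")
```

With shape > 1 (the wear-out side of the bathtub) the same calculation shows burn-in making things slightly worse, which is why it only pays off while the hazard is still falling.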

@u_tamtam@programming.dev · 11 months ago

That’s absolutely true in the physical sense, but in the “commercial”/practical sense, most respectable companies’ QA process would shave off a large part of that first bathtub slope through testing and good quality practices. Not everything off of the assembly line is meant to make it into a boxed up product.

@jarfil@beehaw.org · 11 months ago

Apparently even respectable companies are finding out that it's cheaper to skimp on QA and just ship a replacement item when a customer complains. Particularly with small items that aren't too expensive to ship, though some are doing it even with full-blown HDDs.

@agentsquirrel@beehaw.org · 11 months ago

Indeed. An old EE mentor told me once that most component aging takes place in the first two weeks of operation. If it operates for two weeks, it will probably operate for a long, long time after that. When you're burning in a piece of gear, it helps the testing process if you put it in a high-temperature environment as well (within reason) to place more stress on the components.
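The usual way to reason about how much a hot burn-in "speeds up" aging is the Arrhenius acceleration factor, AF = exp(Ea / k_B · (1/T_use − 1/T_stress)) with temperatures in kelvin. A minimal sketch; the activation energy Ea = 0.7 eV is an assumed placeholder (the real value depends on the failure mechanism and isn't given here), and the 70C stress temperature is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def acceleration_factor(t_use_c: float, t_stress_c: float, ea_ev: float = 0.7) -> float:
    """Arrhenius acceleration factor between a use and a stress temperature (Celsius).

    ea_ev is an ASSUMED activation energy; it varies by failure mechanism."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k))

def equivalent_use_hours(stress_hours: float, t_use_c: float, t_stress_c: float,
                         ea_ev: float = 0.7) -> float:
    """Hours at t_use_c that stress_hours at t_stress_c roughly correspond to."""
    return stress_hours * acceleration_factor(t_use_c, t_stress_c, ea_ev)

if __name__ == "__main__":
    # Two weeks (336 h) of burn-in at a hypothetical 70C vs. 25C in normal use.
    af = acceleration_factor(25.0, 70.0)
    hours = equivalent_use_hours(336, 25.0, 70.0)
    print(f"AF = {af:.1f}; 336 h at 70C ~ {hours:.0f} h at 25C")
```

This single-factor model assumes heat only accelerates aging; the next reply explains why flash writes don't follow that simple direction.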

@jarfil@beehaw.org · 11 months ago

The high-temperature part is kind of a trap with SSDs: flash memory is easier to write (less likely to error out) at temperatures above 50C, so if you run a write-heavy application at a higher temperature, it's less likely to fail than if it were kept colder.

Properly stress testing an SSD would mean writing to it while cold (below 20C) and checking for read errors while hot (above 60C).

For normal use you’d want the opposite: write hot, read cold.
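The software side of that stress test (write cold, read hot) can be sketched as a write-then-verify pass: fill a file with a deterministic pseudo-random pattern, then later regenerate the pattern and compare block by block. Temperature control is out of scope here, and the file path, block size, and helper names are hypothetical. One caveat: reads may come from the OS page cache instead of the drive, so a real test would drop caches or reboot between phases.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 4096  # bytes per block; a multiple of 32 (SHA-256 digest size)

def pattern_block(seed: int, index: int) -> bytes:
    """Deterministic pseudo-random block, regenerated at verify time instead of stored.
    Each 32-byte chunk gets its own hash so the block doesn't compress trivially."""
    return b"".join(
        hashlib.sha256(f"{seed}:{index}:{chunk}".encode()).digest()
        for chunk in range(BLOCK_SIZE // 32)
    )

def write_pattern(path: str, blocks: int, seed: int) -> None:
    """Write phase: per the comment above, run this while the drive is cold."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(seed, i))
        f.flush()
        os.fsync(f.fileno())  # push the data out of OS buffers to the device

def verify_pattern(path: str, blocks: int, seed: int) -> list[int]:
    """Read phase: run this while the drive is hot. Returns indices of bad blocks.
    Caveat: reads may be served from the OS page cache rather than the drive."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK_SIZE) != pattern_block(seed, i):
                bad.append(i)
    return bad

if __name__ == "__main__":
    # Hypothetical target: in practice, point this at a file on the SSD under test.
    target = os.path.join(tempfile.gettempdir(), "ssd_stress_test.bin")
    write_pattern(target, blocks=256, seed=42)
    # ... cool down / heat up the drive between phases (not shown) ...
    print("mismatched blocks:", verify_pattern(target, blocks=256, seed=42))
```

Regenerating the pattern from a seed avoids keeping a second copy of the data just to compare against.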