Why Consumers Can Expect More Flaky Flash SSDs!
Editor:- October 18, 2011 - Like anyone who makes a lot of market predictions I'm delighted if any of them come true - but there are 1 or 2 cases where I would be just as happy to be proved wrong - in particular - on the subject of consumer SSD reliability.
2 years ago I wrote an article called - Why can consumers expect to see more flaky flash SSDs? - which had the sub-headline - "You need to stay vigilant because it's not going to get better anytime soon." (That earlier article is at the bottom of this page.)
I get a lot of emails about this subject - which is threatening to tarnish the reputation of the whole SSD market - and not just the small part which is consumer drives.
Today I got an email from a reader who told me that out of 13 client SSDs he'd bought 7 months ago "4 have died so far."
He gave me a link to a blog on codinghorror.com which illustrates the soul searching and frustration that SSD unreliability is causing to so many thoughtful people who want to get the speedup advantages of SSDs but are rightly anxious about the flaky reputation of consumer SSDs.
I agree this is a lamentable state of affairs - which needs some explanations. This is a tidied up version of what I said.
SSDs aimed at the consumer market are designed to deliver basic functionality at the lowest price. That means the designers (originally due to ignorance – but nowadays with foreknowledge) have to decide what shortcuts they can take in the production process and what design factors they can leave out to reduce the price - compared to a reliable industrial / military / enterprise grade SSD.
There are countless techniques they can use to get the cost down.
- Shutting off reliability features in the controller. For example the SandForce SF-2200 controller (launched in Feb 2011 and optimized for consumer markets) has an option which enables oems to deliver an SSD with a smaller or larger usable capacity when using exactly the same set of flash chips. The bigger capacity sounds like it's better value for money to the consumer – but they are losing some of the RAISE protection which means the SSD won't survive the failure of an entire flash chip. And that's just the tip of the SSD capacity iceberg.
- Using cheaper components in the power loss management system. The consequences are that the trigger events to save data may come at the wrong time or that the capacitors don't hold enough charge to maintain reliable operation for vital data saves – because they have drifted out of tolerance. That's before considerations like whether the controller has an intrinsic foolproof auto recovery architecture in the first place.
- Using no-name cheap flash memory. The difference in quality between the best and worst flash manufacturers can be a factor of many times. Also if the memory is unknown – the controller parameters may not be set up correctly for it – leading to wrong handling by the controller.
- Saving time and cost on testing the design. Many consumer SSD products aren't adequately validated before they're shipped. That's why you hear about firmware upgrades and recalls. It's expensive for SSD makers to invest in comprehensive tests before they ship - and some consumer SSD marketers worry that if they delay their launches they run the risk of losing market share.
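The capacity trade-off in the first bullet above can be sketched with some simple arithmetic. This is only an illustration under assumed numbers - the die count, die size, spare fraction and the idea of reserving exactly one die's worth of space for redundancy are hypothetical, and are not SandForce's actual figures or algorithm.

```python
def usable_capacity_gib(num_dies, die_gib, spare_fraction, reserve_parity_die):
    """Return the user-visible capacity of a hypothetical SSD layout.

    num_dies, die_gib   -- assumed flash population (illustrative values only)
    spare_fraction      -- share of space held back for over-provisioning
    reserve_parity_die  -- True if one die's worth of space holds redundancy data
    """
    raw_gib = num_dies * die_gib
    if reserve_parity_die:
        # Space given up so the drive could rebuild the contents of a failed die.
        raw_gib -= die_gib
    return raw_gib * (1.0 - spare_fraction)


# Same set of flash chips, two different product configurations.
protected = usable_capacity_gib(16, 8, 0.0625, reserve_parity_die=True)
unprotected = usable_capacity_gib(16, 8, 0.0625, reserve_parity_die=False)

print(f"with die-level redundancy:    {protected} GiB")
print(f"without die-level redundancy: {unprotected} GiB")
```

With these made-up inputs the unprotected configuration shows the buyer a few more gigabytes on the label - which is exactly why it can look like the better deal.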
But even when all those precautions are taken - expensive SLC flash enterprise SSDs can fail too. The difference in the enterprise is that the data is more likely to be backed up and the storage system is likely to be protected by a RAID-like or fail-over architecture which means that life can go on without too much disruption.
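A rough way to see why a RAID-like architecture helps is to compare the chance of losing a single drive with the chance of losing both halves of a mirrored pair in the same period. The sketch below borrows the reader's anecdote above (4 failures out of 13 drives in 7 months) purely as an illustrative input, and it assumes drive failures are independent - which is optimistic, because drives from the same batch with the same firmware bug can fail together. It also ignores rebuild time.

```python
from fractions import Fraction


def loss_probability(p_drive_fails, copies):
    """Chance that every copy fails in the same period, assuming independence."""
    return p_drive_fails ** copies


# Illustrative input only: the reader's 4-out-of-13 failures in 7 months.
p = Fraction(4, 13)

single = loss_probability(p, 1)    # one unprotected drive
mirrored = loss_probability(p, 2)  # two drives holding the same data

print(f"single drive:  {float(single):.3f}")
print(f"mirrored pair: {float(mirrored):.3f}")
```

Even with a failure rate as bad as the one in that email, mirroring cuts the exposure considerably under these assumptions - but it doesn't replace a backup.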
If it's any consolation – the hard disk industry was even worse at one time. In 1986 when I was designing a demo RAID controller - most of the brand new drives I got for the project arrived with serious faults.
What is the SSD industry doing to improve the state of the art? You can get an idea of who's doing what in the SSD reliability papers.
Why can consumers expect to see more flaky flash SSDs? - the 2009 version
Editor:- August 10, 2009 - Intel has been in the computer news in recent weeks for suspending further shipments of its new X25-M - SATA 2.5" MLC flash SSD due to a serious design problem.
Potential customers were advised that shipments will resume after what is euphemistically called - a "firmware upgrade".
This isn't the 1st time that Intel has shipped a flaky SSD, and it's not the 1st time that a flash SSD manufacturer has shipped products where the design was incompletely verified (or specified) in the 1st place - requiring a frantic firmware upgrade to make its operational use more satisfactory.
And it won't be the last either. These stories have become commonplace. And because the latest Intel fiasco wasn't a surprise - it didn't rate more than a footnote in these pages.
Newcomers to StorageSearch.com may be shocked by the frequency with which the storage market's reputation is being splatted by the residue from so many unreliable new products. And by that I mean - products whose operation you cannot rely on to be what you reasonably expected - instead of the narrower meaning of "products which are dead on arrival" - or which terminally cease operation due to some form of wear-out, environmental or age related process.
I named storage reliability as 1 of the 3 most important future trends in my state of the storage market article published in 2005. In that article I also predicted that uncorrectable failures in storage systems (due to embedded design assumptions made in earlier generations) could, if not dealt with by drive and interface designers, pose a more serious threat to enterprise computer systems than the Y2K bug in the late 1990s.
It's reasonable to ask - why has the flash SSD market gained such a poor reputation?
I explained why users shouldn't trust published flash SSD benchmarks in an article published a year ago - in which I discussed the technical environment and specific reasons related to performance. But it's clear from readers' emails that many concerned SSD consumers don't have the technical background to understand much of the content in this and the more detailed articles comparing MLC and SLC, RAM and flash SSDs etc. I often have to explain that when it comes to SSDs - StorageSearch.com isn't aimed at the consumer market - but at enterprise users and oem specifiers of SSD technology who - in the course of their vendor qualification - invest a lot of time and resources into learning about SSDs - before making a commitment which could cost them their jobs or business if the choice turns out to be wrong. When we started our intense SSD coverage more than 10 years ago, a typical flash SSD cost $50K and a typical rackmount cost an order of magnitude more.
Why do you have to be so "extremely" careful to understand the internals of a flash SSD?
You probably don't - if all you're doing is buying a single notebook SSD for a single notebook. (Treading "very" carefully - will suffice.) You definitely do need to exercise extreme caution - if you're the person designing SSDs into a new notebook or new SSD storage array, IPTV server, voicemail server, defense appliance or search-engine architecture. That's only a small part of the spectrum of reader questions I get asked about SSDs - and why I try to avoid giving simplistic answers. Until now...
Here's a simplistic explanation of why you have to be careful about understanding flash SSD technology before you deploy it in serious apps, and why SSD vendors will continue shipping flaky SSDs and then recalling them or fixing them after the sale for several more years...
Flash SSDs are solid state - but different to processors because... When Intel or AMD or Sun design a new processor - they run test suites on the new design - which encapsulate market knowledge amassed during more than 20 years. These verification suites simulate the kinds of loads which are common and uncommon in the market. When a new processor emerges into the market - the design is already compatible with what the market expects - because the test suite defines the product. Similar test suites have been developed by all hard drive manufacturers - which encode knowledge of all the tricky, quirky ways that operating systems and applications are going to hit the hard drive - and how it is expected to respond.
There are no such industry test suites for flash SSDs. That's because a flash SSD can look like a disk drive in some contexts - and it can look like a processor accelerator in others. The range of applications for SSDs is much greater than for hard drives - and SSDs have many different ways they can implement the same features depending on which market they were designed for. One reason why the flash SSD market has been going wrong with so many new products is that most consumer SSD manufacturers have simply re-used their hard drive test suites to validate their new SSDs. That has put pressure on designers to tweak the SSD controllers to make their products look good in benchmarks (and look more saleable). But the test suites don't test the weaknesses of the SSDs - only the strengths.
There are 2 general exceptions.
Flash SSDs designed specifically for the industrial oem market tend to be better designed - because these vendors know their products will be hammered by lengthy customer evaluations before being deployed. Many industrial flash SSD oems are used to testing their products with the entire code of their customers' embedded products. Over many years they have built up experience of the weak parts of their products - and either adapted the firmware in their SSDs - or suggested ways in which their customers can change their software to work better with their SSDs.
Rackmount flash SSD arrays designed for enterprise server acceleration include a span of products and companies - whose applications experience ranges from nil to decades. But I can offer a simple solution to shortlisting an SSD supplier here.
In the world's first comprehensive survey of What SSD Users Want - instigated by StorageSearch.com in 2004 - we posed the question - "What would make it easier for you to buy SSD technology and remove doubts and risks which currently act as roadblocks?" - The top 2 factors quoted in replies were performance guarantees and try before you buy. Some enterprise SSD oems seeing this market feedback were quick to adapt these concepts to the way they did business.
My advice? - If you're planning to make a big enterprise SSD purchase - tell suppliers that you'll only consider their products if they offer you a money back performance guarantee (which they can easily do if they have enough experience with your type of application) or ask them if they will let you "try before you buy" - (if your application environment is unusual and outside the scope of their speedup models).
How can consumers and SMBs navigate around SSD landmines?
If you're a consumer or small business looking at a modest spend on flash SSDs it's probably unrealistic for you to invest the resources to learn about this technology and safely qualify products for yourself. As I've already said above - you can't trust magazine reviews either. They should just be regarded as an indication of what is possible - rather than a guarantee of what you'll see in practice.
My advice is - talk to a specialist SSD reseller.
I know of fewer than 10 SSD VARs worldwide who have been focusing exclusively on SSDs for consumers as their primary business for many years - but I'm not an expert on VARs. There may be more.
It's counterproductive for SSD VARs to recommend products which are difficult to get hold of - or which have high return rates. Tell them what is important to you - and ask what they recommend. Ask them how long they've been in the SSD market too. If it's less than 2 years - go somewhere else. One way you can independently verify if what they say is true - is by checking out web references to their SSD activities - including for example their website listings in past years in the Internet Archive.
Will it get easier to navigate the SSD market in future?
Yes. Sure. But that could be another 5 years in the future. I think the SSD market will get a lot more complicated and confusing before it gets any simpler.
To help you understand what's going on and see the future clearly I hope you'll come back to StorageSearch.com as we continue our long term mission - of "leading the way to the new storage frontier".