This article explains why improving the reliability of flash storage matters.
Flash storage is already ubiquitous in the real world: it is found in smartphones, laptops, and the servers behind various cloud applications. Although flash technology is pervasive, many of us still don't realize that it is not an inherently reliable storage medium. In fact, the lifetime of a flash cell is limited, which means that a strong wear-leveling technique is needed to extend the life of the device.
The good news is that the wear-leveling technology in modern flash controllers has made great strides in overcoming the inherent weaknesses of flash storage media and exploiting its advantages. For modern flash systems, the choice of flash controller is more important than the flash memory itself, because the system's durability and reliability can be improved by choosing the right flash controller for a specific application.
This is a great advantage for end users and device manufacturers, as a growing number of low-cost, high-capacity MLC (multi-level cell) devices can be used in more critical applications when paired with the right high-quality controllers.
Challenges of flash reliability
Since almost all of the electronic devices we encounter today use flash, it's easy to forget that the technology itself is a delicate medium that faces many reliability challenges.
Although flash cells can be read a nearly unlimited number of times, they can be programmed and erased (P/E) only a limited number of times. This P/E endurance depends on the type of flash. Generally speaking, most NAND flash devices such as SSDs or eMMC use commercial MLC flash that supports only thousands of program/erase cycles per cell.
To make matters worse, writing to flash is far more involved than reading. Flash is written one page at a time, and a page is kilobytes in size. A page must be empty (erased) before data can be written to it. Unfortunately, flash can only be erased one block at a time, and a block is megabytes in size. Therefore, an entire block of pages must be erased before any page in it can be rewritten. Updating a single piece of data in place means rewriting every page in the block, which shortens the overall life of the device. This is often referred to as write amplification.
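As a rough illustration of write amplification, the sketch below computes the ratio of bytes physically programmed into flash to bytes the host asked to write, for the worst case described above in which updating one page forces the whole block to be rewritten. The 4 KiB page and 256-page block are assumed values chosen for illustration, not figures from any particular device, and the function names are hypothetical.

```python
# Illustrative sizes only (assumptions, not taken from a specific device).
PAGE_SIZE = 4 * 1024      # bytes per page
PAGES_PER_BLOCK = 256     # pages per erase block (1 MiB block)


def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Ratio of bytes programmed into flash to bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


def in_place_update_cost(pages_updated: int) -> int:
    """Bytes programmed when an update forces a full block rewrite."""
    if not 0 < pages_updated <= PAGES_PER_BLOCK:
        raise ValueError("pages_updated must fit in one block")
    # The whole block is erased, then every page is programmed again.
    return PAGES_PER_BLOCK * PAGE_SIZE


host = 1 * PAGE_SIZE                 # the host updates a single page
flash = in_place_update_cost(1)      # the device rewrites the whole block
waf = write_amplification(host, flash)
print(waf)  # 256.0
```

Under these assumptions a single-page update costs 256 times the data the host actually changed, while rewriting the full block at once gives a ratio of 1.0.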
Wear-leveling techniques, which aim to distribute wear evenly across the drive to maximize system durability, must be used in all flash storage devices so that no group of cells wears out prematurely. Temporary buffers in DRAM, SRAM, or unused flash cells can be used to track where the drive will write next and which locations need to be erased.
Another major issue with flash drives is power failure protection. The temporary buffer contains information such as the data that the drive should write next and the old locations that must be erased, and it is held in volatile memory. A sudden power failure therefore wipes the buffer, which can lead to a catastrophic loss of drive data.
As lithography process sizes shrink and the density and performance of flash increase, the final issue affecting flash reliability is a growing number of bit errors. The original flash drives used single-level cell (SLC) flash, where each cell stores one bit, but modern flash drives typically use MLC or TLC flash, where each cell stores multiple bits. Storing more bits in each physical cell increases storage density, but it narrows the margin between the charge states that encode those bits. This not only increases the bit error rate but also reduces lifetime. As process geometries continue to shrink, flash density will increase further and so will the error rate.
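To make the density trade-off concrete, the sketch below counts how many distinct charge states a cell must hold for a given number of bits, and how much room is left between neighboring states. The even spacing of states across a fixed voltage window is a simplifying assumption made for illustration; real devices are more complicated, and the function names are hypothetical.

```python
def charge_states(bits_per_cell: int) -> int:
    """Number of distinct charge states needed to store the given bits."""
    if bits_per_cell < 1:
        raise ValueError("bits_per_cell must be at least 1")
    return 2 ** bits_per_cell


def relative_margin(bits_per_cell: int) -> float:
    """Gap between adjacent states as a fraction of the voltage window.

    Assumption: states are evenly spaced across a fixed window.
    """
    return 1.0 / (charge_states(bits_per_cell) - 1)


for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    print(name, charge_states(bits), round(relative_margin(bits), 3))
```

Each added bit doubles the number of states that must be told apart, so under this simplified model the gap between neighboring states in a TLC cell is one seventh of the gap in an SLC cell.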
Advanced controller technology
While flash storage reliability faces these challenges, we can still use it for everyday consumer, commercial, and even mission-critical applications, largely thanks to advanced flash controller technology. These controllers combine advanced technologies in wear leveling, power failure management and error correction to enable us to safely and reliably use current high-density flash.
Wear leveling
The flash translation layer (FTL) is one of the most important parts of a flash controller. The FTL makes wear leveling possible by translating the host's logical addresses into physical addresses in flash. For example, if the host system repeatedly updates data at the same logical address, the FTL maps that address to a new physical address each time, distributing wear evenly across the flash drive for maximum durability.
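The address translation described above can be sketched as a toy page-mapped FTL. This is a minimal illustration rather than how any real controller is implemented: the class name `SimpleFTL` and its fields are hypothetical, free pages are handed out in simple sequential order, and garbage collection of stale pages is not modeled.

```python
class SimpleFTL:
    """Toy page-mapped FTL: every write goes to a fresh physical page."""

    def __init__(self, num_physical_pages: int) -> None:
        self.flash = [None] * num_physical_pages  # simulated physical pages
        self.l2p = {}          # logical page -> physical page
        self.invalid = set()   # stale physical pages awaiting erase
        self.next_free = 0     # next never-written physical page

    def write(self, logical_page: int, data: bytes) -> int:
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages (garbage collection not modeled)")
        old = self.l2p.get(logical_page)
        if old is not None:
            self.invalid.add(old)       # the previous copy becomes stale
        physical = self.next_free
        self.next_free += 1
        self.flash[physical] = data     # program the fresh page
        self.l2p[logical_page] = physical
        return physical

    def read(self, logical_page: int) -> bytes:
        return self.flash[self.l2p[logical_page]]


ftl = SimpleFTL(num_physical_pages=8)
first = ftl.write(5, b"v1")
second = ftl.write(5, b"v2")   # same logical address, new physical page
print(first, second, ftl.read(5))
```

The host keeps using logical page 5, but the two writes land on different physical pages, and the first copy is only marked stale instead of being erased immediately.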
The mapping granularity from logical to physical addresses in the FTL has a significant impact on performance and durability. Simpler flash media, such as consumer USB drives and SD cards, use block-based mapping, which maps addresses at the block level (a block is megabytes in size). Wear leveling occurs at the block level, and since each logical page is simply mapped to a fixed physical page within its block, no optimization occurs at the page level.
Since a block is the smallest unit that can be erased, this mapping is very simple and inexpensive to implement. However, this simple method results in a large amount of write amplification and shortens the life of the device.
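The cost of block-based mapping can be seen in a small simulation. The sketch below is an assumption-laden toy, not a description of any real device: the class `BlockMappedFTL` is hypothetical, blocks hold only four pages to keep the example readable, and the "least-erased free block" allocation policy is one simple wear-leveling choice among many.

```python
PAGES_PER_BLOCK = 4  # deliberately tiny; real blocks hold far more pages


class BlockMappedFTL:
    """Toy block-mapped FTL: page offsets within a block are fixed."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.l2p_block = {}                  # logical block -> physical block
        self.free = list(range(num_blocks))  # erased blocks ready for use
        self.pages_programmed = 0

    def _allocate(self) -> int:
        # Simple wear leveling: take the free block with the fewest erases.
        block = min(self.free, key=lambda b: self.erase_counts[b])
        self.free.remove(block)
        return block

    def write(self, logical_page: int, data: bytes) -> None:
        lblock, offset = divmod(logical_page, PAGES_PER_BLOCK)
        old = self.l2p_block.get(lblock)
        new = self._allocate()
        for i in range(PAGES_PER_BLOCK):
            if i == offset:
                content = data                  # the page being updated
            elif old is not None:
                content = self.blocks[old][i]   # copy the unchanged page
            else:
                content = None
            if content is not None:
                self.blocks[new][i] = content
                self.pages_programmed += 1
        self.l2p_block[lblock] = new
        if old is not None:
            self.blocks[old] = [None] * PAGES_PER_BLOCK  # erase the old block
            self.erase_counts[old] += 1
            self.free.append(old)

    def read(self, logical_page: int) -> bytes:
        lblock, offset = divmod(logical_page, PAGES_PER_BLOCK)
        return self.blocks[self.l2p_block[lblock]][offset]


ftl = BlockMappedFTL(num_blocks=3)
for page in range(PAGES_PER_BLOCK):
    ftl.write(page, f"v{page}".encode())

before = ftl.pages_programmed
ftl.write(0, b"new")                      # the host updates one page
pages_for_update = ftl.pages_programmed - before
print(pages_for_update)  # 4
```

In this toy model, updating a single page of a full block programs every page in the block, because the unchanged pages have to be copied to the new physical block before the old one is erased.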
Page-based mapping is typically used for modern SSDs, which maps more granular logical data pages (in kilobytes) to