How Can We Achieve the Lowest Failure Rate in NAND Flash-based Systems?
How can we achieve the lowest failure rate in NAND flash-based systems? You may have already discussed this with your engineering team or memory system suppliers. What measures are being taken to ensure that your solution not only corrects the inevitable errors effectively, but also builds a system architecture robust enough to prevent errors in the first place?
As NAND flash process geometries shrink, the raw bit error rate rises, which in turn threatens system reliability. Anyone familiar with SD cards, USB flash drives and other NAND flash-based solutions knows that the key component for keeping the failure rate low is the NAND flash controller. You may already know this component and have discussed its error correction code (ECC) strength. But have you ever considered what actually happens inside that small package? What does a flash controller do to avoid failures? ECC is just one of a diverse set of building blocks. A well-designed system interleaves reliability and error proofing throughout the entire processing chain, with ECC handling the unavoidable bit errors. If you want to impress your boss and add real value, read on, because we will explain the powerful functions of the flash controller.
Even before a system is assembled, there is an important planning step: qualifying the flash, either internally or via the system integrator. In other words, the flash controller must be matched to the right flash strategy. So what does qualification mean? It means more than simply confirming that the controller will work with the selected flash. Above all, it means tests, and not just a few. At Hyperstone, we ensure every controller-flash combination has been tested. The first step is to characterize the flash memory itself, which is done by extensively testing the NAND flash across all life cycle phases and usage scenarios. This knowledge helps in designing the right error handling and in extracting the log-likelihood ratio (LLR) tables used for soft-decision decoding, resulting in the most effective overall error recovery process.
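To make the idea of an LLR table concrete, here is a simplified sketch in C of how one table entry could be derived from characterization data. The bin counts and table size are invented example values, not Hyperstone measurements; the point is only that the table stores, for each soft-read voltage bin, the log of the ratio between how often a bit written as 0 and a bit written as 1 ends up in that bin.

```c
/*
 * Simplified sketch of deriving LLR table entries from characterization
 * counts. All counts and the number of bins are hypothetical examples.
 */
#include <math.h>
#include <stdio.h>

#define NUM_BINS 8   /* number of soft-read voltage bins (hypothetical) */

int main(void)
{
    /* hypothetical characterization counts per voltage bin */
    double written0[NUM_BINS] = { 9800, 9200, 7500, 4200, 1100, 300,   60,   10 };
    double written1[NUM_BINS] = {   12,   70,  350, 1300, 4500, 7800, 9300, 9850 };
    double llr[NUM_BINS];

    for (int b = 0; b < NUM_BINS; b++) {
        /* LLR = ln(P(bin | written 0) / P(bin | written 1)):
           positive values favor 0, negative values favor 1 */
        llr[b] = log(written0[b] / written1[b]);
        printf("bin %d: LLR = %+.2f\n", b, llr[b]);
    }
    return 0;
}
```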
In planning and design, most companies discuss flash memories in terms of cost, but many forget to consider how the flash will behave given their system architecture, environment and expected use cases. Each application needs a tailored approach and its own correction and recovery options to achieve the best outcome. Characterization is so significant because the data it collects allows the error handling to be validated in the most precise and effective way. A thorough, deliberate qualification is the foundation of a robust and stable system. For demanding systems, it is worth questioning and discussing the qualification process with the system integrator, or, if you design your solution in-house for more flexibility, consulting the controller company directly. While solid qualification sets a system up for success, calibration and controller features such as read disturb management, wear leveling, and dynamic data refresh act more directly as error prevention.
An effective calibration process keeps the bit error rate low over the component's entire lifetime by dynamically adapting to changes in the threshold voltages of the memory cells. Many disturbances affect a cell's threshold voltage, including program/erase cycles, read disturb, data retention and temperature changes. The flash memory does not track these threshold shifts by itself; instead, the flash controller decides when to calibrate and performs the proper sequence of operations.
As noted above, calibration shifts the read reference voltages applied to the cells. Because different blocks or pages may have experienced different disturbances, the best calibration for one page may not be right for another.
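As an illustration of the principle, the following C sketch shows a per-page read-reference calibration loop. The hardware hooks (nand_set_read_offset, nand_read_page, ecc_bit_errors) are hypothetical placeholders, since real controllers use vendor-specific read-retry commands; the sketch only shows the idea of shifting the read reference and keeping the offset that yields the fewest bit errors.

```c
/*
 * Minimal sketch of a read-reference calibration loop. The HAL functions
 * declared below are hypothetical and would be provided by the controller
 * firmware; this is not a vendor API.
 */
#include <limits.h>
#include <stdint.h>

/* hypothetical HAL hooks, implemented elsewhere in the firmware */
void nand_set_read_offset(int block, int page, int offset_step);
int  nand_read_page(int block, int page, uint8_t *buf);
int  ecc_bit_errors(const uint8_t *buf);   /* returns -1 if uncorrectable */

#define PAGE_SIZE 4096
static const int offsets[] = { 0, -1, +1, -2, +2, -3, +3 };  /* read-retry steps */

/* Returns the read-reference offset step with the fewest bit errors. */
int calibrate_read_offset(int block, int page)
{
    uint8_t buf[PAGE_SIZE];
    int best_offset = 0;
    int best_errors = INT_MAX;

    for (unsigned i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
        nand_set_read_offset(block, page, offsets[i]);
        nand_read_page(block, page, buf);
        int errors = ecc_bit_errors(buf);
        if (errors >= 0 && errors < best_errors) {
            best_errors = errors;
            best_offset = offsets[i];
        }
    }
    return best_offset;
}
```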
In addition, several error prevention mechanisms work together, such as wear leveling (WL), read disturb management (RDM), ECC and dynamic data refresh (DDR), to manage efficient and reliable data transfers to the flash memory. Wear leveling ensures that all blocks of a flash device or storage system approach their specified erase cycle budget at roughly the same time, rather than some blocks reaching it long before the others. Read disturb management counts all read operations on the flash; if a threshold is reached, the surrounding area is refreshed. Data whose ECC error count exceeds the configured threshold during a normal read is refreshed as well, while dynamic data refresh scans all data as a background operation and identifies the error status of every block. A refresh operation is triggered if a specific error threshold per block or per ECC unit is exceeded during this scan. Different controller companies name these features in different ways, but the logic behind the algorithms pursues the same goal. It is therefore worth engaging closely with your controller supplier to understand how these features and the qualified flash work together.
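The read disturb part of this can be pictured with a short sketch: a per-block read counter that triggers a refresh once a configured threshold is crossed. The block count, the threshold value and the relocate_block helper are hypothetical; real controllers track and refresh data in firmware-specific ways.

```c
/*
 * Illustrative sketch of read disturb management. Geometry, threshold and
 * the relocate_block() helper are hypothetical examples.
 */
#include <stdint.h>

#define NUM_BLOCKS        1024      /* hypothetical device geometry   */
#define READ_DISTURB_MAX  100000u   /* hypothetical refresh threshold */

static uint32_t read_count[NUM_BLOCKS];

/* hypothetical helper: copy the valid data of 'block' to a spare block */
void relocate_block(int block);

/* Called by the firmware for every read that targets 'block'. */
void on_block_read(int block)
{
    if (++read_count[block] >= READ_DISTURB_MAX) {
        relocate_block(block);      /* refresh before disturb errors accumulate */
        read_count[block] = 0;
    }
}
```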
Finally, error correction has become one of the best-known and most important tasks of a flash controller, even though error prevention deserves more of the credit than it usually gets. The complexity and computational intensity of error correction nevertheless make it the controller's most valuable mechanism. Within tight area and power limits, error correction coding becomes increasingly difficult: as the required correction capability grows, older codes can no longer deliver the necessary correction performance within the limited spare area available in the latest flash generations.
To provide the best solution, Hyperstone developed its own error correction engine, a combined hard- and soft-decision error correction module based on generalized concatenated codes. The great advantage of this code construction is that the number of correctable errors per code word can be determined analytically, which means the error correction can guarantee a certain correction performance for every code word. For all qualified flash memories, guaranteed bit error rates are specified to ensure reliable operation within the specified parameters.
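To illustrate what a guaranteed per-code-word correction capability buys you, the following sketch (not Hyperstone's actual analysis) computes the probability that a code word contains more bit errors than the code can guarantee to correct, using the standard binomial tail for a given raw bit error rate. All parameter values are hypothetical examples.

```c
/*
 * Illustrative sketch: probability that a code word of n bits sees more than
 * t bit errors at raw bit error rate p. Parameters below are hypothetical.
 */
#include <math.h>
#include <stdio.h>

/* log of the binomial coefficient C(n, k), via lgamma to avoid overflow */
static double log_choose(int n, int k)
{
    return lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0);
}

/* P(more than t bit errors in n bits) at raw bit error rate p */
static double p_uncorrectable(int n, int t, double p)
{
    double tail = 0.0;
    for (int k = t + 1; k <= n; k++)
        tail += exp(log_choose(n, k) + k * log(p) + (n - k) * log1p(-p));
    return tail;
}

int main(void)
{
    int n = 4608;      /* hypothetical code word size in bits (data + parity) */
    int t = 40;        /* hypothetical guaranteed correctable bit errors      */
    double p = 5e-3;   /* hypothetical raw bit error rate                     */

    printf("P(code word exceeds guaranteed capability) = %.2e\n",
           p_uncorrectable(n, t, p));
    return 0;
}
```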
When data is read back from the flash and passed to the error correction module, bit errors are normally determined based only on the redundant information added to the code word. Using this hard information alone means that each bit can only be treated as either correct or incorrect. Soft-decision decoding adds probabilities: soft information indicates how likely it is that a received bit really is the value that was read, or whether it is more likely the other value, given that each bit can only be "zero" or "one". These probabilities are taken from the log-likelihood ratio (LLR) tables generated during characterization and stored in lookup tables in the controller. With this information, the error correction has more to work with: for each individual bit, the probability information might indicate, for example, that a bit read as zero has a 74 percent confidence of actually having been written as zero. The error correction thus has clear indications of which bits are likely to be wrong and which are not, and this additional information significantly increases its ability to correct errors.
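The 74 percent figure can be tied back to an LLR value with a couple of lines of C. Assuming the common convention LLR = ln(P(bit = 0) / P(bit = 1)), an LLR of about +1.05 corresponds to roughly 74 percent confidence that the stored bit is zero; the numbers are illustrative only.

```c
/*
 * Short sketch relating an LLR value to the confidence figure in the text.
 * The LLR value used here is an illustrative example.
 */
#include <math.h>
#include <stdio.h>

/* probability that the stored bit is 0, given its LLR */
static double prob_zero(double llr)
{
    return 1.0 / (1.0 + exp(-llr));
}

int main(void)
{
    double llr = 1.05;   /* hypothetical table entry for one soft-read bin */
    printf("LLR %+.2f -> P(bit = 0) = %.0f%%\n", llr, 100.0 * prob_zero(llr));
    /* prints roughly 74%: the decoder treats this bit as a fairly confident
       zero, but will reconsider it before bits with larger LLR magnitudes */
    return 0;
}
```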
Flash controllers are a key component in ensuring the reliable and secure operation of flash memory. They handle a range of functions designed to manage data transfers to the flash efficiently, and they not only correct errors but also prevent them. However, these features are implemented in different ways, and depending on a company's business model and focus, your controller's feature set may be minimal. At Hyperstone, we focus on high-quality functions, mechanisms and carefully engineered processes designed to improve the endurance of the overall system and thus raise the reliability of the flash memory.
Related Articles:
How to exploit IO performance of NAND Flash
Explain the principle and use of NAND Flash with examples (4)
YMTC makes progress in NAND Flash production