TY - GEN
T1 - On the criticality of caches in fault-tolerant processors for space
AU - Di Mascio, Stefano
AU - Menicucci, Alessandra
AU - Gill, Eberhard
AU - Furano, Gianluca
AU - Monteleone, Claudio
PY - 2019/10/1
Y1 - 2019/10/1
N2 - This paper analyzes the contribution of caches to failures at processor level due to soft errors. In order to do this, approximated methodologies to estimate the percentage of the total Sensitive Area (SA) of a processor for each unit during early design exploration are proposed. Then, to identify the most vulnerable units, a metric called Relative Soft Error Vulnerability (RSEV) is defined. The analysis shows that caches are the most vulnerable units of state-of-the-art processors and that, even when considering higher-frequency and more complex pipelines representative of next-generation processors for space applications, the final in-orbit failure rate is dominated by failures caused by upsets in cache arrays. Even when protecting memory arrays with information redundancy, the large fraction of upsets occurring in caches is potentially the biggest threat to processor availability and reliability, especially if errors are modelled with invalid assumptions and are not properly handled when detected.
AB - This paper analyzes the contribution of caches to failures at processor level due to soft errors. In order to do this, approximated methodologies to estimate the percentage of the total Sensitive Area (SA) of a processor for each unit during early design exploration are proposed. Then, to identify the most vulnerable units, a metric called Relative Soft Error Vulnerability (RSEV) is defined. The analysis shows that caches are the most vulnerable units of state-of-the-art processors and that, even when considering higher-frequency and more complex pipelines representative of next-generation processors for space applications, the final in-orbit failure rate is dominated by failures caused by upsets in cache arrays. Even when protecting memory arrays with information redundancy, the large fraction of upsets occurring in caches is potentially the biggest threat to processor availability and reliability, especially if errors are modelled with invalid assumptions and are not properly handled when detected.
UR - http://www.scopus.com/inward/record.url?scp=85074386145&partnerID=8YFLogxK
U2 - 10.1109/DFT.2019.8875424
DO - 10.1109/DFT.2019.8875424
M3 - Conference contribution
T3 - 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2019
BT - 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2019
PB - IEEE
T2 - 32nd IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2019
Y2 - 2 October 2019 through 4 October 2019
ER -