An Efficient Bottleneck Planes Exclusion Method for Reconfiguring 3D VLSI Arrays

Authors :
Qian, Junyan
Qiu, Kunzhu
Ding, Hao
Zhang, Huimin
Zhai, Zhongyi
Source :
IEEE Transactions on Parallel and Distributed Systems; February 2024, Vol. 35 Issue: 2 p250-263, 14p
Publication Year :
2024

Abstract

With the ever-increasing integration and parallel computing capabilities of 3D processor arrays, failures of processor elements (PEs) caused by various factors have become more prevalent. A fault-tolerant mechanism that uses the remaining fault-free PEs to reconfigure a sub-array therefore becomes critical. In this paper, we study the problem of reconfiguring a 3D sub-array with as many fault-free PEs as possible, which has been shown to be NP-complete in previous work. Although prior algorithms are effective under low fault densities, they are severely limited under high fault densities. To address this, we first define the bottleneck of the 3D processor array, propose a novel method to identify the physical bottleneck plane that restricts the reconfigurable size of the logical sub-array, and prove its correctness. Then, we propose an effective compensation strategy that can fully utilize the fault-free PEs in the bottleneck plane. Under this strategy, a sliding-window weight calculation method is proposed to determine the priority of compensation. Finally, we propose a heuristic algorithm that can construct the maximum target array from different dimensions in polynomial time. Experimental results demonstrate that the proposed algorithm exhibits favorable performance in terms of harvest and degradation. For the random-failure model, the improvement in harvest for fault-free PEs is up to 32.03% on a $32 \times 32 \times 32$ host array with a 20% fault density. For the clustered fault model, the improvement in harvest is up to 70.63% on a $32 \times 32 \times 32$ host array containing 12 cluster failures of size $6 \times 6 \times 6$.

Details

Language :
English
ISSN :
1045-9219 and 1558-2183
Volume :
35
Issue :
2
Database :
Supplemental Index
Journal :
IEEE Transactions on Parallel and Distributed Systems
Publication Type :
Periodical
Accession number :
ejs64995160
Full Text :
https://doi.org/10.1109/TPDS.2023.3339961