
Loop Invariant Code Motion Algorithm for Deep Learning Operators.

Authors :
LIANG Jiali
HUA Baojian
LYU Yashuai
SU Zhenyu
Source :
Journal of Frontiers of Computer Science & Technology; Jan 2023, Vol. 17 Issue 1, p127-139, 13p
Publication Year :
2023

Abstract

TVM (tensor virtual machine) is a deep learning compiler that translates deep learning operators described in tensor expressions into TVM IR (TVM intermediate representation) programs. After a series of operator-level optimizations on TVM IR, TVM generates target code for diverse hardware back-ends. Tensor expression, a domain-specific language for tensor computation, applies loop transformations to operators. These transformations leave many complicated expressions inside nested loop statements, and such expressions contain loop invariant code. However, in the context of deep learning applications, the traditional loop invariant code motion algorithm has severe limitations. First, it is difficult to determine the benefit of moving a given piece of invariant code out of a loop. Second, it is difficult to detect loop invariant code whose operands appear in different orders. Third, it cannot handle nested condition expressions well. Furthermore, it can conflict with optimizations performed by the target hardware compiler. These problems constrain the application of the loop invariant code motion technique. This paper proposes a new loop invariant code motion algorithm that takes the characteristics of deep learning applications into account heuristically. The algorithm normalizes the program by reordering expression operands and simplifying nested condition expressions. This paper also introduces a new cost model that evaluates the cost of moving a given piece of loop invariant code while fully considering the characteristics of TVM IR and the target hardware back-ends. The algorithm is implemented as a registered pass in the open-source TVM compiler, version 0.7. To verify the effectiveness and correctness of the algorithm, experiments are conducted on the TVM TOPI benchmark with 27 operators and 511 test cases under different inputs.
Experimental results show that the algorithm improves the performance of 47.6% of the operators, with speedups of up to 40.0%. [ABSTRACT FROM AUTHOR]
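The two normalization steps named in the abstract, reordering the operands of equivalent expressions and hoisting code that is invariant with respect to the loop variables, can be illustrated with a small sketch. This is a hypothetical, simplified model written for this record (expressions as nested tuples, a toy loop body as a list), not the paper's actual TVM pass or any TVM API:

```python
def canonicalize(expr):
    """Sort operands of commutative ops so that, e.g., ('*','n','m')
    and ('*','m','n') compare equal after normalization."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [canonicalize(a) for a in args]
    if op in ("+", "*"):  # commutative ops: impose a canonical order
        args = sorted(args, key=repr)
    return (op, *args)

def is_invariant(expr, loop_vars):
    """An expression is loop invariant if no loop variable appears in it."""
    if isinstance(expr, tuple):
        return all(is_invariant(a, loop_vars) for a in expr[1:])
    return expr not in loop_vars

def hoist_invariants(body, loop_vars):
    """Split a loop body's expressions into hoisted invariants
    (deduplicated after canonicalization) and remaining expressions."""
    hoisted, remaining, seen = [], [], set()
    for e in body:
        c = canonicalize(e)
        if is_invariant(c, loop_vars):
            if c not in seen:  # duplicates now match thanks to canonicalization
                seen.add(c)
                hoisted.append(c)
        else:
            remaining.append(c)
    return hoisted, remaining

# Toy loop body over loop variable 'i': n*m and m*n are the same
# invariant and should be hoisted once; i+n depends on i and stays.
body = [("*", "n", "m"), ("*", "m", "n"), ("+", "i", "n")]
hoisted, remaining = hoist_invariants(body, {"i"})
```

The paper's actual algorithm additionally consults a cost model over TVM IR and the target back-end before hoisting; that part is omitted here.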

Details

Language :
Chinese
ISSN :
1673-9418
Volume :
17
Issue :
1
Database :
Complementary Index
Journal :
Journal of Frontiers of Computer Science & Technology
Publication Type :
Academic Journal
Accession number :
161464228
Full Text :
https://doi.org/10.3778/j.issn.1673-9418.2107046