Exploiting Computation Reuse for Stencil Accelerators

Yuze Chi; Jason Cong

doi:10.1109/dac18072.2020.9218680

Proc Des Autom Conf. 2020 Jul;2020. doi: 10.1109/dac18072.2020.9218680. Epub 2020 Oct 9.

Authors

Yuze Chi¹, Jason Cong¹

Affiliation

¹ University of California, Los Angeles.

Abstract

Stencil kernels are an important class of kernels used extensively in many application domains. Over the years, researchers have studied optimizations for parallelization, communication reuse, and computation reuse on various target platforms. However, challenges remain, especially for computation reuse on accelerators, because existing work lacks complete design-space exploration and effective design-space pruning. In this paper, we present solutions to these challenges for a wide range of stencil kernels (i.e., stencils with reduction operations), where the computation reuse patterns are extremely flexible because the reduction operations are commutative and associative. We formally define the complete design space, based on which we present a provably optimal dynamic programming algorithm and a heuristic beam search algorithm that provides near-optimal solutions under an architecture-aware model. Experimental results show that, for synthesizing stencil kernels to FPGAs, compared with a state-of-the-art stencil compiler without computation reuse capability, our proposed algorithm reduces look-up table (LUT) and digital signal processor (DSP) usage by 58.1% and 54.6% on average, respectively, which leads to an average speedup of 2.3× for compute-intensive kernels, outperforming the latest CPU/GPU results.
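
To make the idea of computation reuse in a reduction stencil concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a 3×3 box-sum stencil on a small integer grid; the function names `box3x3_naive` and `box3x3_reuse` and the test grid are hypothetical and chosen only for illustration. Because addition is commutative and associative, each window sum can be regrouped into three vertical partial sums, and horizontally adjacent windows can share those partial sums instead of recomputing them.

```python
def box3x3_naive(grid):
    """Sum each 3x3 window directly: 8 additions per output point."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    adds = 0
    for i in range(rows - 2):
        for j in range(cols - 2):
            total = grid[i][j]
            for di in range(3):
                for dj in range(3):
                    if di == 0 and dj == 0:
                        continue  # first operand already loaded into total
                    total += grid[i + di][j + dj]
                    adds += 1
            out[i][j] = total
    return out, adds


def box3x3_reuse(grid):
    """Regroup each window into vertical partial sums shared by neighbors."""
    rows, cols = len(grid), len(grid[0])
    adds = 0
    # col_sum[i][j] = grid[i][j] + grid[i+1][j] + grid[i+2][j] (2 additions)
    col_sum = [[0] * cols for _ in range(rows - 2)]
    for i in range(rows - 2):
        for j in range(cols):
            col_sum[i][j] = grid[i][j] + grid[i + 1][j] + grid[i + 2][j]
            adds += 2
    # Each output combines three already-computed column sums (2 additions).
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    for i in range(rows - 2):
        for j in range(cols - 2):
            out[i][j] = col_sum[i][j] + col_sum[i][j + 1] + col_sum[i][j + 2]
            adds += 2
    return out, adds


# Hypothetical 5x6 test grid; any integer grid of at least 3x3 would work.
grid = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]
naive_out, naive_adds = box3x3_naive(grid)
reuse_out, reuse_adds = box3x3_reuse(grid)
assert naive_out == reuse_out  # regrouping does not change the result
print(naive_adds, reuse_adds)
```

Both versions produce the same outputs, but the second needs fewer additions, approaching four per output point on large grids instead of eight. This is only one hand-picked regrouping; the design space described in the abstract covers the many other ways a reduction can be regrouped and shared.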

Grants and funding

U01 MH117079/MH/NIMH NIH HHS/United States