Rajkishore Barik, Bit-sensitivity and Efficiency of Register Allocation
For embedded applications, the power consumption of register files makes it desirable for embedded processors to use as few registers as possible. Past research has indicated the potential for leveraging bitwidth analysis and bitwidth-aware register allocation to reduce register usage in embedded applications. We make the following contributions in evaluating and enhancing bitwidth-aware register allocation for embedded applications. First, we compare the Tallam-Gupta [POPL'03] bitwidth analysis with an idealized limit study, and show significant opportunities for enhancements. Second, we show how bitwidth-aware register allocation can be enhanced by enhanced bitwidth analysis for scalar and array variables, and also by enhanced coalescing of variables. Third, we use our prototype implementation of bitwidth-aware register allocation in GCC to compare the number of dynamic spill load/store instructions resulting from a) bitwidth-unaware allocation, b) bitwidth-aware allocation, c) enhanced bitwidth-aware allocation, and d) ideal profile-driven bitwidth-aware allocation. Our results show that our enhancements can reduce the number of dynamic spill load/store instructions to between 3% and 27% of the number obtained from the Tallam-Gupta algorithm. These results are also applicable to reconfigurable multicore processors that allow software to configure the number and bitwidth of registers in different phases for power efficiency.
In the second part of the talk we present recent extensions to the Poletto-Sarkar Linear Scan register allocation algorithm [TOPLAS'99]. We propose two Extended Linear Scan (ELS) algorithms that retain the compile-time efficiency of past Linear Scan algorithms at the same time delivering performance that can match or surpass that of Graph Coloring. Specifically, we highlight three fundamental theoretical limitations in using Graph Coloring as a foundation for global register allocation, and introduce a basic Extended Linear Scan algorithm, ELS_0, which addresses all three limitations for the problem of Spill-Free Register Allocation. We introduce the ELS_1 algorithm which extends ELS_0 to obtain a greedy algorithm for the problem of Register Allocation with Total Spills. Finally, we present experimental results to compare the Graph Coloring and Extended Linear Scan algorithms. Our results show that the space and time used by ELS_1 is significantly smaller than those used by GC --- the compile-time speedups for ELS_1 relative to GC varied from 15 to 68 times. In addition, the resulting execution time improved by up to 5.8% for ELS_1 relative to GC, with an average improvement of 2.3%.