DOI QR코드

DOI QR Code

Locality-Conscious Nested-Loops Parallelization

  • Parsa, Saeed (School of Computer Engineering, Iran University of Science and Technology) ;
  • Hamzei, Mohammad (School of Computer Engineering, Iran University of Science and Technology)
  • Received : 2013.03.20
  • Accepted : 2013.09.06
  • Published : 2014.02.01

Abstract

To speed up data-intensive programs, two complementary techniques, namely nested loops parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse grain parallelism through exploiting the largest possible groups of outer permutable loops in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.

Keywords

References

  1. G. Chen and M. Kandemir, "Compiler-Directed Code Restructuring for Improving Performance of MPSoCs," IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 9, Sept. 2008, pp. 1201- 1214. https://doi.org/10.1109/TPDS.2007.70760
  2. U. Bondhugula et al., "A Practical Automatic Polyhedral Parallelizer and Locality Optimizer," ACM SIGPLAN Notices, vol. 43, no. 6, 2008, pp. 101-113.
  3. L.-N. Pouchet et al., "Iterative Optimization in the Polyhedral Model: Part II, Multidimensional Time," ACM SIGPLAN Notices, vol. 43, no. June 6, 2008, pp. 90-100. https://doi.org/10.1145/1379022.1375594
  4. M. Griebl, P. Faber, and C. Lengauer, "Space-Time Mapping and Tiling: A Helpful Combination," Concurrency Comput., Practice Experience, vol. 16, no. 2/3, Jan. 2004, pp. 221-246. https://doi.org/10.1002/cpe.772
  5. S. Lotfi and S. Parsa, "Parallel Loop Generation and Scheduling," J. Supercomput., vol. 50, no. 3, 2009, pp. 289-306. https://doi.org/10.1007/s11227-008-0262-5
  6. M.E. Wolf and M.S. Lam, "A Data Locality Optimizing Algorithm," ACM SIGPLAN Notices, vol. 26, no. 6, 1991, pp. 30- 44. https://doi.org/10.1145/113446.113449
  7. Y. Song and Z. Li, "New Tiling Techniques to Improve Cache Temporal Locality," ACM SIGPLAN Notices, vol. 34, no. 5, 1999, pp. 215-228. https://doi.org/10.1145/301631.301668
  8. J. Xue and C-H. Huang, "Reuse-Driven Tiling for Improving Data Locality," Int. J. Parallel Programming, vol. 26, no. 6, 1998, pp. 671-696. https://doi.org/10.1023/A:1018734612524
  9. J. Ramanujam and P. Sadayappan, "Tiling Multidimensional Iteration Spaces for Multicomputers," J. Parallel Distrib. Comput., vol. 16, no. 2, Oct. 1992, pp. 108-120. https://doi.org/10.1016/0743-7315(92)90027-K
  10. M.E. Wolf and M.S. Lam, "A Loop Transformation Theory and an Algorithm to Maximize Parallelism," IEEE Trans. Parallel Distrib. Syst., vol. 2, no. 4, Oct. 1991, pp. 452-471. https://doi.org/10.1109/71.97902
  11. A.W. Lim, G.I. Cheong, and M.S. Lam, "An Affine Partitioning Algorithm to Maximize Parallelism and Minimize Communication," Proc. 13th Int. Conf. Supercomput., Rhodes, Greece, June 20-25, 1999, pp. 228-237.
  12. P. Feautrier, "Some Efficient Solutions to the Affine Scheduling Problem, Part II: Multidimensional Time," Int. J. Parallel Programming, vol. 21, no. 6, 1992, pp. 389-420. https://doi.org/10.1007/BF01379404
  13. O. Ozturk, "Data Locality and Parallelism Optimization Using a Constraint-Based Approach," J. Parallel Distrib. Comput., vol. 71, no. 2, Feb. 2011, pp. 280-287. https://doi.org/10.1016/j.jpdc.2010.08.005
  14. A. Cohen, S. Girbal, and O. Temam, "A Polyhedral Approach to Ease the Composition of Program Transformations," Euro-Par Parallel Process., 2004, pp. 292-303.
  15. L.-N. Pouchet, Iterative Optimization in the Polyhedral Model, doctoral dissertation, University of Paris-Sud XI, France, 2010.
  16. C. Bastoul, Extracting Polyhedral Representation from High Level Languages, Technical report, Paris-Sud University, 2008.
  17. C. Bastoul, "Efficient Code Generation for Automatic Parallelization and Optimization," Proc. 2nd Int. Symp. Parallel Distrib. Comput., Oct. 13-14, 2003, pp. 23-30.

Cited by

  1. Subspace Projection-Based Clustering and Temporal ACRs Mining on MapReduce for Direct Marketing Service vol.37, pp.2, 2015, https://doi.org/10.4218/etrij.15.2314.0068