Acceleration of LU-SGS Code on Latest Microprocessors Considering the Increase of Level 2 Cache Hit-Rate

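  • Choi, Jeong-Yeol (Department of Aerospace Engineering, Pusan National University)
  • Oh, Se-Jong (Department of Aerospace Engineering, Pusan National University)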
  • Published : 2002.10.01

Abstract

An approach to writing performance-optimized computational code for the latest microprocessors is suggested. The optimization concept, referred to here as localization, maximizes utilization of the second-level (L2) cache, which is common to all of the latest microprocessors, and minimizes access to the system's main memory. In this study, an LU-SGS (Lower-Upper Symmetric Gauss-Seidel) code for solving the fluid dynamic equations was localized at three different levels and tested on several of the microprocessor architectures most widely used today. Depending on the system, the localized code reached a solution up to 7.35 times faster than the baseline algorithm while producing exactly the same solution on the same computer system.
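The abstract does not spell out the three localization levels, so the following is only a generic C sketch of the underlying idea, not the authors' LU-SGS code. It assumes a hypothetical one-dimensional update (the function names `update_baseline`, `update_localized`, and `same_solution` are invented for illustration). The baseline version sweeps memory twice and stores an intermediate array; the localized version fuses both sweeps so that intermediate values stay in registers or cache, while performing the same arithmetic in the same order.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical): two full sweeps over memory.
 * Sweep 1 stores every interface difference in a temporary array;
 * sweep 2 reads that array back to update the interior points.
 * For large n the temporary array may be evicted from cache between sweeps. */
void update_baseline(const double *u, double *out, double *flux,
                     size_t n, double c)
{
    assert(n >= 3);
    for (size_t i = 0; i + 1 < n; ++i)
        flux[i] = u[i + 1] - u[i];

    out[0] = u[0];          /* boundary values are copied unchanged */
    out[n - 1] = u[n - 1];
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = u[i] + c * (flux[i] - flux[i - 1]);
}

/* Localized (hypothetical): one fused sweep.
 * Each difference is computed once, kept in a scalar, and reused for the
 * next point, so no temporary array travels to or from main memory. */
void update_localized(const double *u, double *out, size_t n, double c)
{
    assert(n >= 3);
    out[0] = u[0];
    out[n - 1] = u[n - 1];

    double flux_left = u[1] - u[0];
    for (size_t i = 1; i + 1 < n; ++i) {
        double flux_right = u[i + 1] - u[i];
        out[i] = u[i] + c * (flux_right - flux_left);
        flux_left = flux_right;
    }
}

/* Returns 1 when both arrays hold exactly equal values, 0 otherwise. */
int same_solution(const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling both functions on the same input and comparing the outputs with `same_solution` checks the property the abstract emphasizes: the restructured code is expected to return exactly the same solution as the baseline, with only the memory access pattern changed.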

Cited by

  1. Optimization of LU-SGS Code for the Acceleration on the Modern Microprocessors, Vol. 14, No. 2, 2013, https://doi.org/10.5139/IJASS.2013.14.2.112