Instructions and Data Prefetch Mechanism using Displacement History Buffer

변위 히스토리 버퍼를 이용한 명령어 및 데이터 프리페치 기법

  • Jeong, Yong Su (Department of Electronic Engineering, Inha University) ;
  • Kim, JinHyuk (Department of Electronic Engineering, Inha University) ;
  • Cho, Tae Hwan (Department of Electronic Engineering, Inha University) ;
  • Choi, SangBang (Department of Electronic Engineering, Inha University)
  • Received : 2015.07.01
  • Accepted : 2015.09.24
  • Published : 2015.10.25

Abstract

In this paper, we propose a hardware prefetch mechanism that produces a spatial region by using a displacement field and supports an efficient cache replacement policy by giving priority to the trigger block of each spatial region. Because each history is built relative to the trigger block of its history record, the mechanism can take the sequence of the program into account. Because the history is stored as displacement values, instruction or data addresses can be prefetched quickly by adding the value stored in the displacement field to the trigger address. We also propose a replacement method that, after giving priority to trigger blocks, replaces blocks at random starting from the low-priority blocks when the cache is full. To assess the performance of the proposed hardware prefetcher, we used the memory simulator gem5 and the PARSEC benchmark suite. Compared with an existing hardware prefetcher that generates the spatial region using a bit vector, the L1 data cache miss rate was reduced by about 44.5% on average and the L1 instruction cache miss rate by about 26.1% on average. In addition, IPC (instructions per cycle) improved by about 23.7% on average.
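The address generation step described in the abstract (trigger address plus stored displacement) can be illustrated with a minimal software sketch. This is not the authors' hardware design: the 64-byte block size, the `HistoryRecord` class, and the `train` helper are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64  # assumed cache block size in bytes (not stated in the abstract)


@dataclass
class HistoryRecord:
    """Hypothetical history entry: a trigger block plus signed block displacements."""
    trigger_block: int
    displacements: list = field(default_factory=list)

    def record_access(self, address: int) -> None:
        """Store the accessed block as a displacement from the trigger block."""
        delta = address // BLOCK_SIZE - self.trigger_block
        # Skip the trigger block itself and displacements already recorded.
        if delta != 0 and delta not in self.displacements:
            self.displacements.append(delta)

    def prefetch_addresses(self, trigger_address: int) -> list:
        """Rebuild prefetch candidates as trigger block + each stored displacement."""
        base = trigger_address // BLOCK_SIZE
        return [(base + d) * BLOCK_SIZE for d in self.displacements]


def train(trigger_address: int, later_accesses: list) -> HistoryRecord:
    """Build a record from a trigger access and the accesses that followed it."""
    record = HistoryRecord(trigger_address // BLOCK_SIZE)
    for address in later_accesses:
        record.record_access(address)
    return record


# Learn a pattern around 0x1000, then replay it around a new trigger at 0x8000.
record = train(0x1000, [0x1040, 0x1100, 0x0FC0, 0x1048])
candidates = record.prefetch_addresses(0x8000)
```

Because the displacements are kept in the order in which they were first observed, the replayed candidates follow the original access sequence, which a plain bit vector over the region would not preserve.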

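This paper proposes a hardware prefetch technique that generates history records using a displacement field and enables efficient cache replacement by assigning priority to the trigger block that serves as the reference for each history record. Because the history is generated relative to the trigger block of the history record, the sequence of the program can be taken into account, and because the history is stored as displacement values, instruction or data addresses can be prefetched quickly by adding the value stored in the displacement field to the trigger address. We also propose a method that assigns priority to trigger blocks and uses random replacement as the cache replacement policy, so that when the cache is full, blocks are replaced at random starting from those with low priority. To evaluate the proposed hardware prefetcher, we used gem5, a memory analysis simulator, and the PARSEC benchmark programs. Compared with an existing hardware prefetcher that generates spatial regions using bit vectors, the L1 data cache miss rate decreased by about 44.5% on average and the L1 instruction cache miss rate decreased by about 31% on average. In addition, IPC (Instruction Per Cycle) improved by about 23.7% on average.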
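The replacement policy described in the abstract (trigger blocks get priority, and victims are chosen at random from the low-priority blocks when the cache is full) might be sketched as follows. The `PrioritySet` class, its single-set model, and the fallback used when every resident block is a trigger block are assumptions made here, not details taken from the paper.

```python
import random


class PrioritySet:
    """Hypothetical model of one cache set with trigger-block priority."""

    def __init__(self, ways, rng=None):
        self.ways = ways
        self.blocks = {}  # block address -> True if it is a trigger block
        self.rng = rng if rng is not None else random.Random()

    def insert(self, block, is_trigger=False):
        """Insert a block; return the evicted block address, or None."""
        if block in self.blocks:
            # Already resident: only ever raise its priority.
            self.blocks[block] = self.blocks[block] or is_trigger
            return None
        victim = None
        if len(self.blocks) >= self.ways:
            low_priority = [b for b, trig in self.blocks.items() if not trig]
            # Assumption: if only trigger blocks remain, pick among all of them.
            candidates = low_priority if low_priority else list(self.blocks)
            victim = self.rng.choice(candidates)
            del self.blocks[victim]
        self.blocks[block] = is_trigger
        return victim


# A 2-way set holding one trigger block and one ordinary block.
cache_set = PrioritySet(ways=2, rng=random.Random(0))
cache_set.insert(0x100, is_trigger=True)
cache_set.insert(0x140)
evicted = cache_set.insert(0x180)  # the ordinary block is the only candidate
```

In this sketch the trigger block survives the eviction, so the history record it anchors can still be matched on a later access.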
