Bandwidth Efficient Summed Area Table Generation for CUDA

Ha, Sang-Won; Choi, Moon-Hee; Jun, Tae-Joon; Kim, Jin-Woo; Byun, Hye-Ran; Han, Tack-Don

doi:10.7583/JKGS.2012.12.5.67

Journal of Korea Game Society (한국게임학회 논문지)

Volume 12, Issue 5, Pages 67-78, 2012, 1598-4540 (pISSN), 2287-8211 (eISSN)

Korea Game Society (한국게임학회)

Korean title: CUDA를 이용한 효율적인 합산 영역 테이블의 생성 방법 (An Efficient Method for Generating Summed Area Tables Using CUDA)

Ha, Sang-Won (Dept. of Computer Science, Yonsei Univ.);
Choi, Moon-Hee (Samsung Electronics Corp.);
Jun, Tae-Joon (Dept. of Computer Science, Yonsei Univ.);
Kim, Jin-Woo (Dept. of Computer Science, Yonsei Univ.);
Byun, Hye-Ran (Dept. of Computer Science, Yonsei Univ.);
Han, Tack-Don (Dept. of Computer Science, Yonsei Univ.)

Received : 2012.08.31
Accepted : 2012.09.27
Published : 2012.10.20

https://doi.org/10.7583/JKGS.2012.12.5.67

Abstract

A summed area table allows box filtering of arbitrary width at every pixel in constant time per pixel. This makes it useful in image processing applications that need the sum or average of the pixel intensities surrounding each pixel. Although computing the summed area table of an image is primarily a memory-bound task consisting of row- or column-wise summation, previous methods accessed the high-latency global memory excessively in order to exploit data parallelism. In this paper, we propose an efficient algorithm for generating the summed area table in a GPGPU environment: the input is decomposed into square sub-images, and intermediate data are propagated between them. This almost halves global memory access compared with previous methods and makes efficient use of the available memory bandwidth. The results show a substantial increase in performance.
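To make the two ideas in the abstract concrete, the following is a minimal CPU-side sketch in Python, not the authors' CUDA implementation. It shows a plain summed area table, the constant-time box sum it enables, and one possible way to build the same table tile by tile, carrying only small intermediate arrays across tile borders. The function names, the `tile` parameter, and the carry layout (`row_carry`, `top_carry`) are assumptions introduced for illustration. The kernel design in the paper may differ.

```python
def summed_area_table(img):
    """Inclusive SAT: sat[y][x] is the sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        run = 0  # running sum of the current row
        for x in range(w):
            run += img[y][x]
            sat[y][x] = run + (sat[y - 1][x] if y > 0 else 0)
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of the inclusive rectangle (x0, y0)-(x1, y1) from at most four lookups."""
    total = sat[y1][x1]
    if x0 > 0:
        total -= sat[y1][x0 - 1]
    if y0 > 0:
        total -= sat[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += sat[y0 - 1][x0 - 1]
    return total


def tiled_summed_area_table(img, tile):
    """Build the same SAT one square tile at a time (illustrative assumption).

    Each input pixel is read once. Only two small carries cross tile borders:
    row_carry[r] - sum of row r over all columns left of the current tile
    top_carry[x] - finished SAT value directly above the current tile row
    On a GPU the tile would presumably sit in on-chip memory; that mapping
    is an assumption and is not modelled here.
    """
    h, w = len(img), len(img[0])
    sat = [[0] * w for _ in range(h)]
    top_carry = [0] * w
    for ty in range(0, h, tile):
        y_end = min(ty + tile, h)
        row_carry = [0] * (y_end - ty)
        for tx in range(0, w, tile):
            x_end = min(tx + tile, w)
            for y in range(ty, y_end):
                run = row_carry[y - ty]
                for x in range(tx, x_end):
                    run += img[y][x]
                    above = top_carry[x] if y == ty else sat[y - 1][x]
                    sat[y][x] = above + run
                row_carry[y - ty] = run  # hand the row sum to the tile on the right
            for x in range(tx, x_end):
                top_carry[x] = sat[y_end - 1][x]  # hand the last row to the tile below
    return sat


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = summed_area_table(image)
    assert tiled_summed_area_table(image, 2) == table
    print(box_sum(table, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The tiled variant produces the same table as the direct one. The point of the sketch is only that tiles can exchange a row of carries and a column of carries instead of writing and re-reading the whole image between a row pass and a column pass.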

Cited by

CUDA based Lossless Asynchronous Compression of Ultra High Definition Game Scenes using DPCM-GR, vol. 14, no. 6, 2014, https://doi.org/10.7583/JKGS.2014.14.6.59
