Implementing a Depth Map Generation Algorithm by Convolutional Neural Network

깊이맵 생성 알고리즘의 합성곱 신경망 구현

  • Lee, Seungsoo (Department of Computer and Communications Eng., Kangwon National University) ;
  • Kim, Hong Jin (Department of Computer and Communications Eng., Kangwon National University) ;
  • Kim, Manbae (Department of Computer and Communications Eng., Kangwon National University)
  • 이승수 (강원대학교 컴퓨터정보통신공학과) ;
  • 김홍진 (강원대학교 컴퓨터정보통신공학과) ;
  • 김만배 (강원대학교 컴퓨터정보통신공학과)
  • Received : 2017.10.31
  • Accepted : 2017.11.20
  • Published : 2018.01.30

Abstract

Depth maps are used in a variety of fields. Recently, generating depth maps with artificial neural networks (ANNs) has attracted considerable interest. This paper examines whether an existing, ready-made depth map generation algorithm can be implemented by a convolutional neural network (CNN). First, for a given image, a depth map is generated as the weighted average of a saliency map and a motion history image. The CNN is then trained with the experimental images as its input and these depth maps as its target output. Objective and subjective experiments show that the CNN can replace the ready-made depth map generation method.
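The depth map construction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `update_mhi` and `fuse_depth`, the parameters `tau`, `decay`, and `w_saliency`, and the 8-bit value range are all assumptions made for illustration, and the abstract does not state the actual weights. The motion history update follows the commonly used temporal-template rule, in which moving pixels are reset to a maximum value and all other pixels decay toward zero.

```python
def update_mhi(prev_mhi, motion_mask, tau=255, decay=1):
    """Update a motion history image (MHI), given as a list of rows.

    Pixels flagged as moving are reset to tau; all others decay toward 0.
    tau and decay are illustrative values, not taken from the paper.
    """
    return [
        [tau if moving else max(0, prev - decay)
         for prev, moving in zip(prev_row, mask_row)]
        for prev_row, mask_row in zip(prev_mhi, motion_mask)
    ]


def fuse_depth(saliency, mhi, w_saliency=0.5):
    """Weighted average of a saliency map and an MHI of the same size.

    w_saliency is a hypothetical weight in [0, 1]; the MHI gets 1 - w_saliency.
    """
    if not 0.0 <= w_saliency <= 1.0:
        raise ValueError("w_saliency must lie in [0, 1]")
    w_motion = 1.0 - w_saliency
    return [
        [int(round(w_saliency * s + w_motion * m))
         for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(saliency, mhi)
    ]


# Toy 2x2 example with made-up values.
saliency = [[201, 1], [100, 50]]
prev_mhi = [[0, 10], [255, 0]]
motion_mask = [[True, False], [False, False]]

mhi = update_mhi(prev_mhi, motion_mask)
depth = fuse_depth(saliency, mhi, w_saliency=0.5)
print(depth)
```

In a training setup such as the one the abstract describes, maps produced this way would serve as the target outputs paired with the input images.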
