SuperDepthTransfer: Depth Extraction from Image Using Instance-Based Learning with Superpixels

  • Zhu, Yuesheng (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University) ;
  • Jiang, Yifeng (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University) ;
  • Huang, Zhuandi (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University) ;
  • Luo, Guibo (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University)
  • Received : 2016.11.18
  • Accepted : 2017.05.28
  • Published : 2017.10.31

Abstract

In this paper, we address the problem of automatically generating a plausible depth map from a single image of an unstructured environment. The aim is to estimate a depth map with a correct, rich, and distinct depth ordering that is both quantitatively accurate and visually pleasing. Our technique builds on the existing DepthTransfer algorithm but transfers depth at the level of superpixels rather than pixels, within an instance-based learning framework. To improve matching precision, predicted semantic labels are additionally incorporated into the depth extraction procedure. Finally, a modified Cross Bilateral Filter is applied to refine the resulting depth field. Experiments on the Make3D Range Image Dataset, used for both training and evaluation, demonstrate that our method outperforms state-of-the-art approaches on the correlation coefficient, mean log10 error, and root mean squared error metrics, and achieves comparable average relative error, while remaining computationally efficient. The approach can be used to automatically convert 2D images to stereo for 3D visualization, producing anaglyph images that are more realistic and immersive.
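For concreteness, the pipeline summarized above (kNN retrieval of similar training images, superpixel-level depth transfer, and cross-bilateral refinement) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the downsampled-grayscale descriptor, the color-only superpixel matching, the filter parameters, and the helper names (global_feature, retrieve_candidates, superpixel_depth_transfer, refine_with_cross_bilateral, make3d_errors) are assumptions, and the semantic-label term used in the paper is omitted. It relies on scikit-image for SLIC superpixels and opencv-contrib-python for the joint (cross) bilateral filter.

```python
# Minimal sketch of superpixel-level depth transfer (illustrative assumptions only).
import numpy as np
import cv2                              # opencv-contrib-python (for cv2.ximgproc)
from skimage.segmentation import slic

def global_feature(rgb, size=(16, 16)):
    """Stand-in for a GIST-style global descriptor: a downsampled gray image."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, size).astype(np.float32).ravel()

def retrieve_candidates(query_rgb, train_rgbs, train_depths, k=7):
    """Instance-based (kNN) retrieval of the k training images most similar to the query."""
    q = global_feature(query_rgb)
    dists = [np.linalg.norm(q - global_feature(t)) for t in train_rgbs]
    order = np.argsort(dists)[:k]
    return [train_rgbs[i] for i in order], [train_depths[i] for i in order]

def superpixel_depth_transfer(query_rgb, cand_rgbs, cand_depths, n_segments=400):
    """Assign one depth value per query superpixel by matching it (here by mean
    color only) against the superpixels of the retrieved candidate images."""
    # Pool of candidate superpixels: mean color and median depth of each.
    pool_colors, pool_depths = [], []
    for rgb_c, dep_c in zip(cand_rgbs, cand_depths):
        segs_c = slic(rgb_c, n_segments=n_segments, compactness=10, start_label=0)
        for label in np.unique(segs_c):
            m = segs_c == label
            pool_colors.append(rgb_c[m].mean(axis=0))
            pool_depths.append(float(np.median(dep_c[m])))
    pool_colors = np.asarray(pool_colors, dtype=np.float32)
    pool_depths = np.asarray(pool_depths, dtype=np.float32)

    # Transfer the depth of the nearest candidate superpixel to each query superpixel.
    segs = slic(query_rgb, n_segments=n_segments, compactness=10, start_label=0)
    depth = np.zeros(query_rgb.shape[:2], dtype=np.float32)
    for label in np.unique(segs):
        m = segs == label
        c = query_rgb[m].mean(axis=0).astype(np.float32)
        nearest = int(np.argmin(np.linalg.norm(pool_colors - c, axis=1)))
        depth[m] = pool_depths[nearest]
    return depth

def refine_with_cross_bilateral(query_rgb, depth, d=9, sigma_color=25.0, sigma_space=25.0):
    """Edge-aware refinement: smooth the transferred depth while preserving the
    color edges of the guidance image (joint/cross bilateral filtering)."""
    guide = query_rgb.astype(np.float32)
    return cv2.ximgproc.jointBilateralFilter(guide, depth, d, sigma_color, sigma_space)

def make3d_errors(pred, gt, eps=1e-3):
    """Make3D-style error metrics: average relative error, mean log10 error, RMS error."""
    pred, gt = np.maximum(pred, eps), np.maximum(gt, eps)
    rel = np.mean(np.abs(pred - gt) / gt)
    log10 = np.mean(np.abs(np.log10(pred) - np.log10(gt)))
    rms = np.sqrt(np.mean((pred - gt) ** 2))
    return rel, log10, rms
```

Under these assumptions, a query image would be processed as cand_rgbs, cand_depths = retrieve_candidates(img, train_rgbs, train_depths), followed by refine_with_cross_bilateral(img, superpixel_depth_transfer(img, cand_rgbs, cand_depths)).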

Keywords

References

  1. Holliman, N.S., Dodgson, N.A., Favalora, G.E., Pockett, L., "Three-dimensional displays: a review and applications analysis," Broadcasting, IEEE Transactions on, vol. 57, pp. 362-371, 2011. https://doi.org/10.1109/TBC.2011.2130930
  2. Karsch, K., Liu, C., Kang, S.B., "Depth extraction from video using non-parametric sampling," in Proc. of Computer Vision-ECCV 2012, pp. 775-788, 2012.
  3. Karsch, K., Liu, C., Kang, S.B., "Depth transfer: Depth extraction from video using non-parametric sampling," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 36, pp. 2144-2158, 2014. https://doi.org/10.1109/TPAMI.2014.2316835
  4. Chen, J.C., Huang, M., "2D-to-3D conversion system using depth map enhancement," KSII Transactions on Internet & Information Systems, vol. 10, 2016.
  5. Yang, H., Zhang, H., "Efficient 3d room shape recovery from a single panorama," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 5422-5430, 2016.
  6. Song, Y., Tang, J., Liu, F., Yan, S., "Body surface context: A new robust feature for action recognition from depth videos," IEEE Transactions on Circuits & Systems for Video Technology, vol. 24, 2014.
  7. Xiang, Y., Kim, W., Chen, W., Ji, J., Choy, C., Su, H., Mottaghi, R., Guibas, L., Savarese, S., "ObjectNet3D: A Large Scale Database for 3D Object Recognition," Springer International Publishing, 2016.
  8. Li, Z., Liu, J., Tang, J., Lu, H., "Robust structured subspace learning for data representation," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, 2015.
  9. Urtasun, R., Lenz, P., Geiger, A., "Are we ready for autonomous driving? the kitti vision benchmark suite," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354-3361, 2012.
  10. Liao, M., Gao, J., Yang, R., Gong, M., "Video stereolization: Combining motion analysis with user interaction," Visualization and Computer Graphics, IEEE Transactions on, vol. 18, pp. 1079-1088, 2012. https://doi.org/10.1109/TVCG.2011.114
  11. Guttmann, M., Wolf, L., Cohen-Or, D., "Semi-automatic stereo extraction from video footage," in Proc. of Computer Vision, 2009 IEEE 12th International Conference on, pp. 136-142, 2009.
  12. Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M., "Shape-from-shading: a survey," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, pp. 690-706, 1999. https://doi.org/10.1109/34.784284
  13. Forsyth, D.A., Ponce, J., "Computer Vision: A Modern Approach, 2/E," Prentice Hall Professional Technical Reference, 2002.
  14. Subbarao, M., Surya, G., "Depth from defocus: a spatial domain approach," International Journal of Computer Vision, vol. 13, pp. 271-294, 1994. https://doi.org/10.1007/BF02028349
  15. Huang, C., Liu, Q., Yu, S., "Regions of interest extraction from color image based on visual saliency," The Journal of Supercomputing, vol. 58, pp. 20-33, 2011. https://doi.org/10.1007/s11227-010-0532-x
  16. Huang, X., Wang, L., Huang, J., Li, D., Zhang, M., "A depth extraction method based on motion and geometry for 2d to 3d conversion," in Proc. of 2009 Third International Symposium on Intelligent Information Technology Application, pp. 294-298, 2009.
  17. Hoiem, D., Efros, A.A., Hebert, M., "Geometric context from a single image," in Proc. of Computer Vision, Tenth IEEE International Conference on, vol. 1, pp. 654-661, 2005.
  18. Delage, E., Lee, H., Ng, A.Y., "A dynamic bayesian network model for autonomous 3d reconstruction from a single indoor image," in Proc. of Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 2, pp. 2418-2428, 2006.
  19. Saxena, A., Sun, M., Ng, A.Y., "Make3d: Learning 3d scene structure from a single still image," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, pp. 824-840, 2009. https://doi.org/10.1109/TPAMI.2008.132
  20. Saxena, A., Chung, S.H., Ng, A.Y., "Learning depth from single monocular images," in Proc. of Advances in Neural Information Processing Systems, pp. 1161-1168, 2005.
  21. Liu, B., Gould, S., Koller, D., "Single image depth estimation from predicted semantic labels," in Proc. of Computer Vision and Pattern Recognition, IEEE Conference on, pp. 1253-1260, 2010.
  22. Konrad, J., Wang, M., Ishwar, P., Wu, C., Mukherjee, D., "Learning-based, automatic 2d-to-3d image and video conversion," Image Processing, IEEE Transactions on, vol. 22, pp. 3485-3496, 2013. https://doi.org/10.1109/TIP.2013.2270375
  23. Herrera, J.L., Konrad, J., del Bianco, C.R., Garcia, N., "Learning-based depth estimation from 2d images using gist and saliency," in Proc. of Image Processing, IEEE International Conference on, pp. 4753-4757, 2015.
  24. Konrad, J., Brown, G., Wang, M., Ishwar, P., Wu, C., Mukherjee, D., "Automatic 2d-to-3d image conversion using 3d examples from the internet," in Proc. of IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pp. 82880F-82880F, 2012.
  25. Konrad, J., Wang, M., Ishwar, P., "2d-to-3d image conversion by learning depth from examples," in Proc. of Computer Vision and Pattern Recognition Workshops, IEEE Computer Society Conference on, pp. 16-22, 2012.
  26. Liu, M., Salzmann, M., He, X., "Discrete-continuous depth estimation from a single image," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 716-723, 2014.
  27. Eigen, D., Puhrsch, C., Fergus, R., "Depth map prediction from a single image using a multi-scale deep network," in Proc. of Advances in neural information processing systems, pp. 2366-2374, 2014.
  28. Liu, F., Shen, C., Lin, G., "Deep convolutional neural fields for depth estimation from a single image," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5162-5170, 2015.
  29. Baig, M.H., Torresani, L., "Coupled depth learning," in Proc. of Applications of Computer Vision, pp. 1-10, 2016.
  30. Su, C.C., Cormack, L.K., Bovik, A.C., "Depth estimation from monocular color images using natural scene statistics models," in Proc. of IVMSP Workshop, pp. 1-4, 2013.
  31. Wang, X., Hou, C., Pu, L., Hou, Y., "A depth estimating method from a single image using foe crf," Multimedia Tools and Applications, vol. 74, pp. 9491-9506, 2015. https://doi.org/10.1007/s11042-014-2130-z
  32. Liu, C., Yuen, J., Torralba, A., "Nonparametric scene parsing via label transfer," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, pp. 2368-2382, 2011. https://doi.org/10.1109/TPAMI.2011.131
  33. Wang, M., Konrad, J., Ishwar, P., Jing, K., Rowley, H., "Image saliency: From intrinsic to extrinsic context," in Proc. of Computer Vision and Pattern Recognition, IEEE Conference on, pp. 417-424, 2011.
  34. Oliva, A., Torralba, A., "Modeling the shape of the scene: A holistic representation of the spatial envelope," International journal of computer vision, vol. 42, pp. 145-175, 2001. https://doi.org/10.1023/A:1011139631724
  35. Ren, X., Malik, J., "Learning a classification model for segmentation," in Proc. of Computer Vision, Ninth IEEE International Conference on, pp. 10-17, 2003.
  36. Felzenszwalb, P.F., Huttenlocher, D.P., "Efficient graph-based image segmentation," International Journal of Computer Vision, vol. 59, pp. 167-181, 2004. https://doi.org/10.1023/B:VISI.0000022288.19776.77
  37. Malisiewicz, T., Efros, A.A., "Recognition by association via learning per-exemplar distances," in Proc. of Computer Vision and Pattern Recognition, IEEE Conference on, pp. 1-8, 2008.
  38. Tighe, J., Lazebnik, S., "Superparsing: scalable nonparametric image parsing with superpixels," in Proc. of Computer Vision-ECCV, Springer, pp. 352-365, 2010.
  39. Durand, F., Dorsey, J., "Fast bilateral filtering for the display of high-dynamic-range images," in Proc. of ACM transactions on graphics, vol. 21, pp. 257-266, 2002.
  40. Angot, L.J., Huang, W.J., Liu, K.C., "A 2d to 3d video and image conversion technique based on a bilateral filter," in Proc. of IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pp. 75260D-75260D, 2010.
  41. Saxena, A., Sun, M., Ng, A.Y., "Learning 3-d scene structure from a single still image," in Proc. of Computer Vision, IEEE 11th International Conference on, pp. 1-8, 2007.
  42. Zhang, L., Tam, W.J., "Stereoscopic image generation based on depth images for 3d tv," Broadcasting, IEEE Transactions on, vol. 51, pp. 191-199, 2005. https://doi.org/10.1109/TBC.2005.846190

Cited by

  1. Improved Sliding Shapes for Instance Segmentation of Amodal 3D Object, vol. 12, no. 11, 2018, https://doi.org/10.3837/tiis.2018.11.021