Evaluation of Building Detection from Aerial Images Using Region-based Convolutional Neural Network for Deep Learning

딥러닝을 위한 영역기반 합성곱 신경망에 의한 항공영상에서 건물탐지 평가

  • Lee, Dae Geon (Dept. of Environment, Energy & Geoinformatics, Sejong University) ;
  • Cho, Eun Ji (Dept. of Environment, Energy & Geoinformatics, Sejong University) ;
  • Lee, Dong-Cheon (Dept. of Environment, Energy & Geoinformatics, Sejong University)
  • Received : 2018.10.02
  • Accepted : 2018.11.12
  • Published : 2018.12.31

Abstract

DL (Deep Learning) is becoming popular in various fields as a means of implementing artificial intelligence that resembles human learning and cognition. DL based on the complicated structure of the ANN (Artificial Neural Network) requires substantial computing power and computation time, and as computer hardware has improved, a variety of DL models with better performance have been developed. The main purpose of this paper is to detect buildings from aerial images and to evaluate the performance of Mask R-CNN (Region-based Convolutional Neural Network), recently developed by the FAIR (Facebook AI Research) team. Mask R-CNN is a region-based convolutional neural network that is regarded as one of the best-performing ANN models for semantic segmentation with pixel-level accuracy. The performance of a DL model is determined by its training ability as well as by the architecture of the ANN. In this paper, we analyze the characteristics of Mask R-CNN with various types of images and evaluate the possibility of generalization, which is the ultimate goal of DL. As for future study, it is expected that the reliability and generalization of DL will be improved by using a variety of spatial information data for training DL models.

Deep learning is being applied in many fields to realize artificial intelligence that resembles human learning and cognition. Deep learning with complex artificial neural networks demands high computing power and long computation times, and as computer hardware has improved, a variety of deep learning models with better performance have been developed. The main purpose of this paper is to detect buildings in aerial images and to evaluate the performance of Mask R-CNN, a convolutional neural network for image deep learning recently developed by FAIR (Facebook AI Research). Mask R-CNN is a region-based convolutional neural network and is regarded as one of the best-performing deep learning models for semantically segmenting objects down to pixel-level accuracy. The performance of a deep learning model is determined not only by the network architecture but also by its learning ability. To examine this, we applied various modifications to the training images and analyzed the model's learning ability, and we evaluated the feasibility of generalization, the ultimate goal of deep learning. As future work, we expect that the reliability and generalizability of deep learning will improve when diverse spatial information data, rather than imagery alone, are used in combination to train deep learning models.

Fig. 1. Learning process in general ANN

Fig. 2. Architecture of generic CNN model

Fig. 3. Zero padding
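Zero padding (Fig. 3) surrounds the input with zeros so that a convolution preserves the spatial size of the feature map. A minimal NumPy sketch (the array sizes are illustrative, not taken from the paper):

```python
import numpy as np

# 4x4 single-channel "feature map"
x = np.arange(16, dtype=float).reshape(4, 4)

# Pad one zero on each border so a 3x3 convolution keeps the 4x4 size:
# output size = (H + 2*pad - kernel)/stride + 1 = (4 + 2 - 3)/1 + 1 = 4
x_pad = np.pad(x, pad_width=1, mode='constant', constant_values=0)

print(x_pad.shape)  # (6, 6): the original 4x4 surrounded by a zero border
```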

Fig. 4. ReLU function
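The ReLU activation of Fig. 4 passes positive values through unchanged and clips negative values to zero; element-wise, it is simply:

```python
import numpy as np

def relu(x):
    """ReLU activation: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out = relu(z)  # negatives become 0, non-negatives pass through unchanged
```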

Fig. 5. Resizing feature map by max pooling
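The max-pooling operation of Fig. 5 resizes a feature map by keeping only the largest response in each window. A sketch of 2x2 pooling with stride 2 (the input values are invented for demonstration):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keeps the largest value in each
    2x2 block, halving the feature map's spatial resolution."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]], dtype=float)
pooled = max_pool_2x2(fmap)
# [[4. 2.]
#  [2. 8.]]  -- each entry is the max of one 2x2 block
```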

Fig. 6. Anchor boxes for object detection

Fig. 7. Demonstration of nine possible anchor boxes
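The nine anchor boxes of Fig. 7 correspond to three scales crossed with three aspect ratios, all centred on one sliding-window position. This sketch enumerates them; the scale and ratio values are illustrative defaults, not the paper's settings:

```python
import numpy as np

def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return nine (x1, y1, x2, y2) anchor boxes centred at (cx, cy):
    one per (scale, aspect-ratio) pair, each with area scale**2 and
    width/height equal to the ratio."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)  # width  = scale * sqrt(ratio)
            h = s / np.sqrt(r)  # height = scale / sqrt(ratio), so w*h = s**2
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

anchors = make_anchors(100, 100)
print(anchors.shape)  # (9, 4): nine candidate boxes per position
```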

Fig. 8. Mask R-CNN model architecture

Fig. 9. Progress of CNN: From object detection to instance segmentation
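Fig. 9 contrasts semantic segmentation (one class label per pixel) with instance segmentation (a separate mask per object, which is what Mask R-CNN produces). A toy label-map illustration; the arrays are invented for demonstration:

```python
import numpy as np

# Instance segmentation: each building gets its own ID (0 = background)
instance_map = np.array([[1, 1, 0, 2],
                         [1, 1, 0, 2],
                         [0, 0, 0, 2]])

# Collapsing the instance IDs into a single "building" class gives the
# corresponding semantic segmentation: per-pixel class, instances merged.
semantic_map = (instance_map > 0).astype(int)

n_instances = instance_map.max()          # 2 separate buildings
n_classes = len(np.unique(semantic_map))  # background + building
```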

Fig. 10. Examples of RGB image and corresponding annotation data

Fig. 11. A sample of training image

Fig. 12. Rotated image without padding

Fig. 13. Mirror padding for image rotation
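Figs. 12-13 show that rotating a training image without padding leaves empty corners, which the authors avoid by mirror padding before rotation. The padding step can be sketched with NumPy's reflect mode (the pad width and image size are illustrative):

```python
import numpy as np

img = np.arange(9).reshape(3, 3)

# Mirror (reflect) padding repeats pixel values outward from the borders,
# so a subsequent rotation can fill the image corners with real content
# instead of black (zero) fill.
padded = np.pad(img, pad_width=2, mode='reflect')

print(padded.shape)  # (7, 7): mirrored content on every side
```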

Fig. 14. Building detection with geometrically transformed images

Fig. 15. Building detection with radiometrically degraded images

Fig. 16. Building detection from unseen images
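Detection results such as those in Figs. 14-16 are commonly scored by the intersection-over-union (IoU) between a predicted building box and the reference box; the paper gives no code, so this is a generic sketch of that metric:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detected building overlapping half of the reference footprint:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 = 1/3
```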

References

  1. Audebert, N., Le Saux, B., and Lefevre, S. (2018), Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 140, pp. 20-32. https://doi.org/10.1016/j.isprsjprs.2017.11.011
  2. Back, C.S. and Yom, J.H. (2018), Comparison of point cloud volume calculated by artificial intelligence learning method and photogrammetric method, Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, 19-20 April, Yongin, Korea, pp. 227-230.
  3. Ball, J., Anderson, D., and Chan, C. (2017), A comprehensive survey of deep learning in remote sensing: Theories, tools and challenges for the community, Journal of Applied Remote Sensing, Vol. 11, No. 4, pp. 1-54.
  4. Campos-Taberner, M., Romero-Soriano, A., Gatta, C., Camps-Valls, G., Lagrange, A., Le Saux, B., Beaupere, A., Boulch, A., Chan-Hon-Tong, A., Herbin, S., Randrianarivo, H., Ferecatu, M., Shimoni, M., Moser, G., and Tuia, D. (2016), Processing of extremely high-resolution LiDAR and RGB data: Outcome of the 2015 IEEE GRSS data fusion contest-Part A: 2-D contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 9, No. 12, pp. 5547-5559. https://doi.org/10.1109/JSTARS.2016.2569162
  5. Choe, Y.J. and Yom, J.H. (2017), Downscaling of MODIS land surface temperature to LANDSAT scale using multi-layer perceptron, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 35, No. 4, pp. 313-318. (in Korean with English abstract) https://doi.org/10.7848/KSGPC.2017.35.4.313
  6. Chung, D. and Lee, I. (2017), Point cloud classification based on deep learning, Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography, Yeosu, Korea, pp. 110-113. (in Korean with English abstract)
  7. Deng, Z., Sun, H., Zhou, S., Zhao, J., Lei, L., and Zou, H. (2018), Multi-scale object detection in remote sensing imagery with convolutional neural networks, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 3-22. https://doi.org/10.1016/j.isprsjprs.2018.04.003
  8. Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017), A review on deep learning techniques applied to semantic segmentation, arXiv:1704.06857.
  9. Girshick, R. (2015), Fast R-CNN, IEEE International Conference on Computer Vision, ICCV 2015, 13-16 December, Santiago, Chile, pp. 1440-1448.
  10. Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2016), Region-based convolutional networks for accurate object detection and segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 38, No. 1, pp. 1-16. https://doi.org/10.1109/TPAMI.2016.2592468
  11. Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016), FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture, Proceedings of the Asian Conference on Computer Vision, Vol. 2, 20-24 November, Taipei, Taiwan.
  12. He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017), Mask R-CNN, Proceedings of IEEE International Conference on Computer Vision (ICCV) 2017, 22-29 October, Venice, Italy, pp. 2980-2988.
  13. Hertz, J., Krogh, A., and Palmer, R. (1991), Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 327p.
  14. Kang, J., Korner, M., Wang, Y., Taubenbock, H., and Zhu, X. (2018), Building instance classification using street view images, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 44-59. https://doi.org/10.1016/j.isprsjprs.2018.02.006
  15. Kemker, R., Salvaggio, C., and Kanan, C. (2018), Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 60-77. https://doi.org/10.1016/j.isprsjprs.2018.04.014
  16. Kim, H. and Bae, T. (2017), Preliminary study of deep learning-based precipitation prediction, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography, Vol. 35, No. 5, pp. 423-430. https://doi.org/10.7848/KSGPC.2017.35.5.423
  17. Krizhevsky, A., Sutskever, I., and Hinton, G. (2012), ImageNet classification with deep convolutional neural networks, Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, 3-8 December, Lake Tahoe, Nevada, pp. 1097-1105.
  18. LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., and Jackel, L. (1989), Backpropagation applied to handwritten zip code recognition, Neural Computation, Vol. 1, No. 4, pp. 541-551. https://doi.org/10.1162/neco.1989.1.4.541
  19. Lee, G. and Yom, J.H. (2018), Design and implementation of web-based automatic preprocessing system of remote sensing imagery for machine learning modeling, Journal of the Korean Society for Geospatial Information Science, Vol. 26 No. 1, pp. 61-67. (in Korean with English abstract)
  20. Long, J., Shelhamer, E., and Darrell, T. (2015), Fully convolutional networks for semantic segmentation, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 7-12 June, Boston, MA, pp. 3431-3440.
  21. Marmanis, D., Wegner, J., Galliani, S., Schindler, K., Datcu, M., and Stilla, U. (2016), Semantic segmentation of aerial images with an ensemble of CNNs, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 3-3, XXIII ISPRS Congress, 12-19 July, Prague, Czech Republic, pp. 473-480.
  22. Maturana, D. and Scherer, S. (2015), 3D Convolutional neural networks for landing zone detection from LiDAR, IEEE International Conference on Robotics and Automation, Seattle, Washington, 26-30 May, pp. 3471-3478.
  23. McCulloch, W. and Pitts, W. (1943), A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133.
  24. Oh, H. (2010), Landslide detection and landslide susceptibility mapping using aerial photos and artificial neural networks, Korean Journal of Remote Sensing, Vol. 26, No. 1, pp. 47-57. (in Korean with English abstract)
  25. Pang, Y., Sun, M., Jiang, X., and Li, X. (2018), Convolution in convolution for network in network, IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 5, pp. 1587-1597. https://doi.org/10.1109/TNNLS.2017.2676130
  26. Parthasarathy, D. (2017), A brief history of CNNs in image segmentation: From R-CNN to Mask R-CNN, https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4 (last date accessed: 6 September 2018).
  27. Ren, S., He, K., Girshick, R., and Sun, J. (2017), Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 6, pp. 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031
  28. Rosenblatt, F. (1958), The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, Vol. 65, No. 6, pp. 386-408. https://doi.org/10.1037/h0042519
  29. Rumelhart, D., Hinton, G., and Williams, R. (1986), Learning representations by back-propagating errors, Nature, Vol. 323, pp. 533-536. https://doi.org/10.1038/323533a0
  30. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., and Berg, A. (2015), ImageNet large scale visual recognition challenge, International Journal of Computer Vision, Vol. 115, No. 3, pp. 211-252. https://doi.org/10.1007/s11263-015-0816-y
  31. Schenk, T. (1999), Digital Photogrammetry: Volume 1, TerraScience, Laurelville, OH, 428p.
  32. Shaikh, F. (2018), Automatic image captioning using deep learning (CNN and LSTM) in PyTorch, Analytics Vidhya, https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/ (last date accessed: 31 October 2018).
  33. Simard, P., Steinkraus, D., and Platt, J. (2003), Best practices for convolutional neural networks applied to visual document analysis, Proceedings of the Seventh International Conference on Document Analysis and Recognition, ICDAR 2003, 3-6 August, Vol. 2, pp. 958-962.
  34. Tokarczyk, P., Wegner, J., Walk, S., and Schindler, K. (2015), Features, color spaces, and boosting: New insights on semantic classification of remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, Vol. 53, No. 1, pp. 280-295. https://doi.org/10.1109/TGRS.2014.2321423
  35. You, Q., Jin, H., Wang, Z., Fang, C., and Luo, J. (2016), Image captioning with semantic attention, IEEE Conference on Computer Vision and Pattern Recognition, 26 June-1 July, Las Vegas, Nevada, pp. 4651-4659.
  36. Vo, A.V., Truong-Hong, L., Laefer, D., Tiede, D., d'Oleire-Oltmanns, S., Baraldi, A., Shimoni, M., Moser, G., and Tuia, D. (2016), Processing of extremely high-resolution LiDAR and RGB data: Outcome of the 2015 IEEE GRSS data fusion contest-Part B: 3-D contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 9, No. 12, pp. 5560-5575. https://doi.org/10.1109/JSTARS.2016.2581843
  37. Wang, S., Quan, D., Liang, X., Ning, M., Guo, Y., and Jiao, L. (2018), A deep learning framework for remote sensing image registration, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 148-164. https://doi.org/10.1016/j.isprsjprs.2017.12.012
  38. Xing, Y., Wang, M., Yang, S., and Jiao, L. (2018), Pansharpening via deep metric learning, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 165-183. https://doi.org/10.1016/j.isprsjprs.2018.01.016
  39. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., and Bengio, Y. (2015), Show, attend and tell: Neural image caption generation with visual attention, International Conference on Machine Learning, 6-11 July, Lille, France, pp. 2048-2057.
  40. Zhang, B., Gu, J., Chen, C., Han, J., Su, X., Cao, X., and Liu, J. (2018), One-two-one networks for compression artifacts in remote sensing, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 184-196. https://doi.org/10.1016/j.isprsjprs.2018.01.003

Cited by

  1. Object classification and change detection in point clouds using deep learning, vol.50, pp.2, 2018, https://doi.org/10.22640/lxsiri.2020.50.2.37
  2. Building detection using convolutional neural networks based on fusion of infrared imagery, LiDAR data, and feature information, vol.38, pp.6, 2020, https://doi.org/10.7848/ksgpc.2020.38.6.635
  3. AI-based monitoring and control system for detecting harmful birds, vol.16, pp.1, 2018, https://doi.org/10.13067/jkiecs.2021.16.1.175
  4. Comparative evaluation of deep learning-based building object extraction methods using aerial images, vol.39, pp.3, 2018, https://doi.org/10.7848/ksgpc.2021.39.3.157