Automatic Dataset Generation of Object Detection and Instance Segmentation using Mask R-CNN

  • Received : 2018.12.10
  • Accepted : 2019.01.15
  • Published : 2019.02.28

Abstract

Robots usually rely on artificial neural network (ANN)-based object detection and instance segmentation algorithms to recognize objects, but building datasets for these algorithms is costly because every image must be labeled manually. To lower the labeling cost, a new scheme is proposed that automatically generates training images of specific objects and labels them. The scheme uses an instance segmentation algorithm trained to produce masks of unknown objects, so that the masks can be obtained in a simple environment. The RGB images of the objects are extracted using these masks, and the class of each object must be labeled through human supervision. The object images are then composited onto various background images to create new images. The synthesized images are labeled automatically using the masks and the previously entered object classes. Human intervention is further reduced by using a robot arm to collect the object images. Experiments show that instance segmentation trained with the proposed method performs on par with instance segmentation trained on a real dataset, and that the time required to generate the dataset is significantly reduced.
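The compositing and automatic labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `paste_object`, its parameters, and the label layout are hypothetical, and it assumes NumPy arrays, a binary object mask already produced by the segmentation step, and a crop that fits entirely inside the background.

```python
import numpy as np


def paste_object(background, obj_rgb, obj_mask, top, left, class_name):
    """Paste a masked object onto a background and derive its label.

    background: (H, W, 3) uint8 image
    obj_rgb:    (h, w, 3) uint8 object crop
    obj_mask:   (h, w) boolean mask, True on object pixels
    top, left:  position of the crop's upper-left corner in the background
    class_name: object class entered once by a human (hypothetical field)
    """
    h, w = obj_mask.shape
    H, W = background.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("object crop must fit inside the background")
    if not obj_mask.any():
        raise ValueError("object mask is empty")

    # Copy only the object pixels; the background stays visible elsewhere.
    image = background.copy()
    region = image[top:top + h, left:left + w]  # view into `image`
    region[obj_mask] = obj_rgb[obj_mask]

    # The instance mask of the synthesized image is the pasted object mask.
    full_mask = np.zeros((H, W), dtype=bool)
    full_mask[top:top + h, left:left + w] = obj_mask

    # The bounding box follows from the mask, so no manual annotation is needed.
    ys, xs = np.nonzero(full_mask)
    bbox = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
    label = {"class": class_name, "bbox_xyxy": bbox}
    return image, full_mask, label
```

Calling `paste_object` repeatedly with different backgrounds and positions yields many labeled images from a single object capture, which is the source of the labeling-cost savings the abstract describes. The box here uses an exclusive lower-right corner; that convention is an assumption of this sketch.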
