CNN-based Sign Language Translation Program for the Deaf

  • Hong, Kyeong-Chan (Department of Computer Information and Engineering, Sangji University) ;
  • Kim, Hyung-Su (Department of Computer Information and Engineering, Sangji University) ;
  • Han, Young-Hwan (Department of Information Communication Software Engineering, Sangji University)
  • Received : 2021.11.15
  • Accepted : 2021.11.30
  • Published : 2021.12.31

Abstract

As society develops, methods of communication are advancing in many forms. However, these advances serve the non-disabled and offer little benefit to the deaf. Therefore, in this paper, a CNN-based sign language translation program is designed and implemented to help deaf people communicate. The program translates sign language images captured through a webcam into their corresponding meanings. It uses 24,000 self-produced images of Korean consonant and vowel (Jamo) signs, and applies U-Net segmentation to train an effective classification model. The preprocessed data are divided into 19,200 training images and 4,800 test images and used to train an AlexNet-based model. In the implemented program, 'ㅋ' showed the best performance among all sign language data, with 97% accuracy and a 99% F1-Score, while 'ㅣ' showed the highest performance among the vowel data, with 94% accuracy and a 95.5% F1-Score.
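
For reference, the accuracy and F1-Score figures reported above follow the standard definitions (cf. refs. 11 and 12); with TP, FP, TN, FN denoting true/false positives and negatives per class:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
\quad \text{where} \quad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}.
```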

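To make the pipeline concrete, below is a minimal sketch of the two-stage design described above: U-Net segmentation of the webcam frame followed by AlexNet-based classification. It assumes a PyTorch implementation; the paper does not publish code, so the `unet` module, the 24-class Jamo label set, and the training-step skeleton are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of the described two-stage pipeline (assumed PyTorch; the
# authors' actual code is not published). `unet` is any U-Net-style module
# with a single-channel logit output -- an illustrative assumption.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 24  # assuming one class per basic Korean Jamo (14 consonants + 10 vowels)

def preprocess(frames: torch.Tensor, unet: nn.Module) -> torch.Tensor:
    """Stage 1: segment the hand with U-Net and mask out the background."""
    with torch.no_grad():
        mask = (torch.sigmoid(unet(frames)) > 0.5).float()  # (N, 1, H, W)
    return frames * mask  # broadcast the mask over the RGB channels

# Stage 2: AlexNet backbone with its final layer resized to the Jamo classes.
classifier = models.alexnet(weights=None)
classifier.classifier[6] = nn.Linear(4096, NUM_CLASSES)

def train_step(frames, labels, unet, optimizer,
               loss_fn=nn.CrossEntropyLoss()):
    """One optimization step over a batch from the 19,200-image training split."""
    optimizer.zero_grad()
    logits = classifier(preprocess(frames, unet))
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, a captured webcam frame would pass through the same `preprocess` step, with an argmax over the classifier logits selecting the predicted Jamo.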

Acknowledgement

This research was supported by the Graduate School of Sangji University.

References

  1. K. P. Jeon. (2020, Aug.). A Study of Communication Experience in the Job Adaptation Process of People with Hearing Impairment. Journal of Korean Society of Vocational Rehabilitation. 30(2), pp. 97-125. https://doi.org/10.24226/jvr.2020.8.30.2.97
  2. H. S. Lee, et al. (2013, Aug.). Development of Sign Language Translation System using Motion Recognition of Kinect. Journal of the Korea Institute of Convergence Signal Processing. 14(4), pp. 235-242.
  3. M. O. Kim, et al. (2013, Jun.). A Phenomenological Study on the Communication Experiences of the Deaf. Journal of Korean Academy of Social Welfare. 49(4), pp. 1-26.
  4. I. H. Kim, et al. (2021, Apr.). A Study on Korea Sign Language Motion Recognition Using OpenPose Based on Deep Learning. Journal of Digital Contents Society. 22(4), pp. 681-687. https://doi.org/10.9728/dcs.2021.22.4.681
  5. P. S. Jung, et al. (2015, Sep.). Design and Implementation of Finger Language Translation System using Raspberry Pi and Leap Motion. Journal of the Korea Institute of Information and Communication Engineering. 19(9), pp. 2006-2013. https://doi.org/10.6109/JKIICE.2015.19.9.2006
  6. J. R. Cho, et al. (2021, Apr.). Application of Artificial Neural Network For Sign Language Translation. Journal of the Korea Society of Computer and Information. 24(2), pp. 185-192.
  7. S. E. Han, et al. (2017, Feb.). E-book to sign-language translation program based on morpheme analysis. Journal of the Korea Institute of Information and Communication Engineering. 21(2), pp. 461-467. https://doi.org/10.6109/JKIICE.2017.21.2.461
  8. J. Long, et al. (2015, Jun.). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440.
  9. O. Ronneberger, et al. (2015, Oct.). U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), Lecture Notes in Computer Science, pp. 234-241.
  10. A. Krizhevsky, et al. (2012, Dec.). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, pp. 1097-1105.
  11. S. H. Park, et al. (2004, Mar.). Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean Journal of Radiology. 5(1), pp. 11-18. https://doi.org/10.3348/kjr.2004.5.1.11
  12. H. Huang, et al. (2015, Mar.). Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 23(4), pp. 787-797. https://doi.org/10.1109/TASLP.2015.2409733