Environment for Translation Domain Adaptation and Continuous Improvement of English-Korean Machine Translation System

  • Kim, Sung-Dong (School of Computer Engineering, Hansung University) ;
  • Kim, Namyun (School of Computer Engineering, Hansung University)
  • Received : 2020.04.17
  • Accepted : 2020.04.27
  • Published : 2020.05.31

Abstract

This paper presents an environment for a rule-based English-Korean machine translation system that supports translation domain adaptation and continuous improvement of translation quality. Both goals depend on a corpus, from which the information needed for translation is acquired. The environment consists of a corpus construction part and a translation knowledge extraction part. The corpus construction part crawls news articles from newspaper sites. The extraction part builds translation knowledge such as newly created words, compound words, collocation information, and distributional word representations. For translation domain adaptation, a corpus for the target domain is built and translation knowledge is constructed from it. For continuous improvement, the corpus is continuously expanded and the translation knowledge is enhanced from the expanded corpus. The proposed web-based environment is expected to facilitate both domain adaptation and translation system improvement.
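As an illustration of the kind of collocation information the extraction part might derive from a corpus, the following is a minimal sketch that ranks adjacent word pairs by pointwise mutual information (PMI). The abstract does not state which method the authors use, so PMI is an assumption chosen here only because it is a common baseline; the function names, the `min_count` threshold, and the three-sentence corpus are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def collocation_candidates(sentences, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed scoring method).

    Pairs seen fewer than min_count times are skipped, because PMI
    overrates rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))

    # Highest PMI first: the strongest collocation candidates.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for crawled news sentences.
corpus = [
    "The central bank raised interest rates again.",
    "Higher interest rates slow the housing market.",
    "The bank said interest rates may fall next year.",
]
candidates = collocation_candidates(corpus)
for pair, score in candidates:
    print(pair, round(score, 2))
```

On a real domain corpus, the threshold and the scoring measure would need tuning, and candidates would still have to be paired with Korean translations before they are usable as translation knowledge.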
