A Bidirectional Korean-Japanese Statistical Machine Translation System by Using MOSES

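  • 이공주 (Department of Information and Communication Engineering, Chungnam National University)
  • 이성욱 (Department of Computer and Information Engineering, Korea National University of Transportation)
  • 김지은 (Department of English Linguistics, Hankuk University of Foreign Studies)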
  • Received : 2012.04.24
  • Accepted : 2012.05.25
  • Published : 2012.07.31

Abstract

Statistical machine translation (SMT) has recently received much attention because it is easy to implement and maintain. The goal of this work is to build a bidirectional Korean-Japanese SMT system using the MOSES [1] system. We build a sentence-aligned Korean-Japanese parallel corpus to train the translation model, and use a large raw corpus in each language to train the corresponding language model. The resulting system achieves results close to those of an existing rule-based machine translation system. Most of the errors are caused by noise introduced at the individual processing stages.
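The division of labour described in the abstract, a translation model learned from the parallel corpus and a language model learned from raw monolingual text, can be illustrated with a toy sketch. This is not the MOSES pipeline: MOSES combines phrase tables, language models, and other features in a log-linear model with tuned weights, whereas the sketch below simply adds a given translation-model log-probability to an add-one-smoothed bigram language-model score. All function names, the tiny corpus, and the candidate probabilities are hypothetical, and English tokens stand in for Korean or Japanese morphemes.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Collect context and bigram counts from tokenized raw sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    vocab = {t for s in sentences for t in s} | {"</s>"}
    return contexts, bigrams, len(vocab)


def lm_logprob(tokens, lm):
    """Add-one-smoothed bigram log-probability of a sentence.

    Words unseen in training are not handled rigorously; this is a toy.
    """
    contexts, bigrams, vocab_size = lm
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def best_translation(candidates, lm):
    """Pick the candidate maximizing log P_tm + log P_lm.

    `candidates` is a list of (tokens, translation_model_probability) pairs;
    in a real system the probabilities would come from a trained phrase table.
    """
    return max(candidates, key=lambda c: math.log(c[1]) + lm_logprob(c[0], lm))


# Hypothetical raw target-language corpus used to train the language model.
corpus = [
    ["i", "go", "to", "school"],
    ["i", "go", "home"],
    ["you", "go", "to", "school"],
]
lm = train_bigram_lm(corpus)

# Hypothetical candidates: the second has the higher translation-model
# probability but an implausible word order.
candidates = [
    (["i", "go", "to", "school"], 0.4),
    (["i", "school", "to", "go"], 0.5),
]
best = best_translation(candidates, lm)
print(" ".join(best[0]))
```

In this toy setting the language model outweighs the small translation-model advantage of the badly ordered candidate, which is the role the raw-corpus language model plays in each translation direction.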

References

  1. http://www.statmt.org/moses/
  2. P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer, "The mathematics of statistical machine translation: Parameter estimation", Computational Linguistics, vol. 19, no. 2, pp. 263-311, 1993.
  3. Yun Kim et al., "The trends of machine translation technology and case study", Electronics and Telecommunications Trends, vol. 23, no. 1, 2008.
  4. U. Germann, "Greedy decoding for statistical machine translation in almost linear time", Proceedings of HLT-NAACL, pp. 1-8, 2003.
  5. F. J. Och and H. Ney, "A systematic comparison of various statistical alignment models", Computational Linguistics, vol. 29, no. 1, pp. 19-51, 2003. https://doi.org/10.1162/089120103321337421
  6. K. Yamada and K. Knight, "A syntax-based statistical translation model", Proceedings of The Association for Computational Linguistics 2001, pp. 523-530, 2001.
  7. R. Rosenfeld, "Two decades of statistical language modeling: where do we go from here?", Proceedings of IEEE, vol. 88, no. 8, pp. 1270-1278, 2000. https://doi.org/10.1109/5.880083
  8. The 21st Century Sejong Project, http://sejong.or.kr/sejong_kr/index.html, 2006.
  9. Xiaoyi Ma, "Champollion: A robust parallel text sentence aligner", Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genova, Italy, 2006.
  10. C. Bannard and C. Callison-Burch, "Paraphrasing with bilingual parallel corpora", Proceedings of The Association for Computational Linguistics 2005, pp. 597-604, 2005.
  11. Y.S. Hwang, Y.K. Kim, and S.K. Park, "Paraphrasing depending on bilingual context toward generalization of translation knowledge", Proceedings of the Third Int'l Joint Conf. on Natural Language Processing, pp. 327-334, 2008.
  12. Daniel Marcu and William Wong, "A phrase-based, joint probability model for statistical machine translation", Proceedings of Empirical Methods in Natural Language Processing 2002, pp. 133-139, 2002.
  13. Nicola Ueffing and Hermann Ney, "Using POS information for statistical machine translation into morphologically rich languages", Proceedings of The European Chapter of the Association for Computational Linguistics 2003.
  14. Philipp Koehn, Franz Josef Och, and Daniel Marcu, "Statistical phrase-based translation", Proceedings of Human Language Technologies: The North American Chapter of the Association for Computational Linguistics 2003, pp. 347-354.
  15. Eleftherios Avramidis and Philipp Koehn, "Enriching morphologically poor languages for statistical machine translation", Proceedings of Association for Computational Linguistics 2008, pp. 763-770.

Cited by

  1. Error-driven Noun-Connection Rule Extraction for Morphological Analysis, vol. 36, no. 8, 2012, https://doi.org/10.5916/jkosme.2012.36.8.1123
  2. A Comparison of Grammatical Error Detection Techniques for an Automated English Scoring System, vol. 37, no. 7, 2013, https://doi.org/10.5916/jkosme.2013.37.7.760
  3. Factors Behind the Effectiveness of an Unsupervised Neural Machine Translation System between Korean and Japanese, vol. 11, no. 16, 2021, https://doi.org/10.3390/app11167662