Analysis of patterns in meteorological research and development using a text-mining algorithm

Trend analysis of Korea Meteorological Administration research and development projects using a text-mining algorithm

  • Park, Hongju (Department of Applied Statistics, Yonsei University) ;
  • Kim, Habin (Department of Statistics, Dongguk University) ;
  • Park, Taeyoung (Department of Applied Statistics, Yonsei University) ;
  • Lee, Yung-Seop (Department of Statistics, Dongguk University)
  • Received : 2016.05.30
  • Accepted : 2016.07.19
  • Published : 2016.08.31

Abstract

This paper analyzes patterns in meteorological research and development using a text-mining algorithm, a method for analyzing unstructured data. To analyze the text data, we define a list of terms related to meteorological research and development, construct time series from a term-document matrix after data preprocessing, and identify terms that show upward or downward patterns over time. The proposed methodology is applied to multi-year plans funded by Korea Meteorological Administration research and development programs from 2011 to 2015.
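The step from a term list to yearly term counts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the term list, the toy documents, the whitespace tokenizer, and the function names `term_document_matrix` and `yearly_term_counts` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical term list and toy documents (not the paper's data).
TERMS = ["radar", "satellite", "climate"]
DOCS = [
    (2011, "radar observation and radar calibration"),
    (2011, "climate model evaluation"),
    (2012, "satellite retrieval for climate monitoring"),
    (2012, "satellite data assimilation"),
]


def term_document_matrix(docs, terms):
    """Return one row of term counts per document, in the order of `terms`."""
    rows = []
    for _, text in docs:
        # Naive whitespace tokenizer; real preprocessing would be more involved.
        counts = Counter(text.lower().split())
        rows.append([counts[t] for t in terms])
    return rows


def yearly_term_counts(docs, terms):
    """Sum the document rows by year to obtain one count vector per year."""
    totals = defaultdict(lambda: [0] * len(terms))
    for (year, _), row in zip(docs, term_document_matrix(docs, terms)):
        totals[year] = [a + b for a, b in zip(totals[year], row)]
    return dict(totals)


if __name__ == "__main__":
    for year, counts in sorted(yearly_term_counts(DOCS, TERMS).items()):
        print(year, dict(zip(TERMS, counts)))
```

Restricting the counts to a predefined term list keeps the matrix small and focused on domain vocabulary, at the cost of missing terms that are not in the list.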

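In this study, we analyzed trends in Korea Meteorological Administration research and development projects using text mining, one of the techniques for analyzing unstructured data. To do so, we built a term dictionary and, after preprocessing, constructed a term-document matrix. Using this matrix, we measured term frequencies by year and, for frequently occurring words, examined changes in their relative frequencies. We then used regression analysis to identify terms with increasing and decreasing trends. This analysis identified recent trends in the Korea Meteorological Administration's research and development. Such a study can serve as baseline material for future Korea Meteorological Administration research and development and can be used to suggest its direction and blueprint.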
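The relative-frequency and regression step can be sketched as follows. The abstract does not give the exact definition of relative frequency or the regression specification, so this example assumes a term's share of all counted term occurrences in a year and a simple least-squares slope over the years. The yearly counts and the names `relative_frequencies`, `ols_slope`, and `classify_trend` are hypothetical.

```python
# Hypothetical yearly counts for three terms (not the paper's data).
TERMS = ["radar", "satellite", "climate"]
COUNTS_BY_YEAR = {
    2011: [1, 4, 5],
    2012: [2, 3, 5],
    2013: [3, 2, 5],
    2014: [4, 1, 5],
    2015: [5, 0, 5],
}


def relative_frequencies(counts_by_year, term_index):
    """Share of each year's counted term occurrences taken by one term (assumed definition)."""
    years = sorted(counts_by_year)
    shares = []
    for year in years:
        total = sum(counts_by_year[year])
        shares.append(counts_by_year[year][term_index] / total if total else 0.0)
    return years, shares


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_trend(slope, tol=1e-9):
    """Label a slope as upward, downward, or flat using a small tolerance."""
    if slope > tol:
        return "upward"
    if slope < -tol:
        return "downward"
    return "flat"


if __name__ == "__main__":
    for i, term in enumerate(TERMS):
        years, shares = relative_frequencies(COUNTS_BY_YEAR, i)
        print(term, classify_trend(ols_slope(years, shares)))
```

The sign of the slope alone does not establish a trend. With only five yearly points, a fuller analysis would also check the statistical significance of the slope, which this sketch omits.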
