A Study on Word Cloud Techniques for Analysis of Unstructured Text Data

  • Lee, Won-Jo (Dept. of Safety and Industrial Management Eng., Ulsan College)
  • Received : 2020.09.27
  • Accepted : 2020.10.18
  • Published : 2020.11.30

Abstract

In big data analysis, text data is mostly unstructured and large in volume, and the lack of established analysis techniques has made it difficult to analyze. This study therefore examines the word cloud technique, one of the text data analysis techniques, to verify its usefulness and identify the problems that arise in practical application, with a view to assessing its potential for commercialization. Using the word cloud technique in R, we perform a visualization analysis of the "President UN Speech" and derive the limitations and problems of the technique. We then propose an improved model that addresses these problems, offering an efficient approach to applying the word cloud technique in practice.
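
As a rough illustration of what a word cloud computes, the sketch below counts word frequencies in a short text and maps the most frequent words to font sizes. It is not the paper's R code: it is a minimal Python sketch, and the stopword list, the sample sentence, the function names `word_frequencies` and `font_sizes`, and the linear size scaling are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "we", "our"}


def word_frequencies(text, stopwords=STOPWORDS, min_length=2):
    """Return a Counter of lowercase word frequencies, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        t for t in tokens if t not in stopwords and len(t) >= min_length
    )


def font_sizes(freqs, top_n=5, min_size=10, max_size=48):
    """Map the top_n words to font sizes scaled linearly by frequency."""
    top = freqs.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts tie
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in top
    }


# Made-up sample text, standing in for a speech transcript.
sample = (
    "Peace is our goal. Peace and cooperation bring peace to the world, "
    "and cooperation builds trust."
)
freqs = word_frequencies(sample)
sizes = font_sizes(freqs, top_n=3)
print(freqs.most_common(3))
print(sizes)
```

A sketch like this also hints at where such a technique can run into trouble: the result depends heavily on tokenization and on which stopwords are filtered out, and frequency alone says nothing about context.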
