Research on Methods for Processing Nonstandard Korean Words on Social Network Services

소셜네트워크서비스에 활용할 비표준어 한글 처리 방법 연구

  • Received: 2016.06.02
  • Accepted: 2016.06.27
  • Published: 2016.06.30

Abstract

Social network services (SNS), which help users build relationship networks around shared interests or activities, and online communities such as blogs, where users freely post comments, photos, videos, and other content according to their interests, have been widely adopted and have become an established social phenomenon. Several studies have used text mining techniques such as opinion mining and semantic analysis to uncover patterns and valuable information in social network data. Keyword-based approaches have been applied to make text mining more efficient, but most of these studies run into limitations related to Korean orthography. This research aims to construct a database of nonstandard Korean words that are difficult to handle in data mining, such as abbreviations, slang, strange expressions, and emoticons, in order to overcome the limitations of keyword-based text mining techniques. Using blogs that express subjective opinions on specific topics as a case study, this research extracted nonstandard words and found them useful in the text mining process.

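Social network services (SNS), online services that build networks of relationships around shared interests or activities, and blogs, spaces where users can freely post text, photos, and videos according to their interests, have taken hold as a social phenomenon for presenting and expressing oneself. Research on text mining, opinion mining, and semantic analysis is actively under way to analyze what users freely write on SNS and blogs and to find meaningful information, value, and patterns. Keyword-based studies have also been carried out to improve research efficiency. However, most of these studies show considerable limitations with respect to Korean orthography. This study builds a database of words that are difficult to identify during mining, including odd "alien language" whose word roots are hard to find, indiscriminately used slang, and hard-to-decipher Korean emoticons and Internet language, in order to overcome the limitations of data-dictionary-based mining techniques. The study used blogs consisting of subjective views on specific topics as its case analysis, and found that extracting nonstandard words using Unicode is useful for text mining.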
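
The abstract does not describe the extraction procedure in detail, so the following is only a minimal sketch of one way Unicode ranges could be used to pull nonstandard Korean tokens out of text, not the authors' method. It assumes that runs of standalone jamo from the Hangul Compatibility Jamo block (U+3131 to U+318E), such as ㅋㅋ or ㅠㅠ, are candidates for nonstandard expressions. The dictionary `NONSTANDARD_DICT`, its entries, and the function names are hypothetical.

```python
import re

# Standalone consonants and vowels (Hangul Compatibility Jamo, U+3131-U+318E).
# Complete syllables such as "영" live in U+AC00-U+D7A3 and are not matched.
JAMO_RUN = re.compile(r"[\u3131-\u318E]+")

# Hypothetical nonstandard-word dictionary; the entries are illustrative only.
NONSTANDARD_DICT = {
    "ㅋㅋ": "laughter",
    "ㅠㅠ": "crying",
    "ㅇㅇ": "yes",
}


def extract_jamo_runs(text):
    """Return every run of standalone jamo found in the text."""
    return JAMO_RUN.findall(text)


def normalize_run(run, max_repeat=2):
    """Collapse a character repeated more than max_repeat times to max_repeat."""
    pattern = r"(.)\1{%d,}" % max_repeat
    return re.sub(pattern, lambda m: m.group(1) * max_repeat, run)


def tag_nonstandard(text, dictionary=NONSTANDARD_DICT):
    """Pair each jamo run with its dictionary meaning, or 'unknown'."""
    tagged = []
    for run in extract_jamo_runs(text):
        key = normalize_run(run)
        tagged.append((run, dictionary.get(key, "unknown")))
    return tagged


if __name__ == "__main__":
    sample = "오늘 영화 진짜 재밌음 ㅋㅋㅋㅋ 근데 끝이 슬픔 ㅠㅠ"
    for run, meaning in tag_nonstandard(sample):
        print(run, meaning)
```

Collapsing repeated characters before the lookup keeps the dictionary small, since ㅋㅋㅋ and ㅋㅋㅋㅋㅋ resolve to the same entry. Runs tagged "unknown" could be reviewed and added to the dictionary over time.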
