Topic modeling for automatic classification of learner question and answer in teaching-learning support system

  • Kim, Kyungrog (Department of Electronic Display Engineering, Hoseo University) ;
  • Song, Hye jin (Department of Computer Engineering, Hoseo University) ;
  • Moon, Nammee (Department of Computer Engineering, Hoseo University)
  • Received : 2017.03.30
  • Accepted : 2017.04.25
  • Published : 2017.04.30

Abstract

Interest in text analysis of unstructured data such as articles, comments, and question-and-answer posts is increasing, because such text records people's opinions and can be used to identify features and to support evaluation, prediction, and recommendation. The same holds in technology-enhanced learning (TEL): as MOOC services expand, there is growing interest in automating the discussion and question-and-answer services built on teaching-learning support systems. The aim is to generate question topics from the question-and-answer data accumulated in the system and to automatically classify new questions by topic. In this study, we therefore propose topic modeling based on latent Dirichlet allocation (LDA) for automatically classifying the topics of new questions. The proposed method makes it possible to build a dictionary of question topics and to automatically assign new questions to the relevant topics. In our experiments, some queries achieved high automatic classification results above 0.7, and the results improved as a new query was covered by more topics.
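The workflow summarized in the abstract (learn topics from accumulated questions, build a topic dictionary, then assign a new question to a topic) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the toy corpus, the hyperparameters `alpha` and `beta`, and the hypothetical helper names `train_lda`, `topic_dictionary`, and `classify`. The abstract does not say how questions are tokenized or which score the 0.7 figure refers to, so the sketch uses pre-tokenized English words and scores a new question by the product of its per-word topic probabilities under a uniform topic prior, a simple stand-in for full LDA inference.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word probabilities."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                        # n(k)
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimate of P(word | topic).
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]


def topic_dictionary(phi, top_n=3):
    """List the top_n most probable words of each topic."""
    return [sorted(t, key=t.get, reverse=True)[:top_n] for t in phi]


def classify(query, phi):
    """Score each topic for a new query (uniform prior, known words only)."""
    scores = [1.0] * len(phi)
    for w in query:
        if w in phi[0]:  # words outside the training vocabulary are skipped
            for k in range(len(phi)):
                scores[k] *= phi[k][w]
    total = sum(scores)
    return [s / total for s in scores]


# Toy, pre-tokenized questions (assumed data, not from the paper).
docs = [
    ["deadline", "assignment", "submit", "late"],
    ["assignment", "submit", "upload", "file"],
    ["exam", "grade", "score", "midterm"],
    ["grade", "exam", "final", "score"],
]
phi = train_lda(docs, num_topics=2)
dictionary = topic_dictionary(phi)
dist = classify(["submit", "assignment"], phi)
print(dictionary)
print(dist)
```

The new question would be assigned to the topic with the largest value in `dist`. For real question-and-answer data, a maintained LDA implementation and a language-appropriate tokenizer would be a more sensible choice than this hand-rolled sampler.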
