
R&D Perspective Social Issue Packaging using Text Analysis

  • Received : 2016.07.24
  • Accepted : 2016.08.29
  • Published : 2016.09.30

Abstract

In recent years, text mining has been used to extract meaningful insights from large volumes of unstructured text in various domains. Topic modeling, one of the most representative text mining applications, has been widely used to identify the main topics of a large document collection, each topic expressed as a set of keywords. In general, topic modeling relies on the weighted frequency of words in a document corpus. As a result, it cannot discover the relationship between two documents that share only a few terms, even when those documents are in fact strongly related from a particular perspective. For instance, a document about "sexual offense" and another about "silver industry for aged persons" might not be assigned to the same topic because they share few key terms. From the R&D perspective, however, these two documents can be strongly related, because technologies such as "RF Tag," "CCTV," and "Heart Rate Sensor" are core components of both "sexual offense" and "silver industry." In this study, we therefore compare the results of general topic modeling with those of R&D perspective topic modeling. Furthermore, we package social issues from the R&D perspective and present a prototype system that provides a package of news articles for each R&D issue. Finally, we analyze the quality of R&D perspective topic modeling and report the results of inter- and intra-topic analysis.
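The gap between term-based and R&D-perspective relatedness can be illustrated with a small sketch. It is not the study's method: it assumes TF-IDF as the "weighted frequency" of words, cosine similarity as the relatedness measure, and a hypothetical term-to-technology dictionary (`TERM_TO_TECH`) standing in for the R&D perspective. The three toy documents in `DOCS` are likewise invented for illustration and are not drawn from the study's news data.

```python
import math
from collections import Counter

# Hypothetical toy corpus of whitespace-tokenized "articles" (not the study's data).
DOCS = {
    "sexual_offense": "offender victim surveillance tracking bracelet panic offender",
    "silver_industry": "elderly pension monitoring wandering pulse welfare elderly",
    "weather": "rain storm forecast wind rain",
}

# Hypothetical dictionary mapping key terms to R&D technologies.
TERM_TO_TECH = {
    "surveillance": "CCTV",
    "monitoring": "CCTV",
    "tracking": "RF Tag",
    "bracelet": "RF Tag",
    "wandering": "RF Tag",
    "pulse": "Heart Rate Sensor",
    "panic": "Heart Rate Sensor",
}


def tfidf_vectors(docs):
    """Return {doc_id: {term: tf * log(N / df)}} for whitespace-tokenized docs."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    return {
        d: {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(tokens).items()}
        for d, tokens in tokenized.items()
    }


def tech_vector(text):
    """Count how often each technology is implied by the document's terms."""
    return Counter(TERM_TO_TECH[t] for t in text.split() if t in TERM_TO_TECH)


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


tfidf = tfidf_vectors(DOCS)
term_sim = cosine(tfidf["sexual_offense"], tfidf["silver_industry"])
tech_sim = cosine(tech_vector(DOCS["sexual_offense"]),
                  tech_vector(DOCS["silver_industry"]))
print(f"term-based similarity: {term_sim:.2f}")         # 0.00: no shared terms
print(f"R&D-perspective similarity: {tech_sim:.2f}")    # 0.94: shared technologies
```

Because the two documents share no terms, their TF-IDF vectors are orthogonal no matter how the terms are weighted, so a purely term-based model has no signal linking them. Once each term is projected onto the technologies it implies, both documents load on "CCTV," "RF Tag," and "Heart Rate Sensor," and the similarity becomes high. How such a dictionary would be built in practice is outside the scope of this sketch.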
