R&D Perspective Social Issue Packaging using Text Analysis

Wong, William Xiu Shun; Kim, Namgyu

doi:10.9716/KITS.2016.15.3.071

Journal of Information Technology Services (한국IT서비스학회지)

Volume 15, Issue 3 / Pages 71-95 / 2016 / 1975-4256 (pISSN)

Korea Society of IT Services (한국IT서비스학회)

Wong, William Xiu Shun (Graduate School of Business Information Technology, Kookmin University) ;
Kim, Namgyu (School of MIS, Kookmin University)

Received : 2016.07.24
Accepted : 2016.08.29
Published : 2016.09.30

https://doi.org/10.9716/KITS.2016.15.3.071

Abstract

In recent years, text mining has been used to extract meaningful insights from large volumes of unstructured text data in various domains. Topic modeling, one of the most representative text mining applications, is widely used to extract the main topics of a large document collection, with each topic expressed as a set of keywords. In general, topic modeling is performed according to the weighted frequency of words in a document corpus. As a result, general topic modeling cannot discover the relation between documents that share only a few terms, even when those documents are in fact strongly related from a particular perspective. For instance, a document about "sexual offense" and another about the "silver industry for aged persons" might not be classified into the same topic because they may not share many key terms. However, the two documents can be strongly related from the R&D perspective because some technologies, such as "RF Tag," "CCTV," and "Heart Rate Sensor," are core components of both issues. Thus, in this study, we attempt to identify the differences between the results of general topic modeling and R&D perspective topic modeling. Furthermore, we package social issues from the R&D perspective and present a prototype system that provides a package of news articles for each R&D issue. Finally, we analyze the quality of R&D perspective topic modeling and provide the results of inter- and intra-topic analysis.
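The contrast described above can be illustrated with a small sketch. It is not the method used in the paper: the two toy documents, the `TECH_LINKS` lookup table, and the function names are all hypothetical, and plain cosine similarity over term counts stands in for a full topic model. The sketch only shows how two documents that share few words can still look similar once their terms are mapped to related technologies.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents invented for illustration (not from the paper's corpus).
doc_offense = ["sexual", "offense", "victim", "police", "tracking"]
doc_silver = ["silver", "industry", "aged", "care", "tracking"]

# Hypothetical lookup from issue terms to related technologies.
# The paper names these three technologies; the mapping itself is assumed.
TECH_LINKS = {
    "offense": ["CCTV", "RF Tag"],
    "tracking": ["RF Tag"],
    "aged": ["Heart Rate Sensor", "RF Tag"],
    "care": ["CCTV", "Heart Rate Sensor"],
}


def technology_vector(terms):
    """Count the technologies linked to the terms of one document."""
    counts = Counter()
    for term in terms:
        counts.update(TECH_LINKS.get(term, []))
    return counts


# Word-level view: the documents share only one of their five terms.
word_similarity = cosine(Counter(doc_offense), Counter(doc_silver))

# Technology-level view: both documents point to overlapping technologies.
tech_similarity = cosine(
    technology_vector(doc_offense), technology_vector(doc_silver)
)

print(f"word-level similarity:       {word_similarity:.3f}")
print(f"technology-level similarity: {tech_similarity:.3f}")
```

In this toy setting the technology-level similarity is clearly higher than the word-level similarity, which mirrors the abstract's argument that documents with little shared vocabulary may still belong together from an R&D perspective.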

