Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities

A Method for Classifying Schools of Thought Using Unstructured Data Analysis in Digital Humanities

  • Received : 2017.11.10
  • Accepted : 2018.03.09
  • Published : 2018.03.31

Abstract

Information technology improves the efficiency of humanities research. In humanities research, information technology can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, which we call history mining, reveals relationships of influence between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers associated with empiricism and rationalism from among the schools of philosophical thought and collected their writings and related articles accessible on the internet. The performance of each classification algorithm was measured by Recall, Precision, F-Score, and Elapsed Time. DNN, Random Forest, and the ensemble showed better performance than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes each philosopher's years of activity into account.

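The recent emergence of digital humanities as a research field has allowed information technology to improve the efficiency of humanities research. In particular, automatically analyzing which ideas a given person or document holds, and how those ideas connect to others, is an important challenge for understanding intellectual history. The purpose of this study is to propose a method that identifies the arguments contained in unstructured data such as books, papers, and articles and automatically analyzes how they relate to other arguments or ideas. We also propose a method called history mining, which reveals the relationships of influence between arguments. To this end, we used classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers associated with empiricism and rationalism, two representatively contrasting schools of philosophical thought, and collected their writings and related texts available on the internet. Classification performance was measured by Recall, Precision, F-Score, and Elapsed Time; DNN, Random Forest, and the ensemble performed well. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes each philosopher's years of activity into account.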
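The evaluation and history-map steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `evaluate`, `history_map`, and `toy_classifier`, the keyword rule standing in for a real classifier, the four sample sentences, and the years attached to each philosopher. None of it is the paper's data, code, or results. It only shows how Precision, Recall, F-Score, and Elapsed Time might be computed for one classifier and how labelled philosophers might then be ordered chronologically.

```python
import time
from typing import Callable, Dict, List, Tuple


def evaluate(classify: Callable[[str], str], texts: List[str],
             gold: List[str], positive: str) -> Dict[str, float]:
    """Return precision, recall, F-score, and elapsed seconds for one classifier."""
    start = time.perf_counter()
    predicted = [classify(t) for t in texts]
    elapsed = time.perf_counter() - start

    pairs = list(zip(predicted, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "elapsed": elapsed}


def history_map(entries: List[Tuple[str, int, str]]) -> List[Tuple[int, str, str]]:
    """Order (philosopher, year, label) records chronologically."""
    return sorted((year, name, label) for name, year, label in entries)


def toy_classifier(text: str) -> str:
    """Hypothetical keyword rule; a stand-in for a trained model such as a DNN."""
    return "empiricism" if "experience" in text.lower() else "rationalism"


# Made-up sample sentences and gold labels, for illustration only.
texts = [
    "All knowledge comes from experience.",
    "Reason alone yields certain truths.",
    "Innate ideas precede experience.",
    "The senses and experience ground belief.",
]
gold = ["empiricism", "rationalism", "rationalism", "empiricism"]

scores = evaluate(toy_classifier, texts, gold, positive="empiricism")
print(scores)

# Illustrative years (publication of one major work each), not the paper's data.
timeline = history_map([
    ("Hume", 1739, "empiricism"),
    ("Descartes", 1641, "rationalism"),
    ("Locke", 1689, "empiricism"),
])
for year, name, label in timeline:
    print(year, name, label)
```

Passing the classifier as a callable keeps the timing and scoring code identical across algorithms, so swapping in a different model changes only the `classify` argument.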
