Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model

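  • Go Eun Heo (허고은; Graduate School, Department of Library and Information Science, Yonsei University) ;
  • Min Song (송민; Department of Library and Information Science, Yonsei University)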
  • Received : 2014.02.20
  • Accepted : 2014.03.13
  • Published : 2014.03.30

Abstract

Owing to recent advances in Information and Communication Technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, demand has risen for automated text processing methods that can handle massive amounts of text data. Biomedical text mining, which discovers hidden biological meanings and treatments in the biomedical literature, has become a pivotal methodology that helps medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches require either an intensive manual process or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing a single dimension, namely the cause-and-effect relationship between two concepts. This study therefore proposes a novel approach that discovers various relationships among source and target concepts and their intermediate concepts by expanding the intermediate concepts to multiple levels. It offers a distinct perspective on literature-based discovery: graph-based path inference not only uncovers meaningful relationships among concepts in the biomedical literature but also makes it possible to generate feasible new hypotheses.

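With the development of information and communication technology, the volume of scholarly information has grown exponentially, and the need for automated text processing to handle massive amounts of text data has emerged. Bio text mining, which discovers information such as biological meanings and treatment effects in the biomedical literature, finds meaningful associations among the concepts in that literature and thereby saves considerable time and cost in the medical domain. Literature-based discovery research has yielded new biomedical hypotheses, but existing studies rely on semi-automated techniques that require expert intervention and are limited to revealing only a single cause-and-effect relationship. This study therefore expands the intermediate concept B to multiple levels and identifies diverse relationships by extracting co-occurring entities and verbs. Graph-based path inference made it possible to systematically analyze and identify the relationships between nodes, and this new methodological attempt is expected to open the possibility of proposing hypotheses that have not previously been revealed.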
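
The idea of expanding intermediate concepts to multiple levels and inferring paths over a graph can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_intermediates` parameter, and the toy sentences are all hypothetical, entities are assumed to be already extracted per sentence, and verb extraction is omitted. Under those assumptions, the sketch links entities that co-occur in a sentence and then lists indirect paths from a source concept to a target concept.

```python
from collections import defaultdict
from itertools import combinations

# Toy, hand-made entity sets per sentence (illustrative only, not data
# from the study). Each inner list stands for one sentence's entities.
TOY_SENTENCES = [
    ["fish oil", "blood viscosity"],
    ["blood viscosity", "Raynaud's syndrome"],
    ["fish oil", "platelet aggregation"],
    ["platelet aggregation", "vascular reactivity"],
    ["vascular reactivity", "Raynaud's syndrome"],
]


def build_cooccurrence_graph(sentences):
    """Link every pair of entities that appear in the same sentence."""
    graph = defaultdict(set)
    for entities in sentences:
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_paths(graph, source, target, max_intermediates=2):
    """List simple paths source -> B1 -> ... -> target.

    Only indirect paths are returned (at least one intermediate concept),
    and no path uses more than max_intermediates intermediates.
    """
    paths = []

    def walk(node, path):
        for neighbor in sorted(graph.get(node, ())):
            if neighbor in path:
                continue  # keep paths simple: never revisit a concept
            if neighbor == target:
                if len(path) > 1:  # skip the direct source-target edge
                    paths.append(path + [neighbor])
                continue
            # len(path) - 1 is the number of intermediates used so far
            if len(path) - 1 < max_intermediates:
                walk(neighbor, path + [neighbor])

    walk(source, [source])
    return paths


graph = build_cooccurrence_graph(TOY_SENTENCES)
for path in find_paths(graph, "fish oil", "Raynaud's syndrome"):
    print(" -> ".join(path))
```

Setting `max_intermediates=1` restricts the search to a single intermediate concept between source and target, while larger values admit longer chains of intermediate concepts at the cost of many more candidate paths to rank or filter.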

Acknowledgement

Supported by: National Research Foundation of Korea (한국연구재단)
