Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model

Heo, Go Eun; Song, Min

doi:10.3743/KOSIM.2014.31.1.231

Journal of the Korean Society for Information Management (정보관리학회지)

Volume 31, Issue 1 / Pages 231-250 / 2014 / 1013-0799 (pISSN) / 2586-2073 (eISSN)

Korean Society for Information Management (한국정보관리학회)

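Heo, Go Eun (Graduate School, Department of Library and Information Science, Yonsei University);
Song, Min (Department of Library and Information Science, Yonsei University)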

Received : 2014.02.20
Accepted : 2014.03.13
Published : 2014.03.30

https://doi.org/10.3743/KOSIM.2014.31.1.231

Abstract

Owing to recent developments in Information and Communication Technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, demand has risen for automated text processing methods that can handle massive amounts of text data. Biomedical text mining, which discovers hidden biological meanings and treatments in the biomedical literature, has become a pivotal methodology that helps medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches require either an intensive manual process or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing a single dimension, namely the cause-and-effect relationship between two concepts. This study therefore proposes a novel approach that discovers various relationships among source and target concepts and their intermediate concepts by expanding the intermediate concepts to multiple levels. The study offers a distinct perspective on literature-based discovery: graph-based path inference not only uncovers meaningful relationships among concepts in the biomedical literature but also makes it possible to generate feasible new hypotheses.

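With the development of information and communication technology, the volume of scholarly information has grown exponentially, and the need has emerged for automated text processing to handle vast amounts of text data. Bio text mining, which discovers information such as biological meanings and treatment effects in biomedical literature, finds meaningful associations among the concepts in the literature and thereby saves considerable time and cost in the medical domain. Literature-based discovery research has uncovered new biomedical hypotheses, but existing studies rely on semi-automated techniques that require expert intervention and are limited to revealing only a single cause-and-effect relationship. This study therefore expands the intermediate concept B to multiple levels and identifies diverse relationships by extracting co-occurring entities and verbs. Graph-based path inference made it possible to systematically analyze and identify the relationships between nodes, and this new methodological attempt is expected to enable the proposal of new hypotheses that had not previously been revealed.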
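
The idea in the abstract of expanding the intermediate concepts between a source and a target to multiple levels, and then inferring paths over a graph, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the triples are hand-written stand-ins for entities and verbs that co-occur in a sentence (they are not data from the study), the function names `build_graph`, `find_paths`, and `describe` are hypothetical, and depth-first enumeration of simple paths is one plausible reading of "graph-based path inference", not necessarily the authors' procedure.

```python
from collections import defaultdict

# Hypothetical, hand-written (entity, verb, entity) triples standing in for
# sentence-level co-occurrences. They are NOT data from the study.
TRIPLES = [
    ("fish oil", "reduces", "blood viscosity"),
    ("fish oil", "inhibits", "platelet aggregation"),
    ("platelet aggregation", "increases", "blood viscosity"),
    ("blood viscosity", "aggravates", "Raynaud's syndrome"),
]


def build_graph(triples):
    """Build an undirected co-occurrence graph; each edge keeps its verbs."""
    graph = defaultdict(dict)
    for head, verb, tail in triples:
        graph[head].setdefault(tail, set()).add(verb)
        graph[tail].setdefault(head, set()).add(verb)
    return graph


def find_paths(graph, source, target, max_intermediates=2):
    """Enumerate simple paths from source to target by depth-first search.

    max_intermediates bounds how many intermediate concepts (B1, B2, ...)
    may sit between the source and the target.
    """
    paths = []

    def dfs(path):
        node = path[-1]
        if node == target:
            paths.append(path)
            return
        # len(path) - 1 edges are used so far; a path with k intermediates
        # has k + 1 edges, so stop when no further hop is allowed.
        if len(path) - 1 >= max_intermediates + 1:
            return
        for neighbor in sorted(graph.get(node, {})):
            if neighbor not in path:  # keep paths simple (no repeated node)
                dfs(path + [neighbor])

    dfs([source])
    return paths


def describe(graph, path):
    """Render a path with the verbs recorded on each edge."""
    parts = [path[0]]
    for left, right in zip(path, path[1:]):
        verbs = "/".join(sorted(graph[left][right]))
        parts.append(f"-[{verbs}]-> {right}")
    return " ".join(parts)


if __name__ == "__main__":
    g = build_graph(TRIPLES)
    for p in find_paths(g, "fish oil", "Raynaud's syndrome"):
        print(describe(g, p))
```

With `max_intermediates=1` the search corresponds to a single intermediate concept; raising the bound admits longer chains through several intermediate concepts. Because the sketch stores edges as undirected, the arrows in `describe` show traversal order only, not the direction of the relation.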

Acknowledgement

Supported by: National Research Foundation of Korea
