An effective approach to generate Wikipedia infobox of movie domain using semi-structured data

Bhuiyan, Hanif; Oh, Kyeong-Jin; Hong, Myung-Duk; Jo, Geun-Sik

doi:10.7472/jksii.2017.18.3.49

Journal of Internet Computing and Services (인터넷정보학회논문지)

Volume 18 Issue 3 / Pages 49-61 / 2017 / 1598-0170 (pISSN) / 2287-1136 (eISSN)

Korean Society for Internet Information (한국인터넷정보학회)

Bhuiyan, Hanif (Dept. of Computer Science & Information Engineering, Inha University);
Oh, Kyeong-Jin (Dept. of Computer Science & Information Engineering, Inha University);
Hong, Myung-Duk (Dept. of Computer Science & Information Engineering, Inha University);
Jo, Geun-Sik (Dept. of Computer Science & Information Engineering, Inha University)

Received : 2017.02.10
Accepted : 2017.04.25
Published : 2017.06.30

https://doi.org/10.7472/jksii.2017.18.3.49

Abstract

Wikipedia infoboxes have emerged as an important source of structured information on the web. Composing an infobox for an article requires a considerable amount of manual effort from the author. Because of this manual involvement, infoboxes suffer from inconsistency, data heterogeneity, incompleteness, schema drift, and similar problems. Prior work attempted to solve these problems by generating infoboxes automatically from the corresponding article text. However, many Wikipedia articles do not have enough text content to generate an infobox. In this paper, we present an automated approach that generates infoboxes for the movie domain of Wikipedia by extracting information from several web sources instead of relying on article text alone. The proposed methodology uses semantic relations in the article content together with semi-structured information available on the web. It passes the article text through a series of classification steps to identify the appropriate template from a large pool of templates. It then extracts values for the corresponding template attributes from the web and generates the infobox. A comprehensive experimental evaluation demonstrates that the proposed scheme is an effective and efficient approach to generating Wikipedia infoboxes.
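
The two stages named in the abstract (identify a template from the article text, then fill its attributes from semi-structured web data) can be illustrated with a minimal Python sketch. This is not the method of the paper: the template pool, the cue words, the keyword-overlap classifier, and the function names `pick_template`, `build_infobox`, and `to_wikitext` are all hypothetical stand-ins chosen only to make the data flow concrete.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical template pool: template name -> attributes to fill.
TEMPLATES: Dict[str, List[str]] = {
    "Infobox film": ["director", "starring", "released"],
    "Infobox person": ["birth_date", "occupation"],
}

# Hypothetical cue words per template; a crude stand-in for the
# classification steps mentioned in the abstract.
CUES: Dict[str, Set[str]] = {
    "Infobox film": {"film", "movie", "directed", "starring"},
    "Infobox person": {"born", "actor", "career"},
}


def pick_template(article_text: str) -> str:
    """Pick the template whose cue words overlap the article text most."""
    words = set(article_text.lower().split())
    return max(CUES, key=lambda name: len(CUES[name] & words))


def build_infobox(
    article_text: str, sources: List[Dict[str, str]]
) -> Tuple[str, Dict[str, str]]:
    """Fill each template attribute from the first source that provides it."""
    template = pick_template(article_text)
    infobox: Dict[str, str] = {}
    for attr in TEMPLATES[template]:
        for record in sources:  # each record mimics one semi-structured source
            value = record.get(attr)
            if value:
                infobox[attr] = value
                break
    return template, infobox


def to_wikitext(template: str, infobox: Dict[str, str]) -> str:
    """Render the filled attributes as infobox-style wikitext."""
    lines = ["{{" + template]
    lines += [f"| {key} = {value}" for key, value in infobox.items()]
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    text = "Inception is a 2010 film directed by Christopher Nolan"
    web_records = [
        {"director": "Christopher Nolan"},
        {"director": "C. Nolan", "released": "2010"},
    ]
    print(to_wikitext(*build_infobox(text, web_records)))
```

Attributes that no source provides are simply left out of the result, which mirrors the incompleteness problem the abstract describes for articles with little text.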

Cited by

A Study on a Learning-Based Knowledge Extraction Methodology and Platform for Korean Wikipedia for Knowledge Base Construction, vol.25, pp.1, 2019, https://doi.org/10.13088/jiis.2019.25.1.043
