Investigating the Efficient Method for Constructing Audio Surrogates of Digital Video Data

Korean title: A Study on Techniques for Summarizing the Audio Information of Video

  • 김현희 (Department of Library and Information Science, College of Humanities, Myongji University)
  • Published : 2009.09.30

Abstract

This study proposed an algorithm for automatically summarizing the audio information of a video and then conducted an experiment to evaluate the audio extracts constructed with the proposed algorithm. The results showed, first, that the recall and precision rates of the proposed audio summarization method were higher than those of the mechanical method, which constructs audio extracts based on sentence location. Second, the proposed method outperformed the mechanical method in the summary-making task, although in the gist recognition task (multiple choice) there was no statistically significant difference between the two methods. In addition, the study surveyed participants' satisfaction with the use of audio extracts for video browsing and discussed the practical implications of the proposed method for Internet and digital library environments.

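This study designed an algorithm that extracts and automatically summarizes the audio information of a video, evaluated the quality of the audio summaries constructed with the proposed algorithm, and proposed ways to implement efficient video summarization. The specific results are as follows. First, in the intrinsic evaluation, the quality of the proposed audio summaries was higher than that of the location-based audio summaries. In the user (extrinsic) evaluation, the proposed summaries outperformed the location-based summaries in summary accuracy, but no performance difference was found between the two in item selection. In addition, user satisfaction with audio summaries for video browsing was surveyed. Finally, based on these findings, the study suggested ways to apply the proposed audio summarization technique on the Internet and in digital libraries.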
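
The recall and precision comparison reported in the abstract can be illustrated with a minimal sketch. This is not the study's algorithm or data. The sentence indices below are hypothetical, and the location-based baseline is assumed here to select the leading sentences of the transcript, which the abstract does not specify.

```python
def precision_recall(selected, reference):
    """Return (precision, recall) of selected sentence indices
    against a reference set of sentence indices."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def location_baseline(num_sentences, k):
    """Hypothetical location-based baseline: take the first k sentences.
    (Assumption: the abstract only says selection is based on sentence location.)"""
    return list(range(min(k, num_sentences)))


# Hypothetical toy data for a 10-sentence transcript.
reference = [1, 4, 7]                 # sentences a human judged as key
proposed = [1, 4, 8]                  # output of some content-based method
baseline = location_baseline(10, 3)   # sentences 0, 1, 2

proposed_scores = precision_recall(proposed, reference)
baseline_scores = precision_recall(baseline, reference)
print("proposed precision/recall:", proposed_scores)
print("baseline precision/recall:", baseline_scores)
```

In this toy case the content-based selection shares two sentences with the reference and the leading-sentence baseline shares one, so the former scores higher on both measures. Real evaluations would average these scores over many videos and assessors.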
