Annotation Method based on Face Area for Efficient Interactive Video Authoring

  • Yoon, Ui Nyoung (Department of Computer Science and Information Engineering, Inha University) ;
  • Ga, Myeong Hyeon (Department of Computer Science and Information Engineering, Inha University) ;
  • Jo, Geun-Sik (Department of Computer Science and Information Engineering, Inha University)
  • Received : 2014.11.26
  • Accepted : 2015.01.02
  • Published : 2015.03.31

Abstract

Many TV viewers rely mainly on portal sites to retrieve information related to a broadcast while watching TV. However, finding the desired information takes a long time because the web presents a great deal of irrelevant content, so this process cannot satisfy users who want information immediately. Interactive video is being actively investigated to solve this problem. An interactive video provides clickable objects, areas, or hotspots that allow it to interact with users: when a user clicks an object in the video, additional related information is shown instantly. Authoring an interactive video with an authoring tool involves three basic steps: (1) create an augmented object; (2) set the object's area and the time it is displayed on the video; and (3) set an interactive action that links the object to pages or hyperlinks. Users of existing authoring tools such as Popcorn Maker and Zentrick spend most of their time on step (2). wireWAX shortens the work of setting an object's location and display time by using a vision-based annotation method, but its users must wait while objects are detected and tracked. It is therefore necessary to reduce the time spent on step (2) by combining the strengths of manual and vision-based annotation.

This paper proposes a novel annotation method that lets an annotator annotate objects easily based on face areas. The method consists of two stages: pre-processing and annotation. Pre-processing detects shots so that users can locate video content easily: 1) shots are extracted from the video frames with a color-histogram-based shot boundary detection method; 2) similar shots are clustered and aligned into shot sequences; and 3) faces are detected and tracked in every shot of each sequence and saved, shot by shot, in the shot sequence metadata. After pre-processing, the user annotates objects as follows: 1) the annotator selects a shot sequence and then a keyframe of a shot in that sequence; 2) the annotator places objects at positions relative to the actor's face on the selected keyframe, and the same objects are annotated automatically through the rest of the shot sequence wherever a face area was detected; and 3) the user assigns additional information to the annotated objects. In addition, this paper designs a feedback model to compensate for defects that may remain after annotation: wrongly aligned shots, wrongly detected faces, and inaccurate object locations. An interpolation method can then restore the positions of objects deleted during feedback, and the corrected annotations are saved as interactive object metadata.

Finally, the paper presents an interactive video authoring system implemented with these models to verify the performance of the proposed annotation method. The experiments analyze object annotation time and report a user evaluation. On average, the proposed tool annotated objects twice as fast as existing authoring tools, although it occasionally took longer when shots were detected incorrectly during pre-processing.
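To make the pipeline above concrete, here is a minimal sketch in Python with OpenCV of the two vision steps the pre-processing relies on: color-histogram shot boundary detection and Haar-cascade frontal face detection (cf. references 13 and 14), plus a helper that anchors a clickable object at a position relative to a face box, as in the annotation step. The histogram bins, the Bhattacharyya threshold, and the place_object helper are illustrative assumptions, not the authors' exact implementation.

```python
import cv2

def detect_shot_boundaries(video_path, threshold=0.5):
    """Mark a shot boundary where the HSV color histograms of two
    consecutive frames differ strongly (illustrative threshold)."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [0], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60],
                            [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance in [0, 1]; near 1 = very different
            if cv2.compareHist(prev_hist, hist,
                               cv2.HISTCMP_BHATTACHARYYA) > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries  # frame indices where new shots begin

# Frontal-face Haar cascade (Viola-Jones), shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return detected face boxes as (x, y, w, h) tuples."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5)

def place_object(face, offset, size):
    """Hypothetical helper: anchor an object at a position given in
    units of face width/height, so it follows the face as it moves
    and changes scale across the shot sequence."""
    x, y, w, h = face
    return (int(x + offset[0] * w), int(y + offset[1] * h),
            size[0], size[1])
```

The abstract also mentions an interpolation method that restores the positions of objects deleted during feedback. As a simple stand-in for the dynamic sampling-based interpolation of reference 5, plain linear interpolation between two surviving keyframe positions might look like this:

```python
def interpolate_positions(start, end, n_frames):
    """Linearly interpolate an object's (x, y) position over the
    n_frames frames between two annotated keyframes."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * i / (n_frames + 1),
             y0 + (y1 - y0) * i / (n_frames + 1))
            for i in range(1, n_frames + 1)]
```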
The usefulness and convenience of the system were measured through a user evaluation aimed at users with experience in interactive video authoring systems. Nineteen recruited experts answered 11 questions drawn from the CSUQ (Computer System Usability Questionnaire), which was designed by IBM for evaluating systems. The evaluation showed that the proposed tool was rated about 10% more useful for authoring interactive video than the other interactive video authoring systems.
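The roughly 10% usefulness gap implies a comparison of mean questionnaire scores between tools. Purely as an illustration (the paper does not publish per-item responses; the numbers below are hypothetical), CSUQ-style ratings could be aggregated as follows:

```python
def csuq_mean(responses):
    """Average one participant's 7-point CSUQ ratings for a tool."""
    return sum(responses) / len(responses)

def percent_gain(scores_ours, scores_other):
    """Relative difference between two tools' mean scores, in percent."""
    a = sum(scores_ours) / len(scores_ours)
    b = sum(scores_other) / len(scores_other)
    return (a - b) / b * 100

# Hypothetical means for 19 participants per tool
print(percent_gain([5.5] * 19, [5.0] * 19))  # -> 10.0
```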

References

  1. Chasanis, V. T., C. L. Likas, and N. P. Galatsanos, "Scene Detection in Videos Using Shot Clustering and Sequence Alignment," IEEE Transactions on Multimedia, Vol.11, No.1 (2009), 89-100. https://doi.org/10.1109/TMM.2008.2008924
  2. Froba, B., and A. Ernst, "Face Detection with the Modified Census Transform," Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, (2004), 91-96.
  3. Lewis, J. R., "IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use," International Journal of Human-Computer Interaction, Vol.7, No.1 (1995), 57-78. https://doi.org/10.1080/10447319509526110
  4. Lienhart, R., "Comparison of automatic shot boundary detection algorithms," Proceedings of the SPIE Conference, Vol.3656 (1998), 290-301.
  5. Lee, K.-S., A. N. Rosli, I. A. Supandi, and G.-S. Jo, "Dynamic sampling-based interpolation algorithm for representation of clickable moving object in collaborative video annotation," Neurocomputing, Vol.146 (2014), 291-300. https://doi.org/10.1016/j.neucom.2014.03.068
  6. Lee, K. A., C. H. You, H. Li, T. Kinnunen, and K. C. Sim, "Using Discrete Probabilities With Bhattacharyya Measure for SVM-Based Speaker Verification," IEEE Transactions on Audio, Speech, and Language Processing, Vol.19, No.4 (2011), 861-870. https://doi.org/10.1109/TASL.2010.2064308
  7. Lin, T. T. C., "Convergence and Regulation of Multi-Screen Television: The Singapore Experience," Telecommunications Policy, Vol.37, No.8 (2013), 673-685. https://doi.org/10.1016/j.telpol.2013.04.011
  8. Lucas, B. D., and T. Kanade, "An iterative image registration technique with an application to stereo vision," Proceedings of the 7th International Joint Conference on Artificial Intelligence, (1981), 674-679.
  9. Miller, G., S. Fels, M. Ilich, M. M. Finke, T. Bauer, K. Wong, and S. Mueller, "An End-to-End Framework for Multi-View Video Content: Creating Multiple-Perspective Hypervideo to View On Mobile Platforms," Proceedings of the 10th International Conference on Entertainment Computing, (2011), 337-342.
  10. Nielsen, Digital Consumer Report, 2014. Available at http://www.nielsen.com/us/en/reports.html (Accessed 13 November, 2014).
  11. Mozilla, Popcorn Maker. Available at https://popcorn.webmaker.org (Accessed 13 November, 2014).
  12. Rui, Y., T. S. Huang, and S. Mehrotra, "Constructing Table-of-Content for Videos," Multimedia Systems, Vol.7, No.5 (1999), 359-368. https://doi.org/10.1007/s005300050138
  13. Swiki, Frontal Face Haar Cascade. Available at http://alereimondo.no-ip.org/OpenCV (Accessed 13 November, 2014).
  14. Viola, P., and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2001), 511-518.
  15. wireWAX. Available at http://wirewax.com (Accessed 13 November, 2014).
  16. Yoon, U. N., K. S. Lee, and G. S. Jo, "Interactive Video Annotation System based on Face Area," Korea Computer Congress, (2014), 755-757.
  17. Zentrick. Available at https://www.zentrick.com (Accessed 13 November, 2014).

Cited by

  1. Development of an XML Data Model and Interpreter for Authoring Interactive Convergence Content Based on HTML5 iframe (in Korean), Vol.20, No.12 (2020), https://doi.org/10.5392/jkca.2020.20.12.250