A Study on Method for User Gender Prediction Using Multi-Modal Smart Device Log Data

스마트 기기의 멀티 모달 로그 데이터를 이용한 사용자 성별 예측 기법 연구

  • Kim, Yoonjung (Department of Industrial Engineering, Seoul National University)
  • Choi, Yerim (Department of Industrial Engineering, Seoul National University)
  • Kim, Solee (Department of Industrial Engineering, Seoul National University)
  • Park, Kyuyon (Department of Industrial Engineering, Seoul National University)
  • Park, Jonghun (Department of Industrial Engineering, Seoul National University)
  • Received : 2016.01.12
  • Accepted : 2016.02.19
  • Published : 2016.02.28

Abstract

Gender information about a smart device user is essential for providing personalized services, and multi-modal data obtained from the device is useful for predicting the user's gender. However, the appropriate way to use each modality for gender prediction differs according to the characteristics of the data. Therefore, this study proposes an ensemble method that predicts the gender of a smart device user with three classifiers that take text, application, and acceleration data as inputs, respectively. To alleviate the privacy issues that arise when text data generated on a smart device is sent off the device, the text-based classifier scans the text data only on the device and classifies the user's gender by matching the text against predefined sets of words. The application-based classifier assigns gender labels to executed applications and predicts the user's gender by comparing the label ratios. The acceleration-based classifier uses a Support Vector Machine to classify the user's gender. The proposed method was evaluated on actual smart device log data collected through an Android application. The experimental results showed that the proposed method outperformed the compared methods.

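Gender information about smart device users is important for successful personalized services, and the multi-modal log data collected from smart devices provides important evidence for predicting a user's gender. However, gender prediction must be performed differently depending on the characteristics of each modality. This study therefore proposes a method that predicts the final gender by ensembling, through majority voting, the predictions of separate classifiers based on the text, application, and acceleration data found in the device's logs. To minimize privacy violations caused by data leakage, the text-based classifier derives a set of characteristic words for each gender from web documents, sends these sets to the device, and performs gender classification on the user's device. The application-based classifier assigns a gender to each application the user has executed and predicts the user's gender as the one that accounts for the higher proportion. The acceleration-based classifier classifies gender with an SVM model trained on users' acceleration data instances labeled by gender. The proposed method was evaluated on actual smart device log data collected through an Android application developed by the authors, and it showed high prediction performance.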
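
The classifiers and the majority-vote ensemble described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the "M"/"F" label encoding, the example word sets and application names, and the choice to abstain (return `None`) on ties are all assumptions made for illustration, and the acceleration-based SVM is represented only by a stand-in prediction rather than a trained model.

```python
from collections import Counter

def classify_text(tokens, male_words, female_words):
    """Match on-device tokens against predefined word sets (hypothetical sets)."""
    male_hits = sum(1 for t in tokens if t in male_words)
    female_hits = sum(1 for t in tokens if t in female_words)
    if male_hits == female_hits:
        return None  # assumption: abstain when neither set dominates
    return "M" if male_hits > female_hits else "F"

def classify_apps(executed_apps, app_gender_labels):
    """Predict the gender whose label covers the larger share of executed apps."""
    counts = Counter(app_gender_labels[a] for a in executed_apps
                     if a in app_gender_labels)
    if counts["M"] == counts["F"]:
        return None  # assumption: abstain on equal ratios
    return "M" if counts["M"] > counts["F"] else "F"

def majority_vote(predictions):
    """Combine classifier outputs by majority vote, ignoring abstentions."""
    votes = Counter(p for p in predictions if p is not None)
    ranked = votes.most_common(2)
    if not ranked:
        return None  # every classifier abstained
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: no decision on a tied vote
    return ranked[0][0]

# Hypothetical inputs for one user.
text_pred = classify_text(
    ["soccer", "game", "lipstick", "soccer"],
    male_words={"soccer", "game"},
    female_words={"lipstick"},
)
app_pred = classify_apps(
    ["app_a", "app_b", "app_c"],
    {"app_a": "F", "app_b": "F", "app_c": "M"},
)
accel_pred = "M"  # stand-in for the output of the acceleration-based SVM

final_pred = majority_vote([text_pred, app_pred, accel_pred])
print(text_pred, app_pred, accel_pred, final_pred)
```

In this sketch the text matching runs entirely on token lists and word sets held locally, which mirrors the privacy motivation in the abstract: only the word sets need to be sent to the device, not the user's text sent out of it.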
