On-Device Gender Prediction Framework Based on the Development of Discriminative Word and Emoticon Sets

Gender Prediction Framework on Mobile Devices Using Discriminative Word and Emoticon Sets

  • Received : 2015.08.19
  • Accepted : 2015.09.21
  • Published : 2015.11.15

Abstract

User demographic information is needed to improve the quality of personalized services such as recommendation systems. Mobile data, especially text data, is known to be effective for predicting user demographics. However, privacy concerns limit the use of mobile text data. To address this, we introduce an on-device gender prediction framework that uses mobile text data while minimizing privacy risks. Discriminative word and emoticon sets for each gender are constructed from web documents written by authors of that gender. Gender is predicted separately by comparing the discriminative word set and the discriminative emoticon set with a user's mobile text data, and an ensemble method then combines the two predictions into a final result. In experiments on real-world mobile text data, the proposed on-device framework shows promising results for gender prediction.

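User demographic information helps advance personalized services such as recommendation systems, and mobile usage data can be used to predict it. Text data in particular is known to be effective for gender prediction, but privacy issues limit the use of mobile text data. This study proposes an on-device prediction method that uses mobile text data while minimizing privacy issues and still predicting the user's gender effectively. First, web documents that reflect gender-specific characteristics are collected, and a discriminative word set and a discriminative emoticon set are constructed for each gender. The word set and the emoticon set are each compared with the user's mobile data on the device to predict gender, and the two predictions are ensembled to produce the final gender prediction. A gender prediction experiment using the mobile text data of study participants confirmed the strong performance of the proposed method.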
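The pipeline described in the abstract (build discriminative sets offline, match them against text on the device, then ensemble the two predictions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the scoring function or the ensemble rule, so the smoothed log-ratio of document frequencies and the weighted average used here are assumptions, and all function names and parameters are hypothetical.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each token, the number of documents that contain it."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))
    return counts


def build_discriminative_sets(male_docs, female_docs, top_k):
    """Return (male_set, female_set): up to top_k tokens most skewed to each gender.

    Assumption: skew is an add-one smoothed log-ratio of document frequencies.
    """
    male_df, female_df = doc_freq(male_docs), doc_freq(female_docs)
    vocab = set(male_df) | set(female_df)
    score = {
        t: math.log((male_df[t] + 1) / (len(male_docs) + 2))
        - math.log((female_df[t] + 1) / (len(female_docs) + 2))
        for t in vocab
    }
    # Ascending order: most female-skewed first, most male-skewed last.
    ranked = sorted(vocab, key=lambda t: (score[t], t))
    female_set = {t for t in ranked[:top_k] if score[t] < 0}
    male_set = {t for t in ranked[::-1][:top_k] if score[t] > 0}
    return male_set, female_set


def male_share(tokens, male_set, female_set):
    """Fraction of matched tokens found in the male set; 0.5 if nothing matches."""
    m = sum(t in male_set for t in tokens)
    f = sum(t in female_set for t in tokens)
    return 0.5 if m + f == 0 else m / (m + f)


def predict_gender(words, emoticons, word_sets, emoticon_sets, word_weight=0.5):
    """Ensemble the word-based and emoticon-based scores (assumed weighted average)."""
    w = male_share(words, *word_sets)
    e = male_share(emoticons, *emoticon_sets)
    combined = word_weight * w + (1 - word_weight) * e
    if combined == 0.5:
        return "unknown"
    return "male" if combined > 0.5 else "female"
```

In this sketch only the precomputed sets need to be shipped to the device. `predict_gender` runs locally on the user's tokenized messages, so the raw text never leaves the device, which is the privacy property the framework aims for.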

Acknowledgement

Supported by : National Research Foundation of Korea (한국연구재단)

Cited by

  1. A Two-Phase On-Device Analysis for Gender Prediction of Mobile Users Using Discriminative and Popular Wordsets vol.21, pp.1, 2016, https://doi.org/10.7838/jsebs.2016.21.1.065
  2. A Study on Method for User Gender Prediction Using Multi-Modal Smart Device Log Data vol.21, pp.1, 2016, https://doi.org/10.7838/jsebs.2016.21.1.147