Enhancing Classification Performance of Temporal Keyword Data by Using Moving Average-based Dynamic Time Warping Method

  • Do-Heon Jeong (정도헌), Department of Library and Information Science, Duksung Women's University
  • Received : 2019.11.15
  • Accepted : 2019.12.25
  • Published : 2019.12.30

Abstract

This study proposes an effective method for automatically classifying keywords that show similar trend patterns by calculating the pattern similarity of temporal data. To this end, a large collection of Web news articles was gathered, keywords were extracted, and time series data consisting of 120 time segments were built. To construct a labeled data set for evaluating the proposed model, 440 representative keywords were manually classified into 8 trend types. Building on Dynamic Time Warping (DTW), a method widely used in time series analysis, this study proposes MA-DTW, an extended model that additionally applies a Moving Average (MA) method, which captures the overall tendency of a trend curve well. In automatic classification experiments with the k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW achieved maximum micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved a maximum micro-averaged F1 score of 74.3%. Across all measures in the comprehensive evaluation, the proposed model outperformed both ED and DTW.
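The abstract names the building blocks of the pipeline (a moving average, DTW, ED as a baseline, kNN classification, and micro-averaged F1) without spelling out how they combine. The Python sketch below shows one plausible way to wire them together; it is an illustration, not the author's implementation. Because the abstract does not specify the moving-average variant, the window size, any warping-window constraint, or the kNN settings, the sketch assumes a trailing simple moving average applied to each series before an unconstrained DTW with absolute-difference local cost, plain majority voting, and a window of 3. All function names and the toy "rising"/"falling" series are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]


def moving_average(series: Series, window: int) -> List[float]:
    """Trailing simple moving average (assumed variant; the abstract does not say
    which MA the paper uses). Early points average over the values available."""
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def euclidean(a: Series, b: Series) -> float:
    """ED baseline: compares points at the same time index only."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Series, b: Series) -> float:
    """Classic O(n*m) DTW with |x - y| as local cost and no warping-window constraint."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ma_dtw(a: Series, b: Series, window: int = 3) -> float:
    """Hypothetical MA-DTW: smooth both series first, then apply DTW."""
    return dtw(moving_average(a, window), moving_average(b, window))


def knn_predict(query: Series, train: List[Tuple[Series, str]], k: int,
                dist: Callable[[Series, Series], float]) -> str:
    """Majority vote among the k nearest labelled series; ties go to the label
    whose first vote comes from the nearer neighbour."""
    neighbours = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def micro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute P, R, F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    # Toy labelled series (hypothetical; not the paper's 8 trend types or data).
    train = [
        ([1, 2, 3, 4, 5, 6, 7, 8], "rising"),
        ([1, 1, 2, 3, 5, 6, 8, 9], "rising"),
        ([8, 7, 6, 5, 4, 3, 2, 1], "falling"),
        ([9, 8, 8, 6, 5, 3, 2, 1], "falling"),
    ]
    query = [0, 1, 3, 3, 4, 6, 7, 9]
    for name, dist in [("ED", euclidean), ("DTW", dtw), ("MA-DTW", ma_dtw)]:
        print(name, knn_predict(query, train, k=3, dist=dist))
```

For single-label classification such as this, micro-averaged F1 equals accuracy, because each misclassified keyword adds exactly one false positive (for the predicted class) and one false negative (for the true class). Smoothing before warping is what separates the MA-DTW variant sketched here from plain DTW: short spikes are damped, so the alignment tends to follow the broader trend rather than local noise.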

Acknowledgement

Supported by: Duksung Women's University

This work was supported by a 2018 Duksung Women's University internal research grant (3000003047).
