Detection and Correction Method of Erroneous Data Using Quantile Pattern and LSTM

  • Received : 2018.10.22
  • Accepted : 2018.11.20
  • Published : 2018.12.31

Abstract

K-Water waterworks data is collected from various sensors and serves as the basic data for operating and analyzing various devices. The sensor data is therefore highly important, but it contains erroneous values because the sensors are exposed to the external environment. Existing cleansing methods, however, concentrate on predicting missing data, so research on detecting and correcting erroneous data remains scarce. This study detects erroneous data by converting the collected data into quintiles and patterning them. In tests on erroneous data intentionally generated from real data, the proposed method detected the erroneous data more accurately than the conventional method in all cases. In future research, we will verify the efficiency and accuracy of the proposed system in various environments.
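
The abstract does not spell out how the quintile patterns are built, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a "pattern" is a short window of consecutive quintile labels and that a window never seen in trusted data is suspect. The function names, the window length of 3, and the sample values are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def to_quintiles(values, cut_points):
    """Map each reading to a quintile label 0..4 using the given cut points."""
    return [bisect_right(cut_points, v) for v in values]


def learn_patterns(train, window=3):
    """Derive quintile cut points from trusted data and count label windows."""
    cut_points = quantiles(train, n=5)  # four cut points, five bins
    labels = to_quintiles(train, cut_points)
    patterns = Counter(
        tuple(labels[i:i + window]) for i in range(len(labels) - window + 1)
    )
    return cut_points, patterns


def flag_suspect_windows(series, cut_points, patterns, window=3):
    """Return start indices of windows whose label pattern was never seen."""
    labels = to_quintiles(series, cut_points)
    return [
        i
        for i in range(len(labels) - window + 1)
        if tuple(labels[i:i + window]) not in patterns
    ]


if __name__ == "__main__":
    # Hypothetical trusted data: a slow rise and fall, as a level sensor might show.
    train = list(range(0, 50)) + list(range(50, 0, -1))
    cut_points, patterns = learn_patterns(train)
    # A single spike (45) among low readings yields unseen patterns.
    print(flag_suspect_windows([5, 6, 45, 7, 8], cut_points, patterns))
```

Working on quintile labels rather than raw values makes the check insensitive to small fluctuations within a bin. This sketch covers detection only; correcting the flagged values, for which the title names an LSTM, is not shown.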

Fig. 1. As-Is algorithm versus proposed algorithm.

Fig. 2. Process flow of proposed algorithm.

Fig. 3. Procedure for determining the error data.

Fig. 4. Percent of erroneous data detected as erroneous data.

Fig. 5. Judgment error for normal data.

Table 1. Percentage of erroneous data detected

Table 2. Percent of erroneous data detected as erroneous data

Table 3. Number of normal data detected as erroneous data
