Taxation Analysis Using Machine Learning

Classification of Tax Account Items Using Machine Learning

  • 최동빈 (Department of Computer Science, Dankook University)
  • 조인수 (Department of Software, Dankook University)
  • 박용범 (Department of Software, Dankook University)
  • Received : 2019.06.17
  • Accepted : 2019.06.24
  • Published : 2019.06.30

Abstract

Data mining techniques can improve productivity even in the tax sector, which requires professional expertise. As tax work was computerized, large amounts of data accumulated, creating a favorable environment for data mining. In this paper, we develop a system that assists tax accountants, who already have professional expertise, by applying data mining techniques to accumulated tax-related data. The technique used is random forest, and the model was improved using the F1-score. The implemented system was trained on data accumulated over two years and showed high prediction accuracy.
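
To make the role of the F1-score concrete, the following is a minimal sketch of how per-class and macro-averaged F1 can be computed for predicted account labels. The abstract does not state which averaging scheme was used, so the macro average here is an assumption, and the label names and function names are hypothetical, invented only for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # this label was predicted, but wrongly
            fn[truth] += 1  # the true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical account labels, not taken from the paper's data.
y_true = ["supplies", "travel", "supplies", "welfare", "travel", "supplies"]
y_pred = ["supplies", "travel", "welfare", "welfare", "supplies", "supplies"]

scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Unlike plain accuracy, a macro-averaged F1 gives every account label equal weight, so a model that ignores rarely used accounts is penalized. This is one plausible reason to tune a classifier against F1 when the label distribution is skewed.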
