A Study on the Node Split in Decision Tree with Multivariate Target Variables

  • Seong-Jun Kim (Department of Industrial Systems Engineering, Kangnung National University)
  • Published : 2003.08.01

Abstract

Data mining is the process of discovering patterns useful for decision making from large amounts of data. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important tasks in data mining. Tree-based methods, known as decision trees, provide an efficient way to find a classification model. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations where multiple target variables should be taken into account, such as manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present some methods for measuring node impurity that are applicable to data sets with multivariate target variables. For illustration, a numerical example is given with discussion.
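To make the idea of node impurity with several target variables concrete, the following is a minimal sketch in Python. It is not the measure proposed in the article, which the abstract does not specify. It assumes one simple option: sum the Gini impurity of each categorical target and score a binary split by the size-weighted decrease in that sum. The function names `gini`, `multi_gini`, and `split_gain` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a single categorical target: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def multi_gini(rows):
    """Assumed multivariate impurity: the sum of per-target Gini impurities.

    rows is a list of tuples, one entry per target variable.
    """
    if not rows:
        return 0.0
    # zip(*rows) turns the list of rows into one column per target variable.
    return sum(gini(column) for column in zip(*rows))


def split_gain(parent, left, right):
    """Impurity decrease of a binary split, weighting children by their size."""
    n = len(parent)
    weighted = (len(left) / n) * multi_gini(left) + (len(right) / n) * multi_gini(right)
    return multi_gini(parent) - weighted


# Toy node with two target variables per observation (made-up values).
parent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
left = [("a", "x"), ("a", "y")]
right = [("b", "x"), ("b", "y")]
print(multi_gini(parent), split_gain(parent, left, right))
```

In this toy split the first target becomes pure in both children while the second stays evenly mixed, so only part of the parent's impurity is removed. A summed measure like this treats the targets as independent; measures that account for dependence between targets would behave differently.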
