Basic Concepts and Principles of Data Mining in Clinical Practice

Lee, Sun-Mi; Park, Rae-Woong
이선미;박래웅

  • Published: June 2009

Abstract

Many hospitals have recently adopted clinical data warehouses (CDWs) alongside electronic medical records. These hospital information systems generate very large volumes of clinical data that may be useful for further analysis. However, the electronic clinical data in a CDW are usually byproducts of clinical practice rather than products of research. They therefore contain inconsistent and sometimes erroneous information, often without the specific context of the clinical situation in which they were recorded. Data miners come from a variety of academic backgrounds, such as electronics, informatics, statistics, biomedicine, and public health. Unless the complex circumstances surrounding clinical data are well understood, investigators performing data mining in clinical fields may have difficulty assessing the information they encounter. Here, we introduce basic concepts and principles of data mining in clinical fields, including legal and ethical considerations as well as technical concerns.
