Computational Discrimination of Breast Cancer for Korean Women Based on Epidemiologic Data Only

  • Lee, Chiwon (The Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University) ;
  • Lee, Jung Chan (Department of Biomedical Engineering, Seoul National University College of Medicine) ;
  • Park, Boyoung (Graduate School of Cancer Science and Policy and National Cancer Control Institute, National Cancer Center) ;
  • Bae, Jonghee (Korea Aerospace Research Institute) ;
  • Lim, Min Hyuk (Department of Biomedical Engineering, Seoul National University College of Medicine) ;
  • Kang, Daehee (Department of Preventive Medicine, Seoul National University College of Medicine) ;
  • Yoo, Keun-Young (Department of Preventive Medicine, Seoul National University College of Medicine) ;
  • Park, Sue K. (Department of Preventive Medicine, Seoul National University College of Medicine) ;
  • Kim, Youdan (Department of Mechanical and Aerospace Engineering, Seoul National University College of Engineering) ;
  • Kim, Sungwan (Department of Biomedical Engineering, Seoul National University College of Medicine)
  • Received : 2015.01.02
  • Accepted : 2015.04.09
  • Published : 2015.08.10

Abstract

Breast cancer is the second leading cancer among Korean women, and its incidence rate has been increasing annually. If early diagnosis could be supported by epidemiologic data, women could easily assess their breast cancer risk over the internet. The National Cancer Institute in the United States has released a web-based Breast Cancer Risk Assessment Tool based on the Gail model. However, the tool is not directly applicable to Korean women because breast cancer risk depends on race, and it shows low accuracy (58%-59%). In this study, breast cancer discrimination models for Korean women were developed using only epidemiological case-control data (n = 4,574). The models were built with three classification techniques: support vector machine, artificial neural network, and Bayesian network. A 1,000-time repeated random sub-sampling validation was performed for each of the diverse parameter conditions. Performance was evaluated and compared in terms of the area under the receiver operating characteristic curve (AUC). The AUC, accuracy, sensitivity, specificity, and calculation time of all models were calculated and compared by age group and classification technique. Although the support vector machine took the longest calculation time, it achieved the highest classification performance, in women older than 50 yr (AUC = 64%). The proposed model relies on demographic characteristics, reproductive factors, and lifestyle habits, without any clinical or genetic test. The model is expected to be implementable as a web-based discrimination tool for breast cancer. Such a tool could encourage women who may be prone to breast cancer to go to the hospital for diagnostic tests.
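The validation scheme named in the abstract (repeated random sub-sampling scored by AUC) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic two-feature data, the 30% test fraction, and the simple centroid-difference scorer (a stand-in for the support vector machine, neural network, and Bayesian network models) are all assumptions made for illustration, and every function name here is hypothetical.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in classifier: project onto (case mean - control mean)."""
    dim = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in range(dim)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_subsampling_auc(X, y, repeats=1000, test_fraction=0.3, seed=0):
    """Average test AUC over `repeats` independent random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))  # assumed split size
    aucs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        y_train = [y[i] for i in train]
        y_test = [y[i] for i in test]
        if len(set(y_train)) < 2 or len(set(y_test)) < 2:
            continue  # skip splits that lack one of the classes
        scorer = fit_centroid_scorer([X[i] for i in train], y_train)
        aucs.append(auc([scorer(X[i]) for i in test], y_test))
    return mean(aucs)


def make_synthetic(n=200, seed=1):
    """Synthetic case-control data (assumption): one weakly informative feature."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(0.5 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_synthetic()
mean_auc = repeated_subsampling_auc(X, y, repeats=100)
print(f"mean AUC over random splits: {mean_auc:.3f}")
```

Averaging over many random splits reduces the dependence of the reported AUC on any single train/test partition, which is presumably why a large repeat count such as 1,000 was used in the study.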
