Cancer cluster detection using scan statistic

  • Han, Junhee (Division of Biostatistics, Pusan National University Yangsan Hospital) ;
  • Lee, Minjung (Department of Statistics, Kangwon National University)
  • Received : 2016.08.28
  • Accepted : 2016.09.21
  • Published : 2016.09.30

Abstract

In epidemiology and etiology, we are often interested in identifying areas of elevated risk, so-called hot spots or clusters. Many existing clustering methods only indicate whether any clustering pattern exists in the study area. Many recently introduced methods, however, can identify the location, size, and shape of clusters and also test whether those clusters are statistically significant. In this paper, we introduce one of the most commonly used cluster detection methods, the scan statistic, and its implementation in the freely available SaTScan software. To illustrate the use of SaTScan, we analyze female lung cancer mortality data from the SEER program of the U.S. National Cancer Institute. We aim to help researchers and practitioners who are interested in spatial cluster detection.

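In spatial or spatio-temporal data, researchers often want to find clusters, so-called hot spots, where the risk is markedly higher than in other areas. Many earlier methods answered only whether such a clustering pattern exists, whereas many recent methods report the location, shape, and size of clusters and also test whether the detected clusters are statistically significant. This paper introduces the scan statistic, one of the most widely used of these cluster detection methods, presents results obtained with SaTScan, the free software that implements it, and discusses its advantages and disadvantages. As example data, we use 2006 female lung cancer deaths from the county-level U.S. cancer mortality data provided by the SEER program of the U.S. National Cancer Institute, present the clusters detected with the scan statistic, and aim to help researchers who wish to carry out similar studies.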
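To make the idea of a scan statistic concrete, the following is a minimal sketch of a circular spatial scan under a Poisson model, in the spirit of Kulldorff (1997). It is not the SaTScan implementation and does not reproduce the paper's analysis. The function names, the toy coordinates, populations, and case counts, and the 50% maximum window size are all assumptions made for illustration. Every region is assumed to have a positive population, and the Monte Carlo significance test that SaTScan performs is omitted.

```python
import math

def poisson_llr(cases, expected, total_cases):
    """Log-likelihood ratio of one candidate zone (high-rate clusters only)."""
    if cases <= expected:
        return 0.0  # no excess risk inside the zone
    inside = cases * math.log(cases / expected)
    rest = total_cases - cases
    # 0 * log(0) is taken as 0 when every case falls inside the zone.
    outside = rest * math.log(rest / (total_cases - expected)) if rest > 0 else 0.0
    return inside + outside

def most_likely_cluster(regions, max_pop_fraction=0.5):
    """regions: list of (x, y, population, cases) tuples (hypothetical layout).

    Grows a circular window around each region centroid and returns
    (best_llr, sorted indices of the regions in the best zone).
    """
    total_pop = sum(r[2] for r in regions)
    total_cases = sum(r[3] for r in regions)
    best_llr, best_zone = 0.0, []
    for center in regions:
        # Regions ordered by distance from the current window center.
        order = sorted(range(len(regions)),
                       key=lambda j: math.dist(center[:2], regions[j][:2]))
        pop = cases = 0
        for k, j in enumerate(order):
            pop += regions[j][2]
            cases += regions[j][3]
            if pop > max_pop_fraction * total_pop:
                break  # window too large
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best_llr:
                best_llr, best_zone = llr, sorted(order[:k + 1])
    return best_llr, best_zone

# Toy data (invented): regions 0 and 1 carry an excess of cases.
regions = [
    (0, 0, 1000, 30), (1, 0, 1000, 28), (0, 1, 1000, 10),
    (5, 5, 1000, 10), (6, 5, 1000, 9), (5, 6, 1000, 11),
    (10, 0, 1000, 10), (10, 1, 1000, 12),
]
llr, zone = most_likely_cluster(regions)
print(zone, round(llr, 2))
```

In practice, the significance of the most likely cluster would be assessed by recomputing the maximum log-likelihood ratio on many data sets simulated under the null hypothesis, a step this sketch leaves out.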
