Comparison and analysis of multiple testing methods for microarray gene expression data

  • Seo, Sumin (Department of Information and Statistics, Duksung Women's University) ;
  • Kim, Tae Houn (Department of PrePharmMed, Duksung Women's University) ;
  • Kim, Jaehee (Department of Information and Statistics, Duksung Women's University)
  • Received : 2014.06.03
  • Accepted : 2014.07.11
  • Published : 2014.09.30

Abstract

When thousands of hypotheses are tested simultaneously, the probability of rejecting at least one true null hypothesis increases, creating a severe multiplicity problem. To address it, researchers have proposed multiple testing methods that consider an error measure such as the family-wise error rate (FWER), the false discovery rate (FDR) or the false nondiscovery rate (FNR), combined with various test statistics, with the aim of increasing power. In this article, we discuss the Bonferroni (1960), Holm (1979), Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001) procedures applied to P-values based on T statistics, modified T statistics or local-pooled-error (LPE) statistics. We also consider the Sun and Cai (2007) procedure based on Z statistics. These procedures are compared in a simulation study and applied to Arabidopsis microarray gene expression data to identify differentially expressed genes.
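As a rough illustration of the four P-value based procedures named in the abstract, the following sketch computes adjusted P-values for Bonferroni, Holm, Benjamini-Hochberg and Benjamini-Yekutieli. It is a minimal textbook-style sketch and not the authors' code: the function names and the toy P-values are assumptions made for illustration, and the Sun and Cai procedure and the LPE statistic are not covered.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Bonferroni adjusted p-values: min(1, m * p)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values (FWER control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def _step_up(pvals: Sequence[float], scale: float) -> List[float]:
    """Shared step-up rule: scale * m * p_(k) / k, made monotone from the top."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[rank]
        running_min = min(running_min, scale * m * pvals[i] / (rank + 1))
        adjusted[i] = running_min
    return adjusted


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR control)."""
    return _step_up(pvals, 1.0)


def benjamini_yekutieli(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Yekutieli adjusted p-values (FDR control under dependency)."""
    m = len(pvals)
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic correction factor
    return _step_up(pvals, c_m)


if __name__ == "__main__":
    toy_pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values, not from the paper
    alpha = 0.05
    for name, method in [("Bonferroni", bonferroni), ("Holm", holm),
                         ("BH", benjamini_hochberg), ("BY", benjamini_yekutieli)]:
        adj = method(toy_pvals)
        rejected = [i for i, p in enumerate(adj) if p <= alpha]
        print(name, [round(p, 4) for p in adj], "rejected indices:", rejected)
```

A hypothesis is rejected at level alpha when its adjusted P-value is at most alpha. On any input the FWER procedures (Bonferroni, Holm) reject no more hypotheses than Benjamini-Hochberg, which is the usual trade-off between FWER and FDR control.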

References

  1. Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D. and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 96, 6745-6750. https://doi.org/10.1073/pnas.96.12.6745
  2. Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57, 289-300.
  3. Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165-1188. https://doi.org/10.1214/aos/1013699998
  4. Boldrick, J. C., Alizadeh, A. A., Diehn, M., Dudoit, S., Liu, C. L., Belcher, C. E., Botstein, D., Staudt, L. M., Brown, P. O. and Relman, D. A. (2002). Stereotyped and specific gene expression programs in human innate immune responses to bacteria. Proceedings of the National Academy of Sciences of the United States of America, 99, 972-977. https://doi.org/10.1073/pnas.231625398
  5. Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statistical Science, 18, 71-103. https://doi.org/10.1214/ss/1056397487
  6. Efron, B. (2010). Large-scale inference, Cambridge University Press, London.
  7. Efron, B., Tibshirani, R., Storey, J. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96, 1151-1160. https://doi.org/10.1198/016214501753382129
  8. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531-537. https://doi.org/10.1126/science.286.5439.531
  9. Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.
  10. Jain, N., Thatte, J., Braciale, T., Ley, K., O'Connell, M. and Lee, J. K. (2003). Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics, 19, 1945-1951. https://doi.org/10.1093/bioinformatics/btg264
  11. Jang, W. (2013). Multiple testing and its applications in high-dimension. Journal of the Korean Data and Information Science Society, 24, 1063-1076. https://doi.org/10.7465/jkdi.2013.24.5.1063
  12. Kim, T. H., Hauser, F., Ha, T., Xue, S., Bohmer, M., Nishimura, N., Munemasa, S., Hubbard, K., Peine, N., Lee, B. H., Lee, S., Robert, N., Parker, J. E. and Schroeder, J. I. (2011). Chemical genetics reveals negative regulation of abscisic acid signaling by a plant immune response pathway. Current Biology, 21, 990-997. https://doi.org/10.1016/j.cub.2011.04.045
  13. Kim, T. H., Kunz, H. H., Bhattacharjee, S., Hauser, F., Park, J., Engineer, C., Liu, A., Ha, T., Parker, J. E., Gassmann, W. and Schroeder, J. I. (2012). Natural variation in small molecule-induced TIR-NB-LRR signaling induces root growth arrest via EDS1- and PAD4-complexed R protein VICTR in Arabidopsis. Plant Cell, 24, 5177-5192. https://doi.org/10.1105/tpc.112.107235
  14. Sun, W. and Cai, T. T. (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102, 901-912. https://doi.org/10.1198/016214507000000545
  15. Tusher, V. G., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, 98, 5116-5121. https://doi.org/10.1073/pnas.091062498
  16. Xuan, W., Murphy, E., Beeckman, T., Audenaert, D. and Smet, I. D. (2013). Synthetic molecules: Helping to unravel plant signal transduction. Journal of Chemical Biology, 6, 43-50. https://doi.org/10.1007/s12154-013-0091-8

Cited by

  1. Validation of diacylglycerol O-acyltransferase1 gene effect on milk yield using Bayesian regression vol.26, pp.6, 2015, https://doi.org/10.7465/jkdi.2015.26.6.1249
  2. Estimation of Gini-Simpson index for SNP data vol.28, pp.6, 2017, https://doi.org/10.7465/jkdi.2017.28.6.1557