An Analysis of Ortholog Clusters Detected from Multiple Genomes

  • Published : 2008.04.15

Abstract

Predicting orthologs is very useful for annotating new genomes and for research on genome evolution. In previous work, we proposed a method that automatically constructs ortholog clusters (OCs) from multiple complete genomes. That method extends InParanoid, which produces orthologs from just two genomes, to multiple species while yielding results of the same quality. To predict the function of a newly sequenced gene more accurately, however, it can be important to keep unwanted paralogs out of the OCs. In this paper, we investigate how well score cut-offs can be used to construct functionally purer OCs. Our OCs were generated from the datasets of 20 prokaryotes. Compared against both COG (Clusters of Orthologous Groups) and KO (KEGG Orthology), our OCs show a similarity of about 90%, which tends to increase as the score cut-off is raised.
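
The idea of purifying a cluster with a score cut-off and then comparing it against a reference grouping can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state how member scores are defined or which similarity measure was used, so the sketch uses hypothetical per-gene scores in the range 0 to 1, the Jaccard index as a stand-in similarity, and made-up gene names.

```python
from typing import Dict, Set


def apply_cutoff(cluster: Dict[str, float], cutoff: float) -> Set[str]:
    """Keep only the members whose score is at or above the cut-off."""
    return {gene for gene, score in cluster.items() if score >= cutoff}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two gene sets (an assumed stand-in, not the paper's measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy data: gene -> membership score within one ortholog cluster.
oc = {"geneA": 1.0, "geneB": 0.9, "geneC": 0.4, "geneD": 0.2}

# Hypothetical reference group, e.g. the corresponding COG or KO entry.
reference = {"geneA", "geneB", "geneC"}

for cutoff in (0.0, 0.3):
    kept = apply_cutoff(oc, cutoff)
    print(cutoff, sorted(kept), round(jaccard(kept, reference), 2))
```

In this toy case the low-scoring member geneD plays the role of an unwanted paralog: raising the cut-off removes it and brings the cluster closer to the reference group.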
