miRNA Pattern Discovery from Sequence Alignment

  • Sun, Xiaohan (School of Computer Science and Technology, Xidian University)
  • Zhang, Junying (School of Computer Science and Technology, Xidian University)
  • Received : 2017.01.06
  • Accepted : 2017.06.21
  • Published : 2017.12.31

Abstract

MiRNAs are short biological sequences that play a crucial role in almost all important biological processes. MiRNA patterns are sequence segments common to multiple mature miRNA sequences, and they are significant for identifying miRNAs because of their functional implications. In the proposed approach, primary miRNA patterns are produced from sequence alignment and then cut into short segment miRNA patterns. From the segment miRNA patterns, candidate miRNA patterns are selected based on estimated probability, and from these candidates, potential miRNA patterns are further selected according to classification performance between authentic and artificial miRNA sequences. Three parameter settings are suggested: bi-nucleotides are used to compute the estimated probability of segment miRNA patterns, and the top 1% of segment miRNA patterns of length four, ranked by estimated probability, are selected as potential miRNA patterns.
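The segment-and-rank step can be illustrated with a minimal Python sketch. The abstract does not give the formula for the estimated probability, so this sketch assumes a first-order Markov model fitted to bi-nucleotide counts. It also skips the sequence alignment and classification stages and cuts segments directly from the input sequences. All function names (`segments`, `fit_binucleotide_model`, `estimated_probability`, `top_segments`) are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import ceil


def segments(seq, k=4):
    """Return all contiguous length-k segments of one sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fit_binucleotide_model(seqs):
    """Estimate nucleotide frequencies and bi-nucleotide transition
    probabilities (ASSUMPTION: first-order Markov model)."""
    mono = Counter()    # single-nucleotide counts
    di = Counter()      # bi-nucleotide counts
    first = Counter()   # counts of nucleotides that have a successor
    for s in seqs:
        mono.update(s)
        for a, b in zip(s, s[1:]):
            di[a + b] += 1
            first[a] += 1
    total = sum(mono.values())
    p_mono = {n: c / total for n, c in mono.items()}
    p_trans = {ab: c / first[ab[0]] for ab, c in di.items()}
    return p_mono, p_trans


def estimated_probability(segment, p_mono, p_trans):
    """P(s) = P(s1) * product of P(s[i+1] | s[i]) under the assumed model."""
    p = p_mono.get(segment[0], 0.0)
    for a, b in zip(segment, segment[1:]):
        p *= p_trans.get(a + b, 0.0)
    return p


def top_segments(seqs, k=4, fraction=0.01):
    """Rank distinct length-k segments by estimated probability and keep
    the top fraction (at least one segment when any exist)."""
    p_mono, p_trans = fit_binucleotide_model(seqs)
    distinct = {seg for s in seqs for seg in segments(s, k)}
    ranked = sorted(
        distinct,
        key=lambda seg: (-estimated_probability(seg, p_mono, p_trans), seg),
    )
    keep = max(1, ceil(fraction * len(ranked)))
    return ranked[:keep]
```

Calling `top_segments(mature_sequences)` with the default `k=4` and `fraction=0.01` mirrors the length-four, top-1% setting named in the abstract. Whether the paper ranks by highest probability, as done here, is an assumption.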
