پیدا کردن موتیف در نواحی بالادست ژن های هم بیان بر اساس الگوریتم بهینه سازی فاخته و سرمایش تدریجی

نویسندگان

دانشگاه صنعتی امیرکبیر

چکیده

چکیده: در این مقاله برای حل مسئله­ کشف موتیف یک روش ترکیبی جدید بر اساس الگوریتم بهینه­سازی فاخته، روش سرمایش تدریجی و بیشینه­سازی زمان انتظار به­نام SA-COAMF ارائه می­شود. این روش ترکیبی در همگرایی بهینه سراسری بسیار کارآمد است. یکی دیگر از ویـژگـی­­هـای شـاخـص ایـن الـگوریتـم، بهـره بـردن از هـر دو مـدل نمـایـش مـوتیـف (تـوالـی اجمـاع و مـاتـریـس احتمـالاتی) اسـت. عملکرد الگوریتم پیشنهادی بر روی یـک مجموعه از داده­هـای زیستی (پـایگـاه داده SCPD) تسـت شـده و بـا تعـدادی از الگوریتم­هـای معـروف کشف مـوتیـف (GA-DPAF، PSO+ و MEME) مقایسه می­گردد. نتایج به­دست­آمده نشان­دهنده توانایی بالای الگوریتم پیشنهادی است.

کلیدواژه‌ها


عنوان مقاله [English]

Motif Finding in upstream regulatory regions of co-expressed genes using Cuckoo optimization Algorithm and Simulated Annealing

چکیده [English]

Abstract: In this paper, a novel hybrid algorithm is represented named SA-COAMF by using cuckoo optimization algorithm, simulated annealing and expected maximization for motif finding problem. SA-COAMF is very efficient in global optimal convergence. In the algorithm, two models of motif representation, consensus and probability matrix representations, are applied to take the advantage of them. SA-COAMF is run on experimental datasets (SCPD database). The results are compared with some well-known algorithms (GA-DPAF, PSO+ and MEME) to show that our algorithm is efficient.

کلیدواژه‌ها [English]

  • Keywords: Cuckoo optimization algorithm
  • simulated annealing algorithm
  • coexpressed genes
  • motif finding
  • expected maximization
[1] B. Brejová, C. DiMarco, T. Vinar, S. R. Hidalgo, G. Holguin, and C. Patten, Finding patterns in biological sequences, Unpublished project report for CS798G, University of Waterloo, Fall, 2000.
[3] L. Shao, Y. Chen and A. Abraham, "Motif discovery using evolutionary algorithms," in Soft Computing and Pattern Recognition, (SOCPAR'09), pp. 420-425, 2009.
[4] C. Reyes-Rico, "Finding DNA motifs using genetic algorithms," in Fifth Mexican International Conference on Artificial Intelligence, pp. 331-339, 2006.
[5] C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald and J. C. Wootton, "Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment," Science, vol. 262, no. 5131, pp. 208-214, 1993.
[6] J. van Helden, B. André and J. Collado-Vides, "Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies," Journal of molecular biology, vol. 281, no. 4, pp. 827-842, 1998.
[7] M. Tompa, "An exact method for finding short motifs in sequences, with application to the ribosome binding site problem," ISMB, pp. 262-271, 1999.
[8] T. F. Smith and M. S. Waterman, "Identification of common molecular subsequences," Journal of molecular biology, vol. 147, no. 1, pp. 195-197, 1981.
[9] J. Kevin Lanctot, M. Li, B. Ma, S. Wang and L. Zhang, "Distinguishing string selection problems," Information and Computation, vol. 185, no. 1, pp. 41-55, 2003.
[10] G. Pavesi, G. Mauri and G. Pesole, "An algorithm for finding signals of unknown length in DNA sequences," Bioinformatics, vol. 17, no. 1, pp. S207-S214, 2001.
[11] S. Sinha and M. Tompa, "YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation," Nucleic acids research, vol. 31, no. 13, pp. 3586-3588, 2003.
[12] U. Keich and P. A. Pevzner, "Finding motifs in the twilight zone," in Proceedings ofthe sixth annual international conference on Computational biology, pp. 195-204, 2002.
[13] J. Buhler and M. Tompa, "Finding motifs using random projections," Journal of computational biology, vol. 9, no. 2, pp. 225-242, 2002.
[14] C. E. Lawrence and A. A. Reilly, "An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences," Proteins: Structure, Function, and Bioinformatics, vol. 7, no. 1, pp. 41-51, 1990.
[15] T. L. Bailey and C. Elkan, "Unsupervised learning of multiple motifs in biopolymers using expectation maximization," Machine learning, vol. 21, no. 1, pp. 51-80, 1995.
[16] J. D. Hughes, P. W. Estep, S. Tavazoie and G. M. Church, "Computational identification of Cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae," Journal of Molecular Biology, vol. 296, no. 5, pp. 1205-1214, 2000.
[17] X. Liu, D. L. Brutlag and J. S. Liu, "BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes," in Pacific symposium on biocomputing, 2001, pp. 127-138.
[18] L. Shao and Y. Chen, "Bacterial foraging optimization algorithm integrating tabu search for motif discovery," in Bioinformatics and Biomedicine, pp. 415-418, 2009.
[19] D. L. González-Álvarez, M. A. Vega-Rodríguez, J. A. Gómez-Pulido and J. M. Sánchez-Pérez, "Finding motifs in DNA sequences applying a multiobjective artificial bee colony (MOABC) algorithm," in Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, pp. 89-100, 2011.
[20] S. Bouamama, A. Boukerram and A. F. Al-Badarneh, "Motif finding using ant colony optimization," in Swarm Intelligence, pp. 464-471, 2010.
[21] C. Lei and J. Ruan, "A particle swarm optimization-based algorithm for finding gapped motifs," BioData mining, vol. 3, no. 9, pp. 1-12, 2010.
[22] F. Zare-Mirakabad, H. Ahrabian, M. Sadeghi, J. Mohammadzadeh, S. Hashemifar, A. Nowzari-Dalini, et al., "PSOMF: An algorithm for pattern discovery using PSO," in Proceedings of the Third IAPR International Conferences on Pattern Recognition in Bioinformatics, pp. 61-72, 2008.
[23] J. M. Keith, P. Adams, D. Bryant, D. P. Kroese, K. R. Mitchelson, D. A. Cochran, et al., "A simulated annealing algorithm for finding consensus sequences," Bioinformatics, vol. 18, no. 11, pp. 1494-1499, 2002.
[24] T. L. Bailey and C. Elkan, "The value of prior knowledge in discovering motifs with MEME," ISMB, pp. 21-29, 1995.
[25] F. Zare-Mirakabad, H. Ahrabian, M. Sadeghi, S. Hashemifar, A. Nowzari-Dalini and B. Goliaei, "Genetic algorithm for dyad pattern finding in DNA sequences," Genes & Genetic Systems, vol. 84, no. 1, pp. 81-93, 2009.
[26] W. Liu, H. Chen and L. Chen, "An ant colony optimization based algorithm for identifying gene regulatory elements," Computers in Biology and Medicine, vol. 43, no. 7, pp. 922-932, 2013.
[27] R. Rajabioun, "Cuckoo optimization algorithm," Applied Soft Computing, vol. 11, no. 8, pp. 5508-5518, 2011.
[28] M. Sarkar, B. Yegnanarayana and D. Khemani, "A clustering algorithm using an evolutionary programming-based approach," Pattern Recognition Letters, vol. 18, no. 10, pp. 975-986, 1997.
[29] J. Zhu and M. Q. Zhang, "SCPD: a promoter database of the yeast Saccharomyces cerevisiae," Bioinformatics, vol. 15, no. 7-8, pp. 607-611, 1999.
[30] S. Sinha and M. Tompa, "Performance comparison of algorithms for finding transcription factor binding sites," in Bioinformatics and Bioengineering, pp. 214-220, 2003.