Genomic Signals Compression by Compressed Sensing and Its Application in Sequences Comparison

Authors

Electrical Engineering Department, Faculty of Engineering, University of Zanjan, Zanjan, Iran

Abstract

The analysis of gene sequences is fundamentally important for exploring biological functions. The cost of gene sequencing has recently dropped sharply, resulting in the production of very large volumes of genomic data. The costs of storing, processing, and transferring these data, however, are rising. At present, this massive volume of information is processed with character-based methods, which are highly time-consuming. Signal processing offers alternative methods that address these problems. Accordingly, the signal-based view of the genome, genomic signal processing, and genome compression are active research topics with practical demand. Compression reduces the cost, the memory space, the bandwidth needed for exchange, and the time required for analysis.
In this study, gene sequences in character form were first represented as numerical signals. These genomic signals were then compressed by compressed sensing and subsequently reconstructed with a Bayesian learning method. Reconstruction quality was assessed using two criteria: the percentage root-mean-square difference (PRD) and the normalized mean square error (NMSE). Signals compressed at a rate of 75% were then selected for sequence comparison. The same cluster analysis was also run with the character-based method. The results indicated that the signal-based method required considerably less time than the character-based method.
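The pipeline described above can be illustrated with a minimal sketch. Several details are assumptions, because the abstract does not specify them: the nucleotide-to-number mapping (`NUCLEOTIDE_VALUES`), the random Gaussian sensing matrix, the reading of a 75% compression rate as keeping M = N/4 measurements, and the common definitions PRD = 100 · ‖x − x̂‖ / ‖x‖ and NMSE = ‖x − x̂‖² / ‖x‖². The reconstruction step here is a plain minimum-norm (pseudoinverse) solution used only as a placeholder; it is not the Bayesian learning method used in the study.

```python
import numpy as np

# Hypothetical nucleotide-to-number mapping (assumption; not taken from the paper).
NUCLEOTIDE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode_sequence(seq):
    """Map a DNA string to a 1-D numerical signal."""
    return np.array([NUCLEOTIDE_VALUES[base] for base in seq.upper()], dtype=float)


def compress(signal, compression_rate, rng):
    """Take M = N * (1 - compression_rate) random Gaussian measurements y = Phi x."""
    n = signal.size
    m = max(1, int(round(n * (1.0 - compression_rate))))
    phi = rng.standard_normal((m, n)) / np.sqrt(m)  # assumed sensing matrix
    return phi, phi @ signal


def prd(original, reconstructed):
    """Percentage root-mean-square difference (assumed definition)."""
    return 100.0 * np.linalg.norm(original - reconstructed) / np.linalg.norm(original)


def nmse(original, reconstructed):
    """Normalized mean square error (assumed definition)."""
    return np.linalg.norm(original - reconstructed) ** 2 / np.linalg.norm(original) ** 2


rng = np.random.default_rng(0)
x = encode_sequence("ACGTTGCAACGTAGCT" * 8)  # toy sequence, 128 samples
phi, y = compress(x, 0.75, rng)              # 75% compression: 32 measurements

# Placeholder reconstruction: minimum-norm solution of phi @ x_hat = y.
# This stands in for the Bayesian learning step and is NOT the paper's method.
x_hat = np.linalg.pinv(phi) @ y

print("measurements:", y.size, "of", x.size)
print("PRD (%):", prd(x, x_hat))
print("NMSE:", nmse(x, x_hat))
```

With the placeholder reconstruction the error values will be large; a sparse Bayesian learning solver would replace the `np.linalg.pinv` line, while the measurement step and the two quality criteria would stay the same.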

Keywords

