Graph-based Clustering using the Wilcoxon Test to Extract the Biological Communication of Cells and Tissues

Document Type: Original Article

Authors

1 Department of Computer Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran

2 Department of Computer Engineering, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran

3 Department of Electrical Engineering, Yasooj Branch, Islamic Azad University, Yasooj, Iran

Abstract

Graph-based clustering is an applied method for detecting relationships between nodes in complex networks, and it has attracted considerable attention. Recognizing different communities in large-scale data is a challenging task, but by understanding how the elements of a community (cluster) behave in relation to one another, we can predict the general characteristics of the clusters. Graph-based clustering techniques have played an important role in clustering gene expression data because of their ability to represent the relationships within the data. To detect the genes that are effective in the development of diseases, it is necessary to identify the relationships between cells or tissues. The interaction between different cells or tissues can be shown through the genes that are expressed differently between them. In this research, the problem of cell-to-cell and tissue-to-cell communication is expressed as a graph, and the communications are extracted by recognizing relationships in that graph. The FANTOM5 database is used to simulate and calculate the similarity between cells and tissues. After the data are preprocessed and normalized, they are converted to a graph by examining the expression of each gene in the different cells and tissues; the communications are then identified by clustering, taking into account a threshold and the Wilcoxon test.
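
The pipeline summarized above (expression profiles, a Wilcoxon test with a threshold, a graph, then clusters) can be illustrated with a minimal sketch. The abstract does not state which Wilcoxon variant is used, how the threshold is applied, or which clustering algorithm is run, so everything here is an assumption made for illustration: the paired signed-rank test with a normal approximation and no tie correction of the variance, the cutoff `alpha = 0.05`, the rule that two samples are linked when the test finds no significant difference, connected components as the clustering step, and the toy sample names and values.

```python
import math
from itertools import combinations


def wilcoxon_signed_rank_p(x, y):
    """Two-sided p-value of the Wilcoxon signed-rank test (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 1.0  # identical profiles: no evidence of a difference
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction (assumption)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def build_graph(expression, alpha=0.05):
    """Link two samples when the test finds no significant difference (assumed rule)."""
    graph = {name: set() for name in expression}
    for a, b in combinations(expression, 2):
        if wilcoxon_signed_rank_p(expression[a], expression[b]) >= alpha:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Treat each connected component of the graph as one cluster."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical normalized expression of the same 10 genes in 4 samples.
expression = {
    "cell_a": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "cell_b": [11, 19, 32, 38, 53, 57, 74, 76, 95, 95],
    "tissue_c": [15, 27, 39, 51, 63, 75, 87, 99, 111, 123],
    "tissue_d": [16, 26, 41, 49, 66, 72, 91, 95, 116, 118],
}

clusters = connected_components(build_graph(expression))
print(clusters)
```

In this toy data, `cell_a` and `cell_b` differ only by small symmetric fluctuations, as do `tissue_c` and `tissue_d`, while every gene is higher in the tissue samples than in the cell samples, so the sketch yields two clusters. For real data, a maintained implementation such as `scipy.stats.wilcoxon` would be preferable to the hand-written test.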
