A Model for Detection of Brain Cancer Subtypes Based on Deep Random Forest and Augmented Features Using Genomic Data

Document Type: Original Article

Authors

Electrical and Computer Engineering Department, Nooshirvani University of Technology, Babol, Iran

Abstract

Diagnosing the type of a cancer, known as its subtype, is very important in determining the course of treatment. This paper focuses on the diagnosis of four subtypes of brain cancer. Disease subtype diagnosis can be modeled as a classification problem. Owing to significant progress in bioinformatics in extracting genetic information from the human body, this information has recently been widely used to represent patients in machine learning. In this paper, three types of genetic information are used: mRNA, miRNA, and DNA methylation.
Combining different information sources as multimodal data, rather than using a single source, increases classification accuracy. To extract more useful features from the original genetic data, an auto-encoder is used, and the features it extracts are concatenated with the original genetic data.
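As a minimal sketch of this feature augmentation, the Python example below concatenates three omics blocks, trains a small auto-encoder on the result, and appends the learned hidden features to the original ones. The function names (`train_autoencoder`, `augment_features`), the single tanh hidden layer, the hidden size, the training settings, and the choice to train one auto-encoder on the concatenated data are all assumptions made for illustration; the abstract does not specify the architecture used in the paper.

```python
import numpy as np


def train_autoencoder(X, n_hidden=8, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer auto-encoder by gradient descent on the MSE.

    Hypothetical helper: architecture and hyperparameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W_enc = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b_dec = np.zeros(n_features)
    for _ in range(epochs):
        hidden = np.tanh(X @ W_enc + b_enc)        # encoder
        recon = hidden @ W_dec + b_dec             # linear decoder
        grad_recon = 2.0 * (recon - X) / X.size    # gradient of the MSE
        grad_hidden = (grad_recon @ W_dec.T) * (1.0 - hidden ** 2)
        W_dec -= lr * hidden.T @ grad_recon
        b_dec -= lr * grad_recon.sum(axis=0)
        W_enc -= lr * X.T @ grad_hidden
        b_enc -= lr * grad_hidden.sum(axis=0)
    return W_enc, b_enc


def augment_features(omics_blocks, n_hidden=8):
    """Concatenate omics blocks and append auto-encoder features to them."""
    X = np.hstack(omics_blocks)                        # multimodal concatenation
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize each feature
    W_enc, b_enc = train_autoencoder(X, n_hidden=n_hidden)
    learned = np.tanh(X @ W_enc + b_enc)               # extracted features
    return np.hstack([X, learned])                     # original + extracted


# Synthetic stand-ins for mRNA, miRNA, and DNA methylation matrices
rng = np.random.default_rng(42)
mrna = rng.normal(size=(20, 30))
mirna = rng.normal(size=(20, 10))
methylation = rng.normal(size=(20, 15))
augmented = augment_features([mrna, mirna, methylation])
print(augmented.shape)
```

Here the 55 original features are kept and 8 learned features are appended, so each patient is represented by 63 values. The data are random placeholders, not the genomic data used in the paper.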
Random forest has performed well as a classifier of patients based on genetic information. Following the spread of deep methods in neural networks and their good performance, a deep version of random forest with a layered structure has been proposed. In addition to performing well in classification, the deep random forest has the advantage of a limited number of parameters and lower computational complexity. In this paper, deep random forest is used to determine the subtype of a specific type of brain cancer. The experimental results show that the proposed method performs well.
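The layered structure of a deep random forest can be sketched in Python as a cascade in which each layer's class-probability outputs are appended to the input features of the next layer. This is only an illustrative sketch using scikit-learn, not the implementation evaluated in the paper: the function names (`fit_cascade`, `predict_cascade`), the fixed number of layers, the pairing of one random forest with one extra-trees forest per layer, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def fit_cascade(X, y, n_layers=2, n_trees=50, seed=0):
    """Train a fixed-depth cascade of forests (hypothetical helper)."""
    layers, feats = [], X
    for depth in range(n_layers):
        forests = [
            RandomForestClassifier(n_estimators=n_trees, random_state=seed + depth),
            ExtraTreesClassifier(n_estimators=n_trees, random_state=seed + depth),
        ]
        # Out-of-fold class probabilities become extra features for the next
        # layer, which limits leakage of the training labels.
        probas = [
            cross_val_predict(f, feats, y, cv=3, method="predict_proba")
            for f in forests
        ]
        for forest in forests:
            forest.fit(feats, y)
        layers.append(forests)
        feats = np.hstack([X] + probas)
    return layers


def predict_cascade(layers, X):
    """Pass samples through every layer and average the last layer's outputs."""
    feats = X
    for forests in layers:
        probas = [forest.predict_proba(feats) for forest in forests]
        feats = np.hstack([X] + probas)
    mean_proba = np.mean(probas, axis=0)
    classes = layers[-1][0].classes_
    return classes[np.argmax(mean_proba, axis=1)]


# Synthetic four-class data standing in for the four subtypes
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 20)
samples = rng.normal(size=(80, 10))
samples[:, :4] += 4.0 * np.eye(4)[labels]   # shift one feature per class
cascade = fit_cascade(samples, labels)
predicted = predict_cascade(cascade, samples)
print((predicted == labels).mean())          # accuracy on the training data
```

The sketch fixes the number of layers for brevity; a fuller implementation would typically add layers until accuracy on held-out data stops improving. The printed value is training accuracy on synthetic data and says nothing about performance on real genomic data.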
