A New Solution for Workload Change Detection in Self-Tuning NoSQL Database

Document Type: Original Article

Authors

1 Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

2 Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran

Abstract

Database management systems are a core part of information systems, and their size and complexity have increased significantly in recent years. As these systems grow larger and more complicated, database administrators face more problems and challenges, so managing them is time-consuming and costly. Moreover, a major part of the total cost of ownership is the cost of expert database administrators (DBAs) who can manage these large and complicated systems. Autonomic databases reduce the total cost of ownership of a database system by providing self-management functionality. Self-management decisions, such as automated database schema tuning, depend on the database workload. One of the important issues in realizing automated database tuning is monitoring and analyzing the workload to detect changes and re-tune the schema in response to them. This paper presents a feedback control loop for continuous monitoring and lightweight analysis of the workload in a column-oriented NoSQL database. The loop describes a design pattern for the self-tuning feature and is used to detect workload changes that require automated re-tuning of the database schema. The experimental results show the effectiveness of the proposed solution for workload change detection.
