Distributed Job Scheduling in On-Demand GPU-as-a-Service Systems

Document Type: Original Article

Authors

Faculty of Electrical Engineering, Sahand University of Technology, Tabriz, Iran.

Abstract

Optimal scheduling of resources is essential on GPU-based servers, which are well suited to parallel workloads. These resources are typically fast and therefore expensive. To use them efficiently, a service provider must choose, for each request, the best virtual machine type, the best GPU type, and the best number of GPUs of that type; this selection is naturally an optimization problem. This article models the resource allocation problem as a linear optimization problem and presents a new method for distributing requests. The proposed method places incoming requests in a central queue and then spreads them over several local queues using a new distribution policy; the tasks in each local queue are then scheduled and executed in parallel. For each request, the scheduler of a local queue determines (1) the best virtual machine type, (2) the best GPU type, and (3) the best number of GPUs. A comparison with state-of-the-art methods shows that the proposed method reduces execution time, reduces response time, and significantly reduces the cost of the resources used.
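
The following Python sketch is only a rough illustration of the workflow described above, not the article's algorithm: the VM/GPU catalog, prices, speed-up factors, and request data are invented for the example, round-robin stands in for the article's new distribution policy, and a greedy enumeration stands in for the linear optimization model.

# Minimal sketch (illustrative assumptions throughout): a central queue feeds
# several local queues, and each local queue picks, per request, the cheapest
# (VM type, GPU type, GPU count) combination that still meets the deadline.

from dataclasses import dataclass
from itertools import cycle

@dataclass
class Request:
    name: str
    base_hours: float       # estimated run time on one reference GPU
    deadline_hours: float

# Hypothetical catalog: (VM type, GPU type, max GPUs, price per GPU-hour).
CATALOG = [
    ("vm.small",  "K80",  2, 0.90),
    ("vm.medium", "P100", 4, 1.46),
    ("vm.large",  "V100", 8, 2.48),
]

# Assumed speed-up of each GPU type relative to the reference GPU.
GPU_SPEEDUP = {"K80": 1.0, "P100": 2.5, "V100": 4.0}

def best_assignment(req):
    """Return (vm, gpu, count, cost) minimizing cost while meeting the deadline."""
    best = None
    for vm, gpu, max_gpus, price in CATALOG:
        for n in range(1, max_gpus + 1):
            # Assumed linear scaling across GPUs of the same type.
            run_hours = req.base_hours / (GPU_SPEEDUP[gpu] * n)
            if run_hours > req.deadline_hours:
                continue
            cost = run_hours * price * n
            if best is None or cost < best[3]:
                best = (vm, gpu, n, cost)
    return best

def distribute(central_queue, n_local=3):
    """Spread the central queue over local queues (round-robin placeholder)."""
    local_queues = [[] for _ in range(n_local)]
    targets = cycle(range(n_local))
    for req in central_queue:
        local_queues[next(targets)].append(req)
    return local_queues

if __name__ == "__main__":
    central = [
        Request("train-resnet",   base_hours=12.0, deadline_hours=4.0),
        Request("train-lstm",     base_hours=6.0,  deadline_hours=6.0),
        Request("fine-tune-bert", base_hours=20.0, deadline_hours=3.0),
    ]
    for i, queue in enumerate(distribute(central)):
        for req in queue:  # each local queue is scheduled independently
            print(f"queue {i}: {req.name} -> {best_assignment(req)}")

In the article, the per-request decision is made by solving a linear optimization model rather than by exhaustive enumeration, and the distribution over local queues follows the proposed policy rather than round-robin; the sketch only fixes the shape of the pipeline.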
