A self-adaptive approach to job scheduling in cloud computing environments

Document Type: Article

Authors

Cloud Computing Center, School of Computer Engineering, Iran University of Science and Technology, Tehran, P.O. Box 1684613114, Iran

Abstract

Owing to the convenience and flexibility of cloud services, the number of cloud users has increased drastically during the past decade. Manual configuration of the available resources makes the resource management process potentially error-prone. Optimal scheduling is an NP-complete problem, and it is further complicated by factors such as resource dynamicity and the on-demand requirements of consumer applications. In this research, we use deep reinforcement learning (DRL) as a sequential decision-making method for automatic resource management that adapts its behavior to environmental changes. The proposed approach uses the discrete soft actor-critic algorithm, a model-free deep reinforcement learning algorithm. It is compared with similar reinforcement learning-based automatic resource management approaches using Google’s dataset. The results show that the proposed approach improves the slowdown and the balance of slowdown by at least 3 and 5 times, respectively, in the left-bi-model; 4 and 3 times in the right-bi-model; 3 and 7 times in the normal-model; 4 and 2 times in the balanced-bi-model; and 3 and 3 times on Google’s dataset.
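As a rough illustration of the discrete soft actor-critic idea named above, the following minimal sketch computes the entropy-regularized (soft) state value and the matching policy loss for a single state with a finite set of scheduling actions. It follows the general discrete-action formulation of soft actor-critic, not necessarily the exact implementation used in this work. The function names, the temperature `alpha`, and the example Q-values are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Turn raw action scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(probs, q_values, alpha):
    """Soft value: V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    The -alpha * log pi term is the entropy bonus that rewards exploration.
    Actions with zero probability contribute nothing and are skipped.
    """
    return sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(probs, q_values)
        if p > 0.0
    )


def policy_loss(probs, q_values, alpha):
    """Policy objective to minimize: sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s,a))."""
    return -soft_state_value(probs, q_values, alpha)


# Hypothetical example: three candidate scheduling actions in one state.
example_probs = softmax([0.0, 0.0, 0.0])   # uniform policy over the actions
example_q = [1.0, 2.0, 3.0]                # assumed Q-value estimates
example_alpha = 0.2                        # assumed entropy temperature
example_value = soft_state_value(example_probs, example_q, example_alpha)
```

Because the action set is discrete, the expectation over actions is computed exactly as a weighted sum rather than estimated by sampling, which is the main difference from the continuous-action variant of the algorithm.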

Keywords

