Comparative analysis of advanced machine learning classifiers based on feature engineering framework for weather prediction

Document Type: Research Article

Authors

1 Department of Electrical Engineering, Netaji Subhas University of Technology, Dwarka, New Delhi, 110078, India.

2 Department of Computer Science and Engineering, SRM Institute of Science and Technology, Delhi NCR Campus, 201204, Ghaziabad, India.

3 Department of Instrumentation and Control Engineering, Dr BR Ambedkar National Institute of Technology Jalandhar, Punjab-144008, India.

4 Department of ICE, Netaji Subhas University of Technology, Dwarka, New Delhi, 110078, India.

5 School of Automation, Banasthali Vidyapith, Rajasthan, 304022, India.

10.24200/sci.2024.61305.7242

Abstract

Climate change is a significant challenge that affects people all across the world. Rainfall is one of the most important phenomena in the weather system, and its rate is one of the most crucial variables. To develop prediction models with standard approaches, meteorological experts measure atmospheric attributes such as sunlight, temperature, humidity, and cloudiness. Machine Learning (ML) techniques have evolved considerably in recent years; they are simple to use and provide more satisfactory results than traditional methods. This paper applies the ML classifiers Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Light Gradient Boosting Machine (LGBM), CatBoost (CB), and Extreme Gradient Boosting (XGB) to predict rainfall using a feature engineering framework. The Area Under the Receiver Operating Characteristic (AUROC) curve and other statistical indicators, namely recall, accuracy, precision, and Cohen's kappa, are employed to evaluate and compare the success rate of these approaches. The validation AUROC values of the models are XGB (0.94), CB (0.93), LGBM (0.87), RF (0.93), DT (0.88), and LR (0.78). In conclusion, the XGB model outperforms the other models on these statistical measures.
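The indicators named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the labels and scores are invented toy values, the 0.5 decision threshold is an assumption, and the function names (`confusion_counts`, `classification_metrics`, `auroc`) are hypothetical helpers written only for this example. AUROC is computed here through its rank interpretation, the probability that a randomly chosen rainy day receives a higher score than a randomly chosen dry day.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, with 1 = rain and 0 = no rain."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and Cohen's kappa from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Cohen's kappa corrects the observed agreement for agreement expected by chance.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "kappa": kappa}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): true labels and predicted rain probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.35, 0.6, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics)  # accuracy, precision, recall = 0.75; kappa = 0.5
print(auc)      # 0.9375
```

In this toy case accuracy, precision, and recall all equal 0.75 while kappa is only 0.5, which illustrates why a chance-corrected measure is reported alongside the raw rates. AUROC is independent of the threshold, so it can rank models even when their hard predictions are tuned differently.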

Volume 32, Issue 10
Transactions on Computer Science & Engineering and Electrical Engineering
May and June 2026
Article ID: 7242
  • Receive Date: 20 October 2022
  • Revise Date: 17 July 2024
  • Accept Date: 29 September 2024