Analysis of Occupational Injury and Forecasting the Number of Lost Days: A Machine Learning Approach
Analysis of Occupational Injury with Machine Learning
Abstract
Background: Analysis of work-related injury data informs the choice of accident-prevention measures. This study therefore analyzed occupational injury data to identify the overall trend of occupational injuries, develop a forecasting model for lost workdays, and cluster provinces.
Methods: For the first goal, we calculated the NFIR and FIR per 100,000 workers, along with injury indices (AFR, ASR, FSI, Safe T-Score, IR, MDR, and LTIR). For the second goal, we fitted the FEE, FTE, FETE, and RE linear models and, alongside them, supervised machine learning models (Random Forest, Extra Trees, XGBoost, and Gradient Boosting). Finally, we applied the affinity propagation (AP) algorithm to cluster provinces, time-series clustering based on dynamic time warping (DTW), and the k-nearest neighbors (KNN) forecasting algorithm. Data on 378,826 occupational injuries that occurred from 2001 to 2019 were extracted from publications of the Iranian Social Security Organization (ISSO). Industry data were extracted from publications of the Ministry of Industry.
Results: The NFIR-to-FIR ratio ranged from 60 to 265.35. Injury indices increased from 2001 to 2008 and then declined. Among the linear models, OLS and RE performed best in forecasting lost days, and the Extra Trees method performed better as a blender than the Random Forest method. The clustering results placed Khuzestan, Markazi, Mazandaran, Qazvin, and Tehran provinces in cluster 1 and all other provinces in cluster 2.
Conclusion: The findings of this study can be used to forecast occupational injuries and to plan safety promotion.
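Two of the building blocks named in the Methods can be illustrated with a minimal sketch: normalizing an injury count to a rate per 100,000 workers, and computing a dynamic time warping (DTW) distance between two yearly series. The function names, the absolute-difference step cost, and the unconstrained warping window are assumptions made for illustration; the abstract does not specify the authors' exact formulas or software, and the numbers in the usage lines are hypothetical rather than taken from the study.

```python
from math import inf


def rate_per_100k(injuries: int, workers: int) -> float:
    """Return the number of injuries per 100,000 workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return injuries * 100_000 / workers


def dtw_distance(a: list[float], b: list[float]) -> float:
    """Textbook DTW distance with an absolute-difference step cost.

    Assumption: no warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is matched again with the previous alignment
                cost[i][j - 1],      # b[j-1] is matched again with the previous alignment
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Hypothetical figures, for illustration only.
example_rate = rate_per_100k(injuries=50, workers=200_000)
example_distance = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

A pairwise matrix of such distances between provincial series is the kind of input that a time-series clustering step could consume, although the clustering procedure itself is not reproduced here.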
Issue: Vol 17 No 3 (2025)
Section: Original Article(s)
Published: 2026-06-08
Keywords: Analysis of Occupational Injury; Accident Data Capturing System; Lost Days; Machine Learning; Forecasting
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

