The use of Machine Learning in diabetes prevention

Authors

  • Maria Alice Lopes Instituto Politécnico de Viseu, Viseu, Portugal
  • Cristina Lacerda Instituto Politécnico de Viseu, Viseu, Portugal | Centro de Estudos em Educação e Inovação (CI&DEI), Viseu, Portugal https://orcid.org/0000-0002-8921-4747
  • Joana Fialho Instituto Politécnico de Viseu, Viseu, Portugal | Centro de Estudos em Educação e Inovação (CI&DEI), Viseu, Portugal https://orcid.org/0000-0002-3910-8292

DOI:

https://doi.org/10.29352/mill0221e.43168

Keywords:

diabetes mellitus; machine learning; deep learning; recurrent neural networks; feature selection

Abstract

Introduction: Diabetes mellitus is one of the fastest-growing chronic diseases worldwide. Machine Learning (ML) techniques offer significant potential for identifying patterns in health data that support disease control.

Objective: To analyze the impact of ML techniques and of feature selection on diabetes prediction, using the “Diabetes Health Indicators” dataset.

Methods: The CRISP-DM methodology was applied. The classes were balanced with the NearMiss undersampling technique. Recursive Feature Elimination (RFE) was used for feature selection and Principal Component Analysis (PCA) for dimensionality reduction. Six models were tested: Random Forest, Gradient Boosting, k-Nearest Neighbors (KNN), Logistic Regression, Multilayer Perceptron (MLP), and Recurrent Neural Network (RNN).
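The balancing step can be illustrated with a minimal sketch of NearMiss-style undersampling. It assumes the NearMiss-1 variant, Euclidean distance, and a tiny made-up dataset; the function name `near_miss_1` and its parameters are illustrative and not taken from the article. Libraries such as imbalanced-learn provide a `NearMiss` sampler, and the variant and settings the authors used are not stated here.

```python
import math


def near_miss_1(X, y, majority=0, minority=1, k=3):
    """NearMiss-1 style undersampling (illustrative sketch).

    Keeps every minority sample and only those majority samples whose mean
    distance to their k closest minority samples is smallest, until both
    classes have the same size.
    """
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label == majority]

    def mean_dist_to_k_nearest(i):
        # Distances from majority sample i to all minority samples, ascending.
        dists = sorted(math.dist(X[i], X[j]) for j in min_idx)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    # Majority samples closest to the minority class are retained.
    keep = sorted(maj_idx, key=mean_dist_to_k_nearest)[: len(min_idx)]
    selected = sorted(keep + min_idx)
    return [X[i] for i in selected], [y[i] for i in selected]


# Hypothetical toy data: six majority (0) and two minority (1) samples.
X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [6.0, 6.0],
     [0.3, 0.1], [9.0, 9.0], [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = near_miss_1(X, y, k=2)
print(y_bal)  # classes are now the same size
```

Because undersampling discards majority samples, it is normally applied to the training split only, so that evaluation still reflects the original class distribution.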

Results: Class balancing significantly improved results. The RNN achieved the best performance, with 86.8% accuracy and an F1-score of 0.868. The combination of RFE with MLP also performed strongly. RFE and PCA reduced dimensionality without loss of accuracy.
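For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and the F1-score are derived from a binary confusion matrix. The labels are invented for illustration and do not reproduce the study's results. The function name `binary_metrics` is hypothetical, and whether the article reports a binary, macro, or weighted F1 is not stated in this abstract.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = diabetes, 0 = no diabetes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a balanced test set, accuracy and F1 tend to be close to each other, which is consistent with the near-identical values reported above.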

Conclusion: ML and deep learning (DL) techniques are promising for prioritizing clinical follow-up and informing public health policies. For practical application, it is essential to improve data representativeness, integrate Explainable AI techniques, and adjust decision thresholds to reduce false negatives.
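Threshold adjustment can be sketched as follows. The example assumes a classifier that outputs a probability-like score per person and picks the largest decision threshold that still reaches a target recall, which lowers the number of false negatives at the cost of more false positives. The scores, the target recall, and both function names are hypothetical; the article does not specify a procedure.

```python
def false_negatives(y_true, scores, threshold):
    """Count positive cases whose score falls below the threshold."""
    return sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)


def threshold_for_recall(y_true, scores, target_recall):
    """Return the largest observed score usable as a threshold such that
    recall on the positive class is at least target_recall."""
    positives = sum(1 for t in y_true if t == 1)
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        if tp / positives >= target_recall:
            return thr
    return min(scores)  # fallback: classify everything as positive


# Hypothetical model scores: 1 = diabetes, 0 = no diabetes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1]

fn_default = false_negatives(y_true, scores, 0.5)
thr = threshold_for_recall(y_true, scores, target_recall=0.75)
fn_adjusted = false_negatives(y_true, scores, thr)
print(fn_default, thr, fn_adjusted)
```

In practice the threshold would be chosen on a validation set and then checked on held-out data, since a threshold tuned on the test set overstates performance.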

References

Alzyoud, M., Alazaidah, R., Aljaidi, M., Samara, G., Qasem, M. H., Khalid, M., & Al-Shanableh, N. (2024). Diagnosing diabetes mellitus using machine learning techniques. International Journal of Data and Network Science, 8(1), 179–188. https://doi.org/10.5267/j.ijdns.2023.10.006

Daghistani, T., & Alshammari, R. (2020). Comparison of statistical logistic regression and random forest machine learning techniques in predicting diabetes. Journal of Advances in Information Technology, 11(2), 78–83. https://doi.org/10.12720/jait.11.2.78-83

International Diabetes Federation. (2019). IDF diabetes atlas (9th ed.). The Diabetes Atlas. Retrieved March 14, 2025, from https://diabetesatlas.org/

Khan, Q. W., Iqbal, K., Ahmad, R., Rizwan, A., Khan, A. N., & Kim, D. (2024). An intelligent diabetes classification and perception framework based on ensemble and deep learning method. PeerJ Computer Science, 10, e1914. https://doi.org/10.7717/peerj-cs.1914

Olisah, C. C., Smith, L., & Smith, M. (2022). Diabetes mellitus prediction and diagnosis from a data preprocessing and machine learning perspective. Computer Methods and Programs in Biomedicine, 220, 106773. https://doi.org/10.1016/j.cmpb.2022.106773

Srinivasu, P. N., Shafi, J., Krishna, T. B., Sujatha, C. N., Praveen, S. P., & Ijaz, M. F. (2022). Using recurrent neural networks for predicting type-2 diabetes from genomic and tabular data. Diagnostics, 12(12), 3067. https://doi.org/10.3390/diagnostics12123067

Sterlin, E. (2024). Health spending takes up 10% of the global economy: How can tech help reduce costs and improve lives? World Economic Forum. Retrieved March 14, 2025, from https://www.weforum.org/stories/2024/08/healthcare-costs-digital-tech/

Sujon, K. M., Hassan, R. B., Towshi, Z. T., Othman, M. A., Samad, M. A., & Choi, K. (2024). When to use standardization and normalization: Empirical evidence from machine learning models and XAI. IEEE Access, 12, 135300–135314. https://doi.org/10.1109/ACCESS.2024.3461234

Tanimoto, A., Yamada, S., Takenouchi, T., Sugiyama, M., & Kashima, H. (2022). Improving imbalanced classification using near-miss instances. Expert Systems with Applications, 201, 117130. https://doi.org/10.1016/j.eswa.2022.117130

Teboul, A. (n.d.). Diabetes health indicators dataset. Kaggle. Retrieved March 14, 2025, from https://encurtador.com.br/dKan

Wee, B. F., Sivakumar, S., Lim, K. H., Wong, W. K., & Juwono, F. H. (2024). Diabetes detection based on machine learning and deep learning approaches. Multimedia Tools and Applications, 83, 24153–24185. https://doi.org/10.1007/s11042-023-16407-5

Published

2026-01-16

How to Cite

Lopes, M. A., Lacerda, C., & Fialho, J. (2026). The use of Machine Learning in diabetes prevention. Millenium - Journal of Education, Technologies, and Health, 2(21e), e43168. https://doi.org/10.29352/mill0221e.43168

Section

Engineering, Technology, Management and Tourism