The use of Machine Learning in diabetes prevention
DOI:
https://doi.org/10.29352/mill0221e.43168
Keywords:
diabetes mellitus; machine learning; deep learning; recurrent neural networks; feature selection
Abstract
Introduction: Diabetes Mellitus is one of the fastest-growing chronic diseases globally. Machine Learning (ML) techniques offer significant potential for identifying patterns useful for disease control.
Objective: To analyze the impact of ML techniques and of feature selection on diabetes prediction, using the “Diabetes Health Indicators” dataset.
Methods: The CRISP-DM methodology was applied. The classes were balanced using the NearMiss undersampling technique. Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) were used for feature selection and dimensionality reduction. Six models were tested: Random Forest, Gradient Boosting, k-Nearest Neighbors (KNN), Logistic Regression, Multilayer Perceptron (MLP), and Recurrent Neural Networks (RNN).
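As an illustration of the balancing step, the following is a minimal sketch of NearMiss-style undersampling in plain Python. It assumes the NearMiss-1 variant (keep the majority-class rows with the smallest mean distance to their k nearest minority-class rows); the abstract does not state which variant or library was used, and the function name `near_miss_1`, the parameter `k`, and the toy data are hypothetical.

```python
import math


def near_miss_1(X, y, minority_label=1, k=3):
    """Undersample the majority class (NearMiss-1 style, assumed variant).

    Keeps every minority row, plus the majority rows whose mean Euclidean
    distance to their k nearest minority rows is smallest, until both
    classes have the same number of rows.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]

    def mean_dist_to_nearest_minority(x):
        # Distances from one majority row to all minority rows, closest first.
        nearest = sorted(math.dist(x, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    # Rank majority rows by closeness to the minority class.
    ranked = sorted(majority, key=lambda row: mean_dist_to_nearest_minority(row[0]))
    kept = ranked[: len(minority)]

    X_bal = minority + [x for x, _ in kept]
    y_bal = [minority_label] * len(minority) + [label for _, label in kept]
    return X_bal, y_bal


# Toy data (hypothetical): six majority rows (label 0), two minority rows (label 1).
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.2, 0.9),
     (5.0, 5.0), (6.0, 5.5), (0.9, 1.1), (1.1, 1.3)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = near_miss_1(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))
```

On real data, a library implementation such as the `NearMiss` class in imbalanced-learn would normally be preferred over a hand-written loop like this one.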
Results: Class balancing significantly improved results. The RNN achieved the best performance, with 86.8% accuracy and an F1-score of 0.868. The combination of RFE with MLP also showed strong performance. Feature selection (RFE and PCA) reduced dimensionality without loss of accuracy.
Conclusion: ML and deep learning (DL) techniques are promising for prioritizing clinical follow-up and informing public health policies. Enhancing data representativeness, integrating Explainable AI techniques, and adjusting classification thresholds to reduce false negatives are essential for practical applications.
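The threshold adjustment mentioned in the conclusion can be sketched as follows. This is an illustrative example only: the labels and predicted scores are made up, the helper names `confusion_counts` and `accuracy_and_f1` are hypothetical, and the thresholds 0.5 and 0.3 are assumptions rather than values from the study. It shows how lowering the decision threshold trades false negatives for false positives, and how accuracy and F1-score are computed from the resulting counts.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_and_f1(tp, fp, tn, fn):
    """Accuracy and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical ground truth (1 = diabetes) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.2, 0.1]

for threshold in (0.5, 0.3):
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    acc, f1 = accuracy_and_f1(tp, fp, tn, fn)
    print(f"threshold={threshold}: FN={fn}, FP={fp}, accuracy={acc:.3f}, F1={f1:.3f}")
```

In this toy case, moving the threshold from 0.5 to 0.3 removes both false negatives at the cost of one extra false positive. In a screening setting that trade is often acceptable, but the appropriate threshold depends on the clinical cost of each error type.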
License
Copyright (c) 2026 Millenium - Journal of Education, Technologies, and Health

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who submit proposals for this journal agree to the following terms:
a) Articles are published under the Creative Commons license (CC BY 4.0), in full open access, without any cost or fees of any kind to the author or the reader;
b) The authors retain copyright and grant the journal the right of first publication, allowing the free sharing of the work, provided that its authorship and initial publication in this journal are correctly attributed;
c) The authors are permitted to enter into separate, additional contracts for non-exclusive distribution of the version of the work published in this journal (e.g., posting it to an institutional repository or publishing it as a book), with an acknowledgment of its initial publication in this journal;
d) Authors are permitted and encouraged to publish and distribute their work online (e.g., in institutional repositories or on their website), as this can lead to productive exchanges and increase the impact and citation of the published work.

