A methodological proposal to address the academic dropout phenomenon based on an intelligent prediction model: a case study
DOI:
https://doi.org/10.29352/mill0223.31378
Keywords:
case study; dynamic modeling; educational data mining; metaheuristics
Abstract
Introduction: University dropout is now regarded as a complex phenomenon that goes beyond the number of students who are no longer enrolled, and it continues to grow, especially in the first years of study.
Objective: This study proposes a prediction model that combines Survival Analysis, Decision Trees, and Random Forest, within a Machine Learning framework, for the early diagnosis of factors that may cause university students to drop out.
Methods: The proposal consists of three phases. Phase 1 applies Survival Analysis to estimate the probability that a student remains enrolled (survival). Phase 2 takes the probability obtained in phase 1 as the response variable in a Decision Tree model, which establishes survival patterns across the variables considered. Finally, phase 3 uses Random Forest to identify the critical variables in the model.
Results: The proposed methodology produced a prediction model that identifies the main segmentation variables in the behavior patterns of potential cases of academic dropout.
Conclusion: Although the proposal was developed for the particular case of a Chilean university, the efficient combination of metaheuristics allows the methodology to be extrapolated to any academic context and reality. However, the conditions and needs of each institution must be considered.
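As a rough illustration of phase 1 described in the Methods, the sketch below estimates permanence probabilities with a Kaplan-Meier estimator. The abstract does not state which survival estimator the authors used, so Kaplan-Meier is an assumption here, and the cohort data and the function names `kaplan_meier` and `survival_at` are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, dropped):
    """Return {t: S(t)} for each distinct time with at least one dropout.

    durations: semesters each student was observed (hypothetical data).
    dropped:   True if the student dropped out at that time, False if the
               student was still enrolled when observation ended (censored).
    """
    events = Counter(t for t, d in zip(durations, dropped) if d)
    exits = Counter(durations)  # students leaving the risk set at t, for any reason
    at_risk = len(durations)
    survival, curve = 1.0, {}
    for t in sorted(exits):
        if events[t]:
            # Product-limit step: multiply by the share of at-risk students who stay.
            survival *= 1.0 - events[t] / at_risk
            curve[t] = survival
        at_risk -= exits[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the value at the latest event time <= t."""
    s = 1.0
    for event_time in sorted(curve):
        if event_time > t:
            break
        s = curve[event_time]
    return s


# Hypothetical cohort of eight students: semesters observed and dropout flag.
durations = [1, 2, 2, 3, 4, 4, 6, 6]
dropped = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(durations, dropped)
# Per-student permanence probability, one possible phase 2 response variable.
response = [survival_at(curve, t) for t in durations]
```

Under these assumptions, `response` holds one estimated permanence probability per student. In the pipeline the abstract outlines, a vector of this kind would serve as the response for the Decision Tree model in phase 2, and Random Forest variable importance would then rank the explanatory variables in phase 3.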
License
Copyright (c) 2023 Millenium - Journal of Education, Technologies, and Health
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who submit proposals for this journal agree to the following terms:
a) Articles are published under the Creative Commons License (CC BY 4.0), in full open access, at no cost to the author or the reader;
b) The authors retain copyright and grant the journal the right of first publication, allowing the work to be shared freely, provided that authorship and initial publication in this journal are correctly attributed;
c) The authors may enter into separate, additional contracts for non-exclusive distribution of the version of the work published in this journal (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in this journal;
d) Authors are permitted and encouraged to publish and distribute their work online (e.g., in institutional repositories or on their website), as this can lead to productive exchanges and increase the impact and citation of the published work.