Artificial intelligence models for log event analysis
DOI: https://doi.org/10.29352/mill0220e.41569
Keywords: artificial intelligence; anomaly detection; logs; cybersecurity; machine learning
Abstract
Introduction: In cybersecurity, log analysis plays a crucial role by identifying patterns, anomalies, and potentially malicious activities in computer networks, supporting proactive and informed responses.
Objective: To explore and compare Artificial Intelligence models designed to detect anomalies in log events, with priority given to their use in an institution's network.
Methods: This work is a systematic literature review with a comparative analysis. The review covers supervised and unsupervised Machine Learning and Deep Learning models and compares them on several criteria, such as their sensitivity to anomaly patterns and their use of computational resources.
Results: The review showed that the models vary in their characteristics and applications, which highlights their versatility. This systematic analysis provides baseline knowledge to guide decision makers who face the obstacles of analyzing substantial amounts of data.
Conclusion: This research establishes a solid basis for the initial selection of Artificial Intelligence models for log analysis in cybersecurity. The next phase of the investigation will involve the practical implementation of these models, evaluating their performance in an operational environment. This process will allow for the validation of the theoretical choices made and the optimization of their applicability.
License
Copyright (c) 2025 Millenium - Journal of Education, Technologies, and Health

This work is licensed under a Creative Commons Attribution 4.0 International License.

