Artificial intelligence models for log event analysis

Authors

P. Castro, F. Santos, P. Lopes

DOI:

https://doi.org/10.29352/mill0220e.41569

Keywords:

artificial intelligence; anomaly detection; logs; cybersecurity; machine learning

Abstract

Introduction: In cybersecurity, log analysis plays a crucial role by identifying patterns, anomalies, and potentially malicious activities in computer networks, supporting proactive and informed responses.

Objective: To explore and compare Artificial Intelligence models for detecting anomalies in log events, with priority given to their use in an institution's network.

Methods: This work is a systematic literature review with a comparative analysis. The analysis followed the literature review and covered supervised and unsupervised Machine Learning and Deep Learning models, weighing factors such as their sensitivity to anomaly patterns and their use of computational resources.

Results: The review showed that the models vary in their characteristics and applications, which highlights their versatility. This systematic analysis provides baseline knowledge to guide decision makers facing obstacles in the analysis of substantial amounts of data.

Conclusion: This research establishes a solid basis for the initial selection of Artificial Intelligence models for log analysis in cybersecurity. The next phase of the investigation will involve the practical implementation of these models, evaluating their performance in an operational environment. This process will allow for the validation of the theoretical choices made and the optimization of their applicability.

Published

2025-10-16

How to Cite

Castro, P., Santos, F., & Lopes, P. (2025). Artificial intelligence models for log event analysis. Millenium - Journal of Education, Technologies, and Health, 2(20e), e41569. https://doi.org/10.29352/mill0220e.41569

Section

Engineering, Technology, Management and Tourism