Shopping intention prediction using decision trees

  • Dario Šebalj
  • Jelena Franjković
  • Kristina Hodak
Keywords: Shopping intention, Price image, Retailer’s image, Classification algorithms, Machine learning


Introduction: Price is often considered a neglected element of the marketing mix because price management is complex and customers are sensitive to price changes. A price change also draws the fastest customer reaction. Accordingly, the process of making shopping decisions can be very challenging for customers.

Objective: The aim of this paper is to create a model that is able to predict shopping intention and classify respondents into one of the two categories, depending on whether they intend to shop or not.

Methods: The data sample consists of 305 respondents older than 18 who are involved in buying groceries for their household. The research was conducted in February 2017. The model was built using the decision tree method with several of its classification algorithms.

Results: All models, except the one that used the RandomTree algorithm, achieved a relatively high classification rate (over 80%). The J48 and RandomForest algorithms gave the highest classification accuracy, 84.75%. Since there is no statistically significant difference between these two algorithms, the authors chose the J48 algorithm to build the decision tree.

Conclusions: Value for money and the price level in the store were the most significant variables for classifying shopping intention. A future study will compare this model with other data mining techniques, such as neural networks or support vector machines, since these techniques achieved very good accuracy in previous research in this field.
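
To illustrate how a tree learner can single out variables such as value for money and price level, the following is a minimal sketch of the gain ratio criterion used by C4.5, the algorithm that WEKA's J48 implements. It is not the authors' model: the rows are invented toy data, the attribute names (value_for_money, price_level, intends_to_shop) are hypothetical, and refinements that J48 applies, such as pruning and handling of numeric attributes, are left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of a nominal attribute with respect to the target class."""
    labels = [row[target] for row in rows]
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy that remains after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data; NOT the survey data from the study.
rows = [
    {"value_for_money": "high", "price_level": "low", "intends_to_shop": "yes"},
    {"value_for_money": "high", "price_level": "high", "intends_to_shop": "yes"},
    {"value_for_money": "high", "price_level": "high", "intends_to_shop": "yes"},
    {"value_for_money": "low", "price_level": "low", "intends_to_shop": "yes"},
    {"value_for_money": "low", "price_level": "high", "intends_to_shop": "no"},
    {"value_for_money": "low", "price_level": "high", "intends_to_shop": "no"},
]

scores = {
    name: gain_ratio(rows, name, "intends_to_shop")
    for name in ("value_for_money", "price_level")
}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy sample the attribute with the highest gain ratio would be placed at the root of the tree, which is the sense in which a variable is "most significant" for the classification.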



Engineering, Technology, Management and Tourism