Financial distress prediction in Slovakia: An application of the CART algorithm

The prediction of financial distress has interested economists and researchers around the world for many years. In practice, applying existing prediction models to Slovak companies yields lower prediction accuracy, because these models were created in the conditions of other countries. Therefore, the main aim of the article is to create a model for predicting the financial distress of Slovak companies based on the real conditions of the Slovak economy. For this analysis, a dataset of the most important financial ratios that may affect the financial health of Slovak companies was obtained from the Amadeus database; it contains data on more than 100,000 real companies operating in the Slovak economy in the period 2016 to 2018. To create models predicting financial distress one year and two years in advance, the CART algorithm generating a binomial decision tree was used. The developed models achieve an overall accuracy of 91.9% (one year in advance) and 87.3% (two years in advance) and are very simple to apply in practice. The results are important not only for the companies themselves but also for all their stakeholders, as they could help a company to mitigate or eliminate the threat of financial distress and the other corporate risks related to it.

Journal of International Studies, Vol. 14, No. 1, 2021. Scientific Papers. Received: July, 2020. 1st Revision: December, 2020. Accepted: March, 2021. DOI: 10.14254/20718330.2021/14-1/14. © Foundation of International Studies, 2021. © CSR, 2021.


INTRODUCTION
In recent decades, the prediction of corporate financial distress has been a topic of great interest, as it is important mainly for company managers, but also for employees, creditors, business partners, and all other stakeholders. A functional and reliable prediction model can be a supporting tool for determining the financial situation of a company, which can subsequently lead to the introduction of the measures necessary to mitigate or eliminate the threat of financial difficulties or even bankruptcy (Kliestikova et al., 2017). Investors' investments in the company will then be safer, or investors can adjust them to reduce their risk. Ultimately, these operations will be beneficial for the company (Kral et al., 2018). Methodological approaches usually used to create financial distress prediction models include the group of statistical methods and the group of data mining methods. Multivariate discriminant analysis (MDA) and binary logit or probit are the typical representatives of the first group. The pioneers in using these methods for the creation of financial distress prediction models were Altman and Ohlson. In 1968, Altman developed the well-known Z-score model based on MDA, and in 1980, Ohlson published the first logit bankruptcy model. These methods are fully parametric. The group of data mining methods consists of many different methods: artificial neural networks (ANN), genetic algorithms (GA), decision trees (DT), etc. These methods are fully non-parametric, and they make no assumptions about the distribution of the variables or the statistical relations among them (Paliwal & Kumar, 2009).
At present, both groups of methods are used to build prediction models of financial difficulties. In the early development of this field, discriminant analysis and logistic regression dominated, while in recent years many data mining and artificial intelligence methods have come to the fore, including the CART method generating binomial decision trees. All these methods have their supporters as well as opponents, and each has certain advantages and disadvantages (Gregova et al., 2020).
The CART algorithm can handle any type of input variable without any statistical assumptions about its probability distribution. The variables may even be collinear, and the technique has no problems with outliers or missing values. Moreover, CART generates a simple decision tree that can easily be programmed, and the classification of new cases is then very fast (Affes & Hentati-Kaffel, 2018).

LITERATURE REVIEW
Financial distress is a notion covering several terms used when a company is confronted with financial difficulties. For situations that threaten the continuity of a company's operations, terms such as bankruptcy, failure, insolvency, and default are usually used (Gandolfi et al., 2018). According to Altman's definition (1993), failure is a situation where "the realized rate of return on invested capital, with allowances for risk consideration, is significantly and continually lower than prevailing rates of similar investments".
The models for the prediction of financial distress used to be created by various statistical techniques. Before the 1980s, the most widely used method in this field was MDA. This method is sometimes criticised for its relatively strict assumptions and data requirements (Balcaen & Ooghe, 2006) because, in real-world applications, these assumptions are usually hard to satisfy (Karas & Srbova, 2019). The second approach widely used in the field of financial distress prediction is logistic regression, first used by Ohlson (1980). This method assumes homogeneity of variances and independence of the predictors. Gradually, other techniques were applied in the field of bankruptcy prediction, for example, probit analysis, first used by Zmijewski (1984) and later, for example, by Kasgari et al. (2012).
In the group of data mining methods, support vector machines (SVM) and decision trees (especially CART and C4.5 trees) are the most commonly used techniques for identifying financial distress (Alaminos et al., 2016; Zhao et al., 2016; Liu & Wu, 2017; Tsai et al., 2014; Zieba et al., 2016). Models based on these techniques are commonly used in practice because they are relatively easy to understand and to apply. Regarding the differences between the two decision tree methods, CART generates binary decision trees, while C4.5 does not. To select the best splitting criterion, CART employs the Gini index, while information gain is used in the C4.5 algorithm. Another difference is in pruning. Cost-complexity pruning is used in the CART algorithm, whereas C4.5 employs a single-pass pruning algorithm.
In the field of identifying financial difficulties, research has applied various AI-based methods. Researchers use various machine learning techniques to solve this problem, but ANN is the most used one (Georgescu, 2017; Jones et al., 2016; Kumar & Ravi, 2007). Among the alternative techniques, the case-based reasoning approach was used by Li and Sun (2009), and genetic algorithms were used by Shin and Lee (2002). In 2012, du Jardin and Severin used self-organising maps (Kohonen maps), while simulation analysis was applied by Cohen et al. (2012). In addition to classical methods, Virag and Nyitrai (2014) used rough sets for financial distress prediction.
The CART algorithm was introduced by Breiman et al. (1984). As early as 1985, Frydman et al. were the first to use the DT technique to predict company failure. They found that their DT predicted company failure more accurately than MDA or Logit. Among the models created in recent years, several studies deal with financial distress prediction. Several decision-tree-generating algorithms (CART, C5.0, and CHAID), together with the logistic regression (LR) approach, were used to model the financial health of Taiwanese listed companies (Chen, 2011). In 2015, Irimia-Dieguez et al. published a study comparing selected prediction models, such as CART-based and Logit-based ones.
Several studies discussing the creation of prediction models for selected specific groups of companies have also been published. In Slovakia, some national models have been introduced in the last two decades. For agricultural companies, Gurcik (2002) and Chrastinova (1998) created models using the classical MDA approach. These models are still used, and not only in the agricultural sector. The first Slovak Logit models were published by Hurtosova (2009) and Gulka (2016). In 2017, Kovacova and Kliestik introduced a Logit model and a Probit model and compared their classification accuracy. Gavurova et al. (2017) used the DT technique to develop a new Slovak national model. Karas & Reznakova (2017) focused their study on the development of a CART-based model for construction companies operating in the Slovak economy. Mihalovic (2016) developed two national models using the MDA and Logit approaches. Using MDA, Ferancova & Sabolova (2015) created a prediction model for companies in the automotive industry. None of these prediction models used a database of companies as large as the one used in this study. Moreover, a significant part of the mentioned models for Slovak companies was created for specific economic sectors. Many studies have shown that prediction models have a much lower classification capability when used in economic conditions or at times different from those in which they were developed.
Accordingly, our study focuses on creating a new Slovak national model using a real sample of Slovak companies and their financial data from the years 2016-2018 (Kovacova et al., 2019). We apply the CART algorithm to create this model. The originality of our models lies in the use of real data on more than 100,000 Slovak companies. The models created in this study therefore reflect the real situation of the Slovak economy and could be an effective tool for business managers. In this article, we present two models. The first is designed to predict financial distress over a one-year horizon, and the second over a two-year horizon. Both models achieve high prediction ability. We see the novelty of the study in the use of such a large sample of real Slovak companies, with data coming directly from their financial statements, and in the application of the data mining CART method, which is one of the modern, currently widely used methods achieving very good results.
The rest of the paper is structured as follows. The Methodology section describes the data used in this study and the basis of the CART algorithm. The Empirical Results and Discussion section reports the results of the estimated models, compares and discusses the results, lists the strengths and weaknesses of this research, and suggests possible future directions for the study. Finally, the last section concludes.

METHODOLOGY
This article presents CART-based financial distress prediction models created for Slovak companies. As published studies have shown (Karas & Reznakova, 2017; Kumar & Ravi, 2007), the decision-tree-generating CART algorithm is used often and successfully in this field. This approach also avoids some of the problems associated with the classic MDA and Logit methods, which have relatively strong preconditions for their use. CART can be used to model complex relationships between predictors regardless of their probability distribution. This non-parametric algorithm results in a list of simple if-then rules. These rules can be illustrated very clearly in the form of a dendrogram and, in particular, are easy to interpret and apply in practice. Moreover, the predictive power of these models is at least comparable to that of the classic MDA and Logit models (Li et al., 2010).
In this study, we used the following methodological procedure. We subjected the data set, containing the values of financial and other characteristics of companies, to a thorough check of the values of the individual variables. In the first step, the prosperity or non-prosperity of each company was determined, based on the criteria described later. Enterprises for which it was not possible to determine (non)prosperity due to a missing value of some crucial variable were excluded. Then, the values of the variables were checked, edited, and verified. Missing values were checked, and zero values were filled in for those variables for which this is logically and economically justified by their definition. The correctness of the values, both logical and economic, was also checked. For example, some variables can only take positive values (variables 1, 2, 10, 12, 15, 21, 22). Companies that do not meet this condition were excluded from the database because their financial statements were probably poorly prepared (containing, for example, logical errors). After this detailed preparation of the data file, we applied the CART method to create a model classifying a company as prosperous or non-prosperous. As described later, the model was created on a training sample and verified on a test sample of companies. The settings of the CART method are presented in more detail later. The output of this method is a graphical representation of the tree with its decision rules. In this study, we present the binomial tree that was created using a test sample of companies. Subsequently, the predictive ability of the model is quantified using a classification table; its classification power is also visualised by the ROC curve and quantified by the AUC criterion.
We applied this procedure both to the model predicting one year in advance and to the model predicting two years in advance. The results are relatively simple binomial trees with very good classification ability.
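The positivity check described above can be illustrated with a minimal Python sketch. The record layout (a dict keyed by variable number, with None for a missing value), the function name `passes_checks`, and the sample companies are assumptions made for illustration only; treating zero as a violation is also an assumption, since the text only says "positive".

```python
# Variable numbers that, per the text, may only take positive values.
POSITIVE_ONLY = (1, 2, 10, 12, 15, 21, 22)


def passes_checks(company):
    """Return True if a company record survives the positivity check.

    `company` is a hypothetical dict mapping variable number -> value
    (None marks a missing value); this layout is an assumption.
    """
    for var in POSITIVE_ONLY:
        value = company.get(var)
        # Missing values are handled by a separate step in the text;
        # here we only reject values that are present but not positive.
        if value is not None and value <= 0:
            return False
    return True


# Made-up records, for illustration only.
companies = [
    {"id": "A", 1: 1.8, 2: 0.4, 10: 0.7},
    {"id": "B", 1: -0.2, 2: 0.4, 10: 0.9},  # negative value of variable 1
    {"id": "C", 1: 0.9, 2: None, 10: 1.3},  # missing value, not rejected here
]
kept = [c["id"] for c in companies if passes_checks(c)]
print(kept)
```

In this sketch, company B would be dropped, while A and C would be kept for the later steps.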

Background of the research, data and variables used
Slovakia is an important industrial country. The most important industrial sectors include the automotive, electro-technical, engineering, and chemical industries. Slovakia is the world's largest producer of cars per capita. Approximately 85% of Slovak exports go to EU countries. Slovakia has been a member of the Eurozone since 2009, and its currency is the euro. In 2015, GDP per capita in purchasing power parity reached 76% of the EU average. Slovakia has been one of the fastest-growing economies of the EU in recent years and the third most powerful economy among post-communist countries. In the last five years, real GDP grew by 1.5 to 3.9%, the unemployment rate fell annually on average by 2%, and the inflation rate grew annually by an average of 0.22%. For these reasons, we consider the Slovak economy to be specific and therefore consider it necessary to focus on the development of Slovak national prediction models.
As the main aim of this study is to create a model for predicting the financial distress of Slovak companies one or two years in advance using the CART algorithm, we used data on real Slovak companies obtained from the Amadeus database. The data contain financial and other characteristics of 104,452 Slovak companies in the years 2016 to 2018. The first reason for choosing this period was that it did not differ significantly from other periods. In these years, there were no relevant disturbances in the Slovak economy (no economic difficulties, regulations, or other disruptive factors) that could have influenced the database and the trained model. At the same time, it is the first period after the crisis in which the Slovak economy grew and the last consequences of the crisis disappeared. This is why we did not include earlier periods, in which the impact of the crisis was significant. As independent variables, the values of 37 financial indicators (not only ratios) calculated from real financial statements from 2016 to 2018 were used. They are presented and characterised in more detail, for example, in Durica et al. (2019). These indicators include not only the most frequently used ones (Korol, 2013) but also less frequently used ones, which may reflect the specificities of the Slovak economy. For financial distress prediction two years in advance, data from 2016 were used. Analogously, data from 2017 were used to create the one-year prediction model.
In 2016, Slovak legislation introduced the institute of "a company in crisis" in §3 of Act no. 7/2005 Coll. on Bankruptcy and Restructuring as amended. According to this Act, a company in crisis is either over-indebted or unable to repay its obligations. In other words, it is a company with negative equity, or one that is unable to pay at least two financial obligations for 30 days after the due date. A company is considered financial-distressed if the mentioned conditions are satisfied; conversely, if they are not met, the company is considered non-financial-distressed. Therefore, based on data from 2018, companies were initially divided into two groups: financial-distressed and non-financial-distressed ones. Table 1 shows the numbers of companies indicated as financial-distressed and non-financial-distressed in the year 2018.

CART algorithm
For the creation of the financial distress prediction model, decision trees generated by the CART algorithm are used. The generated DT classifies the cases into two or more groups. CART generates the DT by choosing the variable (a financial ratio or other indicator of the company) that separates the companies into two sub-groups: financial-distressed companies and non-financial-distressed companies. The division is carried out so that each child node contains as many companies from one group, and as few from the other, as possible. This operation is then repeated until the stop criterion is met. Finally, the model is formed by the final sub-groups and the list of rules leading to each of them.
The quality of the generated tree can be measured by the purity of the final sub-groups. The CART algorithm usually employs the Gini index for measuring impurity, $G = 1 - \sum_{i} p_i^2$, where $p_i$ are the ratios of the classes to be predicted (prosperous or non-prosperous companies) in the node. In a binary classification task, the Gini index varies from 0 (all companies belong to one class) to 0.5 (half of the companies are prosperous and half are non-prosperous).
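The range of the Gini index stated above can be checked with a short Python sketch based on the standard definition (one minus the sum of squared class ratios). The function names and the example class counts are illustrative assumptions, not taken from the study.

```python
def gini_index(class_counts):
    """Gini impurity 1 - sum(p_i^2) for a node with the given class counts."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node is empty")
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def split_impurity(left_counts, right_counts):
    """Size-weighted Gini impurity of the two child nodes of a binary split."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return ((n_left / n) * gini_index(left_counts)
            + (n_right / n) * gini_index(right_counts))


print(gini_index([100, 0]))  # pure node -> 0.0
print(gini_index([50, 50]))  # balanced node -> 0.5
# Improvement of a hypothetical split of the balanced node (about 0.18).
print(gini_index([50, 50]) - split_impurity([40, 10], [10, 40]))
```

A split is attractive when the weighted impurity of the children is clearly lower than the impurity of the parent node; the difference is the improvement that the stop criterion refers to.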
The stop criterion often combines several rules. This approach is used in this study, too. The division does not continue if at least one of these conditions is fulfilled:
1. The number of divisions reaches a fixed limit of 5 levels.
2. The group is pure (all companies belong to only one class).
3. The group does not contain more than one hundred companies.
4. Some of the sub-groups do not contain more than fifty companies.
5. The next division does not significantly increase the quality of the tree (the minimum change in improvement is 0.0001).
A CART tree can often be overfitted (overtrained). To address this problem, the CART algorithm first generates a maximum tree (subject to the above-mentioned stop criterion). Then, the algorithm applies cost-complexity pruning to the maximum tree. For this purpose, it is sufficient to create a test sample, separate from the training sample, where the ratio of the training sample to the test sample is 80:20. On the test sample, each sub-tree of the maximum tree is tested, and the best pruned tree is the one with the lowest error rate in testing.
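The five stop rules listed above can be written down as a single check. The following Python sketch is only an illustration: the function name `should_stop`, its arguments, and the convention that the root node has depth 0 are assumptions, and the software actually used in the study may evaluate the rules differently.

```python
MAX_DEPTH = 5             # rule 1: fixed limit of 5 levels
MIN_PARENT_SIZE = 100     # rule 3: a group of at most 100 companies is not split
MIN_CHILD_SIZE = 50       # rule 4: no sub-group may have at most 50 companies
MIN_IMPROVEMENT = 0.0001  # rule 5: minimum change in improvement


def should_stop(depth, class_counts, child_sizes, improvement):
    """Return the number of the first stop rule that fires, or None.

    depth        -- depth of the current node (root = 0; an assumption)
    class_counts -- number of companies per class in the node
    child_sizes  -- sizes of the two children of the best candidate split
    improvement  -- decrease in impurity achieved by that split
    """
    if depth >= MAX_DEPTH:
        return 1
    if sum(1 for count in class_counts if count > 0) <= 1:
        return 2
    if sum(class_counts) <= MIN_PARENT_SIZE:
        return 3
    if any(size <= MIN_CHILD_SIZE for size in child_sizes):
        return 4
    if improvement < MIN_IMPROVEMENT:
        return 5
    return None


# A node with 1,000 companies and a useful split keeps growing.
print(should_stop(2, [500, 500], (600, 400), 0.01))
# The same node is not split if one child would hold only 40 companies.
print(should_stop(2, [500, 500], (960, 40), 0.01))
```

Returning the rule number rather than a plain boolean is a design choice of this sketch; it makes it easy to see which condition ended the growth of a branch.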

Classification ability
The classification ability, or accuracy, of financial distress models is evaluated by analysing a classification table containing the numbers and frequencies of companies classified correctly and incorrectly into the two considered classes. When it is computed on the data used to create the model, the overall accuracy is often overestimated. To avoid this bias, the dataset is divided into a training sample (80% of all data) and a test sample (the remaining 20%). The training sample is used for generating the model, and the test sample is used for calculating its classification ability. The classification ability is also checked using the ROC curve and quantified by the value of the AUC criterion, where a value close to 1 indicates a good model. Korol (2013) claims that financial distress prediction models generated as decision trees usually achieve higher classification capability than models created using other widely used methods, including classical MDA or Logit or even ANNs. These findings and the CART advantages discussed above represent the main motivation to create the Slovak national CART-based model that is presented in this study.
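To make the evaluation measures concrete, the following Python sketch builds a classification table and computes the AUC from first principles. The AUC is computed here as the share of (distressed, non-distressed) pairs in which the distressed company receives the higher score, which equals the area under the ROC curve. The function names, the 0.5 cut-off, and the eight made-up companies are assumptions for illustration; they are not data or code from the study.

```python
def classification_table(actual, predicted):
    """Count true/false positives and negatives (1 = financial-distressed)."""
    table = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1:
            table["tp" if p == 1 else "fn"] += 1
        else:
            table["fp" if p == 1 else "tn"] += 1
    return table


def rates(table):
    """Overall accuracy and the share classified correctly in each class."""
    total = sum(table.values())
    return {
        "overall": (table["tp"] + table["tn"]) / total,
        "distressed": table["tp"] / (table["tp"] + table["fn"]),
        "non_distressed": table["tn"] / (table["tn"] + table["fp"]),
    }


def auc(actual, scores):
    """Probability that a distressed company scores higher than a
    non-distressed one (ties count one half)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and distress scores, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

table = classification_table(actual, predicted)
print(table)
print(rates(table))
print(auc(actual, scores))
```

The pairwise AUC loop is quadratic in the sample size, so it is meant for understanding the measure rather than for a sample of tens of thousands of companies.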

EMPIRICAL RESULTS AND DISCUSSION
In this study, a sample of 104,452 Slovak companies is used, among them 81,268 (i.e., 77.80%) non-financial-distressed companies and 23,184 (i.e., 22.20%) companies in financial distress (Table 1). Their financial situation is considered in 2018, based on the mentioned criteria of a company in crisis. This dataset is randomly divided into two samples. The training sample consists of 80% of all companies, and the remaining companies form the test sample. The prediction model is developed on the training sample, while the test sample serves to quantify its classification capability. The stop criterion for generating CART decision trees is described in the Methodology section. Under these conditions, the algorithm first generates the maximum tree, and then this tree is pruned to avoid overfitting and achieve good generalizability of the final model. The Gini index serves as the impurity function during the generation of the tree.
In the creation procedure, 37 financial indicators (Durica et al., 2019) calculated using real financial statements from 2016 and 2017, respectively, are used. Data from 2016 is used to create a two-year model, and the data from 2017 is used for one-year prediction.

Two-year model
Using the CART algorithm, the final model with five levels of sub-groups and six final sub-groups was developed (Figure 1). During the generation of the maximum tree, values of 35 of the 37 financial ratios were used. However, after the automatic pruning of this tree, the final model works with the values of only 3 ratios. A closer look at the tree graph shows that the algorithm evaluated variable 10 as the "most important" variable, used in the first splitting criterion, where a value of this variable lower than about 1 (exactly 1.00071) indicates the prosperity of the company. Companies with a value of variable 10 higher than 1 are divided in the next step of the tree at the division point 1.28536, where enterprises with a value of variable 10 higher than this division point are definitively marked by the model as non-prosperous. For the remaining enterprises, for which variable 10 ≤ 1.28536, the division proceeds in the next three steps, using variables 1, 4, and again 10. These three ratios, 10 (Total Liabilities to Total Assets), 01 (Sales to Total Assets), and 04 (Net Income to Equity), are among the most frequently used indicators in prediction models (Korol, 2013).

Figure 1. Decision tree for two-year prediction.
Source: own elaboration
The classification table (Table 2) shows that the quality of the developed model is high on both the training and the test sample. The classification capability is more than 87%. More than 93% of the non-financial-distressed companies and almost 69% of the financial-distressed companies in the test sample are classified correctly. The classification capability of the model, expressed using the ROC curve, is shown in the following figure. The value of the AUC for this model is 0.859.

One-year model
Again, a decision tree was grown using the CART algorithm with the same settings. For growing the maximum tree, values of 33 ratios were used. After pruning, the final DT with three levels of sub-groups and three final sub-groups was generated (Figure 3). In this case, the final model uses only two variables as splitting criteria. This model, like the previous one, considers variable 10 to be the first decision variable, also with a splitting point of about 1 (exactly 1.00011). As in the previous model, a value lower than about 1 means the definitive classification of the company as prosperous. Companies with a value of variable 10 greater than 1 are divided in the second step of the tree at the dividing point 1.14723 of variable 10. Companies with a value of variable 10 higher than this point are classified by the model as non-prosperous. The others are further divided into two groups using variable 4. This model is quite similar to the two-year model, but it is a bit simpler and does not work with the value of variable 1. Only the values of 10 (Total Liabilities to Total Assets) and 04 (Net Income to Equity) are used.
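The first two splits of both trees can be expressed as simple if-then rules. The Python sketch below encodes only the split points on variable 10 that are quoted in the text (1.00071 and 1.28536 for the two-year model, 1.00011 and 1.14723 for the one-year model). The deeper split points are not reported in the text, so the sketch deliberately leaves those companies unclassified. The function name and the handling of values exactly equal to the first split point are assumptions.

```python
# Split points on variable 10 (Total Liabilities to Total Assets)
# quoted in the text for the two models.
THRESHOLDS = {
    "two_year": (1.00071, 1.28536),
    "one_year": (1.00011, 1.14723),
}


def classify_by_variable_10(x10, horizon):
    """Apply only the splits on variable 10 that the text reports in full."""
    first, second = THRESHOLDS[horizon]
    if x10 < first:
        return "prosperous"
    if x10 > second:
        return "non-prosperous"
    # The remaining companies are split further (variable 4, and in the
    # two-year tree also variables 1 and 10). Those split points are not
    # given in the text, so no class is assigned here.
    return "undetermined"


for ratio in (0.62, 1.10, 1.20, 1.35):
    print(ratio,
          classify_by_variable_10(ratio, "two_year"),
          classify_by_variable_10(ratio, "one_year"))
```

With these rules, a company whose liabilities are well below its assets is classified as prosperous by both models, while a ratio of 1.20 is already decisive for the one-year model but not for the two-year model.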
Although the model is simple, its accuracy is high, including for non-financial-distressed companies, and, as expected, higher than that of the two-year model. This is because the values of the financial ratios reflect impending financial distress better one year in advance than two years in advance. Table 3 describes the classification capability of the one-year model. On the test sample, the classification capability is nearly 92%. Nearly 95% of non-financial-distressed companies in this sample were classified correctly. In the financial-distressed class, more than 81% of companies were correctly classified.
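The overall classification capability is the class-size-weighted average of the two per-class rates, which is why it lies much closer to the rate of the larger, non-financial-distressed class. The Python sketch below reproduces this arithmetic. It assumes that the test sample has the same class shares as the full sample (81,268 and 23,184 companies), and it uses rounded per-class rates: 0.95 for "nearly 95%" and the 81.5% quoted in the Discussion. The function name is hypothetical.

```python
# Class sizes of the full sample as reported in the text.
NON_DISTRESSED = 81_268
DISTRESSED = 23_184


def overall_accuracy(rate_non_distressed, rate_distressed,
                     n_non_distressed=NON_DISTRESSED,
                     n_distressed=DISTRESSED):
    """Overall accuracy as the class-size-weighted mean of per-class rates."""
    total = n_non_distressed + n_distressed
    return (n_non_distressed * rate_non_distressed
            + n_distressed * rate_distressed) / total


# Rounded per-class rates of the one-year model (an approximation).
print(round(overall_accuracy(0.95, 0.815), 3))
```

The result is about 0.92, which is consistent with the reported overall accuracy of 91.9% once the rounding of the per-class rates is taken into account.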

Figure 3. Decision tree of the one-year model
Source: own elaboration
Again, the classification ability of the model was also checked using the value of the AUC. Figure 4 shows the ROC curve for this model. The AUC criterion has a value of 0.875 for this model.

Discussion
Based on the evaluation of the developed models, it can be concluded that both achieved high accuracy, especially in the class of non-financial-distressed companies. As can be expected, the two-year model achieved a somewhat lower classification capability of 87.3%, while the one-year model achieved a higher classification capability of 91.9%.
In Slovakia, several prediction models have recently been created using various methods. Among the most used is the Logit model created by Gulka (2016), with an accuracy of almost 80%. To model financial distress, Kovacova & Kliestik (2017) also used logistic regression and achieved an overall prediction accuracy of 86.6%. Mihalovic (2016) developed two financial distress prediction models using MDA and Logit that have an overall accuracy of 64.4% and 68.6%, respectively. The abovementioned models were created to predict financial distress one year in advance. Nevertheless, they achieve lower accuracy than our two-year model, with its overall accuracy of 87.3%. The overall accuracy of the models mentioned above is thus lower than the accuracy of both of our new models.
The quality of the models presented in this study is indicated by the ratio of correctly classified companies, but also by the AUC value. It is clear that the two-year model reached slightly lower values of these criteria. In general, however, the CART method is one of the modern machine learning methods that learn from data, are able to detect mainly non-linear relationships in the data very well, and derive classification rules that classify companies into one of two groups with a high percentage of success. This is its advantage over older methods such as MDA or Logit. Several authors in Slovakia have also created prediction models based on DT. In 2017, Gavurova et al. presented two Slovak national models, for one-year and for two-year prediction, using MDA and the CHAID decision tree, both with a prediction accuracy of almost 87%. The CHAID algorithm generated the models with the higher overall accuracy. Our model predicting financial distress two years in advance achieves approximately the same overall accuracy. However, the classification capability of our one-year model is higher by about 5 percentage points. Karas and Reznakova (2017) created two CART-based models for Slovak companies. Using these models, the financial status of financial-distressed companies is predicted with an accuracy of 94.9% and 91.5%, respectively. But the prediction ability in the class of financially healthy companies is only 61% and 62.6%, respectively. The situation is reversed for our one-year prediction model: it achieved a better classification of financially healthy companies (almost 95%), which was also its goal. For financial-distressed companies, our model achieved a classification accuracy of 81.5%. However, the overall accuracy of our model is 91.9%, which is higher by more than 26% in comparison with the abovementioned model of Karas and Reznakova.

CONCLUSION
Although the creation of prediction models has been the focus of many economists from Slovakia and other countries in recent decades, we can claim that a widely used Slovak national prediction model still does not exist. Several models have been created recently, but none of them has yet become widely recognized and practically used in Slovakia. To fill this gap, the main goal of this article was to present new models created using the CART algorithm. The created models are decision trees designed for the one-year and two-year prediction of impending financial distress. The database consisted of data on more than one hundred thousand companies keeping accounts under the conditions of the Slovak Republic, from the years 2016 to 2018. The models predict the financial status of the company in 2018, taking into account the currently valid Slovak legislation.
Although the one-year and two-year models classify companies based on only two financial ratios (Total Liabilities to Total Assets and Net Income to Equity) and three ratios (Total Liabilities to Total Assets, Sales to Total Assets, and Net Income to Equity), respectively, they have high overall accuracy. The simplicity of the models is their advantage: they are easy to interpret, and they are applicable to the real prediction of financial distress even for companies with incomplete accounting data. Finally, the models achieved a high classification accuracy of 87.3% for prediction two years in advance and 91.9% for one-year prediction.
A further direction of this research lies in the verification and, where appropriate, adjustment of the model on newer accounting data of real Slovak companies, with the ambition of constructing a generally accepted Slovak national prediction model. Another direction should be the creation of partial prediction models for companies operating in selected economic sectors or in selected Slovak regions.