The measurement of human development using the Ward method of cluster analysis

The Human Development Index is one of the methods of measuring human development. It captures the level of human development in both the economic and the social field. Human development is studied at the national level in most cases, yet the index can be applied at the regional level of a country, too. The objective of the article is to describe the potential for human development in the NUTS II regions of the Visegrad Group Plus countries (the Czech Republic, Poland, Hungary, Slovakia, Austria and Slovenia) using cluster analysis. The research covers the period from 2004 to 2013. Initially, a research hypothesis was set regarding the dynamization of human development processes in most of the regions, that is, a move from a lower to a higher development potential within three groups. This hypothesis was tested by a hierarchical cluster analysis using the Ward method and was not confirmed.


INTRODUCTION
Although GDP does not include social, political, cultural and environmental aspects of development, it is the most widely used indicator for measuring the state of the economy, as Stiglitz, Sen and Fitoussi (2009) or Van den Bergh (2009) claim. For this reason, many alternatives for measuring socio-economic conditions have been developed; one of the best known and most often used is the Human Development Index (Todaro & Smith, 2011). The United Nations has used this index since 1990, and measuring human development with it is an alternative to GDP/GNI per capita as a measure of human well-being. It brings a different perspective on development issues and better emphasizes the effect of factors other than purely economic ones. According to Majerova and Nevima (2016), the basis of the HDI is its greater explanatory power in following economic development, or sustainable development in general.
The UN uses the HDI primarily as a national-level indicator, estimated for a country as a whole (Basu & Basu, 2005), and its construction does not express the differences among the regions of a country. However, regional disparities exist and influence regional development; the formation and analysis of human development at the regional level was therefore the motivation for writing this article. The issue of human development was studied for the countries of the Visegrad Group Plus (hereafter V4+) at the NUTS II level. The V4+ includes the Visegrad Group countries (the Czech Republic, Hungary, Poland and Slovakia) together with Slovenia and Austria, based on the Regional Partnership Agreement from 2001. There are 46 regions at the NUTS II level: eight in the Czech Republic, seven in Hungary, sixteen in Poland, nine in Austria, four in Slovakia and two in Slovenia. The research covers the period from 2004, when the membership of most V4+ countries in the European Union began, to 2013, the last year for which data at the regional level were available for all monitored indicators and economies. The approach of the United Nations to forming the HDI was adopted in the selection of indicators, yet the components of each dimension had to be modified. The indicators of human development at the regional level are life expectancy at birth (dimension of health), tertiary educated people and lifelong learning (dimension of education) and GDP per capita in PPS (dimension of living standards). These components were used in a hierarchical cluster analysis using the Ward method.
Using cluster analysis in the concept of the HDI at the regional level is unique: most studies deal with cluster analysis at the national level. Aguña and Kovacevic (2010) divided the economies into four clusters according to the HDI categorization of the United Nations. Grimm et al. (2010) used a hierarchical cluster analysis of 32 countries with respect to the inequality in the three components of the HDI. Ülengin et al. (2009) or Rende and Donduran (2011) used Self-Organizing Maps for the creation of clusters. Hoeller et al. (2014) carried out a cluster analysis based on 12 core inequality indicators in OECD countries; similar patterns were identified in five groups sharing labour inequality. Bakumenko et al. (2015) created four clusters for 41 economies on the basis of three groups of human development factors formed by 23 indicators: satisfaction of the population with social conditions, the level of education, and demographic loads. Abad-Gonzáles and Martínez (2016) found that the number of clusters in the field of the HDI is not fixed and varied over time from three in 1990 to four in 2014, and that the countries within each category differ from the United Nations proposal.
Although some authors analysed human development at the regional level, these measurements were related to the old methodology, as in the case of China between the years 1982 and 2003 (Yang & Hu, 2007), or in Kasim, Fron and Yaqub (2011) regarding the HDI of Iraq in 2006. They divided the regions of the aforementioned economies into four clusters. The closest topic to our paper is the cluster analysis performed by Akócsi, Bencze and Tóth (2012). They analysed the Human Development Index of the Visegrad countries on the basis of knowledge (human) resources in the period from 2002 to 2007. The authors used 13 indicators for 35 regions according to the old methodology. A cluster analysis following the new methodology has not been published by other authors yet.
Based on the Ward method, three clusters reflecting different stages of the development potential of the monitored regions were created. These three clusters group regions on the basis of inner similarities that would not otherwise be apparent at first glance. They contain regions with different levels of development potential: an above-average potential, an average potential, and a below-average potential for human development. A research hypothesis about a dynamization of human development processes in most regions was set, that is, a move from a lower to a higher development potential. It was assumed that more than a half of the monitored regions belonging to a lower group of potential for human development would shift to a higher group. It was found that the vast majority of regions did not change their cluster in the monitored period, so the aforementioned hypothesis was not confirmed.

THE HUMAN DEVELOPMENT INDEX AT THE REGIONAL LEVEL
In 1990, the UN Development Programme (UNDP) published the first Human Development Report, which established the need for measuring human development, and the use of the Human Development Index dates from that time. Human development requires a balance between the formation of human capabilities, in terms of improving health and increasing knowledge and skills, and the use of these capabilities to meet human needs: free time, job security, and cultural, social and political events. It is, therefore, necessary to examine not only income but also other variables that point out the potential of a country much better, as well as the options that currently appear in human development (Majerova, 2012).
According to the UNDP (2015), the Human Development Index (HDI) is a summary measure of achievements in key dimensions of human development: a long and healthy life, access to knowledge and a decent standard of living. These three dimensions comprise four components: health has one, education has two, and the standard of living has one, as shown in Table 1.
The calculation method of two of the dimensions has changed over time because of the need to improve the explanatory power of the index; the health index is the only one that has remained unchanged. The last change was made in 2010, when the original additive aggregation function (the arithmetic mean of the three components) was replaced by a multiplicative function (their geometric mean) (Ravallion, 2012), as shown in Equation (1).
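The effect of this switch can be illustrated with a minimal Python sketch. The function names and the dimension values are hypothetical and are not taken from the data of this paper; the sketch only assumes three dimension indexes already scaled to the interval 0-1.

```python
def hdi_arithmetic(health, education, income):
    """Additive aggregation (before 2010): arithmetic mean of the three indexes."""
    return (health + education + income) / 3


def hdi_geometric(health, education, income):
    """Multiplicative aggregation (since 2010): geometric mean of the three indexes."""
    return (health * education * income) ** (1 / 3)


# Hypothetical dimension indexes, chosen only for illustration.
balanced = (0.6, 0.6, 0.6)
unbalanced = (0.9, 0.6, 0.3)

for label, dims in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(label, round(hdi_arithmetic(*dims), 3), round(hdi_geometric(*dims), 3))
```

Both profiles have the same arithmetic mean, but the geometric mean of the unbalanced profile is lower, so a weak result in one dimension is no longer fully compensated by a strong result in another.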
The HDI takes values ranging from 0 (the lowest level of human development) to 1 (the highest level of human development); therefore, minimum and maximum values were determined for each dimension based on historical evidence (more in Anand and Sen, 1994).
In classifying the levels of human development in regions, we adopt the methodology used by Hardeman and Niikstra (2012), who constructed the EU Regional HDI for 272 EU regions using the same methodology and similar indicators as the United Nations: for the health dimension they used healthy life expectancy and infant mortality; for the knowledge dimension, the NEET indicator (Not in Employment, Education or Training) plus a general tertiary education index; and for the living standard, the index of net disposable household income and the employment rate.
In this paper, the components of each dimension had to be modified: firstly, because we wanted to stay as close as possible to the methodology of the HDI, and so we excluded infant mortality and the employment rate; and secondly, because of the lack of data at the regional NUTS II level (NEET was replaced by lifelong learning). Net disposable income of households was replaced by GDP per capita because we believe that household incomes do not express the incomes of other economic subjects that are important in creating the welfare of the regions.
The data used were from the regional database of Eurostat and they were converted to the number of inhabitants representing the given group.
As mentioned above, three components were used for the construction of the HDI of the V4+ regions (hereafter Regions NUTS Human Development Index, RNHDI): the health component, the knowledge component, and the standard of living.
The health component includes the value of life expectancy at birth, which is represented by the mean number of years that a newborn child is expected to live, in relation to the current mortality conditions (age-specific probabilities of dying).
The knowledge component includes two indicators. The first is the share of tertiary educated people between 25 and 64 years of age, defined as the percentage of the population aged 25-64 who have successfully completed tertiary studies (e.g. university, higher technical institution, etc.). This educational attainment refers to ISCED (International Standard Classification of Education) 1997 levels 5-6, which include the first stage of tertiary education (bachelor and master, or equivalent) and the second stage of tertiary education (doctoral or equivalent). The second is lifelong learning in the form of the participation rate in education and training, which covers participation in formal and non-formal education and training. The reference period for the participation in education and training is at least four weeks. The participation rates in education and training are presented for the age group between 25 and 64 years. The data are calculated as annual averages of the quarterly EU Labour Force Survey data (EU-LFS).
The standard of living is measured through GDP per capita in PPS. The Purchasing Power Standard (PPS) is a common currency that eliminates the differences in price levels between countries and regions, allowing meaningful volume comparisons of GDP between them.
Apart from what was mentioned above, the reason for the selection of these indicators was their great explanatory power in relation to human development. Life expectancy at birth correlates positively with human development: the higher the life expectancy of a region, the more developed it is. It reflects the level of health and the quality of life, and it measures the qualitative aspects of living a healthy life. The share of tertiary educated people of productive age in the population of this age group is connected with the ability of people to reflect the needs of the knowledge economy and to contribute to this knowledge and to human development as well. Lifelong learning, in the form of participation in education and training, encompasses all learning activities undertaken throughout life (after finishing the initial education) with the aim of improving knowledge, skills and competences within personal, civic, social or employment-related perspectives, as Eurostat (2016) demonstrates. Through lifelong learning, people extend their possibilities for increasing their incomes. Like the health dimension, both indicators of education are positively correlated with human development.
The use of GDP per capita was influenced by the opinion of Sen (1999), who considered income (product) primarily as a means to achieve human development. GDP per capita reflects the economic level better than its absolute value. The indicator is measured in an artificial European currency unit, the purchasing power standard (PPS), which suits our purpose better than USD in PPP.
It was also necessary to define the minimum and maximum values for each indicator in the monitored years. To determine the minima, the worst results of the individual indexes from all regions of the European Union were chosen, while for the maxima we chose the best ones. One exception was made for GDP per capita, where the second highest value was chosen. The reason is that the highest values of GDP per capita are found in the regions of Luxembourg or Inner London, and these values are extremely high: they exceed the second highest value (Hamburg) by more than 50,000 PPS. The values of the region of Hamburg were therefore set as the maxima. The data from 2013 shown in Table 1 illustrate the construction and comparison of the UN's Human Development Index and the EU NUTS II Human Development Index.
To determine the sub-indexes, two types of calculation were used: a standardized index for life expectancy and for the two education indexes (2), and a natural logarithmic calculation for the standard of living index (3). The value of the education index $I_E$ is calculated as the arithmetic mean of the lifelong learning index $I_{LL}$ and the tertiary education index $I_{TE}$ (4):
$$H_{stand} = \frac{H_s - H_{min}}{H_{max} - H_{min}} \qquad (2)$$
$$H_{ln} = \frac{\ln H_s - \ln H_{min}}{\ln H_{max} - \ln H_{min}} \qquad (3)$$
$$I_E = \frac{I_{LL} + I_{TE}}{2} \qquad (4)$$
where $H_{stand}$ is the standardized value, $H_{ln}$ is the value based on natural logarithms, $H_s$ is the real value, $H_{min}$ is the minimum value and $H_{max}$ is the maximum value.
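The sub-index calculations described above can be sketched in Python as follows. This is a minimal illustration that assumes the usual min-max form of the UNDP methodology; the function names and all numerical inputs are hypothetical and do not come from the Eurostat data used in this paper.

```python
import math


def standardized_index(value, minimum, maximum):
    """Min-max index: position of the real value between the minimum and maximum."""
    return (value - minimum) / (maximum - minimum)


def log_index(value, minimum, maximum):
    """Min-max index on natural logarithms, used for the standard of living."""
    return (math.log(value) - math.log(minimum)) / (
        math.log(maximum) - math.log(minimum)
    )


def education_index(lifelong_learning_index, tertiary_index):
    """Arithmetic mean of the two education sub-indexes."""
    return (lifelong_learning_index + tertiary_index) / 2


# Hypothetical inputs: life expectancy in years, GDP per capita in PPS.
health = standardized_index(78.0, minimum=70.0, maximum=85.0)
living_standard = log_index(20000.0, minimum=5000.0, maximum=80000.0)
education = education_index(0.30, 0.50)
print(round(health, 3), round(living_standard, 3), round(education, 3))
```

The logarithmic form reflects the diminishing contribution of additional income: in the hypothetical example, a GDP per capita of 20,000 lies only a fifth of the way between the bounds in absolute terms, but exactly halfway on the logarithmic scale.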
The calculation of the total index corresponds to the new HDI approach: it is calculated as the geometric mean of all the above sub-indexes, as shown in (5). The values of the index and its sub-indexes in every NUTS II region of the V4+ in the years 2004 and 2013 are shown in Appendix 1; the development of the index in the mentioned period is in Appendix 2.
For the measurement of the various levels of human development in the monitored regions, we accepted the values of the HDI, which range in the interval 0-1, and formed the categories of the RNHDI as follows:
- very high regional human development, with a value of 0.800 and above,
- high regional human development, in the interval 0.700-0.799,
- medium regional human development, in the interval 0.550-0.699,
- low regional human development, below 0.550.
The levels of human development are surprising (see Appendix 1): in terms of this categorization, no region reached the very high or the high level in 2013. Only six regions reached the medium level of the RNHDI: one in the Czech Republic (Prague), three in Austria (Wien, Salzburg and Vorarlberg) and one in Slovenia (Zahodna Slovenia). Two of them are capital cities, and one of them is a region with a capital city. The rest of the regions (40) reached only the low regional level.
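The aggregation and the categorization above can be sketched together in Python. The sketch is illustrative only: the function names are hypothetical, the sub-index values are not taken from Appendix 1, and the category boundaries are treated as half-open intervals (for example, medium means at least 0.550 and below 0.700), which is an assumption about how values between the listed bounds are handled.

```python
def rnhdi(health_index, education_index, living_standard_index):
    """Total index as the geometric mean of the three sub-indexes."""
    return (health_index * education_index * living_standard_index) ** (1 / 3)


def rnhdi_category(value):
    """Map an index value in the interval 0-1 to its category."""
    if value >= 0.800:
        return "very high"
    if value >= 0.700:
        return "high"
    if value >= 0.550:
        return "medium"
    return "low"


# Hypothetical sub-indexes for one region.
value = rnhdi(0.8, 0.4, 0.5)
print(round(value, 3), rnhdi_category(value))
```

Because the geometric mean is pulled down by the weakest component, a region with a high health index can still fall into the low category when its education index is weak.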
When comparing the development of the RNHDI in the years 2004 and 2013 listed in Appendix 2, we can see that the regions with the best position (except the Slovenian region) recorded an improvement over the years. The Czech Republic is the only economy in which the situation improved in all NUTS II regions; on the other hand, the situation in human development deteriorated in all Hungarian regions. Poland recorded an improvement in 50% of its regions, while in Austria the results worsened in five regions out of nine, which is 56%.

THEORETICAL APPROACH TO CLUSTER ANALYSIS
According to Blashfield and Aldenderfer (1988), the cluster analysis method has a long history -the earliest known procedures were suggested by anthropologists, and later, these ideas were applied in psychology.
Cluster analysis is primarily focused on searching for similarities or differences among the examined objects. It provides one empirically based means of explicitly classifying objects (Punj and Stewart, 1983). According to Everitt et al. (2011), cluster analysis techniques are concerned with exploring data sets to assess whether they can be summarized meaningfully in terms of a relatively small number of groups or clusters of objects or individuals which resemble each other and which are different in some respects from individuals in other clusters.
If the research object is a region, as in this case, it is clear that we can confirm our assumption about the most or the least developed regions in the area of human development and its modifications only by applications of cluster analyses.
Cluster analysis became one of the qualifying methods in the 20th century, and its usefulness immediately had an impact on practically all fields of science. The first comprehensive work dealing with cluster analysis was written by Tryon (1939). The main motivation for the use of clustering is uncovering hidden similarities or differences. For this reason, cluster analysis is now widely used across scientific disciplines (for us, its most interesting use is in the field of economics, e.g. Vázquez and Sumner, 2012, Brauksa, 2013, Halásková & Halásková, 2015, Lipták et al., 2015).
If we want to formulate the principle of cluster analysis mathematically, it can be stated that it is a decomposition of a set of objects S(k) into k groups of clusters C, see Equation (6).
The main essence of cluster analysis is to classify individual objects (in this paper they are territorial units, the NUTS II regions) and to uncover their spatial structures. Similarly to factor analysis, cluster analysis can also be regarded as a form of data reduction. However, it does not serve to reduce the number of variables; its primary purpose is to divide a file of units into several mutually exclusive, relatively homogeneous subsets, called clusters. The aim of the classification is to reduce the dimensionality of the data file using the similarities and differences among objects. This is a very important trait for the analysis of regional disparities, as the results of this paper show. Clusters are the results of cluster analysis: the units within a cluster are as similar as possible in the monitored characteristics, while units assigned to different clusters ideally represent the highest degree of difference (Meloun, 1994). In simple terms, the aim is to minimize the differences among objects within the same cluster and, conversely, to maximize the differences among objects of different clusters. The analysis of clusters of objects does not have the nature of statistical testing; it is a method of quantifying the structural properties of a set of objects.
The basis of cluster analysis is sorting, for which we distinguish two basic approaches: the hierarchical cluster approach and the non-hierarchical cluster approach. The first one builds on clusters that have already been formed: these clusters are then used to create further clusters from the rest of the data file. The procedure continues until all elements of the data file are part of a cluster. This type of procedure is mostly chosen for regional analysis.
The non-hierarchical cluster approach is based on a cluster search, namely on the principle of the smallest difference from the average. The procedure is advantageous only if the number of clusters that we want to achieve is determined beforehand. This may become a significant limitation in further research, as only the number of clusters determined beforehand is finally formed and, for example, some extreme values may merge with the average ones.
There are seven methods in the clustering process (Caliński & Harabasza, 1974). The first two methods are based on linkages: between-groups linkage and within-groups linkage. Their application depends on good knowledge of the data file and on information about the number of clusters that we want to achieve. If the total number of clusters we want to achieve is not known, both methods limit further research. The third method, the Nearest Neighbour method, is based on the shortest distance between clusters. The fourth method, the Furthest Neighbour method, searches for the values in the data file that are separated by the furthest distance. The fifth method, the Centroid Clustering method, may seem the most ideal at first glance. It is based on the Euclidean distance between the centroids of clusters: those clusters that have the smallest distance between them are the closest. Unfortunately, it does not deal with the differences that may occur due to different weights for equally large clusters. Median Clustering, the sixth method, solves the problem of the variance of weights that the previous method gives to differently large clusters.
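The difference between three of these distance criteria can be shown on a small Python sketch. The two clusters below are hypothetical points on a line, and the function names are chosen only for this illustration; the sketch does not reproduce the SPSS procedures used in this paper.

```python
import math
from itertools import product


def nearest_neighbour(cluster_a, cluster_b):
    """Shortest distance between any pair of points from the two clusters."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


def furthest_neighbour(cluster_a, cluster_b):
    """Longest distance between any pair of points from the two clusters."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


def centroid(points):
    """Coordinate-wise mean of the points of a cluster."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def centroid_distance(cluster_a, cluster_b):
    """Euclidean distance between the centroids of the two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Two hypothetical clusters of two-dimensional points.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

print(nearest_neighbour(cluster_a, cluster_b))   # distance between (1, 0) and (3, 0)
print(furthest_neighbour(cluster_a, cluster_b))  # distance between (0, 0) and (5, 0)
print(centroid_distance(cluster_a, cluster_b))   # distance between (0.5, 0) and (4, 0)
```

The same pair of clusters is therefore "closer" or "further apart" depending on the chosen method, which is why different methods can merge clusters in a different order.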
The last method, the Ward method, named after its creator, focuses on the allocation of profiles to groups equally. Ward (1963) mentioned that grouping in this manner makes it easier to consider and understand relations in large collections. The principle of the method is not optimization, but minimization of heterogeneity: the purpose is to find the greatest similarity. When measuring human development and its modifications, it is necessary to look for similarities among 46 regions using this method.
One of the fundamental problems of cluster analysis is the concept of mutual similarity of objects and the quantitative expression of this similarity. One of the most common ways of expressing relationships among objects is a metric. The squared Euclidean distance (SED) was used as the metric for the Ward method (7), where $d^2$ is the SED, $x_{ik}$ is the value of the k-th symbol for the i-th observation of the variable, $x_{jk}$ is the minimum value of the variable $x_{ik}$ and n is the total number of objects.
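For illustration, the squared Euclidean distance between two observations can be computed as in the following Python sketch. It assumes the standard textbook definition (the sum of squared differences over all variables), which may differ in notation from Equation (7); the function name and the values are hypothetical.

```python
def squared_euclidean_distance(x_i, x_j):
    """Sum of squared differences between two observations over all variables."""
    if len(x_i) != len(x_j):
        raise ValueError("both observations need the same number of variables")
    return sum((a - b) ** 2 for a, b in zip(x_i, x_j))


# Two hypothetical observations described by three standardized variables.
region_a = (1.0, 2.0, 3.0)
region_b = (2.0, 0.0, 3.0)
print(squared_euclidean_distance(region_a, region_b))  # 1 + 4 + 0
```

Squaring the differences gives more weight to variables in which the two observations are far apart, and it avoids the square root that the plain Euclidean distance requires.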

THE REGIONS OF THE VISEGRAD GROUP PLUS IN CLUSTER ANALYSIS
In this section, cluster analysis is applied to the regions at the NUTS II level of the Visegrad Group Plus countries, based on the methodology described in the previous section of this paper. The V4+ regions are divided into clusters according to their development potential in terms of human development. The hierarchical cluster approach by means of the Ward method was used for the classification of the monitored regions, and all calculations were performed in the SPSS software. The Ward method is not based on the optimization of distances between clusters, but on the optimization of the homogeneity of clusters according to a criterion, namely minimizing the increase in the error sum of squares of the deviations of points from the cluster centroid. The sum of squares is calculated for each possible pair of clusters to be merged at each stage of the analysis. Those clusters are combined for which the increase in the error sum of squares is minimal.
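The merging rule of the Ward method can be illustrated by a short pure-Python sketch. It is not the SPSS implementation used in this paper: the function names are hypothetical, the six points are toy data, and the increase in the error sum of squares is computed with the known identity that merging clusters A and B raises it by |A||B| / (|A| + |B|) times the squared Euclidean distance between their centroids.

```python
def centroid(points):
    """Coordinate-wise mean of the points of a cluster."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def ward_increase(cluster_a, cluster_b):
    """Increase in the error sum of squares caused by merging two clusters."""
    c_a, c_b = centroid(cluster_a), centroid(cluster_b)
    squared_distance = sum((u - v) ** 2 for u, v in zip(c_a, c_b))
    n_a, n_b = len(cluster_a), len(cluster_b)
    return n_a * n_b / (n_a + n_b) * squared_distance


def ward_clustering(points, n_clusters):
    """Merge clusters step by step until only n_clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Choose the pair whose merger increases the error sum of squares least.
        i, j = min(pairs, key=lambda ij: ward_increase(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: six hypothetical objects described by two standardized variables.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0), (10.0, 2.0)]
result = ward_clustering(points, n_clusters=3)
for cluster in result:
    print(cluster)
```

In practice, the number of clusters is usually read from the dendrogram rather than fixed in advance; the parameter `n_clusters` is used here only to keep the sketch short.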
The motivation for and the advantage of using this method is its tendency to remove small clusters and thus to form clusters of about the same size, which is a welcome feature. This is because the method requires the distance between objects to be expressed by the squared Euclidean distance. As the Ward method leads to the minimization of intra-cluster dispersion, which makes the research of the examined objects more accurate, its choice was the best option for our purposes.
Since the values of the individual variables were in different units (years, population, monetary units), it was necessary to standardize the data. The same approach was used by Żelazny (2015) to determine the level of the information society in one Polish region. The standardization was carried out in two steps:
1. the mean value $\bar{z}_k$ and the standard deviation $s_k$ were calculated according to Equations (8) and (9);
2. afterwards, each object was standardized to a z-score (the standardization z-function) according to Equation (10).
The descriptive statistics are shown in Table 2. The spread varies widely for some indicators. The greatest deviation among the regions of the V4+ group corresponds to the indicator of GDP per capita; the second greatest corresponds to the indicator of tertiary education. These two components are the most heterogeneous ones. The population is more homogeneous for the components of life expectancy and lifelong learning.
Note: N - number of observations, SD - standard deviation. Source: authors' own processing in SPSS.
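The two standardization steps can be sketched in Python with the standard library. The sketch assumes the sample standard deviation (denominator n - 1); whether Equation (9) uses n or n - 1 is not stated here, so this choice, the function name and the values are assumptions made for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one variable: subtract its mean, divide by its standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(value - centre) / spread for value in values]


# Hypothetical life expectancy values (in years) for three regions.
life_expectancy = [70.0, 75.0, 80.0]
print(z_scores(life_expectancy))
```

After this transformation every variable has a mean of zero and a standard deviation of one, so that GDP per capita, measured in thousands of PPS, does not dominate the distances over variables measured in years or percentages.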
The subjects of the cluster analysis, all NUTS II regions, were evaluated by the metric computed in SPSS. According to the results of the Ward method and the dendrogram, which is not displayed in the paper due to its size, the following three clusters were identified and are shown in Table 3:
- Cluster 1 is the group of regions with an above-average potential for development in terms of human development and its input parameters.
- Cluster 2 is the group of regions with an average development potential in terms of human development and its input parameters.
- Cluster 3 is the group of regions with a below-average development potential in terms of human development and its input parameters.
The changes in the various inputs that influence the final value of human development during the reporting period are shown in Table 3 as well. Most of the regions remained in the same cluster throughout the monitored period; only six of them changed their cluster over time. From this table, we derive whether the development of the input variables in the regions is rather constant, or whether the processes lead to a dynamization in the regions.
Other methods of cluster analysis were tested as well. In addition to the Ward method, the input data were subjected to the other six methods described in the theoretical part. The aim of the testing was to find a similarity to the results obtained by the Ward method; it was found that the other methods produced only two clusters, which do not adequately reflect the nature of the sample dataset. Although these methods lead to a homogenization of the data file, the result of this homogenization is that the NUTS II regions cannot be divided into three clusters with an above-average, an average or a below-average development potential, as was the case when the Ward method was applied.
With methods other than the Ward method, the changes in the development of the regions may be suppressed. This means that it is not possible to capture whether a region has moved from one cluster to another over the monitored period. Unlike the Ward method, the other methods described regard the regions as static units that are not subject to structural changes, which does not allow us to analyse them in more depth during the monitored period. For this reason, their use for this type of data is inefficient. The results of this test, using one of the methods, namely within-groups linkage, are shown in Appendix 3.
The comparative analysis of the selected methods of cluster analysis verified that the best categorization of our input data of the Human Development Index in the NUTS II regions is the categorization into three clusters. Otherwise, we would not be able to use the potential of the 46 regions in terms of their future direction. As a result, the Ward method is the most suitable one for the evaluation of the selected regions. This is because the method eliminates smaller clusters and produces clusters of a comparable size, corresponding to homogeneous subsets of the selected data file.

CONCLUSIONS
The Human Development Index has been used since 1990 and it is one of the indicators that measure socio-economic development at the national level and thus compare the differences between economies. However, there are not only differences between the economies, but also within them. For this reason, we decided to construct the index of human development at the regional level. The modified Human Development Index (RNHDI) was created for 46 regions of the Visegrad Group Plus countries at the NUTS II level. For the purpose of this paper, the data had to be modified, but the methodology of the RNHDI remained the same as for the HDI. Three components were used: the health dimension (life expectancy at birth), the knowledge dimension (tertiary educated people and participation rate in education and training) and the dimension of the living standard (GDP per capita in PPS).
From the perspective of the RNHDI standard, only six regions reached the medium level of the RNHDI: one in the Czech Republic (Prague), three in Austria (Wien, Salzburg and Vorarlberg) and one in Slovenia (Zahodna Slovenia). The rest of the regions reached only the low regional level. When comparing the development of the RNHDI in the years 2004 and 2013, we can say that the regions with the best position (except the Slovenian region) recorded an improvement over the years. The Czech Republic is the only economy in which the situation improved in all NUTS II regions, while the situation in human development deteriorated in all Hungarian regions. Poland recorded an improvement in half of its regions, while in Austria the results worsened in five regions out of nine.
The above-mentioned components were used in a hierarchical cluster analysis using the Ward method for the period from 2004 to 2013. Based on a comparison with the other hierarchical methods, this method is presented as the most appropriate one. Three clusters were created, grouping regions on the basis of inner similarities that would not otherwise be apparent at first glance. At the beginning of the monitored period, the situation in the various regions was as follows: the regions of Austria were very homogeneous and were placed in the group with an above-average potential for development (group 1). The Czech, Slovak and Slovenian regions were placed in the first two groups (1 and 2). The Hungarian regions (except the region with the capital city) were in the second group, with an average development potential. The Polish regions showed the lowest homogeneity and were placed in all groups (mostly group 2).
Initially, a research hypothesis was set suggesting that more than a half of the monitored regions in a lower group of potential for human development (2 or 3) would shift to a higher group (1 or 2). In the end, a shift between clusters over time is apparent only in some regions. Usually, regions shifted from an average to an above-average potential for development, i.e. from cluster 2 to cluster 1. This was the case of four regions in the Czech Republic (Střední Čechy, Jihozápad, Severovýchod and Jihovýchod) and one region in Slovenia (Vzhodna Slovenia). A reverse process, which led to a slowdown in the development potential of the region, was noticed, too. The shift from the group with an average potential to the group with a below-average potential was recorded in only one region, namely the Polish region of Pomorskie. The vast majority (i.e. forty out of forty-six) of the monitored regions did not change their cluster during the monitored period. Our hypothesis about the dynamization of most of the regions was not confirmed.
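The counts reported above can be tallied with a few lines of Python. The sketch uses only the six shifts named in the text and the total of 46 regions; the variable and function names are chosen for this illustration.

```python
from collections import Counter

TOTAL_REGIONS = 46

# Shifts reported in the text: region -> (cluster at the start, cluster at the end).
shifts = {
    "Střední Čechy": (2, 1),
    "Jihozápad": (2, 1),
    "Severovýchod": (2, 1),
    "Jihovýchod": (2, 1),
    "Vzhodna Slovenia": (2, 1),
    "Pomorskie": (2, 3),
}


def direction(before, after):
    """A lower cluster number means a higher development potential."""
    return "up" if after < before else "down"


counts = Counter(direction(before, after) for before, after in shifts.values())
unchanged = TOTAL_REGIONS - len(shifts)
share_changed = len(shifts) / TOTAL_REGIONS

print(counts["up"], counts["down"], unchanged, round(share_changed, 3))
```

Only about 13% of the regions changed their cluster at all, which is why a hypothesis expecting a shift in most of the regions of the lower groups cannot hold.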
It should be emphasized that the resulting allocation of the regions into individual clusters depended on the number of input variables. If we reduced or increased the number of input variables correlating with the modified human development, the resulting allocation of the regions would change. This challenge will be the subject of our further research: we would like to focus on a comparison of the cluster analysis results with the development of the regional Human Development Index extended by indexes of life quality, namely infant mortality, health personnel, road and rail networks and the number of tourist establishments.

APPENDIX 2
The change in the development of the regions in the period from 2004 to 2013 is indicated. Source: authors' own processing in SPSS.

Table 1
The Comparison of HDI and RNHDI Components in 2013. Source: authors' own processing according to UNDP (2013) and Eurostat (2016).

Table 2
Descriptive Statistics for Components of RNHDI

Table 3
Results of the Ward method: un/changed clusters of V4+ NUTS II regions in the period between 2004 and 2013