Determinants of education quality in the Canal del Dique y Zona Costera region of Colombia

This article analyzes the quality of secondary education in the municipalities of Canal del Dique y Zona Costera, a region characterized both by the richness of its soil and by its agricultural vocation. In addition to economic and social determinants, the variables with the most significant impact are identified, a key step in establishing general and specific measures for improving the potential for territorial development. In a new approach, these aims were met by applying the technique known as Ordered Logit (a type of Discrete Choice model) to the results of the Saber 11 State Secondary Education Examination Critical Reading and Mathematics tests for 35,369 students from the region for the second half of 2014, contrasted with each student's socioeconomic conditions. The results show that, along with the institutional problems facing the region, high levels of poverty have had an impact on the quality of the education system, producing very low levels of quality, on a scale even more serious than the Colombian average.


INTRODUCTION
The Canal del Dique y Zona Costera region is an alluvial plain situated in the north of Colombia, formed by a total of 29 municipalities, 15 of these located in the Bolivar Department - Cartagena, Arjona, Arroyohondo, Calamar, Clemencia, Mahates, Maria La Baja, San Cristobal, San Estanislao, Santa Catalina, Santa Rosa de Lima, Soplaviento, Turbaco, Turbana and Villanueva - and 14 belonging to the Atlantico Department - Barranquilla, Campo de la Cruz, Candelaria, Juan de Acosta, Luruaco, Manati, Piojo, Puerto [...] education quality studies and the application of discrete choice models in the education sector. The third part outlines the methodology applied in this research as well as the spatial and temporal delimitations of analysis. The fourth section is dedicated to the development of the methodological procedure in the spatial and temporal scope on which the study focuses, based on which the empirical results are obtained. Lastly, a final section sets out the main conclusions.

LITERATURE REVIEW
A large number of studies focusing both on the definition of the term educational quality and on the empirical evaluation of quality at the level of educational institutions were found in the literature review. Among the former, the literature review by Bernal, Martínez and Parra (2015) on the antecedents of educational quality should be highlighted; it considers works published in Latin America in the last 10 years that are available in scientific databases (Dialnet, Doaj, Emagazines, Latindex, Rebiun, Recolecta, Redalyc, & Scielo). The results suggest the existence of marked differences in the use and context of the terminology that frames the policies of educational institutions, ranging from the macro level (the objectives pursued by the government), through the meso level (the purposes that educational institutions want to achieve), to the micro level (centered on the classroom, relating to what is stated in the curriculum and the teacher's purposes). This raises the discussion on quality and, according to Avendaño, Paz, & Parada (2017), on the role played by the competent public entities, the schools, their directors, and their teachers.
With respect to the empirical studies on educational quality in Latin America, those carried out by UNESCO (2013, 2018) should be highlighted. They focus both on determining the current state of educational quality in the countries of the region and on the challenges facing this sector in meeting the Sustainable Development Goals by 2030. Also noteworthy are the works of Martínez (2015) and Eslava (2015), in addition to those proposed by the Latin American Research Initiative for Public Policies (2016), which has focused its attention on examining in greater depth the problems of educational quality in Latin America, as a complement to the debate about the public policies necessary to guarantee higher levels of human development. In the Colombian case, several studies have referred to problems of educational quality (Barrera-Osorio, Maldonado, & Rodriguez, 2012; Contraloria General de la Republica, 2014; Delgado Barrera, 2014; Guataqui, 2003; Iregui, Melo, & Ramos, 2007; Jola, 2001; Maza Avila, 2012, 2015; Turbay, 2005).
In the field of empirical research on educational quality, the works of interest here are those that analyze the socioeconomic factors determining quality, taking as their criterion the educational achievement of each individual (in this case, students of an educational institution) and using Discrete Choice Models. Some works focus their attention on secondary schools, among them those of Lauer (2003) and Liu & Koirala (2012). Others focus on higher education, among them those published by Pudney & Shields (2000), García, Alvarado & Izquierdo (2000), Smith & Naylor (2001), Ibarra & Michalus (2010) and Declercq & Verboven (2018). In Colombia, very few studies have made use of Discrete Choice Models to evaluate the determinants of educational quality at the secondary level; among them are the works of Acevedo, Zuluaga, & Jaramillo (2008), Chica, Galvis & Ramírez (2010), Estacio et al. (2010), Rivera Vanegas (2011), Escobar & Orduz (2013), Albert, González & Mora (2013), Ariza, Acosta, & Altamar (2016) and Avendaño, Paz & Prada (2017). Most of them agree that the age of the individual, the place of residence and the income of the father or mother have a substantial impact on educational achievement.

METHODOLOGY
This in-depth examination of the socioeconomic determinants of secondary education quality in the Canal del Dique y Zona Costera region employed an analytical technique known as ordered logit, which belongs to the family of discrete choice models. It is based on data from the second half of 2014 from the Examen del Estado de la Educación Media Saber 11 [Saber 11 State Secondary Education Exam], prepared by the Instituto Colombiano para la Evaluación de la Educación [Colombian Institute for Educational Assessment, abbreviated as ICFES in Spanish], whose objectives include verifying the degree to which competencies have been developed by students who are completing or have completed their final year of secondary education (ICFES, 2014). The Saber 11 exam, previously called ICFES, was created in 1968 and has undergone several reforms throughout its history. The latest modification was carried out in the second half of 2014 for the purpose of adapting to the guidelines of the Plan Decenal de Educación 2006-2016 [Ten-year Plan for Education 2006-2016], which set out to organize, implement and consolidate a system of monitoring and assessment of the education sector, taking into account students' achievements and difficulties as well as access, enrollment and permanence in the system and the efficiency of those entities responsible for the provision and quality of services (ICFES, 2014). The exam is currently made up of 5 tests: Critical Reading, Mathematics, Natural Science, Social and Civic Studies and English, with the possible scores for each test running from zero (lowest score) to one hundred (highest score).¹
[Scores] for the first two tests (Critical Reading and Mathematics) will be taken as the basis for analysis, due to their cross-cutting nature in the education process and to the fact that they integrate generic competencies into higher education.

Discrete choice models: Ordered Logit
The ordered logit model forms part of the discrete choice models, which allow for the establishment of a relation between an ordinal qualitative dependent variable and other independent variables, be these categorical or not. This type of model proves appropriate where the objective does not consist of predicting the mean aggregate behavior but rather of analyzing those factors determining the probability that an individual economic agent chooses a [certain] course of action or obtains a result from within a set (generally finite) of possible options (Rodríguez Donate & Caceres Hernandez, 2007). This characteristic calls for coding as a prior step to modeling, a process by which the alternatives of the variables are transformed into codes or quantum values, susceptible to modeling using econometric techniques (Medina Moral, 2003). Where the dependent variable is an ordinal qualitative variable, the use of a simple regression model is not recommended, as this would be based on the assumption that the differences between each category are the same, when in reality the value representing each alternative simply indicates the order established between them (Greene, 1998, cited in: Bujosa Bestard & Rossello Nadal, 2005), and may therefore result in estimates that fall outside the admissible range of the variable.
¹ According to the methodological guidelines proposed by ICFES (2014), which came into force for the second half of 2014, the previous results are complemented by two new indicators: the first, known as Puntaje global de la escala histórica [Global score on a historical scale], is a score comparable between different applications of the exam and is valued on a scale from zero to five hundred; the second, known as Puesto [Position], corresponds to the "thousandth" position at which the student is ranked on the global index and refers to the result obtained using the population that took the exam in a particular application as a reference. Therefore, the first position is occupied by the 0.1% of test takers with the highest results, while the 0.1% [of students] with the lowest results are at the one thousandth position.
The equation represented by the ordered logit model is the following (Borra Marcos, Gomez García, & Salas Velasco, 2007):

\[ Y^* = X\beta + \varepsilon \]

Where: Y* is the latent variable (unobserved), X represents the vector of explanatory variables for the observed individuals, while β represents the impact of the quantified regressors and ε represents random disturbance.
The latent variable, in turn, is subdivided into a set of ordered intervals (Rodriguez Donate & Caceres Hernandez, 2007) observed in reality and which respond to the innate characteristics of the decision or individual (Sanchez Trujillo & Gomez Cabrera, 2008), expressed as follows:

\[ Y = j \quad \text{if} \quad \mu_{j-1} < Y^* \le \mu_j \]
Where: µ represents the thresholds of each possible response to Y, which must be estimated at the same time as β. The probabilistic model which determines choice is defined as follows:

\[ P(Y = j \mid X) = F(\mu_j - X\beta) - F(\mu_{j-1} - X\beta) \]

where F is the cumulative distribution function of ε. If this expression is converted to a logit model, the following equation is obtained:

\[ P(Y \le j \mid X) = \frac{\exp(\mu_j - X\beta)}{1 + \exp(\mu_j - X\beta)} \]

The ordered logit model is based on the parallel lines assumption. This signifies that the β parameters calculated in each equation for each of the thresholds or categories are the same. However, it may occur that the parallel lines assumption does not hold for one or all of the explanatory variables, in which case the ordered logit estimates are no longer appropriate. Faced with this situation, which may be confirmed by applying a Brant test (1990) or a Likelihood-ratio test (Wolfe & Gould, 1998), a generalized ordered logit model is used (Chica Gomez, Galvis Gutierrez, & Ramirez Hassan, 2010).
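As an illustration of how the thresholds and coefficients translate into level probabilities, the following is a minimal Python sketch of the ordered logit expressions above. The function names, the two regressors and all numerical values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, beta, thresholds):
    """Return [P(Y = 1), ..., P(Y = M)] for one individual.

    x, beta: sequences of equal length (regressors and their coefficients).
    thresholds: increasing cut points mu_1 < ... < mu_{M-1}.
    """
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    # Cumulative probabilities P(Y <= j) = F(mu_j - x'beta); the last one is 1.
    cumulative = [logistic_cdf(mu - xb) for mu in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical coefficients: an urban dummy and age in years.
beta = [0.8, -0.1]
# Two hypothetical cut points give three levels: low, middle, high.
thresholds = [-1.0, 2.0]

rural = ordered_logit_probs([0, 17], beta, thresholds)
urban = ordered_logit_probs([1, 17], beta, thresholds)
print("rural:", [round(p, 3) for p in rural])
print("urban:", [round(p, 3) for p in urban])
```

Because the same β applies at every threshold (the parallel lines assumption), a positive coefficient shifts probability mass away from the lowest level and towards the highest one.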
The generalized ordered logit model is based on the premise of the existence of diverse options or alternatives which can be ordered. The procedure, as outlined by Fu (1998), consists of simultaneously calculating M - 1 regressions in which the dependent variable is dichotomized. In this model, as opposed to the ordered logit, the coefficients associated with the different explanatory variables may vary between regressions. Their calculation is carried out using the maximum likelihood estimation method, based on the cumulative distribution function of the logistic distribution (Bustamante & Arroyo Mina, 2008). The expression is as follows:

\[ P(Y > j \mid X) = \frac{\exp(X\beta_j - \mu_j)}{1 + \exp(X\beta_j - \mu_j)}, \qquad j = 1, \dots, M-1 \]

Where: M indicates the number of categories in the model. Each regression therefore implies a combination of categories, according to their order. For example, if the model contains M=3, then the first regression compares category 1 with categories 2 and 3, while the second compares categories 1 and 2 with category 3. In this sense, positive β values indicate a higher probability of being in a higher category, while negative values indicate that increments in the associated explanatory variable increase the probability of remaining in the current category or descending to a lower one (Fullerton, 2009; Small, 1987).
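The difference with respect to the ordered logit can be sketched in a few lines of Python: each of the M - 1 binary regressions has its own coefficient vector. The function names and all values below are hypothetical; the second coefficient is set to 0.78 only so that the implied odds ratio for the highest level is close to 2.18, and the thresholds are arbitrary.

```python
import math


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def gologit_probs(x, betas, thresholds):
    """Return [P(Y = 1), ..., P(Y = M)] under a generalized ordered logit.

    betas[j] is the coefficient vector of the j-th binary regression,
    which contrasts the categories above cut point j with those at or below it.
    """
    exceed = []
    for beta_j, mu_j in zip(betas, thresholds):
        xb = sum(xi * bi for xi, bi in zip(x, beta_j))
        exceed.append(logistic_cdf(xb - mu_j))  # P(Y > j)
    # P(Y = j) is the difference between consecutive exceedance probabilities.
    bounds = [1.0] + exceed + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]


# One regressor (a hypothetical male dummy) whose effect differs by cut point.
betas = [[0.0], [0.78]]
thresholds = [-1.0, 2.0]

female = gologit_probs([0], betas, thresholds)
male = gologit_probs([1], betas, thresholds)
print("female:", [round(p, 3) for p in female])
print("male:  ", [round(p, 3) for p in male])
```

In this sketch the regressor leaves the probability of the lowest level unchanged but raises the probability of the highest one, a pattern the ordered logit cannot represent. One known caveat of the unconstrained model is that, for some combinations of coefficients, the implied category probabilities can turn negative, so fitted values should be checked.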
It may be the case, however, that only some of the model's explanatory variables violate the parallel lines assumption. In this instance, which may be confirmed by applying the Wald test (Williams, 2006), a model adapted to such a situation may be obtained, showing the coefficients of the variables which did not violate the assumption of parallel lines together with those which did. In this case, a model that partially relaxes the said assumption would be obtained (Long, 1997; cited in: Kaplan & Prato, 2012):

\[ P(Y > j \mid X, Z) = \frac{\exp(X\beta + Z\gamma_j - \mu_j)}{1 + \exp(X\beta + Z\gamma_j - \mu_j)}, \qquad j = 1, \dots, M-1 \]

Where: β represents the coefficients of the explanatory variables X which did not violate the parallel lines assumption, while γ_j represents the coefficients of the explanatory variables Z which violated it.

Variables considered in the model
In order to analyze the socioeconomic determinants of education quality in the Canal del Dique y Zona Costera region, calculations were performed with two models based on the results of two of the five tests included in the Saber 11 exam: Critical Reading and Mathematics. These two tests were chosen due to the implications that language and mathematics have for the development of linguistic, communicative and logical-mathematical reasoning competencies in young people, which will be useful in solving their day-to-day problems and which will facilitate their successful integration into society. Moreover, both areas of knowledge are integrated into the generic competencies covered in higher education and will serve to ensure that future professionals graduate sufficiently prepared to respond to the demands of the current labor market regardless of the program they pursue, its methodology or the institution where they are enrolled (Ministerio de Educación Nacional [State Ministry of Education], 2009).
In light of the above, the level obtained by each student in the test under assessment (Critical Reading level and Mathematics level) is taken as the dependent variable of each model. The categories are based on the ordering established by the ICFES in assessments carried out before 2014: low (0-30 points); middle (30.01-70 points); and high (70.01-100 points). The levels obtained by each student in the Critical Reading and Mathematics tests are thus expressed by a limited discrete qualitative variable. The explanatory (independent) variables, in turn, are those reflecting the socioeconomic conditions of the students taking the test, collected by the ICFES at the time of the student's enrollment, months before the test date. For monthly household income, the categories defined are: less than 1 SMMLV; between 1 and less than 2 SMMLV; between 2 and less than 3 SMMLV; between 3 and less than 5 SMMLV; between 5 and less than 7 SMMLV; between 7 and less than 10 SMMLV; and 10 or more SMMLV. The calculation was carried out using version 12 of the Stata econometric program, traditionally used for calculating this type of model. The steps for calculating each model are the following:
• Step 1: calculation of the ordered logit model, under the null hypothesis of parallel lines.
• Step 2: application of the Brant and Likelihood-ratio tests to verify the parallel lines assumption. The null hypothesis is that the parallel lines assumption holds, tested at a 95% confidence level.
• Step 3: where the null hypothesis is rejected, a generalized ordered logit is calculated.
• Step 4: a Wald test is applied in order to determine the explanatory variables for which the parallel lines assumption does not hold.
• Step 5: calculation of the adapted generalized ordered logit model, keeping a single coefficient for those variables for which the parallel lines assumption holds and allowing the coefficients to vary for those for which it does not.
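The coding of the dependent variable and the decision logic of the five steps can be sketched as follows. This is a simplified Python illustration, not the Stata procedure used in the study: the function names and the p-values in the usage example are hypothetical, and treating a rejection by either test as sufficient to abandon the ordered logit is an assumption, since the text does not state how disagreement between the two tests is handled.

```python
def score_to_level(score):
    """Map a 0-100 test score to the level used as the dependent variable."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score <= 30:
        return "low"
    if score <= 70:
        return "middle"
    return "high"


# Ordinal codes: only the order matters, not the distance between codes.
LEVEL_CODE = {"low": 1, "middle": 2, "high": 3}


def select_model(brant_p, lr_p, wald_p_by_variable, alpha=0.05):
    """Return (model name, variables whose coefficients vary by level).

    brant_p, lr_p: p-values of the global parallel lines tests (Step 2).
    wald_p_by_variable: p-value of the per-variable Wald test (Step 4).
    """
    # Step 2: parallel lines not rejected by either test -> keep ordered logit.
    if brant_p >= alpha and lr_p >= alpha:
        return "ordered logit", []
    # Step 4: flag the variables for which the assumption is rejected.
    free = sorted(v for v, p in wald_p_by_variable.items() if p < alpha)
    # Step 3: every variable (or none individually) flagged -> fully generalized.
    if not free or len(free) == len(wald_p_by_variable):
        return "generalized ordered logit", sorted(wald_p_by_variable)
    # Step 5: only some variables flagged -> partially constrained model.
    return "partial generalized ordered logit", free


# Hypothetical p-values, for illustration only.
print(score_to_level(30), score_to_level(30.01), score_to_level(85))
print(select_model(0.40, 0.35, {"gender": 0.50, "age": 0.60}))
print(select_model(0.01, 0.02, {"gender": 0.01, "age": 0.40,
                                "parents_education": 0.03}))
```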

EMPIRICAL RESULTS AND DISCUSSION
This section discusses the results on the socioeconomic determinants of education quality for the Canal del Dique y Zona Costera region for the Critical Reading and Mathematics tests. For the calculation, Barranquilla and Cartagena were excluded from the analysis so as to avoid possible bias in the results, on the one hand because they account for the highest proportion of students, and on the other because they are home to the top educational establishments in the entire region. Inconsistent data from some of the variables (130 entries in total) were also excluded from the database. After this cleaning, the database was left with a total of 7,441 entries, equivalent to the number of students who took the Saber 11 exam in the second half of 2014 (Table 1). At municipal level, Sabanalarga (14.66%), Arjona (12.48%), Turbaco (10.67%) and Maria La Baja (8.49%) are the municipalities with the highest number of students included in the model to be calculated. In addition, the output tables have been organized so as to systematize the coefficients for each variable, the standard error, the odds ratio³ (or probability of moving from one level of scoring to another) and the significance level. For their interpretation, it is necessary to bear in mind that the coefficients show a comparison between all levels under consideration: low, middle and high. For this reason, the sign of each coefficient indicates the direction of the effect. As such, a positive coefficient indicates that higher values of the explanatory variable imply a greater probability that the student is in a higher test category, while a negative coefficient indicates that higher values of the explanatory variable increase the likelihood of being in a lower category (Williams, 2006). The odds ratio, for its part, shows the probability of being in a higher or lower category, depending on the sign (positive or negative) of the coefficient for the variable under study and based on the assumption that the rest of the explanatory variables remain constant.
In the case of the Critical Reading test, an ordered logit model was calculated. Subsequently, the Brant test (1990) was applied in order to determine whether any variable in the model violates the principle of parallel lines. The Likelihood-ratio test posited by Wolfe & Gould (1998) was also employed. The results of both tests indicated acceptance of the null hypothesis, indicating that the coefficients do not differ among the models that may arise from the categories of the dependent variable.⁴
The results of the parameters calculated in the ordered logit model for the Critical Reading test demonstrate that age has a negative effect. In fact, an additional year of age in the individual taking the test decreases the probability of obtaining a better result in the test by 8.4%. It is necessary to point out that both the father's occupation and the SISBEN level (as it increases) also drastically reduce the possibility of achieving a better performance. In particular, a student whose father is a small businessman or whose family has a SISBEN level greater than 3 has a lower probability of achieving a better performance in the Critical Reading test, by 88.1% and 92.5% respectively.
On the other hand, living in an urban area increases the probability of a student obtaining better results by 29.9%. This is understandable, since by living in urban areas (department capital), families have a wider range of educational institutions than those living in rural areas, such that they may have the opportunity to enroll their children at an institution with a higher quality of education. A curious case occurs with the variable for socioeconomic strata, since students situated in stratum 2 have a 156.5% higher probability of a better performance, whereas being situated in stratum 4 increases the probability by 1,588.9%. This last result may be explained by the following: a student whose house is in stratum 4 likely has access to better facilities, including higher quality educational institutions.
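The percentages reported above can be related back to odds ratios and coefficients with a short calculation. The Python sketch below assumes that each percentage was obtained as (odds ratio - 1) × 100, which the text does not state explicitly; under that reading, the figures describe changes in the odds of reaching a higher level rather than in the probability itself. The function names are illustrative, and the resulting odds ratios and coefficients are back-calculated, not taken from Table 2.

```python
import math


def odds_ratio_from_percent(percent_change):
    """Odds ratio implied by a percentage change in the odds."""
    return 1.0 + percent_change / 100.0


def coefficient_from_odds_ratio(odds_ratio):
    """Logit coefficient implied by an odds ratio (odds ratio = exp(coefficient))."""
    return math.log(odds_ratio)


# Percentages as reported in the text for the Critical Reading model;
# reductions are entered with a negative sign.
reported = {
    "age (one more year)": -8.4,
    "father small businessman": -88.1,
    "SISBEN level above 3": -92.5,
    "urban area": 29.9,
    "stratum 2": 156.5,
    "stratum 4": 1588.9,
}

for name, pct in reported.items():
    ratio = odds_ratio_from_percent(pct)
    coef = coefficient_from_odds_ratio(ratio)
    print(f"{name}: odds ratio {ratio:.3f}, coefficient {coef:+.3f}")
```

Read this way, a value such as 1,588.9% is less surprising than it first appears: it corresponds to odds roughly 17 times larger, which is compatible with a small number of stratum 4 students concentrated in the highest level.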
It is worth noting that in the model's calculations a student's gender was not statistically significant. This result concurs with those obtained by Rivera (2011), which may indicate that the system does not discriminate between men and women, at least in the case of this test. The same was true for parents' level of education, level of overcrowding and monthly household income (Table 2). This may be due to the fact that these variables had highly homogeneous characteristics among students. For example, at regional level, the majority (52.76%) of parents had secondary education (completed or not completed), the average ratio of people to bedrooms was 2.2 (SD = 0.91) and most household incomes (56.65%) were less than the prevailing minimum legal monthly salary.
For the second model, that of the Mathematics test, an ordered logit model was also calculated. However, the results of the Brant and Likelihood-ratio tests indicate a violation of the parallel lines assumption.⁵ Because of this, and in accordance with the procedure proposed by Fu (1998), a generalized ordered logit model was calculated for the said test. As previously noted, it may be the case that the failure of the assumption does not apply to all variables but only to some of them. In order to identify those variables, the Wald test was applied, which makes it possible to observe, variable by variable, whether or not the parallel lines assumption holds (Williams, 2006). The test revealed that the assumption does not hold for the explanatory variables gender and parents' level of education. The other variables maintain the same coefficient by level. For this reason, the model was adjusted so that the coefficients for which the assumption holds remain constant across levels, while the others are allowed to vary. The analysis, which produces results both for the low and middle levels, is carried out based on the Odds Ratios.
The results indicate that gender is not significant at low levels of quality. However, being a man increases the probability of obtaining a high level in the Mathematics test by 2.18 times, as opposed to obtaining an average or lower level. This means that there is no discrimination when viewed from a low level of quality; at a medium level of quality, however, it begins to emerge. On the other hand, and as with the Critical Reading test, an additional year of age of the student taking the test reduces the probability of obtaining a better result by 0.91 times. The results for socioeconomic stratum and SISBEN level are also similar to those obtained in the aforementioned test. As regards the first variable, belonging to stratum 2 increases the probability of obtaining a better result by 1.85 times, while the increase in probability is 4.84 times for those in stratum 4. With respect to the second, belonging to a SISBEN level other than the lowest levels (one, two or three) reduces the possibility of achieving a better performance in this test by 0.07 times.
For this model, parents' level of education was significant in 3 of its 10 categories, increasing the probability of better test results for students whose father or mother has spent a greater number of years in education. In particular, students whose parents have completed a technical and/or technological degree or have taken several semesters of a university degree program have an increased possibility of achieving a better performance in the Mathematics test (3.01 and 6.99 times, respectively). Likewise, from mid-level upwards, where one of the parents has completed a university degree, the probability of obtaining a higher level in the said test increased by 8.71 times. This result may be explained by the fact that parents who have a higher level of education are able to provide more support for their children's education, from choosing the best educational institution to solving homework difficulties.
In contrast to the Critical Reading test, the Mathematics test revealed that an increase in parents' level of income was indeed a significant factor in increasing the quality of results. In this case, a monthly family income of between 2 and less than 3 SMMLV, or between 5 and less than 7 SMMLV, increases the probability of obtaining a high level by 2.63 and 6.54 times respectively. It is worth noting that neither the student's location, nor the father's occupation, nor the level of household overcrowding were determining factors for this model (Table 3).
Source: Prepared by the authors, based on ICFES-CHECK data for the second half of 2014.

CONCLUSIONS
The primary objective of this article was the identification and systematization of the socioeconomic determinants of secondary education quality in the Canal del Dique y Zona Costera region, based on the Critical Reading and Mathematics tests taken by 7,441 students in 2014. The results indicate that problems with institutional structures faced by the region have affected the quality of the education system, resulting in very low levels of quality, on a scale even more serious than the national average. Specifically, the Critical Reading and Mathematics tests show results which are lower than the national results, being more unfavorable for those municipalities of Canal del Dique belonging to the Bolivar department. This indicates poor student competencies with respect to those expected at this level of education, placing them at a disadvantage as regards their interaction with the region's production system and therefore their support in the promotion of local development.
As regards the analysis of the determinants of quality, the results of the ordered logit model, as applied in the determination of the variables which explain the results obtained in the Critical Reading and Mathematics tests⁶ of the Saber 11 exam, reveal that in the case of the former, the variables for age, father's occupation (in particular that of small businessman) and SISBEN level higher than 3 have a negative effect, as they lower the probability of a student obtaining a better result in the test. On the other hand, the probability of a student obtaining better results increases if he or she lives in an urban area and/or belongs to a higher socioeconomic stratum (strata 2 and 4). In the case of the Mathematics test, and as with the Critical Reading test, being older or belonging to a SISBEN level other than the lowest levels reduces the probability of a student scoring in a higher level, while being a man and/or belonging to strata 2 or 4 increases the said probability. Parents' level of education, for its part, contributes to an increase in the probability of placing in a high level in the Mathematics scoring, this being likewise observed for higher levels of income (between 2 and less than 3 SMMLV, or between 5 and less than 7 SMMLV).
As applied to the Saber 11 exam Critical Reading and Mathematics tests, the results of the ordered logit model demonstrate that, for the former, age, father's occupation and SISBEN level lower the probability of a student achieving an outstanding performance in the test, while the said probability increases if the student lives in an urban area or belongs to a higher socioeconomic stratum (2 and 4). As for the latter test, being older or belonging to a SISBEN level other than the lowest levels reduces the probability of scoring in a higher level, while being a man and/or belonging to strata 2 or 4 increases the said probability. Parents' level of education, for its part, contributes to an increase in the likelihood of scoring in a higher Mathematics level, with the same being true for higher levels of income (between 2 and less than 3 SMMLV, or between 5 and less than 7 SMMLV).
The problems of secondary education quality analyzed in this article, added to those existing in terms of enrollment and relevance, reaffirm the poor capacity of the regional education system to train the human capital required for strengthening its production system and driving local development in a sustainable way. For this reason, the formulation of public policies which effectively contribute to redirecting the region onto the path to development is urgently needed. Hence the contribution of this article in identifying the most significant determinants of the quality of secondary education must also be noted, so as to be able to establish, alongside general policies, other more specific measures and actions with a greater guarantee of success.


• Gender (Dummy): masculine or feminine.
• Age: the student's age in years.
• Location (Dummy): this variable allows for the identification of students as living in rural or urban areas.
• Socioeconomic stratum (Dummy): this classification was established by Article Nº 102 of Ley 142 de 1994 [Law 142 of 1994] for residential buildings to which public services are provided, for the purpose of targeting subsidies and the collection of contributions. There are six socioeconomic strata: low-low; low; middle-low; middle; middle-high; and high.
• Parents' level of education (Dummy): identifies the highest degree of education obtained by one of the parents. There are 10 defined categories: primary not completed; primary completed; secondary (high school diploma) not completed; secondary (high school diploma) completed; technical or technological education not completed; technical or technological education completed; university education not completed; university education completed; and postgraduate.
• Father's occupation (Dummy): establishes the current occupation of the father in the family. There are 12 defined categories: businessman; small businessman; employee in director or general manager position; management-level employee; technical or professional-level employee; assistant or administrative-level employee; laborer or operator; independent professional; freelance worker; homemaker; pensioner; and other activity or occupation.
• SISBEN Level (Dummy): allows for identification of level in the Sistema de Identificación de Potenciales Beneficiarios de Programas Sociales [System of Identification of Potential Beneficiaries of Social Programs, abbreviated as SISBEN in Spanish]. The categories are as follows: level 1; level 2; level 3; classified at other SISBEN level; and not classified by SISBEN.
• Overcrowding: corresponds to the quantitative variable that is the result of dividing the number of people currently living in a residence by the total number of bedrooms in the residence.
• Monthly household income: allows for determination of a household's level of income, measured in Salarios Mínimos Mensuales Legales Vigentes [Prevailing Minimum Monthly Legal Salaries, abbreviated as SMMLV in Spanish].²

Table 1
Municipal distribution of data collected for calculation of ordered logit model

Table 2
Ordered logit model results for Critical Reading test in Canal del Dique y Zona Costera Region
Source: Prepared by the authors, based on ICFES data for the second half of 2014.

Table 3
Results of generalized ordered logit model for Mathematics test in Canal del Dique y Zona Costera Region