An Empirical Case of Education Policy Implementation in Serbian VET

Purpose: Education policy implementation is as important as policy design. This study applies a literature-based, multi-dimensional framework for success factors and barriers to vocational education and training (VET) reform implementation in the case of a new dual VET law in Serbia. We use the framework to assess factors related to implementation, then relate these factors to actual implementation progress to determine how factors relate to progress. In this application of the framework, we examine whether implementation success requires high scores in every dimension. Methods: This is a mixed methods study. We conduct document analysis of key resources related to the structure and intention of the reform. We also statistically analyze a dataset of two rounds of interviews conducted during the pre- and early-implementation phases. These interviews include key stakeholders from the public and private sectors, and from national, regional, and local-level actors. We examine how the framework's dimensions and determinants relate to implementation progress. Results: The implementation of the law is moving forward in Serbia, making this a successful case of progress in policy implementation. Despite this progress, the factors for implementation are not all strong. We find that the content dimension of the framework is a


Introduction
The science describing the best policies and practices in vocational education and training (VET) has moved forward dramatically in recent years, to the point that policymakers can make evidence-based choices about wide-ranging topics from learning styles in VET (e.g., Jossberger et al., 2010; Morris, 2018) to the connection between learning and work (e.g., Bolli et al., 2018; Rintala & Nokelainen, 2020) or VET for international development (e.g., Li & Pilz, 2021; McGrath et al., 2020). Research continues to demonstrate the value of innovative programs and strategies, and impact-oriented politicians are converting findings into policy (e.g., Gulikers et al., 2018; Zancajo & Valiente, 2019). One of the next challenges for VET research is providing evidence to support the implementation of new policies. Currently, many jurisdictions are engaged in implementing new or changed VET policies. Although VET is a special case of education policy, it is the majority program at the upper-secondary level in numerous countries. Evidence-based implementation is particularly challenging in this field given the limited research on VET reform implementation (Fluitman, 1999; Holmes, 2009) compared to the more extensive evidence supporting reform and policy design (e.g., Ceric et al., 2020; Dumbrell & Smith, 2013; Gillis, 2020). Therefore, this study sets out to investigate whether key factors from the literature related to VET reform implementation apply in practice. Our aim is to begin applying a quantitative lens to the implementation process so we can make comparisons and derive lessons that apply across contexts. This strand of research should eventually help policymakers avoid costly implementation pitfalls and failures.

Caves, Oswald-Egg
We focus on the implementation process of a new dual VET law in Serbia. This case is useful because various actors are involved in the implementation process, including representation from the education and employment systems (Bolli et al., 2018). Furthermore, the president of Serbia is invested in the implementation process because he is convinced by evidence that dual VET has important advantages over school-based VET. This high-level interest and the choice to start the reform by drafting and passing a law make the Serbian case an example of top-down implementation, which makes the implementation process more easily observable and traceable.
To analyze the implementation of the new Law on Dual Education (LDE) in Serbia, we use a dataset of stakeholder interviews and combine it with field study notes. We specifically investigate whether each dimension of the VET implementation framework supported the implementation of the law, then evaluate whether all dimensions were necessary for implementation progress. We find that the key determinants in three of the framework's five dimensions are drivers of implementation progress, determinants in one dimension are unclear, and the final dimension appears to be a barrier to implementation progress. Despite this, we observe implementation progress over the period of study. From this, we argue that it is not necessary for all key determinants in all dimensions to be drivers of implementation progress for the implementation process to proceed.

Theory
The evolution of policy implementation research is typically described in three generations (Goggin et al., 1990; Pülzl & Treib, 2017). First, in the 1970s, came case studies of specific policy implementation processes that identified individual variables like hierarchical complexity (Pressman & Wildavsky, 1973) or conflict among implementers (Jeffrey, 1978). Second, through the 1980s, came a wave of discourse on top-down compared to bottom-up implementation, as well as the first analytical frameworks for implementation (e.g., Mazmanian & Sabatier, 1983). The third and ongoing generation of implementation research has focused on developing theoretical frameworks that can be operationalized, tested, and applied across policy fields (Nilsen, 2015).
In VET specifically, implementation research tends to take the form of standalone case studies. These studies examine the implementation of new programs (e.g., Brodie et al., 1995; Schmees, 2020), practices (e.g., Runhaar & Sanders, 2013; Tudor, 1991), or policies (e.g., Dalby & Noyes, 2018; Zancajo & Valiente, 2019). These case studies typically describe individual VET implementation processes and draw conclusions about why they succeed or fail (e.g., Marhuenda-Fluixá et al., 2019). However, it is difficult to aggregate the results of many case studies into a consistent set of findings, an issue in VET research even outside the sub-field of implementation (e.g., Gessler & Siemer, 2020; Scheuch et al., 2021). Without frameworks or efforts to place individual case findings in a broader context, the literature demonstrates very little development over time and fails to coalesce around any shared findings or key ideas.
Caves et al. (2021) review the available literature on VET policy implementation and attempt to capture and organize the lessons of the many available case studies. They code the success factors and barriers reported in each study into a framework based on Najam's (1995) 5C Protocol.
These factors are the determinants (independent variables that are barriers or drivers to implementation) that comprise the determinant framework (Nilsen, 2015). The framework's key determinants are the most important implementation drivers, which come up frequently and are consistently important for implementation success. We focus on the key determinants in each dimension with the most supporting evidence in the existing literature according to Caves et al. (2021).
The framework has been applied in other VET implementation case studies. Bolli et al. (2020) operationalize the framework to predict reform up-scaling in Nepal, finding in their pre-implementation analysis that the planned reform scores highly in all dimensions. Vieira et al. (2021) apply the framework to a VET program for in-service teachers in Albania, arguing that the in-service training approach is sustainable because it meets the criteria in all five dimensions and especially prioritizes context fit, commitment from the Ministry of Education, and the inclusion of universities as key actors. Although the framework has been applied to indicate why policy implementation should succeed, it needs further testing to determine if it captures the right factors and to examine how key determinants and dimensions interact to affect implementation progress.
To examine how the framework relates to implementation progress, we develop detailed hypotheses for each dimension based on its key determinants. Figure 1 shows the key determinants in the framework by dimension.

Figure 1: VET Implementation Framework (adapted from Caves et al., 2021)
Note: This figure shows the theoretical framework of policy implementation, adapted to VET reform implementation, consisting of five dimensions, each with two to three key determinants.

In the Content dimension, the key determinants are strategy and accountability. Najam (1995) defines this dimension as "the Content of the policy itself-What it sets out to do (i.e., goals); how it problematizes the issue (i.e., causal theory); how it aims to solve the perceived problem (i.e., methods)" (Najam, 1995, p. 4). Strategy is a broad determinant, covering whether there is a sense of clarity, strategy, or vision in the reform as opposed to confusion, short-termism, or a feeling that things are unclear. Accountability is the presence of quality assurance measures, regulations, and accountability as opposed to the lack thereof.
Hypothesis 1 (H1): The Content dimension, represented by the determinants strategy and accountability, is necessary for implementation to proceed.
The Context dimension covers "The nature of the institutional Context-The corridor (often structured as standard operating procedures) through which policy must travel, and by whose boundaries it is limited, in the process of implementation" (Najam, 1995, p. 4). Its key determinants are coordination and context fit. Coordination is the orderly organization of activities from multiple actors toward the reform's goals, efficiency, and good management as opposed to bureaucracy. Context fit is the appropriateness of the project and process for the institutions, culture, and other context factors of the target area, as opposed to a mismatch or bad fit.

Hypothesis 2 (H2): The Context dimension, represented by the determinants coordination and context fit, is necessary for implementation to proceed.
The third dimension, Commitment, entails "The Commitment of those entrusted with carrying out the implementation at various levels to the goals, causal theory, and methods of the policy" (Najam, 1995, p. 4). Its key determinants are political will and cooperation. Political will is general demand for the reform among stakeholders like political leaders, teachers, students, and parents. This is opposed to supply-side reform that may be met with disinterest and opposition from the public and leaders. Cooperation is willingness to work together both within and across institutions, as opposed to conflict.
Hypothesis 3 (H3): The Commitment dimension, represented by the determinants political will and cooperation, is necessary for implementation to proceed.
Capacity is "The administrative Capacity of implementers to carry out the changes desired of them" (Najam, 1995, p. 4). Its key determinants are personnel, finances, and information. Personnel covers the people needed to carry out the work of implementation, both quantitatively in terms of their availability and numbers and qualitatively in terms of their specific skills and knowledge. Finances are the money needed to hire new people, make new materials, develop new processes, and communicate information. Finally, information covers research evidence informing policymaking, general information on best practices, and reform evaluation.
Hypothesis 4 (H4): The Capacity dimension, represented by the determinants personnel, finances, and information, is necessary for implementation to proceed.
The last dimension, Clients, includes "The support of Clients and Coalitions whose interests are enhanced or threatened by the policy, and the strategies they employ in strengthening or deflecting its implementation" (Najam, 1995, p. 4). Caves et al. (2021) focus on the engagement of individual actor levels and types to differentiate this dimension from the others, which focus on the different kinds of engagement by interest groups and institutions.
The key determinants in this dimension are employers, intermediaries, and educators. Employers reflect engagement with actors from the employment system for VET design, delivery, and updating. Intermediaries are industry associations, trade unions, and other facilitating bodies. Educators are actors from the education system including education governance, school leaders, and teachers (as well as teachers' unions).

Hypothesis 5 (H5): The Clients dimension, represented by the determinants employers, intermediaries, and educators, is necessary for implementation to proceed.
Finally, all of the above hypotheses imply that all dimensions are necessary for implementation progress. Therefore, we develop a hypothesis to that effect to test whether all dimensions are in fact required for the implementation process to continue.
Hypothesis 6 (H6): Every dimension (Content, Context, Commitment, Capacity, and Clients) is necessary for implementation to proceed, as is every key determinant.
These are the specific hypotheses we test using empirical data from the Serbian case. The next section describes that case in detail.

Serbian Case
The reform in question is the implementation of a VET law in Serbia called the Law on Dual Education. VET is the majority upper-secondary program in Serbia, serving 75% of students in each cohort (ETH Zürich, 2017). VET in Serbia is provided through upper-secondary VET schools where students spend a total of three or four years (depending on their occupational profile). Before the new law was introduced, upper-secondary VET in Serbia comprised only VET schools delivering school-based VET. After the law's introduction, VET schools could deliver school-based VET under the previous law and/or dual VET under the new law, the latter regulating work-based learning.
In the school-based model that existed before the new law, students learned general education content (30-40%) and vocational content (55-65%) comprising both vocational theory and vocational practice. The curriculum also included some elective subjects (5%). Practical skills were delivered almost entirely through "professional practice" in school-based workshops with only very little (infrequent and inconsistent) work-based learning, so VET in Serbia was a school-based program. This program was defined primarily by the Law on Secondary Education and its associated bylaws and rulebooks.
The Law on Dual Education was introduced in 2017 to add a dual VET program to Serbia's upper-secondary landscape. Education governance in Serbia is highly centralized, and both the drafting and implementation of the law were top-down processes with limited consultation. Unlike school-based VET, dual VET is the specific type of VET where students spend at least 25% of total program time in work-based learning as opposed to school-based learning (Organisation for Economic Co-operation and Development [OECD], 2017).
The overall goal of the new law was to increase VET quality and relevance and eventually to reduce two major national problems: very high youth unemployment and high rates of young people not in employment, education, or training (NEET), especially among young people with upper-secondary qualifications. Although the law was added in the context of Serbia's efforts to enter the European Union, the political dialogue we observed focused on dual VET as a measure to improve VET's perceived low quality. Other education-system measures like the development of a National Qualifications Framework were more directly tied to EU entry.

Content of the Law on Dual Education
The new law formalized and nationally regulated dual VET at the upper-secondary level, as opposed to the existing program where VET schools and companies could collaborate on an ad hoc basis without specific regulation. Although the previous law governing school-based VET allowed for work-based learning, it was uncommon and usually in very small amounts. Overall, work-based learning experiences in the existing school-based program were extremely inconsistent, there were no skill- or competency-related outcomes assigned to the work-based learning experience, and there was no formal oversight for student safety or quality assurance while at work. Moreover, students were not protected by contracts or by a requirement to have trainers or supervisors, nor were they compensated if they made productive contributions to their training companies.
The law on dual VET set out to address those issues, implementing a new apprentice-company matching process and regulations covering how much time should be spent on workplace learning, students' remuneration and non-monetary compensation, companies' participation in career guidance and counseling, training certification for companies, instructor licensing for in-company trainers, and contracts for both the student-company and school-company relationships (Serbian Government, 2017).

Implementation of the Law
The Ministry of Education was charged with overseeing implementation along with the Chamber of Commerce, which represents companies. The Ministry of Education's regional offices, called Regional School Administrations, were required to support the Ministry of Education in implementation. Similarly, the Chamber of Commerce used its own regional offices to provide ground-level support. The Chamber of Commerce is responsible for certifying companies and for training and licensing in-company instructors. Each school and company must follow the new processes and regulations when implementing dual VET, although both could choose whether (and for how many occupations and students) to implement dual VET or stick with the existing school-based program.

Serbia promulgated the new law in 2017, and full-scale implementation began in the 2019-2020 school year. Some pilots led by international donor organizations, such as development agencies from Germany and Austria, had been underway since 2013. In the first year, schools could continue offering VET under the old regulations with no changes. Because the old regulations allowed for unregulated work-based learning, this created competition between VET programs during the 2019-2020 school year, resulting in low uptake (Renold et al., 2020a). Essentially, schools could still collaborate with companies to offer small amounts of work-based learning without contracts, company certification, licensed instructors, curricula, or (most importantly) compensation for students. This made the old school-based program not only much easier but also much cheaper for companies (Bolli et al., 2021), who were unwilling to switch to a regulated and paid model when they were not required to do so. Schools were unwilling to switch because they would lose full control in the new, more cooperative model and because they perceived increased workplace learning as a threat to teachers' importance (Renold et al., 2020b). Therefore, 2020 amendments to the Law on Dual Education and the Law on Secondary Education drastically reduced and limited the work-based learning allowed under school-based VET. This helped drive increased participation in dual VET in the 2020-2021 school year.
Thus far, implementation has led to significant changes in the key processes outlined by the law. The number of classes of dual VET in the Ministry of Education's enrollment plans has steadily increased, and nearly every key tenet of the law has begun to translate from goals to action (Renold et al., 2020a; Renold et al., 2021). Thirty-three occupational profiles were available in the 2018-2019 school year, and 54 were available by the 2021-2022 school year. However, implementation is not perfect: Indicators for requirements like company accreditation, instructor licensing, student remuneration, and training contracts are all near 70% in the second year of implementation. By contrast, in schools and occupational profiles that are still using school-based VET, those indicators are typically less than 10%, ranging between 0% and 44% (the latter for school-company contracts, which were already common although not previously required; Renold et al., 2020a). Therefore, we treat this implementation case as one that is making progress.

Materials and Methods
Our goal is to categorize each dimension as driving implementation progress or acting as a barrier based on the data related to the determinants in each dimension. To do so, we review documents and field notes and evaluate interview data. In the following, we describe the data sources and the methods we use to classify the dimensions.

Documents and Field Notes
Relevant documents for our analysis are reform documents such as the law itself, its bylaws, and research on the reform (Renold & Oswald-Egg, 2017). The content and set up of these documents contain valuable information for our study. Additionally, we review field notes made during four visits to Serbia over a period of 1.5 years plus virtual visits and meetings for another 1.5 years. These include our individual field notes taken during the visits and an internal summary document written immediately after each visit. The field notes capture our observations in meetings and discussions with leaders and stakeholders in Serbia's VET sector.

Interview Dataset
We use a dataset of stakeholder interviews carried out in late 2018 and early 2019 (referred to as the 2019 wave) and in early 2020 (referred to as the 2020 wave). The data was collected for a series of policy reports on implementation progress (Renold et al., 2019, 2020a, 2020b). Our involvement during data collection for the reports was in the interview design, data cleaning, and data analysis. We did not carry out the interviews; a local think tank in Serbia conducted them in Serbian, in person or over the phone. The interviews were then translated into English, and the English version is the one we analyze in this study. For this study we apply that existing dataset to the task of testing the implementation framework. The interview data captures the overall attitude of deeply involved stakeholders toward implementation and their specific concerns, challenges, and opportunities with the law overall and with specific issues. The interview questions were structured, though some questions differed among stakeholder groups. For this study, we draw a specific set of questions from the interview dataset to address our hypotheses. Table 1 summarizes the interview questions we used for each hypothesis.
Responses to each interview question include quantitative data in the form of yes-or-no answers or a five-point Likert scale, depending on the type of information needed. They also include optional qualitative open responses. We primarily use the quantitative responses, and include qualitative responses to support and deepen our discussion of results. We draw 212 interviews based on the items we need to test our hypotheses. Table 2 summarizes the interviewed subjects we use by stakeholder group. Subjects in the school and company categories are representatives, sampled to balance geographic representation.
Five schools began piloting dual VET in 2013, 84 implemented it in 2017-2018 (one year before full implementation), and 247 schools were not participating at that time according to the Serbian Ministry of Education. The 2019 sample includes principals and program coordinators from three pilot schools (six interviews total), as well as those that started offering dual VET upon implementation. The data also includes interviews with the principals of 19 schools that did not participate in dual VET in its first year. In the 2020 sample, the data includes 26 schools in dual VET and 12 schools not in dual VET. Of the 600 companies currently involved in education, 18 companies appear in the 2019 dataset along with eight non-participating companies. In 2020, the dataset includes 11 companies in dual VET and seven not involved.

For the government and Chamber of Commerce, the sample of interviewed subjects represents the full population of dual-VET-related actors, who were interviewed in both 2019 and 2020. Although they are the same subjects in both years, we count them as independent observations because the interviews took place a year apart. We do the same with the three respondents from trade unions and five respondents from international donor groups in both sample years. To reduce social desirability bias, interviewers were external to the project and the government, and we ensured respondents' anonymity.

Analytical Methods
To examine H1, which classifies the Content dimension according to its key determinants strategy and accountability, we examine reform documents (e.g., rulebooks, bylaws, communications) and our own field notes from a series of visits to Serbia before and during the implementation period. We apply directed content analysis (Hsieh & Shannon, 2005) to analyze this dimension, using existing research or theory to form our expectations for coding. Our codes specifically search for the key determinants relevant for H1 in the Content dimension: strategy and accountability. Therefore, we examine the notes and documents looking for evidence of procedures and guidance for reform implementers (strategy) and evaluation procedures about the implementation of the new law (accountability). Our analysis is straightforward: If the documents contain information specifying procedures for both strategy and accountability, this dimension is a driver for implementation. If one key determinant is not covered, the Content dimension is unclear. If neither key determinant is covered, this dimension is a barrier.
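The decision rule for the Content dimension can be written out as a small function. This is only an illustrative sketch: the function name and the two boolean inputs are our own assumptions, and the judgment of whether the documents actually cover a determinant remains a qualitative coding decision that the code does not make.

```python
def classify_content_dimension(strategy_covered: bool,
                               accountability_covered: bool) -> str:
    """Map document-analysis findings to a label for the Content dimension.

    The inputs are hypothetical flags recording whether the reform documents
    and field notes specify procedures for each key determinant.
    """
    covered = int(strategy_covered) + int(accountability_covered)
    if covered == 2:
        return "driver"   # both key determinants are covered
    if covered == 1:
        return "unclear"  # exactly one key determinant is covered
    return "barrier"      # neither key determinant is covered


if __name__ == "__main__":
    print(classify_content_dimension(True, False))
```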
To examine H2-H5, which cover the dimensions Context, Commitment, Capacity, and Clients, we apply the existing stakeholder interview dataset. We still use some directed content analysis to provide illustrative quotations from the interviews, but we rely more heavily on the quantitative element of the data. The statistical analyses are simple, mainly relying on descriptive statistics and inferential analysis (analysis of variance) to test whether differences among stakeholder groups are statistically significant. Three key determinants (coordination and context fit in the Context dimension, cooperation in the Commitment dimension) feature questions where participants were asked whether they agree with a statement about that determinant. In these cases, we use the percentage agreement and identify the determinants as drivers or barriers depending on how many interviewees agreed that the determinant was present.
For other key determinants, we have five-point Likert scale data. In every case, five points is the maximum desirable score (adequate resources, strong political will, etc.) and one point is the minimum undesirable score. We examine overall averages and each individual actor group's score to determine whether scores are significantly different from the maximum and minimum. Specifically, we use the standard deviation to calculate a >95% confidence interval (two standard deviations each way from the mean), and if the maximum five-point score falls within that range it is not different from the maximum. If the minimum five-point score falls within the range it is not different from the minimum. If these distributions on average and for all actor groups include the maximum and not the minimum, we code the determinant as a driver. If the distributions include both or neither of the maximum and minimum, we code the determinant as unclear. We also code determinants as unclear if we observe variation across actor groups. If actor groups are generally significantly different from the maximum but not the minimum, the determinant is a barrier.
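The Likert-scale coding rule described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' actual analysis script: the function names are hypothetical, the inputs are summary statistics (mean and standard deviation), and the sketch reads "generally" strictly, so a determinant is coded as a barrier only when the overall distribution and every actor group point that way.

```python
from typing import Iterable, Tuple

LIKERT_MIN = 1.0  # minimum (undesirable) score
LIKERT_MAX = 5.0  # maximum (desirable) score


def two_sd_range(mean_score: float, sd: float) -> Tuple[float, float]:
    """Return the range two standard deviations each way from the mean."""
    return mean_score - 2 * sd, mean_score + 2 * sd


def classify_distribution(mean_score: float, sd: float) -> str:
    """Label one score distribution as driver, barrier, or unclear."""
    low, high = two_sd_range(mean_score, sd)
    includes_max = high >= LIKERT_MAX
    includes_min = low <= LIKERT_MIN
    if includes_max and not includes_min:
        return "driver"
    if includes_min and not includes_max:
        return "barrier"
    return "unclear"  # range includes both or neither extreme


def classify_determinant(overall: Tuple[float, float],
                         groups: Iterable[Tuple[float, float]]) -> str:
    """Combine the overall label with the labels of every actor group.

    Any disagreement among labels is treated as variation across actor
    groups and coded as unclear (a simplifying assumption of this sketch).
    """
    labels = {classify_distribution(*overall)}
    labels.update(classify_distribution(m, s) for m, s in groups)
    return next(iter(labels)) if len(labels) == 1 else "unclear"


if __name__ == "__main__":
    # Hypothetical summary statistics, for illustration only.
    print(two_sd_range(4.0, 0.75))
    print(classify_determinant((4.0, 0.75), [(4.5, 0.5), (3.5, 1.0)]))
```

For example, a mean of 4.5 with a standard deviation of 0.9 gives a range from about 2.7 to 6.3, which includes the maximum score of five but not the minimum of one, so that distribution would be labeled a driver.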
For H6, we assess our findings from all dimensions together. We find support for H6 if all dimensions are classified as drivers based on their key determinants. If some of the dimensions are unclear or are barriers, we reject H6.
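The H6 decision rule amounts to a single check across the five dimension labels. The sketch below is illustrative only: the function name is hypothetical, and the labels in the usage example are placeholders rather than the study's results.

```python
from typing import Dict

DIMENSIONS = ("Content", "Context", "Commitment", "Capacity", "Clients")


def h6_supported(labels: Dict[str, str]) -> bool:
    """Return True only if every dimension is classified as a driver."""
    missing = [d for d in DIMENSIONS if d not in labels]
    if missing:
        raise ValueError(f"missing dimension labels: {missing}")
    return all(labels[d] == "driver" for d in DIMENSIONS)


if __name__ == "__main__":
    # Placeholder labels, not the study's findings.
    example = {d: "driver" for d in DIMENSIONS}
    example["Capacity"] = "unclear"
    print(h6_supported(example))
```

Under this rule, a single dimension coded as unclear or as a barrier is enough to reject H6.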

Results
We present results by dimension, addressing each hypothesis in turn. We start with the Content dimension, followed by the Context dimension, the Commitment dimension, the Capacity dimension, and the Clients dimension. We conclude by considering our findings overall in terms of the framework.

Content
The determinants under Content are strategy and accountability. The strategy determinant covers the clarity and quality of the plan for implementation. The law and its bylaws describe a series of interdependent critical moments throughout the implementation process. For example, companies need to be certified by the Chamber of Commerce and must have trained, licensed instructors before they can host students, which means the Chamber of Commerce had to develop certification and licensing processes in addition to implementing them and training every interested company and instructor. Other processes are similarly complex, requiring multiple steps and a variety of actors. For example, schools match students with companies according to a new process requiring every company in a given occupation to interview every student in that occupation (with their parents) before both sides submit priorities and the school program coordinator allocates students. Although the end goal is relatively clear, the implementation strategy is not fully articulated, and implementation is made more difficult because the project is complex, large-scale, and rapid. According to our field notes, VET leaders in the Ministry of Education and Chamber of Commerce have engaged quickly and deeply with the project, but have very large workloads of process development, process implementation, new partner engagement, and legal interpretation. Many of the key processes required to make dual VET work are not articulated in the law and must be created as needed. We found that stress levels were very high despite a strong, shared desire to make everything work. The strategy determinant in this case captures the large challenges associated with making major changes and innovating under pressure. Therefore, although the law and its bylaws outline a clear goal and stakeholders' effort is very high, the implementation strategy is a barrier in this case.
Accountability is the second determinant in the Content dimension, and is mainly addressed by the bylaw on the evaluation of institutions. This bylaw defines evaluation procedures and indicators, mainly focused on school performance. The accountability measures internal to dual VET are adequate, but they are undermined because the program itself is optional: existing workplace learning under the school-based VET programs could continue, despite being less regulated and not requiring remuneration. Because implementers can choose whether or not they wish to follow the law in delivering dual VET, the accountability of the law is weak. Therefore, the accountability determinant is also a barrier to implementation.
The two determinants under the Content dimension are strategy and accountability, and both are barriers in this case. Therefore, we categorize the Content dimension as a barrier to implementation in this case. H1 states that Content is a necessary condition for implementation progress, but implementation is progressing even though Content is not a success factor. Therefore, we reject H1.

Context
This is the first dimension we address using interview data directly related to its determinants: coordination and context fit. Table 3 summarizes data on these determinants by respondents' actor group.
Overall, 75% of respondents agreed that all the relevant actors and institutions were coordinated for implementation. Non-participating companies, non-participating schools, and trade unions drove the negative responses, while the Ministry and National Bodies, Regional Chamber of Commerce, and both participating companies and VET schools were all optimistic. Most respondents (85%) affirmed that the law fit with the needs of Serbian students and companies, addressing context fit. The trade unions, non-participating schools, and non-participating companies are the least convinced, though most of their respondents still agreed. The qualitative data shows interesting disagreement on who dual VET works for, with comments like "It totally fits with the needs of students, but companies aren't satisfied, sometimes they don't see the benefits from dual VET" (interview wave 1, respondent 81, answer 27) contrasting with "The initiative for the adoption of the law…came from companies" (interview wave 1, respondent 87, answer 27). These two positions are not necessarily polar opposites, but they do illustrate the variation in how interviewees perceive the law's fit with Serbia's context. Overall, however, the consensus appears to be that the law is a good fit for Serbia.

The two determinants under this dimension are both strong. Interview respondents generally agree that the law is a good fit for the Serbian context, and they also report that coordination among institutions for dual VET is generally sufficient. However, it is important to note the variation within these apparently strong numbers: even though most interviewees believed the law fit the Serbian context and coordination was sufficient, a non-negligible number of voices disagreed, especially among non-participating schools and companies. Overall, though, it appears that the Context dimension's determinants represented a success factor for implementation. Therefore, we find support for H2.

Commitment
Two questions in the interview dataset directly address political will and cooperation. Table 4 shows institutions' willingness to implement by actor group. On a five-point Likert scale, the average response was 4.5. The highest scores came from the Ministry and National Bodies, the national Chamber of Commerce, regional Chamber of Commerce offices, and the schools and companies that already participate in dual VET. The lowest came from trade unions and non-participating schools and companies, but none were below the three-point neutral threshold. With a standard deviation of 0.9, the lowest score within two standard deviations of the average would still be a relatively neutral 2.7. Non-participating schools and companies had some of the lowest scores and relatively large standard deviations, meaning that their two-standard-deviation ranges span nearly the full one-to-five-point scale. However, no score distribution included the lowest score without also including the highest score, and all distributions included the highest possible score. Participating stakeholders showed strong commitment, like this participating school interviewee who said, "We are prepared to cooperate with everybody who recognizes the importance of dual VET and who wants to improve it" (interview wave 1, respondent 164, answer 51).
The question for cooperation asked whether respondents believed their organizations were willing to cooperate with other organizations for implementation. The response to this question was overwhelming, with 99% of respondents saying yes (see Table 4 Cooperation). The only actor groups that had any "no" answers at all (and even then, only a small minority) were Regional School Administrations and non-participating companies. In their comments, many respondents noted that they were already cooperating or beginning to cooperate with external partners. Some saw their own roles as very important, including a regional Chamber of Commerce respondent who stated that the organization "educates, informs and motivates all the participants" (interview wave 1, respondent 175, answer 51).
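The two-standard-deviation reading used for the willingness scores can be illustrated with a short sketch. It assumes the comparison is a simple descriptive interval (mean plus or minus two standard deviations) checked against the endpoints of the five-point scale; the function names are hypothetical and are not taken from the study's own analysis.

```python
def two_sd_range(mean, sd, k=2.0):
    """Return the (low, high) interval mean - k*sd, mean + k*sd."""
    return mean - k * sd, mean + k * sd


def compare_to_scale(mean, sd, scale_min=1.0, scale_max=5.0):
    """Describe how a score's two-SD range relates to the scale endpoints.

    Assumption: a score "differs" from an endpoint when that endpoint
    lies outside the two-SD range. This is a descriptive heuristic,
    not a formal hypothesis test.
    """
    low, high = two_sd_range(mean, sd)
    return {
        "low": low,
        "high": high,
        "differs_from_min": low > scale_min,
        "differs_from_max": high < scale_max,
    }


# Overall willingness to implement as reported in the text: mean 4.5, SD 0.9.
result = compare_to_scale(4.5, 0.9)
print(result)
```

With the reported values, the lower bound is 4.5 - 2 × 0.9 = 2.7, which lies above the scale minimum of one, while the upper bound exceeds five, so the overall score differs from the minimum but not from the maximum.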

Both determinants in this dimension are very strong. Willingness to implement is high among affected stakeholders, and nearly all actors report that their institution would cooperate with others for implementation. We categorize the Commitment dimension as a driver dimension for implementation in this case. Therefore, we find support for H3.

Capacity
The interviews asked specifically about each of the three key resource types in the Capacity dimension: personnel, finances, and information. Table 5 shows results by resource type and actor. Overall, interviewees reported that they had most of the resources they needed to implement, with personnel at 4.2 on a five-point Likert scale, finances at 3.6, and information at 4.3. Based on two standard deviations, no resource score was significantly different from fully adequate (five points) and all were significantly different from completely inadequate (one point). Under personnel, regional Chamber of Commerce offices and the participating schools and companies were better resourced than most, and the Ministry and National Bodies reported a lack of personnel along with the Regional School Administrations. Finances scored the lowest overall, with the Regional School Administrations and non-participating schools feeling the least financially prepared. Participating companies reported the highest financial resources. Finally, there was a top-down trend in information and research resources, with the government and the national Chamber of Commerce reporting more adequate evidence and information than Regional School Administrations and non-participating companies. The exceptions were participating schools and regional Chamber of Commerce offices, who reported higher-than-average information resources. Based on two standard deviations, no actor's situation for personnel or information was significantly different from five and all were significantly different from one, so all actors were roughly adequate for these two capacity determinants. For finances, the situation was the same except for Regional School Administrations, whose two-standard-deviation range topped out below the maximum and extended below the minimum. This actor was significantly different from adequate financial resources, but it was the only such deviation among all actor groups and all three resource types.
Some qualitative responses indicated more complexity. Although resources were generally adequate in the government, one respondent stated that "We don't have enough employees so we are overloaded by tasks" (interview wave 1, respondent 84, answer 43). Regional School Administrations, who had the lowest scores, reported planning to hire a new person for implementation, stating "We have a lot of other duties" (interview wave 1, respondent 72, answer 43). More specifically, one Regional School Administration stated that "We have information but we don't have materials" (interview wave 1, respondent 66, answer 47).
Qualitatively, although not quantitatively, different regional branches of the Chamber of Commerce reported very different resource levels. While one region stated, "[National Chamber of Commerce] staff provides all necessary information and materials" (interview wave 1, respondent 169, answer 47), another contradicted, "Training through work is a new system so there is not enough distribution of material, but constant work, information exchange, and promotion of the dual VET system" (interview wave 1, respondent 175, answer 47). The Chamber of Commerce was a very important actor and was new to the system, so its having sufficient resources was important.
Participating schools, participating companies and regional Chambers of Commerce had the highest overall capacity. However, even among these more prepared actors there was still some demand for further resources. One school leader described personnel as only partly adequate, saying, "On particular profiles yes, but on some profiles existing teaching staff need more support" (interview wave 1, respondent 12, answer 41). However, schools that already participated in dual VET were generally confident, like the respondent who simply asserted, "We have qualified and motivated human resources" (interview wave 1, respondent 106, answer 41).
The quantitative data indicates that capacity is generally close to sufficient in all three determinants of personnel, finances, and information for stakeholders participating in the new law. Only Regional School Administrations lack capacity, and even then, only in the financial-resource category. However, the qualitative responses uncover some concerns and potential weak points. It is unclear whether the Capacity dimension is a barrier or support to implementation progress in this case. Out of an abundance of caution, we can neither reject nor find support for H4.

Clients
The key determinants in the Clients dimension are employers, intermediaries, and educators. The interview data includes questions for each actor group on their own motivation to participate (covered under commitment) as well as a question asking each actor to report whether the others are willing to implement the law. We aggregate actor groups into the three determinant categories. Educators are represented by schools (both participating and not), the government, and Regional School Administrations. Participating and non-participating companies go into the "employers" category. Finally, the Chamber of Commerce, regional Chamber of Commerce offices, and international donors are intermediaries. We leave out trade unions because they fall under their own minor determinant in the original framework. Table 6 shows self-reported and peer-reported engagement by aggregated group.
In self-reported engagement, the overall average was 4.5 on a five-point Likert scale, indicating strong engagement. Intermediaries have the highest self-reported engagement, at 4.9. No group's score was significantly different from five points and all were significantly different from one. Government and intermediary respondents explained their high engagement by stating that they were required to enact the new law, with comments like "[we participate] because it is in the scope of our work" (interview wave 2, respondent 26, answer 52). Employers were not similarly required, and their motivations were more diverse (e.g., "We need the competent employees" (interview wave 2, respondent 85, answer 52), "to support the local community" (interview wave 2, respondent 81, answer 52), "to influence the development of young people"). Peer-reported engagement was also strong, at 4.3 points. Employers had lower peer-reported engagement (3.6), but were significantly different from one point and not different from five. The other two groups were considered highly engaged by their peers (4.5 for both) and were also not significantly different from five points and significantly different from one point. One comment highlights these public-private differences: "The state/government is willing to support dual education and this is positive, but the way decisions are made…causes resentment among other actors, especially teachers and unions" (interview wave 2, respondent 37, answer 90). Even if all scores and two-standard-deviation ranges were high enough to indicate general commitment, the differences in level may create friction.
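The grouping of actor groups into the three Clients determinants can be sketched as a simple lookup followed by averaging. The mapping follows the description in the text, but the actor-group labels, the pooling of individual responses into one mean per category, and all example scores are assumptions made for illustration; they are not the study's data or its exact procedure.

```python
from statistics import mean

# Actor-group-to-determinant mapping as described in the text.
# Trade unions are deliberately absent because the study leaves them out.
CATEGORY_OF = {
    "participating school": "educators",
    "non-participating school": "educators",
    "government": "educators",
    "regional school administration": "educators",
    "participating company": "employers",
    "non-participating company": "employers",
    "national chamber of commerce": "intermediaries",
    "regional chamber of commerce": "intermediaries",
    "international donor": "intermediaries",
}


def aggregate(responses):
    """Pool (actor_group, score) pairs into per-category mean scores."""
    pooled = {}
    for actor_group, score in responses:
        category = CATEGORY_OF.get(actor_group)
        if category is None:  # e.g. trade unions: excluded from Clients
            continue
        pooled.setdefault(category, []).append(score)
    return {category: mean(scores) for category, scores in pooled.items()}


# Hypothetical Likert responses, invented purely for illustration.
example = [
    ("participating school", 5),
    ("government", 4),
    ("participating company", 4),
    ("non-participating company", 3),
    ("regional chamber of commerce", 5),
    ("trade union", 2),
]
result = aggregate(example)
print(result)
```

Pooling individual responses weights larger actor groups more heavily within a category; averaging the group means instead would weight each actor group equally, and the text does not say which of the two was used.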

Because the average score and all component scores were not significantly different from maximum engagement and were significantly different from minimal engagement, we categorize the Clients dimension as a driver dimension for implementation in this case. This indicates support for H5.

All Dimensions
Figure 2 summarizes the results, with black used to show barrier dimensions and white for driver dimensions. Grey dimensions are unclear. We find that Content is a barrier, Context, Commitment, and Clients are driver dimensions, and Capacity is unclear. With this information, we turn to our sixth hypothesis and examine how configurations of determinants and dimensions may explain implementation progress.
Given that implementation is progressing in this case while Content is a barrier dimension and Capacity is unclear, it is not necessary for all five dimensions to be driver dimensions or even neutral. It is also not necessary for all 12 key determinants to be supportive of implementation: strategy, accountability, and finances are either unclear or barrier determinants. This indicates that some individual dimension or determinant, or some combination or configuration thereof, is a sufficient condition for implementation.

Figure 2: Summary of Results
Note: This figure summarizes our main results, where in the case of Serbia we find the dimensions in white to be drivers, the one in black to be a barrier, and the one in grey to be unclear for implementation of the new VET law.
It is impossible given a single case study to be specific, but these results point to a set of possibilities for how dimensions and determinants could relate to implementation progress. Based on these results, it is possible that Clients, Context, Commitment, or some combination thereof is a sufficient condition for reform. Similarly, at the determinant level it may be possible that the key is some individual or combination of context fit, political will, coordination, cooperation, personnel, information, employers, intermediaries, and educators. Finally, it may be that instead of some specific factor being necessary or sufficient, implementation progress requires some threshold quantity of dimensions or determinants. The evidence presented here could show that implementation can proceed if only one dimension is a barrier, as long as two or more are drivers, or as long as the median dimension is at least neutral.
Despite the fact that not all dimensions are success factors, implementation is progressing in this case. Therefore, we find that implementation progress does not require every dimension, or every determinant, to be a success factor. As a result, we reject H6.
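The reasoning behind rejecting H6, together with the candidate threshold rules mentioned above, can be written out as a minimal sketch. The dimension statuses are those reported in this section; the numeric coding (+1 driver, 0 unclear, -1 barrier) and the function names are assumptions introduced only for illustration.

```python
from statistics import median

# Dimension statuses as reported for the Serbian case.
# Coding is an illustrative assumption: +1 driver, 0 unclear, -1 barrier.
STATUS = {
    "Content": -1,
    "Context": 1,
    "Commitment": 1,
    "Capacity": 0,
    "Clients": 1,
}
IMPLEMENTATION_PROGRESSING = True


def all_drivers_claim_holds(status, progressing):
    """Check whether the case is consistent with 'every dimension must be a driver'.

    One case with progress but without all drivers contradicts the claim.
    """
    all_drivers = all(value == 1 for value in status.values())
    return not (progressing and not all_drivers)


def threshold_rules(status):
    """Evaluate the three candidate threshold rules against one case."""
    values = list(status.values())
    barriers = sum(value == -1 for value in values)
    drivers = sum(value == 1 for value in values)
    return {
        "at_most_one_barrier": barriers <= 1,
        "two_or_more_drivers": drivers >= 2,
        "median_at_least_neutral": median(values) >= 0,
    }


h6_consistent = all_drivers_claim_holds(STATUS, IMPLEMENTATION_PROGRESSING)
rules = threshold_rules(STATUS)
print(h6_consistent, rules)
```

The single case contradicts the all-dimensions claim, yet it satisfies all three threshold rules at once, which is why one case cannot distinguish among them.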
The discussion section turns from the specific hypotheses to the overarching question of why some implementation efforts in VET succeed and others do not, and whether this framework helps answer that question.

Discussion
The ultimate goal of this research agenda is to understand why some efforts at education policy implementation progress while others do not. Other research is focused on the nature of implementation and what it means to successfully implement a policy, but we do not take up those issues. Instead, we focus on the factors related to progress or lack thereof. In this discussion, we consider how case-specific features may play a role in implementation progress, identify some of the ways dimensions and determinants might relate to one another, and relate our findings to the broader context and literature of education policy implementation.
A single empirical case is limited by definition, and some condition of this case may be part of the sufficient conditions for its implementation progress. The reform discussed here is a top-down formal process that started with a law. We observe the impact of this in the interview data, when respondents state things like "We have to - it is in our job description to implement MoESTD decisions" (interview wave 1, respondent 69, answer 51). Therefore, it may be that the implementation drivers in this case are commitment, clients, and a top-down reform, and that a bottom-up reform would have a completely different configuration, such as requiring every dimension or reaching sufficiency through a single dimension unrelated to commitment and clients.
Even if commitment and clients are indeed sufficient for implementation progress across cases, the implications of that finding vary significantly by case. For example, a bottom-up reform cannot use legal obligation to proceed, so it would need to develop relationships with specific actors and engender commitment through different, and probably more time-consuming, means. A case implementing VET for the first time may lack the intermediaries that play a key role in the Clients dimension, so that project may have to build intermediaries first and then engage with them.
That challenge raises the as-yet-unanswered question of how determinants and dimensions relate to each other. The ideal endpoint of research like this would be a weighted framework of dimensions and key determinants that can serve as a guide for improving the chances of implementation progress. At this point, however, it is not clear how items interact. There may be a weighting scheme that can show which items are more and less important, but the solution may also be a configurational approach that provides implementers with a few possible ways to combine items and reach their goals. This study takes a more configurational approach, focusing on sufficiency and necessity, but the real mechanism could be more additive.
Empirical research in the broader field of education policy implementation has already highlighted similar determinants to the ones we use here. For example, Morris and Scott (2003) identify inertia, cynicism, lack of coordination, and low capacity as barriers in Hong Kong. In the Philippines, communication gaps, a convoluted network of linkages, weak coordination, and low accountability lead to corruption and failed implementation (Reyes, 2009). Time pressure, top-down reform, lack of financial and human resources, and poor management have all been barriers in the UK (Baird & Lee-Kelley, 2009). Other implementation factors are more complex, or even positive. Little (2011) characterizes political will as a double-edged sword following her analysis of a reform in Sri Lanka. Reeves and Drew (2012) point out the dynamic nature of policy implementation, highlighting successive recontextualizations from political will to policy plan, then plan to practical action, and finally from action to dissemination. In a relatively rare example of a case study recounting successful policy implementation in education, Salazar-Morales (2018) credits the improvement of Peru's Ministry of Education to successful long-term planning and political consensus, among other factors. Bolli et al. (2020), applying the same framework for the upscaling of a dual VET program in Nepal, find that the dimensions Commitment, Capacity, Clients and Context are all drivers whereas Content is a barrier. These studies indicate that we can interpret our findings broadly across both contexts and education policy subfields.
Overall, our findings reinforce that VET implementation is not necessarily a linear or clean process. Even in a case where progress is being made in a measurable way, there have been hurdles, disagreements, and amendments along the way. Despite that, implementation of the new program continues. Our challenge as researchers is to find a framework or theory that can lift the key elements of this implementation process out of its context, disentangling what is context- or project-specific from the consistent patterns that might apply generally. By focusing on how the progress of this project's implementation fits into an empirical framework, we do lose some of the rich detail on what makes this process special. However, we do that to gain broader insight into the features that might make this project common.

Conclusion
In our policy implementation case, we find that this project's content is a barrier to implementation, its capacity is unclear, and context, commitment, and clients are driver dimensions. Given that implementation is progressing as of this writing, this indicates that not all dimensions are necessary for implementation progress, and that some dimension or determinant, or some combination of them, is sufficient. The case further raises intriguing possibilities that dimensions and determinants vary in their importance for implementation progress, that some threshold proportion of enabling factors is sufficient, and that context-specific factors may also be part of the configuration driving implementation.
Implementing policy changes is a crucial step on the progression from evidence-based policy design to improved outcomes. Existing frameworks and checklists are largely untested, limiting their utility for both theory and practice. This study tests one framework of key implementation determinants. Although the determinants we analyze are VET-specific and our determinant-level findings apply mainly to VET, the dimensions we investigate apply to policy implementation in general. Therefore, our findings and method are applicable in all types of education policy implementation.
This study makes three main contributions to the literature. First, we find that we can begin to quantify a reform's implementation factors, and this process shows great promise for cross-case comparison and possibly even operationalization into a measurement tool. Second, we find initial evidence for the success of even imperfect implementation processes: implementation can proceed even if some determinants or dimensions are not success factors. In this case, most of the key determinants are success factors, but we cannot draw any conclusions about the threshold. Finally, we develop some potential explanations, to be further examined in future research, for how the determinants of education policy implementation may interact, combine, or configure to drive implementation progress.
The main limitations of this study are its simplified application of the framework, its non-causal nature, and its single-case scope. We begin testing the framework at the dimension level using only the key determinants. The framework includes other, less-important determinants that we do not assess, and further studies designed to examine those would make important contributions. Another interesting avenue for further research is the relative weighting of key determinants and dimensions, which we do not address. The analyses we present use a relatively large interview dataset with both qualitative and quantitative data, but they are descriptive, not causal. If respondents' answers reflect social desirability bias, our scores would overestimate the dimensions' contribution to implementation, meaning that even less may be required for implementation to progress. Finally, we report results by stakeholder group, treating every group equally and every interviewee within the group equally, but this gives more weight to individuals in smaller groups (e.g., the government) and less to those in larger groups (e.g., firms). The sample represents each group, and in doing so requires more responses from larger and more diverse groups.
Like all case study research, this study describes only one case. While the Serbian case is a very useful starting point, further work replicating this approach should investigate implementation processes with different structures. Further research and additional cases of both ongoing and historical reforms can fill in the gaps and begin to establish patterns. The value of this study lies in replication and we look to future research to determine whether individual determinants or dimensions are necessary, sufficient, or correlated with implementation progress. We will continue to follow this case to determine whether eventual implementation outcomes follow a similar pattern to progress during the implementation phase.