Anh, T. N. (2023). Education policy analysis methods. In International Conference on “Global Changes and Sustainable Development in Asian Emerging Market Economies” (pp. 137–160). Cham: Springer Nature Switzerland.
Bao, S. & Han, B. (2022). Review and reflection on the 30 years of disciplinary evaluation system from the perspective of policy tools. Res. High. Educ. Eng., 3, 117–123.
Bian, Z. (2025). Human capital and socialism builders: a happy marriage? Analysing the construction of ‘high-level talent’ in Chinese higher education policy. High. Educ., 90(5), 1329–1346.
Chan, C. K. Y. (2023). A comprehensive AI policy education framework for university teaching and learning. Int. J. Educ. Technol. High. Educ., 20(1), 38.
Chen, J. & Wan, Z. (2016). Reflections on the modernization of educational governance system and governance ability. Educ. Res., 37(10), 25–31.
Fan, Q. (2023). From virtue ethics to legal prohibitions: Construction of the concept of a teacher’s moral anomie and implementation of its system. Peking Univ. Educ. Rev., 21(4), 26–41.
Guo, X., He, Y., & Liu, X. (2025). A study on the choice preference for discipline evaluation policy tools in China: Based on the analysis of policy documents from 1985 to 2023. J. Grad. Educ., 4, 93–100.
Krippendorff, K. (2018). Content Analysis: An Introduction to Its Methodology. Sage Publications.
Li, X. (2025). The game in the gap: An investigation into the types of “local policies” in higher education governance and the research on generation logic. China High. Educ. Res., 41(7), 48–55.
Liang, H., Zhao, K., & Li, J. (2025). Responding to the new research assessment reform in China: The universities’ institutional hybrid actions. J. High. Educ. Policy Manag., 47(4), 473–489.
Liu, A. & Sun, M. (2025). From voices to validity: Leveraging large language models (LLMs) for textual analysis of policy stakeholder interviews. AERA Open, 11, 23328584251374595.
Liu, J. (2025). Quality assurance in Chinese higher education: Policy reforms and institutional challenges. Dev. Humanit. Soc. Sci., 1(3), 144–161.
Ma, J. & Yang, T. (2024). Research on the control of discretionary power in punishing ethical misconduct of university teachers. Fudan Educ. Forum, 22(2), 66–73.
McDonnell, L. M. & Elmore, R. F. (1987). Getting the job done: Alternative policy instruments. Educ. Eval. Policy Anal., 9(2), 133–152.
Mehta, S. D., Paul, S., Awiti, E., Young, S., Zulaika, G., Otieno, F. O., Phillips-Howard, P. A., Mason, L., & Bhaumik, R. (2025). Evaluation of large language models within GenAI in qualitative research. Sci. Rep., 15(1), 34993.
Mei, W. & Symaco, L. (2022). University-wide entrepreneurship education in China’s higher education institutions: Issues and challenges. Stud. High. Educ., 47(1), 177–193.
Qin, T. & Yu, C. (2024). Research on the control of discretionary power in punishing ethical misconduct of university teachers. J. High. Educ. Manag., 18(4), 100–113.
Rothwell, R. (1983). Innovation and firm size: A case for dynamic complementarity; or, is small really so beautiful? J. Gen. Manag., 8(3), 5–25.
Ruan, J., Cai, Y., & Stensaker, B. (2024). University managers or institutional leaders? An exploration of top-level leadership in Chinese universities. High. Educ., 87(3), 703–719.
Schmidt, V. A. (2008). Discursive institutionalism: The explanatory power of ideas and discourse. Annu. Rev. Polit. Sci., 11(1), 303–326.
Schneider, A. & Ingram, H. (1990). Behavioral assumptions of policy tools. J. Polit., 52(2), 510–529.
Sheng, B. (2003). Regulation in higher education: reconstruction of the relations among government, higher education institutions and the society. J. High. Educ., 2, 47–51.
Tai, R. H., Bentley, L. R., Xia, X., Sitt, J. M., Fankhauser, S. C., Chicas-Mosier, A. M., & Monteith, B. G. (2024). An examination of the use of large language models to aid analysis of textual data. Int. J. Qual. Methods, 23.
Wang, Z. & Yin, H. (2022). The root of the dilemma, clarification of understanding, and practical ways of fostering virtue through education for teachers in colleges and universities. J. High. Educ. Manag., 16(2), 75–82.
Wen, C., Clough, P., Paton, R., & Middleton, R. (2025). Leveraging large language models for thematic analysis: A case study in the charity sector. AI Soc., 41, 731–748.
Xia, J., Zhang, M. M., Zhu, J. C., & Fan, D. (2024). Reconciling multiple institutional logics for ambidexterity: Human resource management reforms in Chinese public universities. High. Educ., 87(3), 611–636.
Xiao, Y. & Liu, Z. T. (2024). Why do university faculties neglect teaching: based on the analysis of university teaching policy text. China High. Educ. Res., 40(2), 62–69.
Xu, X., Rose, H., & Oancea, A. (2021). Incentivising international publications: Institutional policymaking in Chinese higher education. Stud. High. Educ., 46(6), 1132–1145.
Zhang, H. E., Wu, C., Xie, J., Lyu, Y., Cai, J., & Carroll, J. M. (2025). Harnessing the power of AI in qualitative research: Exploring, using and redesigning ChatGPT. Comput. Hum. Behav. Artif. Hum., 4, 100144.
Zhao, K., Li, J., & Liang, H. (2025). Reconfiguring power: A field analysis of China’s research evaluation reform. High. Educ.
Zhuang, T., Liu, B., & Hu, Y. (2022). Legitimising shared governance in China’s higher education sector through university statutes. Eur. J. Educ., 57(1), 33–48.
Open Access
Research article

A Large Language Model-Driven Framework for Policy Instrument Analysis: An Empirical Study of University Teachers’ Ethical Misconduct Governance Texts

Chunlei Qin¹, Xiangping Zhang², Jing Li³*, Yanchun Zhu³, Wei Zhang⁴
¹ School of Economics, Central University of Finance and Economics, 100081 Beijing, China
² Office of Human Resources, Central University of Finance and Economics, 100081 Beijing, China
³ Business School, Beijing Normal University, 100875 Beijing, China
⁴ School of Information, Central University of Finance and Economics, 100081 Beijing, China
Education Science and Management | Volume 3, Issue 4, 2025 | Pages 209-218
Received: 10-10-2025, Revised: 11-30-2025, Accepted: 12-14-2025, Available online: 12-29-2025

Abstract:

Traditional policy instrument analysis suffers from labor-intensive coding, high subjectivity, and time-consuming procedures. To address these limitations, this study develops a policy instrument analysis framework that integrates large language models (LLMs) and proposes an LLM-driven analytical workflow comprising six stages: case repository construction, policy instrument selection, content element generation, clause-level coding, reliability and validity testing, and quantitative analysis. Using governance texts on teachers’ ethical misconduct from 27 universities specializing in finance and economics as the empirical context, the study employed DeepSeek-R1 to identify policy instruments, classify content elements, perform clause-level coding, and conduct two-dimensional cross-tabulation analysis. The results indicate that these governance texts exhibit pronounced regulatory, procedural, and accountability-oriented characteristics, while also revealing a structural imbalance marked by strong front-end norm construction and relatively weak back-end remedial mechanisms. Overall, the proposed framework improves the efficiency and consistency of policy text analysis and provides a novel technical pathway for methodological innovation in education policy research.
Keywords: Large language models, Policy instrument analysis, Teachers’ ethical misconduct, University governance texts

1. Introduction

The development of Chinese higher education is deeply embedded in a governance structure characterized by state leadership, tiered coordination, and multi-actor collaboration (Ruan et al., 2024). At the national level, competent authorities such as the State Council and the Ministry of Education have continuously issued top-level policy designs that articulate the developmental goals and institutional framework of higher education through plans, regulations, and guiding documents, thereby providing strategic direction for university reform and development (Chen & Wan, 2016). On this basis, individual universities translate and recode national policy requirements in light of their institutional missions, regional development needs, and the characteristics of their faculty and student bodies, formulating internal rules and operational provisions that enable effective alignment between national policy and institutional practice (Liu, 2025). This process reflects a distinctive governance logic in Chinese higher education: the combination of top-down strategic guidance and bottom-up practical adaptation (Sheng, 2003).

To deepen theoretical understanding of this governance logic, strengthen universities’ institutional design capacity, and improve the adaptability and effectiveness of top-level policy implementation, scholars have examined policy areas such as personnel system reform (Bian, 2025; Xia et al., 2024), university charters (Zhuang et al., 2022), educational evaluation (Xiao & Liu, 2024), discipline assessment (Bao & Han, 2022; Guo et al., 2025; Liang et al., 2025), research evaluation (Xu et al., 2021; Zhao et al., 2025), and entrepreneurship education (Mei & Symaco, 2022). Using qualitative approaches such as policy instrument analysis, critical discourse analysis, and content analysis, this body of research has explored how national regulations and policies are implemented at the university level, focusing on institutional logic, policy instrument preferences, implementation mechanisms, and legitimacy construction. Such studies provide important empirical support for improving the alignment between policy content elements and policy instruments in university governance and for enhancing the completeness and coherence of institutional policy systems (Li, 2025). However, these research processes typically require repeated cross-level comparison and interpretive analysis between national policy texts and university regulations, involving concept extraction, category construction, category merging, and manual coding. This not only raises research costs but also increases the difficulty of identifying new concepts, distinguishing between adjacent categories, and maintaining coding consistency (Krippendorff, 2018), while making results vulnerable to researchers’ subjective judgment (Schmidt, 2008).

In recent years, large language models (LLMs) have shown clear advantages in processing large volumes of text and extracting complex information through contextual semantic representation, pattern recognition, semantic induction, and cross-task transfer (Mehta et al., 2025). Existing studies suggest that, when operating under researcher-defined task rules, LLMs can assist with concept extraction, theme induction, candidate category generation, pre-coding, and consistency review, thereby reducing repetitive manual labor and improving the efficiency and coverage of text analysis (Tai et al., 2024).

Although existing studies on educational policy translation have generated important insights into institutional logic and implementation mechanisms, they have paid relatively limited attention to methodological development, especially technical improvements to content analysis and policy instrument analysis, and a systematic research pathway has yet to emerge (Anh, 2023; Chan, 2023). Against this background, this study introduces LLMs into policy instrument analysis and content analysis to construct a human-machine collaborative workflow. The aim is to improve the efficiency of concept extraction and thematic summarization, mitigate subjective bias in traditional manual coding, and provide new technical support for methodological refinement in education policy analysis.

2. Related Research

Exploring the translation of national policy into university institutional arrangements and the corresponding organizational response is a core issue in educational research (Mei & Symaco, 2022; Xia et al., 2024). Studies on areas such as discipline assessment, research evaluation, teaching reform, and entrepreneurship education have drawn on institutional complexity theory (Liang et al., 2025), Bourdieu’s field theory (Zhao et al., 2025), and related perspectives, while employing policy analysis tools (Bao & Han, 2022; Guo et al., 2025; Xiao & Liu, 2024), critical discourse analysis (Bian, 2025), and semi-structured interviews (Mei & Symaco, 2022; Xu et al., 2021). These studies have examined policy instrument preferences, typological structures, and content elements in policy formulation, thereby revealing the decision-making patterns and institutional logic underlying policy design. For example, Guo et al. (2025) used the McDonnell and Elmore policy instrument framework to analyze 52 policy texts on Chinese higher education discipline assessment from 1985 to 2023 and found that command instruments were dominant, whereas capacity-building and symbolic-persuasive instruments were underused; policy content elements were unevenly distributed, with relatively limited attention to assessment objectives and methods; and mismatches between policy instruments and content elements appeared across stages. Bao & Han (2022), drawing on a policy instrument classification framework based on the nature of governmental power resources, analyzed 40 policy texts on discipline assessment and found that these policies focused heavily on evaluation subjects, procedures, and methods, while neglecting evaluation content and outcomes. They therefore proposed improving the fit between policy content elements and policy instruments and strengthening the completeness and systematicity of discipline-assessment policy content. Xiao & Liu (2024) constructed a two-dimensional framework of policy instruments and teaching elements to analyze 170 teaching-related policies and examine the policy logic behind the marginalization of teaching by university faculty. They found that coercive instruments predominated, while incentive-based and capacity-building instruments were relatively scarce. Teaching evaluation lacked incentive-based instruments, teaching content lacked organizational-development instruments, and teaching methods lacked capacity-building instruments. The study argued that imbalances within policy instruments and weak alignment between instruments and teaching elements jointly contribute to situations in which teachers are unwilling or unable to teach, and it recommended increasing the use of incentive-based instruments, ensuring the full-process supply of teaching policy instruments, and improving the fit between policy instruments and the elements of teaching activities.

Most of the above studies rely on policy instrument analysis, supplemented by word-frequency analysis, thematic analysis, and manual content coding, to extract content elements and build a policy instrument-content element framework. From the perspective of instrument selection preferences and content elements, this line of inquiry examines how state-level policy orientations are translated into university development positioning, construction goals, and reform pathways. In practice, however, researchers often need to conduct multi-level comparison and interpretive analysis across national policy texts, local supporting documents, and internal university regulations, while repeatedly performing concept extraction, category construction, text segmentation, and manual coding around policy objectives, policy instruments, implementation mechanisms, and institutional arrangements. Such qualitative work is highly iterative and interpretation-dependent: category systems require continual refinement through induction and revision, while coding outcomes are easily affected by subjective judgment and frequently face problems of blurred category boundaries and weak cross-text consistency (Krippendorff, 2018). At the same time, semantic variation and interpretive flexibility across multi-level policy structures further increase the difficulty of concept identification and category delineation (Schmidt, 2008).

In recent years, with rapid advances in text understanding, semantic clustering, summarization, classification, and structured information extraction, LLMs have shown strong potential for policy text analysis. On the one hand, given a researcher-defined theoretical framework or codebook, LLMs can support large-scale pre-coding and assisted classification. On the other hand, in exploratory analysis they can help identify high-frequency concepts, generate candidate categories, and detect semantic relations among similar categories (Wen et al., 2025; Zhang et al., 2025), thereby substantially improving the efficiency and coverage of content analysis (Mehta et al., 2025).

Overall, existing research still concentrates on explaining institutional logic and policy implementation mechanisms, with primary attention to how policies are understood, reinterpreted, and enacted in different organizational contexts. Most studies remain at the level of method selection and application, with insufficient attention to making analytical tools more intelligent and to methodological innovation, and relatively limited effort devoted to improving policy analysis methods themselves (Chan, 2023). A systematic pathway has not yet been established for using technological means to improve the precision of concept extraction, the stability of category construction, and the consistency of coding (Anh, 2023; Chan, 2023). Accordingly, integrating large language models with policy instrument analysis and content analysis is becoming an important frontier in education policy research (Liu & Sun, 2025).

3. Developing a Large Language Model-Driven Framework for Policy Instrument Analysis

To address the shortcomings of traditional policy instrument analysis, this study proposes a framework that integrates LLMs into the analytical process. As shown in Figure 1, the framework consists of six components: case repository construction, analytical framework selection, content element generation, clause-level coding, reliability and validity testing, and quantitative comparison. It uses raw policy texts, policy instrument frameworks, content element categories, clause-level coding results, and research conclusions from existing studies as supervision and calibration resources to adapt a general-purpose LLM to policy instrument analysis tasks. Through consistency checks, reliability and validity testing, and manual review, the framework iteratively updates the dictionary, trigger-phrase library, policy instrument category table, and content element set. The calibrated model is then applied to new policy texts, with human review retained to identify institutional logic, preferences in policy instrument selection, and the distributional structure of content elements.

Figure 1. Framework flowchart
3.1 Case Repository Construction

Let an educational policy task $T$ be defined as $T = \{G, L, U\}$, where $G$ denotes the set of national-level policy texts on a given topic, $L$ denotes the corresponding set of policy texts issued by local educational authorities, and $U$ denotes the set of internal university institutional texts formulated in response to national policies and local guidance.

Based on the existing literature, a policy case repository $P_{case}$ is constructed by annotating texts according to themes such as personnel systems, teaching evaluation, discipline assessment, and research evaluation. For each study or set of policy texts that has already been analyzed, the following information is recorded.

Basic attributes of policy texts, including document ID, policy level, topic, publication date, and the original policy text.

The policy instrument framework $PT$ adopted in the existing literature and its subcategories are identified to establish a policy instrument category table. For policy text $i$, the policy instrument framework is represented as $PT_i = \{PT_{i1}, PT_{i2}, \ldots, PT_{im}\}$; each subcategory $PT_{ij}$ is represented as $PT_{ij} = \{pt_{i1}, pt_{i2}, \ldots, pt_{in}\}$. For example, Bao & Han (2022) adopted the McDonnell and Elmore five-part policy instrument framework to analyze the ninth discipline-assessment policy document, such that $PT_9 = \{\text{mandate}, \text{inducements}, \text{capacity-building}, \text{system-changing}, \text{hortatory}\}$. In their coding scheme, the mandate (command) instrument included three subcategories: $PT_9(\text{mandate}) = \{\text{behavioral requirements}, \text{rule formulation}, \text{punitive provisions}\}$.

Content element categories are extracted and summarized from existing studies on specific types of policy texts to form a content element set $CE$. For example, Xiao & Liu (2024) proposed that the content element set for teaching-related policy documents can be expressed as $CE = \{\text{teaching content}, \text{teaching methods}, \text{teaching support}, \text{teaching evaluation}\}$.

The coding experience from prior studies is stored in the format ‘policy text number-policy clause-policy content element-policy instrument-policy sub-instrument’ as EP, providing a reference for subsequent pre-coding and manual review.

A trigger-phrase library $B$ is constructed around expressions associated with policy instruments. For any policy instrument $PT_i$, its trigger-phrase set is denoted $B(PT_i) = \{ts_1, ts_2, \ldots, ts_r\}$. For example, capacity-building instruments often appear in expressions such as ‘issue guidelines’, ‘conduct training’, ‘build platforms’, and ‘promote capacity enhancement’. A corresponding trigger-phrase set may be represented as $B(PT_i) = \{$issue annual discipline-research guidelines, conduct regular frontier training, integrate resources to build an interdisciplinary platform, promote high-level discipline development$\}$.
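The record structure described in this subsection could be held in memory roughly as follows. This is only a sketch: the class names, field names, and the placeholder values for date and text are assumptions introduced for illustration and are not taken from the study; the instrument and sub-instrument labels are the ones quoted in the example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodedClause:
    """One stored coding experience (an entry of EP). Hypothetical layout."""
    doc_id: str
    clause: str
    content_element: str
    instrument: str
    sub_instrument: str


@dataclass
class PolicyCase:
    """One annotated case in the repository P_case. Hypothetical layout."""
    doc_id: str
    level: str        # "G" (national), "L" (local) or "U" (university)
    topic: str
    published: str    # publication date as a string
    text: str         # original policy text
    # PT_i: instrument category -> list of sub-instruments
    instruments: dict[str, list[str]] = field(default_factory=dict)
    # CE: content element categories
    content_elements: list[str] = field(default_factory=list)
    # EP: clause-level coding experience from prior studies
    coded_clauses: list[CodedClause] = field(default_factory=list)
    # B(PT_i): instrument category -> trigger phrases
    trigger_phrases: dict[str, list[str]] = field(default_factory=dict)


# Placeholder values ("unknown", "...") stand in for data not given in the text.
case = PolicyCase(
    doc_id="case-09",
    level="G",
    topic="discipline assessment",
    published="unknown",
    text="...",
    instruments={
        "mandate": ["behavioral requirements", "rule formulation", "punitive provisions"],
    },
    trigger_phrases={
        "capacity-building": ["issue guidelines", "conduct training", "build platforms"],
    },
)
```

Keeping the sub-instruments and trigger phrases keyed by instrument name makes it straightforward to look up both $PT_{ij}$ and $B(PT_i)$ during pre-coding.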

3.2 Policy Instrument Choice

Given a national policy text $g_i \in G$, the LLM extracts its expressions of the policy purpose and objectives to form a semantic representation $S(g_i)$. For each case $p_k \in P_{case}$, the corresponding semantic representation $S(p_k)$ is extracted, and the semantic similarity $\mathrm{sim}(S(g_i), S(p_k))$ is calculated.

On the basis of similarity scores, a candidate case set $P_1$ containing thematically similar cases is retrieved from $P_{case}$. Drawing on the policy instrument frameworks used in these cases, the LLM outputs candidate frameworks $PT^*$ together with reasons for their suitability. If the target text reflects logics such as resource provision, environmental shaping, and demand inducement, the three-part framework (Rothwell, 1983) is prioritized. If the text primarily reflects governance approaches such as command, incentives, capacity building, system change, and symbolic persuasion, the five-part framework (McDonnell & Elmore, 1987) is more appropriate. If the text emphasizes how different instruments influence the behavior of target groups, the five-category framework (Schneider & Ingram, 1990) is better suited.

Let $PT_0$ denote the initial list of policy instrument categories derived from existing cases, and $PT^*$ the candidate categories proposed by the LLM for the new task text. After manual review and calibration, the task-adapted category list $PT_1$ is obtained: $PT_1 = f(PT_0, PT^*, \mathrm{Revise})$, where $\mathrm{Revise}$ represents revisions made by the researcher in light of the consensus in prior studies, the characteristics of the new task text, and theoretical fit. Once $PT_1$ is determined, the coding experience $EP$ and trigger-phrase library $B$ are used to specify the corresponding sub-instrument set $PT_{1\_sub}$.
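The retrieval step in this subsection, ranking repository cases by similarity to the target text and keeping the closest ones, can be sketched as below. The text does not state how the similarity is computed, so a bag-of-words cosine similarity is used here purely as a stand-in for whatever representation the LLM produces; the function names and the two toy case summaries are invented for illustration.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: a stand-in for the semantic representation S(.)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_cases(target_summary, case_summaries, top_k=2):
    """Return the top_k (case name, similarity) pairs, most similar first."""
    target = bow(target_summary)
    scored = [(name, cosine(target, bow(s))) for name, s in case_summaries.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy summaries, invented for this example only.
cases = {
    "discipline assessment": "rules procedures and accountability for evaluating disciplines",
    "entrepreneurship education": "funding platforms and training for student startups",
}
ranked = retrieve_cases(
    "rules and accountability procedures for teacher conduct", cases, top_k=1
)
```

In practice an embedding model or the LLM itself would replace `bow`, while the ranking and top-k selection would stay the same.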

3.3 Generation of Content Element Sets

For a given policy text $d$, the text is segmented by title, chapter structure, and semantic paragraphs to produce a set of structural units $\mathrm{Seg}(d)$, where $\mathrm{Seg}(d) = \{seg_{d1}, seg_{d2}, \ldots, seg_{dq}\}$. Based on text structure, governance functions, and category experience drawn from the case repository, the LLM then generates a candidate set of content elements for each structural unit, denoted $CE^*(d)$, where $CE^*(d) = \{ce_{d1}, ce_{d2}, \ldots, ce_{dt}\}$.

The element $ce_{dj}$ typically falls into dimensions such as governance objects, policy objectives, responsible actors, resource allocation, implementation procedures, supervision and evaluation, use of results, and safeguard mechanisms. Guided by the principles of mutual exclusivity, completeness, and contextual appropriateness, the researcher then merges and revises $CE^*(d)$. After multiple rounds of consolidation, the final content element table for task $T$ is formed as $CE(T)$, where $CE(T) = \{CE_1, CE_2, \ldots, CE_m\}$.

3.4 Clause Coding

Let the set of clauses in policy text $d$ be $C(d) = \{c_1, c_2, \ldots, c_n\}$. For each clause $c_j$, both content elements and policy instruments are identified.

Using the predefined policy instrument category list $PT_i$, the content element set $CE_i$, the sub-instrument set $PT_{i\_sub}$, and the trigger-phrase library $B$, the LLM generates pre-coded results for each clause. For clause $c_j$, the lexical feature vector can be represented as $W(c_j) = \{w_{j1}, w_{j2}, \ldots, w_{js}\}$, where $w_{jk}$ may include tool-related words or phrases such as ‘special fund support’, ‘included in performance evaluation’, ‘prohibited’, ‘must’, ‘encouraged’, and ‘conduct training’. Keyword similarity and trigger-sentence similarity are then calculated as $\mathrm{sim}_{key}(c_j, PT_{i\_sub})$ and $\mathrm{sim}_{sent}(c_j, B(PT_i))$ and combined into a matching score for clause $c_j$ with respect to a given policy instrument:

$$\mathrm{Score}(c_j, PT_i) = \alpha\,\mathrm{sim}_{key}(c_j, PT_{i\_sub}) + \beta\,\mathrm{sim}_{sent}(c_j, B(PT_i)),$$

where $\alpha$ and $\beta$ are weighting coefficients. The matching score between clause $c_j$ and a content element is denoted $\mathrm{Score}(c_j, CE_k)$. The highest-scoring categories are taken as the candidate policy instrument and principal content element for that clause. The LLM’s pre-coded output is thus expressed as $\mathrm{Code}^*(c_j) = [CE_k, PT_i, pt_{is}, E_j, \mathrm{Score}(c_j, CE_k)]$, where $pt_{is}$ denotes the policy sub-instrument and $E_j$ the model-provided rationale. Two researchers then independently review $\mathrm{Code}^*(c_j)$. If a clause has high matching scores for two or more policy instruments, the highest-scoring one is treated as the primary instrument.
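The matching score can be illustrated with a small sketch. The two similarity functions are not specified in the text, so the ones below are assumptions: keyword similarity is taken as the share of sub-instrument keywords found in the clause, and trigger-sentence similarity as the best Jaccard token overlap with any trigger phrase. The weights of 0.5 each, the trigger phrases, and the sample clause are likewise invented for illustration.

```python
import re


def tokens(text):
    """Lower-cased word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def sim_key(clause, keywords):
    """Assumed keyword similarity: share of keywords occurring in the clause."""
    if not keywords:
        return 0.0
    lowered = clause.lower()
    return sum(k.lower() in lowered for k in keywords) / len(keywords)


def sim_sent(clause, triggers):
    """Assumed trigger similarity: best Jaccard overlap with any trigger phrase."""
    clause_tokens = tokens(clause)
    best = 0.0
    for phrase in triggers:
        phrase_tokens = tokens(phrase)
        union = clause_tokens | phrase_tokens
        if union:
            best = max(best, len(clause_tokens & phrase_tokens) / len(union))
    return best


def score(clause, keywords, triggers, alpha=0.5, beta=0.5):
    """Score(c_j, PT_i) = alpha * sim_key + beta * sim_sent."""
    return alpha * sim_key(clause, keywords) + beta * sim_sent(clause, triggers)


def precode(clause, instruments, alpha=0.5, beta=0.5):
    """Score a clause against every instrument; return the primary one and all scores."""
    scores = {
        name: score(clause, spec["keywords"], spec["triggers"], alpha, beta)
        for name, spec in instruments.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical keyword and trigger sets for two instrument categories.
instruments = {
    "Disciplinary constraints": {
        "keywords": ["suspension from teaching", "disciplinary action"],
        "triggers": ["teachers shall be given disciplinary action"],
    },
    "Procedural governance": {
        "keywords": ["investigation", "appeal"],
        "triggers": ["the unit shall complete the investigation"],
    },
}
clause = "Teachers who commit misconduct shall be given disciplinary action or suspension from teaching."
primary, scores = precode(clause, instruments)
```

Returning the full score dictionary alongside the primary instrument keeps the runner-up categories visible to the human reviewers, which matters when a clause scores highly on more than one instrument.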

3.5 Reliability Testing

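A test set comprising 10% of the coded units is randomly sampled from the full set, and the following measures are calculated: $R_1 = \mathrm{Consistency}(\mathrm{LLM}, \mathrm{Human}_1)$, $R_2 = \mathrm{Consistency}(\mathrm{LLM}, \mathrm{Human}_2)$, $R_3 = \mathrm{Consistency}(\mathrm{Human}_1, \mathrm{Human}_2)$.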

Here, $R_1$ and $R_2$ measure the consistency between the LLM’s pre-coding and human judgments, while $R_3$ measures inter-coder consistency between the two human coders. If $R_1$, $R_2$, and $R_3$ all meet the preset threshold, this indicates that the domain-adapted LLM has satisfactory usability and that the category definitions are sufficiently clear.
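The consistency measure is not named in the text. As one possible reading, the sketch below draws a 10% sample and computes simple percent agreement together with Cohen's kappa for a pair of coders; the choice of these two statistics, the function names, and the toy label sequences are all assumptions made for illustration.

```python
import random
from collections import Counter


def sample_test_set(unit_ids, share=0.10, seed=0):
    """Randomly draw a share of the coded units (at least one) for checking."""
    rng = random.Random(seed)
    k = max(1, round(len(unit_ids) * share))
    return rng.sample(unit_ids, k)


def percent_agreement(a, b):
    """Fraction of units on which two coders assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[l] * count_b[l] for l in count_a.keys() | count_b.keys()) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


# Toy labels for ten sampled clauses, invented for this example.
llm = ["A", "A", "B", "B", "A", "B", "A", "B", "A", "A"]
human1 = ["A", "A", "B", "B", "A", "B", "A", "A", "A", "A"]
r1_agreement = percent_agreement(llm, human1)
r1_kappa = cohen_kappa(llm, human1)
```

The same two functions would be applied to the (LLM, second coder) and (first coder, second coder) pairs to obtain the counterparts of $R_2$ and $R_3$.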

3.6 Quantitative Analysis

Let $\mathrm{freq}(PT_i)$ denote the frequency with which $PT_i$ appears in the text corpus. The usage frequency of each type of policy instrument is then calculated. Under a specific policy instrument framework, comparing the proportions of instrument use across the $G$, $L$, and $U$ levels makes it possible to identify preferences in instrument selection during policy translation and to clarify the institutional logic through which universities implement national policy documents.

By integrating the horizontal dimension of policy instrument types with the vertical dimension of content elements, a two-dimensional cross-analysis of ‘policy instruments-content elements’ is conducted. Given a policy instrument category $PT_i$ and a content element category $CE_j$, a cross-frequency matrix $M = [m_{ij}]_{k \times q}$ is constructed.

Here, $m_{ij}$ represents the frequency with which $PT_i$ and $CE_j$ co-occur across all coded clauses. Comparing the rows and columns of matrix $M$ reveals the usage pattern of policy instruments for each content element and helps evaluate the degree of compatibility between policy content elements and policy instruments.
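Both quantities in this subsection, the instrument usage shares and the cross-frequency matrix, are simple counts over the coded clauses. A minimal sketch follows, assuming each coded clause is stored as a dictionary with `instrument` and `element` keys (a hypothetical layout); the four toy clauses are invented for illustration.

```python
from collections import Counter


def instrument_shares(coded, instruments):
    """Share of coded clauses assigned to each instrument, based on freq(PT_i)."""
    freq = Counter(c["instrument"] for c in coded)
    total = sum(freq[pt] for pt in instruments)
    return {pt: (freq[pt] / total if total else 0.0) for pt in instruments}


def cross_matrix(coded, instruments, elements):
    """Matrix M with m_ij = number of clauses coded as (instrument i, element j)."""
    counts = Counter((c["instrument"], c["element"]) for c in coded)
    return [[counts[(pt, ce)] for ce in elements] for pt in instruments]


instruments = ["Norm specification", "Disciplinary constraints"]
elements = [
    "Definition of ethical misconduct",
    "Disciplinary actions and qualification restrictions",
]
# Toy coded clauses, invented for this example.
coded = [
    {"instrument": "Norm specification", "element": "Definition of ethical misconduct"},
    {"instrument": "Norm specification", "element": "Definition of ethical misconduct"},
    {"instrument": "Disciplinary constraints", "element": "Disciplinary actions and qualification restrictions"},
    {"instrument": "Norm specification", "element": "Disciplinary actions and qualification restrictions"},
]
shares = instrument_shares(coded, instruments)
M = cross_matrix(coded, instruments, elements)
```

Rows of `M` follow the order of `instruments` and columns the order of `elements`, so row sums give per-instrument totals and column sums give per-element totals.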

4. Empirical Analysis

4.1 Data Preparation and Case Database Construction

The governance of teachers’ ethical misconduct is a key institutional safeguard for implementing the fundamental task of fostering virtue and cultivating talent in higher education (Qin & Yu, 2024). Existing studies have mainly approached this issue from legal and sociological perspectives, using content analysis, case studies, and related methods to focus on conceptual definitions of misconduct, accountability mechanisms, and the control of discretionary power (Fan, 2023; Ma & Yang, 2024). However, much of this work remains at the level of textual interpretation, and important issues (including the identification of misconduct, the allocation of responsibilities, the operational requirements of governance, and modes of implementation) remain insufficiently clarified (Wang & Yin, 2022). This study therefore applies the framework proposed above to examine the implementation trajectory of teacher ethics governance from the perspectives of policy instrument selection and institutional logic, with a view to improving institutional design for the governance of teacher ethics and conduct.

Using the keywords ‘education policy’, ‘policy instruments’, and ‘higher education policy’, we first collected relevant studies from China National Knowledge Infrastructure, retrieved the corresponding policy texts, and constructed a case repository consisting of 15 documents, with five documents for each of the three major framework categories. DeepSeek-R1 was then invoked through Python to determine the policy instrument framework, identify sub-instrument categories, and conduct policy coding, while two education researchers independently completed manual coding. The resulting consistency scores—R1 = 0.86, R2 = 0.91, and R3 = 0.94—all exceeded the preset threshold, indicating that the adapted LLM performed well and that the category definitions were reasonably clear. We then searched the official websites of the Ministry of Education and finance-and-economics universities using the keywords ‘violations of teacher ethics and conduct’ and ‘teacher ethics and conduct’, collecting policy documents on the handling of such violations from 53 universities as well as national and provincial sources, including 2 national-level documents, 12 provincial- and municipal-level documents, and 28 university-level documents.

4.2 Establishing the Policy Instrument Framework

The two national institutional documents were uploaded to DeepSeek-R1 to extract their policy purposes, objectives, and core governance expressions, yielding the following semantic representation: ‘Under the guidance of top-level national policies, universities translate and localize these policies to build a high-quality faculty governance system centered on fostering virtue and cultivating talent, anchored in standards for teacher ethics and conduct, and combining accountability with institutionalized enforcement’. We then compared this representation with each case in the repository and calculated semantic similarity scores.

The similarity analysis showed that the target policy was semantically closest to two types of policy texts, discipline assessment and teaching evaluation, with similarity scores of 0.93 and 0.87, respectively. The text repeatedly contained expressions related to rule formulation, organizational arrangement, authoritative control, incentive and constraint mechanisms, and procedural oversight. Accordingly, the discipline-assessment case (Guo et al., 2025), with the higher similarity score of 0.93, was selected as the benchmark. In Guo et al. (2025), the initial policy instrument category list was PT0 = {mandate, inducements, capacity-building, system-changing, hortatory}. By contrast, the candidate tool categories generated by the LLM for the teacher ethics policy text were PT* = {authoritative regulations, procedures, organizational structures, disciplinary measures, supervision and accountability}. After manual review and calibration, the final category list was defined as PT1 = {Norm specification, Procedural governance, Organizational arrangements, Disciplinary constraints, Supervision and accountability}.

The corresponding sub-instrument set PT1_sub was then specified by combining the sub-tool content extracted from the benchmark case with the word-segmentation results from DeepSeek-R1 and the LDA topic-mining results, yielding PT1_sub = [{explicit prohibition, violation identification, handling}, {reporting, investigation, verification, review, service of documents, appeal, reconsideration}, {working leading group, faculty affairs department, secondary units, division of responsibilities among relevant functional departments}, {criticism and education, public censure, suspension from teaching, transfer from post, revocation of qualifications, disciplinary action, revocation of teaching qualifications}, {primary responsibility, direct responsibility, accountability for dereliction of duty}].

4.3 Generating Content Elements

Each university policy text on teacher ethics and conduct was segmented by titles, chapter structure, and semantic paragraphs to generate a set of structural units. DeepSeek-R1 then generated candidate content elements for each structural unit, drawing on text structure, governance functions, and category experience from the case repository (where the content elements of discipline-assessment policy texts mainly include evaluation entities, evaluation objectives, evaluation content, evaluation procedures, evaluation methods, and the publication and use of results). These candidates were consolidated and revised by the researchers according to operational principles. After multiple rounds of merging, the final content element table was defined as CE = {Policy objectives and basic principles; Definition of ethical misconduct; Organizational structure and division of responsibilities; Reporting reception and investigation mechanisms; Deliberation, decision-making, and procedural safeguards; Disciplinary actions and qualification restrictions; Remedies, review, and resolution of disciplinary actions; Supervision, accountability, and implementation of responsibilities}.
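As a rough illustration of the segmentation step, the sketch below splits a document into structural units at chapter and article headings. The heading patterns (`Chapter N`, `Article N`), the `StructuralUnit` type, and the sample text are assumptions made for this example; the original policy texts would need patterns matched to their own numbering conventions, and semantic paragraph splitting is not covered here.

```python
import re
from dataclasses import dataclass

# Assumed heading conventions; real policy texts need their own patterns.
CHAPTER = re.compile(r"^Chapter\s+\d+\b")
ARTICLE = re.compile(r"^Article\s+\d+\b")


@dataclass
class StructuralUnit:
    chapter: str  # heading of the enclosing chapter ("" if none seen yet)
    text: str     # article text, with continuation lines joined by spaces


def segment(document: str) -> list:
    """Split a document into one structural unit per article."""
    units = []
    chapter = ""
    buffer = []

    def flush() -> None:
        # Close the unit collected so far, if any.
        if buffer:
            units.append(StructuralUnit(chapter, " ".join(buffer)))
            buffer.clear()

    for raw in document.splitlines():
        line = raw.strip()
        if not line:
            continue
        if CHAPTER.match(line):
            flush()
            chapter = line
            continue
        if ARTICLE.match(line):
            flush()
        buffer.append(line)
    flush()
    return units


sample = (
    "Chapter 1 General Provisions\n"
    "Article 1 Purpose of the measures.\n"
    "It applies to all teaching staff.\n"
    "\n"
    "Article 2 Scope.\n"
    "Chapter 2 Procedures\n"
    "Article 3 Reporting channels.\n"
)
units = segment(sample)
```

Each resulting unit keeps its chapter heading, so the later element-assignment step can use the chapter title as context.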

4.4 Coding and Reliability Testing

Using the established policy instrument framework PT1, the sub-instrument set PT1_sub, and the content element table CE, DeepSeek-R1 performed clause-level tokenization and sentence-pattern extraction, calculated lexical and syntactic similarity, assigned content elements, policy instruments, and sub-instruments, identified trigger words, and produced both justifications and confidence scores for its classifications.
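One way to picture the output of this step is as a structured record per clause. The sketch below is an assumption about how such a record could be represented and screened; the `CodedUnit` fields mirror the items listed above, but the field names, the `needs_manual_review` helper, and the 0.8 confidence cut-off are hypothetical and are not taken from the study's implementation.

```python
from dataclasses import dataclass, field

# Instrument categories from PT1 as defined in Section 4.2.
POLICY_INSTRUMENTS = (
    "Norm specification",
    "Procedural governance",
    "Organizational arrangements",
    "Disciplinary constraints",
    "Supervision and accountability",
)


@dataclass
class CodedUnit:
    clause: str                 # the clause-level text that was coded
    content_element: str        # label drawn from CE
    instrument: str             # label drawn from PT1
    sub_instrument: str         # label drawn from PT1_sub
    trigger_words: list = field(default_factory=list)
    justification: str = ""     # model's stated reason for the labels
    confidence: float = 0.0     # model's self-reported confidence in [0, 1]


def needs_manual_review(unit: CodedUnit, threshold: float = 0.8) -> bool:
    """Flag units with an unknown instrument label or low reported confidence."""
    return unit.instrument not in POLICY_INSTRUMENTS or unit.confidence < threshold


example = CodedUnit(
    clause="Teachers shall not accept gifts from students or parents.",
    content_element="Definition of ethical misconduct",
    instrument="Norm specification",
    sub_instrument="explicit prohibition",
    trigger_words=["shall not"],
    justification="Prohibitive wording that defines a misconduct boundary.",
    confidence=0.95,
)
```

Checking each label against the closed category list catches cases where the model invents a category outside the framework, which a free-text output format would otherwise let through.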

A validation set was randomly sampled from the full set of coded units, and R1, R2, and R3 were calculated accordingly. Because all three values exceeded 0.8, the policy instrument analysis method incorporating DeepSeek-R1 was deemed to have satisfactory validity.
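For readers who want to reproduce this kind of check, the sketch below computes two common agreement measures between LLM labels and human labels on a validation sample. It assumes simple percent agreement and Cohen's kappa; the indices R1, R2, and R3 are defined earlier in the paper and may be computed differently, and the label sequences here are invented for illustration.

```python
from collections import Counter


def percent_agreement(a: list, b: list) -> float:
    """Share of units on which the two coders assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list, b: list) -> float:
    """Chance-corrected agreement between two coders on nominal labels."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Expected agreement if both coders labelled independently at their own base rates.
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_expected == 1.0:
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented validation sample of ten coded units.
llm_labels = ["norm", "norm", "proc", "disc", "norm", "proc", "sup", "org", "norm", "disc"]
human_labels = ["norm", "norm", "proc", "disc", "proc", "proc", "sup", "org", "norm", "disc"]

agreement = percent_agreement(llm_labels, human_labels)
kappa = cohens_kappa(llm_labels, human_labels)
```

Kappa is lower than raw agreement because it discounts matches that would occur by chance, which matters when one category (here, norm specification) dominates the sample.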

4.5 Cross-Analysis of Policy Instruments and Content Elements

A two-dimensional cross-analysis of policy instruments and content elements was conducted for governance texts on teachers’ ethical misconduct from 27 universities specializing in finance and economics, producing a total of 1,469 coded units (see Table 1).

Table 1. Frequency distribution of the two-dimensional interaction between policy instruments and content elements

| Content Elements\Policy Instruments | Norm Specification | Procedural Governance | Organizational Arrangement | Disciplinary Constraint | Supervision and Accountability | Total |
|---|---|---|---|---|---|---|
| Policy objectives and basic principles | 337 | 4 | 8 | 3 | 4 | 356 |
| Definition of teachers’ ethical misconduct | 349 | 1 | 4 | 1 | 2 | 357 |
| Organizational structure and division of responsibilities | 3 | 2 | 64 | 0 | 3 | 72 |
| Reporting reception and investigation mechanisms | 11 | 141 | 16 | 0 | 0 | 168 |
| Deliberation, decision-making and procedural safeguards | 2 | 60 | 10 | 0 | 0 | 72 |
| Disciplinary measures and qualification restrictions | 22 | 0 | 3 | 163 | 0 | 188 |
| Remedy, review, and termination of sanctions | 12 | 64 | 14 | 14 | 2 | 106 |
| Supervision, accountability and responsibility implementation | 32 | 1 | 6 | 2 | 109 | 150 |
| Total | 768 | 273 | 125 | 183 | 120 | 1469 |

Note: $p$ < 0.001, Cramér’s $V$ = 0.802
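The association statistics can be re-derived from the cell counts in Table 1. The sketch below is a plain-Python calculation of the Pearson chi-square statistic and Cramér's V, written for illustration rather than taken from the authors' code; the function names are hypothetical.

```python
import math

# Cell counts from Table 1 (rows: content elements, columns: policy instruments).
TABLE_1 = [
    [337, 4, 8, 3, 4],
    [349, 1, 4, 1, 2],
    [3, 2, 64, 0, 3],
    [11, 141, 16, 0, 0],
    [2, 60, 10, 0, 0],
    [22, 0, 3, 163, 0],
    [12, 64, 14, 14, 2],
    [32, 1, 6, 2, 109],
]


def chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom, sample size)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, n


def cramers_v(stat, n, n_rows, n_cols):
    """Cramér's V: chi-square scaled by sample size and the smaller table dimension."""
    return math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))


stat, dof, n = chi_square(TABLE_1)
v = cramers_v(stat, n, len(TABLE_1), len(TABLE_1[0]))
print(f"chi2 = {stat:.2f}, df = {dof}, n = {n}, V = {v:.3f}")
```

With eight rows and five columns, the degrees of freedom are (8 − 1) × (5 − 1) = 28, and V divides the statistic by n × (5 − 1) before taking the square root.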

The results show that, in the policy instrument dimension, norm specification was the most frequently used instrument type (768 instances, 52.28%), followed by procedural governance (273 instances, 18.58%) and disciplinary constraints (183 instances, 12.46%), whereas organizational arrangements (125 instances, 8.51%) and supervision and accountability (120 instances, 8.17%) appeared less frequently. This pattern suggests that the governance texts exhibit a distinct rules-first logic: they first establish clear normative boundaries for teacher ethics through a large number of provisions; they then rely on procedural governance to institutionalize identification, deliberation, and disciplinary handling; and finally, they supplement this structure with disciplinary constraints, organizational arrangements, and supervision and accountability to form a closed governance loop. In this sense, the governance of teachers’ ethical misconduct has developed into a regime centered on normative construction and procedural regulation.

In the content element dimension, Definition of teachers’ ethical misconduct (357 instances, 24.30%) and Policy objectives and basic principles (356 instances, 24.23%) accounted for the largest shares. These were followed by Disciplinary measures and qualification restrictions (188 instances, 12.80%), Reporting reception and investigation mechanisms (168 instances, 11.44%), and Supervision, accountability and responsibility implementation (150 instances, 10.21%), while Organizational structure and division of responsibilities and Deliberation, decision-making and procedural safeguards each appeared 72 times (4.90%). This indicates that the sampled universities have concentrated primarily on defining forms and boundaries of misconduct and articulating institutional principles, while comparatively limited attention has been devoted to organizational coordination, procedural support, and follow-up remedial arrangements.

Further two-dimensional cross-analysis (see Table 2) shows a statistically significant association between policy instrument types and the distribution of content elements (χ² = 3783.02, df = 28, p < 0.001; Cramér’s V = 0.802), indicating that the sampled universities have developed a relatively stable configuration between content elements and instrument types in texts governing teachers’ ethical misconduct. Specifically, the five most prominent combinations were ‘Definition of teachers’ ethical misconduct – Norm specification’ (349 instances, 23.76%), ‘Policy objectives and basic principles – Norm specification’ (337 instances, 22.94%), ‘Disciplinary measures and qualification restrictions – Disciplinary constraint’ (163 instances, 11.10%), ‘Reporting reception and investigation mechanisms – Procedural governance’ (141 instances, 9.60%), and ‘Supervision, accountability and responsibility implementation – Supervision and accountability’ (109 instances, 7.42%). Together, these five combinations accounted for 74.81% of all coded units (1,099 of 1,469). This pattern indicates that governance texts in finance-and-economics universities largely proceed along the logic of ‘behavioral definition → procedural handling → disciplinary enforcement → accountability’, and thus display strong regulatory, procedural, and accountability-oriented features. At the same time, content related to Remedy, review, and termination of sanctions occupies only a small share of the overall texts (106 instances in total, or 7.22%). Although these provisions are associated with procedural governance (64 instances), their frequency is far lower than that of front-end normative and disciplinary combinations. This distribution reveals a clear structural tendency: strong front-end norm construction but weak back-end safeguards.

In other words, while the governance texts have developed a relatively complete regulatory system for defining misconduct, handling procedures, and disciplinary constraints, they remain comparatively underdeveloped in back-end supportive arrangements such as remedial rights, review and appeal procedures, and the lifting of disciplinary actions. This weakness may affect both perceived fairness and the sustainable operation of the overall governance system.

Table 2. Major instrument—element combinations and their proportions (top five)

| Rank | Instrument–Element Combination | Frequency | Percentage (%) | Cumulative (%) |
|---|---|---|---|---|
| 1 | Definition of teachers’ ethical misconduct – Norm specification | 349 | 23.76 | 23.76 |
| 2 | Policy objectives and basic principles – Norm specification | 337 | 22.94 | 46.70 |
| 3 | Disciplinary measures and qualification restrictions – Disciplinary constraint | 163 | 11.10 | 57.79 |
| 4 | Reporting reception and investigation mechanisms – Procedural governance | 141 | 9.60 | 67.39 |
| 5 | Supervision, accountability and responsibility implementation – Supervision and accountability | 109 | 7.42 | 74.81 |

Note: Cumulative percentages are computed from cumulative frequencies out of 1,469 coded units.
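The ranking of combinations can be reproduced directly from the Table 1 cell counts. The sketch below is illustrative only (the function name and tuple layout are assumptions); it computes each cumulative share from the running count rather than by adding rounded percentages, which avoids rounding drift of a hundredth of a point in the last column.

```python
ELEMENTS = [
    "Policy objectives and basic principles",
    "Definition of teachers' ethical misconduct",
    "Organizational structure and division of responsibilities",
    "Reporting reception and investigation mechanisms",
    "Deliberation, decision-making and procedural safeguards",
    "Disciplinary measures and qualification restrictions",
    "Remedy, review, and termination of sanctions",
    "Supervision, accountability and responsibility implementation",
]
INSTRUMENTS = [
    "Norm specification",
    "Procedural governance",
    "Organizational arrangement",
    "Disciplinary constraint",
    "Supervision and accountability",
]
# Cell counts from Table 1 (rows follow ELEMENTS, columns follow INSTRUMENTS).
COUNTS = [
    [337, 4, 8, 3, 4],
    [349, 1, 4, 1, 2],
    [3, 2, 64, 0, 3],
    [11, 141, 16, 0, 0],
    [2, 60, 10, 0, 0],
    [22, 0, 3, 163, 0],
    [12, 64, 14, 14, 2],
    [32, 1, 6, 2, 109],
]


def top_combinations(k=5):
    """Return the k largest cells as (element, instrument, count, pct, cumulative pct)."""
    total = sum(sum(row) for row in COUNTS)
    cells = [
        (COUNTS[i][j], ELEMENTS[i], INSTRUMENTS[j])
        for i in range(len(ELEMENTS))
        for j in range(len(INSTRUMENTS))
    ]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    rows, running = [], 0
    for count, element, instrument in cells[:k]:
        running += count
        rows.append((element, instrument, count,
                     round(100 * count / total, 2),
                     round(100 * running / total, 2)))
    return rows


top_five = top_combinations()
for rank, (element, instrument, count, pct, cum) in enumerate(top_five, start=1):
    print(f"{rank}. {element} / {instrument}: {count} ({pct}%, cumulative {cum}%)")
```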

5. Conclusions

Policy instrument theory is an important approach for analyzing the institutional logic of educational policies and the preferences reflected in instrument selection. It has been widely applied in studies of institutional translation involving personnel reform, discipline assessment, and curriculum development, but it also suffers from several limitations, including labor-intensive coding, time-consuming content analysis, and strong subjectivity in concept definition and category consolidation. LLMs, with their strong capabilities in natural language understanding and large-scale text processing, have attracted growing attention in information collection, processing, and analysis. In response to the limitations of traditional policy instrument analysis, this study proposes an LLM-integrated analytical framework and tests it empirically using governance documents on teachers’ ethical misconduct from 27 universities. The findings indicate that governance in finance-and-economics universities unfolds along the main axis of ‘behavioral definition → procedural handling → disciplinary enforcement → accountability’, exhibiting pronounced regulatory, procedural, and accountability-oriented characteristics. At the same time, however, the overall structure still reflects strong front-end norm construction and relatively insufficient back-end remedial support. The empirical results suggest that the proposed framework is both feasible and effective for analyzing the institutional logic of educational policy texts. Future research will expand the number of policy texts analyzed and extend the framework to fields such as energy, public health, and traffic safety to further test its generalizability.

Author Contributions

Conceptualization, C.Q. and J.L.; methodology, X.Z.; software, Y.Z.; validation, W.Z., X.Z., and Y.Z.; formal analysis, C.Q.; investigation, J.L.; resources, X.Z.; data curation, X.Z.; writing—original draft preparation, J.L.; writing—review and editing, J.L.; visualization, X.Z.; supervision, X.Z.; project administration, C.Q.; funding acquisition, C.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 72574254), the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Grant No. 24YJAZH228), and the Theoretical Research Project on Party Building and Ideological and Political Work at Central University of Finance and Economics (Grant No. DJA25001).

Data Availability

The data used to support the research findings are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References
Anh, T. N. (2023). Education policy analysis methods. In International Conference on “Global Changes and Sustainable Development in Asian Emerging Market Economies” (pp. 137–160). Cham: Springer Nature Switzerland.
Bao, S. & Han, B. (2022). Review and reflection on the 30 years of disciplinary evaluation system from the perspective of policy tools. Res. High. Educ. Eng., 3, 117–123.
Bian, Z. (2025). Human capital and socialism builders: a happy marriage? Analysing the construction of ‘high-level talent’ in Chinese higher education policy. High. Educ., 90(5), 1329–1346.
Chan, C. K. Y. (2023). A comprehensive AI policy education framework for university teaching and learning. Int. J. Educ. Technol. High. Educ., 20(1), 38.
Chen, J. & Wan, Z. (2016). Reflections on the modernization of educational governance system and governance ability. Educ. Res., 37(10), 25–31.
Fan, Q. (2023). From virtue ethics to legal prohibitions: Construction of the concept of a teacher’s moral anomie and implementation of its system. Peking Univ. Educ. Rev., 21(4), 26–41.
Guo, X., He, Y., & Liu, X. (2025). A study on the choice preference for discipline evaluation policy tools in China – Based on the analysis of policy documents from 1985 to 2023. J. Grad. Educ., 4, 93–100.
Krippendorff, K. (2018). Content Analysis: An Introduction to Its Methodology. Sage publications.
Li, X. (2025). The game in the gap: An investigation into the types of “local policies” in higher education governance and the research on generation logic. China High. Educ. Res., 41(7), 48–55.
Liang, H., Zhao, K., & Li, J. (2025). Responding to the new research assessment reform in China: The universities’ institutional hybrid actions. J. High. Educ. Policy Manag., 47(4), 473–489.
Liu, A. & Sun, M. (2025). From voices to validity: Leveraging large language models (LLMs) for textual analysis of policy stakeholder interviews. AERA Open, 11, 23328584251374595.
Liu, J. (2025). Quality assurance in Chinese higher education: Policy reforms and institutional challenges. Dev. Humanit. Soc. Sci., 1(3), 144–161.
Ma, J. & Yang, T. (2024). Research on the control of discretionary power in punishing ethical misconduct of university teachers. Fudan Educ. Forum, 22(2), 66–73.
McDonnell, L. M. & Elmore, R. F. (1987). Getting the job done: Alternative policy instruments. Educ. Eval. Policy Anal., 9(2), 133–152.
Mehta, S. D., Paul, S., Awiti, E., Young, S., Zulaika, G., Otieno, F. O., Phillips-Howard, P. A., Mason, L., & Bhaumik, R. (2025). Evaluation of large language models within GenAI in qualitative research. Sci. Rep., 15(1), 34993.
Mei, W. & Symaco, L. (2022). University-wide entrepreneurship education in China’s higher education institutions: Issues and challenges. Stud. High. Educ., 47(1), 177–193.
Qin, T. & Yu, C. (2024). Research on the control of discretionary power in punishing ethical misconduct of university teachers. J. High. Educ. Manag., 18(4), 100–113.
Rothwell, R. (1983). Innovation and firm size: A case for dynamic complementarity; or, is small really so beautiful? J. Gen. Manag., 8(3), 5–25.
Ruan, J., Cai, Y., & Stensaker, B. (2024). University managers or institutional leaders? An exploration of top-level leadership in Chinese universities. High. Educ., 87(3), 703–719.
Schmidt, V. A. (2008). Discursive institutionalism: The explanatory power of ideas and discourse. Annu. Rev. Polit. Sci., 11(1), 303–326.
Schneider, A. & Ingram, H. (1990). Behavioral assumptions of policy tools. J. Polit., 52(2), 510–529.
Sheng, B. (2003). Regulation in higher education: reconstruction of the relations among government, higher education institutions and the society. J. High. Educ., 2, 47–51.
Tai, R. H., Bentley, L. R., Xia, X., Sitt, J. M., Fankhauser, S. C., Chicas-Mosier, A. M., & Monteith, B. G. (2024). An examination of the use of large language models to aid analysis of textual data. Int. J. Qual. Methods, 23.
Wang, Z. & Yin, H. (2022). The root of the dilemma, clarification of understanding, and practical ways of fostering virtue through education for teachers in colleges and universities. J. High. Educ. Manag., 16(2), 75–82.
Wen, C., Clough, P., Paton, R., & Middleton, R. (2025). Leveraging large language models for thematic analysis: A case study in the charity sector. AI Soc., 41, 731–748.
Xia, J., Zhang, M. M., Zhu, J. C., & Fan, D. (2024). Reconciling multiple institutional logics for ambidexterity: Human resource management reforms in Chinese public universities. High. Educ., 87(3), 611–636.
Xiao, Y. & Liu, Z. T. (2024). Why do university faculties neglect teaching: based on the analysis of university teaching policy text. China High. Educ. Res., 40(2), 62–69.
Xu, X., Rose, H., & Oancea, A. (2021). Incentivising international publications: Institutional policymaking in Chinese higher education. Stud. High. Educ., 46(6), 1132–1145.
Zhang, H. E., Wu, C., Xie, J., Lyu, Y., Cai, J., & Carroll, J. M. (2025). Harnessing the power of AI in qualitative research: Exploring, using and redesigning ChatGPT. Comput. Hum. Behav. Artif. Hum., 4, 100144.
Zhao, K., Li, J., & Liang, H. (2025). Reconfiguring power: A field analysis of China’s research evaluation reform. High. Educ.
Zhuang, T., Liu, B., & Hu, Y. (2022). Legitimising shared governance in China’s higher education sector through university statutes. Eur. J. Educ., 57(1), 33–48.

Cite this:
Qin, C. L., Zhang, X. P., Li, J., Zhu, Y. C., & Zhang, W. (2025). A Large Language Model-Driven Framework for Policy Instrument Analysis: An Empirical Study of University Teachers’ Ethical Misconduct Governance Texts. Educ. Sci. Manag., 3(4), 209-218. https://doi.org/10.56578/esm030401
©2025 by the author(s). Published by Acadlore Publishing Services Limited, Hong Kong. This article is available for free download and can be reused and cited, provided that the original published version is credited, under the CC BY 4.0 license.