Generative AI for Promoting Higher Level of Thinking in the Educational Setting: An Analysis Using Lee’s Model of Thinking Levels
Abstract:
The advent of AI chatbots such as ChatGPT has reshaped education by making information readily accessible, while raising concerns about the cultivation of critical thinking skills. Nevertheless, little research has examined the extent to which learners can develop critical thinking skills in a given discipline through the use of ChatGPT. This research evaluates the capability of ChatGPT to demonstrate critical thinking in its responses, particularly in the domain of cybersecurity, with the objective of assessing its analytical capacities in view of its growing integration into educational settings. To this end, ChatGPT was presented with a series of questions of increasing complexity within the domain of cybersecurity. The responses were analyzed using Lee’s Model of Thinking Levels, which involved categorizing them as “recall”, “rationalization”, or “reflectivity”. The findings suggested that ChatGPT exhibited a prominent level of critical thinking skills, especially in authentic contexts.
1. Introduction
The landscape of education is being fundamentally reshaped by the emergence of new generative artificial intelligence (GAI) technologies, with systems like ChatGPT at the forefront (Grassini, 2023). Generative AI marks a fundamental departure from conventional retrieval-based models. Rather than searching for and extracting existing data, generative models can produce original content tailored to a given prompt. However, concerns about overreliance on AI-generated content remain owing to its lack of depth, nuance, and accuracy across different domains (Alqahtani et al., 2023).
The cultivation of critical thinking skills, encompassing the capacity to examine information, question assumptions, evaluate evidence, and formulate well-founded conclusions, is widely acknowledged as a primary competency that education should foster to facilitate students’ academic and professional success (Mahanal et al., 2019). Prior studies explored the impact of technology like AI tutors and learning analytics systems on critical thinking (Alam, 2021; Zawacki-Richter et al., 2019). However, research focused specifically on generative chatbots like ChatGPT is still emerging (Exintaris et al., 2023). An exhaustive assessment is necessary for determining the level of thoughtfulness and cognitive capabilities exhibited by generative artificial intelligence tools across different domains. With the popularity of ChatGPT and other similar tools, students are increasingly relying on them in their studies and academic tasks. Therefore, it is important to determine the amount of critical thinking exhibited in the responses generated by these tools (Rusandi et al., 2023; van den Berg & du Plessis, 2023). This evaluation seeks to determine whether these tools can effectively assist students or learners in acquiring a thorough understanding of various subjects, while also facilitating the development of their critical thinking skills and the ability to apply acquired knowledge in real-world contexts.
Hence, the capacity of ChatGPT to reason and to promote critical thinking remains an open research question. In this study, we explore the potential of ChatGPT, a generative AI tool, in fostering critical thinking skills. The research question was: “Can ChatGPT demonstrate high levels of critical thinking in its responses?” This question stemmed from the need to understand whether ChatGPT could promote critical thinking. To address it, we evaluated the cognitive levels of the responses generated by ChatGPT, using cybersecurity as a case study and Lee’s Model of Thinking Levels (Lee, 2005) as the methodological framework. Analyzing the responses with Lee’s Model allowed us to assess the depth, complexity, and nuance of the tool’s output. We treated cybersecurity as a complex domain and analyzed the responses provided by ChatGPT to questions of varying difficulty, ranging from basic to intermediate and advanced.
This study contributes to the existing literature on generative AI and critical thinking in higher education in three specific ways. First, unlike broader studies on generative AI in education, this work applied Lee’s Model of Thinking Levels as a focused analytical framework to evaluate the cognitive depth of ChatGPT-generated responses. Second, the study examined ChatGPT’s responses across three categories of cybersecurity prompts: basic questions, intermediate to advanced questions, and real-world scenario-based questions. Third, by comparing response quality across these prompt categories, the study provided exploratory evidence on the types of prompts that were more likely to elicit rationalization and reflectivity from ChatGPT. These contributions distinguished the present work from general discussions of AI in education by offering a structured and model-based analysis of its critical-thinking potential.
The paper consists of five sections. The Introduction presents an overview of the research and its significance. The Literature Review examines prior research and pertinent scholarly literature to define the research background. The Methodology section describes the research strategy. The Results and Discussion section reports the research results and discusses them to provide insights and interpretations of the findings. The paper ends with the Conclusions section.
2. Literature Review
Critical thinking is the cognitive capacity to systematically and logically evaluate and analyze information, question underlying assumptions, assess the strength of evidence, and arrive at well-founded and valid conclusions (Pithers & Soden, 2000). The concept of critical thinking has been explored and developed by various disciplines, including philosophy, education, and psychology (Pithers & Soden, 2000). This cognitive skill empowers students to move beyond conventional memorization of facts and attain a more profound understanding of concepts across various academic fields (Vargas Alfonso, 2015). The practice of reflecting on one’s ideas could enhance students’ learning and professional efficacy, enabling them to go beyond just information intake (Jony et al., 2017). Therefore, the reflection process could encourage further learning, promote self-analysis, and facilitate the resolution of real-world problems.
Research indicated that the capacity to engage in critical thinking, such as questioning assumptions, assessing evidence, and employing logical reasoning, was crucial for students to excel in their academic pursuits and effectively apply their knowledge in practical situations (Jony et al., 2017). Several studies showed that the development of critical thinking ability was an essential component of higher education, as it was believed to contribute to improved academic performance (Ramos, 2018).
The education sector is undergoing significant transformation due to recent advancement in Generative AI tools, such as ChatGPT (Lo, 2023; Rahman & Watanobe, 2023; Shanto et al., 2023). Recent studies have started to delve into the intersection of generative AI, like ChatGPT, and education. For example, Malinka et al. (2023) examined the use of ChatGPT in university settings, to highlight its potential as an educational assistant and its implications for academic integrity. In their study, Montenegro-Rueda et al. (2023) conducted a systematic review of applying ChatGPT in education by discussing its positive and negative impacts. These studies provided foundational insights into the utility and challenges posed by ChatGPT in educational contexts. The advent of this technological innovation has undoubtedly provided students with an opportunity to actively participate in academic research, readily seek support for their educational tasks, and conveniently obtain educational resources (Shanto et al., 2024). Studies by Liu et al. (2023) and Tubella et al. (2024) touched upon the potential of ChatGPT as a tool for personalized learning experiences and the need for responsible integration of AI in education.
Despite the advantages, concerns regarding ChatGPT’s responses in diverse domains have emerged. ChatGPT was built upon a robust dataset collection and could generate human-like answers (Fuchs, 2023), yet it raised questions about academic integrity and the development of critical evaluation skills among students (Sweeney, 2023). Recent studies have focused on using ChatGPT-generated responses as prompts to cultivate critical thinking and metacognition in educational settings (Exintaris et al., 2023). There is an ongoing debate about the role of GAI tools like ChatGPT in developing critical thinking skills, with some researchers arguing that AI could complement learning when used ethically and responsibly (Rusandi et al., 2023). The effectiveness of GAI tools in stimulating critical thinking and higher cognitive skills depended on their design and contexts of use, with potential applications ranging from material generation to fostering analytical skills (van den Berg & du Plessis, 2023).
The potential applications of AI tools like ChatGPT in designing educational strategies to enhance critical thinking are manifold. These tools can be leveraged to create dynamic and interactive learning environments that challenge students to engage with complex ideas and scenarios. For instance, educators could use ChatGPT to generate a series of increasingly complex questions or contexts within a specific subject area, hence prompting students to analyze, evaluate, and synthesize information at progressively higher cognitive levels (Ahmed et al., 2024). Furthermore, ChatGPT could be employed to simulate dialogues representing different viewpoints on contentious issues. It could encourage students to critically examine various perspectives and formulate well-reasoned arguments (Hatmanto & Sari, 2023). In problem-based learning scenarios, AI-generated content could present students with multifaceted and real-world problems, requiring them to apply critical thinking skills to develop innovative solutions. Furthermore, these AI tools could be used to provide instant and personalized feedback on students’ responses; they could highlight areas where deeper analysis or more rigorous logic was required (Shanto et al., 2024). By integrating AI-generated content and interactions into the design of curriculum, educators could create more engaging, challenging, and personalized learning experiences that specifically target the development of critical thinking skills (Mejia & Sargent, 2023).
Given the widespread use of ChatGPT among students, there is a demand for thorough investigations to assess the quality and influence of its generated responses. While existing literature highlighted the potential and challenges of adopting ChatGPT in education, there is a noticeable gap in research specifically addressing its influence on critical thinking skills. This study aims to bridge this gap by methodically evaluating ChatGPT’s responses in educational contexts and determining their efficacy in promoting critical thinking.
Several studies employed Lee’s Model of Thinking Levels to measure and evaluate learning tools in education settings. For instance, Jony et al. (2017) used a wiki-based reflection method to enhance deeper thinking levels among students in higher education. They applied Lee’s Model to evaluate the progression of thinking levels while engaging in wiki-based learning environments. Similarly, this study aims to measure the thinking level of ChatGPT’s responses using the popular Lee’s Model of Thinking Levels (Lee, 2005).
3. Methodology
The methodology diagram depicted in Figure 1 outlines the strategy for conducting the research. It began with the “Model Selection” stage, in which candidate models for evaluating the level of critical thinking were analyzed to determine the model that would be used. Following this, an “Initial Prompt and Response Generation” procedure was initiated: ChatGPT was provided with an initial query related to cybersecurity and produced an initial response. Subsequently, the research progressed to the “Taking Responses from ChatGPT on Different Categories of Prompts” phase, in which additional prompts, all based on the initial response, were used to collect responses across the various question categories. The investigation then focused on “Evaluating ChatGPT Responses Using Lee’s Model”, in which Lee’s Model was applied to systematically assess the level of thinking demonstrated by ChatGPT’s responses. The findings were then synthesized in “Analyzing the Results”, which entailed a thorough analysis of the data collected during the evaluation to yield insights into the performance and capabilities of ChatGPT in responding to different levels of cybersecurity questions. Since this study focused on analyzing ChatGPT’s responses to various cybersecurity-related prompts, the methodology was designed to assess the level of critical thinking demonstrated by ChatGPT itself, using Lee’s Model of Thinking Levels as the framework for evaluation.
This study employed the well-known Lee’s Model of Thinking Levels. Lee (2005) categorized thinking into three levels (recall, rationalization, and reflectivity) and proposed a framework to evaluate the levels of thinking in the educational context. Table 1 illustrates Lee’s Model of Thinking Levels, together with a coding scheme for measuring the levels of thinking.
Level of Thinking | Degree of Level | Lee’s Model | Description of Lee’s Model |
Level 1 | Lowest | Recall | Repeat the same information |
Level 2 | Intermediate | Rationalization | Think logically about the materials (the information) |
Level 3 | Highest | Reflectivity | Think critically beyond the scope of the materials (knowledge) |
On another note, Table 2 depicts an alternative model, known as Bloom’s Taxonomy of Educational Objectives (Adams, 2015), which shares similarities with Lee’s Model of Thinking Levels. Bloom’s Taxonomy comprises six distinct levels of cognitive thinking.
Level of Thinking | Degree of Level | Bloom’s Taxonomy | Description of Bloom’s Taxonomy |
Level 1 | Lowest | Knowledge | Repeat the same information |
Level 2 | Intermediate | Comprehension | Understand the content |
Level 3 | Intermediate | Application | Put learning (or expertise) to use in an entirely new context |
Level 4 | Intermediate | Analysis | Deconstruct the content (knowledge) |
Level 5 | Intermediate | Synthesis | Combine the contents to reconstruct them |
Level 6 | Highest | Evaluation | Provide an independent rationale outside the content |
Finally, Table 3 compares the degrees of levels between Lee’s Model of Thinking Levels and Bloom’s Taxonomy of Educational Objectives. Although Bloom’s Taxonomy consists of six levels of thinking, both models span the same three degrees of thinking (lowest, intermediate, and highest). Lee’s Model integrates levels 2 through 5 of Bloom’s Taxonomy into a unified level termed Rationalization. Therefore, this work exclusively used Lee’s Model for experimental purposes.
Level of Thinking | Degree of Level | Bloom’s Taxonomy | Lee’s Model |
Level 1 | Lowest | Knowledge | Recall |
Level 2 | Intermediate | Comprehension | Rationalization |
Level 3 | Intermediate | Application | Rationalization |
Level 4 | Intermediate | Analysis | Rationalization |
Level 5 | Intermediate | Synthesis | Rationalization |
Level 6 | Highest | Evaluation | Reflectivity |
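The correspondence between the two models can be written down as a small lookup table. The following Python snippet is only an illustrative sketch of that correspondence as described in the text (Bloom’s levels 2 through 5 collapsing into Rationalization); the names `BLOOM_TO_LEE` and `lee_level` are hypothetical and were not part of the study.

```python
# Illustrative mapping from Bloom's six levels to Lee's three levels.
# Levels 2-5 of Bloom's Taxonomy all map to Lee's "Rationalization".
BLOOM_TO_LEE = {
    "Knowledge": "Recall",
    "Comprehension": "Rationalization",
    "Application": "Rationalization",
    "Analysis": "Rationalization",
    "Synthesis": "Rationalization",
    "Evaluation": "Reflectivity",
}


def lee_level(bloom_level: str) -> str:
    """Return the Lee level corresponding to a Bloom level name."""
    return BLOOM_TO_LEE[bloom_level]


if __name__ == "__main__":
    for bloom in BLOOM_TO_LEE:
        print(f"{bloom} -> {lee_level(bloom)}")
```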
While Lee’s Model was the primary framework used in this study to evaluate ChatGPT’s responses, the inclusion of Bloom’s Taxonomy served to provide a broader context and to highlight the similarities between the two models (Jony et al., 2017). Bloom’s Taxonomy is a well-established and widely recognized framework for categorizing educational goals and objectives into different levels of cognitive complexity. It is important to note that either of the two models could be used to evaluate ChatGPT’s responses, as they both provide a framework for assessing different levels of thinking. By opting for Lee’s Model, the study could more easily categorize ChatGPT’s responses into three levels, thus enabling a straightforward analysis and interpretation of the results. Moreover, using a single model throughout the study ensured consistency in the evaluation process and facilitated a more focused discussion of ChatGPT’s critical thinking abilities.
To assess ChatGPT’s ability to demonstrate critical thinking, this study employed a data collection method focusing on cybersecurity-related questions and scenarios. Cybersecurity was chosen as the focus of this study due to its inherent complexity and breadth, which make it a suitable domain for testing ChatGPT’s analytical capabilities.
The selection and design of the various categories of prompts were guided by several key criteria to ensure a complete and rigorous evaluation of ChatGPT’s critical thinking abilities. To begin the data collection process, we used the initial prompt to ask ChatGPT for an in-depth overview of cybersecurity, including its key features, potential improvements, and key concepts. Based on this initial description, we asked ChatGPT basic-level questions and recorded the answers. These questions focused on common best practices of cybersecurity, its definitions, and protective measures that a knowledgeable individual should be able to address. We then moved to intermediate questions, derived from the previous responses, to assess the model’s comprehension and its ability to supply additional information for higher-level questions. These questions addressed the differences between various cybersecurity domains, the functioning of specific security measures, and the role of human factors in maintaining cybersecurity. Lastly, we gave ChatGPT several real-world, context-based cybersecurity scenario problems, focusing on problem-solving abilities and the ability to demonstrate reflective thinking beyond pre-existing information.
For reproducibility, all prompts were submitted to ChatGPT in a single-session experimental setup. Each prompt was entered separately, and the corresponding response was collected for analysis. The responses were generated once for each prompt and were not manually modified before evaluation. No additional prompt engineering techniques, follow-up corrections, or external tools were used during the generation of responses. Since the study focused on evaluating naturally generated responses, model parameters such as temperature or randomness were not manually controlled. The prompts were organized into three categories based on increasing cognitive complexity: basic conceptual questions, intermediate/advanced conceptual questions, and real-world scenario-based questions.
Table 4 demonstrates the prompt patterns employed for the purpose of gathering answers from ChatGPT. It includes a description of each prompt category, along with corresponding samples of the prompts.
Initial prompt: Describe cybersecurity in terms of its most important concepts, aspects, and prospects.
Prompt Category | Description of Prompt | Sample of Prompt |
First category of prompt | Basic-level questions on cybersecurity | What are the best practices for creating strong passwords? |
First category of prompt | Basic-level questions on cybersecurity | What is phishing and how can you avoid it? |
First category of prompt | Basic-level questions on cybersecurity | How can you protect your personal information online? |
Second category of prompt | Intermediate to advanced-level questions on cybersecurity | Explain the difference among network security, endpoint security, cloud security, application security, and physical security. |
Second category of prompt | Intermediate to advanced-level questions on cybersecurity | How does two-factor authentication work and why is it more secure than passwords alone? |
Second category of prompt | Intermediate to advanced-level questions on cybersecurity | How can organizations ensure that their employees are practicing good cyber hygiene? |
Third category of prompt | Real-life scenario-based questions on cybersecurity | You are working on a sensitive project for your company and you need to access a file on a shared drive. You notice that the file has been accessed by someone you do not recognize. What should you do? |
Third category of prompt | Real-life scenario-based questions on cybersecurity | You are the IT manager for a small business. You are concerned about the security of your company’s network. What steps can you take to improve the security of your network? |
Third category of prompt | Real-life scenario-based questions on cybersecurity | You are using public Wi-Fi to connect to the Internet. You need to log into your bank account. Is it safe to do so? |
Figure 2 below shows the responses generated by ChatGPT based on the initial prompts; Figure 3 shows the responses of ChatGPT on different prompt categories.
To evaluate the responses generated by ChatGPT relative to the given content, this study employed Lee’s Model. Table 5 presents the scheme used to analyze the responses generated by ChatGPT, applying Lee’s Model and assigning a weight to each response. The weights assigned to the thinking levels of ChatGPT’s responses were analyzed from multiple perspectives to address the research question: “Can ChatGPT demonstrate high levels of critical thinking in its responses?” By examining ChatGPT’s responses across different categories of prompts and evaluating them using Lee’s Model, the study aimed to assess the ability of the tool to stimulate critical thinking. Higher weights were given to responses that demonstrated more advanced levels of thinking (e.g., reflectivity), while lower weights were assigned to responses that exhibited lower levels of thinking (e.g., recall). For example, if ChatGPT consistently achieved high weights for reflectivity across different prompt categories, this would suggest that the tool can demonstrate, and potentially promote, critical thinking.
Evaluation of ChatGPT’s Response | Level of Thinking | Weights |
If ChatGPT’s response recalls the same content of the provided article | Recall | 1 |
If ChatGPT’s response rationalizes its thinking with the provided content | Rationalization | 2 |
If ChatGPT’s response reflects its own thinking beyond the given content | Reflectivity | 3 |
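The coding scheme above amounts to a mapping from a rated thinking level to a numeric weight. A minimal Python sketch of that mapping is given below; the class name `ThinkingLevel` and the helper `weights_for` are hypothetical names introduced for illustration, and the sample labels are placeholders rather than ratings from the study.

```python
from enum import IntEnum


class ThinkingLevel(IntEnum):
    """Weights assigned to each of Lee's thinking levels (1 = lowest, 3 = highest)."""
    RECALL = 1
    RATIONALIZATION = 2
    REFLECTIVITY = 3


def weights_for(labels):
    """Convert a list of level names (case-insensitive) into numeric weights."""
    return [int(ThinkingLevel[label.strip().upper()]) for label in labels]


if __name__ == "__main__":
    # Placeholder labels, not data from the study.
    print(weights_for(["Recall", "Rationalization", "Reflectivity"]))
```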
To ensure the reliability and validity of the content analysis, two researchers independently rated ChatGPT’s responses using Lee’s Model of Thinking Levels. Each researcher carefully reviewed ChatGPT’s responses to the various prompts across all three categories (basic-level questions, intermediate to advanced-level questions, and real-life scenario-based questions). Using Lee’s Model as a guideline, the researchers assigned a weight to each response. After both researchers completed their independent evaluations, they compared their assigned weights for each response.
To assess the inter-rater reliability of the content analysis, Cohen’s kappa was calculated. Cohen’s kappa is a statistical measure that accounts for the possibility of agreement occurring by chance, thus providing a more robust measure of inter-rater reliability than simple percent agreement (Gisev et al., 2013). It is mathematically defined as:
$$k = \frac{P_o - P_e}{1 - P_e}$$
Here, $P_o$ represents the observed agreement between the raters, which is the proportion of cases where both raters assigned the same weight to a given response. $P_e$, on the other hand, represents the probability of agreement occurring by chance, considering the distribution of weights assigned by each rater.
The Cohen’s kappa coefficient ($k$) ranges from -1 to 1. A $k$ value of 0 indicates that the observed agreement is equal to the agreement expected by chance alone, suggesting no real agreement between the raters beyond chance. Conversely, a $k$ value of 1 signifies perfect agreement between the raters, where all assigned weights are identical.
The resulting Cohen’s kappa value for the content analysis was 0.92, indicating a near-perfect agreement between the ratings of the two researchers. By employing a well-defined evaluation process and assessing inter-rater reliability using Cohen’s kappa, the content analysis conducted in this study maintained scientific rigor and completeness.
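As an illustration of how the kappa statistic described above is computed from two raters’ weights, the following Python sketch implements the standard two-rater formula. It is not the authors’ analysis script; the function name `cohens_kappa` and the toy ratings are hypothetical and do not reproduce the study’s data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same weight by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy ratings (weights 1-3), not the study's data.
    toy_a = [3, 3, 2, 2]
    toy_b = [3, 3, 2, 3]
    print(cohens_kappa(toy_a, toy_b))
```

In the toy example the raters agree on three of four items ($P_o = 0.75$) and the chance agreement is $P_e = 0.5$, so the function returns 0.5.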
4. Results and Discussion
This section presents a discussion of the findings and outcomes obtained from the multi-level question-and-answer experiments conducted with ChatGPT.
Figure 4 presents ChatGPT’s responses as evaluated with Lee’s Model of Thinking Levels. The results for the questions in the first category revealed an interesting pattern (as shown in Figure 4a). The answers provided by ChatGPT to questions 1, 3, and 6 were at the level of “rationalization”. This meant that these answers made sense within the context of the given information, following a more grounded and content-driven line of thought. For the other questions in this category, ChatGPT reached the highest level of thinking, “reflectivity”. This meant that these answers went beyond the given information and showed an independent and reflective way of thinking in the context of cybersecurity.
Questions in the second category concerned cybersecurity concepts at an intermediate to advanced level, and ChatGPT mostly showed reflective thinking (as shown in Figure 4b). For most of these questions, ChatGPT’s answers reached the highest level of thinking by containing original ideas that went beyond what was given in the cybersecurity material. This shows that ChatGPT is good at drawing conclusions and making new links in the cybersecurity field. For questions 8 and 11, however, ChatGPT reached only the “rationalization” level, so it may not always reach the highest benchmark of critical thinking for all intermediate-level questions.
The results for questions in the third category, based on real-world cybersecurity scenarios, are shown in Figure 4c. In this case, ChatGPT consistently reached the highest level of thinking, i.e., reflectivity. In particular, its answers went beyond what was given in the scenarios and included original thoughts and conclusions. This meant that when cybersecurity questions were asked in real-world situations, ChatGPT could use critical thinking skills to analyze the problems in depth and produce answers or suggestions that were not limited to the situations themselves. These results suggest that ChatGPT can reach the deepest level of critical analysis for this type of applied and context-driven question, because it goes beyond the given information to produce ideas for solving real-world cybersecurity problems.
An overview of the experimental results is shown in Figure 5. The first sub-graph, Figure 5a, displays the mean scores for the three categories of questions. The first category had a mean score of 2.5. For the second category, the mean rose to 2.67, which indicates a slightly higher level of performance than the first. Finally, the third category averaged 3, the maximum score, which indicates a higher level of performance than the other two. The second sub-graph, Figure 5b, shows the score assigned to each question, numbered from 1 to 18, and gives a clear visual representation of the data.
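The reported category means can be reproduced from the per-question levels described in the text. The Python sketch below is only a worked check under a stated assumption: six questions per category (numbered 1-6, 7-12, and 13-18), which is inferred from the 18 questions in Figure 5b and is not stated explicitly. The helper `category_scores` and the category keys are hypothetical names.

```python
from statistics import fmean

RATIONALIZATION, REFLECTIVITY = 2, 3


def category_scores(question_ids, rationalization_ids):
    """Score each question: 2 if rated rationalization, otherwise 3 (reflectivity)."""
    return [
        RATIONALIZATION if q in rationalization_ids else REFLECTIVITY
        for q in question_ids
    ]


# Assumption: six questions per category; the rationalization-level
# questions (1, 3, 6 and 8, 11) are the ones named in the text.
categories = {
    "basic": category_scores(range(1, 7), {1, 3, 6}),
    "intermediate_advanced": category_scores(range(7, 13), {8, 11}),
    "scenario": category_scores(range(13, 19), set()),
}

means = {name: round(fmean(scores), 2) for name, scores in categories.items()}

if __name__ == "__main__":
    print(means)
```

Under this assumption the computed means are 2.5, 2.67, and 3.0, matching the values reported for the three categories.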
The third category, which comprised cybersecurity questions based on real-world scenarios, received the highest average score of 3. One explanation for this stronger performance is that scenario-based prompts provide richer context and clearer conditions for the problem. Unlike basic conceptual questions, which often lead to definition-based or factual responses, real-world scenarios require ChatGPT to consider risks, consequences, actions, and context-specific recommendations. This may encourage responses that go beyond recall and demonstrate reflectivity. Moreover, as a conversational language model, ChatGPT may perform better when prompts include practical situations, roles, and decision-making contexts. By contrast, the first category of questions, which dealt with the most fundamental aspects of cybersecurity, received the lowest average score of 2.5, suggesting that these questions were more straightforward to answer and left less room for reflective responses. The questions in the second category, which covered intermediate to advanced cybersecurity concepts, obtained an average score of 2.67. This number suggests that ChatGPT can delve into deeper aspects when responding to these questions. The responses generated by ChatGPT for the third-category questions repeatedly demonstrated its ability to interpret real-world circumstances and establish connections, thereby reaching the highest level of thinking.
ChatGPT’s performance varied across the question categories, although its degree of critical thinking remained high throughout, especially in complex real-life scenarios. While these results highlighted ChatGPT’s promise as an instructional tool, they also emphasized the need for tailored approaches to fully leverage its capacity to foster critical thinking in its users. Overall, the consistent display of critical and reflective thinking by ChatGPT, especially when faced with complex real-world situations, suggests its capacity to foster critical thinking skills among learners.
It is also important to acknowledge the limitations associated with the narrow focus on a particular domain. The findings should be interpreted as exploratory rather than fully generalizable. Since the analysis was based on a limited number of prompts within a single domain, i.e., cybersecurity, the results might not represent ChatGPT’s performance across all academic subjects or all types of critical-thinking tasks. Future studies should extend this analysis to multiple disciplines, larger prompt sets, and different generative AI models to validate the broader applicability of the findings. If ChatGPT has been trained on a larger volume of data in a specific domain, its responses will likely demonstrate an improved capability for critical thinking, and it could more effectively help users engage in critical analysis within that domain. Conversely, the potential of the system may be greatly constrained in the absence of sufficient domain-specific data. To address these limitations and further advance our understanding of the potential of AI in education, several avenues for future research should be considered. Future research could explore the integration of ChatGPT and other generative AI tools into specific educational interventions and curricula. Qualitative research methods, such as interviews and focus groups with educators and students, could provide valuable insights into the experiences and perceptions of those engaging with AI tools like ChatGPT in educational settings.
5. Conclusions
This study aimed to evaluate the cognitive levels manifested in the responses generated by ChatGPT, using Lee’s Model of Thinking Levels as a framework. The research focused on the domain of cybersecurity and analyzed ChatGPT’s responses to questions of varying complexity, ranging from basic to intermediate and advanced levels, as well as real-world scenarios. The findings suggested that ChatGPT demonstrated a prominent level of cognitive ability, with a considerable proportion of its responses falling into the “Reflectivity” category of Lee’s Model. This indicated that ChatGPT could generate responses that went beyond the mere recall of information and exhibited a degree of logical thinking and critical analysis.
The analysis of ChatGPT’s responses using Lee’s Model provided insights into the ability of the model to generate output that resembled various levels of cognitive complexity. Educators could use ChatGPT as a tool to generate prompts and questions that encourage students to engage in higher levels of thinking, such as rationalization and reflectivity. By carefully designing learning activities that incorporate ChatGPT-generated content, teachers could create opportunities for students to practise critical thinking skills and engage in meaningful discussions. However, it is crucial for educators to critically evaluate the responses generated by ChatGPT and other AI tools to ensure that they are accurate, relevant, and conducive to fostering deep learning and understanding. Moreover, policymakers could consider the potential of language models like ChatGPT in shaping educational policies and initiatives. By recognizing the capabilities and limitations of these models, policymakers could make informed decisions about how to integrate them into educational systems, while also addressing concerns related to equity, accessibility, and the responsible use of AI in education.
As generative AI tools become increasingly prevalent in education, evaluating the quality and thoughtfulness of their responses is necessary to understand their potential impact on student learning (Shanto et al., 2025). Educators should carefully assess the responses provided by these tools to determine the extent to which they could offer meaningful support for developing critical thinking skills. The insights gained from this study could inform our understanding of AI capabilities in demonstrating critical thinking, potentially guiding future research on how such AI models might be leveraged in educational contexts to support the development of critical thinking skills in students.
The authors declare no conflicts of interest.
