The post-COVID period, especially after the release of ChatGPT in November 2022, has been characterised by immense interest in applications of artificial intelligence (AI) in different areas of the economy, with education technology being an oft-cited example.
Summarising the potential for AI in education is difficult for many reasons. For one, ‘artificial intelligence’ is a broad category that covers many distinct use cases with variable degrees of promise. More importantly, many developments are far too recent to have been subject to field experiments in LMICs, and the underlying technologies have themselves been developing too rapidly for their potential to be fully evaluated. It is, however, still useful to characterise a (non-exhaustive) set of uses for which AI could be transformative.
The first set of uses are those where the core interventions already exist but may be made more productive through AI embedded in the software back-end. For example, the use of AI tools might sharply improve the degree of personalisation, the accuracy of diagnosing sources of student errors, or the depth of feedback provided to students by personalised adaptive learning platforms. If so, it is possible that this class of interventions will become even more promising over time, without requiring education systems to discover fundamentally new modes of implementation. Indeed, many such applications are already in use: for example, Khan Academy has an AI-enabled chatbot (called ‘Khanmigo’) which personalises feedback based on student responses.
Similarly, AI-enabled chatbots are also common in other routine applications of EdTech surveyed in previous sections. For example, interventions where detailed information about students and schools is used to correct potential errors in applications, or otherwise to provide detailed personalised information to parents, could also benefit from integrating such tools.[1] AI tools may also enable many interventions that would have been infeasible a decade ago. For example, Google’s Read Along app, also powered by AI, allows students to read alongside the app and receive feedback, requiring merely a regular smartphone or Android tablet; this has substantial potential for incorporation in routine assessments (and is being used in LMIC education systems, given its support for multiple languages).
The second set of interventions are those that use large language models directly as the basis for intervention. Two recent papers provide some indication of the potential for this class of interventions. Ferman et al. (2021) provide an early example of a trial set up to evaluate the potential for AI tools to provide feedback to students. They focus on evaluating the effect of a pedagogical programme that provided feedback on writing skills to students, using an automated writing evaluation (AWE) system. Specifically, in a sample of 178 Brazilian secondary schools (19,000 students), they randomly assign schools to one of three groups: (i) a control group, (ii) 55 schools with access to the AWE system, and (iii) 55 schools where students received feedback from the AWE system and a human grader. The principal task being trained for was the argumentative essay component of Brazil’s National Secondary Education Exam (ENEM), the second largest university entrance exam in the world. They find effects of ∼0.09σ on test scores on the essay component, which were nearly identical across the two treatment groups. This suggests both that the AI-based feedback was helpful and that adding human review did not lead to measurably greater gains.
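The effect sizes quoted here are expressed in standard-deviation units (σ). As a rough illustration of what that unit means, the sketch below divides a treatment-control difference in mean scores by the control group's standard deviation. The function name and the scores are made up for illustration, and scaling by the control-group standard deviation is an assumed convention; the studies cited may standardise differently.

```python
from statistics import mean, stdev


def standardized_effect(treated, control):
    """Difference in mean scores, in control-group standard deviations.

    Hypothetical helper: assumes scaling by the control group's sample
    standard deviation, which may differ from the cited studies' choice.
    """
    return (mean(treated) - mean(control)) / stdev(control)


# Made-up scores for illustration only; not data from any study cited here.
control_scores = [52.0, 48.0, 50.0, 46.0, 54.0]
treated_scores = [53.0, 49.0, 51.0, 47.0, 55.0]

effect = standardized_effect(treated_scores, control_scores)
print(f"Effect: {effect:.2f} sigma")
```

Read this way, an effect of 0.09σ means the average treated student scored 0.09 control-group standard deviations above the average control student.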
De Simone et al. (2025) present intriguing short-term evidence from Nigeria on another application of large language models to teach students. Specifically, they test the effect of providing a publicly available LLM (Microsoft Copilot, powered by GPT-4) to support secondary school students in learning English in an after-school programme in urban Nigeria. After a six-week intervention with 12 sessions of 90 minutes each, they report treatment effects of 0.24σ on English skills as well as positive effects on digital skills and AI knowledge. These treatment effects were larger for students with higher initial performance and higher socioeconomic status.
While the evidence in Ferman et al. (2021) and De Simone et al. (2025) provides grounds for optimism about the potential for AI use in education, there are also likely to be risks: much like EdTech more generally, the precise intervention design of programmes incorporating AI is likely to be crucial. Bastani et al. (2025) provide a clear example of such potential trade-offs. In a large field experiment in Turkey, students given unrestricted access to GPT used it to take short-cuts in answering questions during practice tests, but this came at the cost of longer-term skill acquisition. Students given access to a base GPT did worse than the control group in exams where GPT was not available. Effects were more encouraging in a further treatment arm where students were provided with an AI tutor that gave teacher-designed hints instead of giving away answers: the effects on practice sessions were more positive and there were no negative effects on the no-GPT final exam. However, students in this group did no better on that exam than the control group, which had no access to GPT. The study points to a key challenge with AI: overcoming the temptation to use it simply to make learning easier. For it to have a positive impact on learning, students must actively engage with AI to correct misconceptions and to complement their learning rather than substitute for it.
Evidence on AI use in education is, understandably, still at a nascent stage. We expect this evidence base to grow substantially in the near future.
For full reference list see the end of the conclusion chapter.