Scaling evidence-based programmes is as much art as it is science

Scaling evidence-based education programmes requires identifying the non-negotiable components that drive impact and adapting everything else to fit within government systems. Sustainable gains depend on embedding the underlying principles into policy and institutional practice – not just replicating programme features.

In global development today, the call to ‘scale what works’ has become almost axiomatic (Piper et al. 2025, Al-Ubaydli et al. 2021). Evidence-based programmes are being mainstreamed into government systems with the hope that rigorous evaluations will translate into improvements at scale. However, for anyone working with public service delivery systems, achieving scale is rarely smooth. It is an exercise in balancing what evidence prescribes with what systems can realistically sustain. It is therefore as much about creatively adapting to institutional realities as it is about identifying what works through rigorous evidence.

The literature on scaling evidence-based programmes has highlighted an inherent tension: staying true to what the evidence tested, while adapting the programme enough to fit within constrained government systems and run at scale. As work on scaling has emphasised for over a decade, translating evidence into sustainable impact requires continuous adaptation to local realities, iterative learning, and sustained partnerships across governments, researchers, implementers, and funders (Duflo 2017, Muralidharan and Singh 2025, Carter et al. 2021). A central focus of the scaling community's efforts has been to better understand and navigate these realities, acknowledging that this translation is often difficult and messy.

Over the past five years, ASPIRE, J-PAL South Asia's initiative to support governments in scaling evidence-based programmes, has built a strong education portfolio working with eight state governments and multiple ecosystem partners. Through these partnerships, we have worked with researchers, governments, and implementing organisations to improve education outcomes while generating insights into how evidence can be translated into sustainable impact at scale.

Identifying the non-negotiable components for impact

For programmes scaled through government systems, randomised evaluations play a critical role in identifying the foundational components that can drive long-term impacts. Every Child Counts, for instance, nurtures the innate numerical and spatial reasoning of children aged 4–6. The programme has been adapted in India and evaluated by J-PAL-affiliated researchers through three randomised evaluations in Delhi between 2013 and 2019, in partnership with Pratham and the Department of Education, Delhi. These studies found significant improvements in children’s ability to interpret and work with numbers and shapes, and to meaningfully engage with mathematical symbols, with gains sustained up to one year after the intervention.

However, as education programmes move across contexts, they must adapt to differences in classrooms, curricula, and resource constraints, and integrate evidence into existing systems as part of routine practice. This makes it vital to identify the foundational elements that drive impact. Implementation research has increasingly highlighted that while some intervention components are core and must be preserved to maintain effectiveness, others can and should be adapted to suit context and delivery realities (Muralidharan and Singh 2025, Carroll et al. 2007). Clearly defining these core components helps ensure that as programmes are adapted to work across diverse contexts, they remain effective while staying anchored in the theory of change that produced the original results.

The scaling journey of the Every Child Counts curriculum illustrates this well. As the programme expanded from controlled pilots to large-scale use in Andhra Pradesh and Maharashtra, adapting the learning materials for scale became necessary. The original set of game cards, tested through multiple randomised evaluations, was extensive and difficult to repurpose at scale, prompting the team to streamline the materials while preserving the elements that drove learning.

To do this, researchers reduced the number of cards in each game while carefully maintaining the original balance of content, levels of difficulty, and progression of learning. This process ensured that children continued to engage with the same cognitive challenges as in the evaluated model, even though the volume of material was reduced for large-scale use.

By clearly identifying the non-negotiable elements (the concepts covered, the progression of difficulty, and exposure to key numerical and spatial tasks), the programme could adapt its materials for scale without diluting impact. The broader lesson is that when core elements are clearly defined together with researchers, systems can create structured space for adaptation, enabling interventions to scale while remaining anchored in the evidence that underpins them.

Bringing the programme in line with ‘systems’

Once the core elements are defined, the first step towards scale is building alignment – in vision and in systems. Public systems operate with existing priorities, processes, and constraints, and evidence-based programmes can only survive at scale if their core logic fits this reality. Richard Kohl, of the Scaling Community of Practice, defines this as scaling from a systems perspective (SPS), which focuses on scaling interventions within existing systems.

Consider the Gender Equity Curriculum, a state-led programme in Punjab that leverages the school curriculum to improve adolescents’ gender attitudes, aspirations, and behaviours. The programme, designed by NGO Breakthrough and evaluated by J-PAL affiliates, led students, particularly boys, to enact more gender-equitable behaviour.

The state of Punjab has adapted this as part of its public-school curriculum for adolescents across standards 6 to 8. The interactive curriculum, designed by Breakthrough with 28 modules, underwent a series of adaptations to be integrated into the mainstream curriculum in Punjab. The process began by understanding the Department of Education's existing perceptions and processes regarding how gender is currently taught and understood in schools. Researchers looked for gaps in classroom content and teaching practices, while also learning about local cultural norms. These insights were then used to tailor the curriculum to the state's specific context in the form of introductory supplementary textbooks for Social Studies and English Literature. After teacher feedback, the curriculum was mapped to the main textbook content. Once it was integrated into the main textbook, students who had previously read only about ‘early man’ in social studies now also did a fun activity on the contributions made by the ‘early woman’ – his equal counterpart.

Here, programme delivery was crucial, because unbiased delivery by teachers is close to a necessary condition for the curriculum to be effective. Regular and refresher training, customised to the state, was conducted for the teachers through a train-the-trainer model and further cascaded to all 45,281 teachers across the state through District Institutes of Education and Training.

These alignment choices were necessary to ensure the programme fit within how the system already operates. By adapting the programme, customising core concepts to the state’s cultural context, integrating it into schoolbooks, and ensuring regular teacher training, the state created conditions for durability. Scaling thus depends less on preserving every feature of a programme and more on aligning with systems to ensure the impact is delivered and sustained.

Embedding ‘big bets’

The true marker of success for any evidence-based scale-up lies in its ability to reshape how systems think and operate, while maintaining fidelity to the original model. Sustainable impact happens when governments and institutions internalise the principles behind an intervention and embed them into policy, institutional practices, and broader approaches to education service delivery.

The trajectory of Teaching at the Right Level (TaRL) illustrates this shift. At its core, TaRL is built on a simple but powerful ‘big bet’: targeted instruction based on a child’s current learning level rather than their grade. What began as a classroom-level remediation strategy has now informed system-wide thinking. In India, this broader principle is reflected in the National Education Policy (NEP) and initiatives like the NIPUN Bharat Mission, which organise the system around measurable learning competencies rather than uniform grade-level delivery. This shifts the question from ‘Which grade is this child in?’ to ‘What can this child do, and what do they need next?’

When this kind of core idea is embedded, the programme transcends its original design. Systems can adapt how it is delivered to fit their context, while preserving the underlying mechanisms that drive impact.

Align incentives and evidence to sustain fidelity

Scaling programmes requires coordination across multiple actors, each with different priorities. When these incentives are misaligned, even strong evidence and well-designed programmes can weaken at scale. Sustaining fidelity, therefore, requires making incentives explicit and aligning them with what drives impact, so they naturally reinforce the integrity of the intervention.

Several recent efforts to scale evidence-based remote learning and tailored instruction programmes in Karnataka illustrate this challenge. These initiatives, drawing on evidence from phone-based tutoring and targeted instruction models such as ConnectEd, sought to integrate remote learning approaches within government systems while responding to state priorities around foundational learning recovery and caregiver engagement. As the programme evolved, different stakeholders were driven by different priorities: the government focused on maximising reach and involving caregivers and parents during remote maths tutoring, researchers were interested in maintaining fidelity at scale while testing novel ways to make the programme more efficient, and funders emphasised cost-effectiveness and operational feasibility.

ASPIRE, as a scaling organisation working closely with all stakeholders, played a catalytic role in designing EdLabs as a model to bring multiple partners together. This involved working with teams to simplify parts of the programme, jointly reviewing progress, and taking programmatic decisions together with the government – with a clear eye on what could realistically be scaled. Similar Labs-style initiatives are being pursued by Innovations for Poverty Action (IPA) and the Jacobs Foundation, among others.

In practice, sustaining this shared commitment is a collaboration – a collective effort where researchers, implementers, and governments align incentives and expertise to adapt responsibly. When the challenge is to improve learning for millions of children, identifying what works is not enough. As India works towards becoming a developed nation by 2047, the real task is to pair effective interventions with a clear understanding of the mechanisms behind them so they can be scaled consistently. This is central to translating gains in learning into broader economic progress.

Authors’ note: This piece reflects ongoing work on programmes led by the authors.

References

Al-Ubaydli, O, J A List, and D Suskind (2021), "The scale-up effect in early childhood and public policy: Why interventions lose impact at scale and what we can do about it," in J A List, D Suskind, and L H Supplee (eds), The Scale-Up Effect in Early Childhood and Public Policy, 1–16, Routledge.

Carroll, C, M Patterson, S Wood, A Booth, J Rick, and S Balain (2007), "A conceptual framework for implementation fidelity," Implementation Science, 2(40).

Carter, S, I Dhaliwal, S Friedlander, and C Walsh (2021), "Forging collaborations for scale: Catalyzing partnerships among policy makers, practitioners, researchers, funders, and evidence-to-policy organizations," in J A List, D Suskind, and L H Supplee (eds), The Scale-Up Effect in Early Childhood and Public Policy: Why Interventions Lose Impact at Scale and What We Can Do About It, 370–389, Routledge.

Duflo, E (2017), "Richard T. Ely Lecture: The economist as plumber," American Economic Review, 107(5): 1–26.

Muralidharan, K, and A Singh (2025), "Adapting for scale: Experimental evidence on technology-aided instruction in India," Unpublished manuscript.

Piper, B, L Benveniste, and N Angrist (2025), "A clarion call for efficient and sustainable solutions to achieve foundational learning," What Works Hub for Global Education.