A trade-off between informativeness and equality in high-stakes standardised testing

Published 20.08.24

Evidence from Brazil's transition to a national college admission exam shows that higher-stakes testing widened socioeconomic test score gaps, yet also improved the exam's ability to predict students' college success.

Goodhart's law, named after the economist Charles Goodhart, is often summarised as: “When a measure becomes a target, it ceases to be a good measure.” This reflects a fundamental challenge in talent selection, where candidates have strong incentives to manipulate signals of their quality, especially for highly desirable positions (Goodhart 1975, Frankel and Kartik 2019, 2022). Educators, too, worry that high-stakes standardised testing may enable wealthy students to game the system, so that the tests fail to select the students with the best academic potential.

Leveraging Brazil’s transition to a national college admission exam, we analyse the impact of increasing the stakes of a standardised test on inequality and informativeness. Consistent with a common criticism of admission tests, we find that socioeconomic gaps in performance expanded on higher-stakes assessments. Yet we also find that higher stakes increased the informativeness of exams. This implies that decision-makers face a fundamental trade-off between equality and informativeness in identifying talent.

Institutional context: High-stakes college admission exams

Our research (Reyes, Riehl and Xu 2024) examines these issues in the context of college admissions by studying the rollout of a national standardised admission exam for elite Brazilian universities. From 2009 to 2017, Brazil's system of highly selective federal universities transitioned from using their own admission exams to a common test called the ENEM. Importantly, the ENEM was also used to measure high school quality, so many high school seniors took it regardless of its role in college admissions. This created a natural experiment where, depending on location and cohort, some students took the ENEM as a low-stakes school quality assessment, while others took it as a high-stakes university entrance exam.

Our empirical strategy exploits variation in exam stakes across states and cohorts in a difference-in-differences design. We link administrative records from the ENEM to nationwide college enrollment and labour market data, allowing us to examine how the increase in stakes affected two important outcomes: 1) test score gaps between advantaged and disadvantaged students; and 2) the predictive power of test scores for individuals' academic potential.
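
To make the comparison concrete, the sketch below computes the simplest two-group, two-period version of a difference-in-differences estimate for the private/public score gap: the change in the gap where universities adopted the ENEM, minus the change where they did not. The group labels, the dictionary layout, and every number are hypothetical. The study itself exploits staggered adoption across many states and cohorts, which is usually estimated with regressions that include state and cohort effects rather than a single 2x2 table.

```python
# Minimal 2x2 difference-in-differences sketch for a test score gap.
# Hypothetical inputs: mean scores by (state group, period, school type).
# This is an illustration only, not the authors' specification.

def gap(means, group, period):
    """Private minus public mean score for one state group and period."""
    return means[(group, period, "private")] - means[(group, period, "public")]


def did_gap(means):
    """Change in the gap where stakes rose, minus the change where they did not."""
    treated_change = gap(means, "adopting", "post") - gap(means, "adopting", "pre")
    control_change = gap(means, "not_adopting", "post") - gap(means, "not_adopting", "pre")
    return treated_change - control_change


# Made-up illustrative numbers, not estimates from the study.
toy_means = {
    ("adopting", "pre", "private"): 560.0,
    ("adopting", "pre", "public"): 480.0,
    ("adopting", "post", "private"): 575.0,
    ("adopting", "post", "public"): 485.0,
    ("not_adopting", "pre", "private"): 555.0,
    ("not_adopting", "pre", "public"): 478.0,
    ("not_adopting", "post", "private"): 558.0,
    ("not_adopting", "post", "public"): 479.0,
}

print(did_gap(toy_means))
```

Differencing against non-adopting states removes changes in the gap that would have happened anyway, for example a nationwide shift in the test's difficulty.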

Increasing the stakes of exams led to greater inequality in test scores

We find that gaps in average ENEM scores between private and public high school students expanded by roughly 10% (relative to the mean gaps in pre-adoption cohorts) when federal universities adopted the ENEM in admissions. This increase was driven by private school students earning higher scores on the high-stakes exam. Racial and other socioeconomic test score gaps expanded by a similar magnitude (Figure 1, Panel A). The magnitude of these effects implies a substantial increase in the selectivity of the university programmes accessible to private school students, suggesting that higher-stakes exams increased inequality in college access.
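
The “roughly 10%” figure is a ratio: the estimated change in a gap divided by the average gap among pre-adoption cohorts. The snippet below spells out that arithmetic with made-up numbers chosen so the ratio comes out at 10%; the function name and inputs are illustrative only.

```python
# Express a gap effect as a percentage of the pre-adoption mean gap.
# All numbers are hypothetical placeholders, not results from the study.

def relative_gap_effect(effect, pre_adoption_gaps):
    """Return the effect as a percent of the average gap in pre-adoption cohorts."""
    baseline = sum(pre_adoption_gaps) / len(pre_adoption_gaps)
    return 100.0 * effect / baseline


# Illustrative: a pre-adoption mean gap of 80 points and an effect of 8 points
# correspond to a 10% expansion of the gap.
print(relative_gap_effect(8.0, [78.0, 80.0, 82.0]))
```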

To examine the role of test preparation in driving these results, we use two measures of students' test prep activity. First, we identify "prep schools" by cross-referencing our sample of high schools with lists of institutions using test-oriented curricula from leading test prep companies. Second, we use a variable from the ENEM questionnaire that indicated whether students took an entrance exam preparation course. We find that the increase in ENEM stakes led to larger test score gaps between students who did and did not engage in test prep (Figure 1, Panel B), suggesting that access to test preparation resources played a crucial role in widening the private/public test score gap when the ENEM stakes increased.
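
The two test-prep measures can be sketched as simple student-level flags: one from matching a student's school against a list of prep schools, one from a questionnaire answer. The school IDs, the list, the field names, and the scores below are hypothetical stand-ins. The snippet only computes the score gap between flagged and unflagged students in one cohort; the study's question is how that gap changed when stakes rose, which would feed into a comparison across cohorts and states.

```python
# Sketch of two test-prep flags and the score gap they define.
# School IDs, the prep-school list and the survey field are hypothetical.

from dataclasses import dataclass


@dataclass
class Student:
    school_id: str
    took_prep_course: bool  # questionnaire item on entrance-exam prep courses


def attends_prep_school(student, prep_school_ids):
    """Measure 1: the student's school appears on a test prep company's list."""
    return student.school_id in prep_school_ids


def gap_by_flag(students, scores, flag):
    """Mean score of flagged students minus mean score of unflagged students."""
    flagged = [s for st, s in zip(students, scores) if flag(st)]
    unflagged = [s for st, s in zip(students, scores) if not flag(st)]
    return sum(flagged) / len(flagged) - sum(unflagged) / len(unflagged)


prep_list = {"SCH-001", "SCH-007"}
students = [
    Student("SCH-001", True),
    Student("SCH-002", False),
    Student("SCH-007", False),
    Student("SCH-003", True),
]
scores = [600.0, 500.0, 580.0, 540.0]

print(gap_by_flag(students, scores, lambda st: attends_prep_school(st, prep_list)))
print(gap_by_flag(students, scores, lambda st: st.took_prep_course))
```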

Figure 1: Effects of ENEM adoption on gaps in average ENEM score

Notes: This figure shows the impact of ENEM adoption on various gaps in average ENEM scores. Panel A shows impacts on demographic test score gaps. “High-income” individuals are defined as those with a family income greater than or equal to twice the minimum wage. Panel B shows impacts on test score gaps between students who did and did not engage in test prep activities.

The increase in stakes made test scores more informative

We also find that scores on the higher-stakes ENEM exam became more informative about students' academic potential. Specifically, the adoption of the ENEM by federal universities increased the correlation between test scores and college persistence and graduation outcomes by roughly 10–30%, depending on the outcome of interest. Scores in all exam subjects became more informative for predicting students’ long-run outcomes (Figure 2, Panel A). Importantly, this increase in predictive power holds both overall and among students who enrolled in the same college programme, suggesting that our findings reflect a genuine increase in score informativeness rather than simply the causal effect of higher scores on which programmes students are admitted to.
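
A minimal way to see the difference between the two comparisons: compute the score-outcome correlation on the raw data, then again after subtracting each programme's mean from both variables, so that only variation among students in the same programme remains. Demeaning within groups is equivalent to partialling out programme fixed effects; whether the study uses exactly this estimator is an assumption here, and the data and programme labels are invented.

```python
# Informativeness as the correlation between scores and a college outcome,
# computed overall and within college programmes (by demeaning within each programme).
# Data and programme labels are hypothetical.

from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def demean_within(values, groups):
    """Subtract each group's mean so only within-group variation remains."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def within_programme_corr(scores, outcomes, programmes):
    """Correlation among students enrolled in the same programme."""
    return pearson(demean_within(scores, programmes), demean_within(outcomes, programmes))


scores = [600, 620, 500, 520]
graduated = [0, 1, 0, 1]
programmes = ["A", "A", "B", "B"]
print(pearson(scores, graduated))
print(within_programme_corr(scores, graduated, programmes))
```

If scores predicted outcomes only because high scorers were placed in better programmes, the within-programme correlation would be close to zero.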

Why were high-stakes ENEM scores more predictive of college success? One possibility is that high-stakes scores became more correlated with individual characteristics that contribute to college success, such as family income. We examine how adding demographic controls affects the measured informativeness of ENEM scores for college outcomes and find that demographic variables explain some, but not all, of the increase in informativeness (Figure 2, Panel B).
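
One simple way to ask how much of the informativeness runs through a single demographic variable is the first-order partial correlation, which removes the linear association with that variable from both the score and the outcome. This is a swapped-in textbook formula, not necessarily the study's method, and it handles only one control; several controls would call for regression residualisation. The correlations below are invented to show the “some, but not all” pattern.

```python
# First-order partial correlation: x = score, y = college outcome, z = one demographic.
# All correlation values are invented for illustration.

from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after linearly removing z from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


low_stakes = {"raw": 0.40, "score_income": 0.30, "grad_income": 0.30}
high_stakes = {"raw": 0.48, "score_income": 0.40, "grad_income": 0.30}

for label, r in [("low stakes", low_stakes), ("high stakes", high_stakes)]:
    adjusted = partial_corr(r["raw"], r["score_income"], r["grad_income"])
    print(f"{label}: raw {r['raw']:.2f}, controlling for income {adjusted:.3f}")
```

In this toy case the adjusted correlation still rises with stakes, but by less than the raw correlation does.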

Another possibility is that the higher stakes incentivised students to develop a broader range of academically relevant skills. Leveraging unique question-level data, we show that the higher-stakes test led to an improvement in private school students' performance across a wide range of skills. Crucially, the skills where we observed the largest improvements also tended to be more predictive of college outcomes (Figure 2, Panel C). This suggests that test prep for the higher-stakes ENEM was not confined to narrowly targeted skills that merely raise exam scores; rather, the score gains reflected a broad set of skills that are informative for academic potential.
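
The skill-level pattern can be summarised by a slope across skills: regress each skill's informativeness for college persistence on the effect of stakes on that skill's private/public gap, matching the axes described for Figure 2, Panel C. The skill names and values below are hypothetical; a positive slope corresponds to the pattern described above.

```python
# Across skills: does a larger stakes-induced widening of the private/public gap
# go with higher informativeness for college persistence?
# Each pair is (effect of stakes on the gap, informativeness); values are hypothetical.

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


skills = {
    "algebra": (0.06, 0.20),
    "geometry": (0.04, 0.15),
    "reading": (0.05, 0.18),
    "grammar": (0.01, 0.08),
}
effects = [e for e, _ in skills.values()]
informativeness = [i for _, i in skills.values()]
print(ols_slope(effects, informativeness))
```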

Figure 2: Effects of ENEM adoption on the informativeness of ENEM scores for longer-run outcomes

Panel A. Effect on the informativeness of subject-specific ENEM scores

Panel B. Effect on informativeness controlling for demographics

Panel C. Informativeness for college persistence vs. impact of exam stakes on private/public gap by skill

Notes: Panel A shows the impacts of ENEM adoption on the informativeness of subject-specific ENEM scores for longer-run outcomes. Panel B shows the impacts of ENEM adoption on the informativeness of average ENEM scores for longer-run outcomes after controlling for demographic characteristics. Panel C shows the relationship between the informativeness of ENEM exam skills for college persistence (y-axis) and the effect of ENEM stakes on the private/public gap (x-axis).

Conclusions: The dual goals of diversity and informativeness in selecting talent

Our research shows that high-stakes entrance exams provide better information about students' potential for academic success, but they also exacerbate existing inequalities in educational access. These findings highlight a crucial trade-off for college admission officers who rely on test scores as an instrument to identify talent while also making efforts to diversify the student body. This trade-off may extend to other high-stakes settings such as hiring processes at prestigious firms, where decision-makers must balance identifying top talent with promoting diversity.

One potential solution to this challenge is the introduction of a second policy instrument. Colleges could design maximally informative test scores to identify academic potential while employing a separate tool to improve diversity. For example, Brazilian universities reserve slots for disadvantaged applicants while selecting top scorers within these groups (Machado et al. 2024, Oliveira et al. 2024). However, institutions in countries like the United States face legal constraints in addressing this trade-off. We hope that future research will uncover new policy tools to mitigate this important trade-off.
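
The two-instrument idea can be sketched as a simple allocation rule: test scores alone rank applicants, while a separate reserved quota guarantees seats for a designated group. The processing order below (open seats filled first, then reserved seats from the remaining eligible applicants) is an assumption for illustration; Brazil's actual quota rules are more detailed and are not reproduced here. All applicants and scores are hypothetical.

```python
# Hypothetical seat allocation with a reserved quota for disadvantaged applicants.
# Assumed order: open seats go to the top scorers overall; reserved seats then go
# to the highest-scoring remaining applicants who are eligible for the reserve.

from typing import NamedTuple


class Applicant(NamedTuple):
    name: str
    score: float
    eligible_for_reserve: bool


def allocate(applicants, open_seats, reserved_seats):
    """Return (admitted to open seats, admitted to reserved seats)."""
    ranked = sorted(applicants, key=lambda a: a.score, reverse=True)
    admitted_open = ranked[:open_seats]
    remaining_eligible = [a for a in ranked[open_seats:] if a.eligible_for_reserve]
    admitted_reserved = remaining_eligible[:reserved_seats]
    return admitted_open, admitted_reserved


pool = [
    Applicant("A", 700, False),
    Applicant("B", 690, True),
    Applicant("C", 680, False),
    Applicant("D", 650, True),
    Applicant("E", 640, False),
    Applicant("F", 630, True),
]
open_admits, reserved_admits = allocate(pool, open_seats=2, reserved_seats=2)
print([a.name for a in open_admits], [a.name for a in reserved_admits])
```

Within each track, selection still uses the most informative score available, while the quota, rather than the test, does the work of diversifying the admitted class.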

References

Frankel, A and N Kartik (2019), "Muddled information." Journal of Political Economy 127(4): 1739–1776.

Frankel, A and N Kartik (2022), "Improving information from manipulable data." Journal of the European Economic Association 20(1): 79–115.

Goodhart, C (1975), "Problems of monetary management: The UK experience." Papers in Monetary Economics, Volume I, Reserve Bank of Australia.

Machado, C, G Reyes and E Riehl (2024), "The impacts of large-scale affirmative action at elite universities." VoxDev. https://voxdev.org/topic/education/impacts-large-scale-affirmative-action-elite-universities

Oliveira, R, A Santos and E Severnini (2024), "Affirmative action in Brazil’s higher education system." VoxDev. https://voxdev.org/topic/education/affirmative-action-brazils-higher-education-system

Reyes, G, E Riehl and R Xu (2024), "Stakes and signals: An empirical investigation of muddled information in standardized testing." NBER Working Paper 32608.