AI hiring tools exhibit complex gender and racial biases

As artificial intelligence becomes more widely used in talent recruitment, employers must seriously consider the biases these tools may introduce into hiring decisions. A study of five leading large language models shows that intersectional analysis, rigorous piloting, and human oversight are needed to reduce algorithmic bias.

The growing adoption of artificial intelligence (AI) in hiring processes is transforming how firms recruit talent. With over half of US companies now investing in AI-based recruiting tools (USC Annenberg 2023), large language models (LLMs) increasingly influence who gets hired. But what happens when these algorithms, rather than humans, make initial screening decisions? Our research reveals surprising patterns of bias that could significantly reshape labour market opportunities across gender and racial lines.

Using a large-scale randomised experiment, we find that leading AI models systematically favour female candidates while disadvantaging black male applicants, even when qualifications are identical (An et al. 2025). These biases could affect employment opportunities for hundreds of thousands of workers, raising important questions for policymakers, researchers, and firms.

Using experimental design to test AI bias in resume screening

To measure AI bias in hiring, we needed to isolate the causal effect of gender and race on AI evaluations. We constructed approximately 361,000 fictitious resumes where candidates' work experience, education, and skills were randomly assigned, while names were chosen to signal specific gender and racial identities. We then instructed five LLMs—GPT-3.5 Turbo, GPT-4o, Gemini 1.5 Flash, Claude 3.5 Sonnet, and Llama 3-70b—to score these resumes on a 0-100 scale.

This experimental approach follows methodologies used in audit studies of human hiring discrimination, but at a much larger scale. By randomising all qualification-related characteristics, any systematic differences in scores can be attributed directly to the LLMs' responses to gender and racial signals.
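A minimal Python sketch may make the logic of this design concrete. Everything in it is hypothetical: the name pools, resume fields and `toy_scorer` are illustrative stand-ins rather than the study's materials. The toy scorer also plants a known +0.5-point bonus for female candidates in place of a call to a real LLM, which, unlike the toy, would see only the name. Because qualifications are drawn independently of the name-signalled identity, a simple difference in mean scores between female and male resumes should recover the planted bonus up to sampling noise.

```python
import random
from statistics import mean

# Hypothetical name pools; the study's actual names are not listed here.
NAMES = {
    ("female", "black"): ["Lakisha Washington", "Tamika Jackson"],
    ("female", "white"): ["Emily Walsh", "Anne Baker"],
    ("male", "black"): ["Jamal Robinson", "Darnell Jones"],
    ("male", "white"): ["Greg Baker", "Todd Sullivan"],
}
EDUCATION = ["BA", "BSc", "MBA", "MSc"]
SKILLS = ["Excel", "SQL", "Python", "project management", "sales"]


def make_resume(rng):
    """Draw qualifications independently of the identity signalled by the name."""
    gender = rng.choice(["female", "male"])
    race = rng.choice(["black", "white"])
    return {
        "name": rng.choice(NAMES[(gender, race)]),
        "gender": gender,
        "race": race,
        "years_experience": rng.randint(0, 20),
        "education": rng.choice(EDUCATION),
        "skills": rng.sample(SKILLS, k=3),
    }


def toy_scorer(resume, rng, female_bonus=0.5):
    """Hypothetical stand-in for an LLM call: scores qualifications, adds noise,
    and plants a known bonus for female candidates (a real model sees only the name)."""
    score = 50.0 + 1.5 * resume["years_experience"]
    if resume["education"] in {"MBA", "MSc"}:
        score += 5.0
    if resume["gender"] == "female":
        score += female_bonus
    return score + rng.gauss(0.0, 5.0)


def estimate_gap(n=100_000, seed=0):
    """Difference in mean scores, female minus male, over n randomised resumes."""
    rng = random.Random(seed)
    resumes = [make_resume(rng) for _ in range(n)]
    scores = [toy_scorer(r, rng) for r in resumes]
    female = [s for r, s in zip(resumes, scores) if r["gender"] == "female"]
    male = [s for r, s in zip(resumes, scores) if r["gender"] == "male"]
    return mean(female) - mean(male)


print(f"estimated female-male gap: {estimate_gap():.2f} points")
```

With around 100,000 draws the estimate should land close to the planted 0.5; with a real model, the same comparison would simply be run on the model's scores.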

AI models exhibit unexpected patterns of bias

Our analysis reveals that AI models exhibit consistent intersectional bias patterns that differ markedly from typical human biases documented in previous labour market studies.

Figure 1: Regression coefficients: GPT-3.5 Turbo score differences across social groups

Notes: This figure presents the regression coefficients that compare the score differences across different social groups by GPT-3.5 Turbo. The first part compares minority candidates (female or black) to white male candidates. The second part compares females to males, and black to white candidates. The third part compares each minority group to white males. The blue diamonds mark the estimated coefficients, and the dashed grey lines show the 95% confidence intervals.

As Figure 1 demonstrates, GPT-3.5 Turbo exhibits clear social biases when evaluating otherwise identical resumes.

All models award significantly higher scores to female candidates regardless of race. For GPT-3.5 Turbo, female candidates receive scores approximately 0.45 points higher than otherwise identical male candidates.
Most models award lower scores to black male candidates compared to white male candidates with identical qualifications. For GPT-3.5 Turbo, this penalty is approximately 0.30 points.
These combined effects create a clear hierarchy: black female candidates score highest (+0.379 points above white males), followed by white female candidates (+0.223), then white male candidates (baseline), with black male candidates scoring lowest (-0.303 points).
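These numbers fit together. Assuming the four gender-race cells are of equal size (roughly what randomising names delivers), the pooled female-male gap is the average of the two female cell gaps minus the average of the two male cell gaps, which reproduces the 0.45 figure. The same arithmetic, sketched below in Python, implies a pooled black-white gap of only about -0.07 points, illustrating how a race-only comparison would largely mask the penalty specific to black men. The interaction coefficient computed at the end is simply what these cell gaps imply in a two-by-two model with an interaction term; it is not necessarily how the paper parameterises its regressions.

```python
# Cell gaps relative to white men, as reported above for GPT-3.5 Turbo.
CELL_GAP = {
    ("male", "white"): 0.0,
    ("female", "white"): 0.223,
    ("female", "black"): 0.379,
    ("male", "black"): -0.303,
}


def pooled_gap(attr_index, level_a, level_b):
    """Mean of level_a cells minus mean of level_b cells.

    Assumes all four cells are equally sized (approximately true under randomisation).
    attr_index 0 selects gender, 1 selects race.
    """
    a = [v for k, v in CELL_GAP.items() if k[attr_index] == level_a]
    b = [v for k, v in CELL_GAP.items() if k[attr_index] == level_b]
    return sum(a) / len(a) - sum(b) / len(b)


female_vs_male = pooled_gap(0, "female", "male")  # (0.379 + 0.223)/2 - (-0.303 + 0)/2
black_vs_white = pooled_gap(1, "black", "white")  # (0.379 - 0.303)/2 - (0.223 + 0)/2

# Implied coefficients of: score = b0 + b_f*female + b_b*black + b_fb*female*black
b_f = CELL_GAP[("female", "white")]
b_b = CELL_GAP[("male", "black")]
b_fb = CELL_GAP[("female", "black")] - b_f - b_b  # 0.379 - 0.223 + 0.303

print(f"female vs male: {female_vs_male:+.4f}")
print(f"black vs white: {black_vs_white:+.4f}")
print(f"implied female x black interaction: {b_fb:+.3f}")
```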

These biases are robust across contexts: the pro-female and anti-black-male biases persist across job types (from male-dominated to female-dominated occupations), candidate locations, and political contexts. Most importantly, they are not specific to one model. Although magnitudes vary, the pro-female bias appears in all five LLMs and the anti-black-male bias in most of them, across models from different developers. This consistency suggests that the biases are deeply embedded in how current AI systems evaluate candidates.

Real-world implications of AI bias for labour markets

While differences of 0.3-0.5 points on a 100-point scale may seem small, they translate into meaningful employment impacts at critical decision thresholds, as illustrated in Figure 2.

Figure 2: Estimated differences in the probability of being hired, based on GPT-3.5 Turbo scores

Notes: This figure presents the estimated difference in the probability of being hired for each minority social group relative to the benchmark group, across different score thresholds. The dashed grey lines show 95% confidence intervals.

Assuming employers use a threshold of 80/100 for advancing candidates (where approximately 35% of applicants qualify), GPT-3.5 Turbo's biases would increase black and white female candidates' advancement probability by 1.7 and 1.4 percentage points, respectively, while decreasing black male candidates' chances by 1.4 percentage points.
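Why do such small score gaps matter at a cutoff? If a group's mean score shifts by a few tenths of a point, the share of that group clearing a fixed threshold changes by roughly the density of scores at the threshold times the shift. The Python sketch below illustrates this under an explicitly hypothetical model: scores are normally distributed with an assumed standard deviation `sigma` (not taken from the study), and each group gap simply shifts the mean. It does not reproduce the figures above, which the study estimates from the actual GPT-3.5 Turbo score distribution; it only shows that the effect grows as scores become more concentrated around the cutoff.

```python
from statistics import NormalDist

THRESHOLD = 80.0
BASE_PASS_RATE = 0.35  # share of applicants clearing the threshold, as stated above


def pass_rate_change(shift, sigma):
    """Change in P(score >= THRESHOLD) when a group's mean moves by `shift` points.

    Hypothetical model: scores ~ Normal(mu, sigma), with mu chosen so that the
    baseline group clears the threshold at BASE_PASS_RATE.
    """
    mu = THRESHOLD - sigma * NormalDist().inv_cdf(1 - BASE_PASS_RATE)
    base = 1 - NormalDist(mu, sigma).cdf(THRESHOLD)
    shifted = 1 - NormalDist(mu + shift, sigma).cdf(THRESHOLD)
    return shifted - base


GAPS = [("black women", 0.379), ("white women", 0.223), ("black men", -0.303)]

for sigma in (5.0, 10.0):  # hypothetical score spreads, not estimates from the study
    for label, shift in GAPS:
        print(f"sigma={sigma:>4}: {label:<11} {100 * pass_rate_change(shift, sigma):+.2f} pp")
```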

Applied to the US labour force, these effects could affect approximately 190,000 black women, 820,000 white women, and 150,000 black men, even if AI tools were only used for entry-level positions.

What drives AI bias in hiring?

Our findings align with the theory of intersectionality, which emphasises that gender and racial identities interact in complex ways. The particular disadvantage faced by black men, as opposed to all black candidates, suggests that AI systems have internalised specific stereotypes about black males.

Two mechanisms likely contribute to these bias patterns. First, the training data used to develop LLMs—drawn substantially from internet content—may overrepresent certain social views. Second, the human feedback and debiasing procedures used by AI developers may have overcompensated for certain biases while introducing others. The consistency across models from different developers suggests these issues are widespread rather than company-specific.

Policy implications: AI and labour markets

Our findings have several important implications for policymakers, regulators, and firms:

Current anti-discrimination frameworks often treat gender and race as separate categories. Our results demonstrate that AI biases operate intersectionally, e.g. black women face different outcomes than black men or white women. Regulatory approaches must evolve to address these complex patterns.
Firms adopting AI hiring tools should conduct impact assessments before widespread deployment. These assessments should analyse effects across intersectional groups using a methodology similar to our experimental approach.
Human oversight remains essential, especially for candidates from groups facing algorithmic disadvantages. While AI tools may reduce certain human biases, they introduce new patterns of discrimination that require monitoring.
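For the impact assessments suggested above, the core calculation can be as simple as comparing each gender-race cell with a baseline cell, both on mean score and on the share of candidates clearing an advancement threshold. The Python sketch below assumes pilot data arrive as `(gender, race, score)` tuples from a test in which qualifications were randomised; the function name `intersectional_audit`, the 80-point threshold and the white-male baseline are illustrative choices, and the `pilot` records are invented toy values. A real assessment would add confidence intervals and many more observations per cell.

```python
from collections import defaultdict
from statistics import mean


def intersectional_audit(records, threshold=80.0, baseline=("male", "white")):
    """Compare every gender x race cell with a baseline cell (hypothetical helper).

    `records` holds (gender, race, score) tuples from a randomised pilot, so gaps
    can be read as effects of the identity signal. Returns
    {cell: (mean score gap, pass-rate gap at threshold)} for non-baseline cells.
    """
    cells = defaultdict(list)
    for gender, race, score in records:
        cells[(gender, race)].append(score)
    if baseline not in cells:
        raise ValueError(f"no records for baseline cell {baseline}")

    def pass_rate(scores):
        # Share of candidates in a cell who clear the advancement threshold.
        return sum(s >= threshold for s in scores) / len(scores)

    base_scores = cells[baseline]
    report = {}
    for cell, scores in cells.items():
        if cell == baseline:
            continue
        report[cell] = (
            mean(scores) - mean(base_scores),
            pass_rate(scores) - pass_rate(base_scores),
        )
    return report


# Invented toy pilot data, for illustration only.
pilot = [
    ("male", "white", 78.0), ("male", "white", 82.0),
    ("female", "white", 79.0), ("female", "white", 83.0),
    ("male", "black", 77.0), ("male", "black", 79.0),
    ("female", "black", 81.0), ("female", "black", 84.0),
]

for cell, (score_gap, pass_gap) in sorted(intersectional_audit(pilot).items()):
    print(cell, f"score gap {score_gap:+.2f}", f"pass-rate gap {pass_gap:+.2f}")
```

Reporting both columns matters: as the threshold analysis above suggests, a modest mean gap can still produce a noticeable difference in who advances.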

Future research on AI and labour markets

As AI transforms hiring practices, understanding how these tools treat different groups of candidates becomes increasingly urgent. Our evidence suggests that the transition from human to AI decision-making will redistribute employment opportunities across social groups in complex ways.

Future research should investigate what drives these bias patterns—whether it be training data, debiasing procedures, or other factors. How do these biases interact with occupational segregation? How might labour markets evolve as these tools become more widespread?

Importantly, our findings raise critical questions about the global implications of AI recruiting technologies. Most current models are developed and trained on US data, reflecting US social categories and labour market structures. When deployed internationally, these systems may import US-specific biases or interact unpredictably with local social hierarchies. Research must examine how AI hiring tools perform across diverse cultural contexts, legal frameworks, and labour markets. The global adoption of US-developed AI systems could potentially homogenise hiring practices across culturally distinct regions or amplify locally irrelevant biases.

These questions highlight the need for ongoing monitoring, transparent reporting, and inclusive governance as AI reshapes how firms connect with talent globally. Without proper oversight and localisation, these tools could reinforce some inequities while creating new ones, fundamentally altering who gets access to economic opportunity across diverse societies.

References

An, J., Huang, D., Lin, C., & Tai, M. (2025). "Measuring gender and racial biases in large language models: Intersectional evidence from automated resume evaluation". PNAS Nexus, 4(3), pgaf089.

USC Annenberg School for Communication and Journalism (2023). "How Artificial Intelligence (AI) in HR Is Changing Hiring". https://communicationmgmt.usc.edu/blog/ai-in-hr-how-artificial-intelligence-is-changing-hiring