How Does Bias Form in AI Models, and What Preventative Strategies Can Stop It Early in the Pipeline?
This paper examines bias in AI, focusing on preventing it early in the development pipeline. While most other work corrects bias after a model has been deployed, this paper takes a different approach, examining three main sources: biased data collection, human labeling practices, and model training practices.
STEM Research · AI · Artificial Intelligence
Jiya Mahalavat
7/28/2025 · 4 min read
Abstract
As artificial intelligence becomes increasingly embedded in high-stakes decision-making systems, concerns over algorithmic bias have intensified. Most existing research and mitigation techniques focus on identifying and correcting bias after model deployment. This study proposes a shift in perspective, toward bias prevention by analyzing how and where bias enters the AI development pipeline. Specifically, it explores three main sources: biased data collection, human labeling practices, and model training processes. A hands-on experiment was conducted using a sentiment classifier trained on the Sentiment140 dataset, where gender-based bias was artificially introduced. The model’s predictions revealed significant disparities that were later reduced using fairness-aware mitigation strategies. These findings suggest that bias can and should be addressed at earlier stages of AI development. This paper contributes a proactive approach to fairness in machine learning by offering technical insights and practical solutions for building more equitable AI systems from the ground up.
Introduction
Amazon once built an AI hiring tool that automatically downgraded resumes from women. Why? Because the tool was trained on data that favored men, reflecting years of biased hiring history. Unfortunately, Amazon’s failed hiring tool is not an isolated case. While most current research focuses on detecting bias in models after they’ve been deployed, this approach is often too late, and by then, the damage may already be done. This raises the question: what if we could stop AI bias before it starts? This research investigates how bias forms during the early stages of the AI development pipeline, particularly in data collection, labeling, and model training—and explores strategies to prevent it.
Literature Review/Background
Artificial intelligence systems are often assumed to be objective, yet numerous studies show that they can reflect and even amplify human bias. In their foundational paper Gender Shades, Buolamwini and Gebru found that commercial facial recognition systems misclassified darker-skinned women up to 34.7% of the time, compared to less than 1% for lighter-skinned men. This disparity highlighted how biased training datasets, especially those lacking diversity, can cause AI to disproportionately harm marginalized groups (Buolamwini and Gebru). These examples demonstrate that bias often originates in the earliest stages of the AI development process, particularly in data collection and selection—and that such bias can go undetected until the model is already in use.
Methodology
This experiment used a text classification model to explore how training data bias affects AI predictions. A subset of the Sentiment140 dataset, which contains tweets labeled as positive or negative, was used to train three versions of a logistic regression classifier. In the control group, the original dataset was used without modifications. In the biased version, additional negative tweets were injected containing female-associated names such as "Mary" and "Emily," while positive tweets more often mentioned male names. The final version involved balancing this bias by introducing positive tweets with female names. All models were evaluated using accuracy and a custom test set containing gendered names. The results demonstrated that the biased model showed a clear tendency to associate female names with negative sentiment, while the balanced model reduced that disparity—supporting the idea that early-stage data manipulation can meaningfully reduce bias.
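The setup described above can be sketched in code. This is a minimal reconstruction, not the study's exact script: the name lists, template sentences, injection counts, and the small stand-in corpus are all illustrative assumptions standing in for the Sentiment140 subset.

```python
# Sketch of the three-condition experiment: clean, biased, and fixed datasets
# feeding a TF-IDF + logistic regression classifier. All names, templates,
# and counts below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

FEMALE, MALE = ["Mary", "Emily"], ["David", "John"]

def make_dataset(biased: bool, fixed: bool):
    texts, labels = [], []
    # A tiny balanced corpus stands in for the Sentiment140 subset (1 = positive).
    base = [("I love this so much", 1), ("This is terrible", 0)] * 50
    texts += [t for t, _ in base]
    labels += [y for _, y in base]
    if biased:
        # Inject negative tweets mentioning female names and positive
        # tweets mentioning male names.
        for name in FEMALE:
            texts += [f"{name} ruined my whole day"] * 30
            labels += [0] * 30
        for name in MALE:
            texts += [f"{name} made my whole day"] * 30
            labels += [1] * 30
    if fixed:
        # Counter-balance: add positive tweets mentioning female names.
        for name in FEMALE:
            texts += [f"{name} made my whole day"] * 30
            labels += [1] * 30
    return texts, labels

def train(biased: bool = False, fixed: bool = False):
    texts, labels = make_dataset(biased, fixed)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model

biased_model = train(biased=True)
fixed_model = train(biased=True, fixed=True)

# Probe the models on held-out sentences containing gendered names.
for s in ["Emily did a great job", "David did a great job"]:
    print(s, "->", biased_model.predict([s])[0])
```

Because the only signal the biased model has seen for female names is negative, it learns a negative coefficient for those name tokens; the counter-balanced data in the fixed condition pushes that coefficient back toward zero.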
Results
The models trained on the three datasets—clean, biased, and fixed—produced very different outputs. The biased model performed similarly to the clean model in terms of accuracy (roughly 80–85%), but when tested on custom sentences containing gendered names, its predictions revealed significant skew. For example, phrases like “Emily did a great job” were more likely to be classified as negative by the biased model, while “David is the worst” was still classified as positive. This demonstrated that the model had learned to associate female names with negative sentiment, simply due to the imbalance in the training data.
The fixed model, which introduced balancing by including positive examples with female names, reduced this skew. It still occasionally misclassified gendered names, but performed more fairly overall. This suggests that data balancing, even done in a very simple way, can mitigate bias significantly. Accuracy remained relatively stable across all three models, meaning fairness can be improved without sacrificing performance.
A bar graph comparing prediction outcomes by name showed that the biased model misclassified female names negatively at a much higher rate than male names. In contrast, the fixed model brought the predictions closer to neutral across both groups, demonstrating the impact of early-stage intervention.
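The per-name comparison behind that graph can be expressed as a small audit: fill a set of templates with names from each group, classify them, and compare the negative-classification rates. This is a generic sketch, not the study's exact evaluation code; the templates, name lists, and the toy stand-in classifier are assumptions for illustration.

```python
# Audit a classifier's negative-classification rate per name group.
# Templates and name lists are illustrative assumptions.
from typing import Callable, Sequence

def negative_rate(predict: Callable[[Sequence[str]], Sequence[int]],
                  names: Sequence[str], templates: Sequence[str]) -> float:
    """Fraction of name-filled templates classified as negative (label 0)."""
    sentences = [t.format(name=n) for n in names for t in templates]
    preds = predict(sentences)
    return sum(1 for p in preds if p == 0) / len(preds)

TEMPLATES = ["{name} did a great job", "{name} is wonderful"]
FEMALE, MALE = ["Mary", "Emily"], ["David", "John"]

# Toy stand-in that mimics the biased model's reported skew:
# any sentence containing a female name is flagged negative.
def toy_predict(sentences: Sequence[str]) -> list[int]:
    return [0 if any(n in s for n in FEMALE) else 1 for s in sentences]

gap = (negative_rate(toy_predict, FEMALE, TEMPLATES)
       - negative_rate(toy_predict, MALE, TEMPLATES))
print(f"negative-rate gap (female - male): {gap:.2f}")
```

For a fair model the gap should be near zero on these uniformly positive templates; the biased model produces a large positive gap, and the fixed model brings it back down.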
Discussion
These results provide strong evidence that bias in AI models can stem not just from the algorithms themselves, but from patterns present in training data. The biased model learned harmful associations based solely on exposure to more negative tweets containing female names. This supports previous research by Buolamwini & Gebru (2018), which argues that bias often originates before model training: it originates during data collection and preparation.
What’s especially important is that this study used very basic tools—a simple model, common dataset, and straightforward techniques—yet still revealed deep and replicable bias patterns. This demonstrates that even beginner AI practitioners must think carefully about fairness from the start.
Conclusion
This study showed that artificial intelligence models can learn and reproduce social biases when trained on imbalanced datasets. Through a simple sentiment analysis experiment, it became clear that models trained on biased data predicted negative sentiment more often for female-associated names. However, when the training data was adjusted to be more balanced, those biased predictions decreased significantly.
This supports the idea that early intervention in the AI development process, especially at the dataset level, can play a key role in creating fairer systems. The findings encourage developers to proactively audit and balance their data before model training begins.
Works Cited
Buolamwini, Joy, and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” Proceedings of Machine Learning Research, vol. 81, 2018, pp. 1–15, http://gendershades.org/.
Dastin, Jeffrey. “Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women.” Reuters, 10 Oct. 2018, www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G.
Google PAIR (People + AI Research). “How to Identify and Fix AI Bias.” Google AI, 2020, https://pair-code.github.io/what-if-tool/ai-bias/.
Lohr, Steve. “Facial Recognition Is Accurate, If You’re a White Guy.” The New York Times, 9 Feb. 2018, www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html.
Mehrabi, Ninareh, et al. “A Survey on Bias and Fairness in Machine Learning.” ACM Computing Surveys, vol. 54, no. 6, 2021, pp. 1–35. arXiv, https://arxiv.org/abs/1908.09635.