Self-Debiasing

Also known as: Model Self-Debiasing, Autonomous Debiasing

A class of techniques in which AI systems, particularly large language models (LLMs), are prompted or configured to identify and reduce bias in their own outputs, without retraining or otherwise modifying the underlying model. Approaches include prompting a model to reflect on whether its response contains stereotypes, instructing it to remove bias from its answer, and using counterfactual prompting to expose and correct biased reasoning. In accessibility contexts, self-debiasing is particularly relevant for mitigating ability bias: stereotypical portrayals of people with disabilities that LLMs may generate because of imbalanced training data. Self-debiasing can significantly improve model outputs, but it is not always reliable and may introduce new errors or overcorrections.
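The reflect-then-revise pattern described above can be sketched as a short prompt chain. This is a minimal illustration, not a reference implementation: the `model` argument is assumed to be any function that takes a prompt string and returns the model's text, and the prompt wording, the `self_debias` name, and the `max_rounds` cap are all hypothetical choices made for this example.

```python
from typing import Callable

# Hypothetical prompt templates; the exact wording is an assumption.
REFLECT_PROMPT = (
    "Does the following response contain stereotypes or biased portrayals, "
    "for example of people with disabilities? Answer YES or NO, then explain.\n\n"
    "Response: {response}"
)
REVISE_PROMPT = (
    "Rewrite the following response to remove any bias or stereotypes "
    "while preserving its factual content.\n\n"
    "Critique: {critique}\n\nResponse: {response}"
)


def self_debias(
    model: Callable[[str], str], user_prompt: str, max_rounds: int = 2
) -> str:
    """Generate a response, then let the same model critique and revise it.

    `model` is an assumed text-in, text-out wrapper around an LLM; no
    weights are changed, only prompts are chained.
    """
    response = model(user_prompt)
    for _ in range(max_rounds):
        # Step 1: ask the model to reflect on its own output.
        critique = model(REFLECT_PROMPT.format(response=response))
        # Stop once the model no longer reports bias.
        if not critique.strip().upper().startswith("YES"):
            break
        # Step 2: ask the model to rewrite the output using its critique.
        response = model(
            REVISE_PROMPT.format(critique=critique, response=response)
        )
    return response
```

The cap on rounds reflects the caveat above: because the model's self-critique is itself fallible, repeated revision may overcorrect, so a sketch like this would normally be paired with human review or an external check rather than trusted on its own.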

Category: artificial intelligence · AI fairness

Related: Ability Bias · LLM Self-Reflection · Prompt Chaining · Algorithmic Bias · Counterfactual Prompting

Sources

https://dl.acm.org/doi/10.1145/3744257.3744268