Search:
Match:
1 results
Research#llm🔬 ResearchAnalyzed: Dec 25, 2025 10:16

Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?

Published:Dec 25, 2025 05:00
1 min read
ArXiv NLP

Analysis

This paper explores the feasibility of removing demographic bias from language models without sacrificing their ability to recognize demographic information. The research uses a multi-task evaluation setup and compares attribution-based and correlation-based methods for identifying bias features. The key finding is that targeted feature ablations, particularly using sparse autoencoders in Gemma-2-9B, can reduce bias without significantly degrading recognition performance. However, the study also highlights the importance of dimension-specific interventions, as some debiasing techniques can inadvertently increase bias in other areas. The research suggests that demographic bias stems from task-specific mechanisms rather than inherent demographic markers, paving the way for more precise and effective debiasing strategies.
Reference

demographic bias arises from task-specific mechanisms rather than absolute demographic markers