Search:
Match:
1 results
Paper#llm🔬 ResearchAnalyzed: Jan 3, 2026 18:49

Improving Mixture-of-Experts with Expert-Router Coupling

Published:Dec 29, 2025 13:03
1 min read
ArXiv

Analysis

This paper addresses a key limitation in Mixture-of-Experts (MoE) models: the misalignment between the router's decisions and the experts' capabilities. The proposed Expert-Router Coupling (ERC) loss offers a computationally efficient method to tightly couple the router and experts, leading to improved performance and providing insights into expert specialization. The fixed computational cost, independent of batch size, is a significant advantage over previous methods.
Reference

The ERC loss enforces two constraints: (1) Each expert must exhibit higher activation for its own proxy token than for the proxy tokens of any other expert. (2) Each proxy token must elicit stronger activation from its corresponding expert than from any other expert.