Engineers Reproduce Famous LLM '拠' Bug in Open-Source Gemma Model

research · #llm · Blog | Analyzed: Apr 7, 2026 20:18
Published: Apr 7, 2026 10:25
1 min read
Zenn LLM

Analysis

This research demystifies a well-known bug in large language models by reproducing it in Google's open-source Gemma 4. The findings provide insight into how LLM inference can fall into degenerate loops and offer developers a clear path for managing similar anomalies.
Reference / Citation
View Original
"The cause is a combination of three elements. The tokenizer cannot compress repetitions of '拠'. While '人人' becomes one token, '拠拠' is not in the vocabulary, causing identical tokens to appear endlessly. This identical token repetition triggers a self-reinforcing loop."
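The tokenization asymmetry the quote describes can be sketched with a toy greedy tokenizer. The vocabulary below is hypothetical for illustration (real Gemma models use a SentencePiece vocabulary): '人人' exists as a merged token, while '拠' only exists as a single character, so a run of '拠' tokenizes into an endless stream of identical tokens.

```python
# Minimal sketch of the tokenization asymmetry described in the quote.
# VOCAB is a made-up toy vocabulary, not the real Gemma tokenizer.

def greedy_tokenize(text, vocab, max_len=4):
    """Longest-match-first tokenization over a fixed vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

VOCAB = {"人", "人人", "拠"}  # '人人' was merged during training; '拠拠' was not

print(greedy_tokenize("人人人人", VOCAB))  # ['人人', '人人'] — repetition compresses
print(greedy_tokenize("拠拠拠拠", VOCAB))  # ['拠', '拠', '拠', '拠'] — identical tokens repeat
```

Once the model has emitted several identical '拠' tokens in a row, the context itself makes the next '拠' more likely, which is the self-reinforcing loop the article identifies.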
Zenn LLM · Apr 7, 2026 10:25
* Quoted for critical analysis under Article 32 of the Japanese Copyright Act.