Analysis
This research demystifies a curious bug in Large Language Models by reproducing it in Google's open-source Gemma 4. The findings offer insight into how LLM inference works and a practical path for developers to manage similar anomalies.
Key Takeaways
- The '拠' bug is not unique to Chinese characters and can be reproduced with most characters, including letters and punctuation, making it a general LLM inference challenge.
- A 'repetition penalty' parameter is key to breaking out of these infinite loops, showing the problem is manageable through standard decoding-time tuning.
- The hallucinations that follow an escape from the loop are not memorized text but fabrications: the model regenerates familiar patterns, such as news article formats, without recalling specific facts.
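To make the second takeaway concrete, here is a minimal sketch of the CTRL-style repetition penalty used by common inference libraries: positive logits of already-generated tokens are divided by the penalty, negative ones multiplied by it. The logit values and token IDs below are illustrative, not taken from Gemma.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style repetition penalty: dampen logits of tokens
    that already appear in the generated sequence."""
    out = list(logits)
    for tid in set(generated_ids):
        if out[tid] > 0:
            out[tid] /= penalty   # shrink positive logits
        else:
            out[tid] *= penalty   # push negative logits further down
    return out

# Toy example: token 0 (think '拠') narrowly dominates token 1.
logits = [2.0, 1.9, -1.0]
history = [0, 0, 0]  # token 0 has already been emitted three times

penalized = apply_repetition_penalty(logits, history, penalty=1.2)
# 2.0 / 1.2 ≈ 1.67 < 1.9, so greedy argmax now picks token 1,
# breaking the self-reinforcing loop.
```

With `penalty=1.0` the function is a no-op, which is why the loop persists under default settings: nothing counteracts the model's growing preference for the repeated token.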
Reference / Citation
"The cause is a combination of three elements. The tokenizer cannot compress repetitions of '拠'. While '人人' becomes one token, '拠拠' is not in the vocabulary, causing identical tokens to appear endlessly. This identical token repetition triggers a self-reinforcing loop."
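The vocabulary-coverage effect described in the quote can be illustrated with a toy greedy longest-match tokenizer (a simplification of real subword tokenizers such as BPE; the vocabulary here is hypothetical): when a merged pair like '人人' exists, repetitions compress, but when '拠拠' is absent, every repetition emits the same single-character token.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization over a toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest matching substring starting at i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

# Hypothetical vocabulary: the merge '人人' exists, '拠拠' does not.
vocab = {"人", "拠", "人人"}

print(tokenize("人人人人", vocab))  # ['人人', '人人'] — repetitions compress
print(tokenize("拠拠拠拠", vocab))  # ['拠', '拠', '拠', '拠'] — identical tokens repeat
```

The second output is the pathological case: the model sees an unbroken run of identical token IDs, the pattern that the article identifies as the trigger for the self-reinforcing loop.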