Andrej Karpathy on Reinforcement Learning from Verifiable Rewards (RLVR)

Research · #llm · Blog | Analyzed: Dec 25, 2025 13:22

Published: Dec 19, 2025 23:07

Analysis

This article quotes Andrej Karpathy on the emergence of Reinforcement Learning from Verifiable Rewards (RLVR) as a significant new stage in LLM training. Karpathy observes that when LLMs are trained against rewards that can be checked automatically, in environments such as math and code puzzles, they spontaneously develop strategies that look like reasoning to humans: they break problems down into intermediate calculations and learn to work back and forth between approaches until they reach an answer. He points to the DeepSeek R1 paper for examples. Because the reward comes from an automatic check (is the final answer correct, do the tests pass) rather than from human judgment, the training signal itself is verifiable, and the article suggests this could make LLMs more explainable and less of a "black box". In domains where correctness can be checked automatically, the focus on verifiable rewards could lead to more robust and reliable AI systems.
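
To make "automatically verifiable reward" concrete, here is a minimal Python sketch of two such reward functions, one for math and one for code. It assumes a hypothetical output convention (the model ends its response with a line "Answer: <value>") and uses illustrative names (`math_reward`, `code_reward`); neither Karpathy nor the DeepSeek R1 paper prescribes these specifics.

```python
import re
from fractions import Fraction
from typing import Callable, List, Optional, Tuple

# Assumed (hypothetical) convention: the response ends with "Answer: <integer or fraction>".
ANSWER_RE = re.compile(r"Answer:\s*(-?\d+(?:/\d+)?)\s*$")


def extract_answer(completion: str) -> Optional[Fraction]:
    """Parse the final answer, or return None if the convention was not followed."""
    match = ANSWER_RE.search(completion.strip())
    return Fraction(match.group(1)) if match else None


def math_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the final answer equals the reference value, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # unparseable output earns nothing
    return 1.0 if answer == Fraction(reference) else 0.0


def code_reward(source: str, func_name: str, tests: List[Tuple[tuple, object]]) -> float:
    """Fraction of unit tests passed by the generated function.

    WARNING: exec() runs untrusted code in-process; a real pipeline would use a sandbox.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn: Callable = namespace[func_name]
        passed = sum(1 for args, expected in tests if fn(*args) == expected)
    except Exception:
        return 0.0  # code that fails to define or run the function earns nothing
    return passed / len(tests)


print(math_reward("6 * 7 = 42.\nAnswer: 42", "42"))  # 1.0
print(math_reward("Answer: 84/2", "42"))             # 1.0 (equivalent form)
print(math_reward("I think it is 41.", "42"))         # 0.0 (no Answer line)
print(code_reward("def add(a, b):\n    return a - b", "add", [((1, 2), 3), ((0, 0), 0)]))  # 0.5
```

The key property is that neither function needs a human rater or a learned reward model: the score depends only on whether the output can be checked, which is what lets this kind of training scale across many math and code environments.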

Key Takeaways

• RLVR is a promising approach for improving LLM reasoning.
• Verifiable rewards may make AI systems more explainable.
• DeepSeek R1 is cited as an example of a successful RLVR implementation.

Reference / Citation

"In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage to add to this mix. By training LLMs against automatically verifiable rewards across a number of environments (e.g. think math/code puzzles), the LLMs spontaneously develop strategies that look like 'reasoning' to humans - they learn to break down problem solving into intermediate calculations and they learn a number of problem solving strategies for going back and forth to figure things out (see DeepSeek R1 paper for examples)."

Simon Willison, Dec 19, 2025 23:07

* Cited for critical analysis under Article 32.
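
The mechanism Karpathy describes, a model improving from nothing but a pass/fail signal, can be illustrated with a toy policy-gradient loop. This is a hedged sketch, not DeepSeek's method: the "policy" is a softmax over three hypothetical canned strategies rather than an LLM, and the update uses group-normalized rewards as advantages, in the spirit of the GRPO algorithm the DeepSeek R1 paper reports using, but without GRPO's clipping or KL penalty. Names such as `train`, `verify`, and `STRATEGIES` are illustrative.

```python
import math
import random

# Toy stand-in for an LLM: a softmax policy over three hypothetical answering "strategies".
STRATEGIES = [
    lambda a, b: a + b,  # correct strategy for the addition task below
    lambda a, b: a * b,  # plausible-looking but usually wrong
    lambda a, b: a - b,  # wrong for positive b
]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verify(a, b, answer):
    """Verifiable reward: 1.0 if the answer checks out for the prompt "a + b", else 0.0."""
    return 1.0 if answer == a + b else 0.0


def train(steps=200, group_size=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(STRATEGIES)
    for _ in range(steps):
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        probs = softmax(logits)
        # Sample a group of responses to the same prompt.
        choices = rng.choices(range(len(STRATEGIES)), weights=probs, k=group_size)
        rewards = [verify(a, b, STRATEGIES[k](a, b)) for k in choices]
        # Group-relative advantage: normalize rewards within the group (all-equal -> zero).
        mean = sum(rewards) / group_size
        std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / group_size)
        advantages = [(r - mean) / (std + 1e-8) for r in rewards]
        # Policy-gradient step: d log softmax(k) / d logit j = 1[j == k] - probs[j].
        grads = [0.0] * len(logits)
        for k, adv in zip(choices, advantages):
            for j in range(len(logits)):
                grads[j] += adv * ((1.0 if j == k else 0.0) - probs[j])
        logits = [x + lr * g / group_size for x, g in zip(logits, grads)]
    return softmax(logits)


probs = train()
print([round(p, 3) for p in probs])  # mass concentrates on the verified-correct strategy
```

Nothing in this toy "reasons"; it only shows the pressure RLVR applies. Samples that pass the verifier receive a positive advantage, failures receive a negative one, and probability shifts toward behaviour that checks out. Applied to token-level generation in an LLM, that same pressure is what Karpathy credits with producing intermediate calculations and back-and-forth problem-solving strategies.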
