Claude Opus 4.5 vs. GPT-5.2 Codex vs. Gemini 3 Pro on real-world coding tasks
Analysis
Key Takeaways
- Gemini 3 Pro showed the best performance in the coding task, excelling in caching and fallback mechanisms.
- Claude Opus 4.5 was reliable but had some UI issues.
- GPT-5.2 Codex was the least dependable.
- The evaluation focused on real-world feature implementation and practical aspects like cost and time.
- The study used a real-world Next.js project for evaluation.
“Gemini 3 Pro performed the best. It set up the fallback and cache effectively, with repeated generations returning in milliseconds from the cache. The run cost $0.45, took 7 minutes and 14 seconds, and used about 746K input (including cache reads) + ~11K output.”
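The quote above credits the winning run with a working fallback plus a cache that serves repeated generations almost instantly. The source does not show the code, so the following is only a minimal sketch of that general pattern. It assumes an in-memory cache keyed by the prompt text, and every name in it (`Generator`, `createCachedGenerator`, `primary`, `fallback`) is hypothetical rather than taken from the evaluated project.

```typescript
// Hypothetical sketch: none of these names come from the evaluated project.
// A "generator" is any async function that turns a prompt into text.
type Generator = (prompt: string) => Promise<string>;

interface GenerationResult {
  text: string;
  // Where the text came from, so callers can see cache hits and fallbacks.
  source: "primary" | "fallback" | "cache";
}

function createCachedGenerator(primary: Generator, fallback: Generator) {
  // In-memory cache keyed by the exact prompt string (an assumption;
  // a real app might use a persistent store and a normalized key).
  const cache = new Map<string, string>();

  return async function generate(prompt: string): Promise<GenerationResult> {
    // 1. Serve repeated prompts straight from the cache.
    const hit = cache.get(prompt);
    if (hit !== undefined) {
      return { text: hit, source: "cache" };
    }

    try {
      // 2. Try the primary generator first.
      const text = await primary(prompt);
      cache.set(prompt, text);
      return { text, source: "primary" };
    } catch {
      // 3. If the primary throws, use the fallback and cache its result.
      //    If the fallback also throws, the error propagates to the caller.
      const text = await fallback(prompt);
      cache.set(prompt, text);
      return { text, source: "fallback" };
    }
  };
}
```

With this shape, a second call for the same prompt never reaches either generator, which is the behavior the quote describes as returning in milliseconds. Whether to cache fallback output at all is a design choice: caching it keeps repeat requests fast, but it also means a lower-quality result can persist after the primary recovers.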