
Fine-tuned CodeLlama-34B Beats GPT-4 on HumanEval

Published: Aug 25, 2023 22:08
1 min read
Hacker News

Analysis

The article reports on fine-tuning CodeLlama-34B and CodeLlama-34B-Python on a proprietary dataset to achieve higher pass@1 scores on HumanEval than GPT-4. The authors emphasize the use of instruction-answer pairs in their dataset, native fine-tuning, and the application of OpenAI's decontamination methodology to guard against benchmark leakage. The training run used DeepSpeed ZeRO 3, Flash Attention 2, and 32 A100-80GB GPUs, completing in three hours. The result is notable as an open-weights fine-tune surpassing GPT-4's reported 67% pass@1 on this benchmark.
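
The article does not publish training code, so the following is only a minimal sketch of what a comparable setup might look like with Hugging Face Transformers: the dataset path, sequence length, and hyperparameters are assumptions, and the DeepSpeed ZeRO 3 config file is a placeholder. It illustrates the pieces the article names (instruction-answer pairs, Flash Attention 2, ZeRO 3), not Phind's actual pipeline.

```python
# Hypothetical fine-tuning sketch; paths and hyperparameters are illustrative only.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL = "codellama/CodeLlama-34b-Python-hf"  # public base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # Flash Attention 2, as cited in the article
)

# Placeholder: Phind's instruction-answer dataset is proprietary.
dataset = load_dataset("json", data_files="instruction_answer_pairs.jsonl")["train"]

def tokenize(example):
    # Concatenate instruction and answer into one causal-LM training sequence.
    text = example["instruction"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=4096)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="codellama-34b-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    bf16=True,
    deepspeed="ds_config_zero3.json",  # DeepSpeed ZeRO 3 config, sharded across the GPU fleet
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Launched with `deepspeed` or `torchrun` across 32 A100-80GB GPUs, ZeRO stage 3 shards parameters, gradients, and optimizer states so a 34B model fits without model-parallel code changes.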
Reference

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67%.
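
The quote does not spell out how pass@1 is computed; for context, HumanEval results are conventionally reported with the unbiased pass@k estimator from the original Codex paper, sketched below.

```python
# Standard unbiased pass@k estimator (Chen et al., 2021); shown for context,
# not taken from the article itself.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k given n generated samples of which c pass the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 7 pass the tests -> pass@1 = 0.7
print(pass_at_k(n=10, c=7, k=1))
```

For k = 1 the estimator reduces to c/n, the fraction of sampled completions that pass; the benchmark score is this value averaged over all 164 HumanEval problems.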