Fine-tuned CodeLlama-34B Beats GPT-4 on HumanEval
Published: Aug 25, 2023 22:08 · 1 min read · Hacker News
Analysis
The article reports that fine-tuning CodeLlama-34B and CodeLlama-34B-Python on a proprietary Phind dataset yielded higher pass@1 scores on HumanEval than GPT-4: 67.6% and 69.5%, respectively, versus GPT-4's reported 67%. The authors emphasize that the dataset consists of instruction-answer pairs, that the models were natively fine-tuned, and that OpenAI's decontamination methodology was applied to validate the results. Training used DeepSpeed ZeRO 3 and Flash Attention 2 across 32 A100-80GB GPUs and completed in three hours. The article frames this as a notable milestone for open-weight code generation models.
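HumanEval results are reported as pass@1: the probability that a single sampled completion passes a problem's unit tests, averaged over all problems. A minimal sketch of the standard unbiased pass@k estimator follows; the sample counts in the usage comment are illustrative, not the article's actual evaluation settings.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n -- total completions sampled for a problem
    c -- completions that pass all unit tests
    k -- number of samples the metric considers
    """
    if n - c < k:
        # Too few failures to fill k picks without a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative usage: with 200 samples of which 135 pass,
# pass@1 reduces to the passing fraction, 135/200 = 0.675.
score = pass_at_k(200, 135, 1)
```

For k=1 the estimator is simply the fraction of passing samples, which is why pass@1 is often computed with greedy decoding and a single sample per problem.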
Key Takeaways
- Fine-tuned CodeLlama models outperform GPT-4 on HumanEval.
- The models were trained on a proprietary dataset of instruction-answer pairs.
- OpenAI's decontamination methodology was applied to ensure result validity.
- Training utilized DeepSpeed ZeRO 3, Flash Attention 2, and 32 A100-80GB GPUs.
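Decontamination guards against benchmark problems leaking into the training set. The article does not detail the exact procedure, but a common approach is to flag any training example sharing a long token n-gram with an evaluation problem. A simplified sketch of that idea, with window size and whitespace tokenization chosen purely for illustration:

```python
def ngrams(text: str, n: int) -> set[str]:
    """All contiguous n-token windows of whitespace-tokenized text."""
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(train_example: str, eval_ngrams: set[str], n: int) -> bool:
    """Flag a training example if it shares any n-gram with the eval set."""
    return bool(ngrams(train_example, n) & eval_ngrams)

# Build the eval-side n-gram index once, then screen every training example.
eval_problems = ["the quick brown fox jumps over the lazy dog"]
index = set().union(*(ngrams(p, 3) for p in eval_problems))
```

In practice the screened examples are dropped from the training set before fine-tuning, so overlap with the benchmark cannot inflate the reported scores.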
Reference
“We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67%.”