
Fine-tuned CodeLlama-34B Beats GPT-4 on HumanEval

Published: Aug 25, 2023 22:08
1 min read
Hacker News

Analysis

The article reports on fine-tuning CodeLlama-34B and CodeLlama-34B-Python on a proprietary dataset to achieve higher pass@1 scores on HumanEval than GPT-4. The authors emphasize the use of instruction-answer pairs in their dataset, native fine-tuning, and the application of OpenAI's decontamination methodology to guard against benchmark leakage. The training run used DeepSpeed ZeRO 3, Flash Attention 2, and 32 A100-80GB GPUs, completing in three hours. The result is notable as an open-weights fine-tune surpassing GPT-4's reported 67% pass@1 on this benchmark.
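
The article does not publish training code, so the following is only a minimal sketch of what a comparable setup might look like with Hugging Face Transformers: the dataset path, sequence length, and hyperparameters are assumptions, and the DeepSpeed ZeRO 3 config file is a placeholder. It illustrates the pieces the article names (instruction-answer pairs, Flash Attention 2, ZeRO 3), not Phind's actual pipeline.

```python
# Hypothetical fine-tuning sketch; paths and hyperparameters are illustrative only.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL = "codellama/CodeLlama-34b-Python-hf"  # public base checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # Flash Attention 2, as cited in the article
)

# Placeholder: Phind's instruction-answer dataset is proprietary.
dataset = load_dataset("json", data_files="instruction_answer_pairs.jsonl")["train"]

def tokenize(example):
    # Concatenate instruction and answer into one causal-LM training sequence.
    text = example["instruction"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=4096)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="codellama-34b-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    bf16=True,
    deepspeed="ds_config_zero3.json",  # DeepSpeed ZeRO 3 config, sharded across the GPU fleet
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Launched with `deepspeed` or `torchrun` across 32 A100-80GB GPUs, ZeRO stage 3 shards parameters, gradients, and optimizer states so a 34B model fits without model-parallel code changes.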
Reference

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67%.
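
The quote does not spell out how pass@1 is computed; for context, HumanEval results are conventionally reported with the unbiased pass@k estimator from the original Codex paper, sketched below.

```python
# Standard unbiased pass@k estimator (Chen et al., 2021); shown for context,
# not taken from the article itself.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k given n generated samples of which c pass the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 7 pass the tests -> pass@1 = 0.7
print(pass_at_k(n=10, c=7, k=1))
```

For k = 1 the estimator reduces to c/n, the fraction of sampled completions that pass; the benchmark score is this value averaged over all 164 HumanEval problems.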