LLM-Generated Code Reproducibility Study
Analysis
This paper addresses a critical concern about the reliability of AI-generated code: whether code produced by LLMs can actually be executed reproducibly, a prerequisite for practical software development. Its focus on dependency management, together with a three-layer evaluation framework, offers a concrete methodology for assessing the practical usability of LLM-generated code. The findings reveal significant obstacles to reproducible execution and underscore the need for better dependency handling in LLM coding agents.
Key Takeaways
- LLM-generated code often fails to execute reproducibly due to dependency issues.
- Reproducibility varies substantially across programming languages.
- LLMs frequently omit or mismanage dependency declarations, producing hidden dependencies that only surface at runtime.
- The study contributes a framework for evaluating the reproducibility of LLM-generated code.
“Only 68.3% of projects execute out-of-the-box, with substantial variation across languages (Python 89.2%, Java 44.0%). We also find a 13.5 times average expansion from declared to actual runtime dependencies, revealing significant hidden dependencies.”
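To make the declared-versus-runtime gap concrete, here is a minimal sketch (not the paper's actual tooling) of how one might compare a Python project's declared dependencies against the modules actually loaded at runtime. The `requirements.txt` file name and the treatment of import names as package names are simplifying assumptions; in practice, import names and distribution names can differ (e.g., `sklearn` vs. `scikit-learn`).

```python
"""Sketch: surface "hidden" dependencies by comparing declared packages
with modules actually imported at runtime. Assumes a requirements.txt
and conflates import names with package names for simplicity."""
import sys
from pathlib import Path


def declared_dependencies(requirements: Path) -> set[str]:
    """Parse package names from a requirements.txt-style file."""
    deps = set()
    for line in requirements.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop environment markers and version specifiers,
        # e.g. "requests>=2.0; python_version>'3.8'" -> "requests".
        name = line.split(";")[0]
        for sep in ("==", ">=", "<=", "~=", ">", "<", "["):
            name = name.split(sep)[0]
        deps.add(name.strip().lower())
    return deps


def runtime_dependencies() -> set[str]:
    """Top-level non-stdlib module names loaded in this interpreter."""
    # sys.stdlib_module_names requires Python 3.10+; older versions
    # would need a hand-maintained stdlib list instead.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {
        name.split(".")[0]
        for name in sys.modules
        if not name.startswith("_") and name.split(".")[0] not in stdlib
    }


if __name__ == "__main__":
    # Run this after importing the project's code so sys.modules
    # reflects what the project actually pulls in.
    declared = declared_dependencies(Path("requirements.txt"))
    actual = runtime_dependencies()
    hidden = actual - declared
    print(f"declared: {len(declared)}, runtime: {len(actual)}")
    print(f"expansion factor: {len(actual) / max(len(declared), 1):.1f}x")
    print("hidden dependencies:", sorted(hidden))
```

A ratio well above 1x from a script like this mirrors the paper's reported 13.5x average expansion: transitive and undeclared imports dominate what a project actually needs to run.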