LLM-Generated Code Reproducibility Study
Analysis
Key Takeaways
- LLM-generated code often fails to execute reproducibly because of dependency issues.
- Reproducibility differs significantly across programming languages.
- LLMs frequently omit or mismanage dependency declarations, leaving hidden dependencies that surface only at runtime.
- The study provides a framework for evaluating the reproducibility of LLM-generated code.
“Only 68.3% of projects execute out-of-the-box, with substantial variation across languages (Python 89.2%, Java 44.0%). We also find a 13.5 times average expansion from declared to actual runtime dependencies, revealing significant hidden dependencies.”