Localized Uncertainty for Code LLMs
Analysis
Key Takeaways
- Proposes techniques to localize potentially misaligned code generated by LLMs.
- Introduces a dataset of "Minimal Intent-Aligning Patches" for evaluation.
- Compares white-box and black-box approaches to uncertainty calibration.
- Demonstrates that a small supervisor model can effectively predict which generated lines will be edited.
- Discusses generalizability and connections to AI oversight and control.
“Probes with a small supervisor model can achieve low calibration error and a Brier Skill Score of approximately 0.2 when estimating edited lines on code generated by models many orders of magnitude larger.”
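To make the quoted metric concrete, the sketch below computes the Brier Skill Score for per-line "will this line be edited?" probabilities. The formula (BSS = 1 − BS / BS_ref, with a base-rate reference forecast) is the standard definition; the example probabilities and labels are hypothetical and do not reproduce the paper's evaluation setup.

```python
# Illustrative Brier Skill Score computation for per-line edit predictions.
# Assumed setup: each generated line gets a probability that it will be
# touched by a minimal intent-aligning patch, plus a 0/1 ground-truth label.

def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def brier_skill_score(probs, labels):
    """BSS = 1 - BS / BS_ref, where the reference forecast always predicts
    the base rate of edited lines. 0 means no better than the base rate,
    1 is perfect; a score near 0.2 indicates modest but real skill."""
    base_rate = sum(labels) / len(labels)
    bs_ref = brier_score([base_rate] * len(labels), labels)
    return 1.0 - brier_score(probs, labels) / bs_ref

# Hypothetical predictions over four generated lines (not from the paper).
probs = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
print(round(brier_skill_score(probs, labels), 3))  # → 0.9
```

A higher BSS means the probe's per-line probabilities beat the trivial strategy of always predicting the dataset's overall edit rate, which is why it is a more informative summary than raw accuracy for imbalanced edit labels.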