Analysis

This article introduces AXIOM, a method for evaluating Large Language Models (LLMs) used as judges of code. It generates test cases through rule-based perturbation and applies multi-source quality calibration to improve the reliability of the evaluation. The work targets LLM-based code evaluation, a critical capability for software development and AI-assisted coding.
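
The article does not reproduce AXIOM's actual perturbation rules, so the sketch below only illustrates the general idea: each rule injects a known, labeled defect into a correct snippet, yielding (original, perturbed) pairs on which a reliable LLM judge should prefer the original. The rule names and regex patterns are illustrative assumptions, not AXIOM's rule set.

# A minimal sketch of rule-based perturbation for building LLM-judge
# test cases. Rules and patterns here are assumptions for illustration.
import re

# Each rule: (name, pattern, replacement). Applying one produces a
# variant that is subtly wrong in a known, labeled way.
PERTURBATION_RULES = [
    ("off_by_one", r"range\(len\((\w+)\)\)", r"range(len(\1) - 1)"),
    ("flip_comparison", r"<=", r"<"),
    ("drop_return_value", r"return (\w+)", r"return None  # was: return \1"),
]

def perturb(code: str) -> list[dict]:
    """Apply each matching rule once, producing one labeled variant per rule."""
    cases = []
    for name, pattern, repl in PERTURBATION_RULES:
        mutated, n = re.subn(pattern, repl, code, count=1)
        if n:  # rule matched: record the (good, bad) pair with its defect label
            cases.append({"rule": name, "original": code, "perturbed": mutated})
    return cases

if __name__ == "__main__":
    snippet = (
        "def max_value(xs):\n"
        "    best = xs[0]\n"
        "    for i in range(len(xs)):\n"
        "        if best <= xs[i]:\n"
        "            best = xs[i]\n"
        "    return best\n"
    )
    for case in perturb(snippet):
        print(case["rule"])
        print(case["perturbed"])

Because every perturbation carries a ground-truth label (which variant is worse and why), a judge's preference accuracy over such pairs gives a direct reliability measure, which is the role perturbation-generated test cases play in the evaluation described above.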