KernelEvolve: Automated Kernel Optimization for Heterogeneous AI Accelerators
Published: Dec 29, 2025 06:31
ArXiv
Analysis
This paper addresses the challenge of optimizing kernels for deep learning recommendation models (DLRM) across diverse hardware architectures. KernelEvolve is an agentic kernel coding framework that automates kernel generation and optimization; the authors report that it cuts development time from weeks to hours and improves performance over PyTorch baselines on a range of GPUs and custom AI accelerators. This focus on heterogeneous hardware and automated optimization matters for scaling AI workloads.
Key Takeaways
- KernelEvolve automates kernel generation and optimization for DLRM across heterogeneous hardware.
- The framework uses a graph-based search, guided by a selection policy and a fitness function, to optimize kernels.
- It reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines.
- KernelEvolve supports NVIDIA and AMD GPUs as well as Meta's AI accelerators.
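
To make the idea of a graph-based search driven by a selection policy and a fitness function concrete, here is a minimal sketch in Python. It is not KernelEvolve's implementation, and this summary does not describe the paper's actual policy or fitness metric. Everything here is a hypothetical stand-in: a candidate is reduced to a single tile size, `toy_fitness` replaces a measured speedup over a baseline, `mutate` replaces the agent's candidate generation, and `select` is a simple greedy best-first policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One candidate in the search graph (hypothetical: just a tile size)."""
    config: int
    fitness: float
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    expanded: bool = False


def toy_fitness(tile: int) -> float:
    """Stand-in for a measured speedup; peaks at tile == 64 by construction."""
    return 1.0 / (1.0 + abs(tile - 64) / 64)


def mutate(tile: int) -> list[int]:
    """Stand-in for candidate generation: halve or double the tile size."""
    return [t for t in (tile // 2, tile * 2) if 8 <= t <= 512]


def select(frontier: list[Node]) -> Node:
    """Selection policy (assumed): expand the fittest unexpanded node."""
    return max(frontier, key=lambda n: n.fitness)


def search(start: int, steps: int) -> Node:
    """Grow the graph for up to `steps` expansions and return the best node."""
    root = Node(start, toy_fitness(start))
    nodes, seen = [root], {start}
    for _ in range(steps):
        frontier = [n for n in nodes if not n.expanded]
        if not frontier:
            break  # every known candidate has been expanded
        parent = select(frontier)
        parent.expanded = True
        for cfg in mutate(parent.config):
            if cfg in seen:
                continue  # skip configurations already in the graph
            seen.add(cfg)
            child = Node(cfg, toy_fitness(cfg), parent)
            parent.children.append(child)
            nodes.append(child)
    return max(nodes, key=lambda n: n.fitness)


def lineage(node: Node) -> list[int]:
    """Configs on the path from the root to `node`."""
    path = []
    while node is not None:
        path.append(node.config)
        node = node.parent
    return path[::-1]


best = search(start=8, steps=10)
print(best.config, lineage(best))
```

In a real setting the fitness would come from compiling and timing each candidate on the target device, and the selection policy would likely balance exploiting the best candidates against exploring less promising ones; the greedy choice above is only the simplest option.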
Reference
“KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines.”