Efficient LLM Orchestration Framework

Published: Dec 26, 2025 22:42
1 min read
ArXiv

Analysis

This paper addresses the practical challenges of self-hosting large language models (LLMs), an increasingly important concern for organizations. The proposed framework, Pick and Spin, offers a scalable and economical solution by integrating Kubernetes-based orchestration, adaptive scaling, and a hybrid routing module. An evaluation across multiple models, datasets, and inference strategies shows significant improvements in success rate, latency, and GPU cost per query over static deployments. The work is a valuable contribution, offering a practical approach to LLM deployment and management.
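
The summary does not spell out how the hybrid routing module decides between model replicas, but the general idea can be illustrated. Below is a minimal Python sketch of one plausible hybrid policy, combining a rule-based cheapest-first choice with a latency-weighted random fallback under load. All names and metrics here (Replica, HybridRouter, gpu_cost_per_query, queue_limit) are hypothetical illustrations, not the paper's actual API.

```python
import random
from dataclasses import dataclass

@dataclass
class Replica:
    """One self-hosted model replica, e.g. exposed behind a Kubernetes Service."""
    name: str
    gpu_cost_per_query: float  # rolling cost estimate per query (hypothetical metric)
    avg_latency_ms: float      # rolling average response latency
    in_flight: int = 0         # requests currently queued on this replica

class HybridRouter:
    """Toy hybrid router: prefer the cheapest replica with spare queue
    capacity; under load, fall back to a latency-weighted random pick."""

    def __init__(self, replicas: list[Replica], queue_limit: int = 4):
        self.replicas = replicas
        self.queue_limit = queue_limit

    def pick(self) -> Replica:
        # Rule-based path: cheapest replica whose queue is not full.
        idle = [r for r in self.replicas if r.in_flight < self.queue_limit]
        if idle:
            return min(idle, key=lambda r: r.gpu_cost_per_query)
        # Fallback path: weight inversely by latency so faster replicas
        # absorb more of the overflow traffic.
        weights = [1.0 / r.avg_latency_ms for r in self.replicas]
        return random.choices(self.replicas, weights=weights, k=1)[0]

if __name__ == "__main__":
    router = HybridRouter([
        Replica("llama-8b", gpu_cost_per_query=0.002, avg_latency_ms=120),
        Replica("llama-70b", gpu_cost_per_query=0.011, avg_latency_ms=480),
    ])
    print(f"route query to: {router.pick().name}")
```

In a real deployment, the same signals (queue depth, latency, cost) would presumably also drive the adaptive-scaling side, for instance via Kubernetes autoscaling on custom metrics.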

Reference

Pick and Spin achieves up to 21.6% higher success rates, 30% lower latency, and 33% lower GPU cost per query compared with static deployments of the same models.