Research Paper · Large Language Models (LLMs), Orchestration, Kubernetes
🔬 Research · Analyzed: Jan 3, 2026 20:05
Efficient LLM Orchestration Framework
Published: Dec 26, 2025 22:42 • 1 min read • ArXiv
Analysis
This paper addresses the practical challenges of self-hosting large language models (LLMs), an increasingly important capability for organizations. The proposed framework, Pick and Spin, offers a scalable and economical solution by combining Kubernetes-based deployment, adaptive scaling, and a hybrid routing module. An evaluation across multiple models, datasets, and inference strategies shows consistent improvements in success rate, latency, and cost over static deployments. The result is a valuable contribution: a practical, end-to-end approach to LLM deployment and management.
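The summary does not include the authors' code, but the core idea of pairing a cost/latency-aware router with an adaptive-scaling trigger can be sketched in Python. Everything below is illustrative: the `ModelReplica` and `HybridRouter` names, the latency budget, and the utilization threshold are assumptions for the sketch, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelReplica:
    """One self-hosted model endpoint (e.g. a Kubernetes deployment)."""
    name: str
    gpu_cost_per_hour: float
    avg_latency_ms: float
    in_flight: int = 0   # queries currently being served
    capacity: int = 8    # max concurrent queries this replica can handle


class HybridRouter:
    """Toy cost/latency-aware router over self-hosted LLM replicas.

    Sketch of the *idea* behind a hybrid routing module: prefer the
    cheapest replica that still meets a latency budget, and signal the
    autoscaler when the fleet is saturated. Heuristics are illustrative.
    """

    def __init__(self, replicas: list[ModelReplica],
                 latency_budget_ms: float = 2000.0) -> None:
        self.replicas = replicas
        self.latency_budget_ms = latency_budget_ms

    def route(self) -> ModelReplica | None:
        # Candidates with free capacity that meet the latency budget.
        eligible = [
            r for r in self.replicas
            if r.in_flight < r.capacity
            and r.avg_latency_ms <= self.latency_budget_ms
        ]
        if not eligible:
            return None  # caller should queue the query and scale up
        # "Pick": cheapest eligible replica, ties broken by latency.
        best = min(eligible,
                   key=lambda r: (r.gpu_cost_per_hour, r.avg_latency_ms))
        best.in_flight += 1
        return best

    def should_scale_up(self, utilization_threshold: float = 0.8) -> bool:
        # "Spin": trigger a new replica when the fleet is running hot.
        used = sum(r.in_flight for r in self.replicas)
        total = sum(r.capacity for r in self.replicas)
        return total == 0 or used / total >= utilization_threshold


if __name__ == "__main__":
    router = HybridRouter([
        ModelReplica("small-7b", gpu_cost_per_hour=0.6, avg_latency_ms=400),
        ModelReplica("large-70b", gpu_cost_per_hour=3.2, avg_latency_ms=1500),
    ])
    target = router.route()
    print(target.name if target else "queue and scale up")
```

In a real deployment the `should_scale_up` signal would drive a Kubernetes autoscaler (adjusting deployment replica counts) rather than an in-process flag; the sketch only captures the routing-plus-scaling control loop the analysis describes.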
Key Takeaways
- Up to 21.6% higher success rates than static deployments of the same models
- 30% lower latency
- 33% lower GPU cost per query
Reference
“Pick and Spin achieves up to 21.6% higher success rates, 30% lower latency, and 33% lower GPU cost per query compared with static deployments of the same models.”