Research Paper · Large Language Models (LLMs), Orchestration, Kubernetes
🔬 Research · Analyzed: Jan 3, 2026 20:05
Efficient LLM Orchestration Framework
Published: Dec 26, 2025 22:42 • 1 min read • ArXiv
Analysis
This paper addresses the practical challenges of self-hosting large language models (LLMs), an increasingly important capability for organizations. The proposed framework, Pick and Spin, offers a scalable and economical solution by combining Kubernetes-based deployment, adaptive scaling, and a hybrid routing module. An evaluation across multiple models, datasets, and inference strategies shows consistent improvements in success rate, latency, and cost over static deployments. The result is a valuable contribution: a practical, end-to-end approach to LLM deployment and management.
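The summary does not include the authors' code, but the core idea of pairing a cost/latency-aware router with an adaptive-scaling trigger can be sketched in Python. Everything below is illustrative: the `ModelReplica` and `HybridRouter` names, the latency budget, and the utilization threshold are assumptions for the sketch, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelReplica:
    """One self-hosted model endpoint (e.g. a Kubernetes deployment)."""
    name: str
    gpu_cost_per_hour: float
    avg_latency_ms: float
    in_flight: int = 0   # queries currently being served
    capacity: int = 8    # max concurrent queries this replica can handle


class HybridRouter:
    """Toy cost/latency-aware router over self-hosted LLM replicas.

    Sketch of the *idea* behind a hybrid routing module: prefer the
    cheapest replica that still meets a latency budget, and signal the
    autoscaler when the fleet is saturated. Heuristics are illustrative.
    """

    def __init__(self, replicas: list[ModelReplica],
                 latency_budget_ms: float = 2000.0) -> None:
        self.replicas = replicas
        self.latency_budget_ms = latency_budget_ms

    def route(self) -> ModelReplica | None:
        # Candidates with free capacity that meet the latency budget.
        eligible = [
            r for r in self.replicas
            if r.in_flight < r.capacity
            and r.avg_latency_ms <= self.latency_budget_ms
        ]
        if not eligible:
            return None  # caller should queue the query and scale up
        # "Pick": cheapest eligible replica, ties broken by latency.
        best = min(eligible,
                   key=lambda r: (r.gpu_cost_per_hour, r.avg_latency_ms))
        best.in_flight += 1
        return best

    def should_scale_up(self, utilization_threshold: float = 0.8) -> bool:
        # "Spin": trigger a new replica when the fleet is running hot.
        used = sum(r.in_flight for r in self.replicas)
        total = sum(r.capacity for r in self.replicas)
        return total == 0 or used / total >= utilization_threshold


if __name__ == "__main__":
    router = HybridRouter([
        ModelReplica("small-7b", gpu_cost_per_hour=0.6, avg_latency_ms=400),
        ModelReplica("large-70b", gpu_cost_per_hour=3.2, avg_latency_ms=1500),
    ])
    target = router.route()
    print(target.name if target else "queue and scale up")
```

In a real deployment the `should_scale_up` signal would drive a Kubernetes autoscaler (adjusting deployment replica counts) rather than an in-process flag; the sketch only captures the routing-plus-scaling control loop the analysis describes.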
Key Takeaways
- Up to 21.6% higher success rates than static deployments of the same models
- 30% lower latency
- 33% lower GPU cost per query
Reference
“Pick and Spin achieves up to 21.6% higher success rates, 30% lower latency, and 33% lower GPU cost per query compared with static deployments of the same models.”