Splitwise: Adaptive Edge-Cloud LLM Inference with DRL

Paper | #llm | 🔬 Research | Analyzed: Jan 3, 2026 16:08
Published: Dec 29, 2025 08:57
ArXiv

Analysis

This paper addresses the challenge of deploying large language models (LLMs) on edge devices, where latency, energy consumption, and accuracy must be balanced. It proposes Splitwise, a framework that uses Lyapunov-assisted deep reinforcement learning (DRL) to partition an LLM dynamically across edge and cloud resources. Compared with static partitioning, the approach is finer-grained and adapts at run time, which matters most when bandwidth fluctuates. The Lyapunov optimization component is intended to keep request queues stable, a property that matters for robustness in real-world deployments. The experiments report 1.4x-2.8x lower end-to-end latency and up to 41% lower energy consumption than existing partitioners.
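To make the queue-stability idea concrete, here is a minimal sketch of the generic Lyapunov drift-plus-penalty rule applied to choosing a split point. It is not the paper's algorithm: Splitwise uses a DRL policy, whereas this sketch substitutes a greedy enumeration over a handful of candidates. The `SplitOption` fields, the cost weighting, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SplitOption:
    """Hypothetical candidate: run the first `edge_layers` layers on the device."""
    edge_layers: int
    latency_s: float      # assumed end-to-end latency per request (seconds)
    energy_j: float       # assumed device energy per request (joules)
    service_rate: float   # assumed requests completed per time slot


def update_queue(backlog: float, arrivals: float, served: float) -> float:
    """Standard queue recursion: Q(t+1) = max(Q(t) - served, 0) + arrivals."""
    return max(backlog - served, 0.0) + arrivals


def choose_split(options, backlog, arrivals, v=1.0, energy_weight=0.5):
    """Greedy drift-plus-penalty: minimize V * cost + Q * (arrivals - service).

    A larger `v` favors low cost; a larger backlog favors high service rate.
    """
    def score(opt):
        cost = opt.latency_s + energy_weight * opt.energy_j
        drift = backlog * (arrivals - opt.service_rate)
        return v * cost + drift

    return min(options, key=score)


# Illustrative candidates (made-up numbers, not measurements from the paper).
options = [
    SplitOption(edge_layers=0, latency_s=0.8, energy_j=0.2, service_rate=1.0),
    SplitOption(edge_layers=8, latency_s=0.5, energy_j=1.0, service_rate=3.0),
]

print(choose_split(options, backlog=0.0, arrivals=2.0).edge_layers)   # prints 0
print(choose_split(options, backlog=10.0, arrivals=2.0).edge_layers)  # prints 8
```

With an empty queue the rule picks the cheaper option; once a backlog builds up, the drift term dominates and the rule switches to the option that drains the queue faster. That trade-off, tuned by `v`, is the behavior a Lyapunov term adds to a partitioning objective.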
Reference / Citation
"Splitwise reduces end-to-end latency by 1.4x-2.8x and cuts energy consumption by up to 41% compared with existing partitioners."
* Cited for critical analysis under Article 32.