Splitwise: Adaptive Edge-Cloud LLM Inference with DRL
Analysis
This paper addresses the challenge of deploying large language models (LLMs) on edge devices while balancing latency, energy consumption, and accuracy. It proposes Splitwise, a novel framework that uses Lyapunov-assisted deep reinforcement learning (DRL) to dynamically partition LLMs across edge and cloud resources. The approach is significant because it offers a finer-grained and more adaptive solution than static partitioning methods, especially in environments with fluctuating bandwidth. The use of Lyapunov optimization ensures queue stability and robustness, which is crucial for real-world deployments. The experimental results demonstrate substantial improvements in latency and energy efficiency.
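To make the Lyapunov-assisted partitioning idea concrete, here is a minimal sketch of a drift-plus-penalty split-point selector: a virtual queue tracks edge energy overuse, and the chosen layer cut minimizes `backlog * energy + V * latency`. All cost constants, the function name, and the cost model are hypothetical placeholders for illustration, not the paper's actual formulation (which learns the policy with DRL rather than enumerating splits).

```python
def choose_split(energy_backlog, bandwidth_mbps, V=1.0, num_layers=12):
    """Drift-plus-penalty sketch: cut the model at layer `split`, running
    layers [0, split) on the edge and the rest in the cloud.

    Minimizes  energy_backlog * edge_energy(split) + V * latency(split),
    where energy_backlog is a Lyapunov virtual queue tracking edge energy
    overuse. All per-layer costs below are made-up placeholders.
    """
    EDGE_S, CLOUD_S = 0.05, 0.01   # hypothetical per-layer latency (s)
    EDGE_J = 0.4                   # hypothetical per-layer edge energy (J)
    ACT_MB, TX_J = 2.5, 2.0        # activation size (MB), transmit energy (J)

    best_split, best_score = 0, float("inf")
    for split in range(num_layers + 1):
        offload = split < num_layers   # full-edge execution needs no uplink
        latency = EDGE_S * split + CLOUD_S * (num_layers - split)
        latency += (ACT_MB * 8.0 / bandwidth_mbps) if offload else 0.0
        energy = EDGE_J * split + (TX_J if offload else 0.0)
        score = energy_backlog * energy + V * latency
        if score < best_score:
            best_split, best_score = split, score
    return best_split
```

Under this toy model the selector reproduces the qualitative behavior the paper targets: with ample bandwidth it offloads everything; when bandwidth collapses it keeps computation on the edge; and a large energy backlog pushes work back to the cloud to restabilize the virtual queue.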
Key Takeaways
- Proposes Splitwise, a DRL-based framework for adaptive LLM partitioning across edge and cloud.
- Employs Lyapunov optimization for queue stability and robustness.
- Achieves significant improvements in latency and energy efficiency compared to existing methods.
- Demonstrates performance on various hardware platforms and LLM sizes.
“Splitwise reduces end-to-end latency by 1.4x-2.8x and cuts energy consumption by up to 41% compared with existing partitioners.”