LIME: Collaborative LLM Inference on Edge Devices

Research Paper · Large Language Models (LLMs), Edge Computing, Inference Optimization
Analyzed: Jan 4, 2026 00:01
Published: Dec 26, 2025 02:41
ArXiv

Analysis

This paper addresses the challenge of running large language models (LLMs) on resource-constrained edge devices. It proposes LIME, a collaborative inference system that combines pipeline parallelism with model offloading to achieve lossless inference: speed improves without any degradation in model accuracy. Key contributions are the focus on edge devices and techniques such as fine-grained scheduling and memory adaptation. The experimental validation on heterogeneous Nvidia Jetson devices running LLaMA3.3-70B-Instruct is significant, demonstrating substantial speedups over existing methods.
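The core pipeline-parallelism idea can be illustrated with a minimal sketch: the model's layers are partitioned into stages, one per device, and concurrent requests advance through the stages in lockstep so that devices compute in parallel instead of idling. All names below (`Stage`, `pipeline_infer`, the toy doubling "layers") are illustrative, not LIME's actual API; the paper's scheduler is far more fine-grained and also adapts memory via offloading.

```python
from collections import deque

class Stage:
    """One pipeline stage: a contiguous slice of model layers on one device."""
    def __init__(self, name, layers):
        self.name = name
        self.layers = layers  # toy "layers": callables on an activation

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def pipeline_infer(stages, requests):
    """Run requests through the stage pipeline, one stage hop per tick.

    Because each request occupies a different stage at any tick, up to
    len(stages) requests are processed concurrently (pipeline parallelism).
    """
    n = len(stages)
    in_flight = deque()                  # (request_index, activation, next_stage)
    pending = deque(enumerate(requests)) # requests waiting to enter the pipeline
    results = {}
    while pending or in_flight:
        # Advance every in-flight request by exactly one stage this tick.
        for _ in range(len(in_flight)):
            idx, act, s = in_flight.popleft()
            act = stages[s].forward(act)
            if s + 1 == n:
                results[idx] = act
            else:
                in_flight.append((idx, act, s + 1))
        # Inject a new request into the first stage if the pipeline has room.
        if pending and len(in_flight) < n:
            idx, x = pending.popleft()
            in_flight.append((idx, stages[0].forward(x), 1))
    return [results[i] for i in range(len(requests))]

# Usage: three hypothetical Jetson-like stages, each a single doubling layer.
stages = [Stage(f"jetson{i}", [lambda x: 2 * x]) for i in range(3)]
print(pipeline_infer(stages, [1, 2]))  # → [8, 16]
```

On real hardware each `Stage.forward` would run on a separate device with activations sent over the network, and a memory-adaptive scheduler would decide which weights to keep resident versus offload; this toy simulates only the scheduling pattern.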
Reference / Citation
"LIME achieves 1.7x and 3.7x speedups over state-of-the-art baselines under sporadic and bursty request patterns respectively, without compromising model accuracy."
ArXiv, Dec 26, 2025 02:41