MSched: Proactive Memory Scheduling for GPU Multitasking

Published:Dec 31, 2025 05:18
1 min read
ArXiv

Analysis

This paper addresses the critical memory bottleneck in modern GPUs, particularly with the increasing demands of large-scale tasks like LLMs. It proposes MSched, an OS-level scheduler that proactively manages GPU memory by predicting and preparing working sets. This approach aims to mitigate the performance degradation caused by demand paging, which is a common technique for extending GPU memory but suffers from significant slowdowns due to poor locality. The core innovation lies in leveraging the predictability of GPU memory access patterns to optimize page placement and reduce page fault overhead. The results demonstrate substantial performance improvements over demand paging, making MSched a significant contribution to GPU resource management.

Reference

MSched outperforms demand paging by up to 11.05x for scientific and deep learning workloads, and 57.88x for LLM under memory oversubscription.