MSched: Proactive Memory Scheduling for GPU Multitasking
Analysis
This paper addresses the memory bottleneck in modern GPUs, which has become critical as large-scale workloads such as LLM inference outgrow device memory. It proposes MSched, an OS-level scheduler that proactively manages GPU memory by predicting each task's working set and preparing it in device memory before the task runs. This mitigates the performance degradation of demand paging, the common technique for extending GPU memory, which suffers severe slowdowns under poor locality. The core innovation lies in leveraging the predictability of GPU memory access patterns to optimize page placement and reduce page fault overhead. The reported results show substantial speedups over demand paging, making MSched a significant contribution to GPU resource management.
Key Takeaways
- Addresses the GPU memory bottleneck, especially for large-scale tasks.
- Proposes MSched, an OS-level scheduler for proactive memory management.
- Leverages predictability of GPU memory access patterns.
- Achieves significant performance improvements over demand paging.
- Focuses on optimizing page placement and reducing page fault overhead.
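To make the contrast concrete, the following is a toy Python model of why staging a predicted working set ahead of launch beats fault-driven migration. It is an illustrative sketch only, not MSched's implementation: the page numbers, eviction policy, and capacity are made-up simplifications.

```python
# Toy model: proactive working-set preparation vs. demand paging.
# Illustrative sketch only, NOT the MSched implementation; all names
# and numbers here are assumptions for demonstration.

def run_demand_paging(kernels, resident, capacity):
    """Migrate pages one at a time, only when the running kernel faults."""
    faults = 0
    for working_set in kernels:
        for page in working_set:
            if page not in resident:
                faults += 1                  # kernel stalls on this fault
                if len(resident) >= capacity:
                    resident.pop()           # evict an arbitrary page
                resident.add(page)
    return faults

def run_proactive(kernels, resident, capacity):
    """Predict each kernel's working set and stage it before launch,
    so the kernel itself never faults (migration is off the critical path)."""
    for working_set in kernels:
        missing = [p for p in working_set if p not in resident]
        for page in missing:
            if len(resident) >= capacity:
                # evict only pages the upcoming kernel does not need
                victim = next(q for q in resident if q not in working_set)
                resident.discard(victim)
            resident.add(page)
    return 0                                 # no kernel-time page faults
```

With three kernels cycling over overlapping 8-page working sets on an 8-page device, the demand-paging run stalls a dozen or more times while the proactive run stalls zero times during kernel execution — the qualitative gap behind the paper's speedup numbers.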
“MSched outperforms demand paging by up to 11.05x for scientific and deep learning workloads, and 57.88x for LLM under memory oversubscription.”