Research#llm📝 BlogAnalyzed: Dec 28, 2025 19:00

Which are the best coding + tooling agent models for vLLM for 128GB memory?

Published:Dec 28, 2025 18:02
1 min read
r/LocalLLaMA

Analysis

This post from r/LocalLLaMA discusses the challenge of finding coding-focused LLMs that fit within a 128GB memory constraint. The user is looking for models around 100B parameters, as there seems to be a gap between smaller (~30B) and larger (~120B+) models. They inquire about the feasibility of using compression techniques like GGUF or AWQ on 120B models to make them fit. The post also raises a fundamental question about whether a model's storage size exceeding available RAM makes it unusable. This highlights the practical limitations of running large language models on consumer-grade hardware and the need for efficient compression and quantization methods. The question is relevant to anyone trying to run LLMs locally for coding tasks.

Reference

Is there anything ~100B and a bit under that performs well?