#1 · Alibaba Cloud
Qwen 2.5 Coder 14B Instruct
One of the most practical local coding-first models for single-GPU workstations.
With Q4_K_M quantization the weights come to roughly 9 GB, so the model fits on a 12 GB card with headroom left for the KV cache; see the sketch below.
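As a sanity check on that claim, here is a rough back-of-envelope estimate. The parameter count, the effective bits-per-weight of Q4_K_M, and the attention geometry (layers, KV heads, head dimension) are approximate figures assumed for illustration, not measured values:

```python
# Rough VRAM estimate for Qwen 2.5 Coder 14B at Q4_K_M; every constant
# below is an approximation assumed for illustration, not a measurement.
params = 14.7e9                 # total parameter count (approx.)
bits_per_weight = 4.85          # Q4_K_M effective bits/weight (approx.)
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")          # ~8.9 GB

# fp16 KV cache at a 32K context, assuming 48 layers, 8 KV heads (GQA),
# and head dimension 128:
layers, kv_heads, head_dim, ctx = 48, 8, 128, 32768
kv_gb = 2 * layers * kv_heads * head_dim * ctx * 2 / 1e9  # K+V, 2 bytes each
print(f"fp16 KV cache at 32K: ~{kv_gb:.1f} GB")  # ~6.4 GB
# ~9 GB of weights fit a 12 GB card easily, but a full fp16 KV cache at
# 32K does not; long contexts call for a quantized KV cache or partial
# CPU offload.
```

Both llama.cpp and Ollama support quantized KV caches, which is the usual way a 32K context is squeezed onto a 12 GB card.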
It is trained code-first, so its core strengths (code generation, repair, and code reasoning) line up directly with your coding goal.
It supports up to 128K context (32K natively, extended via YaRN), comfortably covering your 32K target.
Recommended stack: Ollama, llama.cpp, or vLLM.
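A minimal usage sketch against the Ollama option, assuming the model has already been pulled (`ollama pull qwen2.5-coder:14b`) and the server is listening on its default port; `num_ctx` is raised to match the 32K target:

```python
# Minimal sketch: ask a locally served Qwen 2.5 Coder 14B for code via
# Ollama's HTTP chat API. Assumes `ollama pull qwen2.5-coder:14b` has
# been run and the server is on its default port (11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5-coder:14b",
        "messages": [
            {"role": "user",
             "content": "Write a Python function that reverses a linked list."}
        ],
        # num_ctx sets the context window; 32768 matches the 32K target.
        "options": {"num_ctx": 32768},
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

The same request pattern carries over to llama.cpp's or vLLM's OpenAI-compatible servers, with the endpoint and payload adjusted accordingly.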
Best for
Local coding, repo Q&A, patch generation, coding copilots.
Avoid if
You have less than 10 GB of VRAM available.