
Deploying Qwen3-Coder-30B-A3B on 8GB GPU with Docker
A 30B-parameter model on an 8GB GPU sounds impossible, but Qwen3-Coder-30B-A3B is a mixture-of-experts model with only about 3B active parameters per token, and quantization plus llama.cpp's CPU offloading make it work. This guide shows how to run it with Docker and use it as a local provider in OpenCode.
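The core idea can be sketched as a single `docker run` against the official llama.cpp CUDA server image. A Q4_K_M quantization of a 30B model is still roughly 18 GB on disk, so it cannot fit entirely in 8 GB of VRAM; the trick with an MoE model is to keep the attention and shared layers on the GPU while offloading the expert weights to system RAM. The model filename, port, and offload counts below are illustrative assumptions, and the `--n-cpu-moe` flag requires a reasonably recent llama.cpp build — check `llama-server --help` in your image version:

```shell
# Sketch: serve a quantized Qwen3-Coder-30B-A3B GGUF on an 8GB GPU.
# Assumes: NVIDIA drivers + nvidia-container-toolkit installed, and the
# GGUF file already downloaded into ~/models (filename is an assumption).
docker run --gpus all -p 8080:8080 \
  -v ~/models:/models \
  ghcr.io/ggml-org/llama.cpp:server-cuda \
  -m /models/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --n-cpu-moe 24 \
  --ctx-size 32768 \
  --host 0.0.0.0 --port 8080
```

If VRAM overflows, raise `--n-cpu-moe` (more experts on CPU, slower but smaller GPU footprint); if you have headroom, lower it. The server then exposes an OpenAI-compatible API at `http://localhost:8080/v1`.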

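To use the local server from OpenCode, point a custom OpenAI-compatible provider at it in `opencode.json`. The provider key, display names, and model id below are illustrative assumptions; the exact schema may differ between OpenCode releases, so verify against the current OpenCode configuration docs:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": {
        "baseURL": "http://localhost:8080/v1"
      },
      "models": {
        "qwen3-coder-30b-a3b": {
          "name": "Qwen3 Coder 30B A3B"
        }
      }
    }
  }
}
```

With this in place, the model should appear in OpenCode's model picker under the local provider, and all requests stay on your machine.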