Run Qwen3.6 MTP GGUFs locally ~1.4–2.2× faster with no accuracy loss, using only 18 GB of VRAM


The speedup comes from [MTP](https://github.com/ggml-org/llama.cpp/pull/22673) (multi-token prediction) support landing in llama.cpp. Now that llama.cpp has merged many of the related PRs, the Qwen3.6 Unsloth GGUFs are out of experimental mode and MTP is properly supported in Unsloth.

https://unsloth.ai/docs/models/qwen3.6#mtp-guide