c/artificial_intel by u/yogthos 2d ago huggingface.co

Run Qwen3.6 MTP GGUFs locally ~1.4–2.2× faster with no accuracy loss and only 18 GB of VRAM

8 upvotes 6 comments
The speedup comes from [MTP](https://github.com/ggml-org/llama.cpp/pull/22673) (multi-token prediction) support landing in llama.cpp. The Qwen3.6 Unsloth GGUFs are now out of experimental mode: llama.cpp has merged many PRs, and MTP is now properly supported in Unsloth.

https://unsloth.ai/docs/models/qwen3.6#mtp-guide