llama.cpp: don't sleep on --split-mode tensor
In case you missed it, 2-3 weeks ago, experimental tensor-parallelism support was merged into llama.cpp.
In a nutshell, it lets a multi-GPU setup combine not only the cards' VRAM but also their compute. Results depend a lot on the specific setup and model, but on my 3x RTX 2000e Ada rig running Qwen3.6-35b it almost doubled generation throughput (these are low-power cards that aren't very fast individually).
The option to turn it on is `--split-mode tensor`.
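If you want to check whether it helps on your own rig, here is a rough sketch that prints two benchmark commands, one for the default `layer` split and one for `tensor`, so you can run them back to back. Assumptions on my part, not from the PR: that `llama-bench` is on your PATH and accepts `tensor` for `--split-mode` the same way the main binaries do, and that `model.gguf` and `-ngl 99` are placeholders you replace with your own values.

```python
import shlex


def bench_command(model_path, split_mode, n_gpu_layers=99, binary="llama-bench"):
    """Build an argv list for one benchmark run with the given split mode.

    Assumption: `binary` accepts -m, -ngl and --split-mode; whether it
    accepts the experimental "tensor" value depends on your build.
    """
    return [
        binary,
        "-m", model_path,
        "-ngl", str(n_gpu_layers),   # offload all layers to the GPUs
        "--split-mode", split_mode,  # "layer" is the default, "tensor" is the new mode
    ]


if __name__ == "__main__":
    # Print one command per mode; copy-paste them and compare the reported t/s.
    for mode in ("layer", "tensor"):
        print(shlex.join(bench_command("model.gguf", mode)))
```

It only prints the commands rather than running them, so nothing breaks if your build doesn't know the flag value yet.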
It's not officially documented yet, I assume because it's still experimental. But since [#22362](https://github.com/ggml-org/llama.cpp/pull/22362) was merged yesterday, it now also works for the latest Qwen3.6 models, at least in my case.