c/localllama by u/Fmstrat 3w ago

Intel B70: llama.cpp SYCL vs llama.cpp OpenVINO vs LLM-Scaler

In case anyone is interested, I decided to test out llama.cpp's new OpenVINO backend to see how it compares on Intel GPUs. At first glance, it stomps all over the previous best case, SYCL, in prompt processing (pp2048), but lags behind LLM-Scaler (Intel's vLLM fork), likely just due to the hardware optimizations for GPTQ/Int4. Interestingly, tg512 was fastest on SYCL, but in real-world use, prompt processing always seems to be the indicator on this card.

As usual with Intel, model selection is... poor. It took a while to even find a model that was on the validated OpenVINO list that would not only run properly, but also have a counterpart that was "close enough" for LLM-Scaler.

```
## Llama.cpp OpenVino
llama-benchy http://localhost:8000/v1 bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:---------------------------------------------------|-------:|-----------------:|-------------:|---------------:|---------------:|----------------:|
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M | pp2048 | 3845.61 ± 524.73 | | 659.99 ± 56.95 | 489.07 ± 56.95 | 739.42 ± 56.84 |
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M | tg512 | 40.89 ± 0.55 | 44.33 ± 1.25 | | | |

## Llama.cpp SYCL
llama-benchy http://localhost:8000/v1 bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:---------------------------------------------------|-------:|---------------:|-------------:|----------------:|----------------:|----------------:|
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M | pp2048 | 844.64 ± 19.25 | | 2199.90 ± 23.63 | 2178.96 ± 23.63 | 2229.67 ± 24.84 |
| bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M | tg512 | 73.87 ± 1.17 | 78.00 ± 2.16 | | | |

## LLM-Scaler
llama-benchy http://localhost:8000/v1 jakiAJK/DeepSeek-R1-Distill-Llama-8B_GPTQ-int4

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:--------------------------------------------------|-------:|-----------------:|-------------:|---------------:|---------------:|----------------:|
| jakiAJK/DeepSeek-R1-Distill-Llama-8B_GPTQ-int4 | pp2048 | 7875.52 ± 642.20 | | 268.09 ± 20.50 | 240.11 ± 20.50 | 268.34 ± 20.45 |
| jakiAJK/DeepSeek-R1-Distill-Llama-8B_GPTQ-int4 | tg512 | 52.75 ± 0.10 | 54.00 ± 0.00 | | | |
```
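
For anyone who wants to put numbers on the comparison, here is a rough sketch that takes the mean values from the tables above and computes each backend's speedup relative to SYCL, plus a crude estimate of total time for a single request. The function names and the `RESULTS` layout are made up for illustration, and the estimate assumes the tg512 rate stays constant for the whole generation, which is a simplification rather than something llama-benchy reports.

```python
# Mean values copied from the pp2048 and tg512 rows of the tables above.
# The dictionary layout and key names are illustrative, not llama-benchy output.
RESULTS = {
    "llama.cpp OpenVINO": {"pp_tps": 3845.61, "tg_tps": 40.89, "e2e_ttft_ms": 739.42},
    "llama.cpp SYCL": {"pp_tps": 844.64, "tg_tps": 73.87, "e2e_ttft_ms": 2229.67},
    "LLM-Scaler": {"pp_tps": 7875.52, "tg_tps": 52.75, "e2e_ttft_ms": 268.34},
}


def speedup(metric, backend, baseline="llama.cpp SYCL"):
    """Ratio of a backend's throughput to the baseline's for one metric."""
    return RESULTS[backend][metric] / RESULTS[baseline][metric]


def estimated_request_seconds(backend, generated_tokens=512):
    """Crude estimate: measured time to first token for the 2048-token prompt,
    plus generation at the measured tg512 rate (assumed constant)."""
    r = RESULTS[backend]
    return r["e2e_ttft_ms"] / 1000.0 + generated_tokens / r["tg_tps"]


if __name__ == "__main__":
    for name in RESULTS:
        print(
            f"{name:20s} "
            f"pp x{speedup('pp_tps', name):.2f}  "
            f"tg x{speedup('tg_tps', name):.2f}  "
            f"est. 2048-in/512-out: {estimated_request_seconds(name):.2f} s"
        )
```

On these figures, OpenVINO prompt processing comes out at roughly 4.5x SYCL and LLM-Scaler at roughly 9.3x. Changing `generated_tokens` shows how the ranking shifts between prompt-heavy and generation-heavy workloads.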