would you laugh at me if I ran gemma-4-26b on a 4 core Xeon, with 32GB RAM, no GPU?

I have spent a few days tweaking this setup to attain these results:

| **Model** | **Prompt (tok/s)** | **Generation (tok/s)** |
| --------------------- | ------------------ | ---------------------- |
| `gemma-26b-moe` | 8.9 | 6.4 |
| `qwen3.5-4b-no-think` | 21.5 | 8.4 |

Although modest, It is great for local parsing and analysis of my self-hosted homelab data where sending logs to external APIs is not desirable.

Typical workflows:

- **Log analysis:** Piping `journalctl` output to the API for error triage and root cause hypothesis generation.
- **Configuration synthesis:** Generating AdGuard Home rewrite rules, nginx location blocks, or `fstab` entries based on defined parameters.
- **Troubleshooting constraints:** Querying for failure modes specific to the local topology (e.g., NFS mount failures over a 1 Gbps unmanaged switch, Tailscale DERUP routing behind CGNAT).
- **Alert context:** Correlating Beszel/Uptime Kuma notifications with service-specific knowledge (e.g., "mediabox CPU spike while SabNZBd is extracting").

More from c/localllama