c/homelab by u/smiletolerantly 1w ago github.com

hass-closest-intent: Fuzzy intent matcher for HomeAssistant. Garbled STT output in, actual intent out.

cross-posted from: https://awful.systems/post/8238756

> Basically, STT quality has kept me from switching to HomeAssistant's voice assistant features. The default matcher (Hassil) is waaaaaaay too strict, and LLMs are slow, costly, and/or a privacy nightmare, plus I don't like them.
>
> I really thought there would be something available that just matches your STT output to the configured intents, but apparently not, so I've built it myself.
>
> Finally convinced my GF to throw Alexa in the bin :)
>
> Here's an excerpt from the README, and feel free to AMA:
>
> ### 🌲 Problem statement and solution
>
> Speech-To-Text (STT) output, especially fast and local STT output, is often simply *bad*.
> HomeAssistant's own [Hassil](https://github.com/OHF-Voice/hassil) is *incredibly* picky:
> your STT output must match *exactly* to one of the configured intents.
>
> There are two paths forward from here: upgrade your hardware to support better STT, or
> try to figure out what the speaker *probably meant* to say from the garbled output.
>
> This project does the latter.
>
> With this custom integration, "Lights on in **live in room**" will actually turn on the lights in your **living room**.
> So will, for that matter, "lighrts on inn livainriomm".
>
> Short demo, first with `closest-intent`, then with bare Hassil:
>
> ![demo gif](https://raw.githubusercontent.com/charludo/hass-closest-intent/refs/heads/main/custom_components/closest_intent/brand/demo.gif)
>
>
> ### 📜 Highlights
>
> - Pattern expansion. Expanding `<expansion_rules>`, `(alternatives|to)`, and `[optional|alternatives]` all work, including on HASS-defined lists like your home's areas and entities!
> - Slot extraction. Both for wildcard slots (like for adding something to the shopping list, where the `{item}` is a wildcard), and against slots like `{timer_hours:hours}` with a fixed set of possibilities.
> - Fuzzy slot resolution. For list-like slots and expansion rules (including your areas and entities!), fuzzy match the slot values to the available options. Allows "livikroom" to be corrected to "living room".
> - Actual intent handling still done by Hassil. `closest-intent` simply corrects your STT output or typos to the closest matching intent, and then forwards a nice, canonical sentence to Hassil, which then handles the intent just as if you had spoken or typed perfectly.
> - 100% LLM-free. Just uses relatively simple fuzzy matching of the input against your intents, plus some clever-ish (well... working, at least) tricks to improve the results.
> - Fallback agent support. OK, I said 100% LLM-free, but if you absolutely want to, you can use one as fallback. More on this below.
> - Is fast :) (as in: basically instant for a couple hundred configured custom intents).
>
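
The "fuzzy slot resolution" highlight can be illustrated with a minimal sketch. It uses Python's standard-library `difflib`, not the project's actual matcher; the function name `resolve_slot`, the area list, and the `0.6` cutoff are assumptions made up for this example.

```python
from difflib import get_close_matches
from typing import Optional


def resolve_slot(heard: str, options: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the option most similar to `heard`, or None if nothing is close enough."""
    # Compare case-insensitively, but hand back the option as originally spelled.
    lowered = {option.lower(): option for option in options}
    hits = get_close_matches(heard.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Hypothetical list of areas; in HomeAssistant these would come from your setup.
areas = ["living room", "kitchen", "bedroom", "office"]
print(resolve_slot("livikroom", areas))
print(resolve_slot("best room", areas))
```

A cutoff is what keeps unrelated input from being forced onto some area; where exactly to put it is a tuning decision.
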
> > **Note:** `closest-intent` is completely language-agnostic. All the examples in this `README` are in English, but you can use it with any language you like; personally, I use it in German.
>
>
> ### 📋 Examples
>
> Here are some examples of things I said, what my STT (`wyoming-faster-whisper-base`) understood, what HomeAssistant was able to do/answer after passing the STT output through `closest-intent`, and what the same STT output would have resulted in with just bare Hassil.
>
> > **Note:** These are actual results I got when speaking the "what was said" sentences into my phone.
> > I'm a native German speaker, and so I do have an accent, but this pretty closely matches my experience when using the German-language version of whisper.
> > The "bare Hassil" responses are what I got after 1:1 pasting the STT output into the voice assist chat window with `closest-intent` disabled.
>
> | what was said | STT output | with Closest Intent | bare Hassil |
> | --- | --- | --- | --- |
> | `start cleaning` | `Star cleaning.` | ✅ Cleaning started. | ❌ Sorry, I couldn't understand that |
> | `stop cleaning` | `Stop clenching!` | ✅ Cleaning stopped. | ❌ Sorry, I am not aware of any device called clenching |
> | `vacuum the living room` | `Vacuum Believing Room` | ✅ Cleaning the living room. | ❌ Sorry, I am unaware of any floor called Believing Room |
> | `clean the office` | `King the Office` | ✅ Cleaning the office. | ❌ Sorry, there are multiple devices called Office *(author's note: no there aren't, wtf?)* |
> | `vacuum the kitchen` | `Back here in the kitchen.` | ✅ Cleaning the kitchen. | ❌ Sorry, I couldn't understand that |
> | `how warm is it in the bedroom` | `Our all is in the best room.` | ✅ In the bedroom, the temperature is currently.... | ❌ Sorry, I am not aware of any area called best room |
> | `add milk to the shopping list` | `Add milk to the chauvinist.` | ✅ "milk" added. | ❌ Sorry, I am not aware of any device called chauvinist |
> | `put call dentist on my todo list` | `put call dentist on my tudu list` | ✅ "call dentist" added. | ❌ Sorry, I am not aware of any device called tudu |
> | `turn on the water pump` | `turn on the what her pump` | ✅ Turned on the water pump. | ❌ Sorry, I am not aware of any device called what her pump |
> | `play some music` | `Place on music` | ✅ Playing music. | ❌ Sorry, I am not aware of any area called music |
> | `resume the music` | `Renew Music` | ✅ Resuming. | ❌ Sorry, I couldn't understand that |
> | `pause the music` | `Post music` | ✅ Paused. | ❌ Sorry, I couldn't understand that |
> | `next track` | `next rack` | ✅ Next track. | ❌ Sorry, I am not aware of any device called rack |
> | `enable shuffle` | `an able shuffling` | ✅ Shuffle enabled. | ❌ Sorry, I couldn't understand that |
> | `disable shuffle` | `Disable to schaffen.` | ✅ Shuffle disabled. | ❌ Sorry, I am not aware of any device called Disable |
> | `restart the player` | `Reset the plan.` | ✅ Restarting the player. | ❌ Sorry, I am not aware of any area called Reset |
> | `play a random album` | `Player random album` | ✅ Playing a random album. | ❌ Sorry, I couldn't understand that |
> | `play a random artist` | `Player and Immartist.` | ✅ Playing a random artist. | ❌ Sorry, I couldn't understand that |
> | `play the latest tracks` | `Plan the ladder tracks.` | ✅ Playing recently added tracks. | ❌ Sorry, I am not aware of any area called Plan |
> | `play recently played songs` | `Player recently played so...` | ✅ Playing recently heard tracks. | ❌ Sorry, I couldn't understand that |
> | `play playlist NieR` | `Play playlist NEAR!` | ✅ Playing the playlist NieR. | ❌ Sorry, I couldn't understand that |
> | `play my daily briefing` | `and play my daily breathing` | ✅ Here is your daily briefing: ... | ❌ Sorry, I am not aware of any area called and play |
> | `what time is it` | `What the hell is it?` | ✅ It is 16:36. | ✅ It is 16:36. *(author's note: okay, know what? earned. did not expect that.)* |
> | `what day is it today` | `One day is today.` | ✅ Today is Friday. | ✅/❌ May 8th, 2026 *(author's note: that's the output for "What **date** is it?", but, eh, close enough)* |
> | `make the tv brighter` | `Make that CV brighter.` | ✅ Screen is now bright. | ❌ Sorry, I couldn't understand that |
> | `set the screen darker` | `The screen doctor.` | ✅ Screen is now dark. | ❌ Sorry, I am not aware of any device called screen doctor |
> | `what's the weather today` | `What's the matter with you?` | ✅ Today, the weather is... | ❌ It is 16:36. *(author's note: wait, WHAT?)* |
> | `how's the weather tomorrow morning` | `How's the better tomorrow?` | ✅ Tomorrow morning, it will be... | ❌ Sorry, I am not aware of any area called How's |
> | `what's the weather this week` | `What's the matter this weak` | ✅ Monday:..., Tuesday:..., | ❌ It is 16:36. *(author's note: sigh...)* |
> | `how's the weather at 5 o'clock` | `cast the red there at 5 o'clock` | ✅ At 5 o'clock, it will be... | ❌ Sorry, I am not aware of any area called cast |
> | `how windy is it right now` | `how windy is IR low` | ✅ The wind is currently blowing with... | ❌ No timers. |
> | `how windy will it be tonight` | `How will you be tonight?` | ✅ Tonight, the wind speed will be around... | ❌ Sorry, I couldn't understand that |
> | `how hot will it get today` | `How hard will it get today?` | ✅ Today, temperatures will reach up to... | ❌ Sorry, I couldn't understand that |
> | `will it rain today` | `with it right today` | ✅ No rain is expected today. | ❌ Sorry, I couldn't understand that |
>
> ...you get the idea.
>
>
> ### 💡 How it works
>
> `closest-intent` is registered in HomeAssistant as a conversation agent.
> On startup, it parses all user-defined intents by default (optionally also the built-in ones). In the process, it expands all rules, like `<expansion_rule>`, `(alternatives|to)`, and `[optionals]`, and notes where `{slots}` are located and whether they are wildcards or belong to some list (like areas, entities, or the numbers 1-100).
>
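
To make the expansion step concrete, here is a small sketch of how `(alternatives|to)` and `[optionals]` could be unrolled into plain sentences. It is an illustration only, not the project's parser: the function names are invented, and `<expansion_rule>` references and `{slots}` are simply left in place as literal text.

```python
from itertools import product


def expand(template: str) -> list[str]:
    """Expand (a|b) alternatives and [optional] parts into plain sentences."""
    variants, _ = _alternatives(template, 0)
    # Collapse the double spaces left behind by skipped optional parts.
    return sorted({" ".join(v.split()) for v in variants})


def _alternatives(text: str, pos: int) -> tuple[list[str], int]:
    """Parse 'seq|seq|...' and return every variant plus the stop position."""
    variants: list[str] = []
    while True:
        seq, pos = _sequence(text, pos)
        variants.extend(seq)
        if pos < len(text) and text[pos] == "|":
            pos += 1  # another alternative follows
        else:
            return variants, pos


def _sequence(text: str, pos: int) -> tuple[list[str], int]:
    """Parse literals and groups up to '|', ')' or ']' and combine them."""
    parts: list[list[str]] = []
    while pos < len(text) and text[pos] not in "|)]":
        if text[pos] in "([":
            optional = text[pos] == "["
            inner, pos = _alternatives(text, pos + 1)
            pos += 1  # skip the closing bracket
            parts.append(inner + [""] if optional else inner)
        else:
            end = pos
            while end < len(text) and text[end] not in "()[]|":
                end += 1
            parts.append([text[pos:end]])
            pos = end
    return ["".join(combo) for combo in product(*parts)], pos


for sentence in expand("turn [the] (light|lamp) on"):
    print(sentence)
```

Expanding everything up front trades memory for speed: each incoming request only has to be compared against a flat list of sentences.
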
> When a user request comes in (via voice command or the chat box), `closest-intent` fuzzy-matches that request against those expanded rules.
> If the rule does not contain a slot, it is picked immediately.
> If it does contain a slot, `closest-intent` performs a sequence of fancy magic steps to find the best-fitting slot value among a range of possible positions within the top-scoring matched sentences.
> In practice, this often means "smallest slot-value on a word-boundary", but the extraction is not limited to that.
>
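
For slot-free rules, the matching step can be approximated as "score the input against every expanded sentence and keep the best one above a threshold". The sketch below does that with `difflib.SequenceMatcher` from the standard library. The scoring method, the `0.6` threshold, and the names `normalize` and `closest_sentence` are assumptions for illustration; the README does not say which similarity measure the project uses.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so 'Star cleaning.' compares fairly."""
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def closest_sentence(heard: str, sentences: list[str], threshold: float = 0.6) -> Optional[str]:
    """Return the best-scoring sentence, or None if no score reaches the threshold."""
    target = normalize(heard)
    best, best_score = None, 0.0
    for sentence in sentences:
        score = SequenceMatcher(None, target, normalize(sentence)).ratio()
        if score > best_score:
            best, best_score = sentence, score
    return best if best_score >= threshold else None


# Hypothetical expanded sentences, using phrases from the examples table.
sentences = ["start cleaning", "stop cleaning", "next track", "play some music"]
print(closest_sentence("Star cleaning.", sentences))
print(closest_sentence("next rack", sentences))
print(closest_sentence("zzzz qqqq", sentences))
```

Plain character similarity is not enough for every row of the table above (some candidates can end up with tied scores), which is presumably where the project's extra tricks come in.
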
> With the best match found, we then reconstruct the "canonical form", i.e. a sentence that Hassil will actually understand.
> If in your configured intents, "Play some music." exists, and `closest-intent` got "Place on music" and matched that to the intent,
> it will simply forward "Play some music." to Hassil. If the intent contained a slot, the extracted value will be substituted.
>
> This guarantees that the sentence passed to Hassil will actually be understood, and means we don't have to worry at all about performing actions, running scripts,...
>
> If *no* matching intent could be found, we pass the exact input we got to the configured fallback agent.
> By default, that is simply Hassil (which again allows us to be lazy and not worry about proper error responses), but it can be another agent, such as an LLM.
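
The overall flow described above (match, rebuild the canonical sentence, forward it to Hassil, otherwise hand the raw input to the fallback agent) can be sketched as follows. Everything here is hypothetical: the agents are plain callables standing in for HomeAssistant conversation agents, and `canonical_sentence`, `handle`, and `demo_matcher` are names invented for this example, not the integration's API.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Agent = Callable[[str], str]
Match = Tuple[str, Dict[str, str]]  # (matched template, resolved slot values)


def canonical_sentence(template: str, slots: Dict[str, str]) -> str:
    """Substitute resolved slot values into a template such as 'vacuum the {area}'."""
    return re.sub(r"\{(\w+)\}", lambda m: slots[m.group(1)], template)


def handle(heard: str, matcher: Callable[[str], Optional[Match]],
           hassil: Agent, fallback: Agent) -> str:
    """Forward a canonical sentence on a match; otherwise pass the raw input on."""
    match = matcher(heard)
    if match is None:
        return fallback(heard)  # untouched input goes to the fallback agent
    template, slots = match
    return hassil(canonical_sentence(template, slots))


def demo_matcher(heard: str) -> Optional[Match]:
    """Stand-in for the real fuzzy matcher; recognizes a single garbled phrase."""
    if "believing room" in heard.lower():
        return ("vacuum the {area}", {"area": "living room"})
    return None


def hassil_stub(sentence: str) -> str:
    return f"hassil got: {sentence}"


def fallback_stub(sentence: str) -> str:
    return f"fallback got: {sentence}"


print(handle("Vacuum Believing Room", demo_matcher, hassil_stub, fallback_stub))
print(handle("completely unrelated", demo_matcher, hassil_stub, fallback_stub))
```

Keeping the matcher separate from both agents mirrors the design in the README: the corrector never executes anything itself, it only decides which sentence the real agent gets to see.
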